08-27-2017 11:36 PM
Hi all,
Master = 7.7.2 SunOS(5.10)
Media = 7.7.2 Windows2008(6)
Client = 7.7.2 Windows2008(6)
I have an issue with SQL backups: one of the databases is failing with status 13 (file read failed) when using the Windows media server. The other databases are not affected.
Can someone help? I have already logged a call with Veritas, and they advised changing settings on the network side, which I don't believe is related to this issue.
08/26/2017 20:00:18 - Info nbjm (pid=19376) starting backup job (jobid=1083363) for client EAPMSSQLK232, policy KONE_EAPMSSQLK23Q_SQL, schedule Default-Application-Backup
08/26/2017 20:00:18 - Info nbjm (pid=19376) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=1083363, request id:{20E6F19C-8A56-11E7-9712-002128C38D7A})
08/26/2017 20:00:18 - requesting resource DTC_STG_TL_ALLMS
08/26/2017 20:00:18 - requesting resource TSGNBMSUX01.NBU_CLIENT.MAXJOBS.EAPMSSQLK232
08/26/2017 20:00:18 - requesting resource TSGNBMSUX01.NBU_POLICY.MAXJOBS.KONE_EAPMSSQLK23Q_SQL
08/26/2017 20:00:18 - granted resource TSGNBMSUX01.NBU_CLIENT.MAXJOBS.EAPMSSQLK232
08/26/2017 20:00:18 - granted resource TSGNBMSUX01.NBU_POLICY.MAXJOBS.KONE_EAPMSSQLK23Q_SQL
08/26/2017 20:00:18 - granted resource TSC178
08/26/2017 20:00:18 - granted resource HP.ULTRIUM5-SCSI.016
08/26/2017 20:00:18 - granted resource TSGNBMAWI01-hcart2-robot-tld-0
08/26/2017 20:00:28 - estimated 0 kbytes needed
08/26/2017 20:00:28 - Info nbjm (pid=19376) started backup (backupid=EAPMSSQLK232_1503748828) job for client EAPMSSQLK232, policy KONE_EAPMSSQLK23Q_SQL, schedule Default-Application-Backup on storage unit TSGNBMAWI01-hcart2-robot-tld-0
08/26/2017 20:00:29 - started process bpbrm (pid=20376)
08/26/2017 20:00:31 - connecting
08/26/2017 20:00:31 - Info bpbrm (pid=20376) EAPMSSQLK232 is the host to backup data from
08/26/2017 20:00:31 - Info bpbrm (pid=20376) reading file list for client
08/26/2017 20:00:33 - Info bpbrm (pid=20376) listening for client connection
08/26/2017 20:00:40 - Info bpbrm (pid=20376) INF - Client read timeout = 1800
08/26/2017 20:00:40 - Info bpbrm (pid=20376) accepted connection from client
08/26/2017 20:00:40 - Info dbclient (pid=11776) Backup started
08/26/2017 20:00:40 - connected; connect time: 0:00:00
08/26/2017 20:00:41 - Info bptm (pid=28668) start
08/26/2017 20:00:41 - Info bptm (pid=28668) using 262144 data buffer size
08/26/2017 20:00:43 - Info bptm (pid=28668) setting receive network buffer to 262144 bytes
08/26/2017 20:00:43 - Info bptm (pid=28668) using 32 data buffers
08/26/2017 20:00:43 - Info bptm (pid=28668) start backup
08/26/2017 20:00:43 - Info bptm (pid=28668) backup child process is pid 6468.6600
08/26/2017 20:00:43 - Info bptm (pid=28668) Waiting for mount of media id TSC178 (copy 1) on server TSGNBMAWI01.
08/26/2017 20:00:43 - Info bptm (pid=6468) start
08/26/2017 20:00:43 - mounting TSC178
08/26/2017 20:01:45 - Info bptm (pid=28668) media id TSC178 mounted on drive index 26, drivepath {4,0,1,4}, drivename HP.ULTRIUM5-SCSI.016, copy 1
08/26/2017 20:01:45 - mounted TSC178; mount time: 0:01:02
08/26/2017 20:01:51 - positioning TSC178 to file 11
08/26/2017 20:03:39 - positioned TSC178; position time: 0:01:48
08/26/2017 20:03:39 - begin writing
08/26/2017 20:03:41 - Info dbclient (pid=11776) dbclient(pid=11776) wrote first buffer(size=65536)
08/26/2017 21:08:49 - Error bpbrm (pid=20376) socket read failed, An existing connection was forcibly closed by the remote host. (10054)
08/26/2017 21:09:51 - Info dbclient (pid=11776) done. status: 13: file read failed
08/26/2017 21:09:52 - end writing; write time: 1:06:13
file read failed (13)
08-28-2017 12:36 AM
This looks like a timeout to me.
The default Client Read Timeout of 5 minutes is hardly ever enough for large databases.
Please post these sections of bpbrm, bptm (on media server) and dbclient on client:
bpbrm: PID 20376
08/26/2017 20:00:29 - started process bpbrm (pid=20376)
up to
08/26/2017 21:08:49 - Error bpbrm (pid=20376) socket read failed, An existing connection was forcibly closed by the remote host. (10054)
bptm: PID 28668
08/26/2017 20:00:41 - Info bptm (pid=28668) start
and child bptm
08/26/2017 20:00:43 - Info bptm (pid=6468) start
up to last entry for these two PIDs.
dbclient: PID 11776
08/26/2017 20:00:40 - Info dbclient (pid=11776) Backup started
08/26/2017 21:09:51 - Info dbclient (pid=11776) done. status: 13: file read failed
If there is a lot of output (I guess Veritas would have asked for level 5 logs), please attach it as separate .txt files (e.g. bpbrm.txt).
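As a rough check of the timeout theory, the job details posted above can be scanned for the longest silence between entries. This is only a sketch in Python: the function name `largest_gap` is made up for illustration, and the sample lines are copied from the job details in the first post. Keep in mind that the job details only record milestones, so a long gap there does not prove that no data was moving; the bpbrm, bptm and dbclient logs are still needed for that.

```python
from datetime import datetime

# Timestamp layout used at the start of each job-detail line.
FMT = "%m/%d/%Y %H:%M:%S"

def largest_gap(lines):
    """Return (gap_seconds, earlier_line, later_line) for the widest silence."""
    stamped = []
    for line in lines:
        try:
            ts = datetime.strptime(line[:19], FMT)
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        stamped.append((ts, line))
    best = (0, None, None)
    for (t0, l0), (t1, l1) in zip(stamped, stamped[1:]):
        gap = int((t1 - t0).total_seconds())
        if gap > best[0]:
            best = (gap, l0, l1)
    return best

# Lines copied from the job details in the original post.
job_details = [
    "08/26/2017 20:03:39 - begin writing",
    "08/26/2017 20:03:41 - Info dbclient (pid=11776) dbclient(pid=11776) wrote first buffer(size=65536)",
    "08/26/2017 21:08:49 - Error bpbrm (pid=20376) socket read failed, An existing connection was forcibly closed by the remote host. (10054)",
    "08/26/2017 21:09:51 - Info dbclient (pid=11776) done. status: 13: file read failed",
]

gap, before, after = largest_gap(job_details)
print(f"longest silence: {gap} s")
print("after :", before)
print("before:", after)
```

For the lines above, the longest silence sits between "wrote first buffer" and the 10054 socket error, and it is longer than the 1800-second client read timeout shown in the job details.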
08-28-2017 06:27 AM
As Marianne says, this looks like a timeout. I have a couple of questions:
Is the problematic database big compared to the working ones?
What happens if you run a backup of just that database?
Can your SQL DBA see anything that might cause heavy load on the database while you're running the backup?
Two things that often cause 10054 errors are antivirus software and external firewalls, the latter especially for jobs with long idle times.
Increasing CLIENT_CONNECT_TIMEOUT and CLIENT_READ_TIMEOUT, and/or creating or decreasing the TCP keepalive setting on the client and servers, might help.
I think the default TCP keepalive on Windows is about 2 hours, and the Microsoft recommendation is about 5 minutes.
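To put numbers on the keepalive point: as far as I know, Windows takes this setting as a REG_DWORD named KeepAliveTime under HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters, expressed in milliseconds. The small Python sketch below only does the unit conversion and compares the result with a firewall idle timeout. The helper names and the 60-minute firewall figure are assumptions for illustration, not values from this environment.

```python
def keepalive_ms(minutes):
    """Convert a keepalive interval in minutes to milliseconds."""
    return int(minutes * 60 * 1000)

def survives_idle_timeout(keepalive_minutes, firewall_idle_minutes):
    """True if a keepalive probe goes out before the firewall drops the idle session."""
    return keepalive_minutes < firewall_idle_minutes

# Hypothetical firewall idle timeout; check the real value with the network team.
FIREWALL_IDLE_MINUTES = 60

for minutes in (120, 5):  # roughly the Windows default vs. the suggested value
    print(minutes, "min =", keepalive_ms(minutes), "ms,",
          "survives" if survives_idle_timeout(minutes, FIREWALL_IDLE_MINUTES)
          else "dropped before first probe")
```

With the assumed 60-minute firewall timeout, a 2-hour keepalive never sends a probe in time, while a 5-minute keepalive keeps the session alive through long idle periods.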