10-31-2013 01:45 AM
Hi all,
In the middle of some backups I get status 40 (network connection broken). What can I do about this problem? I have tried almost all of the timeout values. Sometimes the master / media server also goes into offline status.
NBU 7.1.0.4
Master server: Solaris 10
Media server: Windows 2008 R2
10/31/2013 06:04:52 - Info nbjm (pid=15211) starting backup job (jobid=1399407) for client xxxxx, policy ORACLE_XXXXX_PKYSTEST_ARCH_ENCR, schedule Default-Application-Backup
10/31/2013 06:04:52 - Info nbjm (pid=15211) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=1399407, request id:{97F9569C-41E1-11E3-A9C7-001517810EE1})
10/31/2013 06:04:52 - requesting resource xxxxx-hcart3-robot-tld-4
10/31/2013 06:04:52 - requesting resource xxxxx.NBU_CLIENT.MAXJOBS.xxxxx
10/31/2013 06:04:52 - requesting resource xxxxx.NBU_POLICY.MAXJOBS.ORACLE_XXXXX_PKYSTEST_ARCH_ENCR
10/31/2013 06:04:56 - Waiting for scan drive stop IBM.ULT3580-TD5.022, Media server: xxxxx
10/31/2013 06:04:57 - granted resource xxxxx.NBU_CLIENT.MAXJOBS.xxxxx
10/31/2013 06:04:57 - granted resource xxxxx.NBU_POLICY.MAXJOBS.ORACLE_XXXXX_PKYSTEST_ARCH_ENCR
10/31/2013 06:04:57 - granted resource 7493L5
10/31/2013 06:04:57 - granted resource IBM.ULT3580-TD5.022
10/31/2013 06:04:57 - granted resource xxxxx-hcart3-robot-tld-4
10/31/2013 06:04:57 - estimated 0 kbytes needed
10/31/2013 06:04:57 - Info nbjm (pid=15211) started backup job for client xxxxx, policy ORACLE_XXXXX_PKYSTEST_ARCH_ENCR, schedule Default-Application-Backup on storage unit xxxxx-hcart3-robot-tld-4
10/31/2013 06:04:58 - started process bpbrm (pid=8428)
10/31/2013 06:04:58 - connecting
10/31/2013 06:05:06 - connected; connect time: 0:00:00
10/31/2013 06:05:06 - mounting 7493L5
10/31/2013 06:05:23 - Info bpbrm (pid=8428) xxxxx is the host to backup data from
10/31/2013 06:05:23 - Info bpbrm (pid=8428) reading file list from client
10/31/2013 06:05:24 - Info bpbrm (pid=8428) listening for client connection
10/31/2013 06:05:30 - Info bpbrm (pid=8428) INF - Client read timeout = 19200
10/31/2013 06:05:30 - Info bpbrm (pid=8428) accepted connection from client
10/31/2013 06:05:30 - Info bphdb (pid=8096) Backup started
10/31/2013 06:05:30 - Info bptm (pid=11196) start
10/31/2013 06:05:31 - Info bptm (pid=11196) using 262144 data buffer size
10/31/2013 06:05:31 - Info bptm (pid=11196) setting receive network buffer to 262144 bytes
10/31/2013 06:05:31 - Info bptm (pid=11196) using 64 data buffers
10/31/2013 06:05:31 - Info bptm (pid=11196) start backup
10/31/2013 06:05:31 - Info bptm (pid=11196) backup child process is pid 9628.9688
10/31/2013 06:05:31 - Info bptm (pid=11196) Waiting for mount of media id 7493L5 (copy 1) on server xxxxx.
10/31/2013 06:05:31 - Info bptm (pid=9628) start
10/31/2013 06:05:54 - Info bptm (pid=11196) media id 7493L5 mounted on drive index 75, drivepath {5,0,14,0}, drivename IBM.ULT3580-TD5.022, copy 1
10/31/2013 06:06:28 - mounted 7493L5; mount time: 0:01:22
10/31/2013 06:06:28 - positioning 7493L5 to file 15
10/31/2013 06:06:28 - positioned 7493L5; position time: 0:00:00
10/31/2013 06:06:28 - begin writing
10/31/2013 06:06:55 - Info bphdb (pid=8096) dbclient(pid=8096) wrote first buffer(size=262144)
10/31/2013 06:06:55 - Error bpbrm (pid=8428) could not write FILE ADDED message to OUTSOCK
10/31/2013 06:06:56 - Info bphdb (pid=8096) done. status: 0
10/31/2013 06:07:01 - Info bpbrm (pid=8428) validating image for client xxxxx
10/31/2013 06:07:01 - Error bpbrm (pid=8428) could not write EXIT STATUS to OUTSOCK
network connection broken (40)
10-31-2013 02:49 AM
Please help us to understand what this means:
"Some times, the master / media server goes offline status."
Is there perhaps a firewall between the master and media server?
Here we see a communication problem between the master and media server:
10/31/2013 06:06:55 - Error bpbrm (pid=8428) could not write FILE ADDED message to OUTSOCK
Similar issue in this forum discussion:
https://www-secure.symantec.com/connect/forums/backup-failing-status-40
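To rule a firewall in or out, a quick TCP connect test between the master and media server can help. The following is only a rough sketch, not a NetBackup tool: it assumes the usual NetBackup ports (1556 for PBX, 13724 for vnetd, 13782 for bpcd), and the function names `port_open` and `check_host` are made up for this example. Run it on the master against the media server and the other way round.

```python
import socket
import sys

# Assumption: default NetBackup ports. Adjust if your site uses others.
NBU_PORTS = {"pbx": 1556, "vnetd": 13724, "bpcd": 13782}


def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_host(host, ports=NBU_PORTS):
    """Return a dict mapping each port label to True (reachable) or False."""
    return {name: port_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    # Usage: python nbu_ports.py <hostname> [<hostname> ...]
    for host in sys.argv[1:]:
        for name, ok in check_host(host).items():
            state = "open" if ok else "BLOCKED or closed"
            print(f"{host} {name} ({NBU_PORTS[name]}): {state}")
```

Keep in mind that this only shows whether a connection can be opened. A firewall that drops idle connections after some time would still pass this test, so a clean result does not fully exclude the firewall.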
10-31-2013 11:19 AM
Please check the nbjm log for more details, and please also check this reference link:
http://www.symantec.com/business/support/index?page=content&id=TECH66444
10-31-2013 12:34 PM
You said the failure happens in the middle of a backup, yet the job details show only a short time frame from start to error. I noticed that your schedule was for Oracle, and I was wondering whether it may be a name resolution timeout. Oracle needs forward and reverse name resolution to work between the client, media server AND master server. I suggest adding entries for these hosts to the local hosts file on each of them. Also, of course, check the resolve file to make sure those local hosts entries can actually be used.
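The forward and reverse lookup check described above can be sketched in a few lines of Python and run on the client, the media server and the master in turn. This is only an illustration under some assumptions: `check_name` is a made-up helper, it uses whatever resolver order the operating system is configured with, and it treats a match on either the full name or the short name as consistent.

```python
import socket
import sys


def check_name(hostname, forward=socket.gethostbyname,
               reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, resolve that IP back, and compare names.

    The forward and reverse parameters default to the system resolver and
    exist only so that the function can be exercised without a network.
    """
    result = {"hostname": hostname, "ip": None,
              "reverse_names": [], "consistent": False}
    try:
        ip = forward(hostname)
    except OSError:
        return result  # forward lookup failed
    result["ip"] = ip
    try:
        primary, aliases, _ = reverse(ip)
    except OSError:
        return result  # reverse lookup failed
    names = [primary] + list(aliases)
    result["reverse_names"] = names
    short = hostname.split(".")[0].lower()
    # Consistent if any returned name matches the full or the short name.
    result["consistent"] = any(
        n.lower() == hostname.lower() or n.split(".")[0].lower() == short
        for n in names
    )
    return result


if __name__ == "__main__":
    # Usage: python nbu_names.py <client> <media server> <master server>
    for host in sys.argv[1:]:
        print(check_name(host))
```

If `ip` stays empty, the forward lookup failed. If `reverse_names` stays empty or `consistent` is False, the reverse side is the one to fix. A lookup that only succeeds after a long pause is also worth noting, since the suspicion here is a resolution timeout.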