10-07-2014 02:25 AM
On a server, 4 streams are running: C:\, D:\, shadow copy, and all local drives. Of these, the shadow copy stream is failing continuously with error code 636.
10/07/2014 02:08:18 - Info bpbrm (pid=29405) nmcstraining is the host to backup data from
10/07/2014 02:08:19 - Info bpbrm (pid=29405) reading file list from client
10/07/2014 02:10:13 - Info bpbrm (pid=29405) starting bpbkar on client
10/07/2014 02:11:13 - Info bpbkar (pid=1984) Backup started
10/07/2014 02:11:13 - Info bpbrm (pid=29405) bptm pid: 29531
10/07/2014 02:11:14 - Info bptm (pid=29531) start
10/07/2014 02:11:18 - Info bptm (pid=29531) using 262144 data buffer size
10/07/2014 02:11:18 - Info bptm (pid=29531) using 30 data buffers
10/07/2014 02:11:23 - Info bptm (pid=29531) start backup
10/07/2014 02:11:35 - Info bptm (pid=29531) backup child process is pid 29560
10/07/2014 02:17:09 - Error bpbrm (pid=11841) could not write FILE ADDED message to OUTSOCK
10/07/2014 02:17:21 - Error bpbrm (pid=11841) could not write FILE ADDED message to OUTSOCK
10/07/2014 02:44:58 - Info nbjm (pid=10211) starting backup job (jobid=5194250) for client nmcstraining, policy PROD_REMOTE_NM_dalmedia19, schedule DLY_INCR
10/07/2014 02:44:58 - Info nbjm (pid=10211) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=5194250, request id:{D5FC3560-4DF5-11E4-9775-AC38CE536ADD})
10/07/2014 02:44:58 - requesting resource stu_disk_dalmedia19
10/07/2014 02:44:58 - requesting resource nbutx2.NBU_CLIENT.MAXJOBS.nmcstraining
10/07/2014 02:44:58 - requesting resource nbutx2.NBU_POLICY.MAXJOBS.PROD_REMOTE_NM_dalmedia19
10/07/2014 02:44:58 - granted resource nbutx2.NBU_CLIENT.MAXJOBS.nmcstraining
10/07/2014 02:44:58 - granted resource nbutx2.NBU_POLICY.MAXJOBS.PROD_REMOTE_NM_dalmedia19
10/07/2014 02:44:58 - granted resource MediaID=@aaaeS;DiskVolume=PureDiskVolume;DiskPool=dp_disk_dalmedia19;Path=PureDiskVolume;StorageServer=dalmedia19;MediaServer=dalmedia19
10/07/2014 02:44:58 - granted resource stu_disk_dalmedia19
10/07/2014 02:44:58 - estimated 1265157 kbytes needed
10/07/2014 02:44:58 - Info nbjm (pid=10211) resumed backup (backupid=nmcstraining_1412659253) job for client nmcstraining, policy PROD_REMOTE_NM_dalmedia19, schedule DLY_INCR on storage unit stu_disk_dalmedia19
10/07/2014 02:45:03 - started process bpbrm (pid=29405)
10/07/2014 02:45:09 - connecting
10/07/2014 02:47:01 - connected; connect time: 0:00:00
10/07/2014 02:48:23 - begin writing
10-07-2014 02:49 AM
Hi,
Take a look at these older forum discussions.
https://www-secure.symantec.com/connect/forums/status-636-status-42-errors-backing-msdp-pool
10-07-2014 05:34 AM
My guess is a timeout, as there are more than 300 seconds between these two messages:
10/07/2014 02:11:35 - Info bptm (pid=29531) backup child process is pid 29560
10/07/2014 02:17:09 - Error bpbrm (pid=11841) could not write FILE ADDED message to OUTSOCK
If that is the case, raising CLIENT_READ_TIMEOUT and CLIENT_CONNECT_TIMEOUT is the solution.
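To check the rest of the job details for similar silent periods, the gap between consecutive messages can be computed with a few lines of Python. This is only a sketch: the two log lines are the ones quoted above, the helper names `parse_timestamp` and `find_gaps` are made up for illustration, and the 300-second threshold is simply the figure I mentioned, not a value read from your configuration.

```python
from datetime import datetime

# The two job-details lines quoted in this post.
LOG_LINES = [
    "10/07/2014 02:11:35 - Info bptm (pid=29531) backup child process is pid 29560",
    "10/07/2014 02:17:09 - Error bpbrm (pid=11841) could not write FILE ADDED message to OUTSOCK",
]


def parse_timestamp(line):
    """Parse the leading 'MM/DD/YYYY HH:MM:SS' stamp (first 19 characters)."""
    return datetime.strptime(line[:19], "%m/%d/%Y %H:%M:%S")


def find_gaps(lines, threshold_seconds=300):
    """Return (gap_seconds, earlier_line, later_line) for every pair of
    consecutive lines that are more than threshold_seconds apart."""
    gaps = []
    for earlier, later in zip(lines, lines[1:]):
        gap = (parse_timestamp(later) - parse_timestamp(earlier)).total_seconds()
        if gap > threshold_seconds:
            gaps.append((gap, earlier, later))
    return gaps


if __name__ == "__main__":
    for gap, earlier, later in find_gaps(LOG_LINES):
        print(f"{gap:.0f} s of silence before: {later}")
```

For the two lines above, the gap works out to 334 seconds, which is why a 300-second timeout looks like a plausible suspect.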
10-07-2014 06:00 AM
10-07-2014 06:06 AM
As always, it depends, but I would start with something like 600 or 900 seconds.
10-07-2014 08:34 AM
I would take a look at the TCP keepalive settings on the media server, the master server, and any network device in between. Ensure the times match.
With a mismatch, the connection can actually get closed.
bpbrm then tries to send data to nbjm and fails:
"Error bpbrm (pid=11841) could not write FILE ADDED message to OUTSOCK"
Eventually nbjm checks for messages from bpbrm and fails with a 636 error because it determines that the socket was already closed.
This is something I have seen happen during NDMP jobs (http://www.symantec.com/docs/TECH214335), and it can also happen during other jobs such as SQL (http://www.symantec.com/docs/TECH197144).
My coworkers have resolved about 90% of the 636 errors they encountered by pointing customers to this keepalive setting; the customers then found differences in the values.
10-07-2014 11:59 PM
10-08-2014 12:13 AM
10-08-2014 05:02 AM
Both articles I linked reference another one, http://www.symantec.com/docs/HOWTO56221, which covers the registry setting for Windows. I do not have any Linux documentation handy. Please google "TCP keepalive timeouts" plus your Linux distro. Then also talk with your network admins to determine the timeout on any device in the route between the two servers.
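As a starting point on Linux, the kernel exposes its keepalive tunables as files under /proc/sys/net/ipv4 (tcp_keepalive_time, tcp_keepalive_intvl, tcp_keepalive_probes). The sketch below reads them and works out how long an idle connection survives before the kernel gives up on it. The function names are hypothetical, and this only covers the Linux kernel side; it does not read any NetBackup setting or any firewall timeout, so those still have to be checked separately.

```python
from pathlib import Path

# Standard Linux kernel keepalive tunables under /proc/sys/net/ipv4.
KEEPALIVE_KEYS = ("tcp_keepalive_time", "tcp_keepalive_intvl", "tcp_keepalive_probes")


def read_keepalive(base_dir="/proc/sys/net/ipv4"):
    """Read the three keepalive tunables from base_dir as integers."""
    base = Path(base_dir)
    return {key: int((base / key).read_text().strip()) for key in KEEPALIVE_KEYS}


def seconds_until_drop(settings):
    """Idle seconds before the first probe, plus the time for every probe
    to go unanswered, after which the kernel closes the connection."""
    return settings["tcp_keepalive_time"] + (
        settings["tcp_keepalive_intvl"] * settings["tcp_keepalive_probes"]
    )


def differing_keys(first, second):
    """Names of the tunables whose values differ between two hosts."""
    return sorted(key for key in KEEPALIVE_KEYS if first[key] != second[key])


if __name__ == "__main__":
    # Only meaningful on a Linux host; skipped elsewhere.
    if Path("/proc/sys/net/ipv4/tcp_keepalive_time").is_file():
        local = read_keepalive()
        print(local, "->", seconds_until_drop(local), "seconds until drop")
```

Run it on both the media server and the master server and compare the output (or feed both results to `differing_keys`). If the first probe on either host fires later than the idle timeout of a device in between, that device can drop the connection before any keepalive is sent.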
10-16-2014 05:00 AM