Backup failure with error code 636

ankur1809
Level 5

On one server, four streams are running, covering C:\, D:\, shadow copy, and all local drives. Of these, the shadow copy stream is failing continuously with error code 636.

10/07/2014 02:08:18 - Info bpbrm (pid=29405) nmcstraining is the host to backup data from
10/07/2014 02:08:19 - Info bpbrm (pid=29405) reading file list from client
10/07/2014 02:10:13 - Info bpbrm (pid=29405) starting bpbkar on client
10/07/2014 02:11:13 - Info bpbkar (pid=1984) Backup started
10/07/2014 02:11:13 - Info bpbrm (pid=29405) bptm pid: 29531
10/07/2014 02:11:14 - Info bptm (pid=29531) start
10/07/2014 02:11:18 - Info bptm (pid=29531) using 262144 data buffer size
10/07/2014 02:11:18 - Info bptm (pid=29531) using 30 data buffers
10/07/2014 02:11:23 - Info bptm (pid=29531) start backup
10/07/2014 02:11:35 - Info bptm (pid=29531) backup child process is pid 29560
10/07/2014 02:17:09 - Error bpbrm (pid=11841) could not write FILE ADDED message to OUTSOCK
10/07/2014 02:17:21 - Error bpbrm (pid=11841) could not write FILE ADDED message to OUTSOCK
10/07/2014 02:44:58 - Info nbjm (pid=10211) starting backup job (jobid=5194250) for client nmcstraining, policy PROD_REMOTE_NM_dalmedia19, schedule DLY_INCR
10/07/2014 02:44:58 - Info nbjm (pid=10211) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=5194250, request id:{D5FC3560-4DF5-11E4-9775-AC38CE536ADD})
10/07/2014 02:44:58 - requesting resource stu_disk_dalmedia19
10/07/2014 02:44:58 - requesting resource nbutx2.NBU_CLIENT.MAXJOBS.nmcstraining
10/07/2014 02:44:58 - requesting resource nbutx2.NBU_POLICY.MAXJOBS.PROD_REMOTE_NM_dalmedia19
10/07/2014 02:44:58 - granted resource  nbutx2.NBU_CLIENT.MAXJOBS.nmcstraining
10/07/2014 02:44:58 - granted resource  nbutx2.NBU_POLICY.MAXJOBS.PROD_REMOTE_NM_dalmedia19
10/07/2014 02:44:58 - granted resource  MediaID=@aaaeS;DiskVolume=PureDiskVolume;DiskPool=dp_disk_dalmedia19;Path=PureDiskVolume;StorageServer=dalmedia19;MediaServer=dalmedia19
10/07/2014 02:44:58 - granted resource  stu_disk_dalmedia19
10/07/2014 02:44:58 - estimated 1265157 kbytes needed
10/07/2014 02:44:58 - Info nbjm (pid=10211) resumed backup (backupid=nmcstraining_1412659253) job for client nmcstraining, policy PROD_REMOTE_NM_dalmedia19, schedule DLY_INCR on storage unit stu_disk_dalmedia19
10/07/2014 02:45:03 - started process bpbrm (pid=29405)
10/07/2014 02:45:09 - connecting
10/07/2014 02:47:01 - connected; connect time: 0:00:00
10/07/2014 02:48:23 - begin writing

 

1 ACCEPTED SOLUTION

Accepted Solutions

Michael_G_Ander
Level 6
Certified

As always, it depends, but I would start with something like 600 or 900.

The standard questions: Have you checked: 1) What has changed. 2) The manual 3) If there are any tech notes or VOX posts regarding the issue


9 REPLIES

RiaanBadenhorst
Moderator
Partner    VIP    Accredited Certified

Hi,

 

Take a look at these older forum discussions.

 

https://www-secure.symantec.com/connect/forums/status-636-status-42-errors-backing-msdp-pool

 

Michael_G_Ander
Level 6
Certified

My guess is a timeout, as there are more than 300 seconds between these two messages:

10/07/2014 02:11:35 - Info bptm (pid=29531) backup child process is pid 29560
10/07/2014 02:17:09 - Error bpbrm (pid=11841) could not write FILE ADDED message to OUTSOCK

If that is the case, increasing CLIENT_READ_TIMEOUT and CLIENT_CONNECT_TIMEOUT is the solution.
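
To check the rest of a job log for the same pattern, here is a minimal Python sketch that flags consecutive entries more than 300 seconds apart. It assumes every entry starts with a 19-character MM/DD/YYYY HH:MM:SS timestamp, as in the log above; the function name find_gaps and the default threshold are illustrative only and not part of NetBackup.

```python
from datetime import datetime

# Assumed timestamp layout, taken from the job details posted above.
TIMESTAMP_FORMAT = "%m/%d/%Y %H:%M:%S"


def find_gaps(lines, threshold_seconds=300):
    """Return (gap_seconds, previous_line, current_line) for each pair of
    consecutive timestamped lines that are further apart than the threshold."""
    gaps = []
    previous = None  # (timestamp, line) of the last parsed entry
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            stamp = datetime.strptime(line[:19], TIMESTAMP_FORMAT)
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if previous is not None:
            delta = (stamp - previous[0]).total_seconds()
            if delta > threshold_seconds:
                gaps.append((delta, previous[1], line))
        previous = (stamp, line)
    return gaps


sample = [
    "10/07/2014 02:11:35 - Info bptm (pid=29531) backup child process is pid 29560",
    "10/07/2014 02:17:09 - Error bpbrm (pid=11841) could not write FILE ADDED message to OUTSOCK",
    "10/07/2014 02:17:21 - Error bpbrm (pid=11841) could not write FILE ADDED message to OUTSOCK",
]

for seconds, before, after in find_gaps(sample):
    print(f"{seconds:.0f} s gap between:\n  {before}\n  {after}")
```

On the two lines quoted above this reports a gap of 334 seconds, which is over the 300-second mark. A long gap only hints at a timeout; it does not prove one.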

ankur1809
Level 5
So you want me to increase CLIENT_READ_TIMEOUT and CLIENT_CONNECT_TIMEOUT? Up to what value?

Michael_G_Ander
Level 6
Certified

As always, it depends, but I would start with something like 600 or 900.

mnolan
Level 6
Employee Accredited Certified

I would take a look at your TCP keepalive settings on the media server, master server and any network device in between. Ensure the times match.

What can happen is that with a mismatch, the connection actually can get closed.

Bpbrm tries to send data to NBJM and fails:
"Error bpbrm (pid=11841) could not write FILE ADDED message to OUTSOCK"

Eventually NBJM checks for messages from BPBRM and fails with a 636 error because it determines that the socket was already closed.

 

This is something I have seen happen during NDMP jobs, http://www.symantec.com/docs/TECH214335 , which can also happen during other jobs such as SQL, http://www.symantec.com/docs/TECH197144

My coworkers have resolved about 90% of the 636 errors they encountered by pointing customers towards this keepalive setting and having them look for differences in the values.

ankur1809
Level 5
In the timeout options I have set the client read timeout to 900, but the other option, File browse timeout, is greyed out because "Use OS dependent timeouts" is checked.

ankur1809
Level 5
mnolan, do I need to check its value through the registry settings, or by any other means as well? I do not have access to the client, as it is a domain server (Windows). How can I check it on the master and media servers (Linux)?

mnolan
Level 6
Employee Accredited Certified

Both articles I linked reference another, http://www.symantec.com/docs/HOWTO56221 , which covers the registry setting for Windows. I do not have any Linux documentation handy. Please google "TCP keepalive timeouts" plus your Linux distro. Then also talk with your network admins to determine the timeout on any device in the route between the two servers.
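
For the Linux master and media servers, the kernel keepalive values are normally exposed under /proc/sys/net/ipv4. A minimal Python sketch for reading them, assuming a standard Linux /proc layout; the function name read_keepalive is made up for illustration. It only shows the local kernel settings, so firewalls and other network devices in between still have to be checked with the network admins.

```python
from pathlib import Path

# Standard Linux kernel keepalive tunables:
# idle seconds before the first probe, seconds between probes, number of probes.
KEEPALIVE_KEYS = ("tcp_keepalive_time", "tcp_keepalive_intvl", "tcp_keepalive_probes")


def read_keepalive(base="/proc/sys/net/ipv4"):
    """Return a dict of keepalive tunable -> int value, or None if unreadable."""
    values = {}
    for key in KEEPALIVE_KEYS:
        path = Path(base) / key
        try:
            values[key] = int(path.read_text().strip())
        except (OSError, ValueError):
            values[key] = None  # file missing (e.g. not Linux) or not a number
    return values


if __name__ == "__main__":
    for key, value in read_keepalive().items():
        print(f"{key} = {value}")
```

Run it on both the master and the media server and compare the output with the Windows registry values from the HOWTO article.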

ankur1809
Level 5
mnolan, I want to add something: only the shadow copy stream is failing; all the other streams are perfectly fine.