11-03-2014 09:11 AM
Hi all,
I have a policy that always fails with error code 23 after exactly 10 minutes and 6 seconds, whether it's a full or an incremental backup. I think it must be a timeout setting, but I can't find which one. I've tried changing CLIENT READ TIMEOUT on the client from 300 to 3600, but I still get the same issue... any ideas?
Thanks
11-03-2014 09:43 AM
I think you have some kind of connectivity issue.
I suggest you run the tests here:
https://www-secure.symantec.com/connect/blogs/general-connectivity-troubleshooting
11-03-2014 09:34 AM
It could be something on a specific file system or mount point, assuming you are doing a file backup.
Please post the full Job Details from the failing backup.
11-03-2014 09:38 AM
Here it is:
11/03/2014 16:25:38 - Info nbjm (pid=10476) starting backup job (jobid=1737730) for client preside10, policy SIG_TX_PRESIDE10, schedule Full_SIG_TX_PRESIDE10
11/03/2014 16:25:38 - Info nbjm (pid=10476) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=1737730, request id:{AA35F3E8-636D-11E4-AB5E-8888AE45A246})
11/03/2014 16:25:38 - requesting resource STU_MSDP_FLO7
11/03/2014 16:25:38 - requesting resource backup00.NBU_CLIENT.MAXJOBS.preside10
11/03/2014 16:25:38 - requesting resource backup00.NBU_POLICY.MAXJOBS.SIG_TX_PRESIDE10
11/03/2014 16:25:38 - granted resource backup00.NBU_CLIENT.MAXJOBS.preside10
11/03/2014 16:25:38 - granted resource backup00.NBU_POLICY.MAXJOBS.SIG_TX_PRESIDE10
11/03/2014 16:25:38 - granted resource MediaID=@aaab7;DiskVolume=PureDiskVolume;DiskPool=pool_msdp_flo7;Path=PureDiskVolume;StorageServer=bckflo07;MediaServer=bckflo07
11/03/2014 16:25:38 - granted resource STU_MSDP_FLO7
11/03/2014 16:25:38 - estimated 12090163 kbytes needed
11/03/2014 16:25:38 - Info nbjm (pid=10476) started backup (backupid=preside10_1415028338) job for client preside10, policy SIG_TX_PRESIDE10, schedule Full_SIG_TX_PRESIDE10 on storage unit STU_MSDP_FLO7
11/03/2014 16:25:40 - started process bpbrm (pid=1732)
11/03/2014 16:30:41 - Error bpbrm (pid=1732) bpcd on preside10 exited with status 23: socket read failed
11/03/2014 16:35:42 - Error bpbrm (pid=1732) cannot send mail because BPCD on preside10 exited with status 23: socket read failed
11/03/2014 16:35:42 - Info bpbkar (pid=0) done. status: 23: socket read failed
11/03/2014 16:35:42 - end writing
socket read failed (23)
11-03-2014 09:41 AM
I think I do. You have a directory that the bpbkar process hangs on, most likely a directory with millions of small files.
The good news is that you can debug it.
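If you want to check the "millions of small files" theory directly, one generic approach (not a NetBackup tool, and not what the tech notes describe) is to count the entries in each directory under the backup selection. A minimal Python sketch, assuming you run it on the client against whatever root path you choose; on a very large file system the walk itself can take a long time:

```python
import os
from typing import List, Tuple


def largest_directories(root: str, top: int = 10) -> List[Tuple[int, str]]:
    """Return the `top` directories under `root` with the most direct entries."""
    counts: List[Tuple[int, str]] = []
    # os.walk does not follow symlinks by default and silently skips
    # directories it cannot read (onerror=None).
    for dirpath, dirnames, filenames in os.walk(root):
        counts.append((len(dirnames) + len(filenames), dirpath))
    # Largest entry counts first.
    counts.sort(reverse=True)
    return counts[:top]
```

For example, `largest_directories("/data", top=5)` (the path is hypothetical) lists the five most crowded directories as `(entry_count, path)` pairs.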
http://www.symantec.com/docs/TECH31513
http://www.symantec.com/docs/TECH29475
Follow the instructions in TECH31513 and run a new backup. The bpbkar log file will then show the files and folders it processes, and more or less always it is the last 10 or 20 lines that reveal the culprit. Re-run the backup to double-check.
If you are in doubt, attach the debug log as a file to a post. Do not post the debug text itself; it will be caught by the anti-spam robot.
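Since the culprit is usually in the last 10 or 20 lines of the bpbkar log, a small helper that prints only the tail of the file can save scrolling through a large debug log. A minimal Python sketch; the log location depends on your platform and NetBackup setup, so the path in the usage line is an assumption to check against TECH31513:

```python
from collections import deque
from typing import List


def last_lines(path: str, n: int = 20) -> List[str]:
    """Return the last `n` lines of a text file without holding it all in memory."""
    # errors="replace" keeps reading if the log contains bytes that are not valid UTF-8.
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        # A deque with maxlen keeps only the most recent n lines as the file streams past.
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]
```

Usage, with a hypothetical UNIX client log path: `print("\n".join(last_lines("/usr/openv/netbackup/logs/bpbkar/log.110314")))`.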
11-03-2014 10:02 AM
Hi, I don't have the "Follow NFS" setting, so I think TECH29475 doesn't apply to this case...
Thanks
11-03-2014 10:16 AM
The bptestbpcd and bpclntcmd commands show everything is OK, so it doesn't seem to be a connectivity issue.
11-03-2014 10:47 AM
11-03-2014 11:58 AM
No, I referenced that tech note because it also mentions the bpbkar_path_tr touch file.
11-03-2014 12:04 PM
I don't think it's a connection issue if the timeout occurs after exactly 10 minutes and 6 seconds.
Please follow the debug instructions in TECH31513.
In short, do the following on the client, except for step 4:
11-03-2014 12:19 PM
Could you satisfy my curiosity and post the output from the bptestbpcd and bpclntcmd commands?
11-03-2014 01:30 PM
11/03/2014 16:25:40 - started process bpbrm (pid=1732)
11/03/2014 16:30:41 - Error bpbrm (pid=1732) bpcd on preside10 exited with status 23: socket read failed
11/03/2014 16:35:42 - Error bpbrm (pid=1732) cannot send mail because BPCD on preside10 exited with status 23: socket read failed
It's probably not a coincidence that these log entries are each five minutes apart. More verbose logs (as suggested above) will help diagnose the trouble.
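The spacing is easy to check from the three job-detail lines quoted above. A minimal Python sketch that parses the `MM/DD/YYYY HH:MM:SS` prefix of each line and prints the gaps: both come out at 301 seconds, which would fit a 300-second timeout expiring on each attempt. Which timeout that is remains an interpretation; the job details do not name it.

```python
from datetime import datetime
from typing import List

# Timestamp format used at the start of each NetBackup job-detail line.
TS_FORMAT = "%m/%d/%Y %H:%M:%S"


def gaps_in_seconds(lines: List[str]) -> List[int]:
    """Seconds elapsed between consecutive job-detail lines."""
    # "11/03/2014 16:25:40" is 19 characters long.
    stamps = [datetime.strptime(line[:19], TS_FORMAT) for line in lines]
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


job_lines = [
    "11/03/2014 16:25:40 - started process bpbrm (pid=1732)",
    "11/03/2014 16:30:41 - Error bpbrm (pid=1732) bpcd on preside10 exited with status 23: socket read failed",
    "11/03/2014 16:35:42 - Error bpbrm (pid=1732) cannot send mail because BPCD on preside10 exited with status 23: socket read failed",
]
print(gaps_in_seconds(job_lines))  # [301, 301]
```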
11-04-2014 12:19 AM
Add CLIENT_READ_TIMEOUT = 1800 to the master and media servers as well.
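On UNIX master and media servers this setting is a `KEY = value` line in `/usr/openv/netbackup/bp.conf` (Windows servers keep it in host properties instead). If you script the change across several servers, the edit should leave exactly one definition of the key. A minimal Python sketch that only transforms the file's text, assuming the UNIX bp.conf format; it is not a replacement for changing the value through Host Properties:

```python
import re


def set_bp_conf_value(text: str, key: str, value: str) -> str:
    """Return bp.conf-style text with `key = value` defined exactly once."""
    # Match "KEY = anything" but not longer keys such as "KEY_SUFFIX = ...".
    pattern = re.compile(rf"^\s*{re.escape(key)}\s*=.*$")
    # Drop existing definitions of the key, then append the new one.
    kept = [line for line in text.splitlines() if not pattern.match(line)]
    kept.append(f"{key} = {value}")
    return "\n".join(kept) + "\n"
```

For example, `set_bp_conf_value(old_text, "CLIENT_READ_TIMEOUT", "1800")`, where `old_text` is the current file content, returns the updated content for you to write back.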
11-05-2014 05:17 AM
In the end you were right! It was a connectivity problem. I had run bptestbpcd from the master server, but when I tried it from the media server I got no answer from the client, even though both machines (media server and client) have correct /etc/hosts entries. The client was on a different VLAN; now it is solved.
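The key point is that the test has to run from the media server that actually talks to the client, not only from the master. As a quick supplement to bptestbpcd (not a substitute for it), a bare TCP check can be run on each media server. A minimal Python sketch; the port numbers are the commonly documented NetBackup defaults (1556 PBX, 13724 vnetd, 13782 bpcd) and should be treated as an assumption to verify for your version:

```python
import socket
from typing import Dict, Iterable

# Assumed NetBackup default ports: 1556 = PBX, 13724 = vnetd, 13782 = bpcd.
NBU_PORTS = (1556, 13724, 13782)


def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and name-resolution errors are all OSError subclasses.
        return False


def check_client(host: str, ports: Iterable[int] = NBU_PORTS) -> Dict[int, bool]:
    """Map each port to whether it is reachable from the machine running this code."""
    return {port: port_open(host, port) for port in ports}
```

Run on the media server (bckflo07 in this thread), `check_client("preside10")` returning `False` for every port would point at the kind of routing or VLAN problem found here.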
Thanks a lot!