error code 23 after 10:06 minutes execution

madmax00
Level 3

Hi all,

I have a policy that always fails with error code 23 after exactly 10 minutes and 6 seconds, whether it is a full or an incremental backup. I think it is a timeout setting, but I can't find which one. I tried changing CLIENT_READ_TIMEOUT on the client from 300 to 3600, but the issue remains. Any ideas?

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions

Michael_G_Ander
Level 6
Certified

Think you have some kind of connectivity issue

Suggest you run the tests here:

https://www-secure.symantec.com/connect/blogs/general-connectivity-troubleshooting

 

The standard questions: Have you checked: 1) What has changed. 2) The manual 3) If there are any tech notes or VOX posts regarding the issue

13 REPLIES 13

Michael_G_Ander
Level 6
Certified

Could be something on a specific file system/mount point, assuming you are doing a file backup

Please post the full Job Details from the failing backup

The standard questions: Have you checked: 1) What has changed. 2) The manual 3) If there are any tech notes or VOX posts regarding the issue

madmax00
Level 3

Here it is:

11/03/2014 16:25:38 - Info nbjm (pid=10476) starting backup job (jobid=1737730) for client preside10, policy SIG_TX_PRESIDE10, schedule Full_SIG_TX_PRESIDE10
11/03/2014 16:25:38 - Info nbjm (pid=10476) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=1737730, request id:{AA35F3E8-636D-11E4-AB5E-8888AE45A246})
11/03/2014 16:25:38 - requesting resource STU_MSDP_FLO7
11/03/2014 16:25:38 - requesting resource backup00.NBU_CLIENT.MAXJOBS.preside10
11/03/2014 16:25:38 - requesting resource backup00.NBU_POLICY.MAXJOBS.SIG_TX_PRESIDE10
11/03/2014 16:25:38 - granted resource  backup00.NBU_CLIENT.MAXJOBS.preside10
11/03/2014 16:25:38 - granted resource  backup00.NBU_POLICY.MAXJOBS.SIG_TX_PRESIDE10
11/03/2014 16:25:38 - granted resource  MediaID=@aaab7;DiskVolume=PureDiskVolume;DiskPool=pool_msdp_flo7;Path=PureDiskVolume;StorageServer=bckflo07;MediaServer=bckflo07
11/03/2014 16:25:38 - granted resource  STU_MSDP_FLO7
11/03/2014 16:25:38 - estimated 12090163 kbytes needed
11/03/2014 16:25:38 - Info nbjm (pid=10476) started backup (backupid=preside10_1415028338) job for client preside10, policy SIG_TX_PRESIDE10, schedule Full_SIG_TX_PRESIDE10 on storage unit STU_MSDP_FLO7
11/03/2014 16:25:40 - started process bpbrm (pid=1732)
11/03/2014 16:30:41 - Error bpbrm (pid=1732) bpcd on preside10 exited with status 23: socket read failed
11/03/2014 16:35:42 - Error bpbrm (pid=1732) cannot send mail because BPCD on preside10 exited with status 23: socket read failed
11/03/2014 16:35:42 - Info bpbkar (pid=0) done. status: 23: socket read failed
11/03/2014 16:35:42 - end writing
socket read failed  (23)

 

Nicolai
Moderator
Moderator
Partner    VIP   

I think I do. You have a directory that the bpbkar process hangs on, most likely a directory with millions of small files.

The good news is that you can debug it.

http://www.symantec.com/docs/TECH31513

http://www.symantec.com/docs/TECH29475

Follow the instructions in TECH31513 and run a new backup. The bpbkar log file will then show the files and folders it processes. More often than not, the last 10 or 20 lines reveal the culprit. Re-run the backup to double-check.

If you are in doubt, attach the debug log as a file to a post. Do not post the debug text itself - it will be caught by the anti spam robot.

Michael_G_Ander
Level 6
Certified

Think you have some kind of connectivity issue

Suggest you run the tests here:

https://www-secure.symantec.com/connect/blogs/general-connectivity-troubleshooting

 

The standard questions: Have you checked: 1) What has changed. 2) The manual 3) If there are any tech notes or VOX posts regarding the issue

madmax00
Level 3

Hi, I don't have the "Follow NFS" setting enabled, so I think that TECH29475 doesn't apply to this case...

Thanks

madmax00
Level 3

The bptestbpcd and bpclntcmd commands both report that everything is OK, so it doesn't seem to be a connectivity issue.

mph999
Level 6
Employee Accredited
Suggest bpbrm log (media server) and bpbkar + bpcd (client) at verbose 5

Nicolai
Moderator
Moderator
Partner    VIP   

No - I referenced the tech note because it also mentioned the bpbkar_path_tr touch file.

Nicolai
Moderator
Moderator
Partner    VIP   

I don't think it's a connection issue if the timeout occurs after exactly 10:06 minutes.

Please follow the debug instructions in TECH31513.

In short, do the following on the client, except for step 4:

  1. touch /usr/openv/netbackup/bpbkar_path_tr
  2. mkdir /usr/openv/netbackup/logs/bpbkar
  3. Add VERBOSE to /usr/openv/netbackup/bp.conf
  4. run backup
  5. Inspect /usr/openv/netbackup/logs/bpbkar/{date}.log
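Since the last 10 or 20 lines of the bpbkar log are usually the interesting ones, step 5 can be scripted. The following is only a minimal sketch: `tail_latest_log` is a hypothetical helper, not a NetBackup tool, and it simply picks the most recently modified file in the directory from step 5 rather than assuming a particular log file name.

```python
from collections import deque
from pathlib import Path
import sys


def tail_latest_log(log_dir, n=20):
    """Return the last n lines of the most recently modified file in log_dir."""
    files = [p for p in Path(log_dir).iterdir() if p.is_file()]
    if not files:
        return []
    latest = max(files, key=lambda p: p.stat().st_mtime)
    # deque with maxlen keeps only the final n lines while streaming the file.
    with latest.open(errors="replace") as fh:
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]


if __name__ == "__main__":
    # Assumed default location from step 5; pass another directory as an argument.
    log_dir = Path(sys.argv[1] if len(sys.argv) > 1 else "/usr/openv/netbackup/logs/bpbkar")
    if log_dir.is_dir():
        for line in tail_latest_log(log_dir):
            print(line)
```

Run it on the client right after the backup fails, so the newest file is the one written by the failing job.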

Michael_G_Ander
Level 6
Certified

Could you satisfy my curiosity and post the output from the bptestbpcd and bpclntcmd commands?

The standard questions: Have you checked: 1) What has changed. 2) The manual 3) If there are any tech notes or VOX posts regarding the issue

CRZ
Level 6
Employee Accredited Certified

11/03/2014 16:25:40 - started process bpbrm (pid=1732)
11/03/2014 16:30:41 - Error bpbrm (pid=1732) bpcd on preside10 exited with status 23: socket read failed
11/03/2014 16:35:42 - Error bpbrm (pid=1732) cannot send mail because BPCD on preside10 exited with status 23: socket read failed

It's probably not a coincidence that these log entries are each five minutes apart. More verbose logs (as suggested above) will help diagnose the trouble.
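To make the spacing explicit, here is a small sketch that computes the gaps between the three entries quoted above. The timestamps are copied from the job details; the MM/DD/YYYY reading of the date is an assumption, and the labels are shortened for illustration.

```python
from datetime import datetime

# Timestamps copied from the job details; date order MM/DD/YYYY is assumed.
events = [
    ("11/03/2014 16:25:40", "started process bpbrm"),
    ("11/03/2014 16:30:41", "bpcd exited with status 23"),
    ("11/03/2014 16:35:42", "cannot send mail, status 23"),
]


def gaps_in_seconds(rows):
    """Return the elapsed seconds between consecutive timestamped rows."""
    times = [datetime.strptime(ts, "%m/%d/%Y %H:%M:%S") for ts, _ in rows]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


gaps = gaps_in_seconds(events)
print(gaps)  # [301, 301]
```

Each gap is one second longer than the 300-second value the original post gives for CLIENT_READ_TIMEOUT. That suggests two consecutive timeouts of about 300 seconds, although the thread does not confirm which setting is the timer involved.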

Nicolai
Moderator
Moderator
Partner    VIP   

Add CLIENT_READ_TIMEOUT = 1800 to the master and media servers as well.

madmax00
Level 3

Finally, you were right! It was a connectivity problem. I had run bptestbpcd from the master server, but when I tried it from the media server I got no answer from the client, even though both machines (media server and client) have correct /etc/hosts entries. The client was in a different VLAN; it is solved now.

Thanks a lot!
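For anyone hitting the same thing: the lesson is to test from every server that talks to the client, not only the master. Below is a rough sketch of a plain TCP reachability check that can be run from both the master and the media server. It is not a replacement for bptestbpcd, `port_open` and `check_client` are hypothetical helper names, and the port numbers are the commonly documented NetBackup defaults (PBX 1556, vnetd 13724, bpcd 13782), not values taken from this thread.

```python
import socket
import sys

# Assumed NetBackup default ports; verify against your own installation.
NBU_PORTS = {"pbx": 1556, "vnetd": 13724, "bpcd": 13782}


def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_client(host, ports=NBU_PORTS):
    """Map each service name to whether its port is reachable on host."""
    return {name: port_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    # Usage: python check_ports.py <client-hostname>
    if len(sys.argv) > 1:
        for name, ok in check_client(sys.argv[1]).items():
            print(f"{name}: {'reachable' if ok else 'NOT reachable'}")
```

If the result differs between the master and the media server, as it would have here, look at routing, VLANs, or firewalls between the media server and the client.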