Incremental backups for HP-UX client intermittently failing with status 13

WayneLackey
Level 5

Environment:

Master and media servers: Windows Server 2008 running NBU 7.6.0.1

Client: HP-UX B.11.31 running NBU client 7.1

Problem: One HP-UX client has recently started having intermittent backup failures on its incremental backups. The incremental backup kicks off as normal, it starts writing for a bit, then fails. Full backups are successful. I am sometimes able to re-run the incremental backup in the morning with success, sometimes not. Backup job detail follows:

2/4/2014 8:37:03 AM - Info nbjm(pid=20084) starting backup job (jobid=2911692) for client bkusolalhpv02, policy AL_FF_Prod_Weekly, schedule Differential 
2/4/2014 8:37:03 AM - Info nbjm(pid=20084) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=2911692, request id:{A51247EB-BE3B-44BF-970D-9EA38464D63A}) 
2/4/2014 8:37:03 AM - requesting resource ALL_DSSU
2/4/2014 8:37:03 AM - requesting resource bkusolcwbdcs001.NBU_CLIENT.MAXJOBS.bkusolalhpv02
2/4/2014 8:37:03 AM - requesting resource bkusolcwbdcs001.NBU_POLICY.MAXJOBS.AL_FF_Prod_Weekly
2/4/2014 8:38:32 AM - granted resource bkusolcwbdcs001.NBU_CLIENT.MAXJOBS.bkusolalhpv02
2/4/2014 8:38:32 AM - granted resource bkusolcwbdcs001.NBU_POLICY.MAXJOBS.AL_FF_Prod_Weekly
2/4/2014 8:38:32 AM - granted resource MediaID=@aaadG;DiskVolume=V:\;DiskPool=DCS008-V-R9;Path=V:\;StorageServer=bkusolpwbdcs008;MediaServe...
2/4/2014 8:38:32 AM - granted resource DCS008-V-R9
2/4/2014 8:38:32 AM - estimated 143764416 Kbytes needed
2/4/2014 8:38:32 AM - Info nbjm(pid=20084) started backup (backupid=bkusolalhpv02_1391521112) job for client bkusolalhpv02, policy AL_FF_Prod_Weekly, schedule Differential on storage unit DCS008-V-R9
2/4/2014 8:38:34 AM - started process bpbrm (6272)
2/4/2014 8:38:35 AM - Info bpbrm(pid=6272) bkusolalhpv02 is the host to backup data from    
2/4/2014 8:38:35 AM - Info bpbrm(pid=6272) reading file list for client       
2/4/2014 8:38:35 AM - connecting
2/4/2014 8:38:38 AM - Info bpbrm(pid=6272) starting bpbkar32 on client        
2/4/2014 8:38:38 AM - Info bpbkar32(pid=11839) Backup started          
2/4/2014 8:38:38 AM - Info bptm(pid=7392) start           
2/4/2014 8:38:38 AM - connected; connect time: 0:00:03
2/4/2014 8:38:39 AM - Info bptm(pid=7392) using 1048576 data buffer size       
2/4/2014 8:38:39 AM - Info bptm(pid=7392) setting receive network buffer to 4195328 bytes     
2/4/2014 8:38:39 AM - Info bptm(pid=7392) using 48 data buffers        
2/4/2014 8:38:39 AM - Info bptm(pid=7392) start backup          
2/4/2014 8:38:40 AM - Info bptm(pid=7392) backup child process is pid 6680.5344      
2/4/2014 8:38:40 AM - Info bptm(pid=6680) start           
2/4/2014 8:38:40 AM - begin writing
2/4/2014 8:38:42 AM - Info bpbrm(pid=6272) from client bkusolalhpv02: TRV - [/var/hpsrp/alhp2ap/export/customer/prod/dsal/data01/hist] is in a different file system from [/var/hpsrp/alhp2ap/export/customer/prod/dsal/data01]. Skipping
2/4/2014 8:38:42 AM - Info bpbrm(pid=6272) from client bkusolalhpv02: TRV - [/var/hpsrp/alhp2ap/export/customer/prod/dsal/data01/encounters] is in a different file system from [/var/hpsrp/alhp2ap/export/customer/prod/dsal/data01]. Skipping
2/4/2014 8:38:58 AM - Info bpbrm(pid=6272) from client bkusolalhpv02: TRV - [/var/hpsrp/alhp2ap/export/customer/prod/dsal/data01] is in a different file system from [/var/hpsrp/alhp2ap/export/customer]. Skipping
2/4/2014 8:38:58 AM - Info bpbrm(pid=6272) from client bkusolalhpv02: TRV - [/var/hpsrp/alhp2ap/export/customer/prod/dsal/data02] is in a different file system from [/var/hpsrp/alhp2ap/export/customer]. Skipping
... (cut out a bunch of this Skipping stuff for the sake of brevity)
2/4/2014 9:19:30 AM - Info bpbrm(pid=6272) from client bkusolalhpv02: TRV - [/var/hpsrp/alhp1ap/var/spool/sockets/pwgr/client2162] is a socket special file. Skipping
2/4/2014 9:19:30 AM - Info bpbrm(pid=6272) from client bkusolalhpv02: TRV - [/var/hpsrp/alhp1ap/var/spool/sockets/pwgr/client2170] is a socket special file. Skipping
2/4/2014 9:19:30 AM - Info bpbrm(pid=6272) from client bkusolalhpv02: TRV - [/var/hpsrp/alhp1ap/var/spool/sockets/pwgr/client2172] is a socket special file. Skipping
2/4/2014 9:19:30 AM - Error bpbrm(pid=6272) socket read failed, An existing connection was forcibly closed by the remote host.  (10054)
2/4/2014 9:19:32 AM - Info bpbkar32(pid=11839) done. status: 13: file read failed      
2/4/2014 9:19:32 AM - end writing; write time: 0:40:52
file read failed(13)
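
With this many "Skipping" messages in the job detail, it can help to tally them by reason before looking at the failure itself. The following is only a rough sketch: the regular expression is an assumption based on the three sample lines copied from the log above, and `tally_skips` is a hypothetical helper name, not a NetBackup tool.

```python
import re
from collections import Counter

# Assumed message shape, taken from the job detail lines quoted above:
#   "... TRV - [<path>] is <reason>. Skipping"
SKIP_RE = re.compile(r"TRV - \[(?P<path>[^\]]+)\] is (?P<reason>.+?)\. Skipping")


def tally_skips(lines):
    """Count skipped paths per reason (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = SKIP_RE.search(line)
        if not match:
            continue
        # Drop the trailing " from [<other path>]" so that all
        # "different file system" messages fall into one bucket.
        reason = re.sub(r" from \[[^\]]+\]$", "", match.group("reason"))
        counts[reason] += 1
    return counts


# Three lines copied from the job detail above.
sample = [
    "2/4/2014 8:38:42 AM - Info bpbrm(pid=6272) from client bkusolalhpv02: TRV - [/var/hpsrp/alhp2ap/export/customer/prod/dsal/data01/hist] is in a different file system from [/var/hpsrp/alhp2ap/export/customer/prod/dsal/data01]. Skipping",
    "2/4/2014 8:38:58 AM - Info bpbrm(pid=6272) from client bkusolalhpv02: TRV - [/var/hpsrp/alhp2ap/export/customer/prod/dsal/data01] is in a different file system from [/var/hpsrp/alhp2ap/export/customer]. Skipping",
    "2/4/2014 9:19:30 AM - Info bpbrm(pid=6272) from client bkusolalhpv02: TRV - [/var/hpsrp/alhp1ap/var/spool/sockets/pwgr/client2162] is a socket special file. Skipping",
]

counts = tally_skips(sample)
for reason, count in counts.most_common():
    print(f"{count:6d}  {reason}")
```

Run against the full job detail rather than the sample, this would show whether the skips are dominated by file system boundaries or by socket files.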

Any ideas?

Thanks,

Wayne

6 REPLIES

sri_vani
Level 6
Partner

I think the issue in both of your posts will be resolved if you exclude the /dev directory.

Try the backup with the /dev directory excluded, as it is not required.

Or add IGNORE_XATTR = YES to bp.conf.

Ref links:

http://www.symantec.com/business/support/index?page=content&id=TECH73719

http://www.symantec.com/business/support/index?page=content&id=TECH71070

WayneLackey
Level 5

Thank you for the timely response - I am having the system admin add the /dev directory to the exclude list, and we'll see if this alleviates the problem.

Thanks,

Wayne

Mark_Solutions
Level 6
Partner Accredited Certified

As per the other post:

Because of the files it is reading and skipping, the client is taking a long time to prepare its data stream to pass to the media server

You will see it gets to about 10 minutes into the job when it fails - that is 300 seconds

300 seconds is the default client read timeout, so try changing that to 1800 initially - do this on the Timeout tab of the media server's Host Properties (not the client's)

Hope this helps

WayneLackey
Level 5

Mark - the client read timeouts on the media servers are currently set at 3600.

Thanks,

Wayne

Mark_Solutions
Level 6
Partner Accredited Certified

OK - I had misread the log anyway - it fails after about 43 minutes, which is about 2580 seconds - so 3600 is still OK

What gets me is this:

2/4/2014 9:19:30 AM - Error bpbrm(pid=6272) socket read failed, An existing connection was forcibly closed by the remote host. (10054)

and that is network / timeout related

Perhaps a firewall or a keep alive setting somewhere if all of your timeouts are OK
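
The timing comparison in this reply can be reproduced from the timestamps in the job detail. This is a small sketch, not part of NetBackup: the two timestamps are copied from the log in the first post, 3600 is the client read timeout Wayne reported, and the helper name `elapsed_seconds` is made up for illustration. It only repeats the rough elapsed-time comparison made in the thread.

```python
from datetime import datetime

# Timestamp layout used in the job detail, e.g. "2/4/2014 8:37:03 AM".
FMT = "%m/%d/%Y %I:%M:%S %p"


def elapsed_seconds(start, end):
    """Seconds between two job detail timestamps (hypothetical helper)."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


JOB_START = "2/4/2014 8:37:03 AM"     # "starting backup job" line
SOCKET_ERROR = "2/4/2014 9:19:30 AM"  # "socket read failed ... (10054)" line
CLIENT_READ_TIMEOUT = 3600            # value reported for the media servers

elapsed = elapsed_seconds(JOB_START, SOCKET_ERROR)
print(f"elapsed: {elapsed} s ({elapsed / 60:.1f} min)")
print("under the configured timeout:", elapsed < CLIENT_READ_TIMEOUT)
```

The job runs for roughly 42 to 43 minutes before the connection is reset, which is well under the 3600-second setting and consistent with looking at the network rather than the NetBackup timeouts.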

Nicolai
Moderator
Partner VIP

I suspect the last directory processed contains a very large number of files, which causes the backup to fail. The number of MB processed before the backup fails is likely more or less the same each time.

The number one culprit is the Oracle audit directory. Search for .aud files and, if a large number are found, ask the DBA to clean them up.
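
One way to run that search is a short script on the client that counts .aud files per directory. This is a minimal sketch under assumptions: the function names and the default threshold of 10000 are made up for illustration, and the thread does not say where the Oracle audit directory lives, so the starting path has to be supplied.

```python
import os
from collections import Counter


def count_aud_files(root):
    """Map each directory under root to the number of .aud files it holds."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        n = sum(1 for name in filenames if name.endswith(".aud"))
        if n:
            counts[dirpath] = n
    return counts


def directories_over_threshold(root, threshold=10000):
    """List (directory, count) pairs at or above threshold, largest first.

    The default threshold is an arbitrary illustration, not a NetBackup limit.
    """
    return [
        (path, n)
        for path, n in count_aud_files(root).most_common()
        if n >= threshold
    ]
```

For example, `directories_over_threshold("/some/oracle/path")` would return the directories worth raising with the DBA; the path shown is a placeholder.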