Forum Discussion

sshagent
Level 4
12 years ago

backups hanging

Is anyone experiencing lots of hung backups?  

We're not using anything fancy, just regular backups to tape (or disk), with no advanced features.  Linux master, media servers and clients.  Running 7.5.0.3, so there is nowhere to go patch-wise.

Basically my backups start off, and then they just don't seem to be doing anything.  From what I've seen, it looks like bpbkar is disappearing.

The end of the bpbkar log for a currently hung backup has this...

 

21:11:57.746 [14008] <4> bpbkar: INF - Processing /path/to/bond23/sbp_141/sbp_141_0250
23:14:00.278 [14008] <16> bpbkar: ERR - bpbkar FATAL exit status = 23: socket read failed
23:14:00.278 [14008] <4> bpbkar: INF - EXIT STATUS 23: socket read failed
23:14:00.278 [14008] <4> bpbkar: INF - setenv FINISHED=0
 
...but surely if bpbkar dies, the rest of the processes should abort and the job should end with an error exit code, which isn't happening.
If I check the job via vxlogs or bperror there is no sign of that bpbkar error.
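As a rough check on how long the job sat idle, the gap between the last "Processing" line and the fatal exit in the excerpt above can be worked out. This is a minimal Python sketch, assuming both lines were logged on the same day (the log shows no date); the function name is just for illustration.

```python
from datetime import datetime

# Timestamps copied from the bpbkar log excerpt above. The log carries no
# date, so this assumes both lines were written on the same day.
last_activity = "21:11:57.746"
fatal_exit = "23:14:00.278"


def gap_minutes(start: str, end: str) -> float:
    """Return the minutes elapsed between two HH:MM:SS.mmm timestamps."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60


idle = gap_minutes(last_activity, fatal_exit)
print(f"idle for about {idle:.1f} minutes before the socket read failed")
```

That works out to a little over two hours of silence between the last file being processed and the socket read failure.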
 
Oh, it's probably worth mentioning there are no firewalls involved either.  This has me puzzled.  Most backups go through, but some don't (on seemingly random clients and media servers).
 
Thanks for your time.
 
  • There is a slight performance hit when using CPR (checkpoint restart), but it is not huge.

    Every x minutes (180 in your case) it will want to take a checkpoint and will prepare itself for that, but it can only actually do so at file/folder boundaries, so it must finish backing up the file it is on before it can save the checkpoint.

    That is the reason I asked about the size of that last listed file and how long it might take to back up.

    Assuming there are no firewalls etc., the only thing the 2 hours can be put down to, as far as I can see, is the keep-alive timeout. The settings I gave earlier may well overcome that for you, allowing you to keep the 180-minute CPR interval. By default the CPR interval is 15 minutes, and most of my customers use 30 to 60 minutes.

    Perhaps having the CPR interval in excess of the 2-hour keep-alive is the issue, so it may be worth setting it to less than 120 minutes.

    Hope this helps
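
The comparison suggested above (checkpoint interval versus the 2-hour keep-alive) can be sketched in Python. This is only an illustration, not a diagnosis: it assumes the keep-alive in question is the Linux kernel's TCP keepalive idle time, read from /proc/sys/net/ipv4/tcp_keepalive_time (7200 seconds on a stock kernel), and the function names are hypothetical rather than part of NetBackup.

```python
from pathlib import Path

# Standard Linux location of the TCP keepalive idle time, in seconds.
KEEPALIVE_PATH = Path("/proc/sys/net/ipv4/tcp_keepalive_time")
LINUX_DEFAULT_KEEPALIVE_S = 7200  # 2 hours, the stock Linux default


def read_keepalive_seconds(path: Path = KEEPALIVE_PATH) -> int:
    """Read the keepalive idle time, falling back to the Linux default."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return LINUX_DEFAULT_KEEPALIVE_S


def checkpoint_exceeds_keepalive(checkpoint_minutes: int,
                                 keepalive_seconds: int) -> bool:
    """True when the checkpoint interval is longer than the keepalive idle time."""
    return checkpoint_minutes * 60 > keepalive_seconds


if __name__ == "__main__":
    keepalive = read_keepalive_seconds()
    # 15 is the default CPR interval mentioned above, 60 a typical value,
    # and 180 the interval in use on the affected policy.
    for minutes in (15, 60, 180):
        exceeds = checkpoint_exceeds_keepalive(minutes, keepalive)
        state = "exceeds" if exceeds else "is within"
        print(f"checkpoint every {minutes} min {state} the {keepalive} s keepalive")
```

Run on the master, media server and client in turn, this shows whether the 180-minute interval is longer than the keepalive idle time on each host; it does not by itself prove that the keepalive is what is dropping the connection.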
