Netbackup Jobs Hanging Overnight

neilrsayers
Level 2

Hello All,

Over the past few days I've been logging in to the Netbackup Admin Console to find that a few backup jobs have hung. After several hours of waiting, the jobs had not progressed at all.

When I re-run these jobs as manual backups, they all complete fine. We are running Netbackup 7.5.0.4 and have a master server and 2x media servers with an MSDP. The backups are hanging on both VMware and MS-Windows backup policies. Has anyone come across this on 7.5.0.4?

Thanks

Neil

11 REPLIES

RamNagalla
Moderator
Partner    VIP    Certified

Hi,

Are you observing these jobs hanging on all the media servers?

Did you see any high CPU or RAM utilization on the servers?

Is the nbdb.log file growing to a large size?

Did all the jobs start at the same time? If yes, did you try scheduling them at different times within the available backup window?

 

neilrsayers
Level 2

Thanks for your reply.

These jobs are hanging on all of our media servers. Our SLP stores the data in 2 locations: the main dedupe storage pool, and a backup to disk which expires after duplication to tape. CPU and RAM are fine. The NBDB log was <1 MB as catalog backups had recently taken place.

Backup job start times are already staggered. The ones that hang seem to be larger jobs that generally finish after 4:30 am. Backup job starting windows are between 6-11 pm.

I've just noticed that catalog backups are failing with exit status 84 when backing up to the dedupe pool. Perhaps this is related?

rvwilliams73
Level 2

I've seen this with Windows clients for OS or SQL jobs where the parent job will hang. Any future scheduled job won't run until that job is cleared (manual backups work, as noted). Currently I just go to the media server and kill the associated bpbrm process. This will kill the job.

Mark_Solutions
Level 6
Partner Accredited Certified

No reply since December 2012. Perhaps it has already been resolved?

This does sound like a resource issue, but more likely a memory-related one than anything else, especially if processes are left hanging. As everything is up to date, I would say it is a Desktop Heap issue, based on them being Windows servers. It is a great little issue that cannot actually log anything anywhere (though it does on occasion leave a message in the system event log).

If it is not resolved perhaps Neil could come back to us and tell us if he found anything

rvwilliams73
Level 2

I do see it more often on my larger boxes (or my mind is playing tricks): lower disk space, higher CPU usage and more SQL DBs.

 

joerglerche
Level 2

 

Hi all,

After upgrading my two NetBackup environments from 7.1.0.4 to 7.5.0.4, I reproducibly find processes that just hang around for days. Some are monitoring-type commands like nbdevquery, but there are also more important tasks like bpduplicate just doing nothing.

I don't use MSDP.

Might this be related in any way?

Thanks,
Joerg

 

[ root@f1bsd1:/localnet/var/f1bsd1_nbu_configdump ]
> date
Mon Feb 18 17:06:14 CET 2013

[ root@f1bsd1:/localnet/var/f1bsd1_nbu_configdump ]
> ps -ef | awk '/nbdevquery|bpduplicate|bpexpdate|nbdelete/ '
root       667   665  0 09:06 ?        00:00:00 /usr/openv/netbackup/bin/admincmd/nbdevquery -listdv -stype AdvancedDisk -dp diskpool_nbstorage
root     11126 11124  0 Feb17 ?        00:00:00 /usr/openv/netbackup/bin/admincmd/nbdevquery -listdv -stype AdvancedDisk -dp f1fsbd1_NBstorage
root     11851 11849  0 15:06 ?        00:00:00 /usr/openv/netbackup/bin/admincmd/nbdevquery -listdp -stype AdvancedDisk
root     22106 22105  0 Feb15 ?        00:00:00 /usr/openv/netbackup/bin/admincmd/bpduplicate -dstunit f1i2000_dc4_d1 -dp Oracle -Bidfile tmp/totape.21584.bid
root     25860 25858  0 06:06 ?        00:00:00 /usr/openv/netbackup/bin/admincmd/nbdevquery -listdv -stype AdvancedDisk -dp f1fsbd1_NBstorage
root     28028 28025  0 Feb17 ?        00:00:00 /usr/openv/netbackup/bin/admincmd/nbdevquery -listdv -stype AdvancedDisk -dp f1fsbd1_NBstorage
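
A listing like the one above can be narrowed down to the processes that have been running longer than some threshold. This is only a minimal sketch, assuming a Linux procps `ps` that supports the `etimes` (elapsed seconds) output field; the function name `filter_stale` and the one-hour default are illustrative and not part of NetBackup.

```shell
#!/bin/sh
# Hypothetical helper: keep only NetBackup admin commands older than a threshold.
# Expected input on stdin, one process per line:
#   <pid> <elapsed_seconds> <command line...>
filter_stale() {
    # $1 = minimum age in seconds (assumed default: 3600)
    awk -v min="${1:-3600}" '
        $2 + 0 >= min && /nbdevquery|bpduplicate|bpexpdate|nbdelete/ { print }
    '
}

# Usage on a live system (assumes procps ps with the etimes field):
#   ps -eo pid=,etimes=,args= | filter_stale 3600
```

The command names in the pattern are the same ones used in the `awk` filter above; the age threshold is what separates a command that is merely running from one that has been sitting there for hours or days.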

Marianne
Level 6
Partner    VIP    Accredited Certified
Are you working with the original poster (neil) or are you experiencing a similar issue?

joerglerche
Level 2

No, I described my own problem as it seems similar in some aspects and after opening a Symantec support case I was looking for related information.

mcrigg
Level 3

I am having the same issue after upgrading to 7.5.0.5. Jobs just hang until they are cancelled, and once they are restarted, they complete successfully. Oddly enough, I am not having the issue on servers that I rolled back to 7.0.1. If someone finds a solution, please share, as I am having a hard time getting support to call me back.

Fabrice_P_
Level 4
Certified

On 7.5.0.5 there is a specific bug which can sometimes make you think that a job is hung when it has actually completed. Please check the "detailed status" of the affected jobs to see whether they only appear hung after completing (status 0).

joerglerche
Level 2

In my case the following turned out to be the cause (but that may not exactly match this thread's initial issue):

We have to use NBAZ, and after poking around in the traces we saw that access control operations were preventing things from continuing. After checking

bpnbat -whoami

I noticed that the credentials of the root user (which runs several scripts) were invalid. After refreshing those credentials with

bpnbat -login

everything runs smoothly now!

Regards,
Joerg
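
For scripts that run as root, the check described above could be done up front so that an invalid credential is reported instead of leaving commands hanging. This is only a rough sketch under assumptions: that `bpnbat` lives at `/usr/openv/netbackup/bin/bpnbat`, and that `bpnbat -whoami` exits non-zero when the credential is missing or expired (please verify this on your own version). The function name `check_nbac_credential` is made up for illustration.

```shell
#!/bin/sh
# Assumed location of bpnbat; override BPNBAT if your install differs.
BPNBAT="${BPNBAT:-/usr/openv/netbackup/bin/bpnbat}"

# Return 0 if the current user's credential looks valid, 1 otherwise.
# Assumption: "bpnbat -whoami" exits non-zero when the credential is
# missing or expired; confirm this before relying on it.
check_nbac_credential() {
    if "$BPNBAT" -whoami >/dev/null 2>&1; then
        return 0
    fi
    echo "NBAC credential check failed; run '$BPNBAT -login' to refresh" >&2
    return 1
}

# Usage at the top of a cron script:
#   check_nbac_credential || exit 1
```

The sketch only reports the problem and does not try to log in by itself, since `bpnbat -login` asks for credentials interactively.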