12-06-2012 02:46 PM
Hello All,
Over the past few days I've been logging in to the NetBackup Admin Console to find that a few backup jobs have hung. After several hours of waiting, the jobs did not progress at all.
When I re-run these jobs as manual backups, they all complete fine. We are running NetBackup 7.5.0.4 and have a master server and two media servers with an MSDP. The backups are hanging on both VMware and MS-Windows backup policies. Has anyone come across this on 7.5.0.4?
Thanks
Neil
12-06-2012 08:19 PM
Hi,
Are you seeing these jobs hang on all the media servers?
Did you see any high CPU or RAM utilization on the servers?
Is the nbdb.log file growing very large?
Do all the jobs start at the same time? If so, did you try scheduling them at different times within the available backup window?
12-06-2012 08:42 PM
Thanks for your reply.
These jobs are hanging for all of our media servers. Our SLP stores the data in 2 locations: the main dedupe storage pool, and a backup to disk which expires after duplication to tape. CPU and RAM are fine. The NBDB log was under 1 MB, as catalog backups had recently taken place.
Backup job start times are already staggered. The ones that hang seem to be larger jobs that generally finish after 4:30 am. Backup job starting windows are between 6-11 pm.
I've just noticed that catalog backups are failing with exit status 84 when backing up to the dedupe pool. Perhaps this is related?
01-30-2013 08:46 AM
I've seen this with Windows clients for OS or SQL jobs, where the parent job will hang. No future scheduled job will run until that job is cleared (as noted, manual backups work). Currently I just go onto the media server and kill the associated bpbrm process, which kills the job.
01-30-2013 10:11 AM
No reply since December 2012. Perhaps it has already been resolved?
This does sound like a resource issue, but more likely a memory-related one than anything else, especially if processes are left hanging. As everything is up to date, I would say it is a Desktop Heap issue, based on them being Windows servers. It is a great little issue that cannot actually log anything anywhere (though it does on occasion leave a message in the system event log).
If it is not resolved, perhaps Neil could come back to us and tell us if he found anything.
01-30-2013 11:09 AM
I do see it more often on my larger boxes (or my mind is playing tricks): lower disk space, higher CPU usage and more SQL DBs.
02-18-2013 08:13 AM
Hi all,
After upgrading my two NetBackup environments from 7.1.0.4 to 7.5.0.4, I reproducibly find processes that just hang around for days. These include monitoring-type commands like nbdevquery, but also more important tasks like bpduplicate, all just doing nothing.
I don't use MSDP.
Might this be related in any way?
Thanks,
Joerg
[ root@f1bsd1:/localnet/var/f1bsd1_nbu_configdump ] > date
Mon Feb 18 17:06:14 CET 2013
[ root@f1bsd1:/localnet/var/f1bsd1_nbu_configdump ] > ps -ef | awk '/nbdevquery|bpduplicate|bpexpdate|nbdelete/ '
root 667 665 0 09:06 ? 00:00:00 /usr/openv/netbackup/bin/admincmd/nbdevquery -listdv -stype AdvancedDisk -dp diskpool_nbstorage
root 11126 11124 0 Feb17 ? 00:00:00 /usr/openv/netbackup/bin/admincmd/nbdevquery -listdv -stype AdvancedDisk -dp f1fsbd1_NBstorage
root 11851 11849 0 15:06 ? 00:00:00 /usr/openv/netbackup/bin/admincmd/nbdevquery -listdp -stype AdvancedDisk
root 22106 22105 0 Feb15 ? 00:00:00 /usr/openv/netbackup/bin/admincmd/bpduplicate -dstunit f1i2000_dc4_d1 -dp Oracle -Bidfile tmp/totape.21584.bid
root 25860 25858 0 06:06 ? 00:00:00 /usr/openv/netbackup/bin/admincmd/nbdevquery -listdv -stype AdvancedDisk -dp f1fsbd1_NBstorage
root 28028 28025 0 Feb17 ? 00:00:00 /usr/openv/netbackup/bin/admincmd/nbdevquery -listdv -stype AdvancedDisk -dp f1fsbd1_NBstorage
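For anyone who wants to catch lingering processes like these automatically, here is a minimal sketch of a filter that keeps only the watched commands older than a given age. The function name filter_stale and the 3600-second threshold are made up for illustration and are not part of NetBackup. It expects lines of "PID ELAPSED_SECONDS COMMAND" on stdin and assumes your ps supports the etimes (elapsed seconds) column, which newer procps versions on Linux do; older systems may not.

```shell
#!/bin/sh
# filter_stale MAX_AGE_SECONDS
# Hypothetical helper: reads "PID ELAPSED_SECONDS COMMAND..." lines on stdin
# and prints only those whose command matches one of the watched NetBackup
# admin commands AND whose elapsed time exceeds MAX_AGE_SECONDS.
filter_stale() {
    max_age="$1"
    awk -v max="$max_age" '
        # $2 is the elapsed time in seconds; the command name list is the
        # same one used in the ps/awk one-liner from the post above.
        ($2 + 0) > max && /nbdevquery|bpduplicate|bpexpdate|nbdelete/ {
            print
        }'
}
```

On a live system this could be fed with something like: ps -eo pid=,etimes=,args= | filter_stale 3600. The awk process itself also matches the pattern, but it is only a few seconds old, so the age threshold drops it.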
02-18-2013 08:31 AM
02-18-2013 08:37 AM
No, I described my own problem as it seems similar in some aspects and after opening a Symantec support case I was looking for related information.
03-07-2013 08:48 AM
I am having the same issue after upgrading to 7.5.0.5. Jobs just hang until they are cancelled, and once they are restarted, they complete successfully. Oddly enough I am not having the issue on servers that I rolled back to 7.0.1. If someone finds a solution please share as I am having a hard time getting support to call me back.
03-07-2013 09:03 AM
On 7.5.0.5 there is a specific bug which can sometimes make you think that a job has hung when it has actually completed. Please check the "detailed status" of the affected jobs to see whether they only appear hung after completing (status 0).
03-08-2013 12:07 AM
In my case the following turned out to be the cause (but that may not exactly match this thread's initial issue):
We have to use NBAZ, and after poking around in the traces, we saw that access control operations prevented things from continuing. After checking
bpnbat -whoami
I noticed that the credentials of the root user (which runs several scripts) were invalid. After refreshing those credentials with
bpnbat -login
everything runs smoothly now!
Regards,
Joerg
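If scripts run unattended as root, one way to avoid them silently stalling on expired credentials is to check first and refuse to start otherwise. This is only a rough sketch: run_if_authenticated is a made-up name, and it assumes that the check command (for example bpnbat -whoami) exits non-zero when the credentials are invalid, which you should verify on your own version before relying on it.

```shell
#!/bin/sh
# run_if_authenticated "CHECK COMMAND" JOB [ARGS...]
# Hypothetical wrapper: runs CHECK COMMAND first and only starts JOB if the
# check succeeds.
# ASSUMPTION: the check command returns a non-zero exit status when the
# credentials are invalid or expired.
run_if_authenticated() {
    check_cmd="$1"
    shift
    # $check_cmd is left unquoted on purpose so that a string such as
    # "bpnbat -whoami" is split into the command and its argument.
    if $check_cmd >/dev/null 2>&1; then
        "$@"
    else
        echo "credential check failed ($check_cmd); run 'bpnbat -login' first" >&2
        return 1
    fi
}
```

A call could look like: run_if_authenticated "bpnbat -whoami" ./my_nbu_script.sh, where my_nbu_script.sh stands in for whatever script you run. bpnbat may need its full path if it is not in root's PATH.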