
Jobs hung in active and queued state for hours

m3lyan
Level 3
Partner

We faced the same problem today.
Jobs in the Activity Monitor are static / hung / stuck / frozen in either an "Active" or "Queued" state.
I've checked:
nbdb_ping: EMM database is online, and I've run a full validation
bpstulist ... storage unit viewable
tpconfig -l: all devices up
NBU services: up
/usr/openv/netbackup/bin/admincmd/bpdbjobs -report shows that jobs are hung in the active or queued state
In the end we restarted the NetBackup services and reran the backup jobs.
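
For anyone who wants to capture the same checks the next time the jobs hang, here is a rough sketch that runs them in one pass and saves the output before the services are restarted. It assumes the default UNIX install paths under /usr/openv; `run_check` and `OUT_DIR` are names made up for this example. The OK/FAIL result only reflects each command's exit status, so the saved output still needs to be read by hand.

```shell
#!/bin/sh
# Sketch only: re-run the checks from the post and keep their output.
# Paths assume a default UNIX NetBackup install; adjust for your site.
NBU_BIN=/usr/openv/netbackup/bin
NBU_ADMIN=/usr/openv/netbackup/bin/admincmd
NBU_DB=/usr/openv/db/bin
NBU_VOLMGR=/usr/openv/volmgr/bin

# OUT_DIR is a hypothetical location for the captured output.
OUT_DIR=${OUT_DIR:-/tmp/nbu_hang_$(date +%Y%m%d_%H%M%S)}
mkdir -p "$OUT_DIR"

# run_check LABEL COMMAND [ARGS...]
# Prints OK, FAIL or SKIP plus the label; output goes to $OUT_DIR/LABEL.txt.
# It never aborts the script, so one failing check does not hide the others.
run_check() {
    label=$1
    shift
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "SKIP $label (command not found: $1)"
        return 0
    fi
    if "$@" >"$OUT_DIR/$label.txt" 2>&1; then
        echo "OK   $label"
    else
        echo "FAIL $label"
    fi
    return 0
}

run_check nbdb_ping     "$NBU_DB/nbdb_ping"
run_check bpstulist     "$NBU_ADMIN/bpstulist"
run_check tpconfig      "$NBU_VOLMGR/tpconfig" -l
run_check nbu_processes "$NBU_BIN/bpps" -a
run_check bpdbjobs      "$NBU_ADMIN/bpdbjobs" -report

echo "Output saved in $OUT_DIR"
```

Keeping the output from each occurrence makes it easier to compare one hang against the next.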


No system error. No system core dump. No file system full. No memory leak.
No errors logged in the SL8500


Any help, please?
8 REPLIES

Andy_Welburn
Level 6
eg:
problems report
/var/adm/messages
NetBackup logs (e.g. bptm)

Anything on the Client(s):
Are they all different O/S's that are hanging or all of a type (e.g. Win2003)?
Anything reported on client (logs/event viewer/process monitor)?
Processes still running on client (bpfis/bpbkar)?

Anything on the jobs:
Have these jobs worked before or is this a new set up?
If worked previously, anything changed recently?
All jobs hanging or just a few?
Anything in job details? (e.g. waiting for resources)
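
As a rough illustration of the client-side process check above, this sketch lists any bpbkar or bpfis processes still running on a UNIX client. `filter_nbu_procs` is a made-up helper name, and it assumes a grep that accepts -E (on older Solaris that may mean /usr/xpg4/bin/grep). On Windows clients you would look in Task Manager instead.

```shell
#!/bin/sh
# Sketch only: show NetBackup backup processes still alive on a UNIX client.

# filter_nbu_procs reads `ps -ef` style lines on stdin and keeps only
# those whose command is bpbkar or bpfis (with or without a leading path).
filter_nbu_procs() {
    grep -E '(^|[ /])(bpbkar|bpfis)( |$)'
}

ps -ef | filter_nbu_procs || echo "no bpbkar/bpfis processes found"
```

If a process shows up here long after the job stopped moving data, that points at the client side rather than the master or media server.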

m3lyan
Level 3
Partner
No errors at the OS level.
We have one master/media server and another two media servers across two sites, main and DR.
A problem can't be happening on all servers at the same time.


Clients: different OSes (some file system backups, some database ..)

This problem happened suddenly.
All jobs are hanging.

Andy_Welburn
Level 6
nothing at all in Job Details & nothing at all changed recently (not just NetBackup but at a corporate level)?

Seeing as nothing is working at the moment, have you tried restarting NetBackup services or, if push comes to shove, restarting the Master/Media servers?

m3lyan
Level 3
Partner
In the end we restarted the NetBackup services and reran the backup jobs.
This problem occurred on 9 March, then 15 March and 17 March.

rjrumfelt
Level 6
but they are all Windows 2k3 machines - no errors in the event logs, no errors in any of the NBU logs, but you said that this problem occurs across several different operating systems?

rjrumfelt
Level 6
We've attempted both methods.  I've had a case open with Symantec for some time and we've not really gotten very far. 

The closest thing that I can find is that when looking at the bpbkar log, you can see the exact moment when the backup hangs, as it looks like the servers just pass keep_alives back and forth, without exchanging any actual data.

There's a technote out there for that issue; however, in our case the keep_alive signals are the correct size, whereas the technote describes the keep_alives getting corrupted in size, which causes the hang-up.  Nonetheless, I installed an EEB for the issue, and it did not fix the problem *sighs*

I've had every possible team here check the environment out and they can find no apparent issues.  Symantec is supposed to send our case to back line engineering.  We'll see if that gets us anywhere.

David_McMullin
Level 6

You might check if ANYTHING has changed -

I know I had an issue where I wanted to make two copies of an RMAN backup. I set multiple copies = 2 on the application schedule instead of the automatic one, and my whole NetBackup environment went crazy and I had all kinds of issues.

I would never have thought that changing ONE policy would thrash my whole environment - but it did.

Ask everybody to check for changes - sometimes the smallest ones can cause the most issues.