How to clear and delete "waiting for retry" status 50 job?

thesanman
Level 6
I have a job sitting in a "waiting for retry" state with a status code of 50.  Its child job died earlier today; that much I know.

This has happened to me on occasion, and I am unable to cancel the job via the Java GUI; neither can I delete it.  The fix under v6.5.3.1 was to stop and restart the NetBackup Master Server (on Linux: service netbackup stop, check for remaining processes, then service netbackup start).  Once done, the job went into a state from which it could be cancelled and deleted, but now that I've updated to v6.5.5 this stop/start fix no longer works.

Before I log a call with Symantec, anyone out there have any ideas?

Thanks,
Malcolm
1 ACCEPTED SOLUTION

Accepted Solutions

thesanman
Level 6
There is another way! 

My second support call was picked up by someone significantly more knowledgeable, and he got me to note the Job ID involved, shut down NetBackup and then run

/usr/openv/netbackup/bin/bpjobd -r <jobid>

After restarting NetBackup the job has gone.

Note this fix was for v6.5.5; try it on other versions as you see fit.  Seems it's one of those "in the know" fixes.
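
For anyone who wants to script the sequence above, here is a minimal sketch, not an official procedure. The only NetBackup commands used are ones quoted in this thread (service netbackup stop/start, bpps -a, bpjobd -r). The function names, the DRY_RUN switch and the NBU_BIN variable are made up for illustration, and by default the script only prints what it would run.

```shell
#!/bin/sh
# Sketch only: remove one stuck job from the job database while NetBackup is down.
# NBU_BIN and DRY_RUN are illustrative names, not NetBackup settings.
NBU_BIN=${NBU_BIN:-/usr/openv/netbackup/bin}
DRY_RUN=${DRY_RUN:-1}   # 1 = print the commands instead of running them

# Run a command, or just print it when DRY_RUN is 1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

remove_stuck_job() {
    job_id=$1
    # Refuse anything that is not a plain number.
    case $job_id in
        ''|*[!0-9]*) echo "job id must be numeric: '$job_id'" >&2; return 1 ;;
    esac

    run service netbackup stop || return 1
    run "$NBU_BIN/bpps" -a     # lists whatever NetBackup processes remain
    if [ "$DRY_RUN" != "1" ]; then
        # The check is manual: the script does not parse the bpps output.
        printf 'Confirm no NetBackup processes remain, then press Enter: '
        read -r answer
    fi
    run "$NBU_BIN/bpjobd" -r "$job_id" || return 1
    run service netbackup start
}
```

Calling remove_stuck_job 12345 (12345 is a placeholder job ID) prints the four commands in order; setting DRY_RUN=0 before calling it runs them for real, so read the printed plan first.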




14 REPLIES 14

Gerald_W__Gitau
Level 6
Certified
Have you tried deleting the job from install_path\netbackup\db\jobs? Check the job ID on the console and delete its files from the ffilelogs, restart and trylogs folders. Then restart the services.

Andy_Welburn
Level 6
which closely matches your issue & I'm sure I followed similar method in the early 6.5 days (or was it 5.1? sorry!)

This involves stopping netbackup & killing off any remaining processes as you state, but also deleting the pempersist file before restarting netbackup:

BUG REPORT: A parent job receives a Status 50 (client process aborted) and enters a "Waiting for R...

I would sincerely hope that this BUG hasn't resurfaced with 6.5.5 - maybe it would still be worth that call to Symantec?

thesanman
Level 6
Thanks for this info; still no go though despite a full stop/restart of NetBackup Master Server processes. 

I have logged a call with Symantec

thesanman
Level 6
I've seen a few of these of late (under v6.5.3.1) but a full restart of the NBU Master Server processes put the job in a failed state which you could work with.

Now, under v6.5.5 nothing seems to work.  I have logged a call with Symantec

Andy_Welburn
Level 6
It'd be nice to see what the resolution is.

thesanman
Level 6
Spent a good deal of time on the phone with Symantec today going through this.

We ended up using updatedb and locate to find all occurrences of the job id file under <install_path>/netbackup/db/jobs which we then deleted and then, after making sure there were no active jobs, we stopped netbackup and then deleted the job database!  After that it was a simple matter of restarting netbackup which re-creates a nice clean empty job database.
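
The search step can also be done with find instead of updatedb and locate. A minimal sketch, assuming the per-job files are named either exactly after the job ID or as the job ID followed by a dot and an extension; that naming is an assumption, so check what your own db/jobs directory contains. The function name list_job_files is made up for illustration, and it only lists files, it deletes nothing.

```shell
#!/bin/sh
# Sketch only: list files under a jobs directory that belong to one job ID.
# Assumption: files are named "<jobid>" or "<jobid>.<something>".
list_job_files() {
    jobs_dir=$1
    job_id=$2
    # Refuse anything that is not a plain number, so the pattern stays literal.
    case $job_id in
        ''|*[!0-9]*) echo "job id must be numeric: '$job_id'" >&2; return 1 ;;
    esac
    # Matching on "<jobid>" and "<jobid>.*" avoids picking up job 1234
    # when searching for job 123.
    find "$jobs_dir" -type f \( -name "$job_id" -o -name "$job_id.*" \) | sort
}
```

For example, list_job_files /usr/openv/netbackup/db/jobs 12345 (12345 is a placeholder) prints the candidate files so they can be reviewed before anything is removed.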

Downside is that you lose your job monitor history, so you end up with a blank screen until NBU activity kicks in.

The problem seems to be inside the (progress?) database, which you can't get into to fix.  I would contend that under v6.5.3, on restart, NBU cleaned up this sort of thing automatically.  This doesn't appear to happen anymore.

I will be monitoring this, and if I see it again I can escalate through the support specialist and provide more info this time.  The trick, in my case, seems to be: don't delete the job the parent job was waiting for!

Malcolm

Andy_Welburn
Level 6
But it had already died, so it shouldn't matter! :-s

dpigg
Not applicable

Work Around

UNIX:
netbackup stop
bp.kill_all
bpps -a
(verify all procs have exited)
rm /usr/openv/netbackup/db/jobs/bpjobd.act.db
netbackup start

WINDOWS
bpdown -v -f
bpps
(verify all procs have exited)
del "c:\Program Files\VERITAS\Netbackup\db\jobs\bpjobd.act.db"
bpup -v

This will remove the jobs that were waiting for retry, active, incomplete or suspended when you stopped the server.
It WILL NOT remove the rest of the job history; only the aforementioned jobs are lost.
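
A more cautious variant of the UNIX rm step is to rename the file instead of deleting it, so it can be put back if the restart goes badly. This is a sketch of that idea only, not part of the workaround as posted: the function name set_aside is made up, and whether NetBackup ignores the renamed copy left in the same directory is an assumption.

```shell
#!/bin/sh
# Sketch only: rename a file with a timestamp suffix instead of deleting it.
# Run it only after NetBackup has been stopped and all processes have exited.
set_aside() {
    file=$1
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
    backup="$file.$(date +%Y%m%d%H%M%S).bak"
    # Print the new name so the caller knows where the copy went.
    mv -- "$file" "$backup" && echo "$backup"
}
```

In place of the rm line, this would be called as set_aside /usr/openv/netbackup/db/jobs/bpjobd.act.db before running netbackup start.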

thesanman
Level 6
I did this first (with Symantec); I'm afraid it didn't work.  On restart the jobs were still there.  We had to delete the bpjobd.db file which is why we lost the history.

phscott
Level 3
I use my activity monitor DB for populating an external DB for historical data.  This is not a good solution.  Does anybody know if this shows up in 7.0?

Will_Restore
Level 6
or bpjobd ?

thesanman
Level 6
well spotted; I've edited my post.

Giroevolver
Level 6

Thesanman is correct; I have had this issue too, and Symantec eventually got it fixed using bpjobd -r <jobid>.

The same issue occurs in NetBackup 7.