Solved: Bad workaround (IMHO)

thesanman · 02-23-2010

I have a job sitting in a "waiting for retry" state with a status code of 50. Its child job died earlier today; that much I know.

This has happened to me on occasion, and I am unable to cancel or delete the job via the Java GUI. The fix under v6.5.3.1 was to stop and restart the NetBackup Master Server (Linux: service netbackup stop, then check for remaining processes before service netbackup start). Once done, the job went into a state from which it could be cancelled and deleted, but now that I've updated to v6.5.5 this stop/start fix no longer works.

Before I log a call with Symantec, anyone out there have any ideas?

Thanks,
Malcolm

Gerald_W__Gitau · 02-24-2010

Have you tried deleting the job from install_path\netbackup\db\jobs? Check the job ID in the console and delete its files from the ffilelogs, restart and trylogs folders, then restart the services.

Andy_Welburn · 02-24-2010

The bug report below closely matches your issue, and I'm sure I followed a similar method in the early 6.5 days (or was it 5.1? Sorry!)

This involves stopping NetBackup and killing off any remaining processes, as you describe, but also deleting the pempersist file before restarting NetBackup:

BUG REPORT: A parent job receives a Status 50 (client process aborted) and enters a "Waiting for R...

I would sincerely hope that this BUG hasn't resurfaced with 6.5.5 - maybe it would still be worth that call to Symantec?

thesanman · 02-24-2010

Thanks for this info; still no go, though, despite a full stop/restart of the NetBackup Master Server processes.

I have logged a call with Symantec.

thesanman · 02-24-2010

I've seen a few of these lately (under v6.5.3.1), but a full restart of the NBU Master Server processes put the job in a failed state that you could work with.

Now, under v6.5.5, nothing seems to work. I have logged a call with Symantec.

Andy_Welburn · 02-24-2010

It'd be nice to see what the resolution is.

thesanman · 02-25-2010

Spent a good deal of time on the phone with Symantec today going through this.

We ended up using updatedb and locate to find all occurrences of the job ID file under <install_path>/netbackup/db/jobs, and deleted them. Then, after making sure there were no active jobs, we stopped NetBackup and deleted the job database! After that it was simply a matter of restarting NetBackup, which re-creates a nice, clean, empty job database.

The downside is that you lose your job monitor history, so you end up with a blank screen until NBU activity kicks in.

The problem seems to be inside the (progress?) database, which you can't get into to fix. I would contend that under v6.5.3 NBU cleaned up this sort of thing automatically on restart; this doesn't appear to happen anymore.

I will be monitoring this, and if I see it again I can escalate through the support specialist and provide more info this time. In my case the trick seems to be: don't delete the job the parent job was waiting for!

Malcolm
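A minimal sketch of the search step only, swapping the updatedb/locate pair for find(1) so the locate database does not need refreshing first. JOBS_DIR and list_job_files are hypothetical names, and the default path assumes the standard UNIX install location. The function only lists candidates; it deletes nothing and does not touch the job database itself.

```shell
#!/bin/sh
# Hypothetical helper: list files under the NetBackup jobs directory whose
# names contain a given job ID. Uses find(1) instead of updatedb/locate.
# JOBS_DIR default is an assumption; override it for other install paths.

JOBS_DIR=${JOBS_DIR:-/usr/openv/netbackup/db/jobs}

list_job_files() {
    jobid=$1
    # Accept only a plain numeric job ID.
    case $jobid in
        ''|*[!0-9]*) echo "usage: list_job_files <jobid>" >&2; return 1 ;;
    esac
    # Substring match on the file name: job 123 also matches job 1234,
    # so review the list by hand before deleting anything.
    find "$JOBS_DIR" -type f -name "*$jobid*" | sort
}
```

For example, `list_job_files 12345` prints matching paths, one per line, for you to check before removing them.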

Andy_Welburn · 02-25-2010

But it had already died, so it shouldn't matter! :-s

dpigg · 03-03-2010

Work Around

UNIX:
netbackup stop
bp.kill_all
bpps -a
(verify all procs have exited)
rm /usr/openv/netbackup/db/jobs/bpjobd.act.db
netbackup start

WINDOWS
bpdown -v -f
bpps
(verify all procs have exited)
del "c:\Program Files\VERITAS\Netbackup\db\jobs\bpjobd.act.db"
bpup -v

This will remove the jobs that were waiting for retry, active, incomplete or suspended when you stopped the server.
It will NOT remove the rest of the job history, only the aforementioned jobs.
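A more cautious sketch of the rm step in the UNIX work-around: rename bpjobd.act.db with a timestamp rather than deleting it, so the file can be put back if the restart does not help. This backup-first step is a variation on the posted work-around, not part of it; JOBS_DIR and set_aside_active_db are hypothetical names, and it assumes NetBackup is already stopped and bpps -a shows nothing running.

```shell
#!/bin/sh
# Hypothetical helper: move bpjobd.act.db aside instead of deleting it.
# Run only after NetBackup is stopped and bpps -a shows no processes.
# JOBS_DIR default matches the path in the work-around; override if needed.

JOBS_DIR=${JOBS_DIR:-/usr/openv/netbackup/db/jobs}

set_aside_active_db() {
    db="$JOBS_DIR/bpjobd.act.db"
    if [ ! -f "$db" ]; then
        echo "no $db to move" >&2
        return 1
    fi
    # Timestamped name, e.g. bpjobd.act.db.20100303120000
    backup="$db.$(date +%Y%m%d%H%M%S)"
    mv "$db" "$backup" && echo "$backup"
}
```

The function prints the backup path on success; to undo, stop NetBackup again and move that file back to bpjobd.act.db.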

thesanman · 03-03-2010

I did this first (with Symantec); I'm afraid it didn't work. On restart the jobs were still there. We had to delete the bpjobd.db file, which is why we lost the history.

phscott · 03-25-2010

I use my Activity Monitor DB to populate an external DB with historical data, so this is not a good solution for me. Does anybody know if this shows up in 7.0?

Will_Restore · 03-25-2010

Or bpjobd?

thesanman · 03-25-2010

There is another way!

My second support call was picked up by someone significantly more knowledgeable, and he got me to note the job ID involved, shut down NetBackup and then run:

/usr/openv/netbackup/bin/bpjobd -r <jobid>

After restarting NetBackup, the job was gone.

Note: this fix was for v6.5.5; try it on other versions as you see fit. It seems to be one of those "in the know" fixes.
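A minimal sketch of that sequence as a shell function, assuming a Linux master server with the default /usr/openv install path and the service netbackup stop/start commands from the first post. NBU_BIN, DRY_RUN, run and remove_stuck_job are hypothetical names for illustration; only bpjobd -r <jobid> comes from Symantec support as reported above, so treat the rest as untested.

```shell
#!/bin/sh
# Hypothetical helper wrapping the bpjobd -r procedure reported above
# (NetBackup 6.5.5, Linux master server). Assumes the default install path;
# set NBU_BIN if yours differs. DRY_RUN=1 prints the commands instead.

NBU_BIN=${NBU_BIN:-/usr/openv/netbackup/bin}

run() {
    # Echo the command in dry-run mode, otherwise execute it.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

remove_stuck_job() {
    jobid=$1
    # Accept only a plain numeric job ID.
    case $jobid in
        ''|*[!0-9]*) echo "usage: remove_stuck_job <jobid>" >&2; return 1 ;;
    esac
    run service netbackup stop || return 1
    run "$NBU_BIN/bpps" -a
    # bpps -a output has to be checked by eye, so wait for confirmation.
    if [ "${DRY_RUN:-0}" != 1 ]; then
        printf 'Press Enter once no NetBackup processes remain: '
        read -r reply
    fi
    run "$NBU_BIN/bpjobd" -r "$jobid" || return 1
    run service netbackup start
}
```

For example, `DRY_RUN=1 remove_stuck_job 12345` prints the four commands without running anything; drop DRY_RUN to run them for real.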

thesanman · 03-25-2010

Well spotted; I've edited my post.

Giroevolver · 03-26-2010

Thesanman is correct. I have had this issue too, and Symantec eventually fixed it using bpjobd -r <jobid>.

The same issue occurs in NetBackup 7.

How to clear and delete "waiting for retry" status 50 job?