Ghost jobs with no information popping up!

Andreas_Holmstr
Level 4
Partner
Hi 

Every now and then, ghost jobs (a job ID with no information other than the job ID) appear in the Activity Monitor. They show up at the top if you sort on "State".

Output from bpdbjobs for one of the job IDs:
bpdbjobs -all_columns -jobid 3753431
3753431,,,,,,,,0000000000,0000000000,0000000000,,,,,,,0,,,,,,,,master01.omaccess.net,,,,0,0,0,1,0,,,0000000000,0000000000,0000000000,,,0,0,0,,,,,,,,,,,,,,0,0,,,,,0,,,,
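For anyone who wants to pick these records out of a long listing, here is a minimal Python sketch that flags ghost-looking lines in bpdbjobs -all_columns output. It does not know what each column means. It assumes (a heuristic, not documented NetBackup behaviour) that a ghost record is one where, apart from the job ID in the first field, at most two fields hold anything other than an empty string or zeros. That is true of the record above, where only the master server name and a single "1" remain. The function names and the threshold are hypothetical.

```python
from typing import List


def is_ghost_job(record: str, max_populated: int = 2) -> bool:
    """Heuristic check on one line of `bpdbjobs -all_columns` output.

    Assumption (hypothetical threshold): a "ghost" record is one where,
    apart from the job ID in the first field, at most `max_populated`
    fields contain something other than an empty string or all zeros.
    """
    fields = record.strip().split(",")
    populated = [
        f for f in (x.strip() for x in fields[1:])
        if f and f.strip("0")  # skip "", "0", "0000000000"
    ]
    return len(populated) <= max_populated


def ghost_job_ids(output: str, max_populated: int = 2) -> List[str]:
    """Return the job IDs (first field) of every record that looks like a ghost."""
    return [
        line.split(",", 1)[0].strip()
        for line in output.splitlines()
        if line.strip() and is_ghost_job(line, max_populated)
    ]
```

Feed it the text printed by bpdbjobs -all_columns, for example captured with Python's subprocess module. Compare the returned IDs with what the Activity Monitor shows before acting on any of them, because a real job that has only just been queued may also have few populated fields.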


I have searched the internet and a couple of forums to find out why they appear and how to get rid of them, but I have not had any luck yet. Has anyone else experienced anything similar?

I guess the problem could have something to do with a daemon crashing while these jobs are active or queued, but we have not seen any evidence of that happening.
 
Master server: NetBackup 6.5.5 running on Solaris 10.
All media servers run 6.5.5 except two, which run 6.5.3 and 6.5.4. They are a mix of Solaris 8 & 10, HP-UX 11.11, SUSE Linux and Windows 2003 (VCB proxy).
  
Kind Regards
Andreas


Andy_Welburn
Level 6
In most cases, an "unknown" job appears just prior to a scheduled job kicking off & it then changes into that scheduled job; it briefly appears as a 'failed' job (white cross on a red circle).

Sometimes, when we're having issues and are stopping & starting services, cancelling & re-starting jobs etc., we get these on a more permanent basis. Again they show up as "unknown" jobs and appear as failed. We have found that in the GUI we can just right-click to cancel & then delete them.

However, on some earlier release(s) (we're currently running 6.5.4) we found that only a restart of NB would remove the offending jobs.

Andreas_Holmstr
Level 4
Partner
The job does not appear to be active, because you can't cancel or delete it. They are not a big issue, more of an annoyance; they disappear after a few days when the job list in the Activity Monitor wraps around.

//Andreas

Andy_Welburn
Level 6
Not even when you right-click?

(& I know what you mean about annoying!)

Andreas_Holmstr
Level 4
Partner
[Screenshot attached: pic.JPG]

Andy_Welburn
Level 6
Another possibility you could try (I came across this in another post this morning whilst looking around & haven't actually tried it) is to run a "sacrificial" job, either from an existing policy or from a new one set up specifically for this (the poster had a policy called Hung-Jobs). Whilst this "sacrificial" job is running, select it as well as the "ghost" jobs, right-click & cancel. The idea is that the "sacrificial" job makes the cancel job option available, whereas the "ghost" jobs on their own don't, as you've shown above.

Just an idea, but not mine (even if it works!)

Andreas_Holmstr
Level 4
Partner
Hi, in the Windows GUI there was some more information: the job was in state "unknown", and when I right-clicked, the cancel option was available, but nothing happened; the job status did not change.

I am not quite sure what you mean by "sacrificial job". Should I start a job with the same job ID as the ghost one?

/andreas

Andy_Welburn
Level 6
By sacrificial job I mean any job that you can start manually and don't actually want. It is just there to get onto the Activity Monitor so that you can select it together with all of the "unknown" or "ghost" jobs. Then right-click on all of the selected jobs ("sacrificial" and "ghost") together so that the cancel job option becomes available (that's why I called it sacrificial: you are sacrificing it to get the option to cancel the others). However, you seem to be saying that the cancel option is available in the Windows GUI but doesn't work, so it's a bit of a moot point anyway!

Stumpr2
Level 6
You may need to build up your master server with additional CPUs and memory.
Have you done any tuning for NetBackup?

NetBackup 6.5 Backup Planning and Performance Tuning Guide (updated September 10, 2009)
http://support.veritas.com/docs/307083

Minimum system requirements for the Solaris kernel when used with VERITAS NetBackup (tm), defined in /etc/system
http://support.veritas.com/docs/238063

DOCUMENTATION: Tuning Solaris 10 for NetBackup
http://support.veritas.com/docs/308417


Andreas_Holmstr
Level 4
Partner
The master server is a T5120 with 64 virtual CPUs and 32 GB of memory. It is in charge of 10 media servers and a NAS, handling roughly 2000-2500 jobs per day, and it is used only as the EMM and master server.

We have done the normal Solaris OS tweaking specified in http://seer.entsupport.symantec.com/docs/307610.htm and http://seer.entsupport.symantec.com/docs/308417.htm, but maybe we should see if there is something more we can do.

I will see what we can do and keep you updated.

PS: the old ghost jobs that turned up a few days ago are gone, but 9 new ones have shown up instead. This is beginning to be a pain in the "....", as we can see no other sign of jobs failing!


Thanks for the input
//Andreas

scored_well2
Level 4
Certified
Hi Andreas,

We also have the same problem. I have just upgraded to v6.5.5 from v6.5.2A and I'm now unable to delete these ghost jobs.

Andy, I tried your suggestion with the sacrificial job: I started one, selected that job and all the ghosts, then right-click and Cancel Job. The sacrificial job cancelled just fine, but the ghost jobs remain untouched.

Thinking about logging a support call if this continues to annoy.

Benoit_Seg
Level 3
Hi All,

I'm having the same ghost job problem. Here is my setup and how these jobs appeared:
Master server (also a media server) on Windows 2003 R2 32-bit with NBU 6.5.5, 3 media servers on Windows 2003 R2 32-bit with NBU 6.5.5, and 1 Windows 2008 64-bit media server.
I upgraded to version 6.5.5, then replaced a media server (Windows 2003 R2 32-bit) with a new Windows 2008 64-bit one.
My clients run different OSes with different NBU versions.
I noticed that the ghost jobs appeared when I connected to the master server and restarted jobs that had failed for end-of-window (196) or no-tapes-available (96) reasons. NBU restarted each job with a new ID, and the new job finished with an OK status. The ghost job's details say that job ID xxx was restarted as job xxx client, and then "client process aborted (50)".
If I right-click the job, Cancel Job is available but does nothing. I tried deleting the job ID via the command line, and even restarting the client and the master server didn't change anything. No processes are running on the server or the client. I tried deleting the new job and the ghost job at the same time, but no luck.
This started happening after the updates. The ghost jobs are linked to the master server, which is also a media server and backs up local clients.

I was working with Veritas support on another issue and mentioned this problem. They gave me some commands to run on the command line, but they didn't work. The first issue was fixed, so I guess I will open a case for this one.

If anyone has any comments or a solution, please share!
Thanks.
Benoit

scored_well2
Level 4
Certified
OK, I logged a call with Symantec so I can't claim that I fixed anything myself. However, I did the following and since then (5 days now?) I have had no further problems...

1. When there are no backup or restore jobs running, stop all NetBackup services/daemons on the master server
2. Delete any files with the same job ID as the hung jobs from the /usr/openv/netbackup/db/jobs/trylogs/ and /usr/openv/netbackup/db/jobs/ffilelogs/ directories on the master server (the <install_path>\veritas\netbackup\db\jobs\trylogs and <install_path>\veritas\netbackup\db\jobs\ffilelogs directories on a Windows master server).  
3. Delete the /usr/openv/netbackup/db/jobs/bpjobd.act.db file (the <install_path>\veritas\netbackup\db\jobs\bpjobd.act.db file on a Windows master server)
4. Restart the NetBackup services/daemons
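Step 2 is the fiddly part when there are many hung jobs. Below is a minimal Python sketch that only lists (never deletes) candidate files, assuming the UNIX layout in the steps above. On a Windows master you would point it at the <install_path>\veritas\netbackup\db\jobs directory instead. It also assumes that "files with the same job ID" means file names whose leading digits equal the job ID. Check the listing by hand before removing anything, and only do so with the daemons stopped as in step 1. The function name and the script name in the comment are hypothetical.

```python
import re
from pathlib import Path
from typing import Iterable, List


def files_for_jobs(jobs_dir: Path, job_ids: Iterable[str]) -> List[Path]:
    """Hypothetical helper: list files in trylogs/ and ffilelogs/ for the given job IDs.

    Assumption: a file belongs to a job when the leading run of digits in its
    name equals that job ID, so "37534310.t" does NOT match job 3753431.
    Nothing is deleted; the caller decides what to do with the result.
    """
    wanted = {str(j) for j in job_ids}
    matches: List[Path] = []
    for sub in ("trylogs", "ffilelogs"):
        folder = jobs_dir / sub
        if not folder.is_dir():
            continue  # directory missing: nothing to report
        for path in sorted(folder.iterdir()):
            leading = re.match(r"\d+", path.name)  # leading digits of the file name
            if path.is_file() and leading and leading.group(0) in wanted:
                matches.append(path)
    return matches


if __name__ == "__main__":
    import sys

    # Example (hypothetical script name): python list_job_files.py 3753431 3753432
    root = Path("/usr/openv/netbackup/db/jobs")
    for p in files_for_jobs(root, sys.argv[1:]):
        print(p)
```

Keeping the sketch read-only is deliberate. Steps 3 and 4 (removing bpjobd.act.db and restarting the services) are left as manual actions, since they affect every job on the master server and not just the ghosts.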

If anything, I'm wondering whether the bpjobd.act.db file had become corrupted or out of sync during the upgrade to v6.5.5, although I ensured that no jobs were active before the upgrade took place. Anyway, the issue has not recurred since. Hope this helps.

Benoit_Seg
Level 3
Hi all, I followed the instructions to delete the files and it solved my problem.

Thanks.

Benoit