After Status 50 & UNKNOWN JOBS....

Malak
Level 4
Partner

... the jobs simply do nothing...

Hello All,

Windows 2008 R2 servers (2 Master in cluster, 9 Medias)

NetBackup 7.1.0.3, in all servers.

We now have jobs that do nothing for hours, even days; they never end.

They also never run again: hourly or daily jobs will not start until the previous run ends, and since it never ends, we have no regular backups.

These jobs cannot be deleted. The only option (that I know of) is to stop all NetBackup services and delete them from the DB by hand, like this:

• Shutdown NetBackup Master AND Media Server processes
• Verify no active NetBackup processes on the Master server
• Verify no active NetBackup processes on all Media servers
• Run bpjobd -r <jobid> for each ghost jobid.
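With more than a handful of ghost jobs, the last step gets repetitive. The following is a minimal sketch, not a supported tool: it only builds the `bpjobd -r <jobid>` command lines from the steps above so they can be reviewed before anything is run. The function name and the second job ID are made up for illustration, and the sketch assumes the services have already been stopped and verified by hand.

```python
def ghost_job_commands(job_ids):
    """Return one 'bpjobd -r <jobid>' command line per unique ghost job ID."""
    commands = []
    seen = set()
    for job_id in job_ids:
        text = str(job_id).strip()
        if not text.isdigit():
            raise ValueError(f"not a numeric job ID: {job_id!r}")
        if text in seen:
            continue  # skip duplicates so each job is listed once
        seen.add(text)
        commands.append(f"bpjobd -r {text}")
    return commands


if __name__ == "__main__":
    # Example IDs only; replace them with the ghost job IDs from your console.
    for line in ghost_job_commands([1118359, "1118360", 1118359]):
        print(line)
```

The sketch prints the commands instead of executing them, so nothing touches the jobs database until someone has checked the list.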

Will this ever be resolved? Or does anyone know another, less disruptive way to remove these “never-ending” jobs?

 

Thank you

32 REPLIES

Amaan
Level 6

You can try this:

http://www.symantec.com/business/support/index?page=content&id=TECH35484

It will help.

Let us know if this does not work.

Mark_Solutions
Level 6
Partner Accredited Certified

I prefer this one as it always works:

http://www.symantec.com/docs/TECH43177

Although you only need to delete the try and ffile files that relate to the hung job IDs.

There was also this one but it says fixed in 7.1.0.2:

http://www.symantec.com/docs/TECH146990

The question is: why is this happening to you in the first place?

You need to deal with these as per my first tech note above, otherwise some policies may not be running.

Malak
Level 4
Partner

Hello Amaan,

Thank you for your reply.

 

When I execute the cancel command, I get the following:

C:\Users\Administrator>bpdbjobs -cancel 1118359
Canceling 0 jobs

It looks like the job has ended but the console monitor does not know it.
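When checking many job IDs this way, the "Canceling N jobs" line is the part worth watching. The following is a minimal sketch that pulls the count out of that line; the function name is made up, and the only output format it assumes is the single line quoted above.

```python
import re


def canceled_count(output):
    """Return the job count from a 'Canceling N jobs' line, or None if absent."""
    match = re.search(r"Canceling\s+(\d+)\s+jobs?", output)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    sample = "Canceling 0 jobs"  # the output quoted in this post
    count = canceled_count(sample)
    if count == 0:
        print("the cancel request did not match any job")
    elif count is None:
        print("no 'Canceling N jobs' line found")
    else:
        print(f"{count} job(s) canceled")
```

A count of 0 for a job that the console still shows as active is the symptom described in this post.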


Also, it looks like the same policy is still able to run, even though another job from it is in running state (I don't understand why).

On Monday I'll do more tests.

Mark_Solutions
Level 6
Partner Accredited Certified

OK - follow the first link I provided above (http://www.symantec.com/docs/TECH43177) to get rid of your orphaned jobs from the console - this always works and will resolve those for you.

If you find that you have jobs that have completed but the parent jobs are hung then it may be as a result of other things.

It may be a port blockage (running out of ports so parent status never gets updated) - you can help this in two ways so ensure that you do the following on the Master Server:

1. Add the following registry key (needs a reboot to take effect):

HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\
DWORD – TcpTimedWaitDelay  - Decimal Value of 30

2. Run this command from an Administrative command prompt (no reboot required, takes effect immediately and is persistent):

netsh int ipv4 set dynamicport tcp start=10000 num=50000
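To see what that command actually asks for, here is a small worked check of the arithmetic. It is only a sketch: the function name is made up, and the only rule it enforces is that the range must end at or below 65535, the highest TCP port number. As far as I know the Windows Server 2008 default is start=49152 num=16384, so this setting roughly triples the pool.

```python
MAX_TCP_PORT = 65535  # highest valid TCP port number


def dynamic_port_range(start, num):
    """Return (first, last) port for a range given as a start port and a count."""
    if start < 1 or num < 1:
        raise ValueError("start and num must be positive")
    last = start + num - 1
    if last > MAX_TCP_PORT:
        raise ValueError(f"range ends at {last}, above {MAX_TCP_PORT}")
    return start, last


if __name__ == "__main__":
    first, last = dynamic_port_range(10000, 50000)
    print(f"dynamic ports: {first}-{last}")  # dynamic ports: 10000-59999
```

So start=10000 num=50000 covers ports 10000 through 59999 and stays inside the valid range.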

Next, it may be that the server runs out of desktop heap, so also check the following:

The desktop heap is increased by editing the "Windows" key in

HKLM\System\CurrentControlSet\Control\Session Manager\SubSystems\

Part of this key has a section that reads similar to: Windows SharedSection=1024,20480,1024

The last of the three figures needs changing to increase the desktop heap, so that it reads:

Windows SharedSection=1024,20480,2048

This solved my own issue with hung parent jobs: https://www-secure.symantec.com/connect/forums/hung-parent-jobs
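Because the "Windows" value is one long string, it is easy to change the wrong figure when editing it by hand. The following is a minimal sketch of the edit described above: it replaces only the third SharedSection figure and leaves the rest of the string alone. The function name is made up, and the sketch works on a copy of the string; it does not read or write the registry.

```python
import re


def set_desktop_heap(windows_value, new_third):
    """Replace the third SharedSection figure in the 'Windows' value string."""
    pattern = re.compile(r"(SharedSection=\d+,\d+,)(\d+)")
    updated, count = pattern.subn(
        lambda m: m.group(1) + str(new_third), windows_value, count=1
    )
    if count != 1:
        raise ValueError("no three-figure SharedSection entry found")
    return updated


if __name__ == "__main__":
    before = "Windows SharedSection=1024,20480,1024"
    print(set_desktop_heap(before, 2048))
    # Windows SharedSection=1024,20480,2048
```

Paste the full value from the registry as the input and compare the result with the original before saving it back.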

Next it may be a PagedPool issue so also tune that:

To tune your paging use the following registry keys (need a reboot) - create them if not already there:

HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management\

DWORD - PoolUsageMaximum  - Decimal value of 40

DWORD - PagedPoolSize Hex value of FFFFFFFF (this is 8 x F)
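The registry editor asks whether a DWORD is entered as decimal or hex, and mixing the two up is an easy mistake with these values. The following is a small sketch that prints both forms of each value so they can be checked against what the editor shows. The dictionary and function names are made up; the value names and figures are the ones given above.

```python
DWORD_MAX = 0xFFFFFFFF  # largest value a 32-bit DWORD can hold

SETTINGS = {
    "PoolUsageMaximum": 40,                 # entered as decimal 40
    "PagedPoolSize": int("FFFFFFFF", 16),   # entered as hex, eight F digits
}


def describe_dword(name, value):
    """Return a line showing a DWORD value in both decimal and hex."""
    if not 0 <= value <= DWORD_MAX:
        raise ValueError(f"{name} does not fit in a DWORD: {value}")
    return f"{name}: decimal {value} = hex 0x{value:08X}"


if __name__ == "__main__":
    for name, value in SETTINGS.items():
        print(describe_dword(name, value))
    # PoolUsageMaximum: decimal 40 = hex 0x00000028
    # PagedPoolSize: decimal 4294967295 = hex 0xFFFFFFFF
```

Note that decimal 40 is hex 28, so typing 40 with the hex option selected would store decimal 64 instead.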

All of these will help your system anyway so worth doing

Hope this helps

Malak
Level 4
Partner

Thank you for your post Mark.
I'll try to get this solution into production today. I'll give you feedback when possible.

Malak
Level 4
Partner

Sorry for the delay on this report...

The problem still occurs after applying Mark's suggestions.

Mark_Solutions
Level 6
Partner Accredited Certified

So are you saying that you removed all of the Status 50 jobs by following my link but more have appeared?

If so then you have a process failing on your system

Check the Application and System Event logs for application failures / popups etc. so that we can pin down what is going wrong.

snapshot4
Level 3
Malak, I have the same issue every so often, but I don't know of any online solution to get rid of those. The process you follow is the same one I use. However, the best way to prevent this is, when restarting any failed backups, to wait until you see them queued; once they are queued you can delete the failed jobs. The jobs you're talking about are caused by restarting a job and then immediately deleting it.

Malak
Level 4
Partner

I have no interaction with these jobs. They start normally when their window opens (7PM, 8PM, 11PM, 3AM, etc.). I do not restart them, and I do not cancel them.

When I notice these jobs they have already been "running" for 3 or 4 days. That is when I stop NetBackup and delete them from the database.

I have 15,000 jobs running daily, so it’s impossible for me to keep them all under "surveillance".
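Watching that many jobs by eye is not realistic, but flagging the ones that have been active too long can be automated. The following is a minimal sketch under stated assumptions: the job records are hypothetical `(job_id, state, started)` tuples that would have to be filled from whatever job report is available, the `"Active"` state label is assumed, and the 24-hour threshold is an arbitrary example.

```python
from datetime import datetime, timedelta


def stuck_jobs(jobs, now, max_age=timedelta(hours=24)):
    """Return IDs of jobs still active after max_age.

    jobs is an iterable of (job_id, state, started) tuples; this record
    shape is hypothetical and not a NetBackup format.
    """
    return [
        job_id
        for job_id, state, started in jobs
        if state == "Active" and now - started > max_age
    ]


if __name__ == "__main__":
    now = datetime(2012, 6, 4, 9, 0)
    sample = [
        (1118359, "Active", datetime(2012, 6, 1, 19, 0)),  # active for days
        (1118400, "Active", datetime(2012, 6, 4, 3, 0)),   # started last night
        (1118401, "Done", datetime(2012, 6, 1, 20, 0)),    # finished, ignored
    ]
    print(stuck_jobs(sample, now))
```

Run on a schedule, a check like this would surface a hung job after one day instead of three or four.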

Mark_Solutions
Level 6
Partner Accredited Certified

If you have that many jobs running per day then it could quite possibly be an application or memory issue

If the majority of them are parent jobs then try the desktop heap setting, which has helped me in the past.

How much RAM does your server have and what is your desktop heap setting?

Malak
Level 4
Partner

This server has 12GB memory.

 

Is this the section for DesktopHeap?

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows

"... Windows SharedSection=1024,20480,2048 ..."

 

Thank you

 

Mark_Solutions
Level 6
Partner Accredited Certified

Have you typed that correctly?

the middle figure 20480 looks wrong!

Malak
Level 4
Partner

I did a copy+paste... this configuration was based on your previous post....

Mark_Solutions
Level 6
Partner Accredited Certified

What were the figures originally?

Mark_Solutions
Level 6
Partner Accredited Certified

OK - that is fine then - so you have increased it but are still having errors.

Did you do the other changes for the PagedPool?

As a Strategy you need to do the following (I know it is not easy to find the time but it is worth it)

1. Apply the PagedPool setting I gave you earlier

2. Cleanup the orphaned jobs as per the tech note I gave (needs NBU downtime)

3. When a failure occurs check out the Application and System event logs for anything happening - either processes dying / crashing or memory / system warnings

If you wish then upload your Application and System Event logs from the Master Server (in evtx format) and we can take a look to see what we can spot

Malak
Level 4
Partner

It was like this: Windows SharedSection=1024,20480,1024 (but this could already have been changed incorrectly...)

Malak
Level 4
Partner

Mark, I made all the changes in your post...
I'm not looking at the system logs daily, but I'll take a look...

snapshot4
Level 3

Are you using storage lifecycle policies?

Malak
Level 4
Partner

Hello snapshot4,

No, we are not.