
Hung Parent Jobs

Mark_Solutions
Level 6
Partner Accredited Certified

Hi

For a change, I have an issue that I need some help with.

NetBackup 7.0.1 (can't upgrade without major change control)

Master is clustered on Windows 2003 R2 x64 Enterprise

Multiple Media Servers on same NBU and O/S level.

All use Advanced Disk with SLPs and duplicate to LTO4 libraries

The system has been working perfectly for months: 80+ TB on a weekly full, incrementals every day, 7,000+ jobs per day.

Everything runs really well and all jobs and duplications keep nicely up to date.

The majority of jobs use multiple streams.

During the last week, some parent jobs (65 of them) have been left running even though all of their child jobs completed successfully.

This happens across a range of media servers, not just one: 7 different ones last night.
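To make the symptom concrete, here is a minimal Python sketch of the check being described: a parent job counts as hung when it is still active but every one of its child jobs has finished. The `Job` record, its field names, and the `"active"`/`"done"` state strings are hypothetical; they are not the output format of `bpdbjobs` or any other NetBackup command, so real job data would first have to be mapped onto them. The sketch also reports which media servers the children ran on, since the hung parents span several servers.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


# Hypothetical job record: field names and state strings are illustrative
# assumptions, not the format of any NetBackup command output.
@dataclass
class Job:
    job_id: int
    parent_id: Optional[int]  # None for a parent (or standalone) job
    state: str                # assumed values: "active" or "done"
    media_server: str = ""    # set on child jobs; may be empty for parents


def find_hung_parents(jobs: List[Job]) -> Dict[int, List[str]]:
    """Return {parent_job_id: sorted media servers of its children} for
    parents that are still active although every child job has finished."""
    # Group child jobs under their parent's job ID.
    children: Dict[int, List[Job]] = defaultdict(list)
    for job in jobs:
        if job.parent_id is not None:
            children[job.parent_id].append(job)

    hung: Dict[int, List[str]] = {}
    for job in jobs:
        kids = children.get(job.job_id, [])
        # Only top-level jobs that actually have children and are still running.
        if job.parent_id is None and kids and job.state == "active":
            if all(kid.state == "done" for kid in kids):
                hung[job.job_id] = sorted({kid.media_server for kid in kids})
    return hung
```

Parents without any children are skipped, so a standalone job that is legitimately still running is not flagged.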

I have done a bpdown and process cleanup, and cleaned up all media servers so nothing has been left hung.

I have also done a cluster failover.

But this is still happening each day.

These parent jobs will not cancel.

I am thinking along the lines of the pempersist file (which will have followed the cluster failover), but I just wanted some ideas on where to look first on this one.

Thanks for any help


Mark_Solutions
Level 6
Partner Accredited Certified

Oh well - no offers!

I have been digging and am looking at antivirus as the cause of this; I will update when I have confirmed.

AAlmroth
Level 6
Partner Accredited

Well, it's difficult to say, really. If it had been working and then the problems started all of a sudden, one would have to suspect that there has been a change to the system.

From your description, it doesn't sound like anything I have seen before. You already mentioned pempersist; I think that is a viable option. You don't mention that the jobs are "Waiting for retry", so it is probably not that known case either.

You could be right about antivirus, and perhaps a firewall update? As the master is on Windows 2003, did anyone apply any Microsoft hotfixes about a week ago? Any changes to NIC configuration?

Does the admin log reveal anything about system calls?

I guess you could do it the Windows way: reboot all NBU servers... Hard to believe, but a reboot has helped me so many times with NBU. EV is even worse.

/A

watsons
Level 6

If there are SQL backup jobs, then this may be related: http://www.symantec.com/docs/TECH65687

But if it's just file system backups, I can't think of a reason other than some change in the environment, especially on the network side. A TCP registry change may be needed to check whether it's related to the timeout (for the child to connect back to the parent), but I can't remember which one it is.

pempersist is also a good point, but that would involve restarting the master server.

Mark_Solutions
Level 6
Partner Accredited Certified

Thanks All

No SQL backups, everything is straight file system.

We are working with the AV team to get everything excluded correctly, and I have found many processes that were not excluded after the change to 64-bit servers.

This should go through in the next 2 or 3 days (change control!!) so I will update this after the exclusions have been applied.

Mark_Solutions
Level 6
Partner Accredited Certified

I have been working on this and have had all AV exclusions set, which seemed to improve things a little.

I then found that cmd.exe and nbdelete.exe processes were crashing (application pop-ups).

So I searched again and, what do you know, a tech note came out yesterday for just this fault (not in its title, but exactly in what is actually happening), and it is fixed in 7.1.0.3:

http://www.symantec.com/docs/TECH177597

Trouble is, we are on 7.0.1.

Case logged to see if we can get the 7.0.1 version of the fix; I will update if we do.

Mark_Solutions
Level 6
Partner Accredited Certified

Just an update on this one - which looks to be resolved.

There was no 7.0.1 EEB available so the system has been upgraded to 7.1.0.3.

The errors still persisted, although there were fewer of them, and we now got a Desktop Heap Exhaustion message in the event logs.

Yesterday I increased the desktop heap from 1024 to 2048 and we had no hung jobs overnight, so I will monitor this over the weekend, but it is looking good.

Mark_Solutions
Level 6
Partner Accredited Certified

All resolved now

All updated to 7.1.0.3 and the desktop heap increased to 2048. The resolution was either the desktop heap alone or both together, but not 7.1.0.3 alone.

For others' information, especially those who think their master server is of a high enough specification to cope with anything, please compare with the one in this case:

Clustered Dedicated Master

Windows 2003 R2 Enterprise Edition 64-bit

32GB RAM

Dual Quad Core Opteron 2.4GHz Processors

1TB drive for catalogs etc. etc.

Should have coped!!

The desktop heap is increased by editing the "Windows" value in the key:

HKLM\System\CurrentControlSet\Control\Session Manager\SubSystems\

Part of this value's data reads similar to: Windows SharedSection=1024,20480,1024

The last of the three figures (the desktop heap size, in KB, for non-interactive desktops, which are the ones services run in) needed changing to increase the desktop heap, so that it now reads:

Windows SharedSection=1024,20480,2048
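The edit itself is a small string change. As a minimal Python sketch, assuming the "Windows" value data has already been read into a string, the hypothetical helper `set_noninteractive_heap` below replaces only the third SharedSection figure and leaves the rest of the data untouched. Per Microsoft's documentation, the three figures are, in KB, the common system heap, the heap for each interactive desktop, and the heap for each non-interactive desktop. The sketch does not read or write the registry itself, and it deliberately rejects data that does not contain all three figures rather than guessing.

```python
import re

# The three comma-separated SharedSection figures (all in KB):
#   1st: common system heap, 2nd: heap per interactive desktop,
#   3rd: heap per non-interactive desktop (used by services).
_SHARED_SECTION = re.compile(r"SharedSection=(\d+),(\d+),(\d+)")


def set_noninteractive_heap(windows_value: str, new_kb: int) -> str:
    """Hypothetical helper: return a copy of the SubSystems "Windows" value
    data with the third SharedSection figure set to new_kb.

    Raises ValueError if no three-part SharedSection=x,y,z is present.
    """
    match = _SHARED_SECTION.search(windows_value)
    if match is None:
        raise ValueError("no three-part SharedSection=x,y,z found")
    system_kb, interactive_kb, _old_kb = match.groups()
    replacement = f"SharedSection={system_kb},{interactive_kb},{new_kb}"
    # Splice the new text in place of the matched span only.
    return windows_value[:match.start()] + replacement + windows_value[match.end():]


if __name__ == "__main__":
    # Abbreviated example data; the real value contains more parameters.
    before = (r"%SystemRoot%\system32\csrss.exe ObjectDirectory=\Windows "
              r"SharedSection=1024,20480,1024 Windows=On")
    print(set_noninteractive_heap(before, 2048))
```

Only the matched span is rewritten, so any other parameters in the value data survive unchanged.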

Hope this helps someone in the future.