I need urgent help.
On Friday, my environment hung. The master server was hung and no jobs were triggered, so all weekend backups were skipped.
On Monday, when we checked, there were no backups. A disaster.
We restarted all NBU services, after which everything returned to normal and jobs started.
I need some kind of script or rule that will let me know when no jobs are running on the master server (i.e. it is hung).
NBU : 8.1.1
You could always configure a "healthcheck" policy: schedule it to run once an hour, have it back up something small such as your hosts file or version file, expire the image immediately (because you don't care about the data, just the success/failure of the job), and send it to the destination of your choice.
Then configure the monitoring solution of your choice (a script, OpsCenter, etc.) to check whether the policy has run in the last 2 hours or so (the margin avoids false alarms when the job is delayed during peak backup windows). If it hasn't, cut a ticket or otherwise alert someone.
This can even be expanded slightly to deliberately write to tape as a method of confirming that your tape library is functional too, if you don't have anything else keeping an eye on that.
You just have to watch out for scheduled maintenance windows or you'll end up with a few extra tickets. =)
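As a rough illustration of the "has the healthcheck policy run in the last 2 hours" check, here is a minimal Python sketch. It assumes you have already pulled job records out of NetBackup (for example from the job database or OpsCenter) into a list of dicts; the field names `policy`, `status` and `ended`, the policy name `healthcheck`, and the helper names are all hypothetical. The only NetBackup fact it relies on is that status 0 means a successful job.

```python
import time

# Hypothetical values: adjust to your own policy name and tolerance.
HEALTHCHECK_POLICY = "healthcheck"
MAX_AGE_SECONDS = 2 * 60 * 60  # the 2-hour threshold suggested above


def last_success(jobs, policy):
    """Return the newest end time (epoch seconds) of a successful job
    for the given policy, or None if there is none.

    Each job is assumed to be a dict with keys 'policy', 'status' and
    'ended'; 'ended' is assumed to be 0 or None while a job is running.
    """
    ends = [
        j["ended"]
        for j in jobs
        if j["policy"] == policy and j["status"] == 0 and j["ended"]
    ]
    return max(ends) if ends else None


def is_stale(jobs, policy=HEALTHCHECK_POLICY, now=None, max_age=MAX_AGE_SECONDS):
    """True if the policy has no successful job newer than max_age seconds."""
    now = time.time() if now is None else now
    newest = last_success(jobs, policy)
    return newest is None or (now - newest) > max_age


if __name__ == "__main__":
    # Made-up sample records, only to show the shape of the input.
    sample = [
        {"policy": "healthcheck", "status": 0, "ended": time.time() - 3 * 3600},
        {"policy": "prod_files", "status": 0, "ended": time.time() - 600},
    ]
    if is_stale(sample):
        print("ALERT: no successful healthcheck job in the last 2 hours")
```

Run from cron or a scheduler on a host other than the master if you can, so the check still fires when the master itself is hung. The alert line is where you would cut the ticket or send the mail.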
I would bet that this is related to your nbpem cache filling up.
I've only seen this happen on Windows masters, but it isn't exclusive to them. You don't receive an alert because none of the processes stop, and the server is functioning as expected...with the obvious exception that none of your backups are running. It's very tricky, as jobs that were active when the cache fills up will continue running, making it appear as though everything is working fine.
There are quite a few documented issues with nbpem, and the below EEB is available (albeit for 8.1.2). It might be worth logging a case with Support to check whether there is a patch for 8.1.1.