04-22-2020 08:07 AM
I need urgent help.
On Friday my environment hung. The master server was hung and no jobs were triggered, so all weekend backups were skipped.
On Monday, when we checked, there were no backups. A disaster.
We restarted all NBU services; after that things were back to normal and jobs started.
I need some kind of script or rule that lets me know when no jobs are running on the master server (i.e., it is hung).
NBU: 8.1.1
04-22-2020 08:16 AM
Perhaps list the backups that have run in the last X hours, then send yourself an email if the number of jobs is less than Y?
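A rough sketch of that idea, in case it helps. It assumes the comma-separated output of `bpdbjobs -report -all_columns` carries the job start time (epoch seconds) in the ninth field, and that the binary lives under the usual UNIX path; both are assumptions to verify on your own master. The function names and thresholds are made up for illustration. It only prints an alert and returns a non-zero code, so mail delivery is left to cron or whatever scheduler runs it.

```python
import subprocess
import time

# Assumed install path and column layout; check both on your master server.
BPDBJOBS = "/usr/openv/netbackup/bin/admincmd/bpdbjobs"
START_TIME_COL = 8  # assumed index of the job start time (epoch seconds)


def count_recent_jobs(report_lines, hours, now=None):
    """Count jobs whose start time falls within the last `hours` hours."""
    now = time.time() if now is None else now
    cutoff = now - hours * 3600
    count = 0
    for line in report_lines:
        fields = line.split(",")
        if len(fields) <= START_TIME_COL:
            continue  # not a job record
        try:
            started = int(fields[START_TIME_COL])
        except ValueError:
            continue  # start time missing or not numeric
        if started >= cutoff:
            count += 1
    return count


def fetch_report(timeout=120):
    """Run bpdbjobs; the timeout guards against the command itself hanging."""
    result = subprocess.run(
        [BPDBJOBS, "-report", "-all_columns"],
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return result.stdout.splitlines()


def main(hours=6, min_jobs=10):
    """Return 0 if enough jobs started recently, 1 otherwise."""
    try:
        recent = count_recent_jobs(fetch_report(), hours)
    except (subprocess.SubprocessError, OSError) as exc:
        # A hung or missing command is itself a reason to alert.
        print(f"ALERT: could not query the job database: {exc}")
        return 1
    if recent < min_jobs:
        print(f"ALERT: only {recent} jobs started in the last {hours} hours")
        return 1
    return 0

# To use as a cron script, end the file with: raise SystemExit(main())
```

The values for X (`hours`) and Y (`min_jobs`) depend entirely on how busy the environment normally is, so pick them from your own job history.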
04-22-2020 08:18 AM
OpsCenter also has this alert that may help:
Master Server Unreachable: An alert is generated when OpsCenter loses contact with the master server.
04-22-2020 08:30 AM
Thanks, but no such errors have been received.
04-22-2020 08:32 AM
You would need to set it up.
04-23-2020 11:09 AM
You could always configure a "healthcheck" policy, schedule it to run once an hour, back up your hosts or version file or such, expire immediately (because you don't care about the data, just the success/failure of the job), and send it to the destination of your choice.
Then configure the monitoring solution of your choice (a script, OpsCenter, etc.) to check whether the policy has run in the last 2 hours or so (the extra margin avoids false alarms during peak backup windows). If it hasn't, cut a ticket or otherwise alert someone.
This can even be expanded slightly to deliberately write to tape as a method of confirming that your tape library is functional too, if you don't have anything else keeping an eye on that.
You just have to watch out for scheduled maintenance windows or you'll end up with a few extra tickets. =)
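For the script option, something along these lines might do as a starting point. Because the healthcheck images expire immediately, this sketch looks at job records rather than images. It assumes `bpdbjobs -report -all_columns` puts the status, policy name, and end time (epoch seconds) in the fourth, fifth, and eleventh comma-separated fields, and that the policy is literally named `healthcheck`; treat all of that, plus the path and function names, as assumptions to adjust.

```python
import subprocess
import time

# Assumed install path and column layout; check both on your master server.
BPDBJOBS = "/usr/openv/netbackup/bin/admincmd/bpdbjobs"
STATUS_COL, POLICY_COL, END_TIME_COL = 3, 4, 10  # assumed field positions


def last_success(report_lines, policy):
    """Return the newest end time (epoch) of a status-0 job for `policy`, or None."""
    newest = None
    for line in report_lines:
        fields = line.split(",")
        if len(fields) <= END_TIME_COL:
            continue  # not a job record
        if fields[POLICY_COL] != policy or fields[STATUS_COL] != "0":
            continue  # other policy, failed job, or job still active
        try:
            ended = int(fields[END_TIME_COL])
        except ValueError:
            continue
        if ended > 0 and (newest is None or ended > newest):
            newest = ended
    return newest


def is_stale(report_lines, policy, max_age_hours=2, now=None):
    """True if the policy has no successful run within `max_age_hours`."""
    now = time.time() if now is None else now
    ended = last_success(report_lines, policy)
    return ended is None or now - ended > max_age_hours * 3600


def check(policy="healthcheck", max_age_hours=2, timeout=120):
    """Query the job database; treat a hung or failed query as stale."""
    try:
        out = subprocess.run(
            [BPDBJOBS, "-report", "-all_columns"],
            capture_output=True, text=True, timeout=timeout, check=True,
        ).stdout
    except (subprocess.SubprocessError, OSError):
        return True
    return is_stale(out.splitlines(), policy, max_age_hours)
```

If `check()` returns True, that is the point to cut the ticket. Skipping the check during scheduled maintenance windows would have to be added around it.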
04-23-2020 06:14 PM
04-24-2020 04:10 AM
Hi John Nardello1,
04-24-2020 11:39 AM
I think the healthcheck process would be unique for each organization.
05-05-2020 12:54 PM
I would bet that this is related to your nbpem cache filling up.
I've only seen this happen on Windows masters, but it isn't exclusive to them. You don't receive an alert because none of the processes stop, and the server is functioning as expected...with the obvious exception that none of your backups are running. It's very tricky, as jobs that were active when the cache fills up will continue running, making it appear as though everything is working fine.
There are quite a few documented issues with nbpem, and the below EEB is available (albeit for 8.1.2). Might be worth logging a case to check with Support to see if there is a patch for 8.1.1.