Sporadically we have a problem with SAP log backups filling the job queue on a dedicated NetBackup domain.
I am looking for information about configuring an alert in OpsCenter that monitors the number of queued jobs and sends an email if that number is more than a configured value.
I did not find anything relevant in the OpsCenter admin guide, so I hope somebody with experience of a similar alert can step in and share.
Thank you in advance!
I don't think there is a direct option.
I can think of a couple of customized options:
1) bpdbjobs -report | grep -i <client name you want to monitor> | grep -i queued | wc -l
This will find the number of queued jobs for the given client. You can schedule it through a cron job at a specified time. Or, if you only want an alert when the queued-jobs threshold is crossed, you need a shell script that checks the count and sends the alert. Let me know, I can help you.
2) Configure a customized report in OpsCenter that filters on queued jobs and sends an email, though that is a somewhat roundabout way to get an alert.
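For option 1, something along these lines might work as a starting point. It is only a sketch: it assumes a Unix-like host with bpdbjobs on the PATH and a working mail command, and the function names, the client name hana01, the threshold and the mail address are made up for illustration. Like the one-liner above, it simply counts report lines that mention the client and the word "queued", so any state containing that word is counted.

```sh
#!/bin/sh
# Sketch only: the function names and all example values are hypothetical.

# Read "bpdbjobs -report" style text on stdin and print how many lines
# mention the given client and the word "queued" (case-insensitive).
count_queued() {
    client=$1
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    grep -i -F -- "$client" | grep -i -c 'queued' || true
}

# Send a mail when the queued count for a client is above the threshold.
# Assumes bpdbjobs and a mail(1)-compatible command are available.
check_queue() {
    client=$1
    threshold=$2
    mail_to=$3
    queued=$(bpdbjobs -report | count_queued "$client")
    if [ "$queued" -gt "$threshold" ]; then
        printf '%s queued jobs for %s (threshold %s)\n' \
            "$queued" "$client" "$threshold" |
            mail -s "NetBackup queue alert: $client" "$mail_to"
    fi
}
```

To use it, add a last line such as `check_queue hana01 50 backup-admins@example.com` to the script and run it from cron every few minutes. Matching on the whole line is crude, so a policy name that happens to contain the client name would be counted as well.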
Yes indeed, it is SAP, and we have been experiencing serious problems with these queued jobs. We have been unable to solve them for several months, even with the Veritas support case we have.
Your comments and suggestions would be more than welcome!
Thank you for your response as well.
You are right; however, we managed to create a report based on the output of the following command:
bpdbjobs.exe -ignore_parent_jobs -keep_hours 1 -summary -l
Note that the master server is a Windows machine.
You did not confirm whether this is SAP HANA... If it is, try extending the backint_response_timeout property on the SAP side from the default 10 minutes to 60.
Also, so that this policy does not take over all NBU resources, set "limit jobs per policy" in its properties to 16 or less. The policy then cannot acquire all NBU resources; its excess jobs will just be queued...
Hope this helps anyhow...
Thanks for your comment. That's a very good direction.
Yes, this is sap hana.
If I understand correctly, backint_response_timeout is the timeout after which a job is canceled from the SAP side if it has not completed.
This leads to the following article, https://www.veritas.com/support/en_US/article.100043963, which describes our situation quite exactly. I assume that this timeout does not work in our environment as described in the article.
I will have to check with the SAP colleagues.
Reducing the max jobs per policy limit has a negative rather than a positive impact, because of the way nbjm and nbrb allocate resources for the job and the load they put on NBDB, as per the following:
When a NetBackup media server is writing to disk storage, it updates the master server about the available capacity in each disk storage pool. After every 1 GB of data written, the media server sends the current capacity information to nbjm on the master server, and nbjm sends an RBdbUpdate message to nbrb, which triggers an update in NBDB.
However, your comment was very helpful!
Well, I can tell you that in my environment raising backint_response_timeout to a bigger value helped... We were facing issues where the NBU domain was busy and did not accept the SAP HANA logs. By the time it was able to take them (after 10 minutes), backint on the HANA side was already done/closed, so jobs were just piling up and exhausting the max streams, and eventually they were timing out with EC 54... So before I learnt that I could bump up backint_response_timeout, I limited the policy to 16 instances, and all other jobs were taken in and written to MSDP ;)