09-02-2021 11:19 PM
Hello everybody!
Sporadically we have a problem with SAP log backups filling the job queue on a dedicated NetBackup domain.
I am looking for information about configuring an alert in OpsCenter that monitors the number of queued jobs and sends an email if that number is more than a configured value.
I did not find anything relevant in the OpsCenter admin guide, so I hope that somebody with experience of a similar alert can step in and share.
Thank you in advance!
09-03-2021 11:48 AM
09-07-2021 03:02 AM
I don't think there is a direct option.
I can think of a couple of customized options:
1) bpdbjobs -report | grep -i <client name you want to monitor> | grep -i queued | wc -l
This prints the number of queued jobs for the given client. You can schedule it through a cron job at a specified time. Or, if you only want an alert when the queued-jobs threshold is crossed, you need a shell script that compares the count with the threshold and sends the alert. Let me know, I can help you.
2) Configure a customized report in OpsCenter that filters on queued jobs and have it emailed, though this is a rather roundabout approach.
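The threshold check described in option 1 could be sketched roughly as follows. This is a minimal sketch, not a tested NetBackup integration: it is written in Python so the same script can run on a Windows master server, it assumes `bpdbjobs` is on the PATH, and it matches lines the same way the grep pipeline does (case-insensitive text match on the client name and the word "queued"). The client name, threshold, mail addresses and SMTP host are hypothetical placeholders.

```python
import smtplib
import subprocess
from email.message import EmailMessage

# Hypothetical placeholders: adjust to your environment.
CLIENT = "hanadb01"
THRESHOLD = 20
ALERT_FROM = "netbackup@example.com"
ALERT_TO = "backup-admins@example.com"
SMTP_HOST = "smtp.example.com"


def count_queued(report_text, client):
    """Count report lines that mention both the client and 'queued'.

    Mirrors `grep -i <client> | grep -i queued | wc -l`: a plain
    case-insensitive text match, not a parse of the report columns.
    """
    client = client.lower()
    return sum(
        1
        for line in report_text.splitlines()
        if client in line.lower() and "queued" in line.lower()
    )


def build_alert(count, threshold, client):
    """Return an EmailMessage if count is more than threshold, else None."""
    if count <= threshold:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"NetBackup: {count} queued jobs for {client}"
    msg["From"] = ALERT_FROM
    msg["To"] = ALERT_TO
    msg.set_content(
        f"{count} queued jobs found for client {client} "
        f"(threshold: {threshold})."
    )
    return msg


def check_and_alert():
    """Run bpdbjobs once and send a mail if the threshold is crossed."""
    # Assumes bpdbjobs is on the PATH of the user running the script.
    report = subprocess.run(
        ["bpdbjobs", "-report"],
        capture_output=True, text=True, check=True,
    ).stdout
    msg = build_alert(count_queued(report, CLIENT), THRESHOLD, CLIENT)
    if msg is not None:
        with smtplib.SMTP(SMTP_HOST) as smtp:
            smtp.send_message(msg)
```

Calling `check_and_alert()` from cron or a Windows scheduled task every few minutes would give the email-on-threshold behaviour asked for in the first post. Because the match is a plain text search, a policy or schedule name containing "queued" would be counted as well.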
09-29-2021 01:12 AM
Dear Quebek,
Yes indeed, it is SAP. We are experiencing serious problems with these queued jobs and have been unable to solve them for several months already, even with the Veritas support case we have open.
Your comments and suggestions would be more than welcome!
09-29-2021 01:17 AM
Dear Gial,
Thank you for your response as well.
You are right; however, we managed to create a report based on the output of the following command:
bpdbjobs.exe -ignore_parent_jobs -keep_hours 1 -summary -l
Note that the master server is a Windows machine.
Thanks,
09-30-2021 12:24 AM
Hello
You did not confirm whether this is for SAP HANA... If it is, try increasing the backint_response_timeout property on the SAP end from the default of 10 minutes to 60.
Also, to keep this policy from taking over all NBU resources, set "limit jobs per policy" in the policy properties to 16 or less. That way the policy cannot acquire all NBU resources; the extra jobs will simply be queued...
Hope this helps somehow...
09-30-2021 03:49 AM
Hi Quebek,
Thanks for your comment, that's a very good direction.
Yes, this is SAP HANA.
If I understand correctly, backint_response_timeout is the timeout after which a job is canceled from the SAP end if it has not completed.
This leads to the following article, https://www.veritas.com/support/en_US/article.100043963, which describes our situation quite exactly. I assume that this timeout does not work in our environment the way the article describes.
I will have to check with the SAP colleagues.
Reducing the max jobs per policy limit has a negative impact rather than a positive one, because of the way nbjm and nbrb allocate resources for the job and the load they put on NBDB, as per the following:
When a NetBackup media server is writing to disk storage, it updates the master server about the available capacity in each disk storage pool. After every 1 GB of data written, the media server sends the current capacity information to nbjm on the master server, and nbjm sends an RBdbUpdate message to nbrb, which triggers an update in NBDB.
article: https://www.veritas.com/support/en_US/article.100025235
However, your comment was very helpful!
Thank you
09-30-2021 05:31 AM
Hi
Well, I can tell you that in my environment raising backint_response_timeout to a bigger value helped... We were facing issues where the NBU domain was busy and did not accept the SAP HANA logs. By the time it was able to accept them (after 10 minutes), backint on the HANA end was already done/closed, so jobs were just piling up, exhausting the max streams and eventually timing out with EC 54... So before I learnt that backint_response_timeout could be bumped up, I limited the policy to 16 instances, and all other jobs were taken in and written to MSDP ;)
09-30-2021 06:44 AM
And what is your current setting for backint_response_timeout?
Thanks a lot!
09-30-2021 07:03 AM