
How to configure alerts in OpsCenter for queued jobs

Gabriel-A
Level 3

Hello everybody!

 

Sporadically we have a problem with SAP log backups filling the job queue on a dedicated NetBackup domain.

 

I am looking for information about configuring an alert in OpsCenter that monitors the number of queued jobs and sends an email if that number is more than a configured value.

I did not find anything relevant in the OpsCenter admin guide, so I hope that somebody with experience of a similar alert will step in and share.

 

Thank you in advance!

 

9 REPLIES

quebek
Moderator
   VIP    Certified
Hi,
Is it SAP HANA? I would rather sort this issue out proactively than react to it later on.
If it is SAP HANA on that client, there is one setting for timeouts... I will provide more info once back at home, read on Mon

gial
Level 2

I don't think there is a direct option.

I can think of a couple of customized options:

1) bpdbjobs -report | grep -i <client name you want to monitor> | grep -i queued | wc -l

This will find the number of queued jobs for the given client. You can schedule it through a cron job at a specified time. Or, if you only want an alert when the queued-jobs threshold is crossed, you need to write a shell script that sends the alert. Let me know, I can help you.

2) Configure a customized report in OpsCenter that filters on queued jobs and sends an email. This is a bit of a stretch, though.
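
For a Windows master, where grep and cron are not available, option 1 could be sketched in Python instead. This is only a rough sketch under stated assumptions: it does the same case-insensitive substring match as the grep pipeline on the text printed by `bpdbjobs -report`, and the threshold, the client name `sapclient01`, the mail addresses and the SMTP host `smtp.example.com` are all hypothetical placeholders.

```python
import smtplib
import subprocess
from email.message import EmailMessage


def count_queued(report_text, client=None):
    """Count report lines that mention 'queued' (and the client, if given).

    Mirrors the grep pipeline: a case-insensitive substring match per line,
    so a policy or client whose name contains 'queued' would also be counted.
    """
    count = 0
    for line in report_text.splitlines():
        lowered = line.lower()
        if "queued" not in lowered:
            continue
        if client is not None and client.lower() not in lowered:
            continue
        count += 1
    return count


def build_alert(queued, threshold, sender, recipient):
    """Return an EmailMessage if queued exceeds threshold, otherwise None."""
    if queued <= threshold:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"NetBackup alert: {queued} queued jobs (threshold {threshold})"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"{queued} jobs are currently queued.")
    return msg


def main():
    # Hypothetical values: adjust all of these to your environment.
    threshold = 50
    report = subprocess.run(
        ["bpdbjobs", "-report"], capture_output=True, text=True, check=True
    ).stdout
    queued = count_queued(report, client="sapclient01")
    msg = build_alert(queued, threshold,
                      "netbackup@example.com", "backup-team@example.com")
    if msg is not None:
        with smtplib.SMTP("smtp.example.com") as smtp:
            smtp.send_message(msg)
```

Calling `main()` from a scheduled task (or a cron job on a UNIX master) every few minutes would give roughly the alert asked for; nothing is sent while the count stays at or below the threshold.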

Dear Quebek,

 

Yes indeed, it is SAP. We have been experiencing serious problems with these queued jobs and have been unable to solve them for several months, even with the Veritas support case we have.

Your comments and suggestions would be more than welcome!

 

Dear Gial,

 

Thank you for your response as well.

 

You are right; however, we managed to create a report based on the results of the following command:

 

bpdbjobs.exe -ignore_parent_jobs -keep_hours 1 -summary -l

 

Note that the master server is a Windows machine.
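
To turn that summary into a number a script can compare against a threshold, something like the sketch below might work. It assumes the summary output contains a line of the form `Queued: <number>`; that layout is an assumption and should be checked against the real output of the command above. The function name `queued_from_summary` is made up for illustration.

```python
import re


def queued_from_summary(summary_text):
    """Return the queued-job count from the summary text, or None if absent.

    Assumes (unverified) a line such as 'Queued:   12' in the output.
    The match is anchored at the start of a line, so a label such as
    'Requeued:' is not picked up by mistake.
    """
    match = re.search(
        r"^\s*Queued\s*:\s*(\d+)\s*$",
        summary_text,
        re.IGNORECASE | re.MULTILINE,
    )
    return int(match.group(1)) if match else None
```

A `None` result means the expected line was not found, which is worth treating as an error rather than as zero queued jobs.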

 

Thanks,

quebek
Moderator
   VIP    Certified

Hello

You did not confirm this is for SAP HANA... If it is, try to extend the property backint_response_timeout on the SAP end from the default 10 minutes to 60.

Also, so that this policy does not take over all NBU resources, set "limit jobs per policy" to 16 or less in its properties. That way it should not acquire all NBU resources; the remaining jobs will just be queued...

Hope this helps somehow...

Hi Quebek,

 

Thanks for your comment. That's a very good direction.

Yes, this is SAP HANA.

If I understand correctly, backint_response_timeout is the timeout that causes a job to be canceled from the SAP end if it has not completed.

 

This leads to the following article, https://www.veritas.com/support/en_US/article.100043963, which describes our situation quite exactly. I assume that this timeout does not work in our environment as described in the article.

I will have to check with the SAP colleagues.

Reducing the max jobs per policy limit has a negative impact rather than a positive one, because of the way nbjm and nbrb allocate resources for the job and the load they put on NBDB, as per the following:

When a NetBackup media server is writing to disk storage, it updates the master server about the available capacity in each disk storage pool.  After every 1 GB of data written, the media server sends the current capacity information to nbjm on the master server, and nbjm sends an RBdbUpdate message to nbrb, which triggers an update in NBDB.

article:  https://www.veritas.com/support/en_US/article.100025235

However, your comment was very helpful!

Thank you

 

quebek
Moderator
   VIP    Certified

Hi

Well, I can tell you that in my env, updating the backint_response_timeout property to a bigger value helped... We were facing issues where the NBU domain was busy and did not accept the SAP HANA logs. By the time it was able to accept them (after 10 mins), backint on the HANA end was already done/closed, so jobs were just piling up and exhausting the max streams, and eventually timing out with EC 54... So before I learnt about bumping up backint_response_timeout, I limited the policy to 16 instances, and all other jobs were taken in and written to MSDP ;)

And what is your current config for backint_response_timeout?

 

Thanks a lot!

quebek
Moderator
   VIP    Certified
Hi
I think we increased it to 1800 sec...