
NetBackup 7.7.3: bpdbm high CPU on Windows master

param_2017
Level 2

Hi All,

We have been seeing many bpdbm processes running even when no jobs are active. They eat up the master server's CPU, which is always at 95-100%, and because of this we are not able to open policies and other consoles.

Any inputs/suggestions, please.

 

6 REPLIES

Marianne
Level 6
Partner    VIP    Accredited Certified

Do you have bpdbm logging enabled?

Each line in the log shows the PID of the bpdbm process in square brackets [], so you should be able to locate each process.
This will show what each of the processes is doing.
Logging level should be at least level 2 for bpdbm.
If the log folder does not exist, restart NBU after creating it. 
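As a rough illustration of reading the log by PID, the Python sketch below groups bpdbm legacy log lines by the process ID in square brackets, counting how many lines each PID wrote and keeping the last thing it logged. It assumes lines look like `10:15:02.123 [4321.6789] <2> function: message` (on Windows the brackets typically hold PID.TID); the exact layout may differ between versions, and the function name `activity_by_pid` is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# ASSUMED legacy-log line shape: "<timestamp> [PID] ..." or "<timestamp> [PID.TID] ..."
LINE_RE = re.compile(r"^\S+\s+\[(\d+)(?:\.\d+)?\]\s+(.*)$")

def activity_by_pid(lines: Iterable[str]) -> List[Tuple[str, int, str]]:
    """Return (pid, line_count, last_message) tuples, busiest PID first."""
    counts: Counter = Counter()
    last_msg = {}
    for line in lines:
        m = LINE_RE.match(line.rstrip("\r\n"))
        if not m:
            continue  # skip continuation lines or anything not in the assumed format
        pid, msg = m.group(1), m.group(2)
        counts[pid] += 1
        last_msg[pid] = msg  # later lines overwrite, so this ends up as the last message
    return [(pid, n, last_msg[pid]) for pid, n in counts.most_common()]
```

For example, `activity_by_pid(open(r"<path to bpdbm log>", errors="replace"))` would list the PIDs with the most log activity first; the path is a placeholder. A PID that keeps logging the same function while no jobs run is a good candidate to look at more closely.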

I have seen 'Browse for Restore' create high bpdbm activity.
If one or more 'browse' actions are running, there will be roughly the same number of bprd processes.

So far we have enabled all logging levels for bpdbm, bprd and other processes as well, but no luck from Veritas yet.

Marianne
Level 6
Partner    VIP    Accredited Certified

This is just me... I have a serious problem with Support always asking for the highest logging level. This puts a lot more strain on already burdened system resources.
IMHO, 99% of issues can be solved with logging levels of between 0 and 3.

The bpdbm logging level is 0. Do we have to change it to 2?

mph999
Level 6
Employee Accredited

I always fall out with Marianne ;0) over logging levels.

If we have an odd issue like this one, where we have no idea what we are looking for or why it is happening, and we go any lower than 5, we may not see what we need. Say we went for 3: we would see none of the level 4 and 5 debug lines that might be the clue to the answer. So it is a gamble: do we go below 5 to keep the log size down, and risk finding out we need to go higher and having to collect the logs again?

Collecting logs multiple times does not go down well.

Certainly, log level 0 is virtually useless, so yes, it will have to be increased.  

For this issue, at least at the moment, it seems only one log is needed, so it is not too much trouble to go back and collect it again if you decide to set the log level below 5 and then discover it needs to be higher. Personally, I would suggest 5, but the services will probably need to be restarted to pick up the new level.

My suggestion: 

Stop NBU and delete the current bpdbm logs.  Increase log level for bpdbm to 5.

Restart the services and wait for the processes to start up. Once they are established and CPU usage is increasing, collect the log.

It certainly doesn't sound right, so there is a possibility that not all of the answer will be in the logs. We can only log things we know about; abnormal behavior doesn't get logged, because it's not meant to happen and we don't know about it. This is when we enter the world of having to collect truss / strace / procmon output on the PID of the process, or get Engineering involved for a custom EEB, which in this case would replace the bpdbm binary with a custom one that has increased logging.

If you have a call logged and they haven't asked for the following, I would get it collected and sent in, and then update them to advise that you have done so.

I would recommend:

bpdbm log at 5 (which I believe I have justified)

bpps output taken at the time the log is set high

nbsu -c -t output

Procmon output for the PIDs of the 'runaway' processes would be very useful (it might be essential, I'm not sure), but it's not something easily explained unless you are familiar with it. TBH, it's best done over WebEx if you are unsure. It is one of the Sysinternals tools and has to be installed. It shows the interaction of the process with the kernel, or, if you like, what the process is doing in depth. The problem is that the output can be very difficult to understand and may require BL / Engineering or, on some occasions, Microsoft to understand what is going on.
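Procmon and similar tools need the PIDs of the bpdbm processes. One way to list them on a Windows master, sketched here as an assumption rather than a Support-recommended step, is the built-in `tasklist` command with CSV output, parsed in Python. The helper names are hypothetical, and `current_bpdbm_pids()` only works on Windows.

```python
import csv
import io
import subprocess
from typing import List

def parse_tasklist_pids(tasklist_csv: str, image: str = "bpdbm.exe") -> List[int]:
    """Return the PIDs for `image` from `tasklist /FO CSV /NH` output."""
    pids = []
    for row in csv.reader(io.StringIO(tasklist_csv)):
        # Columns: Image Name, PID, Session Name, Session#, Mem Usage.
        # Non-matching rows (e.g. the "INFO: No tasks ..." message) are skipped.
        if len(row) >= 2 and row[0].lower() == image.lower() and row[1].isdigit():
            pids.append(int(row[1]))
    return pids

def current_bpdbm_pids() -> List[int]:
    """Windows only: ask tasklist for the running bpdbm.exe processes."""
    result = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH", "/FI", "IMAGENAME eq bpdbm.exe"],
        capture_output=True, text=True, check=True,
    )
    return parse_tasklist_pids(result.stdout)
```

Capturing this list at the same moment as the bpps output and the high-level bpdbm log makes it easier to match each busy PID to its log lines.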

Other vital information is the history. It is not very likely this issue just happened: the OS, NBU, etc. do the same thing day in, day out. If the behavior changes, it's because something somewhere has changed. That could be NBU settings, OS settings, OS patches, or other 3rd-party software on the box being updated or installed; who knows. But anything at all that has happened recently is important to know.

Hopefully, it will be a known issue, like this one ...

http://www.veritas.com/docs/000126424

 

Marianne
Level 6
Partner    VIP    Accredited Certified
I love it when a known issue is REALLY a known issue, i.e. a TN and a solution.

I have personally run into issues after an upgrade... I investigated every single TN with a similar status code and logged a call on day 2. I immediately heard from Support: 'this is a known issue'!!!
No TN, and nothing in LBN. Just wasted 2 days of my life...