Netbackup 5.1 bpsched issue on HP-UX

etmsreec
Level 3

Hi,

Newbie here, so apologies for any errors in formatting or etiquette.

We have a NetBackup installation running v5.1MP6 on HP-UX. Yes, it's old; yes, we'd like to retire or upgrade it.

It seems to have stopped scheduling jobs on its own. It was reliable and happy right up until just before the holidays, but now the scheduling of backups seems to have started to fail.

We have restarted NBU a couple of times, which doesn't seem to have made any difference.

We are seeing the following messages in the bpsched log file (once I'd created the bpsched log directory):

10:21:30.620 [16340] <2> bpsched: INITIATING (verbose=0) ...
10:21:30.621 [16340] <2> bpsched: BPRD PPID = 20591
10:21:30.621 [16340] <2> logparams: /usr/openv/netbackup/bin/bpsched -ppid 20591
10:21:30.621 [16340] <2> bpsched_main: wait_on_que=0, timeout_in_que=36000, reread_interval=300,queue_on_error=0, bptm_query_timeout=480, s_m 0
10:21:30.871 [16340] <2> LOCAL CLASS_ATT_DEFS: Product ID = 6
10:21:31.109 [16340] <8> bpsched_main: another regular bpsched is already examining the policy configuration
10:21:31.110 [16340] <4> bpsched: scheduler exiting - regular bpsched is already running (214)
10:31:30.249 [18497] <2> bpsched: INITIATING (verbose=0) ...
10:31:30.249 [18497] <2> bpsched: BPRD PPID = 20591
10:31:30.249 [18497] <2> logparams: /usr/openv/netbackup/bin/bpsched -ppid 20591
10:31:30.249 [18497] <2> bpsched_main: wait_on_que=0, timeout_in_que=36000, reread_interval=300,queue_on_error=0, bptm_query_timeout=480, s_m 0
10:31:30.482 [18497] <2> LOCAL CLASS_ATT_DEFS: Product ID = 6
10:31:30.714 [18497] <8> bpsched_main: another regular bpsched is already examining the policy configuration
10:31:30.715 [18497] <4> bpsched: scheduler exiting - regular bpsched is already running (214)
10:41:30.389 [20828] <2> bpsched: INITIATING (verbose=0) ...
10:41:30.389 [20828] <2> bpsched: BPRD PPID = 20591
10:41:30.389 [20828] <2> logparams: /usr/openv/netbackup/bin/bpsched -ppid 20591
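The excerpt shows bprd (PPID 20591) starting a fresh bpsched about every 10 minutes, with each one exiting with status 214 because another regular bpsched is still examining the policy configuration. To confirm that pattern across a whole day's log, a short Python sketch like the one below could help. It assumes every line follows the `HH:MM:SS.mmm [PID] <severity> message` layout seen above; `summarize_bpsched_log` is a hypothetical helper, not part of NetBackup.

```python
import re
from datetime import datetime

# Assumed line layout, taken from the excerpt above:
#   HH:MM:SS.mmm [PID] <severity> message
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) \[(\d+)\] <(\d+)> (.*)$")


def summarize_bpsched_log(text):
    """Return (gaps in seconds between bpsched starts, number of exit-214 lines).

    Only the time of day is logged, so a log spanning midnight would give a
    negative gap; this sketch does not handle that case.
    """
    starts = []
    exits_214 = 0
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # not a timestamped debug line
        stamp, _pid, _severity, message = m.groups()
        if message.startswith("bpsched: INITIATING"):
            starts.append(datetime.strptime(stamp, "%H:%M:%S.%f"))
        elif "regular bpsched is already running (214)" in message:
            exits_214 += 1
    gaps = [(later - earlier).total_seconds() for earlier, later in zip(starts, starts[1:])]
    return gaps, exits_214


if __name__ == "__main__":
    sample = """\
10:21:30.620 [16340] <2> bpsched: INITIATING (verbose=0) ...
10:21:31.110 [16340] <4> bpsched: scheduler exiting - regular bpsched is already running (214)
10:31:30.249 [18497] <2> bpsched: INITIATING (verbose=0) ...
10:31:30.715 [18497] <4> bpsched: scheduler exiting - regular bpsched is already running (214)
10:41:30.389 [20828] <2> bpsched: INITIATING (verbose=0) ...
"""
    gaps, exits = summarize_bpsched_log(sample)
    print([round(g) for g in gaps], exits)  # [600, 600] 2
```

Gaps close to 600 seconds match the 10-minute wakeup interval, and a 214 count that tracks the number of starts means every new scheduler is giving up immediately.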

Even when the "another regular bpsched is already examining..." message wasn't there, it still didn't seem to want to schedule backups on its own.

Any thoughts, please? We haven't rebooted the server, because why should we have to, and why would that make any difference? (Typical sysadmin response! :) )

Thanks

Nicolai
Moderator
Partner, VIP

hi @etmsreec 

I suspect memory is the issue. With 86% memory utilization, processes may not be able to allocate additional memory while running, which typically results in hung processes, and that is what you are seeing with bpsched.

/Nicolai

etmsreec
Level 3

Hi,

This may be considered heresy, but I wondered what would happen if I killed off the bpsched process. Sure enough, the backups I'd been hoping for kicked off at the next wakeup interval. I'm not sure whether that was a Good Thing or a Bad Thing, as we now have several extra bpsched -mainempty processes. Should they be there while backups are running?

We are also getting get_stunit_primary_hname messages in the bpsched log file telling me that clients are NOT a valid stunit. That makes sense, as they are clients rather than storage units, but those messages only started after bpsched was killed off.

We also have messages like "run_any_ret_level(3) returned -3(Something maxed out)".

:o(

Nicolai
Moderator
Partner, VIP

hi @etmsreec 

bpsched -mainempty is always running; if memory serves me right, it is the one that spawns a subprocess every 10 minutes.

NetBackup does a lot of false-positive testing, meaning some messages look like errors but are expected.

If in doubt, look at the severity rating: <2> and <4> are informational messages, <8> are errors and <16> are catastrophic.
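To apply that rule of thumb to a long bpsched log, a minimal Python filter can keep only the lines at or above a chosen severity. This is a sketch, assuming the `<N>` tag always directly follows the `[PID]` field as in the log quoted at the top of the thread; the labels mirror the description above, and `filter_by_severity` is a hypothetical name.

```python
import re
import sys

# Labels as described in this thread: <2>/<4> informational, <8> errors,
# <16> catastrophic.
SEVERITY_LABELS = {2: "info", 4: "info", 8: "error", 16: "catastrophic"}

# Assumption: the severity tag directly follows the "[PID]" field.
SEV_RE = re.compile(r"\[\d+\] <(\d+)>")


def filter_by_severity(lines, minimum=8):
    """Yield (label, line) for every log line whose severity is >= minimum."""
    for line in lines:
        m = SEV_RE.search(line)
        if m is None:
            continue  # not a timestamped debug line
        severity = int(m.group(1))
        if severity >= minimum:
            yield SEVERITY_LABELS.get(severity, f"severity {severity}"), line.rstrip()


if __name__ == "__main__":
    # Usage (file name hypothetical): python filter_sev.py < bpsched_log.txt
    for label, line in filter_by_severity(sys.stdin):
        print(f"{label:>12}: {line}")
```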

/Nicolai

etmsreec
Level 3

Hi,

Apologies for the delay.

To close this out, I've put a kill script (file attached) into cron that checks for a bpsched process once an hour and kills it off if it's there. It seems to work, but it may not be the cleanest approach. Give me a DCL script and I'd be much happier!
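The attached script isn't reproduced in the thread, so the following is only an illustrative Python sketch of the same idea, not a copy of it. Several choices here are assumptions: it lists processes with `ps -e -o pid= -o args=` (on HP-UX the `-o` option is generally only honoured when `UNIX95` is set, so the sketch sets it), it skips `bpsched -mainempty` because that instance is described above as always running, it sends SIGTERM, and it only reports what it would kill unless run with a `--kill` flag of our own invention. It also needs Python 3.7 or later for `capture_output`.

```python
import os
import signal
import subprocess
import sys


def find_stuck_bpsched(ps_output):
    """Return PIDs of bpsched processes other than 'bpsched -mainempty'.

    ps_output holds one "PID ARGS..." entry per line, as printed by
    `ps -e -o pid= -o args=`. Treating every non-mainempty bpsched as a
    kill candidate is an assumption of this sketch.
    """
    pids = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        pid, argv = int(parts[0]), parts[1].split()
        # Match the scheduler binary itself, not e.g. "grep bpsched".
        if os.path.basename(argv[0]) != "bpsched":
            continue
        if "-mainempty" in argv:
            continue  # the always-running instance; leave it alone
        pids.append(pid)
    return pids


def main(dry_run=True):
    # Assumption: HP-UX ps only honours -o in XPG4 mode, enabled via UNIX95.
    env = dict(os.environ, UNIX95="1")
    result = subprocess.run(["ps", "-e", "-o", "pid=", "-o", "args="],
                            capture_output=True, text=True, env=env, check=True)
    for pid in find_stuck_bpsched(result.stdout):
        if dry_run:
            print(f"would kill bpsched PID {pid}")
        else:
            os.kill(pid, signal.SIGTERM)
            print(f"sent SIGTERM to bpsched PID {pid}")


if __name__ == "__main__":
    main(dry_run="--kill" not in sys.argv)
```

From cron it could run hourly like the original, switching to `--kill` once the dry-run output looks right. Killing a scheduler that is genuinely mid-run may interrupt job scheduling, which is why the dry run is the default here.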

Thanks for the help and guidance on this one.

Steve