11-22-2018 07:02 AM
Hi All,
We are running a NetBackup master server at version 5.1. The issue is that when there are no active jobs in NetBackup, or the server has been idle for some time, the next scheduled jobs do not start immediately: it takes more than 2 - 3 hours for a job to appear in the Activity Monitor, and more than 4 - 5 hours before it starts writing data.
But when jobs are already running and we start any backup job, it appears immediately in the Activity Monitor and the backup completes successfully.
The backup services are working fine and I cannot identify any errors in the logs.
Below are the processes that are running:
NB Processes
------------
root 2178 1 0 Apr 06 ? 0:00 /bin/sh /usr/openv/netbackup/bin/admincmd/nbdbdmon --user=root
root 2190 2178 0 Apr 06 ? 0:00 /usr/openv/db/bin/nbdbd --basedir=/usr/openv/db --datadir=/usr/openv/db/var --p
root 11214 4581 0 12:01:10 ? 0:00 /usr/openv/netbackup/bin/bpsched -mainempty
root 1840 1 0 Apr 06 ? 129:34 /usr/openv/netbackup/bin/bpdbm
root 1820 1 0 Apr 06 ? 5:42 /usr/openv/netbackup/bin/bprd
root 2007 1840 0 Apr 06 ? 23:41 /usr/openv/netbackup/bin/bpjobd
I know this is a very old version, but any suggestions or help would be really appreciated.
12-03-2018 02:00 AM
Hi,
19th 14:00 - 20:00
Regards
RK
12-03-2018 02:19 AM
You will see lots of errors like these during this period:
14:01:33.469 [15713] <16> start_bptm: cannot connect to live200mcs.retail2u.trcg.co.uk
14:01:33.469 [15713] <16> start_bptm: bpcd exit: cannot connect to server backup restore manager (205)
14:01:33.469 [15713] <16> get_stunits: get_num_avail_drives failed with stat 205
14:01:33.469 [15713] <16> log_in_errorDB: cannot connect to server live200mcs.retail2u.trcg.co.uk; marking storage unit LIVE200MCS_9840_SPEKE_DRIVE as unavailable
While bpsched -mainempty is running, you will notice entries like these:
19:00:12.918 [25231] <8> bpsched_main: another regular bpsched is already examining the policy configuration
19:00:12.918 [25231] <4> bpsched: scheduler exiting - regular bpsched is already running (214)
No backups will be submitted.
You really need to get rid of unavailable media servers.
Hopefully you know how to properly decommission media servers?
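To find out which Storage Units are affected, the `log_in_errorDB` entries can be collected from the bpsched log. The following is only a minimal sketch in Python: it assumes the log lines look exactly like the ones quoted above, and the function name `unavailable_storage_units` is made up for this example.

```python
import re

# Pattern for the "marking storage unit ... as unavailable" entries quoted above.
UNAVAILABLE = re.compile(
    r"cannot connect to server (?P<host>\S+); "
    r"marking storage unit (?P<stu>\S+) as unavailable"
)


def unavailable_storage_units(log_lines):
    """Return {storage_unit: media_server} for every matching log entry."""
    found = {}
    for line in log_lines:
        match = UNAVAILABLE.search(line)
        if match:
            found[match.group("stu")] = match.group("host")
    return found


# Sample input: two of the log lines quoted in this post.
sample = [
    "14:01:33.469 [15713] <16> start_bptm: cannot connect to "
    "live200mcs.retail2u.trcg.co.uk",
    "14:01:33.469 [15713] <16> log_in_errorDB: cannot connect to server "
    "live200mcs.retail2u.trcg.co.uk; marking storage unit "
    "LIVE200MCS_9840_SPEKE_DRIVE as unavailable",
]

for stu, host in unavailable_storage_units(sample).items():
    print(f"{stu} -> {host}")
```

Run against a full day's log, this should give a list of Storage Units (and their media servers) to review for removal.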
12-03-2018 06:09 AM - edited 12-04-2018 08:22 AM
bpsched wakes up every 20 minutes to schedule backups. If a previous run has not completed, the newly called bpsched will exit. It is not uncommon during peak hours for bpsched to keep running for hours. Trouble getting resource status will certainly cause long-running bpsched processes.
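As a rough illustration of why a long-running bpsched delays everything, here is a toy model in Python. It is not NetBackup code; it only assumes the behaviour described above (a wake-up every 20 minutes that exits if the previous run is still active), and the function name `skipped_wakeups` is hypothetical.

```python
WAKE_INTERVAL_MIN = 20  # wake-up interval as stated in this post


def skipped_wakeups(run_minutes, interval=WAKE_INTERVAL_MIN):
    """Count wake-ups that find the previous run still active and exit.

    Toy model: one run starts at minute 0 and lasts run_minutes. A wake-up
    fires every `interval` minutes and is skipped if it falls strictly
    before the end of that run.
    """
    if run_minutes <= 0:
        return 0
    return (run_minutes - 1) // interval


for minutes in (15, 60, 240):
    print(minutes, "minute run:", skipped_wakeups(minutes), "wake-ups skipped")
```

In this model, a run that hangs for four hours while probing unreachable media servers causes every wake-up in that window to exit without submitting any jobs.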
12-13-2018 03:36 AM
Hi Marianne,
Yes, I am working on removing the decommissioned media servers, which will take time.
Regards
RK
12-13-2018 03:42 AM
12-13-2018 07:22 AM
I can echo the point: it is important to remove missing and inactive systems from the master's bp.conf. I had a few people's PCs included, and when more than a few went offline I could see an immediate impact and delays.
12-13-2018 11:36 PM
Just removing names from bp.conf won't help.
As per my post of 2 weeks ago, we can see how bpsched tries to connect to the media servers of the Storage Units (STUs) to count UP drives.
If the Storage Units still exist for 'dead' media servers, then bpsched will still get stuck while trying to probe media servers for UP drives.
Image expiration will also be a problem because media cannot be deassigned.
12-14-2018 04:17 AM
12-14-2018 04:50 AM
12-14-2018 09:46 AM
12-20-2018 04:08 AM
Hi Marianne,
Thanks for your update.
Yes, I am getting it done, but it is going to take a long time :). I am really not sure how to proceed with the mediadb, though, as none of the media servers are live now.
Thanks
RK
12-20-2018 04:28 AM
To alleviate your current situation (bpsched going into a hung state while waiting for a response), please delete unused Storage Units and delete unused names from the master's bp.conf.
This will take you all of 5 minutes... maybe 10 if you want to copy out a list of existing config.
Changing media server ownership of tapes can be dealt with later.
Maybe ask in a new post when you have the time to deal with it.
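For the bp.conf part of the clean-up suggested above, a small Python sketch like the one below can list which `SERVER` entries refer to hosts already known to be dead. Treat it as an illustration only: the helper names and the sample bp.conf contents are hypothetical (only the media server name is taken from the log excerpt earlier in the thread), and it reads text you pass in rather than editing the real file.

```python
def server_entries(bp_conf_text):
    """Return host names from 'SERVER = host' lines, in file order."""
    hosts = []
    for line in bp_conf_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "SERVER":
            hosts.append(value.strip())
    return hosts


def stale_entries(bp_conf_text, dead_hosts):
    """Return SERVER entries whose host appears in dead_hosts (case-insensitive)."""
    dead = {host.lower() for host in dead_hosts}
    return [host for host in server_entries(bp_conf_text) if host.lower() in dead]


# Hypothetical bp.conf contents, for illustration only.
sample_conf = """SERVER = master01.example.com
SERVER = live200mcs.retail2u.trcg.co.uk
CLIENT_NAME = master01.example.com
"""

print(stale_entries(sample_conf, ["live200mcs.retail2u.trcg.co.uk"]))
```

Keeping a copy of the original bp.conf and the current Storage Unit list before deleting anything makes it easy to check later what was removed.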
12-20-2018 04:50 AM
Hi Marianne,
Sure will do that.
Thanks all for your support
Regards
RK