
Need assistance on NetBackup 5.1

RK1982

Hi All,


We have a NetBackup master server running 5.1. The issue is that when there are no active jobs in NetBackup, or the server has been idle for some time, the next scheduled jobs do not start immediately: it takes more than 2-3 hours for the job to show up in the Activity Monitor, and more than 4-5 hours before it starts writing data.

But when jobs are already running and we start a backup job, it appears immediately in the Activity Monitor and the backup completes successfully.

The backup services are running fine and I cannot identify any errors in the logs.

Below are the processes running:

NB Processes
------------
root 2178 1 0 Apr 06 ? 0:00 /bin/sh /usr/openv/netbackup/bin/admincmd/nbdbdmon --user=root
root 2190 2178 0 Apr 06 ? 0:00 /usr/openv/db/bin/nbdbd --basedir=/usr/openv/db --datadir=/usr/openv/db/var --p
root 11214 4581 0 12:01:10 ? 0:00 /usr/openv/netbackup/bin/bpsched -mainempty
root 1840 1 0 Apr 06 ? 129:34 /usr/openv/netbackup/bin/bpdbm
root 1820 1 0 Apr 06 ? 5:42 /usr/openv/netbackup/bin/bprd
root 2007 1840 0 Apr 06 ? 23:41 /usr/openv/netbackup/bin/bpjobd


I know this is a very old version, but any suggestions or help would be really appreciated.


Hi,


19th 14:00 - 20:00


Regards

RK

Marianne
Moderator

You will see lots of errors like these during this period:

14:01:33.469 [15713] <16> start_bptm: cannot connect to live200mcs.retail2u.trcg.co.uk
14:01:33.469 [15713] <16> start_bptm: bpcd exit: cannot connect to server backup restore manager (205)
14:01:33.469 [15713] <16> get_stunits: get_num_avail_drives failed with stat 205
14:01:33.469 [15713] <16> log_in_errorDB: cannot connect to server live200mcs.retail2u.trcg.co.uk; marking storage unit LIVE200MCS_9840_SPEKE_DRIVE as unavailable

While bpsched -mainempty is running, you will notice entries like these:

19:00:12.918 [25231] <8> bpsched_main: another regular bpsched is already examining the policy configuration
19:00:12.918 [25231] <4> bpsched: scheduler exiting - regular bpsched is already running (214)

No backups will be submitted. 
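To see how often this happens during the problem window, the two kinds of log lines above can be tallied with a short script. This is a minimal sketch, assuming the bpsched debug log is a plain text file and that the messages are worded exactly as in the excerpts quoted here; the script name and the log path in the usage comment are hypothetical.

```python
import re
import sys
from collections import Counter

# Patterns copied from the bpsched log excerpts quoted in this thread.
# Other NetBackup versions may word these messages differently.
UNAVAILABLE = re.compile(
    r"cannot connect to server (\S+); marking storage unit (\S+) as unavailable"
)
ALREADY_RUNNING = re.compile(r"regular bpsched is already running \(214\)")


def summarize(lines):
    """Count STU-unavailable markings per (media server, STU) and 214 exits."""
    unreachable = Counter()
    skipped_runs = 0
    for line in lines:
        m = UNAVAILABLE.search(line)
        if m:
            unreachable[(m.group(1), m.group(2))] += 1
        elif ALREADY_RUNNING.search(line):
            skipped_runs += 1
    return unreachable, skipped_runs


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical path): python scan_bpsched.py /path/to/bpsched/log.MMDDYY
    with open(sys.argv[1], errors="replace") as f:
        unreachable, skipped = summarize(f)
    for (server, stu), n in unreachable.most_common():
        print(f"{server}\t{stu}\t{n}")
    print(f"bpsched runs that exited with status 214: {skipped}")
```

Media servers that show up here on every run are the first candidates for cleaning up.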

You really need to get rid of unavailable media servers.

Hopefully you know how to properly decommission media servers?

Nicolai
Moderator

bpsched wakes up every 20 minutes to schedule backups. If a previous run hasn't completed, the newly called bpsched will exit. It is not uncommon during peak hours for bpsched to keep running for hours, and trouble getting resource status will certainly cause long-running bpsched processes.
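To illustrate why one slow run can push backups out by hours, here is a toy model of the behaviour described above. It is a sketch, not NetBackup's actual scheduler: it assumes wakeups exactly every 20 minutes, that a wakeup which finds a run still in progress exits at once, and a run length supplied by the caller (the `run_minutes` function is hypothetical).

```python
WAKEUP_INTERVAL_MIN = 20  # interval stated in this thread


def simulate(run_minutes, horizon_min):
    """Return (start_minute, outcome) for each wakeup in [0, horizon_min).

    run_minutes(t) gives how long a bpsched starting at minute t would run.
    A wakeup that finds an earlier run still active exits immediately.
    """
    events = []
    busy_until = 0
    for t in range(0, horizon_min, WAKEUP_INTERVAL_MIN):
        if t < busy_until:
            events.append((t, "exit: regular bpsched already running"))
        else:
            busy_until = t + run_minutes(t)
            events.append((t, "schedules backups"))
    return events


if __name__ == "__main__":
    # Hypothetical: each run takes 130 minutes while probing dead media servers.
    for start, outcome in simulate(lambda t: 130, 180):
        print(f"minute {start:3d}: {outcome}")
```

With a hypothetical 130-minute run, only the wakeups at minutes 0 and 140 get to schedule anything in the first three hours; the other seven exit, so a backup window that opens at minute 20 is not picked up until minute 140.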

Hi Marianne,


Yes, I am working on removing the decommissioned media servers, which will take time.


Regards

RK

Marianne
Moderator
@RK1982
All about priorities, right?

Genericus
Moderator

I can echo the point: it is important to remove missing and inactive systems from the master's bp.conf. I had a few people who included PCs, and when more than a few of them went offline I could see an immediate impact and delays.


NetBackup 9.1.0.1 on Solaris 11, writing to Data Domain 9800 7.7.4.0
duplicating via SLP to LTO5 & LTO8 in SL8500 via ACSLS

Marianne
Moderator

@Genericus

Just removing names from bp.conf won't help.
As my post from 2 weeks ago shows, bpsched tries to connect to the STU media servers to count UP drives.

If Storage Units still exist for 'dead' media servers, bpsched will still get stuck while probing those media servers for UP drives.

Image expiration will also be a problem, because media cannot be deassigned.

Genericus
Moderator

Good point regarding media servers. I suggest using nbdecommission to find all the links and clear them!


Marianne
Moderator
There is no nbdecommission for 5.1!
Before 6.x, the configuration is spread all over the place.
The master server has:
Images
globdb
voldb
STUs

Each media server has its own:
mediadb
Device dbs

So when a media server is removed without a proper manual decommission, everything becomes a mess.
STUs are easy to delete.
Deleting devices from the globdb is messy; it is actually easier to delete or rename the file and recreate the entire device configuration.

The mediadb is the big problem. The 5.1 bpmedia command needs the source media server in order to transfer media ownership. You can try bpimage, but the mediadb entries will still be missing and the tapes cannot be correctly deassigned.

There were TNs (technotes) back in the day, but they have been removed. All references on the Veritas site are for post-6.x versions.

This is why it was so important to run nbcc when upgrading from 5.1.
A Veritas/Symantec engineer would provide scripts to forge mediadb entries, and all sorts of other fixes, before an upgrade could be attempted.

I don't want to be in @RK1982's shoes!

Genericus
Moderator

LOL, Marianne, "There is no nbdecommission for 5.1!"

Cue chorus: "Please upgrade!"


Hi Marianne,

Thanks for your update.

Yes, I am getting it done, but it is going to take a long time :). I'm really not sure how to proceed with the mediadb, though, as none of the media servers are live now.

Thanks

RK


Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

@RK1982

To alleviate your current situation (bpsched hanging while it waits for a response), please delete the unused Storage Units and remove unused names from the master's bp.conf.
This will take you all of 5 minutes... maybe 10 if you want to copy out a list of the existing config.
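For the bp.conf part, a small script can produce a pruned copy for review instead of hand-editing the live file. This is a minimal sketch, assuming the UNIX bp.conf at /usr/openv/netbackup/bp.conf with one `SERVER = host` (or, if present, `MEDIA_SERVER = host`) entry per line, and relying on the convention that the first SERVER entry names the master, which is always kept. Host names must match the entries exactly (short name vs FQDN). The `.pruned` output name is a hypothetical choice, and the script does not touch the Storage Units themselves.

```python
import re
import sys

# Matches "SERVER = host" and "MEDIA_SERVER = host" lines in a UNIX bp.conf.
ENTRY = re.compile(r"^\s*(SERVER|MEDIA_SERVER)\s*=\s*(\S+)\s*$")


def prune_bp_conf(lines, decommissioned):
    """Return (kept, removed) lines, dropping entries for decommissioned hosts.

    The first SERVER entry (the master) is always kept.
    Host names are compared case-insensitively.
    """
    dead = {h.lower() for h in decommissioned}
    kept, removed = [], []
    seen_first_server = False
    for line in lines:
        m = ENTRY.match(line)
        if m:
            key, host = m.group(1), m.group(2)
            is_master = key == "SERVER" and not seen_first_server
            if key == "SERVER":
                seen_first_server = True
            if not is_master and host.lower() in dead:
                removed.append(line)
                continue
        kept.append(line)
    return kept, removed


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python prune_bp_conf.py /usr/openv/netbackup/bp.conf deadhost1 deadhost2
    path, hosts = sys.argv[1], sys.argv[2:]
    with open(path) as f:
        kept, removed = prune_bp_conf(f.readlines(), hosts)
    with open(path + ".pruned", "w") as f:  # live bp.conf is left untouched
        f.writelines(kept)
    for line in removed:
        print("removed:", line.rstrip())
```

Compare the `.pruned` file with the original before swapping it in, and keep a backup of the old bp.conf.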

Changing media server ownership of tapes can be dealt with later.
Maybe ask in a new post when you have the time to deal with it.

Hi Marianne,


Sure, will do that.


Thanks, all, for your support.


Regards

RK