
Need assistance on NetBackup 5.1

RK1982
Level 3

Hi All,

 

We have a NetBackup master server running version 5.1. The issue is that when there are no active jobs in NetBackup, or the server has been idle for some time, the next scheduled jobs do not start immediately. They take more than 2 - 3 hours to appear in Activity Monitor and more than 4 - 5 hours to start writing data.

But when jobs are already running and we start any backup jobs, they appear immediately in Activity Monitor and the backups complete successfully.

The backup services are working fine and we cannot identify any errors in the logs.

Below are the services running 

NB Processes
------------
root 2178 1 0 Apr 06 ? 0:00 /bin/sh /usr/openv/netbackup/bin/admincmd/nbdbdmon --user=root
root 2190 2178 0 Apr 06 ? 0:00 /usr/openv/db/bin/nbdbd --basedir=/usr/openv/db --datadir=/usr/openv/db/var --p
root 11214 4581 0 12:01:10 ? 0:00 /usr/openv/netbackup/bin/bpsched -mainempty
root 1840 1 0 Apr 06 ? 129:34 /usr/openv/netbackup/bin/bpdbm
root 1820 1 0 Apr 06 ? 5:42 /usr/openv/netbackup/bin/bprd
root 2007 1840 0 Apr 06 ? 23:41 /usr/openv/netbackup/bin/bpjobd

 

I know this is a very, very old version, but any suggestions or help would be really appreciated.

 

 

 

 

 

1 ACCEPTED SOLUTION


Marianne
Moderator
Partner    VIP    Accredited Certified
There is no nbdecommission for 5.1!
Config is all over the place before 6.x.
Master server has
Images
globdb
voldb
STUs

Each media server has own
mediadb
Device dbs.

So, when a media server is removed without a proper manual decommission, everything becomes a mess.
STUs are easy to delete.
Deleting devices from globdb is a mess. It is actually easier to delete or rename the file and recreate all device config.

Mediadb is the big problem. The 5.1 bpmedia command needs the source media server in order to transfer media ownership. The user can try bpimage, but mediadb entries will still be missing and tapes cannot be correctly deassigned.

There were TNs back in the day, but they have been removed. All references on Veritas are for post-6.x.

This is why it was so important to run nbcc when upgrading from 5.1.
A Veritas/Symantec engineer would provide scripts to forge mediadb entries and all sorts of fixes before an upgrade could be attempted.

I don't want to be in @RK1982's shoes!


32 REPLIES

RK1982
Level 3

Just to update: even manual jobs are not showing up in Activity Monitor.

Mouse
Moderator
Partner    VIP    Accredited Certified

OK, the Activity Monitor is just a GUI. What is happening in reality? When you run bpdbjobs -report, do you see jobs running?

Do you have the NBAR installed and which type of GUI are you using, Windows or Java?

Hi,

Nothing is reported by bpdbjobs -report either. We use the Java GUI (jnbSA &) from the command line.

 

I'd suggest looking to see what is reported via the bperror command as a first place before cranking any logging levels up. 

Stop NetBackup, start NetBackup, and note the start time. When the problem next happens or is noticed, run bperror -hoursago <number of hours from startup of NB to observed behaviour>.

Alternatively, if you can't stop NetBackup at this time, run bperror over blocks of time (bperror -d <start time> -e <end time>) and work your way back in smaller chunks to see if you can spot anything. Make sure you send the results to a file to view in your favourite editor.
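To make the "blocks of time" idea concrete, here is a minimal Python sketch that only builds the bperror command lines for consecutive windows; it does not run anything. It assumes bperror takes the start date with -d and the end date with -e, and that dates are written as mm/dd/yyyy hh:mm:ss; check both against the bperror usage output on your 5.1 master first. The function name, the dates, and the /tmp output files are made up for illustration.

```python
from datetime import datetime, timedelta

def bperror_commands(start, end, chunk_hours, outdir="/tmp"):
    """Build one bperror command line per time window between start and end."""
    # Assumed date format; verify with the bperror usage text on your system.
    fmt = "%m/%d/%Y %H:%M:%S"
    step = timedelta(hours=chunk_hours)
    commands = []
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        # Hypothetical output file name, one file per window.
        outfile = f"{outdir}/bperror_{cursor:%Y%m%d_%H%M}.txt"
        commands.append(
            f'bperror -d "{cursor.strftime(fmt)}" '
            f'-e "{window_end.strftime(fmt)}" > {outfile}'
        )
        cursor = window_end
    return commands

if __name__ == "__main__":
    # Hypothetical six-hour period split into two-hour windows.
    for cmd in bperror_commands(datetime(2020, 4, 19, 14, 0),
                                datetime(2020, 4, 19, 20, 0), 2):
        print(cmd)
```

Once a window shows something suspicious, call the function again on just that window with a smaller chunk_hours to narrow it down.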

I'd also look at the system logs to see if anything is being reported there about resource starvation. NB's scheduler has been very robust for a long time, except for the issues around daylight saving and such, and 5.1's was pretty good. For jobs not to show up at all would, on a *n*x system, suggest you'd exhausted your available process IDs at a given time, but other issues should show up with that as well.

I don't know if you can post the results of the support script here, but if you can, and I've got cycles available when you do, I'll look at 'em. Feel free to anonymize the critical bits first.

Marianne
Moderator
Partner    VIP    Accredited Certified

That bpsched -mainempty seems to ring a bell... will have to really dig deep to see if I can find something in my own archives...

In the meantime, please check for excessive timeouts - like media/slave and client connect timeouts. Return timeouts to defaults and stop netbackup. Confirm that all processes have stopped. Especially that bpsched process.
Default SLAVE_CONNECT_TIMEOUT is 30 seconds. Do not go higher than 90 seconds.
Default CLIENT_CONNECT_TIMEOUT is 300. Higher than 900 is not recommended.
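A quick way to check the timeouts is to scan bp.conf for the two entries and compare them with the ceilings quoted above (90 and 900 seconds). The sketch below is only an illustration: it assumes the usual `NAME = value` layout of bp.conf, and the function name and the sample content are made up.

```python
# Suggested ceilings in seconds, as quoted in the post above.
CEILINGS = {
    "SLAVE_CONNECT_TIMEOUT": 90,
    "CLIENT_CONNECT_TIMEOUT": 900,
}

def check_timeouts(bp_conf_text):
    """Return {name: (seconds, verdict)} for timeout entries found in bp.conf text."""
    findings = {}
    for line in bp_conf_text.splitlines():
        # Skip comments and anything that is not a NAME = value entry.
        if "=" not in line or line.lstrip().startswith("#"):
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key in CEILINGS and value.isdigit():
            seconds = int(value)
            verdict = "too high" if seconds > CEILINGS[key] else "ok"
            findings[key] = (seconds, verdict)
    return findings

if __name__ == "__main__":
    # Hypothetical bp.conf content, not taken from the poster's system.
    sample = "SERVER = master1\nCLIENT_CONNECT_TIMEOUT = 3600\n"
    for name, (seconds, verdict) in check_timeouts(sample).items():
        print(name, seconds, verdict)
```

An entry that is absent from the result is simply not set in the file, in which case the default applies.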

Ensure bpsched log folder exists and increase logging slightly. Level 3 should be good.
Restart NBU.
See if scheduled jobs start normally. Or start a job manually. Monitor bpsched log to see how storage units are evaluated and if this is maybe where delays are taking place.
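To spot where the delays are, it can help to look for long silences in the bpsched log rather than reading every line. This is a rough Python sketch that assumes legacy log lines of the form `HH:MM:SS [pid] <level> function: message` and that one file covers a single day; the function name and the five-minute threshold are arbitrary choices for illustration.

```python
import re
from datetime import datetime

# Assumed legacy log layout: time, [pid], <level>, then the message text.
LINE = re.compile(r"^(\d{2}:\d{2}:\d{2}) \[(\d+)\] <(\d+)> (.*)$")

def find_gaps(log_text, min_gap_seconds=300):
    """Return (previous_line, next_line, gap_seconds) wherever the log went quiet."""
    gaps = []
    prev_time = None
    prev_line = None
    for raw in log_text.splitlines():
        line = raw.strip()
        match = LINE.match(line)
        if not match:
            continue  # continuation lines or anything in another layout
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        if prev_time is not None:
            gap = (stamp - prev_time).total_seconds()
            if gap >= min_gap_seconds:
                gaps.append((prev_line, line, int(gap)))
        prev_time, prev_line = stamp, line
    return gaps

if __name__ == "__main__":
    # Made-up sample lines in the assumed layout.
    sample = (
        "17:02:56 [792] <4> bpsched: INITIATING...\n"
        "17:20:00 [792] <4> bpsched: hypothetical next message\n"
    )
    for before, after, seconds in find_gaps(sample):
        print(f"{seconds}s of silence between:\n  {before}\n  {after}")
```

The lines on either side of each gap show what bpsched was doing just before it stalled and what it did when it resumed.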

Marianne
Moderator
Partner    VIP    Accredited Certified

I have found something about 'bpsched -mainempty' : http://sunmanagers.cs.toronto.edu/1998/0756.html

This process runs at the end of the backup window to cleanup expired images. 
For some unknown reason, it seems that this process actually connects to clients! So, when clients are offline, this process will take a long time (until client connect timeout kicks in for offline client, and then moving on to next client...)

Evidence of this in the bpsched log in URL mentioned above: 
17:02:56 [792] <4> bpsched: INITIATING... 
17:02:56 [792] <2> logparams: /usr/openv/netbackup/bin/bpsched -mainempty 
17:02:56 [792] <4> main_bpsched: another bpsched is already main-sched 

Please also read about polling of media servers for STUs - http://computeranddata.com/doc/misc/netbackup.faq

 

I seem to remember something about <defunct> processes on Solaris that could cause something like the behaviour described.

I think you need to run ps -ef or something similar to see these <defunct> processes.

Stupid question: have you closed down all the Java GUIs on the master and started a new one?

The standard questions: Have you checked: 1) What has changed. 2) The manual 3) If there are any tech notes or VOX posts regarding the issue

quebek
Moderator
   VIP    Certified

Hey

If there is no output in bpdbjobs, I would kill bpjobd and start it again, or restart the whole of NBU.

Nicolai
Moderator
Partner    VIP   

Create the directory bpsched in /usr/openv/netbackup/logs/

This will cause the bpsched process to log what it is doing. Once you have hit a standstill, zip the text file and attach it to a post. Do not bulk post the debug text.

@Marianne Wasn't online cleaning of the image database implemented in NBU 4.5 FP8? As I recall, bpsched -mainempty is the main bpsched process that spawns sub bpsched processes, but I might be wrong :)

@RK1982 What revision of NBU 5.1 are you running? You can find it in /usr/openv/netbackup/version or /usr/openv/netbackup/bin/version

Update: There is also the pack.summary file, which shows the version of NetBackup.

/opt/openv/pack/pack.summary

Marianne
Moderator
Partner    VIP    Accredited Certified

@Nicolai wrote:

 

@Marianne Wasn't online cleaning of the image database implemented in NBU 4.5 FP8? As I recall, bpsched -mainempty is the main bpsched process that spawns sub bpsched processes, but I might be wrong :)

 


@Nicolai, I honestly cannot remember that far back.
The problem with such an old version is that no documentation (manuals, technotes) is available anymore. 
Even my own 'archives' (old hard drives) are somewhere 'stored' at home.

Hopefully @RK1982 will post a bpsched log shortly.
The one thing that I liked about the bpsched log is the fact that it is a legacy log and therefore readable.

Nicolai
Moderator
Partner    VIP   

@Marianne You don't know how much information is stored in the deep pits of my mind :)

Well, back to the topic. If my theory is right, bpsched -mainempty should be started by root and the other bpsched processes should have the PID of bpsched -mainempty as their PPID.

@RK1982 Please run ps -ef on the master server and let us know the output of the command.
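If the ps -ef output is long, a few lines of Python can pull out just the bpsched processes and show which of them are children of another bpsched. This is only a sketch: it assumes the usual ps -ef column order (UID, PID, PPID first) and the function name is made up. The second line of the sample is invented to show the parent/child case.

```python
def bpsched_tree(ps_output):
    """Map each bpsched PID to (PPID, parent_is_bpsched, command)."""
    procs = {}
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip the header and anything without numeric PID/PPID columns.
        if len(fields) < 4 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        # The command starts at the first field that mentions bpsched.
        cmd_start = next((i for i, f in enumerate(fields) if "bpsched" in f), None)
        if cmd_start is None:
            continue
        procs[int(fields[1])] = (int(fields[2]), " ".join(fields[cmd_start:]))
    return {pid: (ppid, ppid in procs, cmd) for pid, (ppid, cmd) in procs.items()}

if __name__ == "__main__":
    sample = (
        "root 11214 4581 0 12:01:10 ? 0:00 /usr/openv/netbackup/bin/bpsched -mainempty\n"
        # Hypothetical child process, added only for illustration.
        "root 11300 11214 0 12:01:12 ? 0:00 /usr/openv/netbackup/bin/bpsched\n"
    )
    for pid, (ppid, child, cmd) in sorted(bpsched_tree(sample).items()):
        print(pid, ppid, "child of bpsched" if child else "parent is not bpsched", cmd)
```

Any bpsched whose parent is not another bpsched is a candidate for the top of the tree.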

 

Hi All,

 

Thanks all for your replies. Please find my answers below; I have attached the requested logs.


Unfortunately I cannot stop NetBackup now. As for scheduled backups, they start if any backup job is already active in NetBackup. If not, even scheduled backups take 4 - 5 hours to display or to start showing a byte count.

I have attached logs from the 19th, when I last saw the error: between 14:00 and 19:00 none of the jobs ran, or they stayed in a queued state for a long time.

Nothing has been changed in the last 7 months. I hope I have replied to all your queries; if I missed any, do let me know.

 

Thanks

RK

Hi

 

CLIENT_CONNECT_TIMEOUT is the only option set.

 

Thanks

RK

Marianne
Moderator
Partner    VIP    Accredited Certified

How many media servers/storage units do you have in this environment?

There are a LOT of media servers that are down/unavailable :

start_bptm: cannot connect to cavan
start_bptm: cannot connect to donegal
start_bptm: cannot connect to erpprd
start_bptm: cannot connect to gpsdvs32
start_bptm: cannot connect to homefolders.retail2u.trcg.co.uk
start_bptm: cannot connect to iowa
start_bptm: cannot connect to kildare
start_bptm: cannot connect to live100mcs.retail2u.trcg.co.uk
start_bptm: cannot connect to live200mcs.retail2u.trcg.co.uk
start_bptm: cannot connect to live400mcs.retail2u.trcg.co.uk
start_bptm: cannot connect to ohst110mcs.retail2u.trcg.co.uk
start_bptm: cannot connect to ohst210mcs.retail2u.trcg.co.uk
start_bptm: cannot connect to ohst310mcs.retail2u.trcg.co.uk
start_bptm: cannot connect to ohst410mcs.retail2u.trcg.co.uk
start_bptm: cannot connect to ohst610mcs.retail2u.trcg.co.uk
start_bptm: cannot connect to ohst630mcs.retail2u.trcg.co.uk

I actually stopped looking through bpsched log at this point, so, there could be more.
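To get the complete list without reading the whole log, the unreachable hosts can be extracted and counted. A minimal Python sketch, assuming the message text is exactly "start_bptm: cannot connect to <host>" as in the lines above; the function name is made up.

```python
import re
from collections import Counter

# Message text as it appears in the bpsched log excerpt above.
PATTERN = re.compile(r"start_bptm: cannot connect to (\S+)")

def unreachable_hosts(log_text):
    """Count how often each media server shows up as unreachable."""
    return Counter(PATTERN.findall(log_text))

if __name__ == "__main__":
    sample = (
        "start_bptm: cannot connect to cavan\n"
        "start_bptm: cannot connect to donegal\n"
        "start_bptm: cannot connect to cavan\n"
    )
    for host, count in unreachable_hosts(sample).most_common():
        print(f"{host}: {count}")
```

Comparing the resulting host names with the storage unit list shows which STUs still point at dead media servers.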

Do you know what is wrong with those media servers?
Can they be fixed and brought online?

If they are dead/permanently unavailable, please start by deleting the storage units.

You will need to look at today and yesterday's bpsched logs to try and see what all those 'bpsched -mainempty' processes are doing.

Nicolai
Moderator
Partner    VIP   

I agree with Marianne.

Missing media servers could also mean you have a lot of tapes assigned to media servers that are no longer responding, meaning NetBackup cannot re-use them unless they are expired and re-assigned manually.

The bpsched log: does it cover a quiet time when NetBackup no longer starts new backups? If yes, at what time?

 

Nicolai
Moderator
Partner    VIP   

A quick fix to the missing media servers could be the use of the FORCE_RESTORE_MEDIA_SERVER directive in bp.conf.

Please see:

https://www.veritas.com/support/en_US/article.000005346

Marianne
Moderator
Partner    VIP    Accredited Certified

I am surprised that we have not heard back from @RK1982.

If the Storage Units still exist for 'dead' media servers, then bpsched will still get stuck while trying to probe media servers for UP drives. 

Image expiration will also be a problem because media cannot be deassigned.

Hi,

 

Sorry for the delay in replying. I was on leave for the last few days. Those servers are offline and I am working on removing those media servers.

 

Regards

RK

Hi,

 

Yes, the log covered the entire time when the issue occurred. The issue happened between 14:00 and 20:00.

 

Thanks

RK