RK1982
Level 3
7 years ago
Solved

Need assistance on NetBackup 5.1

Hi All,

We have a NetBackup master server running 5.1. The issue is that when there are no active jobs in NetBackup, or the server has been idle for some time, the next scheduled jobs do not start immediately: they take more than 2 - 3 hours to appear in the Activity Monitor and more than 4 - 5 hours to start writing data.

But when jobs are already running and we start any backup jobs, those appear immediately in the Activity Monitor and the backups complete successfully.

The backup services are working fine and I cannot identify any errors in the logs.

Below are the services running:

NB Processes
------------
root 2178 1 0 Apr 06 ? 0:00 /bin/sh /usr/openv/netbackup/bin/admincmd/nbdbdmon --user=root
root 2190 2178 0 Apr 06 ? 0:00 /usr/openv/db/bin/nbdbd --basedir=/usr/openv/db --datadir=/usr/openv/db/var --p
root 11214 4581 0 12:01:10 ? 0:00 /usr/openv/netbackup/bin/bpsched -mainempty
root 1840 1 0 Apr 06 ? 129:34 /usr/openv/netbackup/bin/bpdbm
root 1820 1 0 Apr 06 ? 5:42 /usr/openv/netbackup/bin/bprd
root 2007 1840 0 Apr 06 ? 23:41 /usr/openv/netbackup/bin/bpjobd

 

I know this is a very, very old version, but any suggestions or help would be really appreciated.

  • Marianne
    7 years ago
    There is no nbdecommission for 5.1!
    Config is all over the place before 6.x.
    Master server has
    Images
    globdb
    voldb
    STUs

    Each media server has own
    mediadb
    Device dbs.

    So, when a media server is removed without a proper manual decommission, everything becomes a mess.
    STUs are easy to delete.
    Deleting devices from globdb is a mess. It is actually easier to delete or rename the file and recreate all the device config.

    Mediadb is the big problem. The 5.1 bpmedia command needs the source media server in order to transfer media ownership. The user can try bpimage, but the mediadb entries will still be missing and tapes cannot be correctly deassigned.

    There were TNs back in the day, but they have been removed. All references on Veritas are for post-6.x.

    This is why it was so important to run nbcc when upgrading from 5.1.
    A Veritas/Symantec engineer would provide scripts to forge mediadb entries and all sorts of fixes before an upgrade could be attempted.

    I don't want to be in RK1982's shoes!
  • Just to update, even manual jobs are not populating in Activity Monitor 

  • Ok, the Activity Monitor is just a GUI. What is happening in reality? When you run bpdbjobs -report, do you see jobs running?

    Do you have the NBAR installed and which type of GUI are you using, Windows or Java?

    • RK1982
      Level 3

      Hi,

      Nothing is reported by bpdbjobs -report either. We use the Java GUI (jnbSA &) from the command line.

      • Thoth
        Level 3

        I'd suggest looking to see what is reported via the bperror command as a first place before cranking any logging levels up. 

        Stop NetBackup, start NetBackup, and note the time of the start. When the problem next happens or is noticed, run bperror -hoursago <number of hours from startup of NB to the observed behaviour>.

        Alternatively, if you can't stop NetBackup at this time, run bperror over blocks of time (bperror -s <start time> -e <end time>) and work your way back in smaller chunks to see if you can spot anything. Make sure you send the results to a file to view in your favourite editor.
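
The restart-then-look-back approach above comes down to a small calculation: how many whole hours separate the restart from the moment the delay was noticed. A minimal Python sketch follows. The function names and the output file `/tmp/bperror.out` are illustrative assumptions, and the script only builds the command line; it does not run bperror.

```python
import math
from datetime import datetime

BPERROR = "/usr/openv/netbackup/bin/admincmd/bperror"


def hours_since(start: datetime, observed: datetime) -> int:
    """Whole hours between restart and observation, rounded up.

    Rounding up makes the -hoursago window reach back past the restart.
    """
    if observed < start:
        raise ValueError("observed time is before the restart time")
    seconds = (observed - start).total_seconds()
    return max(1, math.ceil(seconds / 3600))


def bperror_command(start: datetime, observed: datetime,
                    outfile: str = "/tmp/bperror.out") -> str:
    """Build a bperror command line that writes its report to a file.

    The default outfile is a hypothetical location; use any path you like.
    """
    return f"{BPERROR} -hoursago {hours_since(start, observed)} > {outfile}"


if __name__ == "__main__":
    restarted = datetime(2024, 1, 1, 9, 0)    # example values only
    noticed = datetime(2024, 1, 1, 13, 30)
    print(bperror_command(restarted, noticed))
```

Sending the output to a file, as suggested above, keeps the report available for searching in an editor.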

        I'd also look at the system logs to see if anything is being reported there about resource starvation. NB's scheduler has been very robust for a long time, except for the issues around daylight saving and such, and 5.1's was pretty good. For jobs not to show would, on a *n*x system, suggest you'd exhausted your available process IDs at a given time, but other issues should show up with that as well.

        I don't know if you can post the results of the support script here, but if you can, and I've got cycles available when you do, I'll look at 'em. Feel free to anonymize the critical bits first.

  • That bpsched -mainempty seems to ring a bell... will have to really dig deep to see if I can find something in my own archives...

    In the meantime, please check for excessive timeouts - like media/slave and client connect timeouts. Return timeouts to defaults and stop netbackup. Confirm that all processes have stopped. Especially that bpsched process.
    Default SLAVE_CONNECT_TIMEOUT is 30 seconds. Do not go higher than 90 seconds.
    Default CLIENT_CONNECT_TIMEOUT is 300. Higher than 900 is not recommended.

    Ensure bpsched log folder exists and increase logging slightly. Level 3 should be good.
    Restart NBU.
    See if scheduled jobs start normally. Or start a job manually. Monitor bpsched log to see how storage units are evaluated and if this is maybe where delays are taking place.
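
The timeout check described above can be sketched as a small Python helper that scans bp.conf text for the two entries and flags values above the ceilings quoted in this post. The `KEY = value` parsing and the function name are assumptions for illustration; the numbers are the ones given above, not values taken from a manual.

```python
# (default, suggested ceiling) in seconds, as quoted in the post above.
LIMITS = {
    "SLAVE_CONNECT_TIMEOUT": (30, 90),
    "CLIENT_CONNECT_TIMEOUT": (300, 900),
}


def check_timeouts(bp_conf_text: str) -> dict:
    """Return {entry: message} for timeout entries above the ceiling."""
    findings = {}
    for line in bp_conf_text.splitlines():
        key, sep, value = line.partition("=")
        key = key.strip()
        if not sep or key not in LIMITS:
            continue  # not a timeout entry we care about
        default, ceiling = LIMITS[key]
        try:
            seconds = int(value.strip())
        except ValueError:
            findings[key] = f"unparseable value {value.strip()!r}"
            continue
        if seconds > ceiling:
            findings[key] = (
                f"{seconds}s exceeds suggested ceiling {ceiling}s "
                f"(default {default}s)"
            )
    return findings


if __name__ == "__main__":
    # Path of bp.conf on a UNIX master server.
    with open("/usr/openv/netbackup/bp.conf") as handle:
        for entry, message in check_timeouts(handle.read()).items():
            print(entry, message)
```

An entry that is absent from bp.conf is simply not reported, since the default then applies.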

    • Marianne
      Level 6

      I have found something about 'bpsched -mainempty' : http://sunmanagers.cs.toronto.edu/1998/0756.html

      This process runs at the end of the backup window to clean up expired images.
      For some unknown reason, it seems that this process actually connects to clients! So, when clients are offline, this process will take a long time (waiting for the client connect timeout to kick in for an offline client, then moving on to the next client...)

      Evidence of this in the bpsched log in URL mentioned above: 
      17:02:56 [792] <4> bpsched: INITIATING... 
      17:02:56 [792] <2> logparams: /usr/openv/netbackup/bin/bpsched -mainempty 
      17:02:56 [792] <4> main_bpsched: another bpsched is already main-sched 

      Please also read about polling of media servers for STUs - http://computeranddata.com/doc/misc/netbackup.faq

       

      • Michael_G_Ander
        Level 6

        I seem to remember something about <defunct> processes on Solaris that could cause something like the behaviour described.

        I think you need to run ps -ef or something similar to see these <defunct> processes.
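
Picking the <defunct> entries out of a long ps -ef listing is easy to script. The Python sketch below assumes only that each line starts with the usual UID, PID and PPID columns; the function name is illustrative. It returns the PID and parent PID of every defunct entry, so the parent that is failing to reap its children can be identified.

```python
import subprocess


def defunct_processes(ps_output: str) -> list:
    """Return (pid, ppid) pairs for lines of `ps -ef` marked <defunct>."""
    found = []
    for line in ps_output.splitlines():
        if "<defunct>" not in line:
            continue
        fields = line.split()
        # ps -ef lines begin with UID PID PPID; skip anything else.
        if len(fields) >= 3 and fields[1].isdigit() and fields[2].isdigit():
            found.append((int(fields[1]), int(fields[2])))
    return found


if __name__ == "__main__":
    listing = subprocess.run(
        ["ps", "-ef"], capture_output=True, text=True, check=True
    ).stdout
    for pid, ppid in defunct_processes(listing):
        print(f"defunct pid={pid} parent={ppid}")
```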

        Stupid question: have you closed down all the Java GUIs on the master and started a new one?

    • RK1982
      Level 3

      Hi

      CLIENT_CONNECT_TIMEOUT is the only option set.

      Thanks

      RK

      • Marianne
        Level 6

        How many media servers/storage units do you have in this environment?

        There are a LOT of media servers that are down/unavailable :

        start_bptm: cannot connect to cavan
        start_bptm: cannot connect to donegal
        start_bptm: cannot connect to erpprd
        start_bptm: cannot connect to gpsdvs32
        start_bptm: cannot connect to homefolders.retail2u.trcg.co.uk
        start_bptm: cannot connect to iowa
        start_bptm: cannot connect to kildare
        start_bptm: cannot connect to live100mcs.retail2u.trcg.co.uk
        start_bptm: cannot connect to live200mcs.retail2u.trcg.co.uk
        start_bptm: cannot connect to live400mcs.retail2u.trcg.co.uk
        start_bptm: cannot connect to ohst110mcs.retail2u.trcg.co.uk
        start_bptm: cannot connect to ohst210mcs.retail2u.trcg.co.uk
        start_bptm: cannot connect to ohst310mcs.retail2u.trcg.co.uk
        start_bptm: cannot connect to ohst410mcs.retail2u.trcg.co.uk
        start_bptm: cannot connect to ohst610mcs.retail2u.trcg.co.uk
        start_bptm: cannot connect to ohst630mcs.retail2u.trcg.co.uk

        I actually stopped looking through bpsched log at this point, so, there could be more.

        Do you know what is wrong with those media servers?
        Can they be fixed and brought online?

        If they are dead/permanently unavailable, please start by deleting the storage units.

        You will need to look at today and yesterday's bpsched logs to try and see what all those 'bpsched -mainempty' processes are doing.
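
To get the full list of unreachable media servers without reading the whole bpsched log by eye, the "cannot connect" messages can be counted per host. This Python sketch matches only the message text quoted above; the rest of the log line layout is not assumed, and the function name is illustrative.

```python
import re
import sys
from collections import Counter

# Matches the message text quoted in the post above; nothing else about
# the line format is assumed.
PATTERN = re.compile(r"start_bptm: cannot connect to (\S+)")


def unreachable_media_servers(log_text: str) -> Counter:
    """Count 'cannot connect' messages per media server host name."""
    return Counter(PATTERN.findall(log_text))


if __name__ == "__main__":
    # Usage: pass the path of a bpsched log file as the first argument.
    with open(sys.argv[1], errors="replace") as handle:
        counts = unreachable_media_servers(handle.read())
    for host, hits in counts.most_common():
        print(f"{hits:6d}  {host}")
```

Hosts with the highest counts are the ones the scheduler keeps retrying, which makes their storage units the first candidates for repair or deletion.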

  • Hey

    If there is no output from bpdbjobs, I would kill bpjobd and start it again... or restart the whole of NBU...

  • create directory bpsched in /usr/openv/netbackup/logs/

    This will cause the bpsched (scheduler) process to log what it is doing. Once you have a standstill, zip the text file and attach it to a post. Do not bulk post the debug text.

    Marianne, wasn't online cleaning of the image database implemented in NBU 4.5 FP 8? As I recall, bpsched -mainempty is the main bpsched process that will spawn sub bpsched processes - but I might be wrong :-)

    RK1982, what revision of NBU 5.1 are you running? You can find it in /usr/openv/netbackup/version or /usr/openv/netbackup/bin/version

    Update: there is also pack.summary, which shows the NetBackup version:

    /opt/openv/pack/pack.summary

    • Marianne
      Level 6

      Nicolai wrote:

      Marianne, wasn't online cleaning of the image database implemented in NBU 4.5 FP 8? As I recall, bpsched -mainempty is the main bpsched process that will spawn sub bpsched processes - but I might be wrong :-)


      Nicolai, I honestly cannot remember that far back.
      The problem with such an old version is that no documentation (manuals, technotes) is available anymore. 
      Even my own 'archives' (old hard drives) are somewhere 'stored' at home.

      Hopefully RK1982 will post a bpsched log shortly.
      The one thing that I liked about the bpsched log is that it is a legacy log and therefore readable.

      • Nicolai
        Moderator

        Marianne, you don't know how much information is stored in the deep pits of my mind :-)

        Well - back to the topic. If my theory is right, bpsched -mainempty should be started by root, and the other bpsched processes should have the PID of bpsched -mainempty as their PPID.

        RK1982, please do a ps -ef on the master server and let us know the output from the command.
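
The parent/child theory above can be checked mechanically once the ps -ef output is available. The Python sketch below lists every bpsched process and marks whether its PPID belongs to a bpsched -mainempty process. It assumes only that each line starts with the UID, PID and PPID columns; the function name and dictionary keys are illustrative.

```python
def bpsched_tree(ps_output: str) -> list:
    """List bpsched processes and whether each is a child of -mainempty."""
    procs = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 3 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue  # header line or something that is not a process row
        if not any(field.endswith("bpsched") for field in fields):
            continue  # some other process
        procs.append({
            "pid": int(fields[1]),
            "ppid": int(fields[2]),
            "main": "-mainempty" in fields,
        })
    main_pids = {proc["pid"] for proc in procs if proc["main"]}
    for proc in procs:
        proc["child_of_main"] = proc["ppid"] in main_pids
    return procs


if __name__ == "__main__":
    # The only bpsched line from the process listing in the first post.
    sample = ("root 11214 4581 0 12:01:10 ? 0:00 "
              "/usr/openv/netbackup/bin/bpsched -mainempty")
    for proc in bpsched_tree(sample):
        print(proc)
```

In the listing from the first post, the single bpsched -mainempty process has PID 11214 and PPID 4581, and no process with PID 4581 appears among the lines shown, so the full ps -ef output is needed to see what started it.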