Forum Discussion

banjo67xxx
Level 4
11 years ago
Solved

Not all drives back in use after library failure?

Hi,

My StorageTek SL500 lost 2 PSUs and started reporting that it had only 2 drives, at invalid addresses, instead of 4. Our remote team went into the data centre and shuffled the PSUs, so we are now running on 3 out of 4 PSUs, and all 4 tape drives show online with valid addresses.

In NBU I gave up on the Device Monitor section of the GUI, and instead ran the commands:

/usr/openv/volmgr/bin/vmoprcmd -up 0

/usr/openv/volmgr/bin/vmoprcmd -up 1

/usr/openv/volmgr/bin/vmoprcmd -up 2

/usr/openv/volmgr/bin/vmoprcmd -up 3
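(Equivalently, a small loop over the four drive indices:)

for d in 0 1 2 3; do
    /usr/openv/volmgr/bin/vmoprcmd -up $d
done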

Now everything looks OK. I've started a few manual backups using at least 4 different volume pools, and monitored the drives with:

/usr/openv/volmgr/bin/vmoprcmd -d
 
                                PENDING REQUESTS
 
                                     <NONE>
 
                                  DRIVE STATUS
 
Drv Type   Control  User      Label  RecMID  ExtMID  Ready   Wr.Enbl.  ReqId
  0 hcart    TLD                -                     No       -         0  
  0 hcart    TLD                -                     No       -         0  
  1 hcart    TLD               Yes   DUB050  DUB050   Yes     Yes        0  
  1 hcart    TLD               Yes   DUB050  DUB050   Yes     Yes        0  
  2 hcart    TLD               Yes   DUB055  DUB055   Yes     Yes        0  
  2 hcart    TLD               Yes   DUB055  DUB055   Yes     Yes        0  
  3 hcart    TLD               Yes   DUB060  DUB060   Yes     Yes        0  
  3 hcart    TLD               Yes   DUB060  DUB060   Yes     Yes        0  
 
                             ADDITIONAL DRIVE STATUS
 
Drv DriveName            Shared    Assigned        Comment                   
  0 HP.ULTRIUM4-SCSI.002  Yes      -                                         
  0 HP.ULTRIUM4-SCSI.002  Yes      -                                         
  1 HP.ULTRIUM4-SCSI.003  Yes      nbu-master01                           
  1 HP.ULTRIUM4-SCSI.003  Yes      nbu-master01                          
  2 HP.ULTRIUM4-SCSI.001  Yes      nbu-master01                           
  2 HP.ULTRIUM4-SCSI.001  Yes      nbu-master01                           
  3 HP.ULTRIUM4-SCSI.000  Yes      nbu-master01                           
  3 HP.ULTRIUM4-SCSI.000  Yes      nbu-master01        
However, regardless of what backups run and which volume pools they request tapes from, I notice that drive 0 never gets used.
 
What else could be causing the drive not to be selected for a job?
 
I don't know if this is relevant: this NBU master, which was set up by my predecessor, displays a "state detail" that I've not seen on the NBU master I configured myself. On the master I configured, jobs queue for a free tape drive without any state detail, whereas this NBU master shows "Drive scan host is not active (nbu-master01-hcart-robot-tld-0)". Is this significant?
 

9 Replies

  • What is the queued job's detail status showing...?

     

    Give us the output of the commands below:

    vmoprcmd     (without any switches/options)

    nbrbutil -dump | grep -i  HP.ULTRIUM4-SCSI.002
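    (For reference, the full paths on a default install; nbrbutil lives under admincmd:)

    /usr/openv/volmgr/bin/vmoprcmd
    /usr/openv/netbackup/bin/admincmd/nbrbutil -dump | grep -i HP.ULTRIUM4-SCSI.002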

  • That's very interesting ... nbrbutil -dump has no record of that tape drive! (I verified the command syntax by searching for the other drives, which do appear.)

    So I re-ran the "Configure Storage Device" wizard; it scanned and found all 4 drives and updated the master, and I looked again. The drive is still not showing in the nbrbutil output!

    FYI, here's the output from vmoprcmd ...

    /usr/openv/volmgr/bin/vmoprcmd
     
                               HOST STATUS
    Host Name                                  Version   Host Status
    =========================================  =======   ===========
    nbu-master01                               710000    ACTIVE
    nbu-vm-bak01                               710000    OFFLINE
     
                                    PENDING REQUESTS
     
     
                                        <NONE>
     
                                      DRIVE STATUS
     
    Drive Name               Label   Ready  RecMID  ExtMID  Wr.Enbl.  Type
        Host                       DrivePath                            Status
    =============================================================================
    HP.ULTRIUM4-SCSI.000     Yes     Yes    DUB060  DUB060  Yes       hcart
        nbu-master01                   /dev/rmt/0cbn                        ACTIVE
        nbu-master01                   c80t0l0 (cln-emc-nfs01)              TLD
     
    HP.ULTRIUM4-SCSI.001     Yes     Yes    DUB055  DUB055  Yes       hcart
        nbu-master01                   /dev/rmt/1cbn                        ACTIVE
        nbu-master01                   c32t0l0 (cln-emc-nfs01)              TLD
     
    HP.ULTRIUM4-SCSI.002     No      No                     No        hcart
        nbu-master01                   /dev/rmt/2cbn                        TLD
        nbu-master01                   c48t0l0 (cln-emc-nfs01)              TLD
     
    HP.ULTRIUM4-SCSI.003     Yes     Yes    DUB050  DUB050  Yes       hcart
        nbu-master01                   /dev/rmt/3cbn                        ACTIVE
        nbu-master01                   c64t0l0 (cln-emc-nfs01)              TLD
    Also, the STU looks fine:
    Label:                nbu-master01-hcart-robot-tld-0
    Storage Unit Type:    Media Manager
    Host Connection:      cln-bu01
    Number of Drives:     4
    On Demand Only:       no
    Max MPX/drive:        4
    Density:              hcart - 1/2 Inch Cartridge
    Robot Type/Number:    TLD / 0
    Max Fragment Size:    1048575 MB
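    (That listing appears to be the output of something along the lines of:)

    /usr/openv/netbackup/bin/admincmd/bpstulist -label nbu-master01-hcart-robot-tld-0 -U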
     

     

  • >>Still not showing in the nbrbutil output!

     

    That's fine; it just means there is no active allocation for that drive.

     

  • When I get problems with drives, I simply delete them and rediscover them (restart ltid/media manager afterwards, with no jobs running when you do this, of course). More often than not it does the trick. A rough sketch of the sequence is below.
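    (Something like the following, using drive index 0 as an example; paths assume a default install, and the rediscovery itself can be done with the Device Configuration Wizard:)

    /usr/openv/volmgr/bin/stopltid                    # stop the device daemon (no jobs running)
    /usr/openv/volmgr/bin/tpconfig -delete -drive 0   # remove the drive definition
    # rediscover the drive (e.g. via the Device Configuration Wizard), then:
    /usr/openv/volmgr/bin/ltid                        # restart the device daemon
    /usr/openv/volmgr/bin/vmoprcmd -d                 # confirm the drive status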

  • As revaroo said, deleting them from NetBackup and the OS is always a good idea. It eliminates issues on the OS side that NetBackup can't always see; for example, the sketch below rebuilds the tape device links on Solaris.
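    (A hedged Solaris example; double-check which links are stale before removing anything:)

    rm /dev/rmt/*          # clear out the old tape device links
    devfsadm -C -c tape    # clean up dangling links and recreate the tape devices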

    After that is done and you get a good robtest with the drive, try forcing its use with a test policy, along the lines of the sketch below.
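    (A policy can't be pinned to one drive directly, so one rough approach is to down the other drives while a small test policy runs; TestDrive0 is a hypothetical policy name, and paths assume a default install:)

    # leave only drive 0 available
    for d in 1 2 3; do /usr/openv/volmgr/bin/vmoprcmd -down $d; done

    # run the test policy and wait for it to complete
    /usr/openv/netbackup/bin/bpbackup -i -p TestDrive0 -w

    # bring the other drives back up
    for d in 1 2 3; do /usr/openv/volmgr/bin/vmoprcmd -up $d; done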

  • Run nbrbutil -resetAll. I know you don't see any allocations, but I've seen this fix the issue multiple times; each time, the issue occurred after some drive problem. Just be aware that this will kill any running jobs.
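    (Full path on a default install:)

    /usr/openv/netbackup/bin/admincmd/nbrbutil -resetAll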
  • Hi mph999,

    Thanks. I tried that, but it didn't seem to work.

    After a reboot, it worked.

    Unfortunately, I don't know what the solution was, as I tried the "Configure Storage Device" wizard and "nbrbutil -resetall" before the reboot.

  • A reboot would fix issues at the OS level.

    The only way to know why NBU would not or could not use the drive is to look at the logs. If the logs did not exist at the time the issue occurred, there is unfortunately no way to tell.
    /var/adm/messages may contain some device info/errors.

    For future troubleshooting, enable logging as follows:

    Add a VERBOSE entry to /usr/openv/volmgr/vm.conf, then restart ltid.
    Additional Media Manager/device info will then be logged to /var/adm/messages.

    Create the /usr/openv/netbackup/logs/bptm folder to log NBU I/O activity. A sketch of these steps is below.
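    (Something like this, assuming a default install path; restart ltid only with no jobs running:)

    echo "VERBOSE" >> /usr/openv/volmgr/vm.conf   # enable verbose Media Manager logging
    /usr/openv/volmgr/bin/stopltid                # stop the device daemon
    /usr/openv/volmgr/bin/ltid                    # restart it so vm.conf is re-read
    mkdir /usr/openv/netbackup/logs/bptm          # bptm logs tape I/O activity once this folder exists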

     

  • Thanks for the advice. It looks like I'll have an opportunity to put it into practice very soon, as 1 drive has already failed within 8 hours of powering the SL500 back on ...