NBU35
7 years ago

Tape drives non-functional

Hi 

Our master server runs Solaris 10 in a VCS cluster with NBU 7.7.3.

Media server: HP-UX B.11.31 U ia64, NBU 7.7.3

Tape library: SL3000, which has 4 LTO4 and 20 LTO5 tape drives.

The robot control host has only 18 LTO5 tape drives configured on it.
The 4 LTO4 drives are not shared and are configured only on a specific media server (these drives are working fine).

The remaining 2 LTO5 drives are configured on another media server and are not shared.

Of these 18 tape drives, 8 are not working.

They are permanently in DOWN-TLD state. If we bring them up and run backups, the backups fail with a tape-robot error.

The concern now is that even robtest cannot dismount the tapes; it returns SCSI errors:

m d16 s262

Initiating MOVE_MEDIUM from address 1015 to 2261

move_medium failed

sense key = 0x5, asc = 0x3a, ascq = 0x0, MEDIUM NOT PRESENT

We have checked with the tape library vendor. According to them, there is no hardware issue with the tape drives or the library.

Also, all devices are in CLAIMED state and visible to the OS.

Please suggest what I need to look for to fix this issue.

  • The tape needs to be ejected (unloaded) from the drive before the robot can move it back to its slot.

    So, if the tape is still loaded in the drive, the 'MEDIUM NOT PRESENT' error is correct. 
    There is no 'unloaded' tape to be picked up by the robot hand.
    The media server that loaded the tape needs to unload the tape first. 
    Robot control host cannot do this if the drive is still reserved by the media server. 
    bptm on media server will send unload/eject to the tape drive when the job is done. 
    Evidence can be seen in bptm log on the media server.
    I suggest level 3 log for troubleshooting (level 5 only upon request from Veritas Support).

    To see why drives are DOWN'ed, you need to add 
    VERBOSE
    to vm.conf on all affected media servers and restart NBU/ltid.

    Try to UP the drives after this.
    The next time a drive is DOWN'ed, the exact reason will be logged in the system log on the media server.
    (e.g. /var/adm/syslog/syslog.log on HP-UX media server,  /var/adm/messages on Solaris media server.)
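
    The VERBOSE steps above can be sketched roughly as the shell fragment below. Treat it as a sketch under assumptions rather than a tested procedure: it assumes the default install paths (/usr/openv/volmgr/vm.conf and /usr/openv/volmgr/bin), the helper name ensure_verbose is made up for illustration, and DRIVE_INDEX stands for the drive index shown by tpconfig -d. The commands that touch ltid are left as comments, since they should only be run when no jobs are using the drives.

    ```shell
    #!/bin/sh
    # Sketch only: path assumes a default NetBackup install.
    VM_CONF="${VM_CONF:-/usr/openv/volmgr/vm.conf}"

    # ensure_verbose (hypothetical helper): add a VERBOSE line to the given
    # vm.conf unless one is already there.
    ensure_verbose() {
        conf="$1"
        [ -f "$conf" ] || touch "$conf"
        if ! grep -q '^VERBOSE$' "$conf"; then
            echo "VERBOSE" >> "$conf"
        fi
    }

    # On each affected media server, as root, when no jobs are active:
    #   ensure_verbose "$VM_CONF"
    #   /usr/openv/volmgr/bin/stopltid                  # stop the device daemons
    #   /usr/openv/volmgr/bin/ltid -v                   # start them again, verbose
    #   /usr/openv/volmgr/bin/vmoprcmd -up DRIVE_INDEX  # index from tpconfig -d
    #   tail -50 /var/adm/syslog/syslog.log             # HP-UX: look for the DOWN reason
    ```

    Calling ensure_verbose twice is harmless: the second call finds the existing line and does nothing.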

     

    • NBU35

      I am getting the following error from the unload command:

       

      unload d17
      Opening /dev/rtape/tape73_BESTnb, on the local host, please wait...
      Error - cannot open /dev/rtape/tape73_BESTnb (No such device or address)

      However, this drive is there in the tpconfig -d output:

      18 TLB5-KJC-DRIVE17 hcart2 TLD(2) DRIVE=17
      /dev/rtape/tape73_BESTnb DOWN

      • Marianne

        If the device name was present before, then it means that 'something' happened that caused the OS to lose connectivity to the device. 
        NBU uses the OS for device access.

        So, you need to troubleshoot at OS level. 
        Check output of '/usr/openv/volmgr/bin/scan' and 'ioscan -f'.

        Check syslog file for errors.
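
        One way to start the OS-level check is sketched below. It is a rough illustration, not a verified procedure. It assumes that HP-UX 11.31 persistent tape device files are named tape<instance>_BEST..., so the number in /dev/rtape/tape73_BESTnb should match an instance in the ioscan output. The helper names dsf_instance and find_instance are hypothetical.

        ```shell
        #!/bin/sh
        # dsf_instance (hypothetical helper): extract the instance number from a
        # persistent tape device file name, e.g. tape73_BESTnb gives 73.
        dsf_instance() {
            basename "$1" | sed -n 's/^tape\([0-9][0-9]*\)_.*$/\1/p'
        }

        # find_instance (hypothetical helper): read ioscan -f style output on
        # stdin and print the lines whose class is "tape" and whose instance
        # matches $1. No output means the OS no longer lists that instance.
        find_instance() {
            awk -v inst="$1" '$1 == "tape" && $2 == inst'
        }

        # Example on the media server (shown as comments):
        #   inst=$(dsf_instance /dev/rtape/tape73_BESTnb)
        #   ioscan -fNC tape | find_instance "$inst"   # check the S/W State column
        #   /usr/openv/volmgr/bin/scan                 # what NetBackup can see
        #   tail -100 /var/adm/syslog/syslog.log       # errors around the time of loss
        ```

        If find_instance prints nothing, or the state is not CLAIMED, the problem is below NetBackup (zoning, HBA, or device files) and should be fixed there before retrying the unload.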