Forum Discussion

robtaylor
Level 2
8 years ago

Netbackup not writing to 1 of 2 drives

Hi Guys,

I'm experiencing a very weird problem and am looking for some help troubleshooting it. We recently completed a hardware refresh and upgrade. The current environment is a Cisco C220 M3 running Windows Server 2016 and NetBackup 8.0 (master/media), with a Quantum Scalar i500 tape library with 2 drives connected to it.

NetBackup device configuration picks up both drives and the medium changer, but it can only write to 1 of the 2 drives. Jobs fail with this error:

 Error bptm (pid=6448) ioctl (MTREW) failed on media id 000044, drive index 1, The semaphore timeout period has expired. (121) (../bptm.c.7962)
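For anyone reading a line like this, a minimal Python sketch that splits it into its parts may help. The pattern is inferred from this single example only, not from NetBackup documentation, and `parse_bptm_error` is a hypothetical helper name. The trailing `(121)` is the Windows error code for the semaphore timeout (ERROR_SEM_TIMEOUT), and MTREW is the tape rewind operation.

```python
import re

# Pattern for the bptm error line quoted above. The field layout is inferred
# from that one example, not from NetBackup documentation.
BPTM_ERROR = re.compile(
    r"ioctl \((?P<ioctl>\w+)\) failed on media id (?P<media_id>\w+), "
    r"drive index (?P<drive_index>\d+), (?P<message>.+?) \((?P<os_error>\d+)\)"
)


def parse_bptm_error(line):
    """Return the fields of a bptm ioctl failure line, or None if no match."""
    match = BPTM_ERROR.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields so they can be compared or grouped later.
    fields["drive_index"] = int(fields["drive_index"])
    fields["os_error"] = int(fields["os_error"])
    return fields


line = (
    "Error bptm (pid=6448) ioctl (MTREW) failed on media id 000044, "
    "drive index 1, The semaphore timeout period has expired. (121) "
    "(../bptm.c.7962)"
)
print(parse_bptm_error(line))
```

Grouping such lines by `drive_index` is one way to confirm that the failures are confined to a single drive path.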

We have ruled out faulty media, replaced the tape drive, and swapped the tape drives, but the issue follows the path. Drive diagnostics fail on the 2nd drive, even though I can see a tape being mounted in the drive; it just fails to pass any further commands to it. Both Quantum and NetBackup support have been unable to identify the cause. Any ideas?

 

Regards,

Robert

  • 1. What drive type do you have, SAS or FC?

    2. Have you tried to swap or replace FC/SAS cables?

    3. Could you share the OS event logs? The "The semaphore timeout period has expired" message comes from the OS level, and I believe it's just a result of timeouts.

    4. Have you tried the L&TT (HP) or ITDT (IBM) tools for tape drive diagnostics?

    There are 3 possible problem areas:

    a) OS/driver

    b) Tape library

    c) Physical connection

    I don't believe NetBackup could be the cause of this message. Do you have a chance to connect the tape library to a Linux host and run some tests? OS-level tests without NetBackup, on a different OS and driver, would help rule out most of the possible causes and show whether the connection or the tape library itself is the root cause.

     

     


    • robtaylor
      Level 2

      Thanks guys, I came back to post the solution to this issue. It turned out to be a driver issue. Even though the correct driver was installed, the problem drive had picked up an older driver, and this was overlooked in the initial troubleshooting.
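A check like the one that was missed here can be sketched in a few lines of Python. This is only an illustration under assumptions: the inventory is typed in by hand (for example from each drive's Driver tab in Device Manager), and the device names, model, provider and version strings are made up. `find_driver_mismatches` is a hypothetical helper, not part of NetBackup or Windows.

```python
from collections import defaultdict


def find_driver_mismatches(devices):
    """Group devices by model and return the models whose drives do not
    all use the same (provider, version) driver."""
    by_model = defaultdict(dict)
    for dev in devices:
        by_model[dev["model"]][dev["name"]] = (dev["provider"], dev["version"])
    return {
        model: drivers
        for model, drivers in by_model.items()
        if len(set(drivers.values())) > 1
    }


# Hypothetical inventory; every value below is invented for illustration.
inventory = [
    {"name": "Tape0", "model": "LTO drive", "provider": "VendorX", "version": "2.0.0.0"},
    {"name": "Tape1", "model": "LTO drive", "provider": "VendorX", "version": "1.5.0.0"},
]

for model, drivers in find_driver_mismatches(inventory).items():
    print(f"Driver mismatch for {model}:")
    for name, (provider, version) in sorted(drivers.items()):
        print(f"  {name}: {provider} {version}")
```

Two identical drives reporting different driver versions is exactly the situation described above, so any non-empty result is worth a closer look.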

  • There are 2 paths for a library-based tape drive:
    Robotic control path - this seems to be working fine as tapes get mounted fine.
    Data path - this is the path between the server and the tape drive - this is where you are seeing a problem.

    Various components are involved here: the HBA in the server, the HBA driver and firmware, the tape driver on the server, and the cable (either direct or through a switch). In short, every piece of the physical path to the tape drive.

    When the robot has mounted a tape in the drive, it needs to send a 'tape ready' message via the OS to NetBackup. This message is either never sent or never received by NBU.

    So, you already know this is not a NetBackup or Quantum issue... 
    You need to troubleshoot the physical data path. Have you tried replacing or swapping cables?
    Or moving to another switch port? Or replacing the FC GBIC?

    Do you have a VERBOSE entry in vm.conf on the NBU server?
    If you do, and NBU was restarted after adding the entry, additional hardware-related messages will be sent to the Windows Event Viewer Application and System logs.

    These Application and System logs are crucial in troubleshooting OS -> Device path issues.

    Can you share errors seen in Event Viewer logs?
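To check for the VERBOSE entry mentioned above, a small Python sketch could look like this. It assumes the entry sits on a line of its own as the single word VERBOSE; `has_verbose_entry` is a hypothetical helper and the sample contents are invented.

```python
def has_verbose_entry(vm_conf_text):
    """Return True if some line of the vm.conf text is exactly VERBOSE."""
    return any(line.strip() == "VERBOSE" for line in vm_conf_text.splitlines())


# Hypothetical file contents for illustration only.
sample_without = "SOME_OTHER_ENTRY = 1\n"
sample_with = "SOME_OTHER_ENTRY = 1\nVERBOSE\n"

print(has_verbose_entry(sample_without))  # False
print(has_verbose_entry(sample_with))     # True
```

In practice you would read the text from the vm.conf file on your own server and pass it in. Remember that the entry only takes effect after NBU has been restarted.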

    • Michael_G_Ander
      Level 6

      Since this is a Windows system, make sure you have deleted any ghost devices; see hidden devices in Device Manager.

      Another thing I have seen cause issues is a tape drive driver that did not load properly, usually because there was a tape in the drive at reboot/driver load time.

      Also, some libraries "mask" the tape drive used for robotic control if the SCSI number order is not correct. If that is the case, I'd guess the correct order is in the manual.