Forum Discussion

AmirJabran81
8 years ago

Tape drives go down very often

Hi All,

My current environment is as follows:

Server hardware: Dell R710 with an external QLogic QLE2562 HBA

OS: Windows Server 2008 R2 Enterprise Edition

NetBackup server: 7.6.0.4

Tape library: HP MSL2024 with two FC tape drives

Problem:

After one year of running successfully, my tape drives now go down often, one at a time or, most of the time, both drives.

The physical status of the library and tape drives looks fine: when I log on to the MSL2024 web console, it shows the drives in the UP and Ready state.

The only error is the one below, in the Windows Event Viewer (Application log):

"Operator/EMM server has DOWN'ed drive HP.ULTRIUM5-SCSI.001 (device 1)"

Please advise whether NetBackup has an issue or the HP library or tapes have an issue. The HP vendor thoroughly checked and tested the drives and said they are fine.

Waiting for your response.

Thanks & Regards

M.Amir

  • The tape alert is as Marianne suggests:

    0x00800000 0x00000000

    Flag 9: Write protect. Severity: Critical

    If the tape is 'not' write-protected via the little switch, then you have a drive fault, because the drive itself, not NBU or anything else, is reporting that it is. Tape alerts are sent directly from the tape drive firmware.
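    As a rough aid for reading these values, here is a minimal Python sketch that turns the two hex words into TapeAlert flag numbers. It assumes, consistent with the reading above where 0x00800000 0x00000000 is flag 9, that NetBackup prints the 64-bit TapeAlert bitmap as two 32-bit words with flag 1 as the most significant bit of the first word. The function name `decode_tape_alert` is hypothetical, and the name table only holds the flags mentioned in this thread.

```python
# Hypothetical helper: decode the two 32-bit TapeAlert words NetBackup logs
# (e.g. "0x00800000 0x00000000") into TapeAlert flag numbers.
# Assumption: flag 1 is the most significant bit of the first word.

# Only the flags named in this thread; extend from the TapeAlert spec as needed.
FLAG_NAMES = {
    9: "Write protect",
    15: "Failure of cartridge memory chip",  # 0x0F
    18: "Tape directory corrupted on load",  # 0x12
}


def decode_tape_alert(first_word: str, second_word: str) -> list[int]:
    """Return the TapeAlert flag numbers (1-64) that are set in the two words."""
    bitmap = (int(first_word, 16) << 32) | int(second_word, 16)
    # Flag n maps to bit (64 - n), counting bit 0 as the least significant bit.
    return [flag for flag in range(1, 65) if bitmap & (1 << (64 - flag))]


if __name__ == "__main__":
    for flag in decode_tape_alert("0x00800000", "0x00000000"):
        print(f"Flag {flag}: {FLAG_NAMES.get(flag, 'see the TapeAlert specification')}")
```

    Under the same assumption, the LTO3 alert quoted later in the thread (0x00008000 0x00000000) decodes to flag 17.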

  • In [install_path]/netbackup/db/media, run:

    more errors

    The file should look something like:

    12/15/16 19:59:53 A37482 6 WRITE_ERROR 0211
    12/23/16 19:31:43 A32855 3 POSITION_ERROR 11112
    12/23/16 20:00:47 A32855 3 POSITION_ERROR 11112
    12/23/16 22:03:54 A32855 3 POSITION_ERROR 11112
    12/23/16 22:30:43 A32855 3 POSITION_ERROR 11112
    12/25/16 18:16:03 A23704 5 WRITE_ERROR 1110
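    Building the list of unique tapes from that file by hand is error-prone. Below is a minimal Python sketch, assuming each line follows the layout shown above (date, time, media ID, drive index, error type, then further fields). `unique_media` is a hypothetical helper; lines with fewer than five fields are skipped, and TAPE_ALERT lines are counted too unless you pass `error_types`.

```python
from collections import Counter

# A few lines from the sample errors file above.
SAMPLE = """\
12/15/16 19:59:53 A37482 6 WRITE_ERROR 0211
12/23/16 19:31:43 A32855 3 POSITION_ERROR 11112
12/23/16 20:00:47 A32855 3 POSITION_ERROR 11112
12/25/16 18:16:03 A23704 5 WRITE_ERROR 1110
"""


def unique_media(lines, error_types=None):
    """Count error lines per media ID, optionally only for the given error types."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or unexpected lines
        media_id, error_type = fields[2], fields[4]
        if error_types is None or error_type in error_types:
            counts[media_id] += 1
    return counts


if __name__ == "__main__":
    # On the server, pass the lines of [install_path]/netbackup/db/media/errors.
    for media_id, count in unique_media(SAMPLE.splitlines()).most_common():
        print(media_id, count)
```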

  • Suspend these tapes and see if the problem persists. This is the list of unique tapes from the db/media/errors file.

    e.g.: bpmedia -m J332L5 -suspend

    J332L5
    J334L5
    3798L5
    0044L5
    0043L5
    0049L5
    A384L5
    J326L5
    3792L5
    J325L5
    J333L5

    The tape structure of these tapes looks damaged. A tape has a file table just like a disk; if this file table has been damaged (e.g. by a power loss), it could give the "unable to space to end of file" error. When you suspend a tape, it will be reused once all images on it have expired.
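    If you would rather not type the suspend command for each of the tapes listed above, a minimal Python sketch can print one `bpmedia -m <media ID> -suspend` line per tape, in the same form as the example above, for review before running them. It deliberately only prints the commands and does not execute anything; `suspend_commands` is a hypothetical helper name.

```python
# Media IDs from the list above.
MEDIA_IDS = ["J332L5", "J334L5", "3798L5", "0044L5", "0043L5", "0049L5",
             "A384L5", "J326L5", "3792L5", "J325L5", "J333L5"]


def suspend_commands(media_ids):
    """Build one 'bpmedia -m <id> -suspend' command per unique media ID."""
    seen = set()
    commands = []
    for media_id in media_ids:
        if media_id not in seen:  # skip duplicates, keep the original order
            seen.add(media_id)
            commands.append(f"bpmedia -m {media_id} -suspend")
    return commands


if __name__ == "__main__":
    # Print only; review the list, then run the commands on the server.
    print("\n".join(suspend_commands(MEDIA_IDS)))
```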

    • mph999
      Level 6

      Good point, Nicolai, though if there are cartridge memory issues, those should also show up as a tape alert.

      Usually:

      0x0F: 'Failure of cartridge memory chip',
      0x12: 'Tape directory corrupted on load',

      Certainly agree with suspending the tapes, though...

      • Marianne
        Level 6

        The 'write-protect' error seems to be intermittent.

        AmirJabran81 says that if he unfreezes the tape, it is written to fine at the next attempt.

        So, probably a firmware issue rather than faulty media?

        The issue seems to have been present for quite a while, even when LTO3 media was loaded in the LTO5 drives (probably for a restore?):

        03/26/15 17:02:33 0003L3 0 TAPE_ALERT HP.ULTRIUM5-SCSI.000 0x00008000 0x00000000
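        To see how often the write-protect alert appears, and on which drive, one could pull the TAPE_ALERT lines out of the errors file. A minimal Python sketch, assuming the field layout of the line above (date, time, media ID, drive index, TAPE_ALERT, drive name, two hex words) and that flag 9 (write protect) is the 0x00800000 bit of the first word, as read earlier in the thread. The names `TapeAlert`, `tape_alerts` and `write_protect_alerts` are hypothetical.

```python
from typing import NamedTuple

# Flag 9 (write protect) as it appears in the first hex word, per the
# "0x00800000 0x00000000 = Flag 9" reading earlier in the thread (assumed).
WRITE_PROTECT_BIT = 0x00800000

# The TAPE_ALERT line quoted above, used as sample input.
SAMPLE = ("03/26/15 17:02:33 0003L3 0 TAPE_ALERT "
          "HP.ULTRIUM5-SCSI.000 0x00008000 0x00000000")


class TapeAlert(NamedTuple):
    date: str
    time: str
    media_id: str
    drive: str
    first_word: int
    second_word: int


def tape_alerts(lines):
    """Yield a TapeAlert for every TAPE_ALERT line; other lines are ignored."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 8 and fields[4] == "TAPE_ALERT":
            yield TapeAlert(fields[0], fields[1], fields[2], fields[5],
                            int(fields[6], 16), int(fields[7], 16))


def write_protect_alerts(lines):
    """Keep only the alerts whose first word has the write-protect bit set."""
    return [a for a in tape_alerts(lines) if a.first_word & WRITE_PROTECT_BIT]


if __name__ == "__main__":
    for alert in tape_alerts([SAMPLE]):
        wp = bool(alert.first_word & WRITE_PROTECT_BIT)
        print(alert.date, alert.drive, alert.media_id,
              hex(alert.first_word), "write protect" if wp else "other flag")
```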

          

    • AmirJabran81
      Level 3

      Thanks for the update.

      How can I repair the damaged tape structure on these tapes?

      The tapes mentioned are the only tapes in the library that are free to use.

      I will also discuss the tape alerts with the hardware vendor, since you suggest a firmware issue.

      The tapes mentioned are my monthly and yearly backup tapes; they will go offsite next week.

      I will try suspending these and check.

      Regards

      M.Amir

      • Marianne
        Level 6

        You might as well leave the tapes FROZEN rather than unfreeze and then suspend them.

        Either way, suspended or frozen tapes cannot be written to.
        You will have to enter a new set of tapes in the robot.

  • Most often, this error in NetBackup is an indication of another problem with the infrastructure/hardware.

    My list of things to check is:

    1) Library console/panel (it seems you have done that already)

    2) SAN connectivity: is the drive logged into the fabric? (There has been a problem with some HP libraries/tape drives logging out after a while.)

    3) HBA tool: can the drives be seen there, and is persistent binding still in place?

    4) Device Manager in the OS

    And, of course, all the related logs: bptm, the media errors file, OS event logs, and the library, tape drive, and SAN switch logs.

    What is the cleaning state of these drives? I have often seen issues with drives that needed cleaning.

    • AmirJabran81
      Level 3

      Hello!

      Thanks for the update. Let me answer your questions:

      1) Library console/panel (it seems you have done that already)

      Yes, it's working fine with no errors.

      2) SAN connectivity: is the drive logged into the fabric? (There has been a problem with some HP libraries/tape drives logging out after a while.)

      I have connected the tape library directly to the Dell server, without any SAN switch.

      3) HBA tool: can the drives be seen there, and is persistent binding still in place?

      Sorry, I did not understand what information you need.

      4) Device Manager in the OS

      Yes, the latest drivers for the tape library and tape drives are installed, along with the latest firmware.

      And, of course, all the related logs: bptm, the media errors file, OS event logs, and the library, tape drive, and SAN switch logs.

      What is the cleaning state of these drives? I have often seen issues with drives that needed cleaning.

      Once a month, or as required. After cleaning, the behaviour is the same.

      The standard questions: Have you checked: 1) What has changed? 2) The manual. 3) Whether there are any tech notes or VOX posts regarding the issue.

      No changes have been made, apart from a minor upgrade from 7.6.0.3 to 7.6.0.4.

      Regards
      M.Amir

      • mph999
        Level 6

        Please send the errors file, as previously requested, covering the most recent occurrences of the drives going down. Usually there is only one errors file.

        ...netbackup\db\media\errors

        Please also create the ..netbackup\logs\bptm folder.

        Via the GUI > Host Properties > Logs, you can set the bptm verbose level to 5.

        Create these folders:

        <install path>\veritas\volmgr\debug\tpcommand

        <install path>\veritas\volmgr\debug\robots

        Add the word VERBOSE to the file <install path>\veritas\volmgr\vm.conf

        Create these empty files:

        <install path>\veritas\volmgr\DRIVE_DEBUG

        <install path>\veritas\volmgr\ROBOT_DEBUG

        (Make sure Windows does not add a file suffix.)

        Restart NBU

        Wait for the issue to recur, then collect the above logs, along with the Activity Monitor details showing the issue for the job.
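        The folder, file and vm.conf steps above could be scripted. A minimal Python sketch, assuming you pass in the two directories yourself: `volmgr_dir` is the `<install path>\veritas\volmgr` folder and `netbackup_dir` is the NetBackup folder that contains `logs`. The function name `enable_debug_logging` is hypothetical; it does not set the bptm verbose level or restart NBU, so do those as described above.

```python
from pathlib import Path


def enable_debug_logging(netbackup_dir, volmgr_dir):
    """Create the debug folders and touch files, and add VERBOSE to vm.conf."""
    nb = Path(netbackup_dir)
    vm = Path(volmgr_dir)

    # Log folders; legacy logs are only written if their folder exists.
    for folder in (nb / "logs" / "bptm",
                   vm / "debug" / "tpcommand",
                   vm / "debug" / "robots"):
        folder.mkdir(parents=True, exist_ok=True)

    # Empty files with no file suffix.
    for name in ("DRIVE_DEBUG", "ROBOT_DEBUG"):
        (vm / name).touch(exist_ok=True)

    # Append VERBOSE to vm.conf unless a VERBOSE line is already present.
    vm_conf = vm / "vm.conf"
    text = vm_conf.read_text() if vm_conf.exists() else ""
    if "VERBOSE" not in (line.strip() for line in text.splitlines()):
        with vm_conf.open("a") as f:
            if text and not text.endswith("\n"):
                f.write("\n")  # keep the existing last entry on its own line
            f.write("VERBOSE\n")


# Usage (paths are placeholders for your own install):
# enable_debug_logging(r"<NetBackup folder>", r"<install path>\veritas\volmgr")
```

        Running it twice is harmless: existing folders and files are left alone and VERBOSE is only added once.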

  • Can you show us the [install_path]/netbackup/db/media/errors file?

    • AmirJabran81
      Level 3

      Hello!

      There are a number of files; which one should I send, the latest one?