Jason_Gauthier
10 years ago

Drive failed in library. Jobs queued

Hello,

 

I had a tape drive fail in my robotic library.  There are (were) two drives.  I am getting a replacement, but in the meantime, BE will not back up.

All drives stay at queued, and inventory fails showing "Incorrect function" on 24 of the 48 slots.

What I've done:

  Attempted to delete any occurrence of the failed device, in addition to reboots, etc.

 

I'm not sure what else to try at this point.  I do not have any bad or missing devices.

  • It sounds like the drive has failed, but BE isn't aware of it failing and has not taken the drive offline?  That could lead to some condition where BE is continuously attempting to use the failed drive and is stuck in an infinite loop.  What are the symptoms of the drive failure?  What tape library?

    I would suggest setting the failed drive to a state of paused or disabled in the BE GUI, then BE should run jobs using the remaining good drive.  This is assuming that jobs are targeted against a device pool or the robot itself, rather than directly against the failed drive.

  • The symptoms of the drive failure are a lack of link on the FC cable, and the tape library reporting it as missing.

    We removed the physical drive to get a replacement.  The tape library is a Dell PVT132.

    I don't know how to set the drive to a failed state/paused/disabled in the GUI at this point. As I mentioned in my original post the failed drive is no longer present there. I have completely removed all indications of its existence!

    All drives are pointed at the pool. Still no luck.

     

  • What happens if you point the backup directly to the functioning tape drive instead of the pool?

  • Yup, I've tried that too.  Just goes into "infinite queue". I've tried:

    * The original pool

    * A new pool

    * A slot directly

    * The drive library

    * A specific drive

    I have no idea why it's queuing. Isn't there some place I can figure that out?

     

     

  • Please post your adamm.log. 

    Can you run an inventory of 1 slot?  Your statement of "All drives stay at queued, and inventory fails showing "Incorrect function" on 24 of the 48 slots." is confusing to me as I thought that the PV132T only had 24 slots maximum.

    Is your BE 2010 up to R3 and fully patched? 

    This technote may help, although it seems opposite of your situation.     

    http://www.symantec.com/docs/TECH63380

  • Ugh. My old tape drive is a PV132T. My current drive is a PV TL4000.  Sorry for that confusion!

    Posted my adamm.log. My 2010 is at R3, but I don't know if it has any supplemental updates.

  • This part of the adamm.log shows that the library still thinks that it should have 2 tape drives, which I think means that BE should be showing a "missing device" in the GUI.  If you power cycle the TL4000 library, it should become aware that it only has one drive; then, when BE services are cycled, it could show up as a proper one-drive library in the BE GUI.  But this may be one of those weird libraries that take a factory default to get it to realize that a drive is gone.

     

    0001:0000:0003:0001  Device Name             "\\.\MediumChanger1"
                         Primary Inquiry         "IBM     3573-TL         B.60"
                         Serial Number           "IBM     3573-TL         00X4U78H7065_LL0"
                         Device Flags            UMD, SCSI, SN(TYPE 0), SN(ELEMENT)
                         Device State            3, Online

                         Device IDs              1016, {D582C147-82C9-4F45-92EC-4B02FEC13E5B}
                         Device Name             "TL 4000"
                         Device Type             2131755008, "CHANGER FS=1"
                         Device Features         0x00006000: RMP,RRD
                         1st Slot Number         1
                         Number Of Slots         46
                         Portal Slots            0
                         Import/Export           Manual

                         Drive Element 0         1048, "IBM     ULT3580-TD4     1310261465"
                         Drive Element 1         1053, "MISSING DEVICE          MISSING_DRV_IN_LDR_90"

     

    But, BE should not prevent you from using the good drive, even if one drive is officially missing.

    Are you using SSO?  If you have multiple servers seeing the tape library over FC, please cycle the BE services on all of them to make sure they have the correct/current hardware view.
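
For anyone checking their own adamm.log for the same condition, the "MISSING DEVICE" drive element in the excerpt above can be spotted with a short script. This is only a rough sketch: the line layout is assumed from the excerpt in this thread and may differ between BE versions, and the names `DRIVE_ELEMENT`, `find_drive_elements`, and `SAMPLE` are made up for illustration and are not part of Backup Exec.

```python
import re

# Assumed layout, taken only from the excerpt in this thread:
#   Drive Element 1         1053, "MISSING DEVICE          MISSING_DRV_IN_LDR_90"
DRIVE_ELEMENT = re.compile(r'Drive Element\s+(\d+)\s+(\d+),\s+"([^"]*)"')


def find_drive_elements(log_text):
    """Return (element_index, device_id, inquiry_string, is_missing) tuples."""
    results = []
    for match in DRIVE_ELEMENT.finditer(log_text):
        index, device_id, inquiry = match.groups()
        # A drive the library expects but cannot find is reported as "MISSING DEVICE".
        is_missing = inquiry.startswith("MISSING DEVICE")
        results.append((int(index), int(device_id), inquiry, is_missing))
    return results


# Two lines copied from the adamm.log excerpt posted above.
SAMPLE = '''
    Drive Element 0         1048, "IBM     ULT3580-TD4     1310261465"
    Drive Element 1         1053, "MISSING DEVICE          MISSING_DRV_IN_LDR_90"
'''

if __name__ == "__main__":
    for index, device_id, inquiry, missing in find_drive_elements(SAMPLE):
        status = "MISSING" if missing else "present"
        print(f"Drive Element {index} (device ID {device_id}): {status}")
```

To run it against a real log, read the file into a string and pass it to `find_drive_elements` in place of `SAMPLE`. The location of adamm.log depends on the installation, so it is not hard-coded here.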

  • Even after numerous power cycles, the library would not stop thinking it had two drives.

    I don't know what trick, other than that, could be used to make it think otherwise.

    I received the replacement drive today, and put it in.

    After power cycles, driver installs, and reboots, backups work cleanly again.

     

    So, the problem is resolved with the hardware replacement, but I sure would have liked to figure out why the library thought it was missing a drive.

  • Did you do a factory default on the library?  That would have made it realize that it only had one drive.