Forum Discussion

Calzor_Suzay
13 years ago
Solved

NBU / OS Not controlling Robotics since library upgrade

We upgraded our library from an ADIC i2k to a Quantum i6k, and the associated drives from LTO3 to LTO5.

All looks good except that the robotics are no longer controllable and the cfgadm command reports a "failing" condition.
We're running Solaris 10 SPARC and NBU 7.1.0.2.

 #  cfgadm -al -o show_FCP_dev
Ap_Id                          Type         Receptacle   Occupant     Condition
c3                             fc-fabric    connected    configured   unknown
c3::500308c001d41002,0         tape         connected    configured   unknown
c3::500308c001d41002,1         unavailable  connected    configured   failing
c3::500308c001d41008,0         tape         connected    configured   unknown
c3::500308c001d4100e,0         tape         connected    configured   unknown
c3::500308c001d41014,0         tape         connected    configured   unknown

The drives are directly attached to a Brocade SAN switch, and the control path is set on drive 1, as shown above.

Ideas?

6 Replies

  • If cfgadm is failing, you have an issue between the OS and the library.

    Nothing can be done from NBU, as we're not at that level.

    As always, it could be worth a power cycle of everything ...

    robot, server, switch, passing dog etc ...

    Could also be worth a 'google'

    This looks promising ...

    http://xteams.oit.ncsu.edu/iso/lun_removal

    http://itmerlinllc.wordpress.com/2010/08/24/fixing-cfgadm-errors-without-reboot/

    Marianne will be along shortly, she's pretty damn good with Quantum kit I do believe ...

    If the above fails, time to call back the hardware / SAN team I think.

    Martin
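
A quick way to confirm Martin's point that the fault sits at the OS level is to filter the cfgadm listing for any attachment point whose Condition column is something other than "ok" or "unknown". The sketch below is illustrative only: the function names are hypothetical, and it runs against the sample output pasted in the question so it can be tried anywhere. On a live Solaris host you would pipe `cfgadm -al -o show_FCP_dev` into the same filter.

```shell
# Sample output copied from the original post.
sample_output() {
cat <<'EOF'
Ap_Id                          Type         Receptacle   Occupant     Condition
c3                             fc-fabric    connected    configured   unknown
c3::500308c001d41002,0         tape         connected    configured   unknown
c3::500308c001d41002,1         unavailable  connected    configured   failing
c3::500308c001d41008,0         tape         connected    configured   unknown
c3::500308c001d4100e,0         tape         connected    configured   unknown
c3::500308c001d41014,0         tape         connected    configured   unknown
EOF
}

# Skip the header row, then print the Ap_Id (field 1) of every row whose
# Condition (field 5) is neither "ok" nor "unknown".
find_bad_ap_ids() {
    awk 'NR > 1 && $5 != "ok" && $5 != "unknown" { print $1 }'
}

sample_output | find_bad_ap_ids
```

Against the sample this prints `c3::500308c001d41002,1`, which is LUN 1 behind the first drive, i.e. the robotic control path.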

  • Martin has more faith in my abilities than I do...
    I've been looking at this post, thinking - we need Martin here!

    The closest I could find was this: http://docs.oracle.com/cd/E19963-01/html/821-1459/devconfig2-8.html

    Look at this topic:


    How to Configure a SCSI Device


    1. c1::dsk/c1t4d0       unavailable   connected    unconfigured unknown
    2. Configure the SCSI device.
      # cfgadm -c configure c1::dsk/c1t4d0

    I would try something similar - just med-changer instead of dsk.

    If that does not work, you need to get back to basics. As Martin pointed out: somewhere 'between the OS and the library'.
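
For anyone wanting to try Marianne's suggestion on this setup, here is a minimal dry-run sketch. It assumes that on a fabric-attached controller the configure operation takes the port-level Ap_Id (controller plus WWN, without the `,LUN` suffix that `show_FCP_dev` adds); check that against cfgadm_fp(1M) on your own system. The helper names are hypothetical, and the script only prints the command rather than running it.

```shell
# Hypothetical helper: strip the ",LUN" suffix from a LUN-level Ap_Id,
# e.g. "c3::500308c001d41002,1" becomes "c3::500308c001d41002".
port_ap_id() {
    printf '%s\n' "$1" | sed 's/,.*$//'
}

# Dry run only: print the cfgadm command instead of executing it.
configure_cmd() {
    printf 'cfgadm -c configure %s\n' "$(port_ap_id "$1")"
}

configure_cmd 'c3::500308c001d41002,1'
```

Review the printed command before running it as root, since reconfiguring the port affects every LUN behind it, including the tape drive itself.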

  • Oh my, we're in trouble if we are each hoping the other will turn up ...

    Seem to recall I had similar on my test server - as the drive(s) are SCSI connected there is no SAN - I think I just played about with cfgadm commands till I found one that worked ...  If I remember correctly, I would have totally removed the devices from the OS and run a reconfig reboot.  Sometimes it is just quicker to take the heavy-handed method.

    I also recall a very, very rare case where the SAN-connected devices were visible in cfgadm (can't remember in what state) and scan worked - this only showed that the devices responded to a SCSI inquiry command (scan is NOT an NBU command as such, it only sends SCSI commands).  Despite this, NBU would not work at all with the devices: they could be configured (as scan worked, the device config worked) but they could not be used.  Given that it is the OS that actually writes to and reads from the drives, it wasn't likely that NBU was the cause - it turned out that it was actually something to do with the SAN that was causing the issue (never did find out exactly what).  Personally I suspect it was a firmware issue somewhere, probably the HBA.

    Anyhow, I only mention this to demonstrate no matter what you see, the cause is not always where you think it is.

    What is certain is that it is not an NBU issue (sorry ...).  If you removed NBU (not suggesting that you do ...) you would still have the cfgadm command, and it still wouldn't 'work'.

    Hope this helps,

    Martin

  • There is also a possibility that the HBA being used does not like the multiple LUNs coming into it - sometimes a symptom with Quantum libraries where the robotics pass through the drive, so the single connection shares the LUNs.

    Worth taking a look at the HBA BIOS, or checking whether you are using any sort of controlling software for the HBA that may be preventing the binding.

    Hope this helps

  • Check the library to verify it is happy.  Then switch robotic control to another drive and see if the problem follows. 

  • The problem was eventually traced to the control path option in the library, which was assigned to drive 1. We moved the control path to drive 2 and it all started working. We're not sure whether the drive is faulty as a control path, as we've not had a chance to take it out of production yet.

    It was a pain checking everything several times over, and it simply came down to a duff drive in the end.