Problems with the robot

evgheny_c
Level 3

Hi

The problem is as follows: when the server boots, or after it is rebooted, the robot drops off.
If I run "sgscan", the list shows only the tape drives and nothing else.
At the device level the server does see the robot (its WWN and the adapter it is attached to are visible).
If I run the command "luxadm fcode_download -p", the robot appears and works without any complaints until the next reboot.
I have updated the HBA drivers and reinstalled the latest version of Solaris, all without any change.
With a console cable connected to the server at the moment the robot drops off, we see the following:

“tldd[1150]: TLD(0) unavailable: initialization failed: Unable to open robotic path”

 

Server: Sun T5240

OS: Solaris Sparc

Tape Library: SL500

Netbackup 7.1

All fibre connections go through a Brocade 5300 switch.

14 REPLIES

Yasuhisa_Ishika
Level 6
Partner Accredited Certified

If your host is configured with MPxIO enabled, you have to disable it for the SL500.
MPxIO supports disk devices only. If you connect tape devices through multiple ports, they become unstable.

MPxIO can be disabled on a per port basis. For more detail, check /kernel/drv/fp.conf on your host.
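
A quick way to see what is currently set is to pull the active (uncommented) MPxIO lines out of that file. This is only a sketch: the function name `show_mpxio_settings` is made up for illustration, and it assumes the property is spelled `mpxio-disable` and that comment lines start with `#`, so check it against your own fp.conf.

```shell
# Hypothetical helper: list uncommented mpxio-disable entries.
# Assumes the property name "mpxio-disable" and "#" comment lines.
show_mpxio_settings() {
    # $1 = config file to read; defaults to the path named above
    conf=${1:-/kernel/drv/fp.conf}
    # keep lines mentioning the property, then drop commented-out ones
    grep 'mpxio-disable' "$conf" | grep -v '^#'
}
```

Called with no argument it reads /kernel/drv/fp.conf; pass another path to inspect a saved copy.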

mph999
Level 6
Employee Accredited

Just reread the post ...

If you enter the command "luxadm fcode_download -p", the robot appears and works without any complaints until the next reboot.

I am not sure exactly what this does, but I suspect you need to speak with support for the HBA vendor.

Martin

Marianne
Level 6
Partner    VIP    Accredited Certified

Check /var/adm/messages for errors.
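
If the file is large, filtering for the robot daemons helps. A minimal sketch, assuming the daemons log under the names tldd and tldcd as in the console message quoted in this thread; the function name `robot_errors` is hypothetical.

```shell
# Hypothetical helper: print tldd/tldcd lines from a syslog-style file.
robot_errors() {
    # $1 = log file; defaults to the Solaris system log
    log=${1:-/var/adm/messages}
    # basic regex: "tld", optional "c", then "d[" (matches tldd[ and tldcd[)
    grep 'tldc*d\[' "$log"
}
```

Run `robot_errors` on its own for the live log, or give it a copy of the file as the first argument.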

evgheny_c
Level 3

Hi Marianne

This is the "messages" file.

mph999
Level 6
Employee Accredited

This looks like a problem ...

 

Mar 13 11:36:40 bkp51-spb tldd[1140]: [ID 136913 daemon.notice] DecodeQuery() Actual status: Unable to open robotic path
Mar 13 11:36:40 bkp51-spb tldd[1140]: [ID 795118 daemon.error] TLD(0) unavailable: initialization failed: Unable to open robotic path
Mar 13 11:38:42 bkp51-spb tldcd[1213]: [ID 279486 daemon.notice] TLD(0) opening robotic path /dev/sg/c0tw500104f000ba5ecbl0
Mar 13 11:38:42 bkp51-spb tldcd[1213]: [ID 295976 daemon.error] TLD(0) [1213] robotic path /dev/sg/c0tw500104f000ba5ecbl0 is not a character device

This file is only a link to the raw device under /devices.

*****************************************************************************************

 

When the library is NOT working, try this :

In this test DO NOT run luxadm commands

(1)

Run scan, prove it does not work

ls -al /dev/sg/c0tw500104f000ba5ecbl0

You will see the actual file this refers to ...

EXAMPLE:

 

lrwxrwxrwx   1 root     root          73 Mar  2 16:06 /dev/sg/c1t5l0 -> ../../devices/pci@1e,600000/pci@0/pci@9/pci@0,2/pci@1/scsi@2,1/sg@5,0:raw
 
(2)
 
Run scsi_command on this file :
 
EXAMPLE:
 
/usr/openv/volmgr/bin/scsi_command
root@womble 031412 $ /usr/openv/volmgr/bin/scsi_command -d /devices/pci@1e,600000/pci@0/pci@9/pci@0,2/pci@1/scsi@2,1/sg@5,0:raw
Inquiry data: removable dev type 1h HP      C5713A          H107
 
 
Does your library respond ?
 
(3)  Run scan again, does this now work ???
 
I have seen a very similar case: scan fails to work until you run scsi_command on the raw file, and once you do, scan works. Very strange; it looks like some OS issue, but the cause is unknown as yet.
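
Step (1) can also be scripted so the link target and the device type are checked in one go. This is just a sketch: `check_sg_path` is a made-up name, and it is an assumption that a plain `-c` test matches whatever check tldcd does before logging "is not a character device".

```shell
# Hypothetical helper: show where a /dev/sg link points and whether the
# target is a character device (assumed to mirror the tldcd complaint).
check_sg_path() {
    path=$1
    ls -l "$path"                 # prints the link and the file it refers to
    if [ -c "$path" ]; then       # -c follows the symlink to its target
        echo "character device: $path"
    else
        echo "NOT a character device: $path"
        return 1
    fi
}
```

For example, `check_sg_path /dev/sg/c0tw500104f000ba5ecbl0` while the library is not working, and again after it starts responding.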
 
Regards,
 
martin

Omar_Villa
Level 6
Employee

Try cleaning up the sg driver and rebuilding it. Use the steps in the document below, and make sure that only tape devices are zoned to the HBA port where your changer is.

 

 http://seer.support.veritas.com/docs/266501.htm 

 

Regards.

Marianne
Level 6
Partner    VIP    Accredited Certified

Which HBA? Which HBA driver?

Confirm that persistent binding is in place in HBA driver config file.

evgheny_c
Level 3

Step 1

Immediately before reboot, with luxadm fcode_download -p NOT run, sgscan does not show the robot

ls -al /dev/sg/c0tw500104f000ba5ecbl0
lrwxrwxrwx 1 root root 78 Mar 14 14:15 /dev/sg/c0tw500104f000ba5ecbl0 -> ../../devices/pci@400/pci@0/pci@c/SUNW,qlc@0/fp@0,0/sg@w500104f000ba5ecb,0:raw

 

Step 2

/opt/openv/volmgr/bin/scsi_command -d /devices/pci@400/pci@0/pci@c/SUNW,qlc@0/fp@0,0/sg@w500104f000ba5ecb,0:raw
user scsi ioctl() failed, may be timeout, errno = 25, Inappropriate ioctl for device
inquiry failed (Inappropriate ioctl for device)

 

No, the library does not respond (((

 

evgheny_c
Level 3

I already tried a rebuild; it did not work (((

evgheny_c
Level 3

HBA - QLE2460

driver - from NetBackup 7 (sg)

Marianne, please explain this phrase:

Confirm that persistent binding is in place in HBA driver config file.

Marianne
Level 6
Partner    VIP    Accredited Certified

Persistent binding ensures that the OS assigns the SAME device name/path to devices when system is rebooted. Without persistent binding, it is possible that devices are scanned in different order and different device paths/names will be assigned.
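
One way to see whether paths really move between reboots is to record the /dev/sg links before and after and compare them. A rough sketch only: `snapshot_sg_links` is a hypothetical name, and it assumes the entries are symlinks with no spaces in their names, as in the listings in this thread.

```shell
# Hypothetical helper: print "name -> target" for every symlink in a
# directory, sorted, so two snapshots can be compared with diff.
snapshot_sg_links() {
    # $1 = directory to list; defaults to /dev/sg
    dir=${1:-/dev/sg}
    # keep only symlink lines and print their last three fields
    ls -l "$dir" | awk '/->/ {print $(NF-2), $(NF-1), $NF}' | sort
}
```

Redirect the output to a file before the reboot and to a second file afterwards, then run `diff` on the two; no differences means the device paths did not change.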

We can see in messages that Qlogic HBA is using driver version:

Qlogic qlc(0) FCA Driver v20110321-3.05

and firmware:

qlc(0): Firmware version 5.4.3

First of all, check with your Oracle/SUN support team whether the driver and firmware are up-to-date.

Next, find the config file for your HBAs in /kernel/drv/sparcv9 or /kernel/drv - e.g. qlc.conf

You will find explanation and sample entries for persistent binding. Best to find the documentation for your particular HBA model, e.g. http://filedownloads.qlogic.com/files/driver/35393/QLA2300_Fibre_Channel_Drivers_for_Solaris_SPARC_readme.htm

mph999
Level 6
Employee Accredited

Hmm, I can't get my head round how this could be a persistent binding problem.

Let me explain why:

As Marianne has pointed out, if it were a persistent binding problem, this would mean the path to the devices could change. For example, if the robot path was:

/dev/path1

it could change at the OS level to /dev/path2

If this was the case, the fix, or workaround, would be to reconfigure the devices at the NBU level, that is, rebuild the sg driver, and then run device wizard.

But, in this case, to fix it, we run luxadm. From the original post:

 "The problem in the next, when loading or after rebooting the server, the robot falls off.

If you type "sgscan" in the list, except for tape drivers, nothing.
As a device, the server sees the robot (see his wwn, and which adapter goes).
If you enter the command "luxadm fcode_download -p", the robot appears and works without any complaints until the next reboot."

 

Also - “tldd[1150]: TLD(0) unavailable: initialization failed: Unable to open robotic path”

 

.. this path issue does look like a persistent binding issue, I agree, but the fix that gets the robot working is not what I would expect.

I think an important part of understanding this is to understand exactly what

"luxadm fcode_download -p" is doing.

 

So, looking at the man page for luxadm ...

 

fcode_download
           Locate the installed FC/S, FC100/S, FC100/P, or FC100/2P host bus adapter cards and download the FCode files in dir-name to the appropriate cards. The command determines the correct card for each type of file, and is interactive. User confirmation is required before downloading the FCode to each device.

           Use fcode_download to load FCode only in single-user mode. Using fcode_download to update a host adapter while there is I/O activity through that adapter causes the adapter to reset. Newly updated FCode will not be executed or visible until a system reboot.

           -p    Prints the current version of FCode loaded on each card. No download is performed.

So this command seems to cause a reset (as well as print the FCode version) - is it this that is 'fixing' the issue? It's just an idea, happy to be told I'm wrong ...

Regards,

Martin

jim_dalton
Level 6

Reboot the robot.

Had a very similar problem recently, indeed with a T-series Solaris box, with fibre to my drives and my robot, which is also an SL500. Running Solaris 10, NetBackup 7.1.

The master/media server got rebooted for whatever reason and the robot just didn't want to talk. It was fine once the robot was rebooted. Hard to explain. Sun's response wasn't exactly convincing, as I've rebooted media servers before and since without needing to bounce the robot.

Let me know how you get on.

Jim

mph999
Level 6
Employee Accredited

I've discussed this with Marianne.

We have ruled out a persistent binding issue - the log shows that the robot starts working again on the same path.

We can also rule out an sg issue.

We can also now rule out NetBackup. If this were a NetBackup issue, we would have to make 'some fix' in NetBackup to make it work, and we don't - we run luxadm.

I accept that scan is failing, and yes, scan is a Symantec-supplied command - but it has nothing to do with NetBackup - all it does is send SCSI commands to the devices in the operating system. If they are not 'contactable' then scan will fail.

Also, we ran scsi_command against the raw device file. We now know that the device file is correct (correct path), as it can be made to work - so why is this failing? Again, scsi_command is a Symantec-supplied command, but once again it has nothing to do with NBU - it just sends industry-standard commands to the device specified, which we see fails to respond, that is, until an operating system command is run.

The key to this problem is why running the luxadm command fixes the issue - what is that doing to the HBA that allows it to work - WITH NO OTHER CHANGES MADE.

From the man page, it seems that this causes a reset, perhaps that is it - is there a fault on the HBA perhaps ?  

I think to investigate this further you need to contact your hardware support.

There is no evidence that this is NBU, and I believe I have now proved 100% that it is not.

Regards,

 

Martin