03-13-2012 11:39 PM
Hi
The problem is as follows: when the server boots, or after it is rebooted, the robot drops off.
If you run "sgscan", nothing is listed except the tape drives.
At the device level the server does see the robot (its WWN and the adapter it is attached to are visible).
If you run the command "luxadm fcode_download -p", the robot appears and works without any complaints until the next reboot.
We have updated the HBA drivers and reinstalled the latest version of Solaris, all without any change.
If a console cable is connected to the server at the moment the robot drops off, we see the following:
“tldd[1150]: TLD(0) unavailable: initialization failed: Unable to open robotic path”
Server: Sun T5240
OS: Solaris Sparc
Tape Library: SL500
NetBackup 7.1
All fibre connections go through a Brocade 5300 switch
03-14-2012 01:29 AM
If you configured your host as MPxIO enabled, you have to disable it for SL500.
MPxIO supports disk devices only. If you connect tape devices through multiple ports, they become unstable.
MPxIO can be disabled on a per port basis. For more detail, check /kernel/drv/fp.conf on your host.
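As a rough sketch of what a per-port entry in /kernel/drv/fp.conf can look like: the parent path and port number below are assumptions taken from the device path posted later in this thread (/devices/pci@400/pci@0/pci@c/SUNW,qlc@0/fp@0,0/...), so verify them against your own host before using them. A reboot is normally needed for the change to take effect.

```
# /kernel/drv/fp.conf (fragment, illustrative only)

# Global setting: leave multipathing enabled for the other FC ports.
mpxio-disable="no";

# Per-port override: disable MPxIO on the HBA port that the SL500 is zoned to.
# The parent path and port are ASSUMED from the /devices path in this thread.
name="fp" parent="/pci@400/pci@0/pci@c/SUNW,qlc@0" port=0 mpxio-disable="yes";
```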
03-14-2012 02:07 AM
Just reread the post ...
"If you enter the command "luxadm fcode_download -p", the robot appears and works without any complaints until the next reboot."
I am not sure exactly what this does, but I suspect you need to speak with support for the HBA vendor.
Martin
03-14-2012 07:59 AM
Check /var/adm/messages for errors.
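One way to narrow the file down to the robotic daemon errors is sketched below. It assumes a POSIX shell and that the daemons log as tldd and tldcd at daemon.error level; robot_errors is just a hypothetical helper name, not a NetBackup command.

```shell
# Print only the error-level syslog entries written by the NetBackup
# robotic daemons (tldd and tldcd). Reads syslog text on stdin.
robot_errors() {
    # 'tldc*d\[' is a basic regex that matches both "tldd[" and "tldcd["
    grep 'tldc*d\[' | grep 'daemon\.error'
}

# Usage on the server:
#   robot_errors < /var/adm/messages
```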
03-15-2012 05:40 AM
Hi Marianne
Here is the "messages" file.
03-15-2012 05:50 AM
This looks like a problem ...
Mar 13 11:36:40 bkp51-spb tldd[1140]: [ID 136913 daemon.notice] DecodeQuery() Actual status: Unable to open robotic path
Mar 13 11:36:40 bkp51-spb tldd[1140]: [ID 795118 daemon.error] TLD(0) unavailable: initialization failed: Unable to open robotic path
Mar 13 11:38:42 bkp51-spb tldcd[1213]: [ID 279486 daemon.notice] TLD(0) opening robotic path /dev/sg/c0tw500104f000ba5ecbl0
Mar 13 11:38:42 bkp51-spb tldcd[1213]: [ID 295976 daemon.error] TLD(0) [1213] robotic path /dev/sg/c0tw500104f000ba5ecbl0 is not a character device
this file is only a link to the raw device in /devices
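The "is not a character device" message can also be checked by hand from the shell. A minimal sketch, assuming a POSIX shell; check_chardev is a hypothetical helper name. Note that test -c follows the symbolic link, so the result describes the raw node under /devices, not the link itself.

```shell
# Report whether a path (or the target of a symlink) is a character device.
# Returns 0 if it is, 1 if it is not or does not exist.
check_chardev() {
    path=$1
    if [ -c "$path" ]; then
        echo "character device: $path"
    else
        echo "NOT a character device: $path"
        return 1
    fi
}

# Usage with the robotic path from the log:
#   check_chardev /dev/sg/c0tw500104f000ba5ecbl0
```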
*****************************************************************************************
When the library is NOT working, try this :
In this test DO NOT run luxadm commands
(1)
Run scan, prove it does not work
ls -al /dev/sg/c0tw500104f000ba5ecbl0
You will see the actual file this refers to ...
EXAMPLE:
03-15-2012 07:07 AM
Try to clean out the sg configuration and rebuild it. Use the steps below, and make sure that only tape devices are zoned to the HBA port where your changer is.
http://seer.support.veritas.com/docs/266501.htm
Regards.
03-15-2012 07:40 AM
Which HBA? Which HBA driver?
Confirm that persistent binding is in place in HBA driver config file.
03-15-2012 08:01 AM
Step 1
Immediately before reboot, luxadm fcode_download -p NOT run, sgscan does not show the robot
ls -al /dev/sg/c0tw500104f000ba5ecbl0
lrwxrwxrwx 1 root root 78 марта 14 14:15 /dev/sg/c0tw500104f000ba5ecbl0 -> ../../devices/pci@400/pci@0/pci@c/SUNW,qlc@0/fp@0,0/sg@w500104f000ba5ecb,0:raw
Step 2
/opt/openv/volmgr/bin/scsi_command -d /devices/pci@400/pci@0/pci@c/SUNW,qlc@0/fp@0,0/sg@w500104f000ba5ecb,0:raw
user scsi ioctl() failed, may be timeout, errno = 25, Inappropriate ioctl for device
inquiry failed (Inappropriate ioctl for device)
No, the library does not respond (((
03-15-2012 08:02 AM
Already tried the rebuild, it did not work (((
03-15-2012 08:38 AM
HBA - QLE2460
driver - from NetBackup 7 (sg)
Marianne, please explain this phrase:
Confirm that persistent binding is in place in HBA driver config file.
03-15-2012 02:39 PM
Persistent binding ensures that the OS assigns the SAME device name/path to devices when system is rebooted. Without persistent binding, it is possible that devices are scanned in different order and different device paths/names will be assigned.
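A simple way to see whether device paths actually move across a reboot is to record the links under /dev/sg before and compare afterwards. This is only a sketch, assuming a POSIX shell; list_links and the /var/tmp file name are hypothetical choices for illustration.

```shell
# Print "name -> target" for every symbolic link in a directory,
# in the sorted order produced by the shell glob.
list_links() {
    dir=$1
    for f in "$dir"/*; do
        [ -h "$f" ] || continue
        # ls -l ends with "path -> target"; keep only the part after "-> "
        target=$(ls -l "$f" | sed 's/.* -> //')
        echo "$(basename "$f") -> $target"
    done
}

# Before the reboot:
#   list_links /dev/sg > /var/tmp/sg_links.before
# After the reboot (no output from diff means the paths did not change):
#   list_links /dev/sg | diff /var/tmp/sg_links.before -
```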
We can see in messages that Qlogic HBA is using driver version:
Qlogic qlc(0) FCA Driver v20110321-3.05
and firmware:
qlc(0): Firmware version 5.4.3
First of all, check with your Oracle/SUN support team whether the driver and firmware are up to date.
Next, find the config file for your HBAs in /kernel/drv/sparcv9 or /kernel/drv - e.g. qlc.conf
You will find explanation and sample entries for persistent binding. Best to find the documentation for your particular HBA model, e.g. http://filedownloads.qlogic.com/files/driver/35393/QLA2300_Fibre_Channel_Drivers_for_Solaris_SPARC_readme.htm
03-15-2012 03:42 PM
Hmm, I can't get my head round how this could be a persistent binding problem.
Let me explain why:
As Marianne has pointed out, if it was a persistent binding problem, this would mean the path to the devices could change. For example, if the robot path was:
/dev/path1
it could change at the os level to /dev/path2
If this was the case, the fix, or workaround, would be to reconfigure the devices at the NBU level, that is, rebuild the sg driver, and then run device wizard.
But, in this case, to fix, we run
"The problem is as follows: when the server boots, or after it is rebooted, the robot drops off.
If you run "sgscan", nothing is listed except the tape drives.
At the device level the server does see the robot (its WWN and the adapter it is attached to are visible).
If you run the command "luxadm fcode_download -p", the robot appears and works without any complaints until the next reboot."
Also - “tldd[1150]: TLD(0) unavailable: initialization failed: Unable to open robotic path”
... this path issue does look like a persistent binding issue, I agree, but the fix that gets the robot working is not what I would expect.
I think an important part of understanding this is to understand exactly what
"luxadm fcode_download -p" is doing.
So, looking at the man page for luxadm ...
fcode_download
Locate the installed FC/S, FC100/S, FC100/P, or FC100/2P host bus adapter cards and download the FCode files in dir-name to the appropriate cards. The command determines the correct card for each type of file, and is interactive. User confirmation is required before downloading the FCode to each device. Use fcode_download to load FCode only in single-user mode. Using fcode_download to update a host adapter while there is I/O activity through that adapter causes the adapter to reset. Newly updated FCode will not be executed or visible until a system reboot.
-p
Prints the current version of FCode loaded on each card. No download is performed.
So this command seems to cause a reset (as well as printing the FCode version). Is it this that is 'fixing' the issue? It's just an idea, happy to be told I'm wrong ...
Regards,
Martin
03-16-2012 03:28 AM
Reboot the robot.
Had a very similar problem recently, indeed with a T series Solaris box, with fibre to my drives and my robot... also an SL500. Running Solaris 10, NetBackup 7.1.
The master/media server got rebooted for whatever reason and the robot just didn't want to talk. It was fine once the robot was rebooted. Hard to explain. Sun's response wasn't exactly convincing, as I've rebooted media servers before and since without needing to bounce the robot.
Let me know how you get on.
Jim
03-16-2012 06:46 AM
I've discussed this with Marianne.
We have ruled out a persistent binding issue - the log shows that the robot starts working again on the same path.
We can also rule out an sg issue.
We can also now rule out NetBackup. If this were a NetBackup issue, we would have to make 'some fix' in NetBackup to make it work. We don't - we run luxadm.
I accept that scan is failing, and yes, scan is a Symantec-supplied command, but it has nothing to do with NetBackup. All it does is send SCSI commands to the devices in the operating system. If they are not 'contactable' then scan will fail.
Also, we ran scsi_command against the raw device file. We now know that the device file is correct (correct path), as it can be made to work, so why is this failing? Again, scsi_command is a Symantec-supplied command, but once again it has nothing to do with NBU. It just sends industry-standard commands to the device specified, which, as we see, fails to respond until an operating system command is run.
The key to this problem is why running the luxadm command fixes the issue. What is it doing to the HBA that allows it to work - WITH NO OTHER CHANGES MADE?
From the man page, it seems that this causes a reset. Perhaps that is it - is there a fault on the HBA, perhaps?
I think that to investigate this further you need to contact your hardware support.
There is no evidence that this is NBU, and I believe I have now proved 100% that it is not.
Regards,
Martin