01-08-2013 04:24 PM
Hello. I'm having some issues with my tape drive (Sun StorageTek SL48). The master server is on Solaris 10, running NB 7.5.0.3; the media server is on the same box.
My attached tape drive has been throwing 2009 errors. After running a couple of commands earlier, I rebooted my library through the Sun admin GUI. Ever since, I can no longer connect to this robot through NB; I get the error stated above.
The tape drive is ready and online. I restarted ltid (/usr/openv/volmgr/bin/ltid) and am still having issues. This happened after running tpautoconf -t, tpautoconf -r, and scan -changer, followed by a (software) reboot of the library.
So, what on earth did I do to my tape drive? Why can't NB see it anymore? I've rebooted the library in my other environment with no issues.
Per my /var/adm/messages:
olmasterback emlxs: [ID 349649 kern.info] [ 5.04B4]emlxs1: NOTICE: 720: Link up. (4Gb, loop, initiator)
Jan 8 19:15:04 nbolmasterback emlxs: [ID 349649 kern.info] [ 5.0554]emlxs1: NOTICE: 730: Link reset.
Jan 8 19:15:04 nbolmasterback tldcd[3163]: [ID 832037 daemon.error] scsi command failed, may be timeout, scsi_pkt.us_reason = 3
Jan 8 19:15:04 nbolmasterback tldcd[3163]: [ID 415613 daemon.error] TLD(0) mode_sense ioctl() failed: Error 0
Jan 8 19:15:04 nbolmasterback emlxs: [ID 349649 kern.info] [ 5.02D7]emlxs1: NOTICE: 710: Link down.
Jan 8 19:15:07 nbolmasterback emlxs: [ID 349649 kern.info] [ 5.04B4]emlxs1: NOTICE: 720: Link up. (4Gb, loop, initiator)
Jan 8 19:15:09 nbolmasterback emlxs: [ID 349649 kern.info] [ 5.0554]emlxs1: NOTICE: 730: Link reset.
Jan 8 19:15:09 nbolmasterback tldcd[3163]: [ID 832037 daemon.error] scsi command failed, may be timeout, scsi_pkt.us_reason = 3
Jan 8 19:15:09 nbolmasterback emlxs: [ID 349649 kern.info] [ 5.02D7]emlxs1: NOTICE: 710: Link down.
Jan 8 19:15:12 nbolmasterback emlxs: [ID 349649 kern.info] [ 5.04B4]emlxs1: NOTICE: 720: Link up. (4Gb, loop, initiator)
Jan 8 19:15:14 nbolmasterback emlxs: [ID 349649 kern.info] [ 5.0554]emlxs1: NOTICE: 730: Link reset.
Jan 8 19:15:14 nbolmasterback tldcd[3163]: [ID 832037 daemon.error] scsi command failed, may be timeout, scsi_pkt.us_reason = 3
Jan 8 19:15:14 nbolmasterback tldcd[3163]: [ID 415613 daemon.error] TLD(0) mode_sense ioctl() failed: Error 0
Jan 8 19:15:14 nbolmasterback emlxs: [ID 349649 kern.info] [ 5.02D7]emlxs1: NOTICE: 710: Link down.
Jan 8 19:15:15 nbolmasterback tldd[3159]: [ID 320639 daemon.error] TLD(0) unavailable: initialization failed: Unable to sense robotic device
Jan 8 19:15:17 nbolmasterback emlxs: [ID 349649 kern.info] [ 5.04B4]emlxs1: NOTICE: 720: Link up. (4Gb, loop, initiator)
Jan 8 19:17:17 nbolmasterback emlxs: [ID 349649 kern.info] [ 5.0554]emlxs1: NOTICE: 730: Link reset.
Jan 8 19:17:17 nbolmasterback tldcd[3163]: [ID 832037 daemon.error] scsi command failed, may be timeout, scsi_pkt.us_reason = 3
Jan 8 19:17:17 nbolmasterback emlxs: [ID 349649 kern.info] [ 5.02D7]emlxs1: NOTICE: 710: Link down.
Jan 8 19:17:19 nbolmasterback emlxs: [ID 349649 kern.info] [ 5.04B4]emlxs1: NOTICE: 720: Link up. (4Gb, loop, initiator)
Jan 8 19:17:21 nbolmasterback emlxs: [ID 349649 kern.info] [ 5.0554]emlxs1: NOTICE: 730: Link reset.
Jan 8 19:17:21 nbolmasterback tldcd[3163]: [ID 832037 daemon.error] scsi command failed, may be timeout, scsi_pkt.us_reason = 3
Jan 8 19:17:21 nbolmasterback tldcd[3163]: [ID 415613 daemon.error] TLD(0) mode_sense ioctl() failed: Error 0
Jan 8 19:17:21 nbolmasterback emlxs: [ID 349649 kern.info] [ 5.02D7]emlxs1: NOTICE: 710: Link down.
Jan 8 19:17:24 nbolmasterback emlxs: [ID 349649 kern.info] [ 5.04B4]emlxs1: NOTICE: 720: Link up. (4Gb, loop, init
_________________
tpautoconf and scan -changer no longer return anything, as the robot has disappeared. Output from robtest:
bash-3.00# ./robtest
Configured robots with local control supporting test utilities:
TLD(0) robotic path = /dev/sg/c0tw500110a0008c72fal1
Robot Selection
---------------
1) TLD 0
2) none/quit
Enter choice: 1
Robot selected: TLD(0) robotic path = /dev/sg/c0tw500110a0008c72fal1
Invoking robotic test utility:
/usr/openv/volmgr/bin/tldtest -rn 0 -r /dev/sg/c0tw500110a0008c72fal1
Opening /dev/sg/c0tw500110a0008c72fal1
user command failed, may be timeout, scsi_pkt.us_reason = 3
user command failed, may be timeout, scsi_pkt.us_reason = 3
mode_sense ioctl() failed: Error 0
____
Can anyone please help?
Thanks,
Scott
01-08-2013 04:57 PM
Hi Scott,
It is very late here so this will be brief based on first impressions.
This:
Opening /dev/sg/c0tw500110a0008c72fal1
user command failed, may be timeout, scsi_pkt.us_reason = 3
user command failed, may be timeout, scsi_pkt.us_reason = 3
mode_sense ioctl() failed: Error 0
01-08-2013 05:10 PM
Thank you for the response:
cfgadm command output:
cfga_msg: NULL msgp
cfga_msg: NULL msgp
cfga_msg: NULL msgp
cfga_msg: NULL msgp
cfga_msg: NULL msgp
cfga_msg: NULL msgp
cfga_msg: NULL msgp
cfga_msg: NULL msgp
cfga_msg: NULL msgp
cfga_msg: NULL msgp
cfga_msg: NULL msgp
cfga_msg: NULL msgp
cfga_msg: NULL msgp
Ap_Id                  Type         Receptacle   Occupant     Condition
c8                     fc           connected    unconfigured unknown
c9                     fc-private   connected    configured   unknown
c9::500110a0008c72fa   med-changer  connected    configured   failed
c10                    fc           connected    unconfigured unknown
c11                    fc-private   connected    configured   unknown
c11::500110a0008c7300  tape         connected    configured   failed
______________________
I see the changer listed, so I will proceed.
When it comes to this command:
modunload -i $(echo $(modinfo |grep "sg (SCSA" |awk '{print $1}'))
Looks like I got the syntax wrong or something, keeps coming back with :
usage: modunload -i <module_id> [-e <exec_file>]
I'm not very skilled when it comes to coding, so I copied your command verbatim. Was that what you were implying?
01-08-2013 05:40 PM
Did you run cfgadm with the "-al -o show_FCP_dev" arguments (cfgadm -al -o show_FCP_dev)? No LUN number is displayed in your output, and the device condition is shown as "failed".
Here is the output of "cfgadm -al -o show_FCP_dev" in my working environment:
# cfgadm -al -o show_FCP_dev
Ap_Id                   Type         Receptacle   Occupant     Condition
c2                      fc-fabric    connected    configured   unknown
c2::5001a4bcf8d18000,0  med-changer  connected    configured   unknown
c2::5001a4bcf8d18000,1  tape         connected    configured   unknown
c2::5001a4bcf8d18000,2  tape         connected    configured   unknown
c3                      fc           connected    unconfigured unknown
BTW, the emlxs1 link somehow keeps going up and down. You should check the HBA hardware, HBA driver, SAN switch, and so on first.
01-08-2013 05:44 PM
modunload -i $(echo $(modinfo |grep "sg (SCSA" |awk '{print $1}'))
Looks like I got the syntax wrong or something, keeps coming back with :
usage: modunload -i <module_id> [-e <exec_file>]
Use "modunload -i `modinfo| awk '$6=="sg"{print $1}'`" instead.
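To see why this version works, it helps to run the awk filter against a sample modinfo line. Instead of grepping for the description text, it matches the module-name column exactly. A minimal sketch, assuming modinfo's first column is the module ID and its sixth column the module name (as in Martin's sample output further down); `sg_module_id` is a hypothetical helper name, not a Solaris or NetBackup command.

```shell
# Hypothetical helper: print the module ID of the sg driver, if loaded.
# Reads `modinfo` output on stdin: column 1 is the module ID and
# column 6 is the module name.
sg_module_id() {
    awk '$6 == "sg" { print $1 }'
}

# On the Solaris host:  modinfo | sg_module_id
# Here it is fed the sample line from Martin's post instead:
echo '189 7b7e0000 37a8 338 1 sg (SCSA Generic Revision: 3.7)' | sg_module_id
# prints 189
```

If sg is not loaded, nothing is printed, so `modunload -i` ends up with no argument and answers with its usage message.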
01-08-2013 06:48 PM
Yasuhisa,
I did use this command:
/usr/openv/volmgr/bin
bash-3.00# cfgadm -al -o show_FCP_dev
and I see what you mean with my results:
Ap_Id                  Type         Receptacle   Occupant     Condition
c8                     fc           connected    unconfigured unknown
c9                     fc-private   connected    configured   unknown
c9::500110a0008c72fa   med-changer  connected    configured   failed
c10                    fc           connected    unconfigured unknown
c11                    fc-private   connected    configured   unknown
c11::500110a0008c7300  tape         connected    configured   failed
This robot should have two drives. I see both in your example, with the LUN included. Mine, not so much. Interesting indeed. I'm on Solaris 10; not sure if that matters?
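Spotting the difference between the two listings is easier with a small filter. A minimal sketch, assuming the five-column `cfgadm -al -o show_FCP_dev` layout shown in this thread; `check_fc_devices` is a hypothetical helper, not part of Solaris or NetBackup. It flags device entries (Ap_Ids containing `::`) whose condition is `failed` or that carry no `,<lun>` suffix.

```shell
# Hypothetical helper: read `cfgadm -al -o show_FCP_dev` output on stdin and
# report FC device entries that look unhealthy.
check_fc_devices() {
    awk '
        # Only device rows (controller::wwpn[,lun]); skip header and controllers.
        $1 ~ /::/ {
            if ($5 == "failed")   print $1 ": condition failed (" $2 ")"
            if ($1 !~ /,[0-9]+$/) print $1 ": no LUN shown (" $2 ")"
        }
    '
}

# Usage on the host:
#   cfgadm -al -o show_FCP_dev | check_fc_devices
```

No output means every device row has a LUN and a condition other than `failed`, which is what the working environment above looks like.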
01-08-2013 07:07 PM
Also, as an fyi, your tip of
Use "modunload -i `modinfo| awk '$6=="sg"{print $1}'`" instead
did work. I ran that, then proceeded with Martin's suggestion earlier:
01-08-2013 07:11 PM
Please ignore my last post; I misspelled 'kernel'.
Anywho, after fixing my mistake:
bash-3.00# modunload -i `modinfo| awk '$6=="sg"{print $1}'`
bash-3.00# mv /kernel/drv/sg.conf /kernel/drv/sg.conf.old
bash-3.00# mv sg.links sg.links.safe
bash-3.00# mv sg.conf sg.conf.safe
bash-3.00# ../sg.build all
The file ./st.conf should be appended to /kernel/drv/st.conf.
A reboot may be necessary to create any new device files.
Created file ./sg.conf.
Created file ./sg.links.
bash-3.00# ./sg.install
Copied files to /kernel/drv/amd64.
Doing add_drv of the sg driver
devfsadm: driver failed to attach: sg
Warning: Driver (sg) successfully added to system but failed to attach
WARNING: /usr/sbin/add_drv failed.
There may be no SCSI devices on this machine
01-08-2013 08:03 PM
Forget about NBU re-config until OS status is correct.
cfgadm output is still showing 'failed':
c9::500110a0008c72fa med-changer connected configured failed
...
c11::500110a0008c7300 tape connected configured failed
As per Yasuhisa's post above, you need to check the HBA, as these entries keep repeating in the messages file:
Jan 8 19:15:09 nbolmasterback emlxs: [ID 349649 kern.info] [ 5.02D7]emlxs1: NOTICE: 710: Link down.
Jan 8 19:15:12 nbolmasterback emlxs: [ID 349649 kern.info] [ 5.04B4]emlxs1: NOTICE: 720: Link up. (4Gb, loop, initiator)
Jan 8 19:15:14 nbolmasterback emlxs: [ID 349649 kern.info] [ 5.0554]emlxs1: NOTICE: 730: Link reset.
A server reboot will reset the HBA.
If these messages continue after the reboot, log a call with your server support team.
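To see whether the flapping stops after the reboot, the emlxs notices can be tallied per HBA instance and event type. A minimal sketch, assuming the message format quoted above (`[ 5.02D7]emlxs1: NOTICE: 710: Link down.`); `count_link_events` is a hypothetical helper.

```shell
# Hypothetical helper: tally emlxs link notices ("Link up/down/reset") per
# HBA instance in a messages file (default: /var/adm/messages).
count_link_events() {
    # sed -n ... p prints only lines matching the emlxs notice pattern,
    # rewritten as "<instance> <event>", e.g. "emlxs1 down".
    sed -n 's/.*]\(emlxs[0-9]*\): NOTICE: [0-9]*: Link \([a-z]*\)\..*/\1 \2/p' \
        "${1:-/var/adm/messages}" | sort | uniq -c
}

# Usage on the host, before and after the reboot:
#   count_link_events
#   count_link_events /var/adm/messages.0
```

If the "down" and "reset" counts keep growing after the reboot, that is the point to make the server support call.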
01-08-2013 11:18 PM
c9::500110a0008c72fa med-changer connected configured failed
Yes, typically I didn't cover what to do when the state is 'failed'. There are many possibilities and, as you will understand, you can only cover some, hence my description of the most common ones.
The failed state looks to me like a library issue: there is some level of communication (you can see the device), but it is not responding correctly.
I would recommend speaking with the hardware vendor and perhaps OS support, but the hardware vendor first.
Regarding this:
modunload -i $(echo $(modinfo |grep "sg (SCSA" |awk '{print $1}'))
It uses the modinfo command to get the module ID of the sg driver and ‘pastes’ this into the modunload command, which ‘stops’ (unloads) the driver.
For example:
root@womble netbackup $ modinfo |grep "sg (SCSA"
189 7b7e0000 37a8 338 1 sg (SCSA Generic Revision: 3.7)
We only want the module ID (1st field), so the same command with the awk bit gives:
root@womble netbackup $ modinfo |grep "sg (SCSA" |awk '{print $1}'
189
To unload this, we could do:
modunload -i 189
But if you put a command inside $( ), the shell substitutes only the output of that command.
This is easiest shown with echo:
root@womble netbackup $ echo date
date
But use $() and it will not echo the word date; it will echo the output of date:
root@womble netbackup $ echo $(date)
Fri Jan 4 12:08:03 GMT 2013
Hence:
modunload -i $(echo $(modinfo |grep "sg (SCSA" |awk '{print $1}'))
So, to do this manually, this command will give the module ID of the driver:
modinfo |grep "sg (SCSA" |awk '{print $1}'
This will unload it:
modunload -i <module ID>
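The same steps can be wrapped so that modunload only runs when modinfo actually found the driver. When sg is not loaded, the substitution expands to nothing and modunload is called with a bare -i, which is exactly when it prints its usage message. A minimal sketch; `unload_sg` is a hypothetical function name, and the outer `$(echo ...)` is dropped because a single `$( )` already yields the command's output.

```shell
# Hypothetical wrapper: unload the sg driver only if it is currently loaded.
unload_sg() {
    # Column 6 of modinfo is the module name, column 1 its module ID.
    id=$(modinfo | awk '$6 == "sg" { print $1 }')
    if [ -z "$id" ]; then
        echo "sg driver is not loaded; nothing to unload" >&2
        return 1
    fi
    echo "Unloading sg (module ID $id)"
    modunload -i "$id"
}
```

Run as root, `unload_sg` either unloads the driver or says plainly that there is nothing loaded, instead of the bare usage message.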
Martin
01-09-2013 12:45 PM
Hello. I've been working on Marianne's suggestion to resolve the 'failed' condition status: I rebooted the server and power-cycled the tape drive. It no longer shows the 'failed' condition:
Ap_Id                    Type         Receptacle   Occupant     Condition
c8                       fc           connected    unconfigured unknown
c9                       fc-private   connected    configured   unknown
c9::500110a0008c72fa,0   tape         connected    configured   unknown
c9::500110a0008c72fa,1   med-changer  connected    configured   unknown
c10                      fc           connected    unconfigured unknown
c11                      fc-private   connected    configured   unknown
c11::500110a0008c7300,0  tape         connected    configured   unknown
However, in /var/adm/messages, I am getting this now:
Jan 9 15:03:12 nbolmasterback vmd[717]: [ID 617826 daemon.notice] ready for connections
Jan 9 15:03:22 nbolmasterback avrd[741]: [ID 748556 daemon.notice] st.conf configuration for HP.ULTRIUM4-SCSI.000 (device 0), name [HP Ultrium LTO 4], vid [HP Ultrium 4*], type 0x3b, block size 0, options 0x18619 (see st(7D) man page)
Jan 9 15:03:45 nbolmasterback mac: [ID 736570 kern.info] NOTICE: e1000g1 unregistered
Jan 9 15:03:45 nbolmasterback mac: [ID 736570 kern.info] NOTICE: e1000g2 unregistered
Jan 9 15:03:45 nbolmasterback mac: [ID 736570 kern.info] NOTICE: e1000g3 unregistered
Jan 9 15:03:58 nbolmasterback pseudo: [ID 129642 kern.info] pseudo-device: devinfo0
Jan 9 15:03:58 nbolmasterback genunix: [ID 936769 kern.info] devinfo0 is /pseudo/devinfo@0
Jan 9 15:04:41 nbolmasterback pseudo: [ID 129642 kern.info] pseudo-device: fcsm0
Jan 9 15:04:41 nbolmasterback genunix: [ID 936769 kern.info] fcsm0 is /pseudo/fcsm@0
Jan 9 15:04:50 nbolmasterback scsi: [ID 799468 kern.info] st1 at fp5: name w500110a0008c7300,0, bus address 2
Jan 9 15:04:50 nbolmasterback genunix: [ID 936769 kern.info] st1 is /pci@3,0/pci1022,7458@a/pci10df,fc12@1,1/fp@0,0/tape@w500110a0008c7300,0
Jan 9 15:04:50 nbolmasterback scsi: [ID 365881 kern.info] /pci@3,0/pci1022,7458@a/pci10df,fc12@1,1/fp@0,0/tape@w500110a0008c7300,0 (st1):
Jan 9 15:04:50 nbolmasterback <HP Ultrium LTO 4>
Jan 9 15:04:50 nbolmasterback avrd[741]: [ID 990698 daemon.notice] st.conf configuration for HP.ULTRIUM4-SCSI.001 (device 1), name [HP Ultrium LTO 4], vid [HP Ultrium 4*], type 0x3b, block size 0, options 0x18619 (see st(7D) man page)
Jan 9 15:04:54 nbolmasterback tldcd[748]: [ID 295976 daemon.error] TLD(0) [748] robotic path /dev/sg/c0tw500110a0008c72fal1 does not exist
Jan 9 15:04:54 nbolmasterback last message repeated 1 time
Jan 9 15:04:54 nbolmasterback tldd[727]: [ID 795118 daemon.error] TLD(0) unavailable: initialization failed: Unable to open robotic path
Jan 9 15:06:56 nbolmasterback tldcd[748]: [ID 295976 daemon.error] TLD(0) [748] robotic path /dev/sg/c0tw500110a0008c72fal1 does not exist
Jan 9 15:06:56 nbolmasterback last message repeated 1 time
Jan 9 15:06:56 nbolmasterback tldd[727]: [ID 795118 daemon.error] TLD(0) unavailable: initialization failed: Unable to open robotic path
Jan 9 15:08:58 nbolmasterback tldcd[748]: [ID 295976 daemon.error] TLD(0) [748] robotic path /dev/sg/c0tw500110a0008c72fal1 does not exist
Jan 9 15:27:19 nbolmasterback sg: [ID 266374 kern.notice] Symantec SCSA Generic Revision: 3.7
____________
I see the drive listed here; however, the message saying the robotic path doesn't exist is interesting. Again, I only started looking at the tape drive after a week of 2009 errors, and after a software reboot I am having these issues. How do I reconfigure the path? Even though nothing changed, is this simply a matter of refreshing/resetting it?
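Before reconfiguring anything in NetBackup, it is worth checking whether the device file tldcd complains about exists at all. Judging by its name, /dev/sg/c0tw500110a0008c72fal1 appears to correspond to WWPN 500110a0008c72fa, LUN 1, which is where cfgadm now shows the med-changer; if the file is missing, the sg configuration has not been rebuilt for that device yet. A minimal sketch; `check_robot_path` is a hypothetical helper.

```shell
# Hypothetical helper: report whether a configured robotic path exists and,
# if not, list what is currently present in the same directory.
check_robot_path() {
    path=$1
    if [ -e "$path" ]; then
        echo "present: $path"
    else
        echo "missing: $path"
        echo "entries in $(dirname "$path"):"
        ls "$(dirname "$path")" 2>/dev/null
    fi
}

# Usage on the host:
#   check_robot_path /dev/sg/c0tw500110a0008c72fal1
```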
01-09-2013 01:06 PM
Also, as an FYI:
bash-3.00# ./sgscan
#
#WARNING: detected StorEdge Network Foundation connected devices not in
# SG configuration file:
#
# Device World Wide Port Name 500110a0008c7300
# Device World Wide Port Name 500110a0008c72fa
#
# See /usr/openv/volmgr/NetBackup_DeviceConfig_Guide.txt chapter
# "Special configuration for "Sun StorEdge Network Foundation" HBA/Driver"
# for information on how to use sg.build and sg.install to
# configure these devices
I think I may just be stuck in land of confusion.
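The warning names the World Wide Port Names that the sg configuration does not cover yet. They match the two ports in the cfgadm output above (500110a0008c72fa carrying a drive and the changer, 500110a0008c7300 the second drive). A minimal sketch for pulling them out so they can be compared with cfgadm after rebuilding; `unconfigured_wwpns` is a hypothetical helper that relies only on the `Device World Wide Port Name` wording shown above.

```shell
# Hypothetical helper: extract the WWPNs that sgscan reports as missing from
# the sg configuration (reads sgscan output on stdin).
unconfigured_wwpns() {
    sed -n 's/^# *Device World Wide Port Name *\([0-9a-fA-F]*\).*/\1/p'
}

# Usage on the host:
#   /usr/openv/volmgr/bin/sgscan | unconfigured_wwpns
```

Once sg.build and sg.install have picked up both ports, this should print nothing.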
01-09-2013 01:18 PM
Now that the OS can see the devices, follow the steps again to rebuild the sg driver.
You need to follow the steps in the Emulex documentation to configure Persistent Binding, to ensure that OS device names remain the same when the server is rebooted.
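Persistent binding itself is configured on the HBA side, per the Emulex documentation. Independently of that, a quick way to check whether OS tape names moved after a reboot is to save the /dev/rmt to WWPN,LUN mapping and compare it later. A minimal sketch, assuming the symlink targets have the `tape@w<wwpn>,<lun>:` form seen in the `ls -l /dev/rmt/*cbn` output in this thread; `rmt_map` and the /var/tmp file names are hypothetical.

```shell
# Hypothetical helper: turn `ls -l /dev/rmt/*cbn` output (stdin) into sorted
# "<rmt name> <wwpn>,<lun>" lines, one per tape device.
rmt_map() {
    sed -n 's|.* /dev/rmt/\([^ ]*\) -> .*tape@w\([0-9a-f]*,[0-9]*\):.*|\1 \2|p' | sort
}

# Usage on the host: save before the reboot, compare afterwards.
#   ls -l /dev/rmt/*cbn | rmt_map > /var/tmp/rmt_map.before
#   ls -l /dev/rmt/*cbn | rmt_map | diff /var/tmp/rmt_map.before -
```

An empty diff means each /dev/rmt name still points at the same WWPN and LUN.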
01-09-2013 01:49 PM
Marianne,
Just so I'm understanding things as I should...
ls -l /dev/rmt/*cbn
lrwxrwxrwx 1 root root 89 Jul 31 2008 /dev/rmt/0cbn -> ../../devices/pci@3,0/pci1022,7458@9/pci10df,fc12@1,1/fp@0,0/tape@w500110a0008c72fa,0:cbn
lrwxrwxrwx 1 root root 89 Jul 31 2008 /dev/rmt/1cbn -> ../../devices/pci@3,0/pci1022,7458@a/pci10df,fc12@1,1/fp@0,0/tape@w500110a0008c7300,0:cbN
_______
c8                       fc           connected    unconfigured unknown
c9                       fc-private   connected    configured   unknown
c9::500110a0008c72fa,0   tape         connected    configured   unknown
c9::500110a0008c72fa,1   med-changer  connected    configured   unknown
c10                      fc           connected    unconfigured unknown
c11                      fc-private   connected    configured   unknown
c11::500110a0008c7300,0  tape         connected    configured   unknown
_______
The addresses are the same, and LUNs now appear in the output. However, when I try to follow the steps above, the first command stops me (neither Martin's nor Yasuhisa's version works now):
bash-3.00# modunload -i `modinfo| awk '$6=="sg"{print $1}'` (yasuhisa's syntax)
usage: modunload -i <module_id> [-e <exec_file>]
bash-3.00# modunload -i $(echo $(modinfo |grep "sg (SCSA" |awk '{print $1}')) (martin's syntax)
usage: modunload -i <module_id> [-e <exec_file>]
Also, Martin's tip of :
modinfo |grep "sg (SCSA" |awk '{print $1}'
does not return anything. So no driver is installed? Am I really just copying this stuff wrong?
I'm assuming that, now that the OS reports the tape drive and changer as present, rebuilding would fix the issue. What am I glossing over?
01-09-2013 02:01 PM
Try these steps:
# cd /usr/openv/volmgr/bin/driver
# /usr/openv/volmgr/bin/sg.build all
Install the new sg driver configuration:
# /usr/bin/rm -f /kernel/drv/sg.conf
# /usr/openv/volmgr/bin/driver/sg.install
Check/verify config:
# /usr/openv/volmgr/bin/sgscan
(I am aware that Martin's method is different, but this has always worked for me.)
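For reference, Marianne's steps can be collected into a small script. A minimal sketch: `rebuild_sg`, `run_step` and the DRY_RUN switch are hypothetical additions, it must run as root on the NetBackup host, and it keeps the old sg.conf as sg.conf.old (as Scott did earlier with mv) instead of deleting it.

```shell
# Hypothetical wrapper around the sg rebuild steps above (run as root).
# With DRY_RUN=1 the commands are printed instead of executed.
run_step() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rebuild_sg() {
    drv=/usr/openv/volmgr/bin/driver
    run_step cd "$drv" || return 1
    # Build new sg.conf / sg.links for the devices the OS currently sees.
    run_step /usr/openv/volmgr/bin/sg.build all || return 1
    # Keep the old sg.conf as a backup instead of deleting it.
    if [ -f /kernel/drv/sg.conf ]; then
        run_step mv /kernel/drv/sg.conf /kernel/drv/sg.conf.old || return 1
    fi
    # Install the new sg configuration, then verify what sg can see.
    run_step "$drv/sg.install" || return 1
    run_step /usr/openv/volmgr/bin/sgscan
}

# Preview first, then run for real:
#   DRY_RUN=1 rebuild_sg
#   rebuild_sg
```

sg.build may also print a note about appending ./st.conf to /kernel/drv/st.conf, as in Scott's earlier run; that step is left manual here.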
01-09-2013 02:32 PM
Marianne (and Martin / Yasuhisa),
I just wanted to say thanks for your wonderful help. The last tip got me all squared away, and I'm currently running duplication jobs. Hopefully you all know just how helpful you make this forum.
Until I screw something up again,
-Scott