I/O error found on some drives

Inder
Level 4

 

 

I am getting the I/O errors below in /var/adm/messages for a couple of drives. Any idea what the issue could be and how to resolve it?

 

Jun 14 02:33:21 stpsn066 avrd[7047]: [ID 433828 daemon.error] ioctl error on HP.ULTRIUM5-SCSI.005 (device 5, /devices/pci@0/pci@0/pci@8/pci@0/pci@2/SUNW,emlxs@0,1/fp@0,         0/sg@w500308c002204110,0:raw) thru sg driver, I/O error
Jun 14 02:33:36 stpsn066 avrd[7047]: [ID 253097 daemon.error] ioctl error on HP.ULTRIUM5-SCSI.001 (device 1, /devices/pci@0/pci@0/pci@8/pci@0/pci@2/SUNW,emlxs@0,1/fp@0,         0/sg@w500308c002204104,0:raw) thru sg driver, I/O error
Jun 14 02:33:36 stpsn066 avrd[7047]: [ID 608415 daemon.error] ioctl error on HP.ULTRIUM5-SCSI.003 (device 3, /devices/pci@0/pci@0/pci@8/pci@0/pci@2/SUNW,emlxs@0,1/fp@0,         0/sg@w500308c002204128,0:raw) thru sg driver, I/O error
Jun 14 02:33:36 stpsn066 avrd[7047]: [ID 433828 daemon.error] ioctl error on HP.ULTRIUM5-SCSI.005 (device 5, /devices/pci@0/pci@0/pci@8/pci@0/pci@2/SUNW,emlxs@0,1/fp@0,         0/sg@w500308c002204110,0:raw) thru sg driver, I/O error
Jun 14 02:33:51 stpsn066 avrd[7047]: [ID 253097 daemon.error] ioctl error on HP.ULTRIUM5-SCSI.001 (device 1, /devices/pci@0/pci@0/pci@8/pci@0/pci@2/SUNW,emlxs@0,1/fp@0,         0/sg@w500308c002204104,0:raw) thru sg driver, I/O error
Jun 14 02:33:51 stpsn066 avrd[7047]: [ID 608415 daemon.error] ioctl error on HP.ULTRIUM5-SCSI.003 (device 3, /devices/pci@0/pci@0/pci@8/pci@0/pci@2/SUNW,emlxs@0,1/fp@0,         0/sg@w500308c002204128,0:raw) thru sg driver, I/O error
Jun 14 02:33:51 stpsn066 avrd[7047]: [ID 433828 daemon.error] ioctl error on HP.ULTRIUM5-SCSI.005 (device 5, /devices/pci@0/pci@0/pci@8/pci@0/pci@2/SUNW,emlxs@0,1/fp@0,         0/sg@w500308c002204110,0:raw) thru sg driver, I/O error
Jun 14 02:34:06 stpsn066 avrd[7047]: [ID 253097 daemon.error] ioctl error on HP.ULTRIUM5-SCSI.001 (device 1, /devices/pci@0/pci@0/pci@8/pci@0/pci@2/SUNW,emlxs@0,1/fp@0,         0/sg@w500308c002204104,0:raw) thru sg driver, I/O error
Jun 14 02:34:06 stpsn066 avrd[7047]: [ID 608415 daemon.error] ioctl error on HP.ULTRIUM5-SCSI.003 (device 3, /devices/pci@0/pci@0/pci@8/pci@0/pci@2/SUNW,emlxs@0,1/fp@0,         0/sg@w500308c002204128,0:raw) thru sg driver, I/O error
Jun 14 02:34:06 stpsn066 avrd[7047]: [ID 433828 daemon.error] ioctl error on HP.ULTRIUM5-SCSI.005 (device 5, /devices/pci@0/pci@0/pci@8/pci@0/pci@2/SUNW,emlxs@0,1/fp@0,         0/sg@w500308c002204110,0:raw) thru sg driver, I/O error
Jun 14 02:34:21 stpsn066 avrd[7047]: [ID 253097 daemon.error] ioctl error on HP.ULTRIUM5-SCSI.001 (device 1, /devices/pci@0/pci@0/pci@8/pci@0/pci@2/SUNW,emlxs@0,1/fp@0,         0/sg@w500308c002204104,0:raw) thru sg driver, I/O error
Jun 14 02:34:21 stpsn066 avrd[7047]: [ID 608415 daemon.error] ioctl error on HP.ULTRIUM5-SCSI.003 (device 3, /devices/pci@0/pci@0/pci@8/pci@0/pci@2/SUNW,emlxs@0,1/fp@0,         0/sg@w500308c002204128,0:raw) thru sg driver, I/O error
Jun 14 02:34:21 stpsn066 avrd[7047]: [ID 433828 daemon.error] ioctl error on HP.ULTRIUM5-SCSI.005 (device 5, /devices/pci@0/pci@0/pci@8/pci@0/pci@2/SUNW,emlxs@0,1/fp@0,         0/sg@w500308c002204110,0:raw) thru sg driver, I/O error
Jun 14 02:34:36 stpsn066 avrd[7047]: [ID 253097 daemon.error] ioctl error on HP.ULTRIUM5-SCSI.001 (device 1, /devices/pci@0/pci@0/pci@8/pci@0/pci@2/SUNW,emlxs@0,1/fp@0,         0/sg@w500308c002204104,0:raw) thru sg driver, I/O error
Jun 14 02:34:36 stpsn066 avrd[7047]: [ID 608415 daemon.error] ioctl error on HP.ULTRIUM5-SCSI.003 (device 3, /devices/pci@0/pci@0/pci@8/pci@0/pci@2/SUNW,emlxs@0,1/fp@0,       


Nicolai
Moderator
Partner    VIP   

These messages do not tell you what the cause is. You need to look for SCSI sense keys, possibly in the messages file as well.

http://en.wikipedia.org/wiki/Key_Code_Qualifier
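
A rough sketch of how the sense data could be pulled out of the messages file. The function name sense_summary is made up for illustration, and it assumes the lines carry the sense data in the "key = ..., asc = ..., ascq = ..." form that NetBackup daemons such as tldcd log.

```shell
# Hypothetical helper: list each distinct sense key / ASC / ASCQ triple
# found in a syslog-style file, with how many times it was logged.
sense_summary() {
    # $1: path to the log file, e.g. /var/adm/messages
    sed -n 's/.*key = \(0x[0-9a-fA-F]*\), asc = \(0x[0-9a-fA-F]*\), ascq = \(0x[0-9a-fA-F]*\).*/\1 \2 \3/p' "$1" |
        sort | uniq -c | awk '{print $2, $3, $4, "count=" $1}'
}
```

Called as sense_summary /var/adm/messages, it prints one line per distinct triple, which is usually enough to see whether the library or drives keep reporting the same condition.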

mph999
Level 6
Employee Accredited

An ioctl error is usually a hardware error, though it is slightly odd that it has happened on multiple drives at the same time.  If these drives are all connected via the same HBA, it could be an HBA issue (or firmware).  You should also consider the cables and even the drives.
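
One way to check the "same HBA" theory from the log itself is to group the ioctl errors by the controller part of the device path. This is only a sketch: the function name ioctl_errors_by_hba is invented here, and it assumes the avrd lines have the same layout as the ones posted above (drive name after "ioctl error on", device path starting with /devices/ and containing /fp@).

```shell
# Hypothetical helper: for each failing drive, print the HBA part of its
# device path (everything before /fp@), the drive name, and the error count.
ioctl_errors_by_hba() {
    # $1: path to the log file, e.g. /var/adm/messages
    awk '/ioctl error on/ {
        drive = ""; hba = ""
        for (i = 1; i <= NF; i++) {
            # the drive name is the word following "error on"
            if ($i == "on" && $(i-1) == "error") drive = $(i+1)
            # the device path is the word starting with /devices/
            if ($i ~ /^\/devices\//) { hba = $i; sub(/\/fp@.*/, "", hba) }
        }
        if (drive != "" && hba != "") seen[hba " " drive]++
    }
    END { for (k in seen) print k, seen[k] }' "$1" | sort
}
```

If the first column is identical on every output line, all failing drives sit behind the same HBA port, which points towards the HBA, its firmware, or the cabling rather than the individual drives.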

You could rebuild the sg driver, but to be honest, I'm not convinced it is at fault, unless the drives have disappeared from the output of /usr/openv/volmgr/bin/scan, and even then it doesn't mean it's the sg driver.  But if they have disappeared and you rebuild the sg driver and they don't reappear, then it's pretty certain it's something else.

What happens if you run the mt command, e.g.:

mt -f <device file> stat

You could also use /usr/openv/volmgr/bin/scsi_command -d  /devices/pci@0/pci@0/pci@8/pci@0/pci@2/SUNW,emlxs@0,1/fp@0

To rebuild the sg driver

ksh
modunload -i $(echo $(modinfo |grep "sg (SCSA" |awk '{print $1}'))
mv /kernel/drv/sg.conf /kernel/drv/sg.conf.old
/usr/openv/volmgr/bin/driver/sg.install
 
Martin
 

Inder
Level 4

I don't know much about NetBackup, but I am getting the sense-key-related messages below:

 


 [ID 110452 daemon.error] TLD(0) key = 0x5, asc = 0x3b, ascq = 0xa0, UNKNOWN ERROR, KEY: 0x05, ASC: 0x3B, ASCQ: 0xA0
tldcd[17508]: [ID 110452 daemon.error] TLD(0) key = 0x5, asc = 0x3b, ascq = 0xa0, UNKNOWN ERROR, KEY: 0x05, ASC: 0x3B, ASCQ: 0xA0
tldcd[18100]: [ID 110452 daemon.error] TLD(0) key = 0x5, asc = 0x3b, ascq = 0xa0, UNKNOWN ERROR, KEY: 0x05, ASC: 0x3B, ASCQ: 0xA0
tldcd[18161]: [ID 110452 daemon.error] TLD(0) key = 0x5, asc = 0x3b, ascq = 0xa0, UNKNOWN ERROR, KEY: 0x05, ASC: 0x3B, ASCQ: 0xA0
tldcd[27756]: [ID 110452 daemon.error] TLD(0) key = 0x5, asc = 0x3b, ascq = 0xa0, UNKNOWN ERROR, KEY: 0x05, ASC: 0x3B, ASCQ: 0xA0
tldcd[28316]: [ID 110452 daemon.error] TLD(0) key = 0x5, asc = 0x3b, ascq = 0xa0, UNKNOWN ERROR, KEY: 0x05, ASC: 0x3B, ASCQ: 0xA0

Inder
Level 4

Does that make any sense?

Ankit_Maheshwar
Level 5

I suggest upgrading the OS patch level, library firmware, or device drivers.

I agree with Martin: an ioctl error is usually a hardware error.

Please read the link below; it might help you.

 

http://h30499.www3.hp.com/t5/Tape-Libraries-and-Drives/Tape-Library-Sense-Key-Error/td-p/3497542

mph999
Level 6
Employee Accredited
Yes, this makes sense:
 
key = 0x5, asc = 0x3b, ascq = 0xa0
 
Unfortunately it is 'unknown'.
 
These 'codes' follow an industry standard, but some are 'unique' to the vendor.
 
The only match I can find is this :
 
http://www.symantec.com/docs/TECH74225
 
If this is not the solution for you, you will need to contact the hardware vendor to find out what this error means.
 
It is not an NBU fault; we are just displaying the error, which is generated and sent by the hardware.
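
For reference, the standard part of such a triple can be decoded with a small lookup. This is a sketch only: the function names are made up, the sense key names are the generic SCSI ones, and the check on the ASCQ relies on the SCSI convention that ASCQ values of 0x80 and above are vendor-specific.

```shell
# Hypothetical helper: translate a SCSI sense key (e.g. 0x5 or 0x05) to its
# standard name.
sense_key_name() {
    case $(printf '%d' "$1") in
        0)  echo "NO SENSE" ;;
        1)  echo "RECOVERED ERROR" ;;
        2)  echo "NOT READY" ;;
        3)  echo "MEDIUM ERROR" ;;
        4)  echo "HARDWARE ERROR" ;;
        5)  echo "ILLEGAL REQUEST" ;;
        6)  echo "UNIT ATTENTION" ;;
        7)  echo "DATA PROTECT" ;;
        8)  echo "BLANK CHECK" ;;
        9)  echo "VENDOR SPECIFIC" ;;
        10) echo "COPY ABORTED" ;;
        11) echo "ABORTED COMMAND" ;;
        13) echo "VOLUME OVERFLOW" ;;
        14) echo "MISCOMPARE" ;;
        *)  echo "RESERVED OR OBSOLETE" ;;
    esac
}

# Hypothetical helper: say whether an ASCQ value falls in the
# vendor-specific range (0x80 to 0xFF).
ascq_scope() {
    if [ "$(printf '%d' "$1")" -ge 128 ]; then
        echo "vendor-specific"
    else
        echo "standard"
    fi
}
```

For the values in this thread, sense_key_name 0x5 gives ILLEGAL REQUEST and ascq_scope 0xa0 gives vendor-specific, which fits with the error being shown as 'unknown' and needing the hardware vendor to interpret it.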
 
Martin

Inder
Level 4

I can see the drives in the tpconfig -d output but not in the scan output.

 

Below is the output of the mt and scsi_command commands:


# mt -f /dev/rmt/11cbn stat
/dev/rmt/11cbn: No such file or directory


# scsi_command -d  /devices/pci@0/pci@0/pci@8/pci@0/pci@2/SUNW,emlxs@0,1/fp@0
Could not open /devices/pci@0/pci@0/pci@8/pci@0/pci@2/SUNW,emlxs@0,1/fp@0
unable to lstat /devices/pci@0/pci@0/pci@8/pci@0/pci@2/SUNW,emlxs@0,1/fp@0

mph999
Level 6
Employee Accredited

So something has happened to the device files.

OK, personally I would remove the device files, reboot the server (or run devfsadm), and see if they come back.

Until they reappear at the OS level, NBU will never work, and the issue remains outside NBU.

If they come back at the OS level (e.g. the mt command works), then try scan. If they are in scan, reconfigure the drives in NBU; if they are missing from scan, rebuild the sg driver and see if they reappear.  I have updated the sg build method below; I missed out a part that you need if you reconfigure the device files at the OS level.

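ksh
cd /usr/openv/volmgr/bin/driver
# keep the old generated files, then regenerate them for the current devices
mv sg.links sg.links.safe
mv sg.conf sg.conf.safe
../sg.build all
# unload the sg driver, move the installed config aside, then reinstall
modunload -i $(echo $(modinfo |grep "sg (SCSA" |awk '{print $1}'))
mv /kernel/drv/sg.conf /kernel/drv/sg.conf.old
/usr/openv/volmgr/bin/driver/sg.install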
 
Martin

mph999
Level 6
Employee Accredited
Forgot to add: forget tpconfig, it is useless in this case (at the moment anyway). It only shows the NBU config, and if this hasn't changed, the drives will be visible; that doesn't mean they are going to work.

We have shown that the OS commands don't work; until these work, nothing else will.

The order in which you have to get things working is this:

See devices in OS / OS commands (e.g. mt) / scsi_command / scan / NBU

Martin
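
The layer-by-layer order can be scripted as a simple "stop at the first layer that fails" check. This is a sketch under assumptions: first_failing_layer is an invented name, the device file in the example is a placeholder, and a command's exit status is only a rough indicator (scan, for instance, may exit cleanly while listing no drives, so its output still needs reading).

```shell
# Hypothetical helper: arguments come in pairs, a label followed by a
# command string. The commands run in order; the label of the first one
# that fails is printed and the function returns 1.
first_failing_layer() {
    while [ "$#" -ge 2 ]; do
        label=$1
        cmd=$2
        shift 2
        if ! sh -c "$cmd" >/dev/null 2>&1; then
            echo "$label"
            return 1
        fi
    done
    echo "all layers OK"
}

# Example call (placeholder device file, adjust to your system):
# first_failing_layer \
#     "OS device files" "ls /dev/rmt/0cbn" \
#     "mt"              "mt -f /dev/rmt/0cbn status" \
#     "scan"            "/usr/openv/volmgr/bin/scan"
```

Whatever label it prints is the layer to fix first; anything further along the chain cannot work until that one does.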

huanglao2002
Level 6

I/O error found on some drives?

 

Can you check the hardware status, and check the drive LED status? It may help you.

Nicolai
Moderator
Partner    VIP   

These errors are from the tape library - not the tape drives

[ID 110452 daemon.error] TLD(0) key = 0x5, asc = 0x3b, ascq = 0xa0, UNKNOWN ERROR, KEY: 0x05, ASC: 0x3B, ASCQ: 0xA0

mph999
Level 6
Employee Accredited

... good point. Interesting that the drives are also not responding (seems to be a path issue); I wonder if multiple different issues are going on here.

Inder
Level 4

There is no fault LED on the drives; everything seems fine from the library end.

I think mph999 is right; there seems to be an issue with the device files. As I am new to NetBackup, can anybody please provide me with a step-by-step procedure to recreate the device files and drivers and configure them in NetBackup without breaking anything?

Inder
Level 4

The operating system on the master server is Solaris 10. Also, do I need to apply the same settings on the media servers?

mph999
Level 6
Employee Accredited
These steps are done on the media server having issues:

Stop NBU
Remove files in /dev/sg and /dev/rmt
Run devfsadm
Check the /dev/rmt/ files have been recreated
Rebuild the sg driver as in my post above
scan should now work
Run nbemmcmd -deletealldevices -allrecords (deletes devices from NBU)
or, to delete just the devices on a particular media server:
nbemmcmd -deletealldevices -machinename server_name -machinetype media
Run device wizard - step 3 means select robot control host and media servers at the same time

http://www.symantec.com/docs/TECH125956
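
As a dry run, the sequence can be written out as a function that only prints the commands, so they can be reviewed before anything is removed. Treat it as a sketch: print_rebuild_plan is an invented name, the netbackup stop script path and the admincmd location of nbemmcmd are the usual UNIX install paths and are assumed here, and the tape device files are assumed to live under /dev/rmt as is normal on Solaris.

```shell
# Hypothetical helper: print (not run) the device rebuild sequence for one
# media server. Nothing is executed or deleted by this function.
print_rebuild_plan() {
    # $1: media server name as known to NetBackup
    server=$1
    cat <<'EOF'
# Stop NetBackup (assumed stop script location)
/usr/openv/netbackup/bin/goodies/netbackup stop
# Remove the old device files; on Solaris the tape nodes are under /dev/rmt
rm -f /dev/sg/* /dev/rmt/*
# Recreate the OS device files and check that they are back
devfsadm
ls -l /dev/rmt
# Rebuild the sg driver here (commands as posted earlier in the thread)
# Confirm the drives are visible again
/usr/openv/volmgr/bin/scan
EOF
    # Delete the stale device records for this media server only
    printf '%s\n' "/usr/openv/netbackup/bin/admincmd/nbemmcmd -deletealldevices -machinename $server -machinetype media"
    printf '%s\n' "# Finally, rerun the device configuration wizard"
}
```

Running print_rebuild_plan server_name shows the full list with the server name filled in; each line can then be run by hand once it has been checked.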

Inder
Level 4

Thank you very much.

mph999
Level 6
Employee Accredited

Hmm, not sure where the ASC errors fit in then, as I presume you have it all working now.

Oh well, the important thing is that it works; sometimes understanding every small detail is not important.

Thanks, for the solution,

Martin