Hi All,
NBU 6.5.4
Solaris T2000 master/media server
Qualstar TLS-8332 with 4 LTO2 and 4 LTO3 drives, SCSI connected
Recently I have been getting SCSI errors reported in the messages file on my Solaris server.
1. Where can I find out what they mean? Here is an example:
Feb 4 18:20:36 polaris scsi: [ID 107833 kern.warning] WARNING: /pci@780/pci@0/pci@8/pci@0/scsi@8,1/st@2,0 (st8):
Feb 4 18:20:36 polaris Error for Command: write file mark Error Level: Fatal
Feb 4 18:20:36 polaris scsi: [ID 107833 kern.notice] Requested Block: 5963 Error Block: 5963
Feb 4 18:20:36 polaris scsi: [ID 107833 kern.notice] Vendor: IBM Serial Number:
Feb 4 18:20:36 polaris scsi: [ID 107833 kern.notice] Sense Key: Media Error
Feb 4 18:20:36 polaris scsi: [ID 107833 kern.notice] ASC: 0x52 (cartridge fault), ASCQ: 0x0, FRU: 0x36
Feb 4 20:21:12 polaris scsi: [ID 107833 kern.warning] WARNING: /pci@780/pci@0/pci@8/pci@0/scsi@8/st@5,0 (st6):
Feb 4 20:21:12 polaris Error for Command: write Error Level: Fatal
Feb 4 20:21:12 polaris scsi: [ID 107833 kern.notice] Requested Block: 21170 Error Block: 21170
Feb 4 20:21:12 polaris scsi: [ID 107833 kern.notice] Vendor: IBM Serial Number:
Feb 4 20:21:12 polaris scsi: [ID 107833 kern.notice] Sense Key: Aborted Command
Feb 4 20:21:12 polaris scsi: [ID 107833 kern.notice] ASC: 0x4b (data phase error), ASCQ: 0x0, FRU: 0x30
Feb 8 18:14:25 polaris scsi: [ID 107833 kern.warning] WARNING: /pci@7c0/pci@0/pci@8/pci@0/scsi@8,1/st@4,0 (st5):
Feb 8 18:14:25 polaris Error for Command: space Error Level: Fatal
Feb 8 18:14:25 polaris scsi: [ID 107833 kern.notice] Requested Block: 1 Error Block: 1
Feb 8 18:14:25 polaris scsi: [ID 107833 kern.notice] Vendor: IBM Serial Number:
Feb 8 18:14:25 polaris scsi: [ID 107833 kern.notice] Sense Key: Media Error
Feb 8 18:14:25 polaris scsi: [ID 107833 kern.notice] ASC: 0x14 (recorded entity not found), ASCQ: 0x0, FRU: 0x36
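To make the entries above easier to compare, here is a minimal sketch that groups the messages-file lines into one record per error. It assumes every record starts with a WARNING line ending in "(stN):" and is followed by the "Error for Command", "Sense Key" and "ASC" lines, as in the excerpt. The function name parse_st_errors and the SAMPLE constant are made up for illustration and are not part of Solaris or NBU.

```python
import re

# Patterns for the st driver lines shown in the excerpt above.
HEADER = re.compile(r"WARNING:\s+(?P<path>/\S+)\s+\((?P<instance>st\d+)\):")
COMMAND = re.compile(
    r"Error for Command:\s+(?P<cmd>.+?)\s+Error Level:\s+(?P<level>\w+)"
)
SENSE = re.compile(r"Sense Key:\s+(?P<sense>.+?)\s*$")
ASC = re.compile(
    r"ASC:\s+(?P<asc>0x[0-9a-fA-F]+)\s+\((?P<text>[^)]*)\),"
    r"\s+ASCQ:\s+(?P<ascq>0x[0-9a-fA-F]+)"
)


def parse_st_errors(lines):
    """Return one dict per error record found in the given log lines."""
    records = []
    current = None
    for line in lines:
        header = HEADER.search(line)
        if header:
            # A WARNING line opens a new record.
            current = header.groupdict()
            records.append(current)
            continue
        if current is None:
            continue  # detail line with no preceding WARNING line
        for pattern in (COMMAND, SENSE, ASC):
            match = pattern.search(line)
            if match:
                current.update(match.groupdict())
    return records


# Two records copied from the excerpt above.
SAMPLE = """\
Feb 4 18:20:36 polaris scsi: [ID 107833 kern.warning] WARNING: /pci@780/pci@0/pci@8/pci@0/scsi@8,1/st@2,0 (st8):
Feb 4 18:20:36 polaris Error for Command: write file mark Error Level: Fatal
Feb 4 18:20:36 polaris scsi: [ID 107833 kern.notice] Sense Key: Media Error
Feb 4 18:20:36 polaris scsi: [ID 107833 kern.notice] ASC: 0x52 (cartridge fault), ASCQ: 0x0, FRU: 0x36
Feb 4 20:21:12 polaris scsi: [ID 107833 kern.warning] WARNING: /pci@780/pci@0/pci@8/pci@0/scsi@8/st@5,0 (st6):
Feb 4 20:21:12 polaris Error for Command: write Error Level: Fatal
Feb 4 20:21:12 polaris scsi: [ID 107833 kern.notice] Sense Key: Aborted Command
Feb 4 20:21:12 polaris scsi: [ID 107833 kern.notice] ASC: 0x4b (data phase error), ASCQ: 0x0, FRU: 0x30
"""

if __name__ == "__main__":
    for record in parse_st_errors(SAMPLE.splitlines()):
        print(record["instance"], record["cmd"], record["sense"], record["asc"])
```

This only summarizes what is already in the log. It does not say which tape was loaded at the time.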
2. Two of the errors above look like media problems, but I am not sure which tape had the issue. Should I assume that the next entry with a tape dismount is the culprit?
3. Is the data phase error drive related?
4. Also, how do I map /pci@780/pci@0/pci@8/pci@0/scsi@8/st@5,0 (st6) above to the rmt/# shown in iostat -En below?
The output below shows hard errors:
iostat -En
c1t0d0 Soft Errors: 2 Hard Errors: 0 Transport Errors: 0
Vendor: LSILOGIC Product: Logical Volume Revision: 3000 Serial No:
Size: 73.01GB <73012215808 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 2 Predictive Failure Analysis: 0
c1t2d0 Soft Errors: 2 Hard Errors: 0 Transport Errors: 0
Vendor: LSILOGIC Product: Logical Volume Revision: 3000 Serial No:
Size: 146.56GB <146561286144 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 2 Predictive Failure Analysis: 0
c0t0d0 Soft Errors: 9 Hard Errors: 0 Transport Errors: 1
Vendor: MATSHITA Product: CD-RW CW-8124 Revision: DZ13 Serial No:
Size: 1.53GB <1533480960 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 9 Predictive Failure Analysis: 0
rmt/6 Soft Errors: 0 Hard Errors: 1 Transport Errors: 0
Vendor: IBM Product: ULTRIUM-TD2 Revision: 67U1 Serial No:
rmt/2 Soft Errors: 0 Hard Errors: 1 Transport Errors: 0
Vendor: IBM Product: ULTRIUM-TD2 Revision: 67U1 Serial No:
rmt/3 Soft Errors: 0 Hard Errors: 1 Transport Errors: 0
Vendor: IBM Product: ULTRIUM-TD2 Revision: 67U1 Serial No:
rmt/4 Soft Errors: 0 Hard Errors: 15 Transport Errors: 0
Vendor: IBM Product: ULTRIUM-TD3 Revision: 73P5 Serial No:
rmt/5 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: IBM Product: ULTRIUM-TD3 Revision: 69U2 Serial No:
rmt/7 Soft Errors: 0 Hard Errors: 1 Transport Errors: 0
Vendor: IBM Product: ULTRIUM-TD2 Revision: 67U1 Serial No:
rmt/8 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: IBM Product: ULTRIUM-TD3 Revision: 69U2 Serial No:
rmt/9 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: IBM Product: ULTRIUM-TD3 Revision: 69U2 Serial No:
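For question 4, one approach I am considering is sketched below. It assumes the /dev/rmt entries are symbolic links into /devices, with targets that end in the physical path followed by ":" and a minor-name suffix. The helper names read_rmt_links and rmt_for_path are hypothetical, and I have not confirmed this on my server.

```python
import os
import re


def read_rmt_links(directory="/dev/rmt"):
    """Return {entry name: symlink target} for each symlink in directory."""
    links = {}
    for name in os.listdir(directory):
        full = os.path.join(directory, name)
        if os.path.islink(full):
            links[name] = os.readlink(full)
    return links


def rmt_for_path(phys_path, links):
    """Return the sorted rmt numbers whose link target matches phys_path.

    Assumption: a target looks like '../../devices<phys_path>:<suffix>'.
    """
    numbers = set()
    for name, target in links.items():
        # Drop everything up to and including '/devices', then the ':suffix'.
        device = target.split("/devices", 1)[-1].rsplit(":", 1)[0]
        if device == phys_path:
            leading_digits = re.match(r"\d+", name)
            if leading_digits:
                numbers.add(int(leading_digits.group()))
    return sorted(numbers)


if __name__ == "__main__":
    path = "/pci@780/pci@0/pci@8/pci@0/scsi@8/st@5,0"
    print(rmt_for_path(path, read_rmt_links()))
```

If that assumption holds, the number printed would be the # in rmt/# for the drive named in the WARNING line.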
5. If a piece of media has an error, the tape is not automatically frozen. NBU tries it again, so the same errors may repeat each time NBU reuses the tape. Is this the right behavior? Or should I set NBU to auto-freeze the tape when a write error is detected? Is this a function of volmgr, and where is it configured?
Any help is greatly appreciated!