07-13-2015 05:52 PM
Hi,
I get periodic media errors which cause the tape drives to go down. How do I determine when I should stop using a piece of media? For example, I just got
Jul 13 12:36:17 buphost2 scsi: [ID 107833 kern.warning] WARNING: /pci@400/pci@0/pci@d/SUNW,qlc@0/fp@0,0/st@w500308c09f1a109d,0 (st10):
Jul 13 12:36:17 buphost2 Error for Command: space Error Level: Fatal
Jul 13 12:36:17 buphost2 scsi: [ID 107833 kern.notice] Requested Block: 1 Error Block: 1
Jul 13 12:36:17 buphost2 scsi: [ID 107833 kern.notice] Vendor: HP Serial Number:
Jul 13 12:36:17 buphost2 scsi: [ID 107833 kern.notice] Sense Key: Media Error
Jul 13 12:36:17 buphost2 scsi: [ID 107833 kern.notice] ASC: 0x14 (recorded entity not found), ASCQ: 0x0, FRU: 0x0
It seems to me it is a very serious error on the media. It's marked frozen by NBU.
1. Is it smart enough to mark that error block as bad, so the tape can continue to be used if I unfreeze it?
2. Is it correct to assume even after the media is expired, it should not be used again? Basically once it's expired, I should throw it away?
Thanks,
07-14-2015 10:37 AM
Yes, that is serious enough to retire the tape.
There is no easy way, without specialist software, to spot media before they fail.
You could keep an eye on the ...netbackup/db/media/errors file on each media server and count the number of times a tape or drive appears.
However, it is not exact. A few errors are perfectly fine, though not the one you have shown above.
Some entries in that file are not really errors, e.g. drive needs cleaning. Then there is the question of how many errors are too many: 5, 15, 25?
Unfortunately, it comes down to experience.
Google for tperr.sh
This is a script I wrote, available on Connect, that runs on Solaris and provides a level of analysis of the errors files.
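For anyone who just wants the basic counting idea described above without the script, here is a rough Python sketch. It is not tperr.sh and makes no claim to match what that script does. It assumes each line of the errors file is whitespace-separated as date, time, media ID, drive index, error type, drive name; check that against your own file before trusting the output. The sample lines, the drive names, the "CLEANING" label and the count_errors function are all made up for illustration.

```python
from collections import Counter


def count_errors(lines, ignore=("CLEANING",)):
    """Count entries per media ID and per drive.

    Assumed line layout (verify against your own errors file):
        date time media_id drive_index error_type drive_name
    Entries whose error type contains any string in `ignore` are skipped,
    since some entries (e.g. drive needs cleaning) are not real media errors.
    """
    by_media, by_drive = Counter(), Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # blank or unexpected line, skip it
        media_id, error_type, drive = fields[2], fields[4], fields[5]
        if any(tag in error_type.upper() for tag in ignore):
            continue
        by_media[media_id] += 1
        by_drive[drive] += 1
    return by_media, by_drive


# Invented sample lines in the assumed layout, not real log data.
sample = [
    "07/13/15 12:36:17 A00001 3 POSITION_ERROR DRIVE_A",
    "07/14/15 01:10:02 A00001 1 WRITE_ERROR DRIVE_B",
    "07/14/15 02:15:40 A00002 3 NEEDS_CLEANING DRIVE_A",
    "07/15/15 03:20:11 A00003 3 READ_ERROR DRIVE_A",
]

by_media, by_drive = count_errors(sample)
for media_id, n in by_media.most_common():
    print(media_id, n)
for drive, n in by_drive.most_common():
    print(drive, n)
```

To run it against a real file, pass the open file object instead of the sample list, e.g. count_errors(open(path)). A tape that shows errors on several different drives points at the tape; a drive that shows errors with many different tapes points at the drive. Where to draw the line is still a judgment call.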
07-14-2015 10:59 AM
Here is the script
https://www-secure.symantec.com/connect/downloads/tperrsh-script-solaris-only
Also see
Media information - Comment:15 Jul 2013 : Link
https://www-secure.symantec.com/connect/forums/logging-detail-information-about-unrecoverablerecoverable-error
https://www-secure.symantec.com/connect/forums/media-diagnostic-tools