12-20-2010 05:43 AM
Hello Friends,
I have a dozen AIX media servers here, and one of them is showing status 84 errors during my DB2 production backups. We're using brand-new LTO4 tapes and I don't see status 84 on any other server. I always freeze the damaged tape, but I get the same failure on another tape picked up by the next backup. I've never seen this before... All my archive log and OS backups run just fine on that same media server and client.
12/18/2010 8:05:32 PM - positioning XX5226 to file 1
12/18/2010 8:05:37 PM - positioned XX5226; position time: 00:00:05
12/18/2010 8:05:37 PM - begin writing
12/18/2010 9:43:15 PM - Error bptm(pid=1287086) cannot write image to media id XX5226, drive index 4, The media surface is damaged.
12/18/2010 9:46:06 PM - end writing; write time: 01:40:29
media write error(84)
Have you guys experienced this kind of failure?
Thanks,
12-20-2010 05:45 AM
Sorry about these typos... lol
12-20-2010 07:56 AM
If it's always the same media server, you probably have a bad or marginal tape drive. Inspect /usr/openv/netbackup/db/media/errors:
DATE MEDIA ID INDEX ERROR DRIVE_NAME
05/04/10 04:41:11 A00926 2 POSITION_ERROR 00110-2F
05/04/10 04:46:12 A00823 2 POSITION_ERROR 00110-2F
05/04/10 04:51:18 A00522 2 POSITION_ERROR 00110-2F
05/04/10 04:57:55 A00632 2 POSITION_ERROR 00110-2F
In this example, drive index 2 (named 00110-2F) logged 4 errors on 4 different tapes. Down the drive that shows up most often and see if that resolves the status 84. If it does, you need to replace the tape drive in question.
LTO tape drives have a bad habit of returning position errors (SCSI sense 3, ASC 14, ASCQ 00) when they start to go bad.
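As a rough aid for spotting "the drive that shows up most often", here is a minimal Python sketch. It assumes the errors file uses the whitespace-separated layout shown in the listing above (date, time, media ID, drive index, error type, drive name); `summarize_errors` and `summarize_file` are hypothetical helpers, not part of NetBackup, and the sample data is just the listing above.

```python
from collections import Counter, defaultdict

def summarize_errors(lines):
    """Count errors and distinct media IDs per (drive index, drive name).

    Assumes the layout shown above: date, time, media ID, drive index,
    error type, drive name, separated by whitespace. Lines that do not
    fit (such as a header row) are skipped.
    """
    counts = Counter()
    media = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) != 6 or not fields[3].isdigit():
            continue  # not an error entry
        _date, _time, media_id, index, _error, drive_name = fields
        key = (int(index), drive_name)
        counts[key] += 1
        media[key].add(media_id)
    return counts, media

def summarize_file(path="/usr/openv/netbackup/db/media/errors"):
    """Hypothetical helper: read the errors file and summarize it."""
    with open(path) as f:
        return summarize_errors(f)

SAMPLE = """\
DATE MEDIA ID INDEX ERROR DRIVE_NAME
05/04/10 04:41:11 A00926 2 POSITION_ERROR 00110-2F
05/04/10 04:46:12 A00823 2 POSITION_ERROR 00110-2F
05/04/10 04:51:18 A00522 2 POSITION_ERROR 00110-2F
05/04/10 04:57:55 A00632 2 POSITION_ERROR 00110-2F
"""

if __name__ == "__main__":
    counts, media = summarize_errors(SAMPLE.splitlines())
    # Most frequently failing drive first: the first candidate to down.
    for (index, name), n in counts.most_common():
        print(f"drive index {index} ({name}): {n} errors on {len(media[(index, name)])} tapes")
```

Counting distinct media IDs alongside the error count matters here: many errors spread over many different tapes on one drive points at the drive rather than at any single tape.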
12-20-2010 09:39 AM
Looks like you're right, dude... Here's what I've found:
10/03/10 04:02:12 XX4076 4 WRITE_ERROR Drive002
10/03/10 15:05:36 XX4093 4 WRITE_ERROR Drive002
10/09/10 13:15:42 XX1866 4 WRITE_ERROR Drive002
12/05/10 03:23:15 XX4228 4 WRITE_ERROR Drive002
12/18/10 21:43:16 XX5226 4 WRITE_ERROR Drive002
All on the same drive... I'll put it down and see how it goes. Thanks for now :)
12-21-2010 12:41 AM
If you want to take it a step further, inspect the syslog to see whether these are "real" write errors (sense 3, ASC 0C, ASCQ 00). NetBackup will report more or less any I/O problem as a write or read error, so it can be fooled by bad SCSI cables, firmware issues, etc.
If the syslog (the equivalent log may be somewhere else on AIX) does not report sense errors, you may have other issues than a bad tape drive.
You can see the different LTO sense error codes here:
http://www.computers-it.com/general/general_scsi_sense_key.php
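To make the sense triplets in this thread easier to read, here is a minimal Python decoding sketch. It covers only a small subset of the sense keys and ASC/ASCQ pairs defined in the SCSI Primary Commands (SPC) standard, and it assumes the ASC/ASCQ values quoted above are hexadecimal, as sense data normally is. `decode_sense` and `is_real_write_error` are hypothetical names; the actual sense bytes would still have to be taken from the OS error log.

```python
# Small subset of SCSI sense keys (SPC standard).
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
}

# Small subset of (ASC, ASCQ) pairs.
ASC_ASCQ = {
    (0x00, 0x00): "NO ADDITIONAL SENSE INFORMATION",
    (0x0C, 0x00): "WRITE ERROR",
    (0x11, 0x00): "UNRECOVERED READ ERROR",
    (0x14, 0x00): "RECORDED ENTITY NOT FOUND",
}

def decode_sense(key, asc, ascq):
    """Return a readable description, falling back to the raw hex values."""
    key_text = SENSE_KEYS.get(key, f"sense key 0x{key:X}")
    code_text = ASC_ASCQ.get((asc, ascq), f"ASC 0x{asc:02X} ASCQ 0x{ascq:02X}")
    return f"{key_text}: {code_text}"

def is_real_write_error(key, asc, ascq):
    """True only for the medium write error named above: sense 3, ASC 0C, ASCQ 00."""
    return (key, asc, ascq) == (0x3, 0x0C, 0x00)

if __name__ == "__main__":
    print(decode_sense(0x3, 0x0C, 0x00))  # MEDIUM ERROR: WRITE ERROR
    print(decode_sense(0x3, 0x14, 0x00))  # MEDIUM ERROR: RECORDED ENTITY NOT FOUND
```

If the log shows NetBackup write errors but no medium-error sense data like this, that is the case described above where cables, firmware, or something other than the drive may be at fault.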
Best Regards
Nicolai