AIX Media Server - Repeatedly 84 errors

Adriano_BR
Level 3

Hello Friends,

I have a dozen AIX media servers here, and one of them is showing status 84 errors during my DB2 production backups. We're using brand-new LTO4s, and I don't see 84 errors on any other server. I always freeze the damaged tape, but I get the same failure on another one picked for the next backup. I've never seen this before. All my archive log and OS backups run just fine for that same media server/client.

12/18/2010 8:05:32 PM - positioning XX5226 to file 1
12/18/2010 8:05:37 PM - positioned XX5226; position time: 00:00:05
12/18/2010 8:05:37 PM - begin writing
12/18/2010 9:43:15 PM - Error bptm(pid=1287086) cannot write image to media id XX5226, drive index 4, The media surface is damaged.
12/18/2010 9:46:06 PM - end writing; write time: 01:40:29
media write error(84)

Have you guys experienced this kind of failure?

Thanks,

ACCEPTED SOLUTION

Nicolai
Moderator, Partner, VIP

If it's always the same media server, you probably have a bad or marginal tape drive. Inspect /usr/openv/netbackup/db/media/errors

DATE               MEDIA ID  INDEX  ERROR           DRIVE_NAME
05/04/10 04:41:11  A00926    2      POSITION_ERROR  00110-2F
05/04/10 04:46:12  A00823    2      POSITION_ERROR  00110-2F
05/04/10 04:51:18  A00522    2      POSITION_ERROR  00110-2F
05/04/10 04:57:55  A00632    2      POSITION_ERROR  00110-2F

In this example, drive index 2 (named 00110-2F) produced four errors on four different tapes. Down the drive that shows up most often and see whether that resolves the status 84 errors. If it does, you need to replace that tape drive.

LTO tape drives have a bad habit of returning position errors (SCSI sense key 3, ASC 14, ASCQ 00) when they start to go bad.
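
If the errors file is long, the per-drive tally described above can be scripted. The following is only a rough sketch in Python: it assumes every entry has the six whitespace-separated fields seen in the sample (date, time, media ID, drive index, error type, drive name), which may not match every NetBackup version. The function name count_errors_by_drive is made up for this illustration.

```python
from collections import Counter

def count_errors_by_drive(lines):
    """Tally error entries per (drive index, drive name).

    Assumed layout (taken from the sample in this thread, not from any
    official format description):
    date, time, media ID, drive index, error type, drive name.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Skip the header, blank lines, and anything that does not fit the layout.
        if len(fields) < 6 or not fields[3].isdigit():
            continue
        counts[(fields[3], fields[5])] += 1
    return counts

# Sample data copied from the post above. On a real media server you would
# read /usr/openv/netbackup/db/media/errors instead.
sample = """\
      DATE                 MEDIA ID  INDEX ERROR           DRIVE_NAME
05/04/10 04:41:11 A00926   2    POSITION_ERROR  00110-2F
05/04/10 04:46:12 A00823   2   POSITION_ERROR   00110-2F
05/04/10 04:51:18 A00522   2   POSITION_ERROR   00110-2F
05/04/10 04:57:55 A00632   2   POSITION_ERROR   00110-2F
""".splitlines()

# Drives with the most errors come first.
for (index, name), n in count_errors_by_drive(sample).most_common():
    print(f"drive index {index} ({name}): {n} errors")
```

A drive that dominates the list across many different media IDs is the first candidate to down.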


REPLIES

Adriano_BR
Level 3

Sorry about these typos... lol

Adriano_BR
Level 3

Looks like you're right, dude... Here's what I've found:

10/03/10 04:02:12 XX4076 4 WRITE_ERROR Drive002
10/03/10 15:05:36 XX4093 4 WRITE_ERROR Drive002
10/09/10 13:15:42 XX1866 4 WRITE_ERROR Drive002
12/05/10 03:23:15 XX4228 4 WRITE_ERROR Drive002
12/18/10 21:43:16 XX5226 4 WRITE_ERROR Drive002

All on the same drive... I'll down it and see how it goes. Thanks for now :)

Nicolai
Moderator, Partner, VIP

If you want to take it a step further, inspect the syslog to see whether these are "real" write errors (sense key 3, ASC 0C, ASCQ 00). NetBackup reports more or less all I/O issues as write or read errors, and can be fooled by bad SCSI cables, firmware issues, and so on.

If the syslog (it may be somewhere else on AIX) does not report sense errors, you may have an issue other than a bad tape drive.

You can see the different LTO sense error codes here:

http://www.computers-it.com/general/general_scsi_sense_key.php
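
As a quick illustration of how to read such a sense triplet, here is a small Python sketch. The table is deliberately tiny: it holds only the two codes mentioned in this thread, with names as I recall them from the standard SCSI sense tables, so check them against the list linked above. The function name describe_sense is made up for this example.

```python
# Minimal lookup tables; names are the standard SCSI descriptions as recalled,
# not an exhaustive or vendor-specific list.
SENSE_KEYS = {0x3: "MEDIUM ERROR"}
ASC_ASCQ = {
    (0x0C, 0x00): "WRITE ERROR",
    (0x14, 0x00): "RECORDED ENTITY NOT FOUND",
}

def describe_sense(key, asc, ascq):
    """Turn hex strings such as ("3", "0C", "00") into a readable description."""
    k, a, q = (int(v, 16) for v in (key, asc, ascq))
    key_text = SENSE_KEYS.get(k, "not in this table")
    detail = ASC_ASCQ.get((a, q), "not in this table")
    return f"sense key {k:X} ({key_text}), ASC/ASCQ {a:02X}/{q:02X} ({detail})"

print(describe_sense("3", "0C", "00"))  # the "real" write error case
print(describe_sense("3", "14", "00"))  # the position error case
```

Anything the table does not know is reported as "not in this table" rather than guessed.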

Best Regards

Nicolai