
seeing media write errors (84) on some of my backup jobs

kproehl
Level 5

Hi Everyone,

 

I recently started to see media write errors on some of my backup jobs.  We currently write all of our backups to 10 IBM LTO4 tape drives.  The tape library is a Quantum-I500.  In the past, when I saw these errors, they indicated a physical problem with the tape library.  I currently do not see any physical issues with any of the tape drives.

Please let me know if anyone else has experienced this error.

Thanks,

 

Kyle

11/2/2014 8:13:17 AM - Info nbjm(pid=1120) starting backup job (jobid=635387) for client golden105, policy DB-GDB28, schedule Default-Application-Backup  
11/2/2014 8:13:17 AM - Info nbjm(pid=1120) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=635387, request id:{E0BE9EF7-2FD5-483E-8CDE-EED495832159})  
11/2/2014 8:13:17 AM - requesting resource nbmedia102-hcart2-robot-tld-2
11/2/2014 8:13:17 AM - requesting resource veritasarsenal.NBU_CLIENT.MAXJOBS.golden105
11/2/2014 8:13:17 AM - requesting resource veritasarsenal.NBU_POLICY.MAXJOBS.DB-GDB28
11/2/2014 8:13:18 AM - granted resource veritasarsenal.NBU_CLIENT.MAXJOBS.golden105
11/2/2014 8:13:18 AM - granted resource veritasarsenal.NBU_POLICY.MAXJOBS.DB-GDB28
11/2/2014 8:13:18 AM - granted resource 090376
11/2/2014 8:13:18 AM - granted resource IBM.ULTRIUM-TD4.000
11/2/2014 8:13:18 AM - granted resource nbmedia102-hcart2-robot-tld-2
11/2/2014 8:13:18 AM - estimated 0 Kbytes needed
11/2/2014 8:13:18 AM - Info nbjm(pid=1120) started backup (backupid=golden105_1414933998) job for client golden105, policy DB-GDB28, schedule Default-Application-Backup on storage unit nbmedia102-hcart2-robot-tld-2
11/2/2014 8:58:42 AM - end writing
media write error(84)

1 ACCEPTED SOLUTION

Accepted Solutions

mph999
Level 6
Employee Accredited

The tape alert means ...

Flag 3: Hard error. Severity: Warning
Flag 6: Write failure. Severity: Critical
Flag 20: Cleaning required. Severity: Critical
Flag 39: Diagnostics required. Severity: Warning

Maybe a bad drive, or one that just needs cleaning ....
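
For anyone who wants to check the flag list by hand: the TAPE_ALERT line in the errors file posted in this thread carries two hex words, 0x24001000 and 0x02000000. Below is a minimal Python sketch of how those words might map to flag numbers. It assumes flag 1 is the most significant bit of the first word and flag 33 the most significant bit of the second; that numbering is an assumption, chosen because it reproduces the four flags listed above. The function name `decode_tape_alert` is made up for illustration, and the descriptions are copied from this post only.

```python
# Flag descriptions as quoted in this thread (not a complete TapeAlert table).
FLAG_NAMES = {
    3: "Hard error",
    6: "Write failure",
    20: "Cleaning required",
    39: "Diagnostics required",
}


def decode_tape_alert(word1: int, word2: int) -> list[int]:
    """Return the flag numbers set in two 32-bit TAPE_ALERT words.

    Assumption: flag 1 is the most significant bit of word1 and
    flag 33 is the most significant bit of word2.
    """
    flags = []
    for word_index, word in enumerate((word1, word2)):
        for bit in range(32):
            if word & (0x80000000 >> bit):
                flags.append(word_index * 32 + bit + 1)
    return flags


if __name__ == "__main__":
    for flag in decode_tape_alert(0x24001000, 0x02000000):
        print(f"Flag {flag}: {FLAG_NAMES.get(flag, 'not described in this thread')}")
```

With the two words from the errors file, this yields flags 3, 6, 20 and 39, matching the list above.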

 

 


10 REPLIES

Marianne
Moderator
Partner    VIP    Accredited Certified

Status 84 is an I/O error.

It can be caused by anything on the I/O path: server HBA (including firmware and/or driver), tape driver, cable, GBIC, switch port, tape drive, media.

You need logging to pinpoint the problem.
Add a VERBOSE entry to ...\volmgr\vm.conf, create a bptm folder under ...\netbackup\logs, then restart the NBU Device Manager service.

After the next error, check the bptm log as well as the Windows Event Viewer System and Application logs.

kproehl
Level 5

So the errors seem to have stopped, but I have now noticed that some of the drive paths randomly go down.  Could that still be related to drive problems?

mph999
Level 6
Employee Accredited
Yes, it could be related to a drive problem. On the media server(s), look in /usr/openv/netbackup/db/media/errors. What do the last few lines show? Do they list the drives going down, and is any error or tape alert shown?

Apart from the logs Marianne mentions, also create two empty files, ...\volmgr\DRIVE_DEBUG and ROBOT_DEBUG. ltid will need restarting to pick up these changes (as well as the logs mentioned by Marianne). The two touch files will increase the amount of debug logging written to the system message file.
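
The logging steps suggested so far (VERBOSE in vm.conf, a bptm log folder, and the two touch files) could be scripted roughly as below. This is only a sketch under assumptions: `install_root` is a hypothetical parameter standing in for the "..." prefix in the paths above, which you have to supply yourself, and the sketch assumes volmgr and netbackup sit side by side under it. It does not restart any service; that still has to be done afterwards.

```python
from pathlib import Path


def enable_debug_logging(install_root: Path) -> None:
    """Apply the logging changes described in this thread.

    install_root is a hypothetical stand-in for the elided "..." prefix;
    the volmgr/netbackup sibling layout is an assumption.
    """
    volmgr = install_root / "volmgr"
    if not volmgr.is_dir():
        raise FileNotFoundError(f"{volmgr} not found; check install_root")

    # Add a VERBOSE entry to vm.conf unless one is already present.
    vm_conf = volmgr / "vm.conf"
    lines = vm_conf.read_text().splitlines() if vm_conf.exists() else []
    if "VERBOSE" not in (line.strip() for line in lines):
        lines.append("VERBOSE")
        vm_conf.write_text("\n".join(lines) + "\n")

    # Create the bptm log folder under netbackup\logs.
    (install_root / "netbackup" / "logs" / "bptm").mkdir(parents=True, exist_ok=True)

    # Create the two empty touch files in volmgr.
    for name in ("DRIVE_DEBUG", "ROBOT_DEBUG"):
        (volmgr / name).touch(exist_ok=True)
```

Running it twice should be harmless, since each step checks for existing files before changing anything. Remember to remove the touch files and the VERBOSE entry once the problem is found, as the extra logging can grow quickly.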

watsons
Level 6

Error 84 may or may not be a physical tape issue. Look at it this way:

It is a "media write error", meaning any component involved in the write process can be the cause.

bptm is the "netbackup process" that drives the "tape drive" to write the already-read data onto the "physical/virtual tape".

So if the issue is not the "physical/virtual tape", it can still be the "tape drive" or the "netbackup process". Test that the "tape drive" is working fine first, and only then look at the "netbackup process". The process requires a "connection", which can be a network, LAN or SAN, so check the "connection" too. A "netbackup bug" in the bptm binary is still possible, but it is rare; you will need to supply logs to NetBackup support to verify whether that is the case.

Marianne
Moderator
Partner    VIP    Accredited Certified

Status 84 errors that stop could possibly be an indication of bad media. That is why you need logs.

The <install-path>\veritas\netbackup\db\media\errors file on the media server is a good starting place.

kproehl
Level 5

Thanks for the replies everyone.  I am looking through the logs now and will reply with what I find.

kproehl
Level 5

I took a look at <install-path>\veritas\netbackup\db\media\errors and I do see errors on both media servers that connect to these 10 LTO4 tape drives.

Does anyone know what these errors mean?

First Media server

11/02/14 17:11:24 090282 -1 OPEN_ERROR IBM.ULTRIUM-TD4.001
11/02/14 17:14:43 092492 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000
11/02/14 17:29:43 092538 -1 OPEN_ERROR IBM.ULTRIUM-TD4.003
11/02/14 19:16:47 090282 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000
11/02/14 19:16:49 092492 -1 OPEN_ERROR IBM.ULTRIUM-TD4.004
11/02/14 19:16:50 900022 6 OPEN_ERROR IBM.ULTRIUM-TD4.001
11/02/14 19:16:51 092538 -1 OPEN_ERROR IBM.ULTRIUM-TD4.003
11/02/14 20:17:44 092492 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000
11/02/14 20:23:31 900022 6 OPEN_ERROR IBM.ULTRIUM-TD4.004
11/02/14 20:32:55 092538 -1 OPEN_ERROR IBM.ULTRIUM-TD4.003
11/02/14 21:44:27 090675 -1 OPEN_ERROR IBM.ULTRIUM-TD4.001
11/02/14 22:10:10 092664 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000
11/02/14 22:34:15 090675 -1 OPEN_ERROR IBM.ULTRIUM-TD4.004
11/03/14 00:57:47 090675 -1 OPEN_ERROR IBM.ULTRIUM-TD4.003
11/03/14 06:03:27 092538 -1 OPEN_ERROR IBM.ULTRIUM-TD4.001
11/03/14 06:03:49 092373 -1 OPEN_ERROR IBM.ULTRIUM-TD4.003
11/03/14 06:04:07 090579 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000
11/03/14 07:03:57 092538 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000
11/03/14 07:05:21 092373 -1 OPEN_ERROR IBM.ULTRIUM-TD4.001
11/03/14 07:13:27 900003 -1 OPEN_ERROR IBM.ULTRIUM-TD4.003
11/03/14 08:06:16 092538 -1 OPEN_ERROR IBM.ULTRIUM-TD4.001
11/03/14 08:13:07 092373 -1 OPEN_ERROR IBM.ULTRIUM-TD4.003
11/03/14 08:35:41 092750 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000
11/03/14 08:58:09 092538 -1 OPEN_ERROR IBM.ULTRIUM-TD4.003
11/03/14 09:18:46 092750 -1 OPEN_ERROR IBM.ULTRIUM-TD4.004
11/03/14 09:19:28 092377 7 OPEN_ERROR IBM.ULTRIUM-TD4.004
11/03/14 09:36:46 092750 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000
11/03/14 10:39:16 092750 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000
11/03/14 11:38:54 092750 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000
11/03/14 12:17:16 092750 -1 OPEN_ERROR IBM.ULTRIUM-TD4.003
11/03/14 12:30:00 092750 -1 OPEN_ERROR IBM.ULTRIUM-TD4.003
11/03/14 12:32:42 092750 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000
11/03/14 13:43:59 092750 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000

Second Media Server

10/31/14 20:42:45 092442 6 OPEN_ERROR IBM.ULTRIUM-TD4.006
10/31/14 20:42:45 092442 6 WRITE_ERROR IBM.ULTRIUM-TD4.006
10/31/14 20:42:48 092442 6 TAPE_ALERT IBM.ULTRIUM-TD4.006 0x24001000 0x02000000
11/01/14 14:23:01 092764 0 WRITE_ERROR IBM.ULTRIUM-TD4.000
11/01/14 18:18:27 092768 0 WRITE_ERROR IBM.ULTRIUM-TD4.000
11/02/14 07:31:08 092491 9 OPEN_ERROR IBM.ULTRIUM-TD4.007
11/02/14 07:35:19 092491 9 OPEN_ERROR IBM.ULTRIUM-TD4.007
11/02/14 07:39:30 092491 9 OPEN_ERROR IBM.ULTRIUM-TD4.007
11/02/14 07:43:40 092491 9 OPEN_ERROR IBM.ULTRIUM-TD4.007
11/02/14 07:43:40 092491 9 WRITE_ERROR IBM.ULTRIUM-TD4.007
11/02/14 08:58:42 090376 0 WRITE_ERROR IBM.ULTRIUM-TD4.000
11/02/14 11:03:24 090141 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000
11/02/14 12:03:38 090141 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000
11/02/14 14:02:54 090141 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000
11/02/14 16:43:05 092681 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000
11/02/14 19:47:12 090376 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000
11/02/14 22:47:27 091189 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000
11/03/14 08:05:36 092412 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000
11/03/14 09:06:28 092665 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000
11/03/14 10:07:55 090390 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000
11/03/14 11:09:11 090390 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000

kproehl
Level 5

I took a look in <install-path>\veritas\netbackup\db\media\errors and I do see errors on both media servers that use these 10 LTO4 tape drives.

Does anyone know what these errors mean?

First Media server


11/03/14 06:03:49 092373 -1 OPEN_ERROR IBM.ULTRIUM-TD4.003
11/03/14 06:04:07 090579 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000
11/03/14 07:03:57 092538 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000
11/03/14 07:05:21 092373 -1 OPEN_ERROR IBM.ULTRIUM-TD4.001
11/03/14 07:13:27 900003 -1 OPEN_ERROR IBM.ULTRIUM-TD4.003
11/03/14 08:06:16 092538 -1 OPEN_ERROR IBM.ULTRIUM-TD4.001
11/03/14 08:13:07 092373 -1 OPEN_ERROR IBM.ULTRIUM-TD4.003
11/03/14 08:35:41 092750 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000
11/03/14 08:58:09 092538 -1 OPEN_ERROR IBM.ULTRIUM-TD4.003
11/03/14 09:18:46 092750 -1 OPEN_ERROR IBM.ULTRIUM-TD4.004
11/03/14 09:19:28 092377 7 OPEN_ERROR IBM.ULTRIUM-TD4.004
11/03/14 09:36:46 092750 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000
11/03/14 10:39:16 092750 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000
11/03/14 11:38:54 092750 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000
11/03/14 12:17:16 092750 -1 OPEN_ERROR IBM.ULTRIUM-TD4.003
11/03/14 12:30:00 092750 -1 OPEN_ERROR IBM.ULTRIUM-TD4.003
11/03/14 12:32:42 092750 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000
11/03/14 13:43:59 092750 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000

Second Media Server


11/02/14 16:43:05 092681 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000
11/02/14 19:47:12 090376 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000
11/02/14 22:47:27 091189 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000
11/03/14 08:05:36 092412 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000
11/03/14 09:06:28 092665 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000
11/03/14 10:07:55 090390 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000
11/03/14 11:09:11 090390 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000

jim_dalton
Level 6

Wow, that's a fine selection of errors across a selection of drives: 0, 1, 3 and 4. I would hazard a guess there's a mix of both drive issues and tape issues. 092750 is a repeater, for example, so probably a media issue.

I would force-clean all the drives mentioned, then get a known brand-new tape and test it in each drive in turn: a write and a read. If that's all good, then you know the drives are fine when the media is, and then I'd look at the media detailed in the logs.

Anything more from TapeAlert and/or your robotic manager? You can look up the TapeAlert flags.

Jim
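
Spotting repeaters like 092750 by eye gets tedious on a long errors file, so a quick tally per media ID and per drive may help separate "one bad tape" from "one bad drive". The Python sketch below assumes each line is whitespace-separated as date, time, media ID, a numeric field, error type, drive name; that layout is inferred from the lines posted in this thread, not from documentation. The function name `tally_errors` and the `SAMPLE` text (a few lines copied from the posts above) are for illustration only.

```python
from collections import Counter

# A few lines copied from the errors files posted in this thread.
SAMPLE = """\
11/03/14 08:35:41 092750 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000
11/03/14 08:58:09 092538 -1 OPEN_ERROR IBM.ULTRIUM-TD4.003
11/03/14 09:18:46 092750 -1 OPEN_ERROR IBM.ULTRIUM-TD4.004
11/03/14 09:19:28 092377 7 OPEN_ERROR IBM.ULTRIUM-TD4.004
11/03/14 09:36:46 092750 -1 OPEN_ERROR IBM.ULTRIUM-TD4.000
10/31/14 20:42:48 092442 6 TAPE_ALERT IBM.ULTRIUM-TD4.006 0x24001000 0x02000000
"""


def tally_errors(text: str) -> tuple[Counter, Counter]:
    """Count error lines per media ID and per drive name.

    Assumed field order: date, time, media ID, numeric field,
    error type, drive name, then optional extra values.
    """
    by_media = Counter()
    by_drive = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        by_media[fields[2]] += 1
        by_drive[fields[5]] += 1
    return by_media, by_drive


if __name__ == "__main__":
    media_counts, drive_counts = tally_errors(SAMPLE)
    print("Errors per media ID:", media_counts.most_common())
    print("Errors per drive:", drive_counts.most_common())
```

A media ID that tops the list across several different drives points at the tape; a drive that tops the list across many different tapes points at the drive.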
