
Duplication using Vault causing "...media write error (84)"

Iwan_Tamimi
Level 6

Hi All,

 

Our system:

Master: RHel 6.1, NetBackup 7.5.0.4

Media: Windows 2008 , NetBackup 7.5.0.4

We run duplication with Vault twice a day (16:30 and 04:00). This had been running fine for more than a year until recently.

The Vault job for this media server failed with "Duplicate of backupid mstr4-bck_1398754826 failed, media write error (84)." (see the attachment for the rest). After that, the tape drives on this media server could no longer be used. As a workaround I rebooted the media server, after which everything was OK. The problem recurs every 4 or 5 days (it has happened 5 times so far).

This media server also backs up VMware to a DataDomain disk. When the problem occurs, those backups still run fine; only the jobs using tape fail. The only difference from before is that we now have more clients.

Does anyone know the cause? Is there a parameter I need to raise?

Thank you.

Iwan


7 replies

Yogesh_Jadhav1
Level 5

Iwan,

The logs say you are trying to use WORM tapes and NetBackup is failing to write to them. Can you check whether tapes that have already been used are coming back into your inventory and being selected again, which would cause the backup failures? Please verify.

Marianne
Level 6
Partner    VIP    Accredited Certified

It is not Vault duplication that is causing status 84 error.
It is a hardware I/O error during write operation that is causing status 84.

Hardware can be anything on the data path - from the hba in the media server (including firmware and driver), the fibre or scsi cable, all the way to the tape drive and media.

Here one particular media ID is causing both failures - 400379.

04/29/2014 16:33:42 - Info bptm (pid=6868) media id 400379 mounted on drive index 2, drivepath {6,0,0,0}, drivename ESL02_Drive8, copy 2
04/29/2014 16:33:53 - Info bptm (pid=6868) WriteFile() failed err = 19
04/29/2014 16:33:53 - Error bptm (pid=6868) write error on media id 400379, drive index 2, writing header block, 19

 

The tape driver is unable to write to the header block.

Media seems to be unusable. Take this tape out of the robot and have it checked out.
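Whether one media ID (here 400379) accounts for the failures can be checked mechanically across many job logs. The Python sketch below is a hypothetical helper, not a NetBackup tool: it assumes the job-detail lines look like the bptm lines quoted above and counts write errors per (media ID, drive index) pair.

```python
import re
from collections import Counter

# Pattern modeled on the bptm line quoted above (an assumption about the format):
# "... Error bptm (pid=6868) write error on media id 400379, drive index 2, writing header block, 19"
WRITE_ERROR = re.compile(
    r"write error on media id (?P<media>\w+), drive index (?P<drive>\d+)"
)


def count_write_errors(lines):
    """Return a Counter mapping (media_id, drive_index) to the number of write errors."""
    counts = Counter()
    for line in lines:
        match = WRITE_ERROR.search(line)
        if match:
            counts[(match.group("media"), int(match.group("drive")))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "04/29/2014 16:33:42 - Info bptm (pid=6868) media id 400379 mounted on drive index 2, drivepath {6,0,0,0}, drivename ESL02_Drive8, copy 2",
        "04/29/2014 16:33:53 - Info bptm (pid=6868) WriteFile() failed err = 19",
        "04/29/2014 16:33:53 - Error bptm (pid=6868) write error on media id 400379, drive index 2, writing header block, 19",
    ]
    for (media, drive), n in count_write_errors(sample).items():
        print(f"media {media} on drive index {drive}: {n} write error(s)")
```

As a rough heuristic, errors that follow one media ID across several drives point at the tape, while errors that follow one drive across many media IDs point at the drive or its data path.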

 

Iwan_Tamimi
Level 6

Hi All,

Thank you for the responses.

Marianne and Yogesh.

I don't think the problem is really tape 400379; if it were, why didn't NetBackup just take another tape? During Vault duplication, 3 other media servers use the same tape library, and so far they run fine. It is always this server, ebs5-bck, that hits the problem. After the previous failure I also moved the "failed" tape to another pool so it would not be used as scratch.

Most of the time the problem goes away after I reboot the system, but I still don't know the cause.

I am going to try this solution:

https://www-secure.symantec.com/connect/forums/media-servers-going-offline

However, I haven't yet had time to restart NetBackup.

 

Regards,

 

Iwan

Iwan_Tamimi
Level 6

Sorry, I forgot to mention: the reason I want to try the https://www-secure.symantec.com/connect/forums/media-servers-going-offline solution is that after the system reboots, the media server shows as "active for disk" only, not for tape.

So sometimes the problem persists even after I reboot the media server. Does anyone know the solution?

Regards,

 

Iwan Tamimi

Marianne
Level 6
Partner    VIP    Accredited Certified

The error is clear in the logs that you posted. Media header cannot be written to this piece of media. Nothing else.

NBU will keep trying to use this piece of media until 3 I/O errors have been reported within 12 hours, after which the media will be frozen.
Other media servers will not try to use this tape because it is now assigned to this media server.
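The freeze rule described above (3 I/O errors within 12 hours) is a sliding-window count. The sketch below models only that rule as stated here; it is not NetBackup's implementation, and treating the 12-hour boundary as inclusive is an assumption.

```python
from datetime import datetime, timedelta

# Thresholds as stated in this thread: 3 I/O errors within 12 hours.
# This is a simplified model, not NetBackup's own code.
ERROR_LIMIT = 3
WINDOW = timedelta(hours=12)


def should_freeze(error_times, limit=ERROR_LIMIT, window=WINDOW):
    """Return True if some `window`-long span contains at least `limit` errors.

    Assumption: an error exactly `window` after the first still counts.
    """
    times = sorted(error_times)
    # Slide over every run of `limit` consecutive errors and check its span.
    for i in range(len(times) - limit + 1):
        if times[i + limit - 1] - times[i] <= window:
            return True
    return False


if __name__ == "__main__":
    t0 = datetime(2014, 4, 29, 16, 33)
    print(should_freeze([t0, t0 + timedelta(hours=4), t0 + timedelta(hours=11)]))  # True
    print(should_freeze([t0, t0 + timedelta(hours=8), t0 + timedelta(hours=13)]))  # False
```

Until the third error lands inside the window, the tape stays in use, so a bad tape can fail more than one job before it is frozen.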

For additional Media Manager (hardware) troubleshooting, add a VERBOSE entry to vm.conf on the media server and restart NBU.
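Adding the entry is a one-line edit. A minimal Python sketch that appends VERBOSE only if it is missing; the function name is hypothetical, and the vm.conf location is an assumption to check on your install (commonly /usr/openv/volmgr/vm.conf on UNIX and install_path\Volmgr\vm.conf on Windows).

```python
from pathlib import Path


def ensure_verbose(vm_conf: Path) -> bool:
    """Append a VERBOSE line to vm.conf unless one is already present.

    Returns True if the file was changed, False if VERBOSE was already there.
    """
    lines = vm_conf.read_text().splitlines() if vm_conf.exists() else []
    if any(line.strip() == "VERBOSE" for line in lines):
        return False
    lines.append("VERBOSE")
    vm_conf.write_text("\n".join(lines) + "\n")
    return True
```

For example, `ensure_verbose(Path("/usr/openv/volmgr/vm.conf"))` on a UNIX media server. The check makes the edit safe to repeat, and NetBackup still has to be restarted afterwards for the setting to take effect.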

Device errors will be logged in /var/log/messages on the media server.
So if the errors are caused by anything other than bad media (HBA errors, etc.), they will be reported in the messages file.
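On a UNIX/Linux media server, a quick filter over the messages file can surface tape or HBA errors around the failure times. This is a sketch only: the patterns below are hypothetical examples to adjust for the driver and HBA in use. On a Windows media server, such as the Windows 2008 one in this thread, device errors go to the System event log instead.

```python
import re

# Hypothetical example patterns; tune them for the tape driver and HBA in use.
DEVICE_ERROR_PATTERNS = [
    re.compile(r"I/O error", re.IGNORECASE),
    re.compile(r"\bst\d+\b.*error", re.IGNORECASE),  # Linux SCSI tape devices st0, st1, ...
    re.compile(r"scsi.*(error|timeout|reset)", re.IGNORECASE),
]


def device_error_lines(lines, patterns=DEVICE_ERROR_PATTERNS):
    """Yield each line (without its newline) that matches any of the patterns."""
    for line in lines:
        if any(p.search(line) for p in patterns):
            yield line.rstrip("\n")


def scan_messages(path="/var/log/messages"):
    """Return the matching lines from a syslog-style file."""
    with open(path, errors="replace") as fh:
        return list(device_error_lines(fh))
```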

Please post entries in /usr/openv/netbackup/db/error file on this media server for the last month or so.

 

Please see :

In-depth Troubleshooting Guide for Exit Status Code 84
http://www.symantec.com/docs/TECH43243

Iwan_Tamimi
Level 6

Hi All,

 

I am sorry for the very late reply. The problem is now solved after I implemented the https://www-secure.symantec.com/connect/forums/media-servers-going-offline solution, without changing or replacing a single tape drive. It seems the "media write error" was misleading (?). It has been almost a month, and so far the problem has not happened again.

Thank you all for your help.

 

Regards,

Iwan

mph999 (accepted solution)
Level 6
Employee Accredited

To be fair to Marianne, I would have said exactly the same.

/usr/openv/netbackup/db/config/DPS_PROXYDEFAULTRECVTMO - is related to disk

/usr/openv/netbackup/db/config/DPS_PROXYNOEXPIRE - is related to disk

So it is not particularly clear how these two touch files can fix an issue with tape ...

Perhaps ...

(1) Something very odd is happening, and these touch files (or one of them) are somehow having an effect. This is not very likely, especially given that the error relates to writing the header block, so at that point it is not even reading from the disk.

or

(2) The issue is intermittent and has cleared for the moment, but because you applied the touch files at the same time, it looks as though they resolved it.
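One way to weigh explanation (2) is to compare when the touch files were created with the dates of the last status 84 failures. A hypothetical Python helper that reports each touch file's modification time, or None if it is absent; the default directory is the path given above and should be adjusted for the server being checked.

```python
from datetime import datetime
from pathlib import Path
from typing import Dict, Optional

# The two touch files named in this thread.
TOUCH_FILES = ["DPS_PROXYDEFAULTRECVTMO", "DPS_PROXYNOEXPIRE"]


def touch_file_status(
    config_dir: str = "/usr/openv/netbackup/db/config",
) -> Dict[str, Optional[datetime]]:
    """Map each touch-file name to its modification time, or None if missing."""
    base = Path(config_dir)
    status: Dict[str, Optional[datetime]] = {}
    for name in TOUCH_FILES:
        path = base / name
        status[name] = datetime.fromtimestamp(path.stat().st_mtime) if path.exists() else None
    return status
```

If failures were already spaced further apart before the files appeared, or the last failure predates them by only a few days, coincidence remains a plausible reading.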