
Media write error. EOD was not written

serg2002
Level 3

Hi,

I apologize if this isn't the right place for this discussion. I am archiving a big project, and last night I received 3 consecutive warning emails from the tape library:

SR Notes: [04 - Media Issue] type=HP LTO6 2FC fwrev=J5KZ sn=HU1252U3EE fsc=0x750B media=K03344L6 log=FH_F0027A200D_HP_LTO6_FC_HU1252U3EE_2022-04-28_11.05.49_TA03_BCK03344L6.ltd.gz, Cause: Undetermined

SR Notes: [03 - Hard Error] type=HP LTO6 2FC fwrev=J5KZ sn=HU1252U3EE fsc=0x750B media=K03344L6 log=FH_F0027A200D_HP_LTO6_FC_HU1252U3EE_2022-04-28_11.05.49_TA03_BCK03344L6.ltd.gz, Cause: Undetermined

SR Notes: Drive sled management has determined that the data tape is incomplete (DDC INVALIDEOD STATUS), type=HPLTO6 2FC fwrev=J5KZ sn=HU1252U3EE media=K03344L6

The problem tape went into the FROZEN state.

The archive job status says:
1: (84) media write error
2: (0) the requested operation was successfully completed

As far as I understand, the EOD (end-of-data) marker was not written to the tape.

I have a few concerns about the archived data.

What does NetBackup do when it hits this kind of problem and cannot write to a tape? Since the job completed successfully, I assume NB managed to write to another tape.

Does that mean the problem is only with writing to this tape? It's half full, and it's not a big deal if I can't write to it anymore. I can suspend it and use it only for restoring data in the future.

Should I be worried about whether the tape is still good for restoring data? It holds 3 images written on different dates over the last month. Do I need to re-archive them?

Thank you.


mph999
Level 6
Employee Accredited

I've never seen this message before ...

"Drive sled management has determined that the data tape is incomplete"

... perhaps the library documentation or vendor could tell you exactly what it means.

Your explanation of a missing EOD (by this I think you mean <end of tape>) sounds reasonable.

Ultimately, I would agree with you: it's probably only the one tape (otherwise you would see the message for other tapes). If NetBackup showed the job as complete (status 0), then NetBackup may have believed everything had reached the tape. It may indicate an issue with the drive, though.

Certainly, I would retire that tape.

NetBackup writes data to tape something like this:

TH FM BH DATA DATA DATA DATA DATA FM EH <eot>

... depending on the data size, there may be additional BH and FM entries, but for a simple example I think the above is close enough ...

TH - tape header
BH - backup header
FM - file mark
EH - empty header
eot - end of tape

The tape header, backup header and empty header are 1k blocks. The data is written in whatever block size NetBackup is set to use, with the last block probably being a bit smaller (unless the data size is exactly divisible by the block size, which realistically just isn't going to happen).
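To make the block arithmetic concrete, here is a minimal Python sketch that counts the data blocks for one image and the size of the final, short block. The 256 KiB block size and the 1,000,000-byte image are illustrative assumptions, not NetBackup defaults.

```python
def data_blocks(image_bytes: int, block_size: int) -> tuple[int, int]:
    """Return (number of data blocks, size of the last block in bytes).

    Every block except the last is exactly block_size bytes; the last one
    holds whatever is left over (a full block only if the size divides evenly).
    """
    if image_bytes <= 0 or block_size <= 0:
        raise ValueError("sizes must be positive")
    full, remainder = divmod(image_bytes, block_size)
    if remainder == 0:
        return full, block_size
    return full + 1, remainder


# Hypothetical example: a 1,000,000-byte image written with 256 KiB blocks.
count, last = data_blocks(1_000_000, 256 * 1024)
print(count, last)  # 4 213568
```

Three full 262144-byte blocks cover 786432 bytes, so the fourth block carries the remaining 213568 bytes.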

eot is a bit special, and I'm not 100% sure what it is. I think it's two file marks, and it indicates the 'logical end of data', that is, the end of the last backup on the tape. When a new backup is written, the empty header and eot are overwritten and then rewritten at the end of the new backup.
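A toy model of the append behaviour just described, assuming the tape is a Python list of marker strings (using the labels from the layout above) and that eot is simply the final token. Real tapes hold binary records and file marks, so this only illustrates the ordering.

```python
def append_backup(tape: list[str], n_data_blocks: int) -> list[str]:
    """Append one backup to a toy tape layout (the input list is not modified).

    The trailing EH and eot are overwritten by the new backup's header
    and data, then written again after it.
    """
    if tape[-2:] != ["EH", "eot"]:
        # Without the end-of-data markers there is nowhere safe to append.
        raise ValueError("no EH/eot at the end of the tape: cannot append")
    new_tape = tape[:-2]  # drop the old empty header and end marker
    new_tape += ["BH"] + ["DATA"] * n_data_blocks + ["FM"]
    new_tape += ["EH", "eot"]  # rewrite them after the new backup
    return new_tape


tape = ["TH", "FM", "BH", "DATA", "DATA", "FM", "EH", "eot"]
print(append_backup(tape, 1))
```

The ValueError branch mirrors the point made later in this reply: a tape whose eot marker is missing can't be appended to.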

Usually, when something goes wrong with a tape, you'll get a SCSI sense code. Look here and you'll see the various codes (there are loads, covering all kinds of SCSI devices, not just tape drives):

https://www.t10.org/lists/asc-num.htm

e.g. ASC/ASCQ 0Ch/00h is WRITE ERROR.

A SCSI sense error can be thought of as an error message sent by a SCSI device in response to a command it was asked to carry out.
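As an illustration of where those codes live, here is a minimal Python decoder for fixed-format SCSI sense data (response code 70h/71h), in which the sense key is the low nibble of byte 2 and the ASC/ASCQ pair sits in bytes 12 and 13. The lookup table holds only a handful of entries from the T10 list, and the sample sense bytes are hypothetical; anything not in the table should be looked up on the T10 page.

```python
# A few ASC/ASCQ descriptions from the T10 list (deliberately not exhaustive).
ASC_ASCQ = {
    (0x00, 0x00): "NO ADDITIONAL SENSE INFORMATION",
    (0x00, 0x01): "FILEMARK DETECTED",
    (0x00, 0x05): "END-OF-DATA DETECTED",
    (0x0C, 0x00): "WRITE ERROR",
    (0x11, 0x00): "UNRECOVERED READ ERROR",
    (0x14, 0x03): "END-OF-DATA NOT FOUND",
}

SENSE_KEYS = {
    0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
    0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION", 0x7: "DATA PROTECT", 0x8: "BLANK CHECK",
}


def decode_fixed_sense(sense: bytes) -> str:
    """Summarise fixed-format sense data as 'KEY: ASC/ASCQ DESCRIPTION'."""
    # Bit 7 of byte 0 is the VALID flag; the response code is bits 0-6.
    if len(sense) < 14 or (sense[0] & 0x7F) not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    key = sense[2] & 0x0F
    asc, ascq = sense[12], sense[13]
    key_text = SENSE_KEYS.get(key, f"sense key {key:X}h")
    desc = ASC_ASCQ.get((asc, ascq), "see the T10 ASC/ASCQ list")
    return f"{key_text}: {asc:02X}h/{ascq:02X}h {desc}"


# Hypothetical sense bytes for a write failure on the medium.
sample = bytes([0x70, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x0A,
                0x00, 0x00, 0x00, 0x00, 0x0C, 0x00])
print(decode_fixed_sense(sample))  # MEDIUM ERROR: 0Ch/00h WRITE ERROR
```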

This would usually appear in the bptm log, and also in the system messages file / event log.

I would also expect the drive to log a TapeAlert. NetBackup reads these from the drive's firmware when the tape is ejected, and again, they should be logged in the bptm log as well as in the /usr/openv/netbackup/db/media/errors file (on the media server).

It would be interesting to read the tape back using scsi_command -map -f <path to tape device containing tape>.

It's easy enough to do. Load the tape with tpreq -m <media id> and wait a minute.
Run vmoprcmd, look for the media ID alongside a tape drive, and note down the drive path.
Run /usr/openv/volmgr/bin/scsi_command -map -f <path>

Be sure to redirect the output to a file: if the tape is half full, the output will be pretty long (and it will also take a few hours, as it reads the whole tape).

Note: on Windows, scsi_command uses the {x,y,z} path format, not TAPE0 etc., but from memory I think this is the path format given by vmoprcmd.
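If you would rather script the final step, here is a minimal Python sketch using subprocess. It runs only the scsi_command -map -f invocation given above (no other flags assumed) and sends all output to a file, as recommended. The device path and output filename are hypothetical placeholders: use the path vmoprcmd reports for the drive holding the tape (on Windows, the {x,y,z} form).

```python
import subprocess
from pathlib import Path

SCSI_COMMAND = "/usr/openv/volmgr/bin/scsi_command"


def map_command(device_path: str) -> list[str]:
    """Build the scsi_command -map invocation for one tape device."""
    return [SCSI_COMMAND, "-map", "-f", device_path]


def map_tape_to_file(device_path: str, out_file: str) -> int:
    """Run scsi_command -map, writing stdout and stderr to out_file.

    Reading a half-full tape can take hours, so no timeout is set.
    Returns the command's exit status.
    """
    with Path(out_file).open("w") as out:
        result = subprocess.run(map_command(device_path),
                                stdout=out, stderr=subprocess.STDOUT)
    return result.returncode
```

Usage would be something like map_tape_to_file("/dev/nst0", "K03344L6_map.txt"), with both arguments replaced by your own values.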

This will read the tape and tell us in some detail what is on it, including the file marks, the various headers and <eot>. So, in theory, if something is missing, we should be able to see what it is. The bptm log for the last backup on the tape would also be useful.

How bad it is depends on what is missing, however. As long as neither the <data> nor the <fm> at the end of the data is missing, the backup is probably restorable. If, for example, only the <eot> marker is missing, you just won't be able to append another backup to the tape, but what's already on it should be fine.
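That rule of thumb can be written as a small check over the marker layout shown earlier (TH FM BH DATA ... FM EH <eot>). This is a sketch under loud assumptions: it works on a list of marker labels, not on real scsi_command -map output (whose format isn't shown here), and it encodes only the two tests from the paragraph above.

```python
def assess_tape(markers: list[str]) -> dict[str, bool]:
    """Apply the rule of thumb above to a list of layout markers.

    - restorable: every BH is followed by DATA blocks ending in an FM
    - appendable: the tape ends with the EH and eot markers
    """
    restorable = True
    i = 0
    while i < len(markers):
        if markers[i] == "BH":
            j = i + 1
            while j < len(markers) and markers[j] == "DATA":
                j += 1
            # Need at least one DATA block followed by a closing file mark.
            if j == i + 1 or j >= len(markers) or markers[j] != "FM":
                restorable = False
            i = j
        else:
            i += 1
    appendable = markers[-2:] == ["EH", "eot"]
    return {"restorable": restorable, "appendable": appendable}


# Missing eot only: data should be fine, but nothing more can be appended.
print(assess_tape(["TH", "FM", "BH", "DATA", "DATA", "FM", "EH"]))
```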

But ...

It's possible that, because NetBackup reported a status 84, it discarded that backup anyway: if a job fails, NetBackup (usually) rewinds the tape to the beginning of the failed image and writes a new <empty header>. As the tape was frozen, though, it might not have done that; I don't know.
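In the same toy list-of-markers model, the usual failure handling just described (rewind to the start of the failed image, then write a fresh empty header) might look like the sketch below. Whether eot is rewritten together with the EH at that point is an assumption, flagged in the code; and, as noted, a frozen tape may never get this clean-up at all.

```python
def roll_back_failed_image(tape: list[str], failed_bh_index: int) -> list[str]:
    """Discard a failed image in a toy tape layout (input is not modified).

    Everything from the failed image's backup header onwards is dropped,
    and a new empty header is written in its place.
    """
    if tape[failed_bh_index] != "BH":
        raise ValueError("index must point at the failed image's BH")
    rolled_back = tape[:failed_bh_index]
    # Assumption: the end-of-data marker is rewritten along with the EH.
    rolled_back += ["EH", "eot"]
    return rolled_back


# A second image (starting at index 5) failed part-way through its data.
tape = ["TH", "FM", "BH", "DATA", "FM", "BH", "DATA"]
print(roll_back_failed_image(tape, 5))
```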

I have seen a similar issue only once before, where a faulty tape drive was unable to write the <eot> mark; everything else was fine. No error was thrown, so although the data was restorable, the missing <eot> meant the tapes couldn't be appended to (until they expired). A new tape drive resolved the issue.

 

DPeaco
Moderator
   VIP   

@mph999 

Someone needs to give you a pay raise! Very good info, great detail, and a lot to learn from here!!

Thanks,
Dennis

@mph999 

Thank you very much

mph999
Level 6
Employee Accredited

Thank you for your kind words, Dennis. I've done this a few times in the past ...