Solved: Media read error

Abhisheknetback · 08-28-2014

Hi Friends,

One of our duplication jobs is failing; the job details are below.

8/27/2014 12:20:41 PM - begin reading
8/27/2014 12:44:51 PM - Error bptm(pid=6564) cannot read image from media id 0039L5, drive index 1, err = 23
8/27/2014 12:44:51 PM - Warning bptm(pid=6564) TapeAlert Code: 0x01, Type: Warning, Flag: READ WARNING, from drive Drive005 (index 1), Media Id 0039L5
8/27/2014 12:44:53 PM - Info bptm(pid=6564) EXITING with status 85 <----------
8/27/2014 12:45:00 PM - Error bpduplicate(pid=8640) host med1 backup id mw2_1391112019 read failed, media read error (85).
8/27/2014 12:45:01 PM - Error bpduplicate(pid=8640) host med1 backupid mw2_1391112019 write failed, termination requested by administrator (150).
8/27/2014 12:45:02 PM - Error bpduplicate(pid=8640) Duplicate of backupid mw2_1391112019 failed, termination requested by administrator (150).
8/27/2014 12:45:02 PM - Error bpduplicate(pid=8640) Status = no images were successfully processed.
8/27/2014 12:45:03 PM - end Duplicate; elapsed time: 00:52:36
no images were successfully processed(191)

I ran a media verify after this; the output is attached. Please help.

Thank you!


mph999 · 08-28-2014

Error is coming from the drive, not NBU.

8/27/2014 12:44:51 PM - Warning bptm(pid=6564) TapeAlert Code: 0x01, Type: Warning, Flag: READ WARNING, from drive Drive005 (index 1), Media Id 0039L5

It could be hardware or a bad tape. I'm not usually one to recommend cleaning, but it could be worth cleaning the drive once to see if things improve.

There is no fault with NBU, and nothing can be done in NBU to resolve this, as it's actually the operating system, not NBU, that reads from (and writes to) the tapes.

TapeAlerts are sent by the drives; it's impossible for NBU to cause them.

You may wish to contact the drive vendor, though in my experience they usually just (incorrectly) blame NBU.
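Martin's point is that the TapeAlert from the drive is the root cause and the later NetBackup statuses follow from it. As a minimal Python sketch (the regular expressions are assumptions fitted only to the job-detail lines quoted in the first post, not a documented NetBackup log format), the following lists status codes in the order they first appear and pulls out any TapeAlert code, flag and drive, so the first failure stands out from the knock-on errors.

```python
import re

# Assumed patterns, fitted to the job-detail lines quoted in this thread.
# Group 1: "EXITING with status 85"; group 2: a trailing "(150)." style code.
STATUS_RE = re.compile(r"status (\d+)|\((\d+)\)\.?\s*$")
TAPEALERT_RE = re.compile(
    r"TapeAlert Code: (0x[0-9A-Fa-f]+).*?Flag: ([A-Z ]+?), from drive (\S+)"
)


def status_chain(lines):
    """Return status codes in the order they first appear in the job details."""
    seen = []
    for line in lines:
        m = STATUS_RE.search(line)
        if m:
            code = int(m.group(1) or m.group(2))
            if code not in seen:
                seen.append(code)
    return seen


def tape_alerts(lines):
    """Return (code, flag, drive) tuples for every TapeAlert line."""
    alerts = []
    for line in lines:
        m = TAPEALERT_RE.search(line)
        if m:
            alerts.append(m.groups())
    return alerts
```

Fed the log from the first post, status_chain returns [85, 150, 191] (media read error first, then the downstream 150 and 191), and tape_alerts reports READ WARNING on Drive005.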

RiaanBadenhorst · 08-28-2014

Down the drive, clean the drive, and run the dup from another drive.

Mark_Solutions · 08-28-2014

The verify log is fascinating. You seem to have images out of sequence in it, but that may be an Oracle/RMAN thing again rather than a NetBackup thing, and then it seems to time out at the end. But as the others say, the actual TapeAlert code 0x01 is a hardware issue on the tape drive: a read warning from the drive.

Deal with that, but do look at the verify log to find out what is out of sync with your Oracle backups.

Abhisheknetback · 08-31-2014

I have inserted the same media in another library and started a media verify.

I will update you once it is done.

Thank you

Abhisheknetback · 09-07-2014

Hi Friends,

I inserted the same media in another library and ran a media verify; the job details are below. Please help.

*** See attachment ***

Thank you

sanjaynaidu · 09-07-2014

Please check whether this particular media fails on the same drive or on different drives.

mph999 · 09-07-2014

I'm not sure what else we can tell you ...

8/27/2014 7:42:34 PM - Warning bptm(pid=12228) TapeAlert Code: 0x01, Type: Warning, Flag: READ WARNING, from drive Drive004 (index 0), Media Id 0039L5

The drive cannot read the media, either the drive has an issue, or the media has an issue.

The media might work on another drive, sometimes you get tapes that can only be read on certain drives (usually happens towards the end of their life, and often the only drive that can read them is the drive that wrote them).
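The "same drive or different drives?" question can be answered mechanically once there are enough TapeAlert lines. Here is a minimal Python sketch, assuming the TapeAlert line format quoted in this thread: it flags media that raised alerts on more than one drive, and drives that raised alerts with more than one media. It is a rough heuristic, not a diagnosis; as Martin says, a worn tape may still read on the drive that wrote it.

```python
import re
from collections import defaultdict

# Assumed pattern, fitted to the TapeAlert lines quoted in this thread.
ALERT_RE = re.compile(r"from drive (\S+) \(index \d+\), Media Id (\S+)")


def alert_pairs(lines):
    """Yield (drive, media) for each TapeAlert line."""
    for line in lines:
        m = ALERT_RE.search(line)
        if m:
            yield m.group(1), m.group(2)


def suspects(lines):
    """Rough heuristic: media alerting on >1 drive, drives alerting on >1 media."""
    drives_by_media = defaultdict(set)
    media_by_drive = defaultdict(set)
    for drive, media in alert_pairs(lines):
        drives_by_media[media].add(drive)
        media_by_drive[drive].add(media)
    bad_media = sorted(m for m, d in drives_by_media.items() if len(d) > 1)
    bad_drives = sorted(d for d, m in media_by_drive.items() if len(m) > 1)
    return bad_media, bad_drives
```

With the two alerts quoted in this thread (Drive005 and Drive004, both for 0039L5), suspects returns (['0039L5'], []), which points at the media rather than a single drive.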

Martin

Abhisheknetback · 09-07-2014

Dear ,

I moved this media to two different libraries and ran a verify, but I got the same error.

So something is wrong with the media only.

Thank you!!

Abhisheknetback · 09-08-2014

Dear all,

I recycled the NBU services on the master and the restore completed. I guess bprd was hung.

Thank you

mph999 · 09-08-2014

No, nothing to do with bprd.

The error is from the tape drive, as I explained; it is 100% nothing to do with NBU. It's completely impossible for NBU to cause that TapeAlert.

What is likely to have happened is that you have a bad tape, but an 'intermittent' one: in other words, if you run the restore enough times, one will work; however, the tape will most likely fail to work at all at some point if it continues to be used.

I will hazard a guess that the tape has been used a lot and is fairly worn, or, possibly the drive that wrote the tape is fairly worn and so reading the tape back is a bit 'hit-and-miss'.

Marianne · 09-08-2014

My guess is that the actual issue was with entries like these:

9/5/2014 1:04:30 PM - Error bpverify(pid=18356) Filename from image (/T24RUN/T2401/bnk.interface/log/res/20140129061617125109.xml.resp) does not match filename in database (/T24RUN/T2401/bnk.interface/log/res/20140129061617125209.xml.resp).  
9/5/2014 1:04:31 PM - Error bpverify(pid=18356) File number does not match for file /T24RUN/T2401/bnk.interface/log/res/20140129061617125109.xml.resp, in image is 6836862, in database is 6836875.
9/5/2014 1:04:32 PM - Error bpverify(pid=18356) Block number does not match for file /T24RUN/T2401/bnk.interface/log/res/20140129061617125109.xml.resp, in image is 362738590, in database is 362738616.
9/5/2014 1:04:32 PM - Error bpverify(pid=18356) Filename from image (/T24RUN/T2401/bnk.interface/log/res/20140129061887125106.xml.resp) does not match filename in database (/T24RUN/T2401/bnk.interface/log/res/20140129060037125200.xml.resp).  
9/5/2014 1:04:33 PM - Error bpverify(pid=18356) File number does not match for file /T24RUN/T2401/bnk.interface/log/res/20140129061887125106.xml.resp, in image is 6836863, in database is 6836876.
9/5/2014 1:04:33 PM - Error bpverify(pid=18356) Block number does not match for file /T24RUN/T2401/bnk.interface/log/res/20140129061887125106.xml.resp, in image is 362738592, in database is 362738618.
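For a long verify log, it can help to tabulate these mismatches rather than read them line by line. The minimal Python sketch below assumes the exact wording of the bpverify lines quoted above and reports, for each File/Block number mismatch, the database value minus the image value. In the lines quoted here the file numbers differ by 13 and the block numbers by 26 each time; what that offset means is not established in this thread.

```python
import re

# Assumed shape of the bpverify mismatch lines quoted above.
MISMATCH_RE = re.compile(
    r"(File|Block) number does not match for file ([^,]+), "
    r"in image is (\d+), in database is (\d+)"
)


def mismatch_offsets(lines):
    """Return (kind, path, database_value - image_value) per mismatch line."""
    results = []
    for line in lines:
        m = MISMATCH_RE.search(line)
        if m:
            kind, path, image_val, db_val = m.groups()
            results.append((kind, path, int(db_val) - int(image_val)))
    return results
```

Lines that don't match (such as the "Filename from image ..." entries) are simply skipped.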

I have never seen this and have no idea where to start looking:

File number does not match for file XXXX , in image is 6836862, in database is 6836875.

I was hoping that Martin or another Symantec Backline engineer would see this and tell us why this is happening and which 'database', other than the 'image', is being referred to.

Maybe the header info in EMM database?

Errors like these will cause duplications to fail.

If this was fixed by a reboot, my guess is that it is the recycling of bpdbm as well as NBDB/EMM that has fixed the issue, rather than bprd.

You never told us your NBU patch level. Maybe there is a bug at your patch level?


Abhisheknetback · 09-08-2014

Dear Friends,

I opened a case with Symantec; the tech support guy asked me to recycle the NBU services on the master, which I did, and it worked.

If you want, I can provide the Symantec case ID so you can verify with Symantec.

What more can I say?

I agree with Marianne's post; it could also be this:

If this was fixed by a reboot, my guess is that it is the recycling of bpdbm as well as NBDB/EMM that has fixed the issue, rather than bprd.

Thank you!!

:)

mph999 · 09-08-2014

I'm not sure why it is happening, but it has to be the .f file.

E.g. a .f file from my test system:

um len plen dlen blknum ii raw_sz GB dev_num path data

0 0 1 50 0 0 0 0 16 / 16877 root root 0 1409910411 1409903708 1409903708
0 0 11 50 1 0 0 0 33 /netbackup/ 16877 root root 0 1409910412 1406627376 1406627376
1 0 20 50 2 1 0 0 33 /netbackup/testdata/ 16877 root root 0 1409914189 1409152472 1409152472
2 0 25 56 3 1 0 0 33 /netbackup/testdata/file1 33261 root root 3189401 1409153770 1409153860 1409914189

Nowhere else do we hold the filename, and we also see a blknum. You can also get similar messages complaining about the device number, which is also there (dev_num), and I can only assume (dangerous, I know ...) that one of the fields with no title is the file number.
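To make the layout above concrete, here is a minimal Python sketch that splits one line of the .f excerpt into the nine titled numeric columns, the path, and the untitled trailing fields. It assumes the layout of the four sample lines only and is not a documented Veritas format. In those samples plen equals the length of the path, so the sketch uses it to slice the path out; the untitled fields (where the file number may live) are left as a raw string.

```python
from dataclasses import dataclass

# Header columns as shown in the .f excerpt above ("um" kept as pasted).
# Their exact meanings are not documented in this thread.
NUMERIC_COLS = ["um", "len", "plen", "dlen", "blknum", "ii", "raw_sz", "GB", "dev_num"]


@dataclass
class FEntry:
    columns: dict  # the nine numeric header columns, as ints
    path: str
    data: str      # untitled trailing fields, left as a raw string


def parse_f_line(line: str) -> FEntry:
    """Split one .f line into numeric columns, path and trailing data."""
    parts = line.split(maxsplit=len(NUMERIC_COLS))
    columns = {name: int(v) for name, v in zip(NUMERIC_COLS, parts)}
    rest = parts[len(NUMERIC_COLS)]
    # In the sample lines, plen equals the path length, so slice by it
    # (assumption: this also holds for paths that contain spaces).
    plen = columns["plen"]
    return FEntry(columns, rest[:plen], rest[plen:].strip())
```

For the /netbackup/testdata/file1 line this gives blknum 3 and dev_num 33, with "33261 root root 3189401 ..." left in data.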

I do not doubt that the restart resolved things; if that is what Abhisheknetback says happened, then that is fact. However, I can't see how it makes any sense: the TapeAlert is, as we know, a hardware thing, not a contents-of-tape thing, and the .f file doesn't change after a reboot.

I found a couple of similar errors in past cases, but they were way more complex, as they were Granular backups / VM backups etc., and they were not fixed via a restart.

Unless I can find some answers, I think we'll have to put this down to a NetBackup oddity ... It's certainly rare, as I only found a few previous cases and no matches at all for 'Filename from image ...'; they were all Block num or Device num.

I like to get to the real cause whenever possible, as it helps others and avoids the forum looking incorrect. I wasn't trying to be unhelpful previously, but I really can't find a link between a restart and this type of error.

I'll have a look around ...

Martin

mph999 · 09-08-2014

OK, I had a chat with a colleague; they agree the fix doesn't match the symptoms, so we are a bit stumped on this.

What I would recommend is to create:

/usr/openv/netbackup/db/images/<client name>/debug_file_history (an empty file)

If the issue should recur, then we would have a greater chance of understanding why. Please note that this will create an extra file, larger than the .f file, for each backup image, so if space is tight, don't do it.
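Creating the touch file is a one-liner from a shell, but for anyone scripting it across several clients, here is a minimal Python sketch. The images path is the one Martin gives; the client name in the usage note is a hypothetical placeholder. The function refuses to create a client directory that does not already exist, so a typo in the client name does not silently produce a file nobody reads.

```python
from pathlib import Path

# Images directory as given in the post above; adjust if your install differs.
IMAGES_DIR = Path("/usr/openv/netbackup/db/images")


def enable_file_history_debug(client_name: str, images_dir: Path = IMAGES_DIR) -> Path:
    """Create the empty debug_file_history file for one existing client directory."""
    client_dir = images_dir / client_name
    if not client_dir.is_dir():
        raise FileNotFoundError(f"no image directory for client {client_name!r}: {client_dir}")
    marker = client_dir / "debug_file_history"
    marker.touch(exist_ok=True)  # creates an empty file; leaves an existing one alone
    return marker
```

Usage: enable_file_history_debug("client01") for a hypothetical client named client01, run as a user with write access to the images directory. Remove the file again once the issue has been captured, given the disk-space cost mentioned above.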

The bpdbm log would also be required; it gets massive at high verbose levels, so again, if disk space is tight, it's a no-go.

Perhaps wait to see if it happens again, before looking to put the above into place.

Abhisheknetback · 09-08-2014

Hi Martin,

Thanks for your help. Sure, I will do what you suggested if the issue recurs.

Thank you so much

mph999 · 09-09-2014

You are welcome ...

Sometimes we see issues that just don't make sense, I think this is one of them ...
