Solved: Media Damage - can NBU select a new tape?

brian_s · 02-04-2005

Hi. We are brand new clients.
We are running NetBackup 5.1 on AIX 5.2.
We are running an Oracle online backup job. We are using calendar-based scheduling, and the "retry after run day" option is NOT checked.
Last night, the job failed because the media surface was damaged.
Is there a way to get NetBackup to just choose a new tape from the library using the robotic arm when this happens? Do I need to check "retry after run day"? The job simply failed, which means I have to start the backup manually. The job is scheduled to run once a day, every day.
Thank you for any assistance!


Vishal_Mehra · 02-06-2005

You can freeze this tape and move ahead. You can also write your own bpend_notify script that handles this automatically and can restart the jobs if you'd like.
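A minimal sketch of the kind of script described above, in Python for illustration only. It assumes (not verified here) that bpend_notify receives CLIENT POLICY SCHEDULE SCHEDULE_TYPE STATUS as its first five arguments, that status 84 is the media write error of interest, and that `bpbackup -i -p POLICY -s SCHEDULE -h CLIENT` may be run from the host executing the script. The helper name `rerun_command` is hypothetical. It does not freeze the tape, because those arguments do not identify the failed media.

```python
#!/usr/bin/env python3
"""Hypothetical bpend_notify helper: rerun a backup after a media write error.

Assumptions to verify for your NetBackup release:
  * arguments are CLIENT POLICY SCHEDULE SCHEDULE_TYPE STATUS (first five);
  * status 84 means "media write error";
  * bpbackup is installed at the path below and -i may be run from this host.
"""
import subprocess
import sys

BPBACKUP = "/usr/openv/netbackup/bin/bpbackup"  # assumed install path
MEDIA_ERROR_STATUSES = {84}  # assumed; add other media-related codes as needed


def rerun_command(args):
    """Return the bpbackup command that reruns the job, or None if no rerun is needed."""
    if len(args) < 5:
        raise ValueError("expected CLIENT POLICY SCHEDULE SCHEDULE_TYPE STATUS")
    client, policy, schedule, _schedule_type, status = args[:5]
    if int(status) not in MEDIA_ERROR_STATUSES:
        return None
    # -i starts an immediate manual backup of this policy/schedule/client.
    return [BPBACKUP, "-i", "-p", policy, "-s", schedule, "-h", client]


if __name__ == "__main__" and len(sys.argv) > 5:
    command = rerun_command(sys.argv[1:])
    if command is not None:
        subprocess.run(command, check=False)
```

NetBackup's own retries may also resubmit the failed job, so a script like this can start a duplicate run, and without a freeze the rerun may pick the same bad tape; treat it as a starting point only.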

brian_s · 02-10-2005

Hi.
Thank you for your reply.
I am unsure how to implement what you suggested.
We are starting the backups from the NBU scheduler, not through the command line. How would a bpend_notify script know about the damaged media, and how would we restart the job from the script? (I assume through the NBU commands.)
Thank you for your assistance!

TempoVisitor · 02-15-2005

Hya

NBU job scheduling works as follows:
- Frequency-based: a backup is not allowed to run if not enough time has elapsed since the previous successful backup.
- Calendar-based: only one backup is allowed per day, so none runs if a successful backup already took place that day.

Both methods also use a bpconfig (or Global Attributes) parameter; the default is 2 tries every 12 hours.

This means that once your backup has failed its allowed tries, it cannot try again until the 12 hours have passed.

Then it depends on your start window: if it is still open, a backup can take place; if not, the backup waits until the next start window.
If you use a calendar schedule AND the next start window falls on the following day, you must have selected the "Retries allowed after runday" parameter.
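The two scheduling rules and the runday retry can be sketched as a simplified model. This is illustrative Python, not NetBackup's scheduler: the function names are hypothetical, and start windows and the tries-per-period limit are deliberately left out.

```python
from datetime import date, datetime, timedelta
from typing import Optional, Set


def frequency_due(now: datetime, last_success: Optional[datetime],
                  frequency: timedelta) -> bool:
    """Frequency schedule: due once `frequency` has elapsed since the last success."""
    return last_success is None or now - last_success >= frequency


def calendar_due(now: datetime, last_success: Optional[datetime],
                 run_days: Set[date], retries_after_runday: bool) -> bool:
    """Calendar schedule: at most one successful backup per run day.

    On a non-run day the backup is due only if retries after runday are
    allowed and the most recent run day has not had a success yet.
    """
    today = now.date()
    if today in run_days:
        return last_success is None or last_success.date() < today
    if not retries_after_runday:
        return False
    earlier = [d for d in run_days if d < today]
    if not earlier:
        return False
    last_run_day = max(earlier)
    return last_success is None or last_success.date() < last_run_day
```

In this model, a run day that ends without a success is simply skipped unless `retries_after_runday` is set, which is the situation the post warns about.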

At this point, NetBackup should already have put the defective tape in a FROZEN state, so it should automatically take a new tape for the next try, assuming there are some available or active.

Thus you never have to act on the library directly!

Kerkael

jbeima · 02-15-2005

Take a look at "Schedule backup attempts" in the Global Attributes (Host Properties/Master/Properties). You may want something like "3 tries in 8 hours", which would try 3 different tapes in your nightly 8-hour window. Be careful with how you report failures: the Activity Monitor and bpdbjobs -report list only a single line, but bperror -backstat lists multiple entries.
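A "3 tries in 8 hours" limit can be pictured as a sliding window over attempt start times. The sketch below is a simplified, hypothetical model (`attempt_allowed` is not a NetBackup function, and NetBackup's exact bookkeeping may differ): a new attempt is allowed only while fewer than `tries` attempts started within the last `period`.

```python
from datetime import datetime, timedelta
from typing import Sequence


def attempt_allowed(attempt_starts: Sequence[datetime], now: datetime,
                    tries: int = 3, period: timedelta = timedelta(hours=8)) -> bool:
    """Allow a new attempt if fewer than `tries` attempts started in the last `period`."""
    recent = [t for t in attempt_starts if now - period < t <= now]
    return len(recent) < tries
```

Under this model, after three failed attempts at 20:00, 21:00 and 22:00, a fourth is blocked until the 20:00 attempt drops out of the 8-hour window.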

Stumpr2 · 02-16-2005

I had a problem with a batch of bad tapes. I ended up sending the tapes back to the vendor for replacements. The problem is that if a tape is actually bad, the retries are not always sufficient to get the job done in a tight window, since the retry may use the exact same tape that just reported the error. So the question is: if NetBackup detects an error while writing to tape during a backup, is there a way to make it continue the backup on a different tape?
NetBackup will resubmit a job that fails with a media error according to the Global Attributes setting for failed backup attempts. Unfortunately, it may try the same tape that just had the media error. Create the file /usr/openv/netbackup/MEDIA_ERROR_THRESHOLD and enter in it the number of allowable errors before NetBackup will freeze a tape. If you enter zero, NetBackup will FREEZE the tape on the first media error and will then be forced to use a new tape on the retry.
I did this until I was able to identify all the bad tapes; later I could troubleshoot, unfreeze, and retry each specific tape.
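Creating the file amounts to writing a single number into it; from a shell, `echo 0 > /usr/openv/netbackup/MEDIA_ERROR_THRESHOLD` does it. The same step as a minimal Python sketch follows. The helper names are hypothetical, and the "one number plus newline" file format is an assumption based on the "touch this file and add a value" description in this thread.

```python
from pathlib import Path

# Directory named in the post above; pass another directory to try the helpers safely.
NETBACKUP_DIR = Path("/usr/openv/netbackup")


def write_media_error_threshold(value: int, netbackup_dir=NETBACKUP_DIR) -> Path:
    """Write the allowed media-error count to MEDIA_ERROR_THRESHOLD and return the path."""
    if not isinstance(value, int) or value < 0:
        raise ValueError("threshold must be a non-negative integer")
    path = Path(netbackup_dir) / "MEDIA_ERROR_THRESHOLD"
    path.write_text(f"{value}\n")
    return path


def read_media_error_threshold(netbackup_dir=NETBACKUP_DIR):
    """Return the configured value, or None if the file does not exist."""
    path = Path(netbackup_dir) / "MEDIA_ERROR_THRESHOLD"
    if not path.exists():
        return None
    return int(path.read_text().strip())
```

Calling `write_media_error_threshold(0)` on the media server gives the "freeze on the first media error" behavior described above; deleting the file afterwards should return to the default.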

On another note: sometimes the NetBackup retries are not sufficient for an Oracle online backup. In addition to the NetBackup retries, you may need to set a variable for the number of retries within the backup scripts. I have seen this when using SQL-BackTrack.

There have been a lot of detailed replies. Do you have further questions?

brian_s · 02-16-2005

Thank you all for your replies.
I will change the backup attempts in the Global Attributes.
I will also add the MEDIA_ERROR_THRESHOLD file.
Bob, I could not find this anywhere in the documentation.
Do you know where I could find something about it?

Thanks!

Stumpr2 · 02-16-2005

I know of no documentation for MEDIA_ERROR_THRESHOLD.
But if you search the knowledge base for 4.5 with MEDIA_ERROR_THRESHOLD, you should get 25 hits in the newsgroups.
http://seer.support.veritas.com/nav_bar/clustersearch.asp?SrchPageID=techsearch.asp&crumb=&highlight=on&SearchArea=All&rc=25&display=&ddProduct=NETBACKUPDC&ProdSelect=&SearchTerm=

Stumpr2 · 02-16-2005

Further info
I have this note from a VERITAS Software NetBackup Engineer
MEDIA_ERROR_THRESHOLD:
Touch this file and add a value in the file. The default is 2 - meaning 2 media errors within TIME_WINDOW will freeze the media. Also see TIME_WINDOW. See technote 234412

and for TIME_WINDOW:
Specifies the amount of time that BPTM will look back in the Errors DB for problems with drives/tapes to determine what action to take. Used with MEDIA_ERROR_THRESHOLD and DRIVE_ERROR_THRESHOLD. See technote 234412.

and for DRIVE_ERROR_THRESHOLD:
Touch this file and add a value in the file. The default is 2 - meaning 2 drive errors within TIME_WINDOW downs the drive. See technote 234412
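The look-back check can be sketched as counting the errors recorded for one tape (or one drive) inside TIME_WINDOW and comparing the count with the threshold; when the check fires, the tape would be frozen or the drive downed. This is hypothetical Python, not bptm's actual logic. The boundary is an assumption: it follows the earlier "number of allowable errors" reading, under which 0 freezes on the first error, whereas the engineer's note above could be read as acting on the second error with the default of 2. Technote 234412 would settle which is right.

```python
from datetime import datetime, timedelta
from typing import Sequence


def threshold_exceeded(error_times: Sequence[datetime], now: datetime,
                       threshold: int, time_window: timedelta) -> bool:
    """True when the errors seen within `time_window` before `now` exceed `threshold`.

    Treats `threshold` as the number of allowable errors, so 0 acts on the
    first error (assumed boundary; check against the technote).
    """
    recent = [t for t in error_times if now - time_window <= t <= now]
    return len(recent) > threshold
```

The same function covers MEDIA_ERROR_THRESHOLD (per tape) and DRIVE_ERROR_THRESHOLD (per drive); only the error history passed in differs. Any window length used with it, such as 12 hours, is illustrative rather than a documented default.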

Unfortunately, I can no longer get to the technote.

brian_s · 02-16-2005

Thank you BOB!! Very helpful!
