Forum Discussion

dtrb's avatar
dtrb
Level 2
15 years ago

retrying failed jobs

this might be a basic question but:
I have a job that failed after several retries with a:
NBU status:96, EMM status: No media is available
because I ran out of tapes in the pool.
I added more tapes to the pool and right-clicked on the job and chose "Restart job".
My question is this:
What happens to all the data from the failed job that actually DID get written to tape?
I assume the new job starts all over from the beginning and re-backs up everything (it's a "full")?
Am I going to end up with duplicate data on multiple tapes? How can I check?

The part that confuses me is that this job uses a wildcard selection list like: /path/to/my/files/*
and NBU fires off a job for each sub-directory. Let's say there are 500. If 400 of them finish successfully and the last 100 fail because I ran out of tape, am I going to re-backup the previous 400 that finished successfully when I restart the job?

Thanks.

7 Replies

  • Then you should not have to go back - you should just be able to restart the job from the Activity Monitor and it will pick up from the last checkpoint. However, we do not use checkpoints in our environment, so I am not the most familiar with them.

    One way to check for sure whether you need to rerun anything is to mock up a restore and spot-check what, if anything, is available from the backup that failed. If you see that some of the streams that finished are available for restore, then you should only need to rerun the 100 or so streams that failed.
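    If you want to do that spot check from the command line rather than the BAR GUI, something along these lines should work. This is only a sketch: it assumes a standard NetBackup client install with bplist on the PATH, and the client name and directory are hypothetical placeholders.

```shell
#!/bin/sh
# Spot-check what the NetBackup catalog says is restorable.
# CLIENT and DIR_TO_CHECK are hypothetical placeholders - use your own values.
CLIENT=myclient
DIR_TO_CHECK=/path/to/my/files

if command -v bplist >/dev/null 2>&1; then
    # -C selects the client, -R recurses, -l gives a long listing.
    # Add -s/-e dates to bound the search to the window of the failed job.
    bplist -C "$CLIENT" -R -l "$DIR_TO_CHECK"
else
    echo "bplist not found - run this on a NetBackup client or master"
fi
```

    Streams that completed before the pool ran dry should show their files here; the failed streams' paths should come back empty.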
  • What happens to all the data from the failed job that actually DID get written to tape?

    Data that was already written to the tape can't be erased - it is still there on the media.

    I assume the new job starts all over from the beginning and re-backs up everything (it's a "full")?

    Yes, if you restart the job it will back up all of the data again. You can enable checkpoints so that a restarted job can resume from the last checkpoint instead, but checkpoint restart is not supported for all types of backups.

    Am I going to end up with duplicate data on multiple tapes? How can I check?

    Yes, and there is no way to erase the old data. You can verify the images in the Images on Media / Media Contents reports.

  • The fact that you were able to restart the job tells me you had checkpoints enabled and that this was indeed a filesystem-type backup, right?
    IMHO, you should be no worse off than a normal EOM (end of media) situation where the rest of the backup carries on on a new tape. If ANY data was written twice, it would be the bit of data after the last checkpoint on tape 1. If you want to be sure, check the bptm log on the media server: look for the last checkpoint before the EOM and the status 96, then the details of what happened when the job restarted. To be 100% sure, do as rjrumfelt suggested - do a test restore to a different folder and see how much data is read from tape 1 and how much from tape 2.
    You would not have been this lucky if this was an application or database backup....
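    To make the bptm log check concrete, here is a sketch. The log excerpt is entirely hypothetical (real bptm messages vary by NetBackup version); on an actual media server you would point LOG at the day's file under /usr/openv/netbackup/logs/bptm/ instead of the sample created here.

```shell
#!/bin/sh
# Scan a bptm log for end-of-media and status 96 activity.
# These log lines are MADE UP for illustration - substitute the real log
# file from /usr/openv/netbackup/logs/bptm/ on the media server.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
00:01:02 [1234] <2> write_data: waited for full buffer
00:05:10 [1234] <2> io_terminate_tape: EOM encountered on media A00001
00:05:11 [1234] <16> bptm: EXIT STATUS 96: unable to allocate new media
00:45:00 [5678] <2> write_backup: begin writing backup id host_1234567890
EOF

# Pull out the EOM and the status 96 lines to see where tape 1 ended
# and when the restarted job began writing again.
MATCHES=$(grep -E 'EOM|STATUS 96' "$LOG")
echo "$MATCHES"
rm -f "$LOG"
```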
  • Although it was hard to really tell what was happening (due to the fact that it creates sooo many jobs), I can only assume it "did the right thing", based on the fact that the retried job finished in just a few hours, whereas the original job ran for about 9 hours. Plus, when I did a bpverify to get a list of the tapes, there were only 2 total, so I think all is well. Thanks to all for the input.
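    For anyone who wants to double-check the media list without a full bpverify, bpimagelist can report it too. A sketch, assuming the standard NetBackup admin commands on the master server; the client name and time window are placeholders:

```shell
#!/bin/sh
# List backup images for a client, including the media IDs they occupy.
# CLIENT is a hypothetical placeholder; 168 hours = the last 7 days.
CLIENT=myclient
if command -v bpimagelist >/dev/null 2>&1; then
    bpimagelist -client "$CLIENT" -media -hoursago 168
else
    echo "bpimagelist not found - run this on the master server"
fi
```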

  • "...
    this job uses a wildcard selection list like: /path/to/my/files/* and NBU fires off a job for each sub-directory. Let's say there are 500....
    ..."

    I know I wouldn't want 500 jobs kicking off all at once!

    DOCUMENTATION: Best practices when configuring a large number of files or wildcards in policy file lists

    Maybe grouping them? This is an example of one of our policies:

    NEW_STREAM
    /path/to/my/files/[a-dA-D]*
    NEW_STREAM
    /path/to/my/files/[e-lE-L]*
    NEW_STREAM
    /path/to/my/files/[m-rM-R]*
    NEW_STREAM
    /path/to/my/files/[s-zS-Z]*

    This results in just four streams.
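    Before committing bracket patterns like that to a policy, it can be worth sanity-checking which stream each subdirectory would land in. A small sketch using the same globs (the directory names are made up - feed it your real ones):

```shell
#!/bin/sh
# Classify names using the same bracket globs as the policy file list.
classify() {
    case "$1" in
        [a-dA-D]*) echo "stream 1" ;;
        [e-lE-L]*) echo "stream 2" ;;
        [m-rM-R]*) echo "stream 3" ;;
        [s-zS-Z]*) echo "stream 4" ;;
        *)         echo "UNMATCHED - would not be backed up!" ;;
    esac
}

# Hypothetical subdirectory names under /path/to/my/files
for d in apps Docs mail zones 0_scratch; do
    printf '%-10s -> %s\n' "$d" "$(classify "$d")"
done
```

    Note the last one: bracket ranges silently miss names that start with a digit or punctuation, so a catch-all stream (or a final pattern like [!a-zA-Z]*) is worth considering.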