Query: Finding failed backup image fragments

Matthew_Longmui
Level 3

Hi team,

I'm wondering if it is possible to identify both valid and invalid (e.g. from failed backups) image fragments on a tape?

Example scenario:

Take an MPX LTO5 tape (native capacity: 1.5TB) and assume no compression. The tape has two generic Windows file servers streaming to it. However, server 2 fails at some point - say 500GB down the road - for whatever reason. Server one completes without issue, sucking up the remaining 1TB.

At 1.5TB the tape is classified as full by NetBackup (again - for simplicity's sake, there is no compression). 1TB = server one. 500GB = server two's failed backup. That makes 1.5TB in total.

Now my understanding is that because server one's backup is successful, its images are valid and the entire tape will remain classified as full. The failed image fragments are not "freed" per se, given how the tape is written to in a linear fashion (https://vox.veritas.com/t5/Backup-Recovery-Community-Blog/Understanding-how-NetBackup-writes-to-a-ta...) - and so "technically" these failed fragments are taking up capacity on that tape.

My questions are:

1. Is there a way to identify those failed image fragments somehow?

2. Or is there a way to determine which images on that tape are the valid ones?

e.g. can we assume the "images on tape" report will return only the valid and successful images on that tape, so I know that if I wanted to move them to another tape, I could safely scratch the old one and reclaim the 500GB of lost tape capacity?

8 REPLIES 8

Matthew_Longmui
Level 3

More:

Case in point: I have an LTO5 tape that says it's full mpx, with 23 images on tape, and 23 valid images on tape. But those images only add up to 600GB. So, where's the remaining 900GB? I assume it's tied up in failed backup fragments from other jobs. Multiply this out substantially and...insert sad face.

So I'm wondering if my logic here is sound. If so, "images on tape" does provide the valid list of successful images on the tape, which I can relocate to another MPX tape with space to fit them, and reclaim all that lost 900GB+. (Whether I'd want to is its own question: effort vs just riding it out until all images expire vs buying new tapes.)
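The arithmetic behind that 900GB figure can be sketched as follows. This is only an illustration, assuming no compression and a 1.5TB native capacity as in the opening post; the function name and the individual image sizes are hypothetical (only the 23-image count and the 600GB total come from the post).

```python
from typing import Iterable


def unaccounted_capacity_gb(native_capacity_gb: float,
                            valid_image_sizes_gb: Iterable[float]) -> float:
    """Capacity on a full tape that is not explained by valid images.

    Assumes no compression, so a full tape holds exactly its native capacity.
    The result is clamped at zero in case the images add up to more than that.
    """
    used_by_valid_images = sum(valid_image_sizes_gb)
    return max(native_capacity_gb - used_by_valid_images, 0.0)


# Hypothetical size split: 23 images that total 600 GB.
image_sizes_gb = [25.0] * 22 + [50.0]

gap_gb = unaccounted_capacity_gb(1500.0, image_sizes_gb)
print(f"{len(image_sizes_gb)} valid images, {gap_gb:.0f} GB unaccounted for")
```

With compression enabled the tape would hold more than its native capacity, so this simple subtraction would no longer describe the gap accurately.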

 

 

Matthew, NetBackup does not keep failed backup fragments around. When a job fails, NetBackup runs an image cleanup job to get rid of any data associated with the failed job(s). Could you give some more information about the specific LTO5 tape with 23 images and 23 valid images on tape that equal 600GB?

mph999
Level 6
Employee Accredited

Edited this answer, as I didn't appreciate that the question was about MPX backups.

This answer below is only true for non-mpx jobs

If a backup to tape fails, NBU rewinds to the beginning of the backup and writes an empty header. In plain English, the tape is repositioned to the beginning of the failed backup and that point is marked as the logical end of data. This means no failed fragments are left; they are overwritten by the next job.

NBU does not understand tape capacity; left alone, it would write to the same tape forever. What happens is that the tape drive firmware detects the end of tape (a mark is written to the tape during manufacture). This mark sits just before the real (physical) end of tape, so there is enough space to finish writing the block being written. At that point, the drive firmware sets a tape-full flag in the tape driver. The tape driver will then not accept any more data, and a 'tape full' alert is sent to NBU. Only then does NBU know to change the tape.

An LTO5 will hold about 1.5TB 'native' - in other words, this is the minimum; if the data is compressible, it will hold more. If a tape becomes full at 1.5TB or above, it's possible the data isn't compressible. If it becomes full at less than that, then yes, there is a potential issue. Given the description of how NBU knows a tape is full (i.e. via the drive firmware / tape driver), you can see how this issue is outside NBU. I have dealt with many such issues over the years, and it comes down to either a driver issue, a firmware issue or a hardware fault.

Marianne
Moderator
Partner    VIP    Accredited Certified

@mph999

What about a multiplexed backup where 1 out of 2 MPX backups fails?
As per the OP's opening post -
.... server 2 fails at some point - say 500GB down the road - for whatever reason. Server one completes without issue, sucking up the remaining 1TB. 

My understanding is that NBU cannot rewind/reclaim the 500GB already written as the other MPX'ed backup is still running.
Is my understanding incorrect?

Nicolai
Moderator
Partner    VIP   

Yes - correct

If you have two data streams of 500GB each, perfectly multiplexed, and one completes while the other fails at 497GB, the 497GB of capacity is "lost" until the backup lifetime has expired.
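The 497GB example can be sketched with a toy round-robin multiplexer. This is an illustration only, assuming one block per GB and strict alternation between streams; the `multiplex` function and the stream names are hypothetical and do not reflect NBU internals.

```python
def multiplex(stream_lengths: dict[str, int],
              fail_after: dict[str, int]) -> tuple[list[str], set[str]]:
    """Interleave streams block by block onto one tape.

    stream_lengths: blocks each stream wants to write.
    fail_after: for streams that fail, how many blocks they write first.
    Returns the tape contents (one stream id per block) and the failed streams.
    """
    tape: list[str] = []
    written = {sid: 0 for sid in stream_lengths}
    active = [sid for sid, n in stream_lengths.items() if n > 0]
    failed: set[str] = set()
    while active:
        for sid in list(active):
            if sid in fail_after and written[sid] == fail_after[sid]:
                failed.add(sid)      # the stream dies; its blocks stay on tape
                active.remove(sid)
                continue
            tape.append(sid)
            written[sid] += 1
            if written[sid] >= stream_lengths[sid]:
                active.remove(sid)   # the stream completed successfully
    return tape, failed


# Two 500 GB streams, one block per GB; server2 fails at 497 GB.
tape, failed = multiplex({"server1": 500, "server2": 500}, {"server2": 497})
lost_blocks = sum(1 for block in tape if block in failed)
print(len(tape), tape.count("server1"), lost_blocks)  # 997 500 497
```

Because the failed stream's blocks sit between blocks of the successful stream, there is no single point to rewind to without destroying valid data, so the space stays occupied until the valid images on the tape expire.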

Marianne
Moderator
Partner    VIP    Accredited Certified

@Matthew_Longmui wrote:

So I'm wondering if my logic here is sound. And thus, "images on tape" does provide the valid list of successful images on the tape that I can relocate (if I wanted - and that is its own question; effort vs just riding it out until all images expire vs buying new tapes) to another MPX tape with space to fit them, and reclaim all that lost 900GB+.


@Matthew_Longmui

I agree with your logic. 
You need to weigh up 'effort vs just riding it out'.

I guess it boils down to how often this happens - 1 out of 20 tapes? or more?

If more, the assumption is that you are seeing quite a high backup failure rate.
It might be worth your while to investigate the failures and address the cause rather than spending too much effort on duplicating tapes. 

Another option might be to consider disk as backup storage with duplication afterwards to tape. 
Failed backups to disk will always be deleted after image cleanup.
The chances of failure during duplication are much lower than during backups.

mph999
Level 6
Employee Accredited

Sorry, misread ... yes, my explanation is only true for non-mpx backups ....

I will edit my answer and add a note.

So, indeed, for mpx jobs the failed fragments would remain until the tape expires.  There is no way around this.

Thanks, team! I very much appreciate your comments (especially since they align with what I was thinking, haha). 

I'll mull over my options, now that they are clearer to me.

Cheers!