Recover from partial backups

DBWR
Level 3

We had several backups where the tapes got full.  I know this is a situation you want to avoid, but there was a period of time where there wasn't much we could do about it, and depending on the amount of data on the day, sometimes the backup would finish and sometimes it wouldn't.

 

When it failed, it had still backed up 95% of the files.  I've gone to restore from one of the old tapes.  It was quite a while ago so I had to import it, and the import fails saying "...tar did not find all the files to be restored".  I know it can't find all the files, but what about the ones that are there?  I can't believe that a few files missing off the end would invalidate the entire backup.

 

Does anyone know a way around this?

 

Thanks

Matt

1 ACCEPTED SOLUTION


mph999
Level 6
Employee Accredited

The NBU tar is virtually the same as regular tar, with the same options ...  The main difference, from memory, is that it drops the leading / so restores are relative to where you are, not absolute (making alternate restore locations possible).

This is the posh way to do it  ...  It was aimed at images that span tapes, but works fine for a single tape.  You actually get the data off the tape with dd, then untar the files later.  

How to use dd to read data off a tape (Unix/ Linux)

On occasion it may be necessary to try to read data off a tape if there is a failure in NetBackup.  Usually this will only be to demonstrate that the tape is unreadable due to an issue outside of NetBackup.

It is easier to explain this via example.

We can see from bpimagelist that the image nbmedia00_1418641103 spans 2 tapes, E01001 and E01002

IMAGE nbmedia00 0 0 8 nbmedia00_1418641103 nbmedia_big 0 *NULL* root Full 0 1 1418641103 130 1419850703 0 0 2545152 1 1 2 0 nbmedia_big_1418641103_FULL.f *NULL* *NULL* 0 1 0 0 0 *NULL* 0 0 0 0 0 0 0 *NULL* 0 0 0 *NULL* 269 0 0 740 0 0 *NULL* *NULL* 0 0 0 0 *NULL* *NULL*
HISTO -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
FRAG 1 1 1566208 0 2 6 1 E01001 nbmedia00 262144 2 1418641103 2 0 *NULL* 1419850703 0 65537 0 0 0 1 0 1418641233 0 *NULL* *NULL*
FRAG 1 2 978944 0 2 6 1 E01002 nbmedia00 262144 2 1418641103 2 0 *NULL* 0 0 0 0 0 0 1 0 0 0 *NULL* *NULL*

It does not actually matter whether the image spans multiple tapes or is contained on a single media; the procedure is exactly the same.
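As a rough sketch, the tape load order and the expected size of each fragment can be pulled out of the FRAG lines with awk. The field positions used here (3 = fragment number, 4 = size in kilobytes, 9 = media ID) are an assumption inferred from the example output above, where 1566208 KB and 978944 KB add up to the image total of 2545152 KB. frag_order is a hypothetical helper name, not a NetBackup command.

```shell
# Hypothetical helper: reads bpimagelist-style output on stdin and prints
# "fragment media_id bytes", sorted by fragment number.
# Assumed FRAG field layout (inferred from the example, not from documentation):
#   $3 = fragment number, $4 = kilobytes, $9 = media ID
frag_order() {
  awk '$1 == "FRAG" { printf "%s %s %.0f\n", $3, $9, $4 * 1024 }' | sort -n
}

# Usage with the two FRAG lines from the example (deliberately out of order):
frag_order <<'EOF'
FRAG 1 2 978944 0 2 6 1 E01002 nbmedia00 262144 2 1418641103 2 0 *NULL* 0 0 0 0 0 0 1 0 0 0 *NULL* *NULL*
FRAG 1 1 1566208 0 2 6 1 E01001 nbmedia00 262144 2 1418641103 2 0 *NULL* 1419850703 0 65537 0 0 0 1 0 1418641233 0 *NULL* *NULL*
EOF
```

The byte counts it prints (kilobytes times 1024) should match the sizes of the data files that dd produces later in this procedure, which gives a quick cross-check that a fragment was read completely.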

First, use scsi_command -map to obtain a map of what is on the tape.  This needs to be done in fragment order, so E01001 is the first tape as it contains fragment 1, and E01002 is the second tape as it contains fragment 2.

[root@nbmedia00 testdata]# scsi_command -map -f /dev/nst1
00000000: file 1: record 1: size 1024: NBU MEDIA header (E01001)
00000001: file 1: eof after 1 records: 1024 bytes
00000002: file 2: record 1: size 1024: NBU BACKUP header
          backup_id nbmedia00_1418641103: frag 1: file 1: copy 1
          expiration 1419850703: retention 1: block_size 262144
          flags 0x0: mpx_headers 0: resume_count 0: media E01001
00000003: file 2: record 2: size 262144
00006121: file 2: eof after 6119 records: 1603798016 bytes
eot

We see that the header for the backup image nbmedia00_1418641103 is located at file 2, and the data fragment is immediately after the header, as expected.
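On a tape holding many images, finding the right tape file by eye gets tedious. A minimal sketch, assuming the map output has the same layout as the example above: the hypothetical helper find_backup_file remembers the tape file number of each "NBU BACKUP header" line and prints it, together with the fragment number, when the following backup_id line matches.

```shell
# Hypothetical helper: $1 = backup id, map output (as shown above) on stdin.
# Prints "tape_file fragment" for every matching backup header.
find_backup_file() {
  awk -v id="$1" '
    /NBU BACKUP header/ {
      # Line looks like: "00000002: file 2: record 1: size 1024: NBU BACKUP header"
      file = $3
      sub(/:$/, "", file)
    }
    $1 == "backup_id" && $2 == (id ":") {
      # Line looks like: "backup_id nbmedia00_1418641103: frag 1: file 1: copy 1"
      frag = $4
      sub(/:$/, "", frag)
      print file, frag
    }
  '
}

find_backup_file nbmedia00_1418641103 <<'EOF'
00000000: file 1: record 1: size 1024: NBU MEDIA header (E01001)
00000001: file 1: eof after 1 records: 1024 bytes
00000002: file 2: record 1: size 1024: NBU BACKUP header
          backup_id nbmedia00_1418641103: frag 1: file 1: copy 1
          expiration 1419850703: retention 1: block_size 262144
          flags 0x0: mpx_headers 0: resume_count 0: media E01001
00000003: file 2: record 2: size 262144
00006121: file 2: eof after 6119 records: 1603798016 bytes
eot
EOF
```

For the example tape this reports tape file 2, fragment 1, which is the position used in the commands that follow.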

We first need to read this header into a file on disk using dd, followed by the actual data in fragment 1.

We can build the commands required by referencing the scsi_command -map output.  Consideration must be given to the filenames and block size used, so that the read via dd succeeds and the files can be correctly identified later.  In this example we have used the backup ID, followed by a description of which part of the image the file relates to.

NOTE:
We want to position the tape at file 2.  fsf 1 is correct because it positions the tape after file 1, which is just before file 2.

mkdir /tmp/recover

mt -f /dev/nst1 rewind
mt -f /dev/nst1 fsf 1
dd if=/dev/nst1 of=/tmp/recover/bu-nbmedia00_1418641103_frag1header bs=1k   count=1
dd if=/dev/nst1 of=/tmp/recover/bu-nbmedia00_1418641103_frag1 bs=256k
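The same four commands generalise to a backup header found at any tape file N: rewind, skip forward N-1 files, read one 1024-byte header record, then read the data at the backup's block size. The sketch below only prints the commands so they can be reviewed before touching the tape; print_read_commands and its argument order are made up for illustration.

```shell
# Hypothetical helper: prints (does not run) the mt/dd commands for one fragment.
# Arguments: device, tape file number from the map, backup id, fragment number,
#            block size in bytes, output directory.
print_read_commands() {
  dev=$1 tape_file=$2 backup_id=$3 frag=$4 block_size=$5 outdir=$6
  skip=$(( tape_file - 1 ))   # fsf counts files to skip, so file N needs N-1
  echo "mt -f $dev rewind"
  echo "mt -f $dev fsf $skip"
  echo "dd if=$dev of=$outdir/bu-${backup_id}_frag${frag}header bs=1024 count=1"
  echo "dd if=$dev of=$outdir/bu-${backup_id}_frag${frag} bs=$block_size"
}

# Reproduces the commands above for fragment 1 at tape file 2:
print_read_commands /dev/nst1 2 nbmedia00_1418641103 1 262144 /tmp/recover
```

Using the non-rewinding device (/dev/nst1 rather than /dev/st1) matters here, because the second dd relies on the tape staying positioned just after the header record.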


A quick look at the files created :

-rw-r--r-- 1 root root 1603796992 Dec 15 03:16 bu-nbmedia00_1418641103_frag1
-rw-r--r-- 1 root root       1024 Dec 15 03:15 bu-nbmedia00_1418641103_frag1header


The frag1header file (which is the backup header) is 1K in size, which is correct for a header.
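The data file can be checked the same way. The map reported 6119 records for tape file 2; the first is the 1024-byte header, so the data file should hold the remaining 6118 records of 262144 bytes each. The sketch below assumes every data record is a full block, which holds in this example but is not guaranteed in general; both function names are hypothetical.

```shell
# Hypothetical helper: expected size of the data file for one fragment.
# $1 = record count from "eof after N records", $2 = block size in bytes.
# Assumes record 1 is the backup header and all other records are full blocks.
expected_data_bytes() {
  echo $(( ($1 - 1) * $2 ))
}

# Hypothetical helper: succeeds if the file on disk has the expected size.
# $1 = data file, $2 = record count, $3 = block size in bytes.
check_fragment_size() {
  actual=$(wc -c < "$1" | tr -d ' ')
  [ "$actual" -eq "$(expected_data_bytes "$2" "$3")" ]
}

expected_data_bytes 6119 262144   # 1603796992, the size of the frag1 file
```

A call such as check_fragment_size /tmp/recover/bu-nbmedia00_1418641103_frag1 6119 262144 returns non-zero if dd stopped early, for example because of a read error part-way through the tape file.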


We now need to recover any remaining fragments for the image from the next tape, so as before, we repeat the process and run scsi_command -map -f <device> on the next tape, E01002.

00000000: file 1: record 1: size 1024: NBU MEDIA header (E01002)
00000001: file 1: eof after 1 records: 1024 bytes
00000002: file 2: record 1: size 1024: NBU BACKUP header
          backup_id nbmedia00_1418641103: frag 2: file 1: copy 1
          expiration 1419850703: retention 1: block_size 262144
          flags 0x0: mpx_headers 0: resume_count 0: media E01002
00000003: file 2: record 2: size 262144
00003827: file 2: eof after 3825 records: 1002439680 bytes
00003828: file 3: record 1: size 1024: NBU EMPTY header (file 2)
00003829: file 3: eof after 1 records: 1024 bytes
eot


Here we see that the backup header for frag 2 of image nbmedia00_1418641103 is at position file 2, so we can again put the required commands together.
In this case, because the fragments we want are both the first fragments on their tapes, the commands we need are the same, so we just change the filenames.

mt -f /dev/nst1 rewind
mt -f /dev/nst1 fsf 1
dd if=/dev/nst1 of=/tmp/recover/bu-nbmedia00_1418641103_frag2header bs=1k   count=1
dd if=/dev/nst1 of=/tmp/recover/bu-nbmedia00_1418641103_frag2 bs=256k

Looking now at the files we have created :

-rw-r--r-- 1 root root 1603796992 Dec 15 03:16 bu-nbmedia00_1418641103_frag1
-rw-r--r-- 1 root root       1024 Dec 15 03:15 bu-nbmedia00_1418641103_frag1header
-rw-r--r-- 1 root root 1002438656 Dec 15 03:22 bu-nbmedia00_1418641103_frag2
-rw-r--r-- 1 root root       1024 Dec 15 03:22 bu-nbmedia00_1418641103_frag2header


We have the two backup headers (one for each fragment) for reference if required, and the two files holding the actual data.

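We can then 'cat' the two files containing the fragment data into one file, in fragment order.  Name the data files explicitly, because a wildcard such as frag* would also match the frag1header and frag2header files and mix the headers into the tar data:

cd /tmp/recover
cat bu-nbmedia00_1418641103_frag1 bu-nbmedia00_1418641103_frag2 >bu-backup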
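For images with more fragments, the concatenation can be scripted. A minimal sketch, assuming the bu-<backupid>_fragN naming used in this example: the hypothetical helper join_fragments walks fragment numbers upwards from 1, so the header files are never picked up and frag10 correctly follows frag9 rather than frag1.

```shell
# Hypothetical helper: joins fragment data files in numeric order.
# $1 = directory holding the files, $2 = backup id, $3 = output file.
# Prints the number of fragments joined.
join_fragments() {
  dir=$1 id=$2 joined=$3
  : > "$joined"                      # start with an empty output file
  n=1
  while [ -f "$dir/bu-${id}_frag${n}" ]; do
    cat "$dir/bu-${id}_frag${n}" >> "$joined"
    n=$(( n + 1 ))
  done
  echo $(( n - 1 ))
}

# Example call (paths as used above):
# join_fragments /tmp/recover nbmedia00_1418641103 /tmp/recover/bu-backup
```

The loop stops at the first missing fragment number, so the printed count is worth comparing against the number of FRAG lines reported by bpimagelist.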

And read this single tar file back, which contains the data from the backup.

[root@nbmedia00 recover]# /usr/openv/netbackup/bin/tar tvf bu-backup
Erwxr-xr-x root/root 103 Dec 15 02:58 2014 .AtTrIbUtEs.0
--Extra header --
drwxr-xr-x root/root   0 Dec 15 02:41 2014 /
drwxr-xr-x root/root   0 Dec 15 02:49 2014 /testdata/
-rw-r--r-- root/root 1532471296 Dec 15 02:50 2014 /testdata/tarfile1
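On the question of whether a tar stream that stops early is still usable: the tar format stores members one after another, so members that were written completely before the cut-off remain readable. The sketch below demonstrates this with the operating system's ordinary tar on a small, deliberately truncated archive. It does not use NetBackup's tar, and whether NetBackup's tar behaves the same way on an incomplete image is an assumption that would need testing.

```shell
# Build a two-member archive, cut it off part-way through the second member,
# and extract what is left. All paths are temporary; nothing touches a tape.
work=$(mktemp -d)
mkdir "$work/src" "$work/out"
printf 'first file\n' > "$work/src/a.txt"
head -c 65536 /dev/zero > "$work/src/b.bin"

# a.txt is archived first, then b.bin
tar -cf "$work/full.tar" -C "$work/src" a.txt b.bin

# Keep only the first 20480 bytes: all of a.txt, but only part of b.bin
head -c 20480 "$work/full.tar" > "$work/partial.tar"

# tar complains about the unexpected end of the archive, so capture its
# exit status instead of letting it abort the script
status=0
tar -xf "$work/partial.tar" -C "$work/out" 2>/dev/null || status=$?

echo "tar exit status: $status"
cat "$work/out/a.txt"
```

The first member comes back intact even though tar reports an error for the archive as a whole, which is the behaviour one would hope for from a backup that ran out of tape.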


15 REPLIES 15

revarooo
Level 6
Employee

What did the original backup fail with? If it errored when backing up, then the likelihood is that what is on the tape is uncataloged data (the image is discarded if the backup fails).

Also are you sure you imported ALL the tapes?

 

Marianne
Level 6
Partner    VIP    Accredited Certified
So, if my understanding is correct, tape filled up, no more tapes were available and backup failed with status 96, right? You cannot recover from this backup because the backup image is incomplete. This is why the import failed as well. Nothing you can do other than ensure you have sufficient tapes for future backups.

revarooo
Level 6
Employee

Indeed Marianne, we need to know more about this "error" when the backup ran!

Marianne
Level 6
Partner    VIP    Accredited Certified
Only a Status 1 is a 'Partial' backup and can be restored from. If you have Checkpoints enabled in policies, most status codes (like 96) will leave jobs in Incomplete state. You can then fix the issue (add more tapes) and resume jobs. Else, if job is in Done state, there is nothing that can be done to complete the job or restore from it. All you can do is run the job again from scratch. With default retry settings, NBU will automatically retry the job if backup window is still open.

DBWR
Level 3

Thanks for the replies.

It was quite a while ago so the logs are no longer in the activity monitor, but I remember it would say something along the lines of "media full, requesting new media", it was part of an autoloader which was full, as there were no unallocated tapes it would then fail saying unable to allocate new media.  I think it was status 96.

I know this is something you want to avoid but at the time we were undermanned, underbudgeted, and had a sudden spike in user data.

I'm still surprised that none of it is recoverable.  If you have 900,000 files and the tape only has room for 899,999 it seems a bad design that the whole lot becomes unreadable.  Is this the same with most backup software or just NetBackup?

mph999
Level 6
Employee Accredited

If NBU fails the backup, it does not record the details of the backup in the catalog, so it is not searchable.

If the backup fails on the first tape used for that backup (i.e. it didn't span to another tape), then upon failure the tape is rewound to the end of the previous backup / start of the backup that just failed, and the EOD mark is rewritten.

This then makes it impossible to get at the files that were written to the tape before the failure, as the tape drive firmware will not go past the EOD mark, so only backups before the failure can be imported.

The only way to get at the files, and this will only work for non multiplexed backups, is to use a tape drive with modified firmware that can ignore/ skip past the EOD mark.

Unfortunately, the only people I know who will have these are the drive manufacturers themselves and data recovery firms.

DBWR
Level 3

When I tried importing the tape, it listed all the files (including the one I want to get to!) which took a long time, which would suggest to me it's not the EOD mark causing the problem?

I'm considering sending it to a recovery firm anyway, depending on the cost.  As the files are on there and there isn't any corruption or hardware fault, it seems ridiculous that we can't get to them ourselves.

Michael_G_Ander
Level 6
Certified

Have recovered from partially multiplexed backups back in the NetBackup 4.5 days on UNIX; then we used a combination of mt, dd and NetBackup tar.

We used the options -i & -M for NetBackup tar, which then meant ignore errors and multiplexed media, together with the normal options.

But it is a lot of work, so I would suggest you only do it if the files are very important to recover.

 

The standard questions: Have you checked: 1) What has changed. 2) The manual 3) If there are any tech notes or VOX posts regarding the issue

DBWR
Level 3

Thanks for the suggestion, I have considered it but it's going to take quite a while to get a spare linux box installed and connected to a tape drive, then find the exact commands to make it all work.  Even after that it might not work, so it might be cheaper to just send it away to a recovery company.

If there was a way of manually doing it in windows using the tar32.exe that would be worth a go but can't find enough information on it anywhere.

mph999
Level 6
Employee Accredited

DBWR wrote: "When I tried importing the tape, it listed all the files (including the one I want to get to!) which took a long time, which would suggest to me it's not the EOD mark causing the problem?"

Yes, I agree, if it were EOD then you wouldn't see them.

And I've just worked out why ...

When a tape spans, if the backup fails, we don't go back to the first tape and rewrite the EOD, we just leave it.  In this case, the backup finished on the first tape, it expected the second tape but didn't get it, so I think it has treated it the same way, i.e. not gone back and rewritten the EOD.

I suspect you are failing on the phase 2 import, the reason is because you only have 'half a tar file' - when NBU spans tapes it doesn't write an empty header at the end of the first tape, and I think this is how it knows there should be more to come, which in this case it doesn't get.

Is this backup multiplexed ?  I see MPX has been mentioned, but can you confirm.

DBWR
Level 3

Yes this sounds very feasible.  It wasn't multiplexed no.

DBWR
Level 3

Any ideas how to recover the incomplete tar tape?  While I don't use regular tar in Linux much, I've got a funny feeling that if it were regular tar there would be a way, but NetBackup's implementation of tar is different.

Michael_G_Ander
Level 6
Certified

You have to use the tar in Netbackup as far as I know, tar --help will give all the options as I remember it.


DBWR
Level 3

Many thanks for the detailed instructions.

Unfortunately it is connected to a Windows box, and we don't have a spare Linux box, or the hardware required to connect it up to the autoloader.  If anyone knows a way of doing it from Windows that'd be really helpful.  If not I'll look into getting something setup to follow the above instructions when I get chance but it's going to take a while.