Vault Duplication

rsm_gbg
Level 5

Hi,

I started using Vault duplication to offsite tapes (all LTO4 tapes).
It seems it either doesn't compress the data onto the duplication tape, or for some other reason doesn't fill it and just continues on to another offsite tape.

I run a normal backup onto local tape during the night; it's about 1TB.

Vault does a duplication of today's data.
But it starts duplicating onto one tape and then, instead of filling it, starts another duplication onto a new tape.
It usually fills about 500-600 GB and then starts a new duplication to a new tape.

They all have the same retention as the daily tape.

Why isn't it filling up the tape?

I went from inline copy to using Vault duplication, and with inline copy there wasn't any problem.

- Roland

28 REPLIES

mph999
Level 6
Employee Accredited
tperr just looks in the /usr/openv/netbackup/db/media/errors file (on each media server) - just copy the file somewhere safe, and then delete the contents of the original. I'll have a look at the bpdbjobs output - should be possible with awk ...

The speed issue 'could' be separate from the 'full tape' issue - we don't know either way at the moment. I think we should do the following, else this will go on for ages ... - a very controlled test.

Create 2-3 GB of data on disk, using a single file of about 100MB (just copy it to a new filename multiple times via a script). Back this up to your good tape drive until it is full - use a new volume pool 'test'. Hopefully you will have 1.2 GB on the tape. Allow it to span to another tape so the job doesn't fail.

Now for the clever bit - when you are happy you have a real full tape, delete the data from disk, and then restore what is on the full tape. Eject the tape that the backup spanned onto - the job will fail, but the data from only the first tape will have been restored. Now we have the exact amount of data that will fit on one tape.

Using exactly the same policy and data, back up again to the 'bad' drive (just 'down' the drive you don't want to use) - how much is on the tape?

Next: back up again using inline tape copy - check the tapes.

Next: use Vault - what are the results?

Hopefully this will reproduce the issue, but in a very controlled (and repeatable if necessary) way. Make sure the bptm log is turned up to VERBOSE = 5 (we will also need the system messages log). After each job, copy the bptm log and name it bptm_backup / bptm_inline / bptm_vault. Zero out the log before each attempt (> log). Let's see what this shows.

I think this test gives the exact info needed to log a call with Symantec and NOT be asked to do anything else. Don't log the call yet, let's see if the test shows what we want.

M
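A minimal sketch of the data-creation step M describes (Bourne shell; the seed file and target directory are illustrative, not from this thread):

#!/bin/sh
# Build a repeatable test set: the same ~100MB file copied N times,
# so every run backs up exactly the same data.
SEED=/data/seedfile            # any existing ~100MB file
DEST=/data/testset
mkdir -p $DEST
i=1
while [ $i -le 30 ]; do        # 30 x 100MB = ~3GB
    cp $SEED $DEST/file-$i
    i=`expr $i + 1`
done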

rsm_gbg
Level 5

Sounds good,

I need some clarifications.

> Back this up to your good tape drive until it is full - use a new volume pool 'test'. Hopefully you will have 1.2 GB on the tape.

You mean 1.2TB, I presume.
I assume I need to write a tiny script that exercises bpbackup over and over again till the tape is full.
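Such a loop could be as small as this sketch (policy and schedule names are placeholders; bpbackup needs a user-backup schedule in the policy):

#!/bin/sh
# Keep submitting the same user backup until the tape fills.
# -w makes bpbackup wait for each job, so the runs stay serialized.
i=1
while [ $i -le 15 ]; do
    bpbackup -w -p test-policy -s user_sched /data/testset || break
    i=`expr $i + 1`
done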

> Now for the clever bit - when you are happy you have a real full tape, delete the data from disk, and then restore what is on the full tape. Eject the tape that the backup spanned onto - the job will fail, but the data from only the first tape will have been restored.
> Now, we have the exact amount of data that will fit on one tape.

I assume you mean to restore the entire tape, and that means I need to loop bprestore, renaming the files to new names.
That means I need 1.2TB of disk which I unfortunately don't have.
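For reference, the rename-on-restore part can be done with bprestore's -R option, which takes a rename file (paths here are illustrative):

#!/bin/sh
# Restore the backed-up directory to an alternate path so nothing is overwritten.
# Rename-file syntax is: change <old_path> to <new_path>
echo "change /data/testset to /data/restored" > /tmp/rename.txt
bprestore -w -R /tmp/rename.txt /data/testset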

I can however do the backup part and "count" the number of 100MB files that I back up.
That way I know how much data goes onto the tape.
It should be the same as NBU reports.

- Roland

mph999
Level 6
Employee Accredited

Hi Roland,

Oops, I did mean TB ...

Yes, all I was after was a method of getting a full tape (or very nearly full) that is exactly repeatable with exactly the same data, with every file in that data being exactly the same (apart from the filename).

I did think of one catch: if both media are not available, the restore probably won't start, so we might have had to get creative ... but it looks like we don't need that method now.

Using 100MB files means that on the good tape we'll only have a gap at the end of just under 100MB. This isn't an issue, as the bad tape holds much more than 100MB less - so even though the first tape is not quite full, it still holds more than will fit on the bad tape - hence the use of smallish files.

Thinking a bit more about this, the backups will need to be individual backups - so we can increase the size of each backup to, say, 20-30 GB if you have that much space available. The bad tape holds about 400GB less, so as long as each backup is smaller than that, when the last one fails to fit on the first good tape there is still more than enough data to make the bad tape full - you get the idea ... The important thing is that it is 'exactly' repeatable, with each backup containing exactly the same files.

Regards,

M

rsm_gbg
Level 5

Did the first backup test onto the "good" drive today.
The tape happily swallowed 1.5TB and isn't full. I made 1000 files with mkfile 100m, so 100GB, and then backed it up 15 times.
I ran out of time today and will try the other drive tomorrow.
I've never seen 90MB/sec before, but then it is just empty files.
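That mkfile step, as a sketch (target path assumed; mkfile writes zero-filled files, which also explains the compression results seen later):

#!/bin/sh
# 1000 x 100MB of zero-filled files = ~100GB of highly compressible test data
i=1
while [ $i -le 1000 ]; do
    mkfile 100m /data/testset/file-$i
    i=`expr $i + 1`
done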

Client=pnms01 Policy=test-pnms01 Elapsed=0000016989 Kbytes=102400672 Files=1002 KBpersec=89338
Client=pnms01 Policy=test-pnms01 Elapsed=0000015809 Kbytes=102400640 Files=1002 KBpersec=95350
Client=pnms01 Policy=test-pnms01 Elapsed=0000014719 Kbytes=102400640 Files=1002 KBpersec=91447
Client=pnms01 Policy=test-pnms01 Elapsed=0000013580 Kbytes=102400640 Files=1002 KBpersec=94913
Client=pnms01 Policy=test-pnms01 Elapsed=0000012480 Kbytes=102400640 Files=1002 KBpersec=91823
Client=pnms01 Policy=test-pnms01 Elapsed=0000011340 Kbytes=102400640 Files=1002 KBpersec=93960
Client=pnms01 Policy=test-pnms01 Elapsed=0000010230 Kbytes=102400672 Files=1002 KBpersec=89622
Client=pnms01 Policy=test-pnms01 Elapsed=0000009061 Kbytes=102400640 Files=1002 KBpersec=93610
Client=pnms01 Policy=test-pnms01 Elapsed=0000007950 Kbytes=102400640 Files=1002 KBpersec=93230
Client=pnms01 Policy=test-pnms01 Elapsed=0000006830 Kbytes=102400672 Files=1002 KBpersec=93449
Client=pnms01 Policy=test-pnms01 Elapsed=0000005710 Kbytes=102400640 Files=1002 KBpersec=93640
Client=pnms01 Policy=test-pnms01 Elapsed=0000004590 Kbytes=102400640 Files=1002 KBpersec=93139
Client=pnms01 Policy=test-pnms01 Elapsed=0000003470 Kbytes=102400640 Files=1002 KBpersec=93644
Client=pnms01 Policy=test-pnms01 Elapsed=0000002360 Kbytes=102400672 Files=1002 KBpersec=86727
Client=pnms01 Policy=test-pnms01 Elapsed=0000001150 Kbytes=102400672 Files=1002 KBpersec=96465
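To summarize output like this with awk, as M suggested, something along these lines works (assuming the Key=value layout shown above; the filename is illustrative):

# Average the KBpersec field across all lines of the bpdbjobs extract
awk -F'KBpersec=' 'NF > 1 { sum += $2; n++ }
    END { if (n) printf "avg %.0f KB/sec over %d jobs\n", sum / n, n }' bpdbjobs.out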

rsm_gbg
Level 5

Today I used the "bad" drive.
The backup times are slower - about 4-5 min per backup cycle.
But the empty 100MB files compress very well and I managed to cram 3TB onto the ONE tape!
Very efficient indeed.

I guess I have to take a ~100MB file with some real data and copy it 1000 times under new names.
That will show the compression ratio better.

We still have the fact that the same backup at roughly the same time of day (very quiet) is consistently 4-5 min slower.
Maybe the difference will be larger when the data has some random bits in it instead of empty files.
I will do another test tomorrow.

Client=pnms01 Policy=test-pnms01 Started=1386020009 Elapsed=0000021846 Ended=1386041855 Kbytes=102400704 Files=1002 KBpersec=88063 STunit=pnms01-tape
Client=pnms01 Policy=test-pnms01 Started=1386020009 Elapsed=0000020416 Ended=1386040425 Kbytes=102400704 Files=1002 KBpersec=83120 STunit=pnms01-tape
Client=pnms01 Policy=test-pnms01 Started=1386020009 Elapsed=0000018956 Ended=1386038965 Kbytes=102400672 Files=1002 KBpersec=90819 STunit=pnms01-tape
Client=pnms01 Policy=test-pnms01 Started=1386020009 Elapsed=0000017646 Ended=1386037655 Kbytes=102400736 Files=1002 KBpersec=81879 STunit=pnms01-tape
Client=pnms01 Policy=test-pnms01 Started=1386020009 Elapsed=0000016226 Ended=1386036235 Kbytes=102400672 Files=1002 KBpersec=92987 STunit=pnms01-tape
Client=pnms01 Policy=test-pnms01 Started=1386020009 Elapsed=0000014926 Ended=1386034935 Kbytes=102400736 Files=1002 KBpersec=80009 STunit=pnms01-tape
Client=pnms01 Policy=test-pnms01 Started=1386020009 Elapsed=0000013426 Ended=1386033435 Kbytes=102400736 Files=1002 KBpersec=84228 STunit=pnms01-tape
Client=pnms01 Policy=test-pnms01 Started=1386020009 Elapsed=0000011976 Ended=1386031985 Kbytes=102400704 Files=1002 KBpersec=89394 STunit=pnms01-tape
Client=pnms01 Policy=test-pnms01 Started=1386020009 Elapsed=0000010586 Ended=1386030595 Kbytes=102400736 Files=1002 KBpersec=81078 STunit=pnms01-tape
Client=pnms01 Policy=test-pnms01 Started=1386020009 Elapsed=0000009056 Ended=1386029065 Kbytes=102400704 Files=1002 KBpersec=87739 STunit=pnms01-tape
Client=pnms01 Policy=test-pnms01 Started=1386020009 Elapsed=0000007606 Ended=1386027615 Kbytes=102400704 Files=1002 KBpersec=88610 STunit=pnms01-tape
Client=pnms01 Policy=test-pnms01 Started=1386020009 Elapsed=0000006146 Ended=1386026155 Kbytes=102400736 Files=1002 KBpersec=86990 STunit=pnms01-tape
Client=pnms01 Policy=test-pnms01 Started=1386020008 Elapsed=0000004657 Ended=1386024665 Kbytes=102400736 Files=1002 KBpersec=85135 STunit=pnms01-tape
Client=pnms01 Policy=test-pnms01 Started=1386020008 Elapsed=0000003117 Ended=1386023125 Kbytes=102400736 Files=1002 KBpersec=84124 STunit=pnms01-tape
Client=pnms01 Policy=test-pnms01 Started=1386020008 Elapsed=0000001547 Ended=1386021555 Kbytes=102400736 Files=1002 KBpersec=77616 STunit=pnms01-tape
Client=pnms01 Policy=test-pnms01 Started=1385937066 Elapsed=0000016989 Ended=1385954055 Kbytes=102400672 Files=1002 KBpersec=89338 STunit=pnms01-tape
Client=pnms01 Policy=test-pnms01 Started=1385937066 Elapsed=0000015809 Ended=1385952875 Kbytes=102400640 Files=1002 KBpersec=95350 STunit=pnms01-tape
Client=pnms01 Policy=test-pnms01 Started=1385937066 Elapsed=0000014719 Ended=1385951785 Kbytes=102400640 Files=1002 KBpersec=91447 STunit=pnms01-tape
Client=pnms01 Policy=test-pnms01 Started=1385937065 Elapsed=0000013580 Ended=1385950645 Kbytes=102400640 Files=1002 KBpersec=94913 STunit=pnms01-tape
Client=pnms01 Policy=test-pnms01 Started=1385937065 Elapsed=0000012480 Ended=1385949545 Kbytes=102400640 Files=1002 KBpersec=91823 STunit=pnms01-tape
Client=pnms01 Policy=test-pnms01 Started=1385937065 Elapsed=0000011340 Ended=1385948405 Kbytes=102400640 Files=1002 KBpersec=93960 STunit=pnms01-tape
Client=pnms01 Policy=test-pnms01 Started=1385937065 Elapsed=0000010230 Ended=1385947295 Kbytes=102400672 Files=1002 KBpersec=89622 STunit=pnms01-tape
Client=pnms01 Policy=test-pnms01 Started=1385937065 Elapsed=0000009061 Ended=1385946126 Kbytes=102400640 Files=1002 KBpersec=93610 STunit=pnms01-tape
Client=pnms01 Policy=test-pnms01 Started=1385937065 Elapsed=0000007950 Ended=1385945015 Kbytes=102400640 Files=1002 KBpersec=93230 STunit=pnms01-tape
Client=pnms01 Policy=test-pnms01 Started=1385937065 Elapsed=0000006830 Ended=1385943895 Kbytes=102400672 Files=1002 KBpersec=93449 STunit=pnms01-tape
Client=pnms01 Policy=test-pnms01 Started=1385937065 Elapsed=0000005710 Ended=1385942775 Kbytes=102400640 Files=1002 KBpersec=93640 STunit=pnms01-tape
Client=pnms01 Policy=test-pnms01 Started=1385937065 Elapsed=0000004590 Ended=1385941655 Kbytes=102400640 Files=1002 KBpersec=93139 STunit=pnms01-tape
Client=pnms01 Policy=test-pnms01 Started=1385937065 Elapsed=0000003470 Ended=1385940535 Kbytes=102400640 Files=1002 KBpersec=93644 STunit=pnms01-tape
Client=pnms01 Policy=test-pnms01 Started=1385937065 Elapsed=0000002360 Ended=1385939425 Kbytes=102400672 Files=1002 KBpersec=86727 STunit=pnms01-tape
Client=pnms01 Policy=test-pnms01 Started=1385937065 Elapsed=0000001150 Ended=1385938215 Kbytes=102400672 Files=1002 KBpersec=96465 STunit=pnms01-tape

rsm_gbg
Level 5

New test today using the good drive.
I took a 4.2GB file and copied it to File-1 through File-25, roughly ~101GB in total, so now we have random data to back up instead of empty files.

What is very interesting is that loop 1 of the backup took 20:11 min, but I saw this:
07:55:15 - Info bpbkar (pid=586) bpbkar waited 45167 times for empty buffer, delayed 48334 times
07:55:16 - Info bptm (pid=587) waited for full buffer 4336 times, delayed 12767 times
Pretty good.

Loop 3 took 15:56 and I saw this:
08:10:59 - Info bpbkar (pid=1336) bpbkar waited 47877 times for empty buffer, delayed 50670 times
08:10:59 - Info bptm (pid=1337) waited for full buffer 259 times, delayed 899 times

Any ideas why this loop suddenly only had to wait for a full buffer about 260 times?

mph999
Level 6
Employee Accredited
Not sure - waiting for a full buffer is usually due to either the disk read speed of the client or the network: bptm is waiting for data to fill the buffer. Perhaps the system was busy, and was less busy by the time the third loop came round.

I forgot - best not to use empty files, go for real data. As the backups are individual, I see no problem with using the same file/files as suggested. As long as it's 'controlled'.

M
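As an aside, if the buffer waits themselves ever need attention, the usual knobs are the bptm data-buffer touch files on the media server; the values below are common starting points for LTO drives, not something prescribed in this thread:

# On the media server - bptm reads these plain one-number files at job start.
echo 262144 > /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS   # 256KB blocks
echo 32 > /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS     # buffers per drive
# Check the bptm log afterwards to confirm the new sizes are picked up.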

rsm_gbg
Level 5

Hi,

Just recently I started getting a lot of these: "open failed in io_open, I/O error"

What does that mean? It seems to cope with it and continue just fine.
Drive 1 is the "bad" one.

- Roland

Dec  9 03:55:11 pnms01 ltid[3353]: [ID 527590 daemon.notice] LTID - Sent ROBOTIC request, Type=1, Param2=1
Dec  9 03:55:11 pnms01 tldd[3362]: [ID 632670 daemon.notice] TLD(0) MountTape DLT111 on drive 1, from slot 19
Dec  9 03:55:11 pnms01 tldd[29230]: [ID 302958 daemon.notice] TLD(0) open failed in io_open, I/O error
Dec  9 03:55:11 pnms01 tldd[29230]: [ID 179076 daemon.notice] TLD(0) unload==TRUE, but no unload, drive 1 (device 1)
Dec  9 03:55:11 pnms01 tldcd[3366]: [ID 644291 daemon.notice] inquiry() function processing library HP       MSL G3 Series    H.20:
Dec  9 03:55:11 pnms01 tldcd[3366]: [ID 281747 daemon.notice] Processing MOUNT, TLD(0) drive 1, slot 19, barcode DLT111L4        , vsn DLT111
Dec  9 03:55:11 pnms01 tldcd[29231]: [ID 925075 daemon.notice] TLD(0) opening robotic path /dev/sg/c0t4l1
Dec  9 03:55:11 pnms01 tldcd[29231]: [ID 644291 daemon.notice] inquiry() function processing library HP       MSL G3 Series    H.20:
Dec  9 03:55:11 pnms01 tldcd[29231]: [ID 592656 daemon.notice] TLD(0) initiating MOVE_MEDIUM from addr 1019 to addr 1

mph999
Level 6
Employee Accredited
You just caught me as I was off to bed ...

That, I think, is normal ... yes, it's an error, but if I recall correctly we check to see if the drive is empty, and the error we 'expect' is what occurs when we try to access a drive with no tape.

M