
data backed up vs data stored on tape

tythecellist
Level 4
Partner

I'm being asked by a manager to supply a report that I don't think is possible to provide ... at least not in the way that person is hoping to see it. :\

Let's say that in a 24-hour period I back up 50 TB of data.  That (to me...) means that the sum total of the jobs that complete successfully (or, well, partially successfully) is 50 TB.

I'm being asked to show what that amount of data occupies *on tape*, without factoring in tape hardware compression.   Stay with me here.

In my mind, to get at least a reasonably accurate idea of how much tape this 50 TB backup set is using, I'd just need to know my average tape compression ratio for my LTO5 tapes, and divide the 50 TB figure by my average per-tape capacity (using that compression ratio).  But again, that involves factoring in compression, and I'm being told not to do that for this report.
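For what it's worth, the arithmetic I have in mind is just this (a back-of-envelope sketch; the 2:1 ratio and the 1.5 TB LTO5 native capacity are example numbers to swap for your own):

```python
import math

# Example numbers -- plug in your own site's figures.
backed_up_tb = 50.0      # sum of successful job sizes in the 24-hour window
lto5_native_tb = 1.5     # LTO5 native (uncompressed) capacity
avg_compression = 2.0    # observed average compression ratio, e.g. 2:1

# Effective capacity per tape once compression is factored in.
effective_tb_per_tape = lto5_native_tb * avg_compression

tapes_used = backed_up_tb / effective_tb_per_tape
print(f"~{tapes_used:.1f} tapes' worth of media ({math.ceil(tapes_used)} cartridges)")
```

Which is exactly the calculation I'm being told not to do, because it bakes compression in.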

Aside from Accelerator or Dedupe backups, I cannot think of another way to do this.  

Anyone else?

 


27 REPLIES

Jim-90
Level 6

The first question I would ask is: "What problem does this reporting solve?"  Useless questions and reports from managers are just that - "useless" - and time wasters.

RiaanBadenhorst
Moderator
Moderator
Partner    VIP    Accredited Certified

That made my head hurt for a bit.

There is no way to do this unless you've got a full tape, and even then only if that tape was used entirely for one backup that didn't span onto another tape. When the tape is full we only know about it because the drive reached the physical end of the tape and asked for another. At that stage you can look at bpmedialist and see that you put, say, 5 TB on a 1.6 TB tape. For a tape that is not full, we have no idea how much of the tape has been written. Let's not even get into multiplexing.....
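In the full-tape case the sums are trivial (a sketch, using the example figures above):

```python
# Only meaningful when the drive hit physical end-of-tape, i.e. the tape is full.
data_written_tb = 5.0       # total image sizes on the tape, from bpmedialist
native_capacity_tb = 1.6    # the cartridge's uncompressed (native) capacity

compression_ratio = data_written_tb / native_capacity_tb
print(f"Achieved compression: {compression_ratio}:1")
print(f"Native tape consumed: {native_capacity_tb} TB (all of it, by definition)")
```

The moment the tape is only partly full, the second line is exactly what we can't compute.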

 

Nicolai
Moderator
Moderator
Partner    VIP   

There is no way of telling how much space 50 TB occupies on tape without compression. There is simply no tool to report that.

Will the report be used for invoicing?

Your process of using an average would also be my best advice.

mph999
Level 6
Employee Accredited

Nicolai my friend .... you are missing the obvious.

50 MB of data occupies 50 MB on tape without compression, does it not?

If it writes without error, that is - if there are 'recoverable' errors, the drive will re-write the data, invisibly to NBU and the OS, and will use slightly more tape.

 

mph999
Level 6
Employee Accredited

I'm confused though, and my head hurts too, Riaan ...

Given that drives use compression, what use is a report that doesn't take tape compression into account???

This info is obtainable from the bptm log; you can see the size of each fragment sent.  As this is on the NBU side, it is the value before compression.

You could also get the size of the fragments for a given backup ID from the catalog.
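For the catalog route, something like this could sum the FRAG-line sizes per image from `bpimagelist -l` output (a rough sketch of my own - the field positions are assumptions taken from one system's output, so verify them against yours before trusting the numbers):

```python
from collections import defaultdict

def fragment_totals(bpimagelist_l_output: str) -> dict:
    """Sum fragment sizes (KB) per backup ID from `bpimagelist -l` text.

    ASSUMPTIONS: the backup ID is the 6th field of an IMAGE line, and the
    fragment size in kilobytes is the 4th field of a FRAG line -- check
    both positions against your own bpimagelist output.
    """
    totals = defaultdict(int)
    backup_id = None
    for line in bpimagelist_l_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "IMAGE" and len(fields) > 5:
            backup_id = fields[5]
        elif fields[0] == "FRAG" and backup_id and len(fields) > 3:
            totals[backup_id] += int(fields[3])
    return dict(totals)
```

As with the bptm log, this is still the pre-compression size, because it's what NBU handed to the drive.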

Nicolai
Moderator
Moderator
Partner    VIP   

Absolutely :)

But as I understand the question, the user asks how much space the data occupies on tape - the "backend", so to say - and thus how much tape remains.

The only thing we can report is front-end data: the amount of data we protect. How much space it occupies on a given media - tape, MSDP, Data Domain - we can't say directly. They are "black boxes" to us.

And because compression is always on - how could we report the data uncompressed without knowing the average compression per image (which we don't)?

The only way - very simple and stupid - I can think of is dividing the image size by 512 bytes, the SCSI sector size. Thus a 100 MB backup would occupy 204,800 sectors on tape. But what in the world would that piece of information be good for ...
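Spelled out (trivial, but to show where 204,800 comes from):

```python
# How many 512-byte SCSI sectors a 100 MB backup image spans, at minimum.
backup_bytes = 100 * 1024 * 1024   # 100 MB
sector_size = 512

sectors = backup_bytes // sector_size
print(sectors)  # 204800
```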

Hope this helps :)

 

 

 

RiaanBadenhorst
Moderator
Moderator
Partner    VIP    Accredited Certified

It's a bit confusing, but what the manager wanted to know is how much each backup took up on tape - well, that is what I understood. So assume the following:

An Oracle backup is put on LTO 4, and we manage to stick 2.4 TB on the 800 GB rated media. Magically the data stream ended right at (or before) the end-of-tape marker. So we know the compression is 3:1 and the backup is taking up 800 GB of storage.

But what if that job had 1 megabyte more to write? A new tape would be loaded and the backup would complete, but we wouldn't be able to determine how much of the new LTO media was used (or would we?).

To add to that, what happens to the 8 other backups we put on that tape? They all have different compression ratios (assuming they come from different backup sources). So we'll have 9 different backups, with 9 different and unknown compression ratios, occupying the tape.

When the end-of-tape marker rolls up and the last backup is finished, there is no way to calculate the storage used per backup as in the first example.

So you can really only report this in increments of 800 GB.

Unless I'm missing something.
 

Will_Restore
Level 6

But how many microns of magnetic flux were utilized?

That is the important question.  8>

 

RiaanBadenhorst
Moderator
Moderator
Partner    VIP    Accredited Certified

The answer is always 27

revarooo
Level 6
Employee

What is the value in knowing this? That is the question that should be asked!

Will_Restore
Level 6

Not 42? 

RiaanBadenhorst
Moderator
Moderator
Partner    VIP    Accredited Certified

Since we know that the answer is 27, the next question could be why ask silly questions (as Revarooo pointed out).

mph999
Level 6
Employee Accredited

The block position of each fragment is recorded in the catalog (at least the starting offset from the beginning of the tape), seen on the FRAG lines of bpimagelist output, so in theory you could work out how many blocks each fragment took - I think ...

You have also got some position data in the .f file, per file, which I think is how the fast block positioning works ...  BUT:

1.  This is a wild idea I've just thought of; I haven't tested, researched or thought it through ...

2.  If multiplexing is used, I suspect all bets are off ...

3.  I might be wrong ...

Proving this could be fun. You'd need to take, say, one jpg image of a decent size (because jpgs are pretty much incompressible), or a good-sized .zip file (again, incompressible - and better, as you can get a decent size easily; I'm talking something like a couple of gigs), and write it to an unassigned tape.

Then back it up again, to the same tape.

Next, back up something nice and compressible (like a .txt file) of the same size as the zip file, again to an unassigned tape.  As before, back it up again (to the same tape).

Then compare the start block offset of the 1st fragment of the 2nd backup on each tape; it should be different, as one image compressed and one didn't.

Then with some maths, and knowing the block size, you might be able to come to some conclusion.

Not sure if this would show anything useful though ...
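If anyone fancies trying it, the two test files are easy to generate (a quick sketch; the filenames and the helper are my own, and the size is whatever you pass in):

```python
import os

def make_test_files(directory: str, size: int) -> None:
    """Write one incompressible and one highly compressible file of `size` bytes."""
    # Random bytes defeat the drive's compression, much like a .zip or .jpg would.
    with open(os.path.join(directory, "incompressible.dat"), "wb") as f:
        remaining = size
        while remaining > 0:
            chunk = os.urandom(min(remaining, 64 * 1024 * 1024))
            f.write(chunk)
            remaining -= len(chunk)

    # Repeated text compresses extremely well, like the .txt suggested above.
    with open(os.path.join(directory, "compressible.txt"), "wb") as f:
        line = b"the quick brown fox jumps over the lazy dog\n"
        reps, tail = divmod(size, len(line))
        f.write(line * reps + line[:tail])

# e.g. make_test_files("/tmp", 2 * 1024**3) for the couple-of-gigs test
```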

RiaanBadenhorst
Moderator
Moderator
Partner    VIP    Accredited Certified

Too clever. Do we know how big a block is?

mph999
Level 6
Employee Accredited

Yep, whatever SIZE_DATA_BUFFERS is set to ...

However, not sure it's going to help as it's still compressed ...

Nicolai
Moderator
Moderator
Partner    VIP   

Interesting ... +1

mph999
Level 6
Employee Accredited
It works something along these lines. This may not be 100% correct, as none of this is documented; it's just what I've worked out ... Regarding the original query, it's not going to help as the data is still 'compressed' - but I thought it might be of interest anyway.

Looking at just one file in two different backups taken one after the other, the .f file entry is identical; the particular field of interest is blknum:

womble_1443432148
num len plen dlen blknum ii raw_sz GB dev_num path data
4 0 25 53 13 1 0 0 33 /netbackup/testdata/file3 33060 root root 1810 1433156057 1433156057 1443431955

womble_1443432175
num len plen dlen blknum ii raw_sz GB dev_num path data
4 0 25 53 13 1 0 0 33 /netbackup/testdata/file3 33060 root root 1810 1433156057 1433156057 1443432154

However, the 'FRAG' line info from the NBDB is different (an alternative way to look at this, as opposed to bpimagelist output):

womble_1443432148
'461','391','332','1','1','0','A00000','1000002','2','1','1','32768','A00000','6','1','262144','2','1443432148','0','0','','0','0','','','2015-09-28 09:22:54.122663','2015-09-28 09:22:54.122710'

womble_1443432175
'462','392','333','1','1','0','A00000','1000002','2','1','1','32768','A00000','6','2','262144','5','1443432148','0','0','','0','0','','','2015-09-28 09:23:15.033018','2015-09-28 09:23:15.033062'

The field meanings are:

CREATE TABLE "DBM_MAIN"."DBM_ImageFragment" (
 1   "ImageFragmentKey" unsigned bigint NOT NULL DEFAULT autoincrement
 2  ,"ImageKey" unsigned bigint NOT NULL
 3  ,"ImageCopyKey" unsigned bigint NOT NULL
 4  ,"CopyNumber" integer NOT NULL
 5  ,"FragmentNumber" integer NOT NULL
 6  ,"ResumeCount" integer NOT NULL
 7  ,"MediaID" varchar(1024) NOT NULL
 8  ,"MediaServerKey" unsigned int NULL DEFAULT 0
 9  ,"StorageUnitType" integer NOT NULL DEFAULT 0
10  ,"StuSubType" smallint NOT NULL DEFAULT 0
11  ,"FragmentState" smallint NOT NULL
12  ,"FragmentSize" bigint NOT NULL DEFAULT 0
13  ,"FragmentID" varchar(4096) NOT NULL DEFAULT ''
14  ,"Density" integer NOT NULL DEFAULT 0
15  ,"FileNum" integer NOT NULL DEFAULT 0
16  ,"BlockSize" integer NOT NULL DEFAULT 0
17  ,"Offset" integer NOT NULL DEFAULT 0
18  ,"MediaDate" bigint NOT NULL DEFAULT 0
19  ,"DeviceWrittenOn" integer NOT NULL DEFAULT 0
20  ,"FFlags" integer NOT NULL DEFAULT 0
21  ,"MediaDescription" varchar(1024) NOT NULL DEFAULT ''
22  ,"FragmentCheckpoint" smallint NOT NULL DEFAULT 0
23  ,"MediaSequenceNum" integer NOT NULL DEFAULT 0
24  ,"MediaExtents" varchar(4096) NOT NULL DEFAULT ''
25  ,"SnapshotClientMountHost" varchar(1024) NOT NULL DEFAULT ''
26  ,"CreatedDateTime" timestamp NOT NULL DEFAULT current utc timestamp
27  ,"LastModifiedDateTime" timestamp NOT NULL DEFAULT utc timestamp
28  ,CONSTRAINT "PK_DBM_IMAGEFRAGMENT" PRIMARY KEY ("ImageFragmentKey" ASC)

The 17th field is Offset: '2' for the womble_1443432148 backup, '5' for the womble_1443432175 backup. The offset is the starting position of the fragment, relative to the beginning of the tape.

Looking at what's on the tape with scsi_command -map:

root@womble db $ scsi_command -map -f /dev/rmt/0cbn
00000000: file 1: record 1: size 1024: NBU MEDIA header (A00000)
00000001: file 1: eof after 1 records: 1024 bytes
00000002: file 2: record 1: size 1024: NBU BACKUP header
backup_id womble_1443432148: frag 1: file 1: copy 1
expiration 1443435748: retention 10: block_size 262144
flags 0x0: mpx_headers 0: resume_count 0: media A00000
00000003: file 2: record 2: size 32768
00000004: file 2: eof after 2 records: 33792 bytes
00000005: file 3: record 1: size 1024: NBU BACKUP header
backup_id womble_1443432175: frag 1: file 2: copy 1
expiration 1443435775: retention 10: block_size 262144
flags 0x0: mpx_headers 0: resume_count 0: media A00000
00000006: file 3: record 2: size 32768
00000007: file 3: eof after 2 records: 33792 bytes
00000008: file 4: record 1: size 1024: NBU EMPTY header (file 3)
00000009: file 4: eof after 1 records: 1024 bytes
eot

The first column ('00000000:' etc.) is the offset position, and the backup headers show as being at offsets 00000002 and 00000005, which aligns with what is given in the image fragment table.

So, knowing the starting position of the fragment relative to the beginning of the tape (from the image table), and the file position relative to the beginning of the fragment (from the .f file), the positions of files and the amount of tape used can be worked out.
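To make the bookkeeping concrete, here's a hypothetical little parser over that -map style listing (my own sketch, not an NBU tool - it counts positions between each backup header and its eof mark, inclusive, and assumes the lines look exactly like the sample above; the "positions" are the listing's first-column offsets, not byte counts):

```python
import re

def tape_usage_from_map(map_output: str) -> dict:
    """Positions consumed per backup_id, from `scsi_command -map` style text.

    Counts from each 'NBU BACKUP header' position through the matching
    'eof after' line, inclusive.
    """
    usage = {}
    current_id, start, last_offset = None, None, None
    for line in map_output.splitlines():
        m = re.match(r"\s*([0-9A-Fa-f]{8}):", line)
        if m:
            last_offset = int(m.group(1), 16)
        if "NBU BACKUP header" in line:
            start = last_offset
        bid = re.search(r"backup_id (\S+):", line)
        if bid and start is not None and current_id is None:
            current_id = bid.group(1)
        if "eof after" in line and current_id is not None:
            usage[current_id] = last_offset - start + 1
            current_id, start = None, None
    return usage
```

On the listing above this gives 3 positions each for womble_1443432148 and womble_1443432175 (header, data record, eof), matching the offsets 2-4 and 5-7.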

RiaanBadenhorst
Moderator
Moderator
Partner    VIP    Accredited Certified

Hi Martin

 

Thanks for that super technical explanation. Can you explain what each line refers to? I'm not certain I understand how we know how much tape it used. How do the blocks get laid out?

Marianne
Level 6
Partner    VIP    Accredited Certified
Only if you have a flux capacitor....