media list question

pylej
Level 4
Hi Guys,

I am a bit confused by the output from my media list report. We use LTO-2 tapes, which can hold 400 gigs of compressed data. How is it possible that when I run my media list reports, some media has more than 400 gigs written to it?

Thanks for the help

Accepted Solutions

James_Perry
Level 4
I think I can help out here a bit. 

The 200 GB native number means that you will get a maximum of 200 GB on the tape with 0% compression, or basically an exact, unmodified copy between source and tape.  The 400 GB compressed number comes from an estimated 2:1 compression ratio of the data.  All the tape does is store the data written by the tape drive.  It is the tape drive that contains the hardware compression / decompression algorithm, and it compresses the source data before writing it to tape.

How well a set of data compresses comes not from the theoretical numbers of the tape manufacturers but from the compressibility of the data itself, as alluded to above.  It is best explained with an example of a very generic compression technique.  The most basic one I know scans through a block of data of size X and looks for repeated sequences of Y bytes in length.  If such a sequence is found, the compression algorithm removes the later repeated sequence of bytes and replaces it with a pointer to where it first saw the sequence, plus a count of how many bytes were replaced.  See the examples below; hopefully my explanation will be clearer then.


Example 1 - Data string with no repeated data blocks:
1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ

Example 2 - Data with some repeated blocks
1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
A                                                            Z



Now in example 1 you would not get any compression if backed up to tape as there is no sequence of repeated characters.

But in example 2, the 12...YZ line of 62 bytes repeats two times.  Say it only takes 6 bytes for the compression algorithm to point to the first occurrence and note that it removed 62 repeated bytes; then the 124 bytes making up the second and third lines are reduced to 12 bytes.  The last line has 60 spaces that can be reduced to 7 bytes, one for the space and 6 for the pointer and count, giving back 53 more bytes.  My basic compression here reduces 248 bytes of source data down to 83 bytes.  This is approximately a 3:1 compression ratio, or a 66% reduction in data size between source and tape.
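The byte counting above can be checked with a short Python sketch. It is only a model of the toy scheme in this post, not what an LTO drive actually does: the 6-byte pointer cost is the figure assumed above, and the names (toy_compressed_size, run_length_size, MIN_RUN) are made up for illustration.

```python
LINE = b"1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
POINTER_COST = 6  # assumed bytes for a pointer plus count, as in the post
MIN_RUN = 8       # hypothetical threshold: shorter runs are not worth replacing


def run_length_size(line: bytes) -> int:
    """Bytes needed for one line when long runs of a single repeated byte
    are stored as one literal byte plus a pointer/count."""
    total = 0
    i = 0
    while i < len(line):
        j = i
        while j < len(line) and line[j] == line[i]:
            j += 1
        run = j - i
        total += (1 + POINTER_COST) if run >= MIN_RUN else run
        i = j
    return total


def toy_compressed_size(lines: list[bytes]) -> int:
    """Bytes written under the toy scheme: a line already seen costs only
    a pointer, anything else is stored with its long runs collapsed."""
    seen = set()
    total = 0
    for line in lines:
        if line in seen:
            total += POINTER_COST
        else:
            seen.add(line)
            total += run_length_size(line)
    return total


example_1 = [LINE]
example_2 = [LINE, LINE, LINE, b"A" + b" " * 60 + b"Z"]

for name, lines in (("Example 1", example_1), ("Example 2", example_2)):
    original = sum(len(line) for line in lines)
    compressed = toy_compressed_size(lines)
    print(f"{name}: {original} -> {compressed} bytes "
          f"({original / compressed:.2f}:1)")
```

Example 1 stays at 62 bytes, while Example 2 goes from 248 bytes to 83 (62 for the first line, 6 + 6 for the two repeats, and 1 + 7 + 1 for the last line).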

So if your files compressed at an average of 3:1, you would see approximately 600 GB of data on your 200 GB native LTO-2 tape.

Files like images and archives such as Zip and Rar files are already compressed, so the tape drive does not give much, if any, additional compression on them.  Other files, like database files with lots of white space padding out data cells, compress very well.
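A rough way to see this difference for yourself is to run two kinds of data through a software compressor. The sketch below uses Python's zlib (DEFLATE) purely as a stand-in; it is not the algorithm in the tape drive, and the sample data is invented: space-padded records to mimic a padded database file, and random bytes to mimic data that has already been compressed.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Original size divided by zlib-compressed size (higher is better)."""
    return len(data) / len(zlib.compress(data))


# Hypothetical "database-like" data: short values padded out with spaces.
padded = (b"row-data" + b" " * 248) * 4096

# Random bytes behave much like already-compressed files: no repeats to find.
random_like = os.urandom(len(padded))

print(f"padded records: {compression_ratio(padded):.1f}:1")
print(f"random bytes:   {compression_ratio(random_like):.2f}:1")
```

The padded records shrink dramatically, while the random bytes come out at about 1:1 or very slightly larger than they went in.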

Hope I cleared this up a bit.



14 REPLIES

Will_Restore
Level 6
429496729600 looks to be more than 400G but it's not. 

Andy_Welburn
Level 6
but I'll provide the info (as I see it) again, so please bear with me! ;)

We use LTO3 tapes (400 GB native, 800 GB compressed - as advertised 'on the tin').

Now if I look at the data stored on my FULL tapes, the total data stored per tape ranges from (currently) ~200 GB to ~1.2 TB. It all depends on the type of data stored & how it is affected by compression - some data is much more 'amenable' to compression than others. This can especially be true if trying to compress a file that's already compressed - it can make it larger!

***EDIT***

A couple of URLs that I kindly provided in one of my previous posts which "may explain better":

http://www.tapestockonline.com/dltadrdacoq.html

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=lpg50244

pylej
Level 4
Hi Bill,


Why is that not more than 400 gigs? The media list gives me the output in KB, so that figure of 429496729600 would be 429 tera in my opinion.

Thanks
Jim

pylej
Level 4
Hi Andy,

OK, I will read the documentation, and I understand how some files are affected differently by compression, but I would not expect the media list to give me numbers that are higher than what can possibly be put on the tape.

I need to read more about this.

Thank you

Andy_Welburn
Level 6
I didn't make them up, honest!

As some data can be compressed more readily you can get more data onto a tape than what is advertised.

NetBackup is reporting the uncompressed data size. The data is then compressed onto tape - the amount of data that can be stored on the tape then depends upon the type of data itself & how it responds to compression. Hence my reported figure of 1.2 TB on an LTO3 (400/800 GB) tape.
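As a quick sanity check on figures like these, the reported (uncompressed) size on a full tape can be turned back into an implied compression ratio. This is a rough sketch only: it assumes the tape really was filled to its native capacity, and implied_ratio is a made-up helper name, not anything from NetBackup.

```python
def implied_ratio(reported_gb: float, native_gb: float) -> float:
    """Average compression ratio implied by the uncompressed size reported
    for a full tape, assuming its whole native capacity was used."""
    return reported_gb / native_gb


LTO3_NATIVE_GB = 400  # native capacity quoted in this thread

for reported_gb in (200, 1200):  # the ~200 GB and ~1.2 TB tapes mentioned above
    ratio = implied_ratio(reported_gb, LTO3_NATIVE_GB)
    print(f"{reported_gb} GB reported -> about {ratio:.1f}:1")
```

So the 1.2 TB tape works out to roughly 3:1, better than the 2:1 the advertised 800 GB figure assumes.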

Andy_Welburn
Level 6
I believe the figure he quoted was in bytes, as opposed to the KB that the media list reports. It's a common misconception that 400 GB equates to 400,000,000 KB or 400,000,000,000 bytes, which forgets the 1024 multiplier/divisor.

I think we are all guilty of this at some time for expedience. ;)
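The arithmetic behind the 1024 multiplier can be spelled out in a few lines of Python. This is just a worked example with made-up helper names; it assumes the 400 gig figure is being counted in powers of 1024.

```python
KIB = 1024  # the binary multiplier that is easy to forget


def gigs_to_bytes(gigs: int) -> int:
    """Binary gigabytes (GiB) to bytes."""
    return gigs * KIB ** 3


def gigs_to_kb(gigs: int) -> int:
    """Binary gigabytes (GiB) to KB of 1024 bytes each."""
    return gigs * KIB ** 2


print(f"{gigs_to_bytes(400):,} bytes")           # 429,496,729,600
print(f"{gigs_to_kb(400):,} KB")                 # 419,430,400
print(f"{400 * 1000 ** 3:,} bytes (decimal)")    # 400,000,000,000
```

That is why 429496729600 bytes is exactly 400 gigs and not more, and why the same amount shows up as about 419 million when counted in KB.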

Will_Restore
Level 6
I see what you are saying.  We've got some LTO2 tapes that appear to have over 500 GB while the rated capacity is 200/400 GB native/compressed.

Will_Restore
Level 6
At quick glance I would expect to see 400 "and change" and not 429... (or 419K)  :)

Andy_Welburn
Level 6
Very eloquently put! I think I'll bookmark that (with the previous URLs) for future forum posts! ;)

Karthikeyan_Sun
Level 6
Very helpful information, James. Thank you.

David_McMullin
Level 6
Good compression 101

pylej
Level 4
Thanks a lot, James!

pylej
Level 4
Hi Andy,

Thanks a lot. When you said "NetBackup is reporting the uncompressed data size", it all started to make sense for me.

Thank you!