Sym-cr
Level 5


Understanding Compression in Backup Exec:

An explanation of Hardware and Software compression used with Backup Exec

 

How does compression work?

This is best explained with an example. Consider a text file (.txt) whose content is “aaaabbbbbccaaaddd”.

Repeated occurrences of characters in sequence:

a = 4 times

b = 5 times

c = 2 times

a = 3 times

d = 3 times

Now, when a simple compression scheme that replaces each run of repeated characters with a count and the character is applied, the above string of 17 characters is compressed to “4a5b2c3a3d”.

The string has therefore been compressed from 17 characters to 10 characters, so the compressed text file is smaller than the original file.

Decompressing the compressed string (4a5b2c3a3d) yields the original string (aaaabbbbbccaaaddd).

Note: This example only illustrates the basic idea of compression. It does not represent any standard compression algorithm, nor the algorithm used by Backup Exec or by tape drives.
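
The counting scheme above is commonly called run-length encoding. A minimal Python sketch of it follows; the function names are hypothetical and chosen only for illustration, and this is not how Backup Exec or any tape drive actually compresses data.

```python
from itertools import groupby


def rle_encode(text):
    """Collapse each run of identical characters into a (count, char) pair."""
    return [(len(list(run)), char) for char, run in groupby(text)]


def rle_to_string(pairs):
    """Render pairs in the article's notation, e.g. [(4, 'a')] -> '4a'."""
    return "".join(f"{count}{char}" for count, char in pairs)


def rle_decode(pairs):
    """Expand (count, char) pairs back into the original text."""
    return "".join(char * count for count, char in pairs)


original = "aaaabbbbbccaaaddd"
pairs = rle_encode(original)
encoded = rle_to_string(pairs)

print(encoded)                       # 4a5b2c3a3d
print(len(original), len(encoded))   # 17 10
assert rle_decode(pairs) == original
```

The sketch decodes from the list of pairs rather than from the printed string on purpose. The string notation becomes ambiguous as soon as the input itself contains digits (for example a run of twenty "0" characters), and text with no repeats gets longer ("abc" becomes "1a1b1c"). Real compression formats avoid both problems with a stricter binary layout.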

 

Different types of compression that are commonly used with Backup Exec

Hardware Compression:

Hardware compression is provided by the tape drive and is performed at the drive level. In Backup Exec, the backup job only needs to be set to use Hardware as the compression type.

In Backup Exec 2010:

Edit the backup job properties, click General under Settings, and select the required type of compression under the Compression Type drop-down. To set it globally, go to Tools-> Options-> under Job Defaults-> Backup-> Compression Type.

In Backup Exec 2012:

Choose the backup job-> under Backup Options-> Storage-> Compression

Advantage: No System Overhead as the compression is performed by the drive.

Disadvantage: Restoring data on a drive other than the one that compressed it can cause issues.

Requirements: The drive should support hardware compression. (Refer to the drive’s manual for the exact model number of your drive, or contact the drive manufacturer, for feature information.)

Most current drives allow hardware compression to be enabled or disabled by sending SCSI commands to the drive.

 

Can we set or modify the compression ratio?

No, it’s not possible to set or modify the compression ratio. There is no option in Backup Exec to do so. Backup Exec only transfers the data to the hardware for compression and waits for the hardware to report how much the data was compressed. After receiving that reply, it displays the compression ratio on the Backup Exec console.

NOTE:

a) Many image and picture files are already compressed on disk. When backing up these types of files, little or no further hardware compression is achieved, so the tape drive operates at its native (non-compressed) speed. Hardware compression is performed by the tape device and not by the backup software.

b) The 2:1 compression ratio quoted by vendors should be expected only in an ideal scenario. Different file types compress differently: text files compress far better than high-density graphics files, and files with .jpg, .gif, .zip, or .cab extensions are unlikely to compress at all. Many applications, including some database and mail applications, also compress the files they handle.

Example:

Already compressed files (.zip, .arj, .iso, .rar, etc.)

Natively compressed file formats (.jpg, .gif, .mpg, etc.)

c) Compressing already compressed files again can make them larger than their original size. This reduces the effective capacity of the tape, which may then hold less data than its native capacity.
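
The effect described in notes (b) and (c) can be observed with any general-purpose compressor. The sketch below uses Python's zlib module purely as a stand-in; it is an assumption for illustration and not the algorithm used by Backup Exec or by tape drive firmware. Seeded pseudo-random bytes stand in for already compressed content, and the sample text is made up.

```python
import random
import zlib


def ratio(data):
    """Return original size divided by compressed size for one zlib pass."""
    return len(data) / len(zlib.compress(data))


# Repetitive text, similar in spirit to log or document data (100,000 bytes).
text_data = b"Backup Exec job completed successfully.\n" * 2500

# Seeded pseudo-random bytes stand in for already compressed content
# such as .zip or .jpg files, which look almost random to a compressor.
rng = random.Random(0)
random_data = bytes(rng.getrandbits(8) for _ in range(100_000))

print(f"text:   {ratio(text_data):.1f}:1")
print(f"random: {ratio(random_data):.3f}:1")
```

The repetitive text reports a ratio far above 2:1, while the random data reports a ratio slightly below 1:1, meaning the "compressed" output is a little larger than the input.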

 

Software Compression:

Software compression is configured in Backup Exec and is performed in software rather than by the drive.

In Backup Exec 2010:

Edit the backup job properties, click General under Settings, and select the required type of compression under the Compression Type drop-down. To set it globally, go to Tools-> Options-> under Job Defaults-> Backup-> Compression Type.

In Backup Exec 2012:

 Choose the backup job-> under Backup Options-> Storage-> Compression

Software compression compresses the data either before it is sent from a remote system if backing up using a remote agent, or before being sent to the tape drive if backing up a local host server system.

Advantage: It does not depend on the drive. Hence, if the compression is performed using any version of Backup Exec, there will be no issues while restoring the same data.

Software compression can be useful if you are performing remote backups and have limited bandwidth between the host server and remote agent and want to try to reduce the backup window. It can in some cases be more efficient than hardware compression.

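Disadvantage: Software compression uses CPU on the system that performs it, so some system overhead is involved. In many environments this overhead is small, but it depends on the data being backed up and on the hardware.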

Requirements: The compressed backup data must be restored using the same software (any version of Backup Exec).

 

Media Capacity:

Native 
This is the real capacity of the media, and it’s close to the stated native capacity of any tape drive/media.

Compressed
This is normally the capacity that the device is marketed as being able to hold. However, the capacity actually achieved depends on how well the data being backed up compresses.

 

How to compute the Compression Ratio?

Compression ratio is the ratio of a file’s original size divided by its compressed size. A compression ratio of 2:1 means the compressed file is ½ as large as the original file.

If the Compression Ratio is (a:b)

Uncompressed data writing capacity = native capacity of the media

Compressed data writing capacity = (a / b) x (native capacity of the media)
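
As a worked example of the arithmetic above, here is a small Python sketch. The function names and the 400 GB native capacity are hypothetical values chosen for illustration only.

```python
def compression_ratio(original_size, compressed_size):
    """Ratio of original size to compressed size (2.0 means 2:1)."""
    return original_size / compressed_size


def effective_capacity(native_capacity, a, b=1):
    """Amount of original data that fits on the media at a ratio of a:b."""
    return native_capacity * a / b


# A 100 GB backup set that occupies 50 GB on the media compressed at 2:1.
print(compression_ratio(100, 50))      # 2.0

# Hypothetical media with 400 GB native capacity.
print(effective_capacity(400, 2, 1))   # 800.0
print(effective_capacity(400, 3, 2))   # 600.0
```

At a ratio of 1:1 (data that does not compress) the effective capacity equals the native capacity, which is why the marketed compressed capacity is only reached with data that compresses well.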

 

REFERENCE:

http://www.symantec.com/docs/TECH8326

http://www.symantec.com/docs/TECH49521

http://www.symantec.com/docs/TECH78389

http://www.symantec.com/docs/TECH50960

http://www.symantec.com/docs/TECH6076

Comments
CraigV
Moderator
Partner    VIP    Accredited

Hi,

 

did you delete your previous article and recreate this?

Thanks!

Sym-cr
Level 5

Yes..Craig. With few corrections.

pkh
Moderator
   VIP    Certified

a) How does your compression algorithm handle this string?

aaaa00000000000000000000bbbbbccaaaddd

Note that there are twenty (20) zeros (0) between the string of a's and the string of b's

b) You claimed that the overhead for software compression is negligible.

1) Can you provide some documentation to substantiate your claim?

2) If this is true, why would one want to use hardware compression?

CraigV
Moderator
Partner    VIP    Accredited

...interesting that the questions above haven't been answered. Would be good to know how you arrived at your findings...

Thanks!

CraigV
Moderator
Partner    VIP    Accredited

...any feedback here on the queries?

Thanks!

b_meaney5630
Not applicable

Craig/PKH,

I'm not a Symantec expert, but that example was a VERY simplified example.

As I'm sure you're aware, the operating system does not really see a text file as just the contents of the file.

Here's a better example...when you back up a text file with the contents "CCCC", it might pass this string to the backup system. (This is NOT a real example. Just more demonstrative of what it may look like.)

0x00140x00230x00050x00050x00050x00050x00010x0055

--- 0x0014 indicates that the file is a text file

--- 0x0023 indicates the beginning of the line is being shown

--- 0x0005 indicates the letter C in the text file

--- 0x0001 indicates the end of line

--- 0x0055 indicates the end of the file

the backup system can replace 0x0014 with something like %1, likewise the beginning of line 0x0023 could be %2, end of line 0x0001 could be %3, and 0x0055 could be %4. 0x0005 could be %5

That 48 character string is now 16 characters --- %1%2%5%5%5%5%3%4

For a one-line text file, that is a reduction of about 67%, from 48 characters to 16. But imagine if you had 100 lines of text. Each of the %2 and %3 markers would reduce 600 characters (0x0001 is 6 characters, times 100) down to 200 characters (%3 is 2 characters, times 100).

On top of that, if you have 100 text files, you can replace all 100 0x0014's and 0x0055's with %1 and %4

Tape drives have this interpreter built in to read and write this "deduplicated" data, so that is what they refer to as hardware compression. Software compression involves the computer you run BE on opening up each file, finding repeated strings, doing the substitution itself, and then writing the file. The resource usage depends on the size of the file and on whether or not BE loads the whole file into memory. If you're deduping a 4GB log file, you may see more resource utilization than with smaller 50KB apache configs, for example.

Hopefully this clears things up a bit.

Version history
Last update:
‎03-24-2014 02:58 AM
Updated by: