
Don't Get Duped by Dedupe

Mayur_Dewaikar

If you are evaluating dedupe solutions, the dedupe ratios claimed by vendors are bound to intrigue you. I have seen claims of dedupe rates as high as 50:1, and I am sure there are claims even higher than that. Are such dedupe rates realistic? Honestly, yes, but you have to understand the assumptions and the math behind them. These high dedupe rates generally rest on the following assumptions:

  1. Logical capacity: Logical capacity is the amount of data you “would have” stored with no dedupe or compression. For example, if you are protecting 20 TB of data with daily backups and a 30-day retention period, your total protected data is, in theory, 20 x 30 = 600 TB. In practice, for an environment with an average change rate, backend dedupe capacity roughly equals front-end capacity over a 30-day retention period. So, assuming 20 TB of dedupe storage is needed, your dedupe ratio is 600/20 = 30:1 (see the sketch after this list). While this makes perfect sense as a marketing message, such “simplified math” is not practical for sizing real production deployments.
  2. Mostly static data with low change rate
  3. Longer retention periods
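
To make the “logical capacity” math above concrete, here is a minimal sketch in Python using the figures from the example in point 1 (20 TB protected, daily backups, 30-day retention, roughly 20 TB of backend storage actually consumed); the variable names are illustrative, not from any particular product:

```python
# Minimal sketch of the "logical capacity" math from the example above.
# Assumed inputs: 20 TB protected per backup, daily backups, 30-day retention,
# and roughly 20 TB of backend dedupe storage actually consumed.

front_end_tb = 20        # data protected per backup (TB)
retention_days = 30      # daily backups retained for 30 days
backend_tb = 20          # dedupe storage actually consumed (TB)

# "Logical capacity": what you would have stored with no dedupe or compression.
logical_tb = front_end_tb * retention_days     # 20 x 30 = 600 TB
claimed_ratio = logical_tb / backend_tb        # 600 / 20 = 30:1

print(f"Logical capacity: {logical_tb} TB")
print(f"Claimed dedupe ratio: {claimed_ratio:.0f}:1")
```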

For an average environment, it is very difficult to see such dedupe rates unless the data is fairly static, it is retained over an extended period of time, and you consider “logical capacity” a fair basis for measuring data reduction. The following table shows the impact of these dedupe ratios on a 100 TB environment.

 

Example of 100 TB of protected data:

| Dedupe Ratio | Percent Reduction | Actual Data Stored | Impact (Savings vs. Previous Ratio) |
|---|---|---|---|
| 10:1 | 90% | 10 TB | N/A |
| 20:1 | 95% | 5 TB | 5 TB |
| 30:1 | 96.67% | 3.33 TB | 1.67 TB |
| 50:1 | 98% | 2 TB | 1.33 TB |
| 99:1 | 99% | 1 TB | 1 TB |

 

As you can see from the example above, the difference between 20:1 and 99:1 dedupe for a 100 TB environment is only about 4 additional terabytes of backend capacity. The difference between 50:1 dedupe and 99:1 dedupe is only about 1 additional terabyte of backend capacity.
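
If you want to reproduce these diminishing returns yourself, the arithmetic is one line per ratio; this sketch simply recomputes the stored capacity, percent reduction, and incremental savings for the same assumed 100 TB of protected data:

```python
# Recompute the table above: backend capacity and incremental savings
# for 100 TB of protected data at increasing dedupe ratios.

protected_tb = 100
ratios = [10, 20, 30, 50, 99]

previous_stored = None
for ratio in ratios:
    stored = protected_tb / ratio            # backend capacity actually needed (TB)
    reduction = (1 - 1 / ratio) * 100        # percent reduction vs. no dedupe
    if previous_stored is None:
        savings = "N/A"
    else:
        savings = f"{previous_stored - stored:.2f} TB"
    print(f"{ratio:>3}:1  {reduction:6.2f}%  {stored:5.2f} TB stored  "
          f"{savings} saved vs. previous ratio")
    previous_stored = stored
```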

The point is that the incremental storage savings become very small as dedupe rates go up, and the actual math on paper looks much less attractive than what is often claimed in product data sheets and in the dedupe reports within product GUIs. Dedupe ratios, therefore, should not be the only deciding factor when choosing a dedupe solution.

When evaluating a dedupe solution, keep in mind that dedupe is simply a feature that helps reduce the cost of storage. And while many vendors offer dedupe with their storage products, the secret sauce is really the integration with the backup application. The more functionality you gain through the integration of your dedupe solution with your backup solution, the higher the value you receive from your dedupe investment. So, when evaluating a dedupe solution, don’t get duped by dedupe: think beyond the superficial dedupe ratios.

 

Published 13 years ago
Version 1.0