Dedupe ratio predictor

effiko
Level 4
Partner Certified

I have an idea: write a small application in C++ (so that it is portable between Unix and Windows) that traverses the backup file tree and computes an MD5 hash for each block, or for each whole file if it is smaller than a block. The block size would be a parameter.
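A minimal sketch of the block-hashing step described above, under some loud assumptions: the standard C++ library has no MD5, so a 64-bit FNV-1a hash stands in for it here (it is not cryptographic and collides far more easily than MD5, so a real tool would swap in an MD5 or SHA implementation from a crypto library). The names `fnv1a64`, `hash_blocks` and `hash_tree` are hypothetical, and the code assumes C++17 for `std::filesystem`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <istream>
#include <vector>

// Stand-in for MD5 (assumption): 64-bit FNV-1a, used only so that the
// sketch needs no external library. Not suitable for production use.
std::uint64_t fnv1a64(const char* data, std::size_t len) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (std::size_t i = 0; i < len; ++i) {
        h ^= static_cast<unsigned char>(data[i]);
        h *= 1099511628211ULL;                  // FNV prime
    }
    return h;
}

// Read the stream in fixed-size blocks and append one hash per block.
// A file smaller than block_size yields a single hash of the whole file;
// the last block of a larger file may be shorter than block_size.
void hash_blocks(std::istream& in, std::size_t block_size,
                 std::vector<std::uint64_t>& out) {
    assert(block_size > 0);
    std::vector<char> buf(block_size);
    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(block_size));
        const std::streamsize got = in.gcount();
        if (got <= 0) break;
        out.push_back(fnv1a64(buf.data(), static_cast<std::size_t>(got)));
    }
}

// Walk the tree under root and hash every regular file block by block.
// Unreadable files are silently skipped in this sketch.
std::vector<std::uint64_t> hash_tree(const std::filesystem::path& root,
                                     std::size_t block_size) {
    namespace fs = std::filesystem;
    std::vector<std::uint64_t> hashes;
    for (const auto& entry : fs::recursive_directory_iterator(
             root, fs::directory_options::skip_permission_denied)) {
        if (!entry.is_regular_file()) continue;
        std::ifstream file(entry.path(), std::ios::binary);
        if (file) hash_blocks(file, block_size, hashes);
    }
    return hashes;
}
```

Calling `hash_tree("/path/to/backup/source", 4096)` would return one hash per 4 KiB block; the path and block size are placeholders. Note that fixed-size blocks miss duplicates that are shifted by an insertion, so the result is probably a lower bound for engines that use variable-length chunking.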

An incremental run is not needed, since the tool only computes hashes; in version 2 it could be added using a portable, free database.

All the hashes would then be processed with either Excel or awk to produce a histogram, which should give some idea of the expected dedupe ratio.
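The histogram step could also be done in the same C++ program instead of Excel or awk. The sketch below is one possible reading of it, not a definitive method: it assumes each block hash has been reduced to a 64-bit value, counts how often each hash occurs, and estimates the ratio as total blocks divided by unique blocks. The names `DedupeEstimate` and `estimate_dedupe` are hypothetical, and the estimate ignores compression, metadata overhead and hash collisions.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <unordered_map>
#include <vector>

struct DedupeEstimate {
    std::size_t total_blocks = 0;   // blocks seen before dedupe
    std::size_t unique_blocks = 0;  // blocks that would actually be stored
    double ratio = 1.0;             // total_blocks / unique_blocks
    // Histogram: number of occurrences -> how many distinct hashes
    // occur exactly that many times.
    std::map<std::size_t, std::size_t> histogram;
};

DedupeEstimate estimate_dedupe(const std::vector<std::uint64_t>& hashes) {
    // Count occurrences of each distinct hash.
    std::unordered_map<std::uint64_t, std::size_t> counts;
    for (std::uint64_t h : hashes) ++counts[h];

    DedupeEstimate e;
    e.total_blocks = hashes.size();
    e.unique_blocks = counts.size();
    for (const auto& kv : counts) ++e.histogram[kv.second];
    if (e.unique_blocks > 0) {
        e.ratio = static_cast<double>(e.total_blocks) /
                  static_cast<double>(e.unique_blocks);
    }
    return e;
}
```

For example, six blocks with hashes 1, 2, 1, 3, 1, 2 contain three distinct blocks, so the estimated ratio is 6 / 3 = 2. This is a single-snapshot figure; real backup dedupe ratios also depend on how much data repeats across successive backups, which this sketch does not model.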

I wonder whether anybody has done this already, or whether you think something is wrong with this idea.

I'd appreciate your input.

 

3 REPLIES

RamNagalla
Moderator
Partner    VIP    Certified

effiko
Level 4
Partner Certified

Thanks Nagalla,

The post you mentioned covers post-backup analysis of images already in the DataDomain. What I have in mind is a program that predicts the dedupe ratio before any backup or dedupe engine is installed on the client's premises, more like a presales tool.

RamNagalla
Moderator
Partner    VIP    Certified

Good idea. It would really help to predict the storage requirements as well, at least to some level.