08-05-2013 06:11 AM
I have the idea to write a small application in C++ (so it is portable between Unix and Windows) that traverses the backup file tree and computes an MD5 hash for each file (if smaller than a block) or for each block. Block size shall be a parameter.
There is no need for incremental runs, since only the hashes are computed; in version 2 this could be added using some portable free database.
All the hash values shall then be processed by either Excel or awk to produce a histogram, which should give some idea of the expected dedupe ratio.
I wonder if anybody has done this already, or whether you think something is wrong with this idea.
I'd appreciate your input.
08-05-2013 08:51 AM
Hi, see the effort below on the dedupe ratio in Data Domain.
08-05-2013 10:55 PM
Thanks Nagalla,
The mentioned post is a post-backup analysis for images already in the DataDomain. What I have in mind is a program that predicts the dedupe ratio before installing any backup or dedupe engine on the client's premises, more like a presales tool.
08-05-2013 11:19 PM
Good idea... it really helps to predict the storage requirements as well, to some level.