08-05-2013 06:11 AM
I have the idea to write a small application in C++ (so it is portable between Unix and Windows) that traverses the backup file tree and computes an MD5 hash for each file (if smaller than a block) or for each block. Block size shall be a parameter.
There is no need for incremental runs, since only the hashes are computed; in version 2 this could be added using some portable free database.
All the hash values shall then be processed by either Excel or awk to produce a histogram, which should give some idea of the expected dedupe ratio.
I wonder if anybody has done this already, or whether you think something is wrong with this idea.
I'd appreciate your input.
08-05-2013 08:51 AM
Hi, see the effort below on the dedupe ratio in Data Domain.
08-05-2013 10:55 PM
Thanks Nagalla,
The mentioned post is a post-backup analysis for images already in the DataDomain. What I have in mind is a program that predicts the dedupe ratio before installing any backup or dedupe engine on the client's premises, more like a presales tool.
08-05-2013 11:19 PM
Good idea... it really helps to predict the storage requirements as well, to some level.