Deduplication - impacts of data that is already compressed or encrypted, or large numbers of tiny files.
Is this still a significant problem? I had a quick look at the doco on deduplication and there is no mention of it.
In the past, vendors including Veritas/Symantec warned that data that is already compressed and/or encrypted would result in poor deduplication performance, and the same goes for large numbers of tiny files. Compressed data includes stuff like audio files (mp3, wma, ogg etc), video files (avi, mkv, mp4 etc), image files (png, bmp, jpeg etc), PDFs and zipped files.
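To illustrate why already-compressed data tends to dedupe poorly, here is a rough sketch. It assumes a simple fixed-size chunker with SHA-256 fingerprints, which is not how any particular product works; the 4 KiB chunk size, the generated test data and the function names are all made up for the illustration. It changes one chunk's worth of data and compares how many chunks still match, first on the raw data and then on the zlib-compressed data.

```python
import hashlib
import random
import zlib

CHUNK = 4096  # hypothetical fixed chunk size, chosen only for illustration


def chunk_hashes(data: bytes) -> set:
    """Split data into fixed-size chunks and return the set of chunk fingerprints."""
    return {
        hashlib.sha256(data[i:i + CHUNK]).digest()
        for i in range(0, len(data), CHUNK)
    }


def shared_fraction(a: bytes, b: bytes) -> float:
    """Fraction of b's unique chunks that already exist in a (would dedupe)."""
    ha, hb = chunk_hashes(a), chunk_hashes(b)
    return len(ha & hb) / len(hb)


# Build about 1 MiB of compressible pseudo-text from a small made-up vocabulary.
rng = random.Random(0)
letters = "abcdefghijklmnopqrstuvwxyz"
words = ["".join(rng.choices(letters, k=rng.randint(3, 8))) for _ in range(500)]
original = " ".join(rng.choices(words, k=200_000)).encode()[:1024 * 1024]

# Second "backup": same data, but the first chunk is overwritten in place.
new_head = bytes(rng.choices(b"ABCDEFGHIJKLMNOPQRSTUVWXYZ", k=CHUNK))
modified = new_head + original[CHUNK:]

raw_shared = shared_fraction(original, modified)
comp_shared = shared_fraction(zlib.compress(original), zlib.compress(modified))

print(f"raw data:        {raw_shared:.1%} of chunks already stored")
print(f"compressed data: {comp_shared:.1%} of chunks already stored")
```

On the raw data only the overwritten chunk is new. After compression, the small change alters the compressed stream from that point onward, so far fewer chunks line up. Real dedupe engines typically use smarter, variable-size chunking, so treat this as a demonstration of the principle only.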
A while back I tested the rumour that gzip and pigz output is dedupe friendly when the --rsyncable option is used. I found it gave a significant improvement in dedupe performance on Commvault, and there was only a very small size difference between the files compressed with and without the --rsyncable option.
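For anyone wanting to repeat that test, a minimal sketch of producing the rsyncable files from a script is below. The helper names are made up; it assumes the installed gzip or pigz accepts --rsyncable (some older gzip builds do not, in which case the command fails and an exception is raised) and uses -c to write the compressed stream to stdout.

```python
import shutil
import subprocess


def rsyncable_cmd(tool: str, src: str) -> list:
    """Build a command that writes an rsyncable compressed copy of src to stdout."""
    return [tool, "--rsyncable", "-c", src]


def compress_rsyncable(src: str, dest: str, tool: str = "gzip") -> bool:
    """Compress src into dest; return False if the tool is not installed."""
    if shutil.which(tool) is None:
        return False
    with open(dest, "wb") as out:
        # check=True raises CalledProcessError if the tool rejects the option.
        subprocess.run(rsyncable_cmd(tool, src), stdout=out, check=True)
    return True
```

Calling compress_rsyncable("backup.tar", "backup.tar.gz", tool="pigz") would write the rsyncable copy; compressing the same file a second time without the option gives the size comparison mentioned above.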
The amount of data that can be deduped can have a big impact on what can be logically stored.
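As a back-of-the-envelope sketch of that impact: logical capacity is roughly physical capacity multiplied by the dedupe ratio. The pool size and ratios below are invented numbers for illustration, not measurements from any system.

```python
def logical_capacity_tb(physical_tb: float, dedupe_ratio: float) -> float:
    """Approximate logical data (TB) that fits in a pool at a given dedupe ratio.

    dedupe_ratio is logical size divided by stored size, e.g. 10.0 for 10:1.
    """
    if dedupe_ratio <= 0:
        raise ValueError("dedupe_ratio must be positive")
    return physical_tb * dedupe_ratio


# Hypothetical 100 TB pool at a few made-up ratios.
for ratio in (10.0, 4.0, 1.5):
    capacity = logical_capacity_tb(100, ratio)
    print(f"{ratio}:1 -> about {capacity:.0f} TB of logical data")
```

The gap between a workload that dedupes well and one that is mostly pre-compressed or encrypted shows up directly as how much the same pool can hold.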
Thanks, my concern was that the lack of documentation implied that things had changed, and I was unable to provide verification to people of the impacts of file size and of data that was already compressed and/or encrypted. It is something I have been telling people since the days of PureDisk, when it was DIY and came on DVD.
I eventually found a reference in the Backup Planning and Performance Tuning Guide, Release 8.3 and later. The most obvious place, the Deduplication Guide (Unix, Linux, Windows), Release 10, makes no mention of the Backup Planning and Performance Tuning Guide as a source of additional information. The dedupe guide only mentions performance being impacted by "small files".