De-Duplication function question

PCAU
Level 4

Hi

Can someone please explain the De-Duplication option to me? My understanding of it might not be correct.

The server being backed up, Server X, has 1TB of user data on it, and from running File Server Resource Manager reports I know there's approximately 100GB worth of duplicate files, as users like to keep their own copies of stuff everywhere.

Now my understanding of the De-Duplication option is that the data in the above scenario will be de-duplicated upon running the backup, so in a very basic calculation for simplicity (not taking into account any other duplicate blocks) the total backup size should be around 900GB.
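For reference, a file-level duplicate report like FSRM's can be roughed out by hashing file contents. This is just my illustration of the idea, not how FSRM or the BE dedupe option actually works (real dedupe operates on blocks, which is finer-grained than whole files):

```python
import hashlib
import os
from collections import defaultdict

def find_duplicate_savings(root):
    """Group files by content hash and estimate the bytes reclaimable
    if each set of identical files were stored only once."""
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                h = hashlib.sha256()
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(1 << 20), b""):
                        h.update(chunk)
                by_hash[h.hexdigest()].append(path)
            except OSError:
                continue  # skip unreadable files
    savings = 0
    for paths in by_hash.values():
        if len(paths) > 1:
            size = os.path.getsize(paths[0])
            savings += size * (len(paths) - 1)  # keep one copy, drop the rest
    return savings

# e.g. 1TB of user data with ~100GB of duplicates suggests a
# first-pass backup of roughly 1TB - 100GB = ~900GB.
```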

The second night's backup will then run, not only de-duplicating the data within the file structure as described above but also comparing the previous night's backup to the current one.

This is why it can take several backup cycles for the De-Dup ratio to increase and the actual savings to become apparent.
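To make my mental model concrete, here's a toy block-level sketch with a chunk store that persists across backup cycles. The fixed chunk size and the in-memory store are made up for illustration, not how Backup Exec implements it:

```python
import hashlib

CHUNK = 64 * 1024  # toy fixed-size chunks; real dedupe is usually variable-block

store = set()  # chunk store that persists across backup cycles

def backup(data: bytes) -> int:
    """Return the bytes actually written to the store for this cycle."""
    written = 0
    for i in range(0, len(data), CHUNK):
        digest = hashlib.sha256(data[i:i + CHUNK]).digest()
        if digest not in store:
            store.add(digest)
            written += CHUNK
    return written

night1 = b"A" * CHUNK * 8 + b"B" * CHUNK * 2   # 10 chunks, many identical
night2 = night1 + b"C" * CHUNK                 # mostly unchanged data

print(backup(night1))  # cycle 1: only the unique chunks (A, B) are stored
print(backup(night2))  # cycle 2: only the new C chunk is stored
```

In this toy, night 1 stores only the unique chunks, and night 2 stores only the chunk that didn't exist the night before, which is the behaviour I've described above.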

Is this correct in its very basic form?

In the testing I've been doing, I'm seeing very poor de-duplication ratios on, say, the first 2 backup cycles. This is well documented as being normal. I am, however, surprised that a traditional backup with compression enabled still achieves a much better ratio over the first 2 nights of backups, which is why I'm beginning to think that the user data isn't de-duplicated at all, and that it is only the backup cycles as a whole that are compared and de-duplicated against previous backup cycles?

Hope that makes sense.

1 REPLY

teiva-boy
Level 6

There are a few things I would try here, and perhaps some confusion I've seen before that's worth verifying too...

 

1.  On server X, what is the data you are backing up?  Multimedia files of any type do NOT dedupe, nor do AutoCAD files in many cases, though you will get SIS (single-instance storage) of them over subsequent backups.  These include mpg, mp3, tiff, jpg, etc.  Anything audio/visual/image just doesn't dedupe.

2.  Try enabling compression in the pd.conf file on server X: change the value from 0 to 1 (see the pd.conf sketch after this list).

3.  Right-click the dedupe folder location and note its size after backup #1, then run another backup: did the folder grow by 2x?  If so, something is amiss.  Much like tape compression reading low, the overall dedupe ratio reported when you highlight the dedupe folder device in BE doesn't seem to match the folder growth seen via Windows Explorer.  I think something is graphically wrong in the GUI, as my folder actually doesn't grow all that much, let alone at the 2-4:1 ratios I'm seeing!  (A measurement sketch follows this list.)

4.  Are you doing client-side dedupe or media server dedupe?  If client-side, the job log should show the amount of data actually sent in KB; this is the data that changed since the last backup.  It should be a much smaller value than the "Scanned" value in the job log.  (The sketch after this list also shows the ratio calculation.)

5.  I've seen much better performance and stability in BE2010 R2 over BE2010.  Dedupe rates still vary widely from run to run with no consistency, but at least the darn thing is stable!  
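For #2, this is the kind of edit I mean. I'm assuming the PureDisk-style COMPRESSION key and a typical install path here, so double-check both against your actual pd.conf (and back the file up) before touching it:

```python
from pathlib import Path

# Hypothetical path -- locate the real pd.conf for your dedupe agent install.
PD_CONF = Path(r"C:\Program Files\Symantec\Backup Exec\pd.conf")

text = PD_CONF.read_text()
# Assumes the setting reads "COMPRESSION = 0"; verify the exact key name first.
PD_CONF.write_text(text.replace("COMPRESSION = 0", "COMPRESSION = 1"))
```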
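For #3 and #4, rather than eyeballing Explorer, you can total the dedupe folder yourself and work out the effective ratio from the job log numbers. Rough sketch; the folder path and the log values are stand-ins you'd fill in by hand:

```python
import os

def folder_bytes(root):
    """Recursively total file sizes, i.e. what Explorer reports for the folder."""
    total = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # file vanished or is locked; skip it
    return total

before = folder_bytes(r"D:\BEDedupeStorage")  # measure after backup #1
# ... run backup #2 here, then measure again ...
after = folder_bytes(r"D:\BEDedupeStorage")
print("folder grew by %.2f GiB" % ((after - before) / 2**30))

# From the job log (copied by hand): total scanned vs actually sent.
scanned_kb, sent_kb = 1048576, 52429  # example numbers, not real BE output
print("effective dedupe ratio %.1f:1" % (scanned_kb / sent_kb))
```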

 

For home directories I'm seeing anywhere from 200:1 to 600:1 ratios, SQL only about 2:1, and VMware about 200:1-300:1 lately.  Averaged out according to the reports, it claims only a 4:1 reduction, which I think is wrong from a reporting perspective.
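To be fair, if the overall figure is bytes-weighted rather than a simple average of the per-job ratios, the math alone could produce something like 4:1 when a big SQL job sits at 2:1. Made-up numbers to show the effect:

```python
# (protected_GB, dedupe_ratio) per job type -- illustrative figures only
jobs = [(500, 400.0),   # home directories
        (800, 2.0),     # SQL
        (300, 250.0)]   # VMware

protected = sum(gb for gb, _ in jobs)
stored = sum(gb / ratio for gb, ratio in jobs)
print("simple average of ratios: %.0f:1" % (sum(r for _, r in jobs) / len(jobs)))
print("bytes-weighted overall:   %.1f:1" % (protected / stored))
# The big low-ratio SQL job dominates the stored bytes, dragging the
# bytes-weighted overall figure down to roughly 4:1.
```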