Source Dedup doesn't work after database re-index

Rafael_DAuria
Level 4

Hey Guys,

I am having a problem in BE2010 while backing up a SQL 2008 database (40GB) with source dedup over a 4Mbps WAN connection.

MON - First backup took a lot of time (say 48 hours), but that's OK

WED to FRI - Backups took about 20min, as source dedup was working smoothly

SAT - Re-index SQL job was run on the database, to re-organize database indices and pages

SUN - Backup took 30 hours

Have you guys seen this behavior? I don't think it should happen, as the re-index in fact doesn't change a lot of data in the database.

Thoughts?

Thanks a lot in advance!

1 REPLY

teiva-boy
Level 6

The reindex, while not changing the data within the database, does change the blocks the data is written to. The segments therefore change, new segment hashes are created, and the deduped data from previous passes is basically not useful anymore.
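For a sense of scale, here is a rough back-of-the-envelope calculation using the figures from the original post (40GB database, 4Mbps link). It assumes decimal units, a fully saturated link, and no compression or protocol overhead, so treat the numbers as a best case rather than a measurement. The function name transfer_hours is made up for illustration.

```python
# Figures taken from the original post; decimal units are an assumption.
DB_SIZE_BYTES = 40 * 10**9          # 40GB database
LINK_BITS_PER_SECOND = 4 * 10**6    # 4Mbps WAN link


def transfer_hours(size_bytes: int, bits_per_second: int) -> float:
    """Best-case hours to push size_bytes over a saturated link, ignoring overhead."""
    return size_bytes * 8 / bits_per_second / 3600


# Time to send the whole database once with no dedup savings at all.
full_copy_hours = transfer_hours(DB_SIZE_BYTES, LINK_BITS_PER_SECOND)

# Most data the link could carry during the 30-hour Sunday backup.
max_gb_in_30_hours = LINK_BITS_PER_SECOND * 30 * 3600 / 8 / 10**9

print(f"Full 40GB copy: {full_copy_hours:.1f} hours")           # 22.2 hours
print(f"Ceiling for a 30-hour run: {max_gb_in_30_hours:.1f}GB")  # 54.0GB
```

So even in the best case a full copy needs roughly 22 hours, which puts the 30-hour Sunday run in the range of resending most of the database rather than a 20-minute deduped pass.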

The BE dedupe engine breaks data into 64KB segments (I think; it's been a while since I verified). It doesn't matter whether the file is 1MB or 1GB in size: it's broken into smaller fixed-size segments. Since you reindexed the data, the data in each cluster on the disk is reallocated, so the segments that get analyzed change as well, negating all previously stored segment hash values.
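To make the effect concrete, here is a minimal Python sketch. It is not BE's actual engine: it assumes fixed 64KB segments (the unverified figure above), uses SHA-256 as a stand-in segment hash, and models a reindex as nothing more than moving 8KB SQL Server pages to new positions in the file. The names segment_hashes and new_segment_count are hypothetical.

```python
import hashlib
import random

PAGE_SIZE = 8 * 1024       # SQL Server data pages are 8KB
SEGMENT_SIZE = 64 * 1024   # segment size guessed above, unverified
PAGE_COUNT = 512           # toy "database" of 4MB


def segment_hashes(data: bytes, segment_size: int = SEGMENT_SIZE) -> list:
    """Split data into fixed-size segments and hash each one."""
    return [
        hashlib.sha256(data[offset:offset + segment_size]).hexdigest()
        for offset in range(0, len(data), segment_size)
    ]


def new_segment_count(data: bytes, known_hashes: set) -> int:
    """Count segments whose hash is not in the store, i.e. segments that must be sent."""
    return sum(1 for digest in segment_hashes(data) if digest not in known_hashes)


# Build a file of unique pages: each page is its own index repeated.
pages = [i.to_bytes(4, "big") * (PAGE_SIZE // 4) for i in range(PAGE_COUNT)]
original = b"".join(pages)

# First backup: every segment hash goes into the dedup store.
store = set(segment_hashes(original))
total = len(segment_hashes(original))

# Backup with no changes: nothing new to send.
unchanged_new = new_segment_count(original, store)

# Simulated reindex: same pages, same content, different positions.
shuffled = pages[:]
random.Random(0).shuffle(shuffled)
reindexed = b"".join(shuffled)
reindexed_new = new_segment_count(reindexed, store)

print(f"segments in file:          {total}")
print(f"new after no change:       {unchanged_new}")
print(f"new after simulated reindex: {reindexed_new}")
```

Every byte of page content is still in the file after the shuffle, but because each 64KB segment now holds a different combination of pages, practically none of the old hashes match and the backup has to send nearly everything again.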

 

Is there a way around this? Perhaps. It's up to the product to evolve better application stream handlers that actually peer into the file and see what kind of data it is, rather than just treating it as a flat file. It may also simply be that maintenance windows on servers affect dedupe/backup performance, and this is the way it is...

 

e.g. I had SAN replication set up between two states, with lots of SQL data being replicated. Reports were running slow, so the DBA reindexed the database for the reporting server. What was only a 40GB database caused some 700GB of block-level changes to the SAN LUNs, and that is what had to be replicated! Needless to say, we had to re-evaluate SAN replication on a per-application basis. Simple tasks like eseutil, reindexing, and defrags would all affect replication.