
Deduplication with Duplicate Job

Ldoodle
Level 5

Hiya,

We are testing the deduplication option in 2010 R3, and our long-term goal is to duplicate backup sets from one site to another (a remote office rather than a data center).

I was just wondering whether the duplicate job would copy the full size of the backup data as reported by Backup Exec (our file server, for example, is ~200GB), or whether it would only duplicate the deduplicated data size, say the 300MB per day increase which is about what we're running at?

Also, is the deduplicated data a complete set in its own right? I'm thinking in terms of restore: I guess I would need the actual first 'full' (deduplicated) backup plus the last deduplicated backup. So effectively I'm getting the benefit of differential backups in terms of backup sizes along with the benefit of full backups in terms of restore?

Oh, one thing though: our deduplication backups are taking the same time as a full backup. Is this correct? I would expect a deduplicated backup to take only slightly longer than a differential.

1 ACCEPTED SOLUTION


Kiran_Bandi
Level 6
Partner Accredited

Let's say your first day's dedup job backed up 2 GB of data. The associated duplicate job will duplicate that 2 GB to the target dedup storage folder. If the second day's dedup job backs up only 20 MB of new data, the duplicate job will duplicate only that 20 MB to the target dedup storage folder.

Regards....


10 REPLIES

AmolB
Moderator
Employee Accredited Certified

Deduplication is meant to save space, not time. During deduplication a lot of processing happens in the background, hence the job takes longer compared to a plain disk backup.

When you duplicate deduplicated data, it is rehydrated to its original form, so the size of the duplicate data will be the same as that of the original data.

Kiran_Bandi
Level 6
Partner Accredited

1) Are you speaking about optimized deduplication, i.e. duplicating backup data between the same kind of devices (OpenStorage devices or dedup storage folders)? In that case only unique data will be copied between the devices.

But if you duplicate a deduplicated backup to tape, the data will be rehydrated before being written to tape.

Refer: http://www.symantec.com/docs/HOWTO51853

2) You restore from deduplicated backup sets just like you would with normal restore jobs.

Regards...

Ldoodle
Level 5

The plan is to have a BE 2010 R3 server in each office, with their own deduplication storage folder. Each server will be aware of each other's deduplication storage folder and the duplicate backup job will use the remote deduplication folder as the target device.

I was hoping to be able to duplicate only the changes, as we only have 10Mb between offices, so duplicating over 500GB of data each night is obviously not doable!

If this is not how it works, how can I achieve this without doing differential backups? It seems a bit odd for a duplicate job not to duplicate only the 'different' data.
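The 10Mb constraint can be checked with some quick arithmetic. This is just a back-of-envelope sketch in Python, assuming the ~500GB full set mentioned here and the ~300MB daily change mentioned earlier in the thread; real throughput would be lower still once protocol overhead is counted.

```python
# Back-of-envelope: nightly duplication over a 10 Mb/s WAN link.
link_bps = 10 * 1_000_000 / 8        # 10 Mb/s expressed in bytes per second
full_set = 500 * 1024**3             # ~500 GB full data set (assumed)
daily_delta = 300 * 1024**2          # ~300 MB daily deduplicated change (assumed)

hours_full = full_set / link_bps / 3600    # rehydrated duplicate: days, not hours
hours_delta = daily_delta / link_bps / 3600  # unique-data-only duplicate: minutes

print(f"Full set:    {hours_full:.1f} hours")
print(f"Daily delta: {hours_delta * 60:.1f} minutes")
```

At these figures the full set would take roughly five days to cross the link, while the daily delta fits in a few minutes, which is why optimized (deduplicated) duplication is the only workable option here.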

Kiran_Bandi
Level 6
Partner Accredited

Let's say your first day's dedup job backed up 2 GB of data. The associated duplicate job will duplicate that 2 GB to the target dedup storage folder. If the second day's dedup job backs up only 20 MB of new data, the duplicate job will duplicate only that 20 MB to the target dedup storage folder.
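That 2 GB / 20 MB behaviour can be sketched as a hash-indexed transfer: only chunks the target side doesn't already hold cross the wire. This is a minimal illustration of the idea, not Backup Exec's actual protocol; the fixed chunking and SHA-256 index are assumptions for the sketch.

```python
import hashlib

def optimized_duplicate(source_chunks, target_index):
    """Send only the chunks the target side doesn't already hold.

    target_index maps chunk hash -> chunk, standing in for the
    remote dedup storage folder's database. Returns bytes 'sent'.
    """
    sent = 0
    for chunk in source_chunks:
        key = hashlib.sha256(chunk).hexdigest()
        if key not in target_index:
            target_index[key] = chunk  # unique chunk: transfer it
            sent += len(chunk)
        # chunk already at target: nothing crosses the wire
    return sent

target = {}
day1 = [b"base" * 1024]                       # 4 KB of new data on day one
sent_day1 = optimized_duplicate(day1, target)
# Day two: same data plus a small change; only the change is sent.
sent_day2 = optimized_duplicate(day1 + [b"new" * 1024], target)
```

On day one everything is sent; on day two the unchanged chunk is recognised by its hash and skipped, so only the new 3 KB chunk is transferred.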

Regards....

Ldoodle
Level 5

Thanks Kiran, that is how I was expecting it to work!

Can you confirm how this is set up? I've just tried to add another dedupe storage folder (on a remote server) but it said only one dedupe folder can exist.

I was thinking it would be the same as a remote B2D folder, where you just select a network share on a remote computer?

Kiran_Bandi
Level 6
Partner Accredited

Yes, only one dedup storage folder per media server is allowed.

To configure duplication between dedup storage folders you need CASO. You can then copy data from the dedup storage folder on one managed media server to the dedup storage folder on another managed media server.

Regards....

robnicholson
Level 6

Deduplication is meant to save space and not time

I thought one of the selling points of de-dupe was reduced backup windows, as far less data is being written to the backup store? I agree that space saving is its major selling point.

People say "lots of processing is going on" with de-dupe, but I'd love to know exactly what this extra processing is. The file still has to be read from disk no matter what method is used. De-dupe takes a hash of each 64k block and then does a database lookup using that hash as an index. Where is the bottleneck? Is the hash algorithm slower than writing the 64k blocks to backup disk? Or is the index lookup slow? Or a combination of both?
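The hash-and-lookup scheme described here can be sketched in a few lines. This is a toy model, assuming SHA-256 and an in-memory dict standing in for whatever hash algorithm and database the product actually uses:

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # the 64k block size described above

def dedup_store(data: bytes, index: dict) -> int:
    """Split data into fixed 64 KB blocks and store only blocks whose
    hash is not already in the index. Returns bytes actually stored."""
    stored = 0
    for off in range(0, len(data), CHUNK_SIZE):
        chunk = data[off:off + CHUNK_SIZE]
        key = hashlib.sha256(chunk).hexdigest()  # hash is the index key
        if key not in index:
            index[key] = chunk   # unique block: write to the store
            stored += len(chunk)
        # duplicate block: only a reference is kept, nothing is written
    return stored

index = {}
first = dedup_store(b"A" * CHUNK_SIZE * 3, index)   # three identical blocks
second = dedup_store(b"A" * CHUNK_SIZE * 3, index)  # the same data again
```

The first pass stores one unique 64 KB block (the other two are duplicates of it); the second pass stores nothing. In this model the per-block cost is one hash plus one index lookup, which is where any extra processing time would have to come from.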

In my tests, client-side de-duplication was slower than server-side (by about 20% in some cases), which was perplexing: I cannot understand how calculating the 64k block hashes locally and then looking up each hash over the LAN (in the Perfect Disk database) was slower than transferring the entire file across a 1GBit/s network and letting the media server do the same thing. It just didn't add up. But I haven't had time to do further tests when the network is quiet.

Cheers, Rob.

Ldoodle
Level 5

By "You can also copy data from dedup storage folder on a managed media server to dedup storage folder on another managed media server" I guess you mean a manual copy?

If you 'fudge' it like this, the 2 media servers won't be aware of each other, which I wouldn't really want. I'm guessing the CAS option also 'mirrors' catalogs etc., so if the primary site blows up straight after a backup and duplicate job, you can do an instant restore at the remote site as if the data had actually been backed up there?

Kiran_Bandi
Level 6
Partner Accredited

Managed media servers are only part of CASO. Copy doesn't mean manual copy; duplication only.

Ldoodle
Level 5

Hi Kiran

My duplicate jobs seem to be duplicating the full whack of data each time (i.e. the same byte count as the backup job itself). I am testing this with both servers in the same physical site connected via a 2Gbps NIC team, and the duplicate jobs are taking nearly 4 hours to complete.

So I'd imagine that when the second server moves to the remote site, speed will pretty much collapse over the 10Mbps WAN connection.

UPDATE: I am duplicating data from the CAS to the MMS. No actual backup jobs will exist on the MMS; it is literally just there to store the backup data redundantly. I'm reading the Optimized Duplication white paper and it seems to refer to backup jobs on the MMS being duplicated to the CAS (i.e. the opposite of what I am doing). Have I made a boo boo, or can optimized duplication be done in either direction?