Pre stage data for slow WAN deduplication

mickelingon · 05-11-2010

Hi

I wonder whether you can pre-stage data for deduplication over a slow WAN connection.
I mean, you don't want to run the initial full backup over the WAN just to get the dedup backup jobs started.
Can you do this by transferring the data on a USB disk to the onsite network where the backup environment is located, and then run the dedup jobs over the WAN so that only changed blocks are backed up?

Hope the question is not too fuzzy :-)

Mike

Colin_Weaver · 05-11-2010

Yes, you can (although I have not tried it).

You can seed the data by copying most of it to a USB disk and then backing up that data to the same dedup target that you intend to use for your server backup.

The other way to seed the data applies if you build your servers from images anyway: back up a local server that is based on the same image, and then back up the remote server afterwards.

If you image the server to USB, using say BESR or Ghost, be aware that to seed it Backup Exec will need to see the original file system, not the image container files that contain the file system, so you will have to extract the data first if you use that sort of technology.

mickelingon · 05-11-2010

Thanks, I'll try it

Mike

rcdauria · 09-12-2010

Hey guys,

Has anyone achieved this goal?

I've read (I don't remember where) that BE Dedup is based on the source of the data, so the strategy suggested by Colin wouldn't work.

For example, if I have a 100MB file on one server, backed up via source dedup, and then I try to back up the same 100MB file on a second server, it wouldn't transmit only the unique blocks, as they come from different sources/servers.

I imagine GLOBAL DEDUP (the opposite of the example) would happen only in PureDisk (full version).

Any thoughts? I basically have a 50GB database that I'm trying to back up over a 4Mbps WAN (the first backup would run for 70hrs, too long for an unstable link).
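
As a rough check on that estimate, here is a small sketch of the arithmetic. The 40% efficiency figure is only an assumption chosen for illustration (protocol overhead, contention, retries); it is not a measured value for this link.

```python
def transfer_hours(size_gb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move size_gb (decimal gigabytes) over a link_mbps link."""
    size_megabits = size_gb * 1000 * 8            # GB -> megabits
    seconds = size_megabits / (link_mbps * efficiency)
    return seconds / 3600

# 50GB database over a 4Mbps WAN
ideal = transfer_hours(50, 4)                     # link fully used
degraded = transfer_hours(50, 4, efficiency=0.4)  # assumed 40% effective throughput

print(f"at line rate: {ideal:.1f} h, at 40% efficiency: {degraded:.1f} h")
```

Even at full line rate the first full backup needs more than a day, and an effective throughput around 40% of the nominal 4Mbps lands close to the 70hrs mentioned above.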

Thanks a lot in advance!

teiva-boy · 09-12-2010

Rafael,

With client-side dedupe, the client scans all the data to be backed up, breaks it up into segments, and creates an MD5(-like) hash for each segment. These hashes are then sent to the BE server that has the dedupe option installed and configured.

The BE server then tells the client which segments it already has and which ones to send.

So Colin's scenario would work.
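
The exchange described above can be sketched roughly as follows. This is only a toy model, not Backup Exec code: `DedupServer`, `segment_hashes`, `backup` and the fixed 128KB segment size are all hypothetical names and values picked for illustration, and plain MD5 stands in for the "MD5-like" hash.

```python
import hashlib

SEGMENT_SIZE = 128 * 1024  # assumed segment size, for illustration only


def segment_hashes(data: bytes, size: int = SEGMENT_SIZE) -> dict:
    """Split data into fixed-size segments and key each one by its MD5 digest."""
    return {
        hashlib.md5(data[i:i + size]).hexdigest(): data[i:i + size]
        for i in range(0, len(data), size)
    }


class DedupServer:
    """Toy stand-in for the media server's dedup store (hypothetical)."""

    def __init__(self) -> None:
        self.store = {}  # hash -> segment bytes

    def missing(self, hashes) -> set:
        """Tell the client which of the offered hashes are not stored yet."""
        return {h for h in hashes if h not in self.store}

    def receive(self, segments: dict) -> None:
        self.store.update(segments)


def backup(server: DedupServer, data: bytes) -> int:
    """Back up data and return the number of payload bytes actually sent."""
    segments = segment_hashes(data)
    wanted = server.missing(segments)              # only hashes cross the wire here
    to_send = {h: segments[h] for h in wanted}
    server.receive(to_send)                        # only unknown segments are sent
    return sum(len(s) for s in to_send.values())


server = DedupServer()
seed = b"".join(bytes([i]) * SEGMENT_SIZE for i in range(4))  # data seeded locally
changed = seed[:3 * SEGMENT_SIZE] + b"\xff" * SEGMENT_SIZE    # remote copy, one segment differs

first = backup(server, seed)      # every segment is new
second = backup(server, changed)  # only the changed segment is sent
print(first, second)
```

In this model the second backup sends a single segment because the server's hash store is shared by every client, which is the assumption Colin's seeding scenario depends on.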

rcdauria · 09-13-2010

Hi Teiva,

Thanks for your reply. Actually what we've seen is that the job is copying all the data again. Here's my scenario (detailed):

-ServerA: BE2010 R2 Media Server + Dedup Folder
-ServerB: SQL 2008 R2 (50GB database)
-ServerC: SQL 2008 R2 (50GB almost identical database; there is a one-day gap, but the database had very few updates).

ServerA and ServerB are close together (1Gbps connection), and ServerC is far away (4Mbps connection).

My steps:

1) Backed up the ServerB database to the dedup device: total time = 2hrs, dedup ratio of 1.1:1 (first copy).
2) Backed up the ServerB database again: total time = 10mins, dedup ratio of 314:1 (so dedup is working, and direct access too).
3) Backed up the ServerC database: copied 2GB in 3hrs, so source dedup is not working.

I cancelled the job to check whether it was working with RADA, and it was (the warning stating that source dedup was not used didn't appear). If I block port 10000 and re-run the job, it shows me the warning.

Another curious fact: if I run the job again, it goes fast up to the 2GB mark (1GB/min) and then slows down again. So we can conclude that the 2GB copied from ServerC (first job) was in fact stored, but the rest of the data is not being compared with the MD5 hashes generated from the ServerB backup.

Any ideas?

Thanks a lot for your help, and sorry for the bad English. :)

Rafael

rcdauria · 09-13-2010

Guys,

I've run another test with a file, just to be sure that this wasn't behavior caused by backing up a SQL database.

1) Copied a 100MB file to the local server (ServerB) and the remote server (ServerC).
2) Ran a backup of ServerB (40 seconds).
3) Ran a backup of ServerC with source dedup (5 minutes). Source dedup didn't compare the blocks with the data from the other server.
4) Ran a backup of ServerC with source dedup again (20 seconds). Source dedup DID work, but the data was clearly compared with the hashes from ServerC (step 3) and not ServerB (step 2).

In short: no source dedup across different machines.
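
One way to picture this result is a hash index that is scoped per source machine rather than shared. The sketch below is purely a hypothetical model that reproduces the pattern seen in the test; `ScopedIndex` is an invented name and nothing here describes how Backup Exec is actually implemented.

```python
import hashlib


def digest(segment: bytes) -> str:
    return hashlib.md5(segment).hexdigest()


class ScopedIndex:
    """Hypothetical hash index; per_source=True keeps one hash set per machine."""

    def __init__(self, per_source: bool) -> None:
        self.per_source = per_source
        self.known = {}  # scope name -> set of known hashes

    def backup(self, source: str, segments: list) -> int:
        """Return how many segments had to cross the wire."""
        scope = source if self.per_source else "*"
        seen = self.known.setdefault(scope, set())
        sent = 0
        for seg in segments:
            h = digest(seg)
            if h not in seen:
                seen.add(h)
                sent += 1
        return sent


# The same 100 distinct segments exist on both servers.
segments = [bytes([i]) * 1024 for i in range(100)]

results = {}
for label, per_source in (("per-source", True), ("global", False)):
    index = ScopedIndex(per_source)
    results[label] = [
        index.backup("ServerB", segments),  # step 2
        index.backup("ServerC", segments),  # step 3
        index.backup("ServerC", segments),  # step 4
    ]
    print(label, results[label])
# prints: per-source [100, 100, 0]
# prints: global [100, 0, 0]
```

The per-source run matches the timings in steps 2 to 4 (slow, slow, fast), while a global index would have made step 3 fast as well.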

Rob_peters · 12-09-2010

I am looking to set up a dedup server on my network, where the media server will back up files from 10 different servers in remote locations, all over a WAN connection. One of them is an Exchange server with 300GB of databases, another server holds SQL databases, and the rest are file servers.

I understand about pre-staging the file servers on a USB drive from the source, using an identical file structure, and backing that up onto the destination media server using dedup. But how can I pre-stage the Exchange and SQL databases? Do these databases also get pre-staged on a USB drive using the same directory structure as on the source Exchange server? Or does it not work the same way for live Exchange databases? Likewise, will the SQL databases need to be backed up to a USB drive with the same directory structure?

Thanks, this post has been very helpful.

Rob
