Deduplication Bandwidth requirement

Hi, we are still in the trial phase, trying to evaluate whether the deduplication option would be useful or not. From what I read, since we have remote offices, it could be really useful for saving bandwidth.

We created a deduplication drive in BE15 and did some tests. We created 4x 8 GB files that we backed up once; the deduplication drive now holds 32 GB of data. We then copied the 4 files to another server and tried to back them up. Miracle: the backup phase was very fast, practically no bandwidth was required since the client-side deduplication worked like a charm, and the deduplication drive stayed at 32 GB since the files were already there.

So this seemed amazing... But during the verify phase, bandwidth was used between the BE15 server and the server being backed up. It capped our 60 MBPS connection, but in upload (traffic going from our BE15 server TO the server being backed up)... Is this normal behavior? I don't see the point of saving the download if we need to upload everything back to the server for verification.

Both servers are Windows 2012 R2 with the BE agent on them. I guess I configured something wrong but can't find anything. Any help would be appreciated. Thanks

10 Replies

Re: Deduplication Bandwidth requirement

Hi,

 

Never copy files... duplicate them instead; otherwise Backup Exec isn't aware of the details within them on the remote site, rendering them unrestorable.

Duplicate the files correctly and then try the Verify again and see what the results are.

Thanks!


Re: Deduplication Bandwidth requirement

Hi,

Thanks for the reply, but I'm not sure I follow what you mean?

Just to clarify: we had 4 files, "A", "B", "C" and "D", which were 8 GB each (so 32 GB total). These files were on server X.

We have BE15 with the deduplication disk on a server named BE15.

We backed up the files from server X to server "BE15".

We then copied the 4 files from server "X" to server "Y" and backed them up from the new server.

As mentioned, all 4 files were backed up very fast on server "Y", since the files didn't have to be transferred over the network to server BE15.

But during the verify phase, there was high traffic going from server BE15 to server "Y".

Hope that's clearer.

Based on this, I don't understand your comment about never copying files vs. duplicating them. Could you clarify?

Thanks in advance

 


Re: Deduplication Bandwidth requirement

Are you doing server-side or client-side dedup?


Re: Deduplication Bandwidth requirement

We're trying to do client-side dedup.

It seems to work fine, as the 32 GB was not transferred between server "Y" and server "BE15" in my previous example. So the client realized the files were already on the deduplication device and didn't transfer them to the BE15 server...

What's strange is that during the verify of the job, 32 GB were transferred from BE15 to server "Y", which makes no sense to me.


I believe that transfer is

I believe that transfer is due to the verification. You can confirm this by not verifying the backup.


Re: Deduplication Bandwidth requirement

Hi,

I confirm that it is only due to verification.

But my question is: is this normal? Aren't people verifying their backups in the real world?

So deduplication is cool because you save on disk space and bandwidth during the backup phase.

But you have to upload everything back during the verify phase... If this is by design, it's totally ridiculous (beyond the disk space savings, unless you decide to turn off verification).

I feel like it's nonsense (or a bug) if it's by design.

Client-side or server-side dedup takes the same amount of disk space (which is normal).

Server-side eats up all the bandwidth before deduplication happens.

Client-side eats up all the bandwidth after deduplication, for the verification...

I don't see the point unless you sacrifice verification.

I'm still hoping it's something I can fix with a setting in BE15 that I got wrong. Otherwise I certainly won't purchase this option.

Thanks


Re: Deduplication Bandwidth requirement

You can delay the verification until the demand for bandwidth is lower.

Re: Deduplication Bandwidth requirement


Thanks for the reply, but this just confirms that deduplication is useless in our scenario.

I don't understand why the verify is not a simple MD5 computed on both sides instead of transferring the whole backup between the two machines... Bad implementation, in my opinion.
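Just to illustrate what I mean (a rough sketch in Python, obviously not how Backup Exec actually implements anything): each machine would hash its own copy locally, and only the small digests would need to cross the wire.

```python
import hashlib

def local_digest(data: bytes) -> str:
    """Each side computes this independently over its own copy of the file.
    Only the 32-hex-character MD5 digest would cross the network."""
    return hashlib.md5(data).hexdigest()

# Hypothetical stand-ins for the same file held on the media server
# and on the client being verified.
server_copy = b"sample backup data" * 1000
client_copy = b"sample backup data" * 1000

# Verification = comparing two short digests, not re-sending the data.
assert local_digest(server_copy) == local_digest(client_copy)
```

The digest is a fixed 16 bytes regardless of file size, so the verify traffic would be negligible compared with re-sending the 32 GB.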

It could be so much more useful if the verification weren't written this way.

Anyway, thanks everybody for your help on the subject. I really appreciate the community's help during the evaluation phase of the product.

 

Accepted Solution!

OK so what actually happens

OK, so what actually happens is that we do an MD5 (or similar checksum) against the backed-up data. Because the job has client-side dedup enabled, the process responsible for doing the checksum comparison is the beremote process on the remote server; hence the beremote process pulls all of the data (not just the changed/new chunks) across the network and rehydrates it to do the checksum calculation. Note: even for non-client-side dedup, the beremote process on the Backup Exec server would still have to rehydrate the files in memory to do the checksum comparison, so a verify is resource intensive either way.
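To make the distinction concrete, here is a rough sketch (plain Python, with a hypothetical in-memory chunk store; this illustrates the principle only, not Backup Exec's actual code). A whole-stream checksum forces every chunk to be rehydrated, even the deduplicated ones that never had to cross the wire during backup.

```python
import hashlib

CHUNK_SIZE = 4  # toy chunk size; real dedup engines use far larger chunks

def split_chunks(data: bytes):
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]

def backup(data: bytes, store: dict) -> list:
    """Client-side dedup: only chunks the store hasn't seen cross the wire."""
    fingerprints = []
    for chunk in split_chunks(data):
        fp = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fp, chunk)   # unseen chunks get transferred/stored
        fingerprints.append(fp)
    return fingerprints

def verify_whole_stream(fingerprints: list, store: dict) -> str:
    """Verify as described above: rehydrate EVERY chunk to compute one
    checksum over the full stream -- so all the bytes must move again."""
    md5 = hashlib.md5()
    for fp in fingerprints:
        md5.update(store[fp])         # full rehydration, chunk by chunk
    return md5.hexdigest()

data = b"ABCDABCDEFGH"
store = {}
fps = backup(data, store)             # duplicate "ABCD" chunk stored once
# The whole-stream verify reads back all 12 bytes, even though the
# dedup store only holds (and only ever transferred) 2 unique chunks.
assert verify_whole_stream(fps, store) == hashlib.md5(data).hexdigest()
```

Comparing the per-chunk fingerprints instead would avoid the rehydration, but it would verify the chunk store rather than the reassembled stream, which is what the MD5-style verify is designed to check.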

 

We have a Blueprint (best practices doc) for dedup available here:

https://www-secure.symantec.com/connect/articles/backup-exec-2014-blueprint-deduplication

If you open the PDF in the above link and go to slide 21, where the DO NOT... section starts, and then read slide 22, you will see that Verify is listed as one of the DO NOTs.



Re: Deduplication Bandwidth requirement

Thanks Colin for the whitepaper.

It's strange not to use Verify, in the sense that I always felt you've got to check your backups. But it confirms that it's designed this way.

Thanks everyone for your help.