Forum Discussion

MSDI
Level 4
9 years ago

Deduplication Bandwidth requirement

Hi,

We are still in the trial, trying to evaluate whether the deduplication option would be useful or not. From what I've read, since we have remote offices, it could be really useful to save on bandwidth.

We created a deduplication drive in BE15 and did some tests. We created 4x 8 GB files and backed them up once; the deduplication drive now has 32 GB of data in it. We then copied the 4 files to another server and tried to back them up. Miracle: the backup phase was very fast, practically no bandwidth was required as the client-side deduplication worked like a charm, and the deduplication drive stayed at 32 GB since the files were already there. So this seemed amazing...

But... during the verify phase, bandwidth was used between the BE15 server and the server being backed up. It saturated our 60 Mbps connection, but in upload (traffic going from our BE15 server TO the server being backed up)...

Is this normal behavior? I don't see the point of saving bandwidth on the download if we need to upload everything back to the server for verification. Both servers are Windows 2012 R2 with the BE agent on them. I guess I configured something wrong but can't find anything.

Any help would be appreciated. Thanks

10 Replies

  • Hi,

     

    Never copy files... duplicate them instead; otherwise Backup Exec isn't aware of the details within them on the remote site, rendering them unrestorable.

    Duplicate the files correctly and then try the Verify again and see what the results are.

    Thanks!

  • Hi,

    Thanks for the reply, but I'm not sure I follow what you mean?

    Just to clarify: we had 4 files "A", "B", "C" and "D", 8 GB each (so 32 GB total). These files were on server X.

    We have BE15 with the deduplication disk on a server named BE15

    We backed up the files from server X to server "BE15"

    We then copied the 4 files from server "X" to server "Y" and backed them up from the new server

    As mentioned, all 4 files on server "Y" were backed up very fast, as the files didn't have to be transferred over the network to server BE15

    But during the verify phase, there was heavy traffic going from server BE15 to server "Y"

    Hope that's clearer.

    Based on this, I don't understand your comment about never copying files vs. duplicating them? Could you clarify?

    Thanks in advance

     

  • Are you doing server-side or client-side dedup?

  • We're trying to do "Client-side dedup"

    It seems to work fine, as the 32 GB was not transferred between server "Y" and server "BE15" in my previous example. So the client realized the files were already on the deduplication device and didn't transfer them to the BE15 server...

    What's strange is that during the verify of the job, 32 GB were transferred from BE15 to server "Y", which makes no sense to me.
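
    Just to show what I understand the client-side part to be doing, here is a rough Python sketch of the general idea (the chunk size, the hashing and the in-memory store are my own assumptions for illustration, not how the BE agent is actually implemented):

    import hashlib
    import os

    CHUNK_SIZE = 128 * 1024  # illustrative chunk size, not the real engine's

    def chunks(path):
        """Split a file into fixed-size chunks and fingerprint each one."""
        with open(path, "rb") as f:
            while True:
                chunk = f.read(CHUNK_SIZE)
                if not chunk:
                    break
                yield hashlib.sha256(chunk).hexdigest(), chunk

    def client_side_backup(path, store):
        """Send only the chunks the dedup store doesn't already hold."""
        sent = 0
        for digest, chunk in chunks(path):
            if digest not in store:      # server already has it -> nothing to send
                store[digest] = chunk
                sent += len(chunk)
        return sent

    # Demo: the second backup of identical data transfers ~0 bytes.
    with open("fileA.bin", "wb") as f:
        f.write(os.urandom(4 * CHUNK_SIZE))  # small stand-in for one of the 8 GB files

    store = {}
    print(client_side_backup("fileA.bin", store))   # first run: all chunks are sent
    print(client_side_backup("fileA.bin", store))   # second run: 0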

  • I believe that transfer is due to the verification.  You can confirm this by not verifying the backup.

  • Hi,

    I confirm that it is only due to verification.

    But my question is, is this normal? Aren't people verifying their backups in the real world?

    So deduplication is cool because you save on disk space and bandwidth during the backup phase

    But you have to upload everything back during the verify phase... If this is by design, it's totally ridiculous (aside from the disk space savings, unless you decide to turn off verification).

    If it's by design, I feel like it's nonsense/a bug.

    Client-side or server-side dedup takes the same amount of disk space (which is normal).

    Server-side eats up all the bandwidth before the deduplication happens

    Client-side eats up all the bandwidth after deduplication, for the verification...

    I don't see the point unless you sacrifice verification

    I'm still hoping it's something I can fix with a BE15 setting that I got wrong. Otherwise, I'm sure we won't purchase this option.

    Thanks

  • You can delay the verification until the demand for bandwidth is less
  • :)

    Thanks for the reply, but this just confirms that deduplication is useless in our scenario.

    I don't understand why the verify isn't a simple MD5 computed on both sides instead of transferring the whole backup between both machines... Bad implementation in my opinion.

    It could be so much more useful if the verification weren't implemented this way.
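
    What I had in mind is something along these lines (purely a sketch of the idea, with made-up helper names, not a claim about how the product is built):

    import hashlib

    def local_md5(path):
        """MD5 computed where the file lives, so only the digest crosses the wire."""
        h = hashlib.md5()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(1024 * 1024), b""):
                h.update(block)
        return h.hexdigest()

    def digests_match(media_server_digest, remote_digest):
        """Each side hashes its own copy and exchanges just the 32-char hex string."""
        return media_server_digest == remote_digest

    # Demo: two identical local copies stand in for the two sides of the link.
    with open("copyA.bin", "wb") as f:
        f.write(b"hello" * 1000)
    with open("copyB.bin", "wb") as f:
        f.write(b"hello" * 1000)
    print(digests_match(local_md5("copyA.bin"), local_md5("copyB.bin")))   # True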

    Anyway, thanks everybody for your help on the subject. I really appreciate the community help during the evaluation phase of the product

     

  • OK, so what actually happens is that we are doing an MD5 (or similar checksum) against the backed-up data, but because the job has client-side dedup enabled, the process responsible for doing the checksum comparison is the beremote process on the remote server. Hence the beremote process is pulling all of the data (not just the changed/new chunks) across the network to rehydrate it and do the checksum calculation. Note: even for non-client-side dedup, the beremote process on the Backup Exec server would still have to rehydrate the files in memory to do the checksum comparison, so a verify is still resource intensive.
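
    Purely to illustrate why the whole data set ends up crossing the wire during a verify, here is a conceptual sketch (not the actual dedup engine code; the chunk store and per-chunk fetch are simplifications):

    import hashlib

    def verify_backup_set(chunk_ids, fetch_chunk):
        """Rebuild (rehydrate) the stream chunk by chunk and checksum it.
        With client-side dedup, fetch_chunk() is effectively a network read
        from the dedup store, so the entire backup size is pulled back even
        though the backup itself sent almost nothing."""
        h = hashlib.md5()
        pulled = 0
        for cid in chunk_ids:
            chunk = fetch_chunk(cid)   # every chunk comes back, changed or not
            pulled += len(chunk)
            h.update(chunk)
        return h.hexdigest(), pulled   # 'pulled' ends up equal to the full data set

    # Tiny demo with an in-memory "store": three 1 KB chunks -> 3072 bytes pulled.
    store = {i: bytes([i]) * 1024 for i in range(3)}
    digest, pulled = verify_backup_set(store.keys(), store.get)
    print(digest, pulled)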

     

    We have a Blueprint (best practices doc) for dedup available here:

    https://www-secure.symantec.com/connect/articles/backup-exec-2014-blueprint-deduplication

    If you open the PDF in the above link and go down to slide 21 where we start the DO NOT... section and then read slide 22, you will see that Verify is listed as one of the DO NOTs

  • Thanks Colin for the whitepaper.

    It's strange not to use "verify", in the sense that I've always felt you have to check your backups. But it confirms that it's designed this way.

    Thanks everyone for your help.