
BE 2010 - Inline dedupe?

Clee
Level 3
Does BE 2010 do fully inline deduplication, or is it more of a "store and crunch" approach?

In other words, do I need to allocate enough space for deduped data only, or would I need to allocate enough space for non-deduped data that would then get reduced in size with deduplication?

Accepted Solutions

Aidan_Finley
Level 4
Employee
Hi Clee -

To answer your original question, all of BE's software-based deduplication (whether done on the client or on the media server) is "in-line".  There is no need for scratch space or some other space set aside to hold data before it gets deduped.
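
To make "in-line" concrete, here is a minimal sketch of a store that deduplicates each chunk before writing it. This is not how BE is implemented internally; the fixed 4-byte chunk size, the SHA-256 fingerprints, and the `InlineStore` class are all assumptions chosen purely for illustration. The point is only that peak disk usage never exceeds the unique data.

```python
import hashlib

CHUNK_SIZE = 4  # tiny fixed-size chunks, purely for illustration (assumption)


def chunks(data: bytes, size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class InlineStore:
    """Hypothetical toy store that deduplicates each chunk before writing it."""

    def __init__(self):
        self.blocks = {}      # fingerprint -> chunk bytes
        self.peak_bytes = 0   # largest footprint observed during ingest

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.blocks.values())

    def ingest(self, data: bytes) -> None:
        for chunk in chunks(data):
            key = hashlib.sha256(chunk).hexdigest()
            if key not in self.blocks:
                # Only chunks not seen before are written.
                self.blocks[key] = chunk
            self.peak_bytes = max(self.peak_bytes, self.stored_bytes())


backup = b"AAAABBBBAAAAAAAACCCC"  # 20 bytes, 3 unique chunks
store = InlineStore()
store.ingest(backup)
print(len(backup), store.stored_bytes(), store.peak_bytes)  # 20 12 12
```

A "store and crunch" design would instead peak at the full 20 bytes before shrinking to 12, which is the scratch space an in-line design avoids.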

On to your Hyper-V question.  Straight VHD files will not see the greatest level of deduplication at this time.  I'm not sure what level that will be right now, but it will *not* be 10:1 - probably more like 5:1 or 3:1 if I had to guess.
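
For a sense of what those ratios mean in disk space, the arithmetic can be sketched as below. The 1,000 GB input is an illustrative figure (1 TB of disk images), and the function names are made up for this example; the ratios are the guesses quoted above, not measured results.

```python
def stored_size_gb(raw_gb: float, ratio: float) -> float:
    """Space used after deduplication at a given ratio (10 means 10:1)."""
    return raw_gb / ratio


def reduction_pct(ratio: float) -> float:
    """Percentage reduction that corresponds to a given ratio."""
    return (1 - 1 / ratio) * 100


RAW_GB = 1000  # illustrative: 1 TB of un-deduped images (assumption)

for ratio in (10, 5, 3):
    print(f"{ratio}:1 -> {stored_size_gb(RAW_GB, ratio):.0f} GB stored, "
          f"{reduction_pct(ratio):.0f}% reduction")
# 10:1 -> 100 GB stored, 90% reduction
# 5:1 -> 200 GB stored, 80% reduction
# 3:1 -> 333 GB stored, 67% reduction
```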

You will see a much higher level of deduplication if you install a BE Remote Agent in each guest and use client-side deduplication.  If you are purchasing the Agent for Hyper-V, then you are entitled to run an agent in each guest attached to a specific host.  While you have stated you don't want to do that, it is going to be the best way (currently) to achieve high levels of deduplication.  Client deduplication's effect on the host (or, in this case, virtual guest) is greatest in the first few backups.

In closing, while the theoretical approach you outlined will certainly work, I don't believe you will see a 90% reduction in backup sizes while backing up VHDs directly.  In the BE 2010 release, you should try client deduplication directly from each virtual guest in order to achieve that level of deduplication.

Thanks,

Aidan Finley
Sr. Product Manager, Backup Exec

REPLIES

Ben_L_
Level 6
Employee
This is going to depend on how the backup and the remote agent are configured.

If you are using direct access on the remote agent, only the data to be dedup'd is sent across the wire.

If you aren't using direct access, all the data is sent across the wire to the dedup folder.  Once it gets there, data that is already in the folder is essentially just dropped, and only the new data is stored.
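
The difference between the two modes can be sketched as follows. This is a simplified model, not the actual BE protocol: the SHA-256 fingerprints, the `bytes_on_wire` function, and the 4-byte chunks are assumptions, and the small cost of sending fingerprints in the client-side case is ignored.

```python
import hashlib


def fingerprint(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()


def bytes_on_wire(chunk_list, known: set, client_side: bool) -> int:
    """Return payload bytes sent; `known` tracks what the store already holds."""
    sent = 0
    for chunk in chunk_list:
        key = fingerprint(chunk)
        is_new = key not in known
        if client_side:
            # Client checks first and sends only chunks the store lacks.
            if is_new:
                sent += len(chunk)
        else:
            # Everything crosses the wire; duplicates are dropped on arrival.
            sent += len(chunk)
        known.add(key)
    return sent


data = [b"AAAA", b"BBBB", b"AAAA", b"CCCC"]

client_store, server_store = set(), set()
client_sent = bytes_on_wire(data, client_store, client_side=True)
server_sent = bytes_on_wire(data, server_store, client_side=False)

print(client_sent, server_sent)              # 12 16
print(len(client_store), len(server_store))  # 3 3
```

Either way the store ends up holding the same three unique chunks; only the network traffic differs.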

Hope that helps in figuring out how much space is needed for your backups.

Clee
Level 3
How would these deduplication options work with Hyper-V guest images?

I have a Windows Server 2008 R2 physical server running Hyper-V. I want to back up the entire virtual guest server hard drive images (.vhd and other related files) for running virtual servers. Let's say my goal is to do complete virtual server image backups every day.  If the un-deduped size of all the hard drive images was 1TB - and they deduped 90% down to 100 GB, it sounds like I could simply have 100GB of disk space on the storage target and that would be enough to handle the backups for one day using direct access on the Hyper-V host agent?

As a follow-on question, I assume using direct access would create a significant utilization hit on the Hyper-V host, since it would be doing client-side deduplication?

And if I didn't want that hit, I'd have to not use direct access - but then I'd need 1 TB of target disk space for the un-deduped .vhd files to be backed up to that target, and after the dedupe, 900GB of that would be free again?

Thanks


Clee
Level 3
Here are my assumptions and theoretical approach:

1) I'm using the BE agent for virtual servers on the Hyper-V Host (physical) system
2) I'm doing online backups of the entire virtual guest servers
3) I'm NOT using GRT - just interested in getting full server backups
4) Running a dedupe on the client side (Hyper-V physical host) would be too resource intensive
5) Therefore, I'm NOT doing dedupe on these full server images at the Hyper-V side, but at the media server end.
6) The entire hydrated/non-deduped backup of all the virtual guest servers is 1 TB
7) After deduping, the 1TB dedupes down to 100GB
8) The full server backups run each day, and the next full backup dedupes to 110GB - i.e. backups add roughly 10 GB with each additional backup (Let's just assume for the sake of simplicity that this is how the dedupe would turn out - I realize that this is highly dependent on the kinds and amount of change the virtual servers undergo each day).

9) With this approach, I would need the following amount of disk space: 1 TB + 100 GB + 10 GB for each additional day I add to the backups.
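
The estimate in 9) can be written out as a quick calculation. The figures are the hypothetical ones from the list above, and the function names are made up for this sketch. The second function drops the 1 TB staging term, which is what the sizing would look like if data is deduplicated in-line as the accepted answer in this thread describes.

```python
def staged_estimate_gb(raw_gb, first_dedup_gb, daily_growth_gb, extra_days):
    """Assumption 9 as written: staging space + deduped data + daily growth."""
    return raw_gb + first_dedup_gb + daily_growth_gb * extra_days


def inline_estimate_gb(first_dedup_gb, daily_growth_gb, extra_days):
    """Same figures with no staging term, i.e. nothing lands un-deduped."""
    return first_dedup_gb + daily_growth_gb * extra_days


RAW_GB = 1000        # assumption 6: 1 TB of hydrated backups
FIRST_DEDUP_GB = 100 # assumption 7: first backup dedupes to 100 GB
DAILY_GROWTH_GB = 10 # assumption 8: each later backup adds about 10 GB
EXTRA_DAYS = 13      # illustrative: two weeks of daily backups in total

print(staged_estimate_gb(RAW_GB, FIRST_DEDUP_GB, DAILY_GROWTH_GB, EXTRA_DAYS))  # 1230
print(inline_estimate_gb(FIRST_DEDUP_GB, DAILY_GROWTH_GB, EXTRA_DAYS))          # 230
```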

Anything wrong with my assumptions/approach?
