
DeDupe questions

macpiano
Level 6

I asked part of this at the end of another post but I feel I need a new thread on it.

I'm just starting to look into dedupe and I have CASO. I have 4 buildings and want to do dedupe between them. One job is 2TB on a full backup. I have multiple servers in one job at each site. What are your best practices? I have tape drives at each building to back up those dedupe stores, etc.

thanks

Gary


13 REPLIES

teiva-boy
Level 6

You had these questions too ;)

 

We have the Central Management option on the main Backup exec server.

1. If I only do client dedupe but have 4 separate servers being backed up and 4 separate jobs, would a file that exists on all 4 servers be backed up once or 4 times?

--Each server has its own dedupe storage folder, which is its own "island of data."  So there is no "global dedupe" in Backup Exec's native dedupe.  If you used PureDisk as back-end storage, then multiple BE servers could write to it, and only one copy would be stored.

Alternatively, if you used optimized duplication, when the backup set is duplicated to another media server, any redundant copies of data would be removed.  

So the end result is, you'll have 4 copies today.  But start duplicating data back to a central media server, and that central server will throw out the 3 redundant copies.

2. If I have the multiple servers with one backup job would it be better to use the media server dedupe?

--You have no choice.  To use client-side dedupe, you have to have one job per server.  This is a better practice anyway, IMO.  Multiple servers in a single selection list is not my preferred method of doing things.  I've talked about this ad nauseam on the forums in many recent posts.

3. As I read it, when the storage folder is backed up to tape it recreates, or as some would say rehydrates, the data?  I have plenty of tape capacity, so I don't much care about storing the deduped folder itself unless that is a best practice.  I guess I would feel more secure if the tapes had the rehydrated data in case part of a tape got hosed.

--When you back up to the dedupe storage folder, then duplicate to tape, the data is rehydrated to tape.  This way, in a complete DR scenario, you can restore directly without having to stage to disk first.  Also, if your data were deduped to tape and it spanned multiple tapes, and we all know how reliable tape backups are...  if one tape failed in a spanned set, you'd lose everything!

So in most cases you are duplicating to tape, where the data is rehydrated to its original size.  But if you want to protect the BE server itself, there is a way to back up the dedupe store itself to tape.  That will be the folder size, not rehydrated data.

macpiano
Level 6

On item 1 I meant: within the same media server and building, if the same file is on 4 different servers in that building and I do 4 jobs like you suggest, would it be backed up 4 times? I should note that the disk it will be using will be in another building, not on the local media server. So the media server backs up the local servers, but the disk it backs up to is 20 miles away.

And thanks for your info, it was great. I will be looking at your posts on single vs. multiple servers in a backup job.

gary

teiva-boy
Level 6

How is the disk going to be 20 miles away?  Do you have a metro-WAN link via FC to your disk system?  Disk typically needs to be local and direct attached for dedupe to happen on a media server.

Let me ask you this...  How much data is at each site?  What is the size of a daily incremental?  And lastly, how much bandwidth is there between sites?

There is the possibility that you could just use remote agents at each site and do client-side dedupe over the WAN.  But this depends on many factors, hence my questions above.

macpiano
Level 6

Let's take two of my buildings. They are connected with a 100 Mb TLS circuit, basically just like having Ethernet between them. I can shove about 35 GB an hour through it. I have 2 TB to back up on a full backup. I have a media server at both buildings with 5 TB of storage on the media servers themselves. They both have ML6000 tape libraries on them; one is LTO-3 and the other will be LTO-4 soon. They can handle whatever I throw at them.

I need to be able to store each site's backups at the other site, but as I do it now it takes 60 hours for a full backup. How do I make dedupe work in this scenario?

Daily incremental is about 150 gigs on the big job.
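As a rough sanity check on those numbers, a quick sketch (assuming the 100 Mb circuit means 100 megabits per second and the 35 GB/hour figure is the sustained rate):

```python
# Rough sanity check of the link and full-backup numbers quoted above.
# Assumptions: the "100 Mb TLS circuit" is 100 megabits/second, and the
# 2 TB full and 35 GB/hour throughput figures are as stated in the post.

link_mbps = 100                                        # circuit speed, megabits/sec
theoretical_gb_per_hr = link_mbps / 8 * 3600 / 1000    # = 45 GB/hour at line rate
measured_gb_per_hr = 35                                 # observed throughput

full_backup_gb = 2 * 1000                               # 2 TB full backup

hours_at_line_rate = full_backup_gb / theoretical_gb_per_hr   # ~44 hours
hours_measured = full_backup_gb / measured_gb_per_hr          # ~57 hours

print(f"Theoretical max:          {theoretical_gb_per_hr:.0f} GB/hr")
print(f"Full backup at line rate: {hours_at_line_rate:.0f} h")
print(f"Full backup at 35 GB/hr:  {hours_measured:.0f} h (matches the ~60 h quoted)")
```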

 

thanks

Gary

teiva-boy
Level 6

With dedupe licensed at each site, you can duplicate the backups to the other site, sending only unique blocks of data over the wire.  That will be much lower than your 150GB incremental; even your fulls should be less than 150GB going forward, after the initial sync takes place.  Your 100Mb link is fantastic for bandwidth as it is, and I see no reason you can't make this work.

 

At this point, I would recommend you set up a trial of BE2010 R2 and add in the dedupe option as a trial.  Try some full backups, and even remote backups over your 100Mb link.  Make sure to enable client-side dedupe, and watch the job logs for the values of "data scanned" and "data sent."

The "data sent," value should be much much smaller than the scanned value, this is the dedupe in action.  

You should be able to extrapolate this data to validate whether the scenarios suggested would work.
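For example, a minimal sketch of that extrapolation; the scanned/sent values below are placeholders only, so plug in the numbers from your own job log:

```python
# Back-of-the-envelope extrapolation from a trial job log, as suggested above.
# The "data scanned" and "data sent" figures here are hypothetical placeholders;
# substitute the values reported in your own BE2010 job log.

data_scanned_gb = 2000     # data the client read, e.g. a 2 TB full
data_sent_gb = 120         # unique data actually sent over the wire (example only)

dedupe_ratio = data_scanned_gb / data_sent_gb
link_gb_per_hr = 35        # measured WAN throughput from earlier in the thread

transfer_hours = data_sent_gb / link_gb_per_hr

print(f"Dedupe ratio: {dedupe_ratio:.1f}:1")
print(f"Estimated WAN transfer time: {transfer_hours:.1f} h "
      f"(vs. {data_scanned_gb / link_gb_per_hr:.0f} h without client-side dedupe)")
```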

macpiano
Level 6

Oh, I love the 100Mb connection. I saw in one of your other posts that this compresses well, but the how-to said to make sure compression is enabled before you start the first backup. Is that correct?

So the deduped data initially stays on the local dedupe disk, in this case my 5TB drives on my local media server? We will be licensing all the media servers. Can't you have the remote server being backed up send its deduped data to its own local dedupe storage? In your scenario, since I'm going to be backing up on both ends, I would end up with the data from both servers on both ends.

thanks for the responses

Gary

teiva-boy
Level 6

The initial full will take the longest; there is very little way around this step.  There actually might be, but you are better off just doing the initial copy over a weekend or long weekend.  There is a setting in the pd.conf file to enable compression; change the value from 0 to 1.  This compresses the data BEFORE the actual dedupe takes place.  The compression in the job settings happens AFTER the dedupe takes place and does not help when you're talking deduplication.  The same goes for client-side dedupe: compression needs to be enabled in the pd.conf file.
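If you'd rather script that pd.conf change than edit the file by hand, a minimal sketch follows; the exact parameter name (COMPRESSION here) and the file location are assumptions on my part, so check your own pd.conf before running anything like this:

```python
# Sketch: flip the compression flag in pd.conf from 0 to 1 before the first backup.
# ASSUMPTIONS: the parameter is named "COMPRESSION" and pd.conf lives under the
# path below -- both are guesses, so verify them against your own installation.
from pathlib import Path

pd_conf = Path(r"C:\Program Files\Symantec\Backup Exec\pd.conf")  # hypothetical path

text = pd_conf.read_text()
if "COMPRESSION = 0" in text:
    pd_conf.write_text(text.replace("COMPRESSION = 0", "COMPRESSION = 1"))
    print("Compression enabled; you may need to restart the BE services to pick it up.")
else:
    print("Setting not found where expected -- edit pd.conf by hand instead.")
```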

The initial backup stays local.  You'll use CASO later to make a duplicate job of the data from one media server to another.  

teiva-boy
Level 6

Oh, and it should be possible to take one of your tape backups, move it between buildings, and duplicate it to the dedupe storage folder.  This will more than likely eliminate the need for that initial full copy over the wire.

macpiano
Level 6

As it stands now I can do a regular full backup over the weekend, but it takes 60 hours the way I'm doing it now: 2TB at approx. 35 GB an hour is right around 60 hours. But if anything glitches, well, it's not going to happen. If after the first one I can cut that to a quarter, I would be happy.

teiva-boy
Level 6

Enabling client-side dedupe and backing up over the 100Mb link, you WILL see your backup times decrease.

Enabling dedupe, but doing it on the media server, you will not, as the full 2TB will still have to be transferred over the wire first, then processed.  

macpiano
Level 6

On client-side dedupe I have to ask this question. If I have Building A backing up a server in Building B (as in, I can have any media server back up any server no matter where it is physically located), does the dedupe processing in client-side dedupe happen on that client server itself, or is all the data drawn into its media server and then deduped?

You can see where I am going with this. If Building A can directly dedupe a server in another building without it all going over the WAN, that would eliminate the media server in Building B doing any of the processing.

I have another question, as all my servers are not here yet and I'm purchasing all the licenses at the moment. The max data being backed up is 2 TB on a full. The drives on those media servers are 5 TB. The servers will be dual 6-cores with 24 GB of RAM. I understand it takes 1.5 GB of RAM for each TB on the drives, so I should be set as far as that is concerned. Will the 5 TB drives be sufficient with all this to give me a couple of weeks of backups on the dedupe part? All of this will be rehydrated immediately to tape on the local tape drive.

I will be buying NAS's next year that will be much larger but I can't do it now.

I know, lots of questions, but I'm just trying to get a handle on this before I get too far along. I will of course be testing this as I go along.

Gary

teiva-boy
Level 6

If you configure client-side dedupe, all data is scanned and processed by that client, and ONLY the deduplicated data is sent over the wire.  

If you do not enable client-side dedupe and take the defaults, the full stream of data is sent over the wire, and the BE media server will process it in real time and deduplicate it.  You can slice up jobs to use either process; you are not limited to just one.  The license covers both.

So YES, doing client-side dedupe at a remote site and backing up over your WAN link is highly possible.  I'll have to find a remote-office backup with BE2010 whitepaper I have saved for you...  I'll attach it later this weekend if I remember, or remind me via private message.

Just keep in mind that the hardware requirement is 1.5GB of RAM per TB of deduped data.  That means you size the RAM on the amount of data BE has reduced and stored in deduplicated form, not on what you are actually backing up.  This does make sizing difficult, but starting with 8GB of RAM is about the norm these days, and going to 16GB is an incremental upgrade, so not a big deal IMO; the requirement matters more for folks with existing hardware, is my guess.  While CPU power is important, the dedupe engine is not multi-threaded and will not be able to max out all available cores, so if cost is an issue you can go with a single CPU.

And for future storage upgrades, keep in mind you can only have ONE dedupe storage folder per media server.  It also must be block-level storage; NAS is not supported as a target for the dedupe location.  So if you outgrow the 5TB of local disk down the road, you'll have to move to a DAS or SAN, and then perform a data migration of the dedupe store, which could be complicated, though I expect Symantec techs to be fully trained on this over time.
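To put that sizing rule into numbers, a quick sketch (the store sizes are just examples; the 1.5GB-per-TB figure is the rule quoted above):

```python
# RAM sizing per the rule above: 1.5 GB of RAM per TB of *deduplicated* data
# actually stored, not per TB of source data backed up.

RAM_GB_PER_DEDUPE_TB = 1.5

def ram_needed_gb(dedupe_store_tb: float) -> float:
    """RAM the dedupe engine wants for a store of the given (post-dedupe) size."""
    return dedupe_store_tb * RAM_GB_PER_DEDUPE_TB

# Example: even if the entire 5 TB local dedupe store fills up, the rule calls
# for 5 * 1.5 = 7.5 GB, well under the 24 GB planned for the new media servers.
for store_tb in (1, 2, 5):
    print(f"{store_tb} TB dedupe store -> {ram_needed_gb(store_tb):.1f} GB RAM")
```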

macpiano
Level 6

Again, great info. Our scenario would be like backing up a remote office, since I want all backups off site and deduped before they hit the wire. My incoming servers will have dual 6-core processors and 24 GB of RAM. I will be RAID-5ing six 1TB drives into one huge partition, so I should be set for a while. Next fall we will be purchasing real NASs, but until then I think I can make this work.

I'm marking this as a solution for now, but if you can find that whitepaper that would be great.

 

Gary