
Symantec Backup Exec Deduplication Q & A

JGillTech
Level 3

Moving from EMC NetWorker to Symantec Backup Exec with the Deduplication Option.  I have read through many KBs, white papers, the admin guide, etc., and have ended up with more questions than answers.  Two are of particular interest to me at this point:

 

1.  If you are conducting a deduplication backup, is there any reason not to do incrementals every time?  I don't see any reason why you would want to scan every file on a large file server (full backup) just to have most of them nixed due to the dedupe.  How would one run a dedup job?  I am assuming that if I specify an incremental backup job on a new server that has not been backed up, it will do a full and then incrementals thereafter.

 

2.  How do overwrite and append periods work in the context of a dedup backup?  If you are running incrementals and your original data (from the full backup) expires, what happens then?  Further, what is the append period going to do for me here?

 

While those two questions are of particular interest, there are more that someone could perhaps weigh in on.  What is the best practice for configuring backup to disk, then to tape?  My thought is that I would use the deduplication storage folder for everything, then once a week duplicate a backup to tape (probably an incremental, and I assume it will make a full, though I'm not sure).  The tape would have a different overwrite period, which would reflect my archive requirements.  I am not sure how the duplicate job will work if I select a backup that was last performed as an incremental to the deduplication storage folder.

 

Further, should the dedup folder itself be backed up?  What are the best practices for this? 

 

More questions that aren't obvious: the deduplication option has a database component.  Should the database exist on the same volume as the deduplication storage folder, should it be moved to the same volume as the media server (where there is already another database), or should another LUN be set up just for this database?  I couldn't find any best practices on this one.

 

15 REPLIES

Kiran_Bandi
Level 6
Partner Accredited

How would one run a dedup job?

Create a job just like a normal backup job and choose the deduplication folder or an OpenStorage device as the target.  When the job runs, deduplication will happen.  You can run a FULL backup once a week or once a month and do incrementals/differentials on the remaining days.

How do overwrite and append periods work in the context of a dedup backup?  If you are running incrementals and your original data (from the full backup) expires, what happens then?  Further, what is the append period going to do for me here?

Overwrite protection is calculated from the time the last write happened to the media.  This means that if you keep appending to a media, the overwrite protection period (OPP) of that media will be extended accordingly.

For normal (non-deduplicated) backups, you cannot use the deduplication folder as the destination; you must create a backup-to-disk (B2D) folder to back up to disk.  For disk backups, always configure jobs to overwrite, as there is no advantage in appending.

Duplicating to tape: if you duplicate a deduplicated backup to tape, the data will be rehydrated and written to tape (the same as the original).

Dedup database: you can configure BE to save the dedup database on the same volume as the dedup storage folder.  However, saving the two on different volumes improves database performance.

Regards...

JGillTech
Level 3

Why would I even bother with a full backup if using deduplication?

 

In regard to normal backups, what are you referring to?  How do you define a normal backup?  I can create a backup job and use the deduplication storage folder as a destination. 


Okay, so if overwrite protection is calculated from the last write to the media, what happens to a full backup if I only do this once and then incrementals thereafter?  What happens to incrementals when they expire?  Is there some sort of trim function run against the dedup folder/database?

 

You mention that I should specify overwrite vs. append; why is this significant?  What is going to be overwritten?  Can an OST file expire?  What significance does an append period have with respect to a deduplication folder?

 

If I select a recent incremental backup job to be duplicated to tape, what will happen?  Will I get a full backup on tape or will just the incremental data be copied to tape? 

 

As for the dedup database, should that be on the same physical volume as the media server database?  What is the best practice here?  You say different than the dedup volume, but nothing else.

pkh
Moderator
   VIP    Certified

 

In coming up with your backup plan, you should first leave out dedup because it adds another layer of complexity and can sometimes confuse matters.
 
When you run an incremental job, it will only back up the files which have their archive bit on.  It will not back up files whose archive bit is not on.  Since you cannot be sure that all the files have their archive bit on, you should always run a full backup before you run either an incremental or a differential backup.  This also applies when you add a new resource to your selection list, i.e., you should immediately run a full backup to establish a new baseline before resuming your incremental/differential backups.  This also applies to the modified-time method.
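
To make the archive-bit mechanism concrete, here is a rough Python sketch of the selection logic described above.  It is purely illustrative (the function name and file list are invented), not how Backup Exec actually implements it:

```python
import os

FILE_ATTRIBUTE_ARCHIVE = 0x20  # the Windows archive bit

def select_for_incremental(paths):
    """Return only the files whose archive bit is currently set."""
    selected = []
    for path in paths:
        attrs = os.stat(path).st_file_attributes  # Windows-only field
        if attrs & FILE_ATTRIBUTE_ARCHIVE:
            selected.append(path)
    return selected

# A full backup takes every file and clears the bit afterwards.  A file
# whose bit was already off before your first backup would never be
# picked up by an incremental, which is why a baseline full is needed.
```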
 
After you have done your first full backup to establish a baseline, you can just keep doing incremental backups; there is nothing to prevent you from doing so.  However, you would need to keep the full and all the incrementals, because when you restore you need to restore the full backup plus all the incremental backups since the full backup.  This is the reason why you do a full backup periodically: to minimise the number of incrementals that you need to keep and restore.  Doing an incremental backup can sometimes result in a longer backup time, because if you are using the archive-bit method, each file has to be accessed to determine if its archive bit is on, whereas a full backup would just dump the entire disk.  All of this applies whether you are using dedup or not.
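
And a sketch of what "the full plus all the incrementals since" means mechanically when assembling a restore (the catalog format here is invented for illustration):

```python
def restore_chain(catalog):
    """Pick the most recent full plus every incremental taken after it."""
    chain = []
    for entry in sorted(catalog):       # (backup_time, kind) pairs
        _, kind = entry
        if kind == "full":
            chain = [entry]             # a newer full resets the chain
        elif kind == "incremental" and chain:
            chain.append(entry)
    return chain

catalog = [(1, "full"), (2, "incremental"), (3, "incremental"),
           (4, "full"), (5, "incremental")]
print(restore_chain(catalog))           # [(4, 'full'), (5, 'incremental')]
```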
 
When you back up to tape, you would sometimes append to the tape to maximise the usage of the tape capacity.  If you back up to disk, the .bkf file can be as big or as small as the amount of data backed up, so there is no wastage of space.  If you append to a .bkf file, it will only extend the overwrite protection period of the .bkf.  Suppose you do 2 backups of 10GB and 20GB on 2 consecutive days and your OPP is 3 days.  If they are backed up to 2 different .bkf files, then the first .bkf file will be overwriteable after 3 days and you can reuse it then.  If you append the second backup to the first .bkf file, then the entire .bkf will be overwriteable only 3 days after the second backup, i.e. 4 days after the first backup.  This means that you would not be able to reuse the first 10GB of the .bkf for 1 more day.  Hence we never use append for B2D jobs.  I don't know what will happen when you use append with dedup, but going by the logic above, I don't see any reason why you should use append rather than overwrite.
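
The arithmetic of that example, as a quick Python check (the calendar dates are arbitrary):

```python
from datetime import date, timedelta

OPP = timedelta(days=3)                # the 3-day OPP from the example

def overwriteable_on(last_write):
    """Media becomes overwriteable OPP after its last write."""
    return last_write + OPP

first, second = date(2011, 7, 4), date(2011, 7, 5)

# Two separate .bkf files: each expires on its own schedule.
print(overwriteable_on(first))   # 2011-07-07: the 10GB file is reusable
print(overwriteable_on(second))  # 2011-07-08: the 20GB file is reusable

# Appended into one .bkf, the whole file takes the later write time, so
# the first 10GB stays locked until 2011-07-08, one day longer.
print(overwriteable_on(second))
```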
 
When you duplicate a backup, it will only duplicate whatever data that is backed up by that job.  This means that when you duplicate an incremental job, it will only duplicate the incremental data, not the full data.
 
Let's bring dedup into the picture.  Dedup only backs up changed blocks, which means that if you just do full backups, the subsequent full backups would be sort of like incremental backups, i.e. only the changed blocks are stored.  However, when you do full backups, the entire set of files is processed, thus increasing the backup time.  Incremental backups mean fewer files processed but more complex restores (as explained above), so you have to balance the two.
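
As a toy illustration of block-level dedup (the hashing scheme and in-memory store are invented; the 64K chunk size matches the figure mentioned later in this thread):

```python
import hashlib, io

CHUNK = 64 * 1024   # 64K chunks
store = {}          # fingerprint -> data block; stands in for the OST files

def backup(stream):
    """Store only blocks not already present; return the backup's manifest."""
    manifest = []
    while True:
        block = stream.read(CHUNK)
        if not block:
            break
        fp = hashlib.sha256(block).hexdigest()
        store.setdefault(fp, block)   # an unchanged block is stored once
        manifest.append(fp)
    return manifest

full = backup(io.BytesIO(b"A" * CHUNK + b"B" * CHUNK))
next_full = backup(io.BytesIO(b"A" * CHUNK + b"C" * CHUNK))  # one block changed
print(len(store))   # 3 unique blocks stored, not 4
```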
 
Suppose a file is heavily modified and it is backed up by your full backup and 2 subsequent incremental backups.  The data blocks for this file could be scattered across many OST files.  A data block from this file could be in OST00001 and not modified after the full backup, so this block is referenced by the 2 subsequent incremental backups.  When the OPPs for the full backup and the 2 subsequent incremental backups expire, all references to this data block are deleted and the dedup engine can reclaim the space.  The dedup engine does housekeeping twice daily to reclaim space in the OST files.
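
A sketch of that expiry and reclaim behaviour, assuming a simple reference count per block (the real engine's bookkeeping is internal to the product; this only illustrates the idea):

```python
def register(refcount, manifest):
    """Record that a new backup set references these blocks."""
    for fp in manifest:
        refcount[fp] = refcount.get(fp, 0) + 1

def expire(refcount, manifest):
    """Called when a backup set's OPP lapses."""
    for fp in manifest:
        refcount[fp] -= 1

def housekeeping(store, refcount):
    """Reclaim space held by unreferenced blocks; per above, run twice daily."""
    for fp in [f for f, n in refcount.items() if n == 0]:
        del store[fp]
        del refcount[fp]
```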
 
You can backup the dedup folder by backing up the shadow copy component of the volume containing the dedup folder.
 
Should the database exist on the same volume as the deduplication storage folder?
 
With R3, you don't have a choice; they have to be on the same volume.  It is recommended that you use a separate volume for the dedup folder, and that this volume be dedicated to the dedup folder.
 
=================
 
In the future, I would suggest that you try to have only 1 question per discussion.  It is very difficult to digest a lot of questions and come up with a coherent answer.  You are more likely to get good responses with short questions.
 
I would also suggest that you familiarise yourself with the Managing your data section of the Admin Guide.

teiva-boy
Level 6

Personally... I'd rather run NetWorker 7.6.2 over Backup Exec!  It's an enterprise product, the GUI in 7.6 is very good, and when competing on straight-up backup performance, NetWorker will win hands down.  Think multiplexing and multistreaming, something Backup Exec doesn't have and never will.  Combine that with a Data Domain unit and you've got a smokin' setup that will run circles around anything Backup Exec can do.

However, Backup Exec and its price point are hard to beat.  Just be warned that the dedupe option is still rough around the edges and doesn't perform all that well, and some of the advanced features of GRT are proprietary to Symantec, so support cases could be long and drawn out...

Though in the end, it's not a fair comparison.  One is an enterprise product, the other is a small-business product.  One is meant for hundreds or thousands of servers, the other for a few dozen servers and a few TBs.  The prices of each reflect that.

robnicholson
Level 6

If you are conducting a deduplication backup, is there any reason not to do incrementals every time? 

Because it's a nightmare when you have to do a restore... Backup Exec does not have functionality whereby you can say "Restore this folder as it was at this date/time" and it works out which backups to restore (e.g. back to last full and then work out incrementals).

Instead you'd have to manually browse down the restore tree for EVERY incremental backup done since the last full backup (which in your case could be years?) and select the sub-folder.

Even if you did a full backup once a month, you could end up having to drill down through 31 separate backups.  Just don't do it - your IT team will shoot you ;)

So that's why we do a full backup weekly and differentials each day.  Yes, it slows down the backup, but it means a restore is just two drill-downs through the backup history.

It would be less of a problem if browsing the backup history were fast, but it's not.  Open a top-level node and you can sometimes read the news for a while whilst it opens.

But even if Backup Exec could do the donkey work for us (e.g. you just highlight the path/file and it restores), doing just incrementals forever might not be the best solution. If somebody asked for a restore of a folder that's been around for a year, you might end up having to restore from many, many individual backups which could be just too slow. I guess it depends upon how efficient the algorithms are for determining which backups need restoring.

So yes, I really wish BE could do restores better so we could use incrementals more, but it doesn't.  And doing so, I suspect, is too scary for the developers, as this product has years of evolutionary development in there.

Cheers, Rob.

JGillTech
Level 3

Thanks for the advice; next time I will separate the questions into multiple posts... I didn't want to flood the forum.  Coming from a NetWorker environment, there are many things that aren't transferable.  For one, NetWorker won't allow you to do an incremental if it doesn't have a full backup in the media database.  BE doesn't care and allows this to happen.  I am not sure why it's not set up, like NetWorker, to check what's in the media database and request that a full backup be conducted.  Further, I am not sure why, during a restore, BE doesn't automatically select all the incrementals going back to the full.

In regard to the database clean-up operation, will it actually modify the size of the OST files and remove the "blank" space after marking blocks as overwritable? 

Therefore, let's review my backup strategy... perhaps you folks can make some recommendations:

1.  I want to retain all backups within the dedup folder for 6 months (operational backup)

2.  I want to send backups to tape every week with a two-month retention time. 

3.  I want to send backups to tape every month with a 2-year retention time. 

 

So it appears, without diving into the admin guide, that I need to set up a backup job that does a full backup to the deduplication folder once a week, and another job that does diffs during the week, both with a retention of 6 months.  Should I change the retention of the full/diff?  If so, what would be the outcome/advantage?

 

Then I should set up a duplicate job to run immediately following the full backup, so I have a complete set on tape with a retention period of 2 months for my weekly rotation.  I would then set up another job to run a full backup every month (I would also have to create another dedup job to run a full backup at the beginning of the month) with a 2-year retention period.

End result: I would have 6 months of onsite backups to restore from.  Should something happen to this, I have a 2-month tape rotation that has weekly full backup sets.  Then I have a monthly set with 2-year retention, and perhaps I include the dedup database.  Does this make sense?  Now that I am writing this, it doesn't seem like the best strategy.  Any suggestions?

robnicholson
Level 6

BE doesn't care and allows this to happen.  I am not sure why it's not set up, like NetWorker, to check what's in the media database and request that a full backup be conducted.

That merits a Friday afternoon smile (in a nice way) cheeky

What you suggest sounds very sensible to me, but this kind of sensible suggestion typifies where Backup Exec currently is in its lifecycle.

To state "An incremental backup cannot be carried out unless the backup system is sure that it has the preceeding full backup (and indeed incremental backups) is intact" is a perfectly legitimate requirement.

No problem when writing a system from scratch.

But Backup Exec hasn't been written from scratch; it's been continually evolved, with new features bolted on.  I'm guessing, but to implement such a requirement in the current system would a) be hard work, as the architecture wasn't designed to check this kind of thing, and b) therefore be yet another bolt-on that fails in one of the zillion paths through the program.

It's why I seriously suggest that Symantec start again with Backup Exec.  Yes, a very costly exercise, but the developers would probably leap at the chance, and in the longer run it would give a much better product, one that could compete with the new boys on the block.

Cheers, Rob.

robnicholson
Level 6

The deduplication option has a database component.  Should the database exist on the same volume as the deduplication storage folder

I suspect you are reading slightly out of date information.

The deduplication storage folder is the database.  Yes, there is a folder that starts filling with 1GB data files containing your actual backed-up data (in 64K chunks), but this is intertwined with the database itself.

That's not entirely true: when deduplication was first added to BE, you could split the database from the data folder.  That option was removed in Backup Exec 2010 R3 (much to my surprise), but the documentation and online help haven't been updated to reflect that change (you find that a lot in BE as well).

The reasons for this change are a little unclear, with some excuse about problems, but I cannot for the life of me guess what they are.

Splitting the database from the backup data folder sounds like another sensible requirement to me.  It would allow you to put the dedupe database (containing records of what is where) on fast local hard disks while putting the data itself on slower iSCSI or other storage.

There is an "unsupported" configuration change you can manually do to re-introduce this split but the inference I took was "Don't expect to be able to upgrade BE cleanly" which basically means "Don't do it".

Shame really....

Cheers, Rob.

JGillTech
Level 3

blush

JGillTech
Level 3

So let's sum things up here to make sure I am understanding correctly. The BE media database doesn't keep track of full and incremental backups, such that an incremental is not allowed to run unless there is a full backup??? Further, can one assume that a full backup would be allowed to expire before the DEPENDENT incremental backups??? Oy vey!

pkh
Moderator
   VIP    Certified

Backup Exec does not have functionality whereby you can say "Restore this folder as it was at this date/time" and it works out which backups to restore (e.g. back to last full and then work out incrementals).

I have told you many times previously in the forum that this is not true.  BE has this technology.  It is called ADBO (the Advanced Disk-based Backup Option), which is a licensed option.

pkh
Moderator
   VIP    Certified

 The BE media database doesn't keep track of full and incremental backups, such that an incremental is not allowed to run unless there is a full backup???

No, this is not true.  You can run an incremental backup without running a full, but it is not recommended.

 

Further, can one assume that a full backup would be allowed to expire before the DEPENDENT incremental backups?

The OPPs for the full and incremental backups are independent.  It is up to you to manage the data.

 

I would STRONGLY recommend that you read the Admin Guide before proceeding further.  There is no point relying on other people's interpretations.  Get a good understanding so that you can separate the wheat from the chaff.

pkh
Moderator
   VIP    Certified

I want to retain all backups within the dedup folder for 6 months (operational backup)

Do make sure that you have sufficient disk space and RAM to do so.  You are supposed to have 1.5GB of RAM per TB of data stored.
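
A quick sanity check of that rule of thumb (the 4TB and 10TB figures are just examples):

```python
def ram_needed_gb(stored_tb, gb_per_tb=1.5):
    """Apply the 1.5GB-of-RAM-per-TB-stored guideline."""
    return stored_tb * gb_per_tb

print(ram_needed_gb(4))    # 6.0GB of RAM for 4TB in the dedup folder
print(ram_needed_gb(10))   # 15.0GB for 10TB
```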

I think you should take note of Teiva-boy's comment regarding the price/functionalities of Networker and BE.

robnicholson
Level 6

Do make sure that you have sufficient disk space and RAM to do so.  You are supposed to have 1.5GB of RAM per TB of data stored.

On our installation, BE 2010 R3 complains if there is less than 8GB of RAM.

Cheers, Rob.

robnicholson
Level 6

I stand corrected - yes, BE does now have this feature, but it's an extra cost.  Personally, I would have thought it should be part of the deduplication licence option, as without it dedupe isn't as powerful as it could be.

And with this option, you can only perform fulls and incrementals.  It doesn't let you do differentials, which makes sense.

Cheers, Rob.