How do I calculate what size my dedupe store should be and should I dedupe everything?
I currently have around 11TB of data, made up of roughly 70 Virtual Machines with various functions (Domain Controllers, File Servers, Exchange 2010, etc.) which I need to get my head around for a project. I don't know how to calculate how big the dedupe store should be, nor whether I should dedupe everything.
Also, is there a tool which would look at data on a filestore and calculate where the bulk lies? (Currently we have one job to back up a whole file server and the job just takes too long: it starts on Friday and is still running on Monday night when I am meant to change the tapes for the Monday job. We are now skipping the Monday backup, and sometimes it is still running on the Tuesday night!) I figure that if I break the job down into several smaller jobs it might be more successful.
Any help gratefully received!
Backup Exec 2010 supports a dedup folder of up to 16TB, and a dedup folder of X TB requires roughly X × 1.5GB of RAM (recommended). It's also recommended to use a dedicated volume for the dedup folder.
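As a rough sketch of that sizing rule (the 1.5GB-per-TB figure and 16TB cap come from the recommendation above; the function name is just for illustration):

```python
def dedup_ram_gb(dedup_folder_tb, gb_per_tb=1.5):
    """Recommended RAM (GB) for a Backup Exec 2010 dedup folder.

    Rule of thumb from above: ~1.5GB of RAM per TB of dedup folder,
    with the folder itself capped at 16TB in BE2010.
    """
    if dedup_folder_tb > 16:
        raise ValueError("BE2010 supports dedup folders only up to 16TB")
    return dedup_folder_tb * gb_per_tb

# e.g. an 11TB dedup folder would want roughly 16.5GB of RAM
print(dedup_ram_gb(11))  # -> 16.5
```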
Break larger jobs down into smaller ones and then run those jobs simultaneously (your media server hardware should support multiple concurrent jobs). You may also use the client-side dedup feature if your media server is running short of resources.
Expect a better dedup ratio from flat-file backups than from database backups.
Also make sure you are using BE2010 R3 with the latest patches; previous versions had a few issues which are fixed in R3.
There is no way to calculate the size of a dedup folder beforehand: there is a dedup overhead which is unknown, and it also depends on how well your data deduplicates. VMs do not dedupe as well as flat files.
To track down the cause of your slow job, you can either break it up as you suggest, or expand the job log and check the backup and verify timings for each of the resources being backed up.
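There isn't a built-in BE tool for finding where the bulk of a file server's data lies, but a short script can do it, which helps decide how to split the job into smaller ones. A minimal sketch (the share path in the usage comment is a placeholder):

```python
import os

def top_level_sizes(root):
    """Total bytes under each immediate subdirectory of `root`,
    so you can see which folders to split into separate backup jobs."""
    sizes = {}
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            total = 0
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    try:
                        total += os.path.getsize(os.path.join(dirpath, name))
                    except OSError:
                        pass  # skip files we can't stat (locked, no access)
            sizes[entry.name] = total
    return sizes

# Usage (placeholder path) - print folders largest-first:
# for name, size in sorted(top_level_sizes(r"\\fileserver\share").items(),
#                          key=lambda kv: kv[1], reverse=True):
#     print(f"{name}\t{size / 1024**3:.1f} GiB")
```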
OK, so rather than backing up entire machines, would we be better off not deduping these VMs, just running full and incremental backups, and saving the deduping for, say, file servers?
Are there any hard-and-fast rules for dedupe?
No. I think you should just go ahead and dedupe everything and then see whether you are getting the benefits that you are expecting. If not, then revert that particular server/VM to normal backup.
Not a scientific method but on our typical office-type file server containing ~1.2TB of data, six months of weekly-full & daily-diff is using 2.5TB of dedupe space which is pretty impressive IMO. It means we've got plenty of expansion space on our 7.2TB dedupe disk storage or we could extend out the retention period beyond six months.
Typically, the differential backup grows to ~100GB (10%) by the end of the week, slowly ramping up as the week goes on.
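Running the rough numbers on those figures (all values are from the description above; the linear ramp of the daily diffs is an assumption):

```python
# Rough dedupe-ratio check for 26 weeks of weekly-full + daily-diff retention.
full_gb = 1200        # ~1.2TB weekly full backup
weeks = 26            # ~6 months of retention
# Daily diffs assumed to ramp roughly linearly to ~100GB by day 6.
diffs_gb = sum(100 * day / 6 for day in range(1, 7))  # ~350GB of diffs/week
raw_gb = weeks * (full_gb + diffs_gb)                 # ~40TB if stored flat
dedupe_gb = 2500                                      # actual usage: 2.5TB
print(f"raw ~{raw_gb / 1000:.0f}TB, deduped 2.5TB, "
      f"ratio ~{raw_gb / dedupe_gb:.0f}:1")
```

So the 2.5TB of dedupe space is standing in for something like 40TB of flat storage, a ~16:1 effective ratio under these assumptions.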
We'd like to be able to use our Advanced Disk Backup license to allow us to switch to daily incremental backups (using True Image for restore) but it doesn't support backup of distributed file system (grumble grumble).
We don't back up Exchange to dedupe as GRT Exchange backups don't dedupe - they simply get stored as Exchange image files. Hopefully compressed, but I doubt it... so we use B2D for Exchange as it's faster than dedupe.
No, we don't manually clear it up, as I hope that BE does that for us. There is an automatic management task that runs once a day, and I think one of its tasks is to flag expired files/blocks in the dedupe database for re-use.
I guess that if we had a temporary huge blip in storage use then we might consider clearing it up manually, but to be honest I'd just let it expire automatically.
So, continuing the very-back-of-envelope calculation, you could say 2 × disk usage for ~6 months of deduplicated storage in an average office-document-based organisation.
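Putting that rule of thumb against the original poster's numbers (the 2x multiplier is only the estimate above, from one office-type file server, not a guarantee):

```python
def dedupe_store_estimate_tb(data_tb, multiplier=2.0):
    """Very-back-of-envelope dedupe store size for ~6 months of
    weekly-full/daily-diff retention of office-type file data."""
    return data_tb * multiplier

estimate = dedupe_store_estimate_tb(11)  # the OP's ~11TB of data
print(estimate)  # -> 22.0
```

Note that 22TB is over BE2010's 16TB dedup folder limit, so on this estimate not everything would fit in a single dedup folder anyway.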