I'm planning a centralised backup solution which will use BE2010 dedupe to a single 11TB dedupe folder. This folder will be on an EMC CX4-series array with 13 x 1TB SATA drives in a RAID 5 set. The tape device will be LTO5, FC attached. What kind of performance am I likely to get during the duplication to tape? I need to understand this as I have to plan other backup jobs around this one.
My experience with deduplication is that it is far slower than traditional B2D storage.
I monitored disk I/O, memory usage and CPU usage, and none of them showed heavy load. It was just about 50% slower than B2D on most of the duplicate jobs (to a DLTS4 library).
Also, I don't know if you are aware of the stability issues with the deduplication feature. I recently migrated some media servers back to B2D after a case that was open with Symantec for a month didn't resolve problems with corruption, non-availability, creation of thousands of dummy media (all of which have to be cleaned out manually, which takes hours), etc.
There are some ongoing discussions about this on the Symantec forums.
As if any additional confirmation was needed that the dedupe side of Backup Exec is not fit for purpose, I have a colleague who is in the same situation on another customer job. Symantec support seem to be fixing one issue and causing 3 more to occur.
Thanks for responding.
If you want better dedupe performance, you should be doing client side dedupe, so you can have more simultaneous streams direct to disk.
From there, based on policy, you can have jobs duplicate to tape automatically as jobs finish. This way the duplicate-to-tape stage can happen during production hours without affecting any other network/application services.
We used client side dedup (default setup without the optional, poorly documented software compression) and our Symantec support technician told us not to run more than 3 (three) streams in parallel to the dedup store, as there are some known stability issues when running more than 3 or at most 4 parallel streams (instead of the maximum possible 16).
We also use policy based duplicate jobs, which were timed some hours after the backup jobs so that no backup and duplicate jobs would run in parallel, because this can also cause some problems with the dedup store.
Nevertheless, the throughput into and out of the dedup store wasn't as high as with backup-to-disk storage folders.
It was kind of funny, as the job rate went up and down between 600MB/min and 3000MB/min when doing a duplicate to tape, but no job ever reached the "normal" job rate of 3500MB/min that we got when B2D was used.
Again, there was no high CPU, memory or disk I/O usage, and it made no difference whether there was just one duplicate job running or two of them.
We also tried moving (or recreating) the dedup database to another, faster drive to check whether this would give better throughput, but it made no difference.
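For planning other jobs around the duplicate, the job rates quoted above can be turned into rough time windows. The following is only a back-of-the-envelope sketch: the 1 TB job size (taken as 1,000,000 MB) is a made-up example, the function names are hypothetical, and it assumes the rate stays constant for the whole job, which it clearly did not in our case.

```python
# Rough duplicate-to-tape window estimate from the job rates quoted in this thread.
# ASSUMPTIONS: job size is a hypothetical 1 TB (1,000,000 MB); rates are treated
# as sustained for the whole job, which real dedupe jobs did not achieve.

def duplicate_hours(data_mb: float, rate_mb_per_min: float) -> float:
    """Return the hours needed to move data_mb at a sustained rate_mb_per_min."""
    if rate_mb_per_min <= 0:
        raise ValueError("rate must be positive")
    return data_mb / rate_mb_per_min / 60.0

JOB_MB = 1_000_000  # hypothetical job size, not a figure from the thread

# Job rates in MB/min as reported above.
RATES = {"dedupe low": 600, "dedupe high": 3000, "B2D": 3500}

def estimate_windows(job_mb: float = JOB_MB, rates: dict = RATES) -> dict:
    """Map each rate label to the estimated duration in hours (one decimal)."""
    return {label: round(duplicate_hours(job_mb, rate), 1)
            for label, rate in rates.items()}

if __name__ == "__main__":
    for label, hours in estimate_windows().items():
        print(f"{label:12s} {hours:5.1f} h")
```

With these assumptions the same job takes somewhere between roughly 5.6 and 27.8 hours out of the dedup store, against about 4.8 hours from B2D, so it is safer to plan around the low end of the rate range.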
Here is some information about dedup compression: http://seer.entsupport.symantec.com/docs/347681.htm
I think dedup was just too new to be released to the public, but that's a problem with nearly every software vendor.
On the other hand, "Puredisk" has been on the market for a long time, and it works fine as far as I know. Migrating it to the Windows platform and integrating it into BE2010 as a logical tape library was a nice idea, but it was not implemented perfectly.
Symantec support still needs more feedback from customers experiencing dedup problems, but most customers don't have the time to be a guinea pig, especially not in production environments.