

DP101
Level 2
Certified

I have a customer with about 500 TB of data (300 TB at one site and 200 TB at the other) across two data centers roughly 500 miles apart.

They are currently on TSM, so this would be a completely new NBU install (which they want to do). They want to use tape only for monthly archives (3-year retention).

They want to cross-replicate and will put in a sufficient pipe based on our recommendations.

I was thinking 5000 series appliances: one source and one target for replication at each site. I'm a bit unsure on sizing, though. Is there a sizing calculator somewhere? Also, would it be better to create two domains and use AIR when replicating, or use one domain?

Thanks,

Felix

6 REPLIES

teiva-boy
Level 6

Your largest pool is what, 192 TB? I do not believe the Symantec appliances are big enough for your data set and throughput needs. You really should use NBU with DataDomain and OST/BOOST. The largest DataDomain is the 890 or the GDA; both are much larger and more scalable than the 5x00 series appliances.

DP101
Level 2
Certified

The 5020 series specs say a single-node system can logically hold 640 TB to 1.6 PB. With 6 nodes that scales to 3.8-9.6 PB. I don't understand how this is not scalable, as you stated. My customer is not an EMC fan for obvious reasons. Can you elaborate on why you think the 5000 series wouldn't work? I can configure the backups as rolling fulls, meaning I do 1/7th of a full each night.
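
Just to sanity-check my own numbers, a rough sketch (the per-node logical range is the figure from the spec sheet; the daily change rate is purely an assumption on my part):

    # Rough sanity check of the 5020 capacity and rolling-full figures above.
    NODE_LOGICAL_MIN_TB = 640    # low end of the quoted per-node logical range
    NODE_LOGICAL_MAX_TB = 1600   # high end of the quoted per-node logical range
    NODES = 6

    print(f"6-node logical capacity: "
          f"{NODE_LOGICAL_MIN_TB * NODES / 1000:.1f} - "
          f"{NODE_LOGICAL_MAX_TB * NODES / 1000:.1f} PB")

    # Rolling fulls: 1/7th of the 300 TB site gets a full each night,
    # the rest runs incrementals (2% daily change is an assumed value).
    SITE_FRONTEND_TB = 300
    ASSUMED_CHANGE_RATE = 0.02
    nightly_full_tb = SITE_FRONTEND_TB / 7
    nightly_incr_tb = SITE_FRONTEND_TB * (6 / 7) * ASSUMED_CHANGE_RATE
    print(f"Nightly full slice: ~{nightly_full_tb:.0f} TB, "
          f"incrementals: ~{nightly_incr_tb:.1f} TB")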

Thanks.

Sebastian_Baszc
Level 5
Partner Accredited Certified

Hey Mate,

 

The specification you quoted is for the ideal case. However, it doesn't account for the following:

1. Type of data: compressed data (JPG and ZIP archives, Oracle compressed file sets, etc.) doesn't deduplicate well.

2. Seeding: before data can start being deduplicated, the PureDisk databases first have to be seeded with hashes for the existing data. In deployments it is often assumed that the first full backup seeds the pool (SPA); in your case that is 300 TB.

My recommendation would be to get the pdat tool from Symantec and establish the deduplication ratio for the existing data. You can also look at the customer's data and look for similarities. If there is a lot of similar data, it will deduplicate well once the first full backup has been taken for a subset of it. For example, if they back up a lot of Wintel servers, C:\Windows is largely the same across all of them.

It may also be that the majority of the data can be archived before a backup is even taken. In that scenario you or the customer could look at implementing an archiving solution (EV, for example). Once the data is archived, it is also deduplicated by EV across all stores (mail, file servers, SharePoint, etc.).

 

If this 300 TB is file server data, you can use synthetic backups. In that scenario you would need to replicate (optimized duplication) only once, and afterwards just send incrementals: around 10% of the total data, reduced further by the deduplication ratio, so let's assume between 3 and 6 TB a day depending on the change and deduplication ratios. If they have a 1 Gb/s link between the DCs and need to transfer around 3 TB, then taking 1 Gb/s as roughly 100 MB/s (125 MB/s is the theoretical maximum) gives about 8.5 hours; to be safe, plan on around 12 hours just to transfer that data between locations. Keep in mind that by default optimized duplication takes all of the available bandwidth.
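
To make the arithmetic explicit, a quick sketch (the 3-6 TB/day figure and the ~100 MB/s effective throughput are the assumptions from the paragraph above, not measured values):

    # Back-of-the-envelope replication window over a 1 Gb/s link,
    # assuming ~100 MB/s effective throughput (125 MB/s is theoretical).
    def transfer_hours(data_tb, link_mb_per_s=100):
        data_mb = data_tb * 1000 * 1000   # TB -> MB, decimal units
        return data_mb / link_mb_per_s / 3600

    for daily_tb in (3, 6):
        print(f"{daily_tb} TB/day: ~{transfer_hours(daily_tb):.1f} h to replicate")

That lines up with the rough 8.5-hour figure above and is why I would plan for a 12-hour window.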

 

Hope it helps. 

 

S

Sean_Craig
Level 3
Employee Accredited

Hey Felix,

You've mentioned the total data at each site, but how much of it is duplicate data? I'd suggest contacting your local Symantec specialist, who can help with a dedup assessment tool. This should help with your sizing.

Sean

AbdulRasheed
Level 6
Employee Accredited Certified

 

If your environment is like most enterprise sites, it is highly unlikely that you have 100% unique data. When you say 300 TB on the front end, do you know how much of it is unstructured data (files) and how much is applications and databases? A Symantec field representative can help you with this sizing.

If you are cross-replicating, I would strongly recommend using AIR. That way you have an Active/Active DR setup: even when a site is completely down, you can recover the clients at the alternate site, from bare metal if needed. With NetBackup 7.5, AIR supports BMR.

Further, with NetBackup deduplication you also get NetBackup Accelerator at no additional cost. AIR and Accelerator are two of the biggest reasons to consider NetBackup 5020 appliances in this use case.

Chad_Wansing2
Level 5
Employee Accredited Certified

So the first place I would look is to draw logical lines in the data between things that will not deduplicate against each other. We use a calculator that asks for the total data size of different data sets such as Oracle, SQL, file system, and VMware. Things like file system and VMware dedupe very well against each other, but not at all against Oracle. You might be able to put the Oracle data in its own pool and lower the maximum required pool size without adversely impacting potential dedupe rates.
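
Purely as an illustration of that split (the data set sizes, dedupe ratios, and retention below are made-up placeholders, not Symantec sizing guidance):

    # Simplified per-pool sizing: logical data = frontend size x retained fulls,
    # back-end capacity = logical data / assumed dedupe ratio.
    datasets = {
        # name: (frontend TB, assumed dedupe ratio)
        "filesystem": (150, 10.0),
        "vmware":     (80,  12.0),
        "oracle":     (70,   3.0),   # structured data typically dedupes worse
    }
    RETAINED_FULLS = 4               # assumed on-disk retention

    for name, (frontend_tb, ratio) in datasets.items():
        logical_tb = frontend_tb * RETAINED_FULLS
        backend_tb = logical_tb / ratio
        print(f"{name:10s} {frontend_tb:4.0f} TB frontend -> "
              f"~{backend_tb:5.1f} TB back-end at {ratio:.0f}:1")

Splitting Oracle into its own pool like this keeps the largest single pool smaller without dragging down the dedupe rate of the file system and VMware data.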

Long story short: as an appliance and dedupe specialist, I work with customers every day to figure out scalable NBU appliance solutions for environments just like this. I now have almost 20 counterparts across the US this year (up from 9 last year, if that gives you a hint about the success the appliances are having and Symantec's commitment to this solution in the market) who can help you figure out an architecture for your customer. If nothing else, just drop me a line and I'll be more than happy to help!