Configuring Appliance for MSDP vs. Advanced Disk

rweiss
Level 2

I am in the process of configuring 3 NetBackup Appliances, and am trying to figure out how much space to allocate for Deduplication vs. Advanced Disk.  I work in an environment where we back up approximately 600 servers, and about 140 TB per week.  We do File Level backups as well as Database backups.

I want to use the 3 NetBackup Appliances so that any of the 3 appliances can do the backup and the restore as necessary, so that no single appliance is always doing the backup for the same server each time.  With that being said, if I were to consider using Deduplication, where would the initial Fingerprint lie, and how much disk space would it take up?  Each appliance has 28 TB of storage space.  Thanks.

 

2 ACCEPTED SOLUTIONS

sdo
Moderator
Partner    VIP    Certified

What form do your database backups take?  Agents, or dumps to disk?

Will your appliances also be replicating to another site?

Will your appliances also be duplicating to tape?

.

My thinking on this is initially in dedupe terms... in that there are three types of data to back up, or three levels of uniqueness:

1) Data that is pretty much the same globally, e.g. C: drives, System_State (for non-FSMO, non-DFS(R)), and application binaries (for applications that are popular at your site)

2) Data that is unique to one server, but doesn't change much, e.g. DFS(R) data, JPEG stores, mail PST files, MS Exchange

3) Data that is utterly unique (and appears to be 100% random) every day for every backup - e.g. compressed database dumps, large sets of incoming new data.

.

We need to remember that three appliances each containing one MSDP pool will effectively be three distinct islands of de-dupe, i.e. data backed up to appliance A will not be deduped against data backed up to appliance B or appliance C.

.

Now let's look at how to back up the above:

1) Send this set to any of the three appliances, because it is highly likely that similar backup data will have already been ingested.

2) Send this data to one appliance only, because this will achieve the best continuing de-dupe - i.e. you won't have the situation where each appliance is being used to retain one complete non-dedupable copy.

3) Send this data to advanced disk on any of the three appliances.  There is no point attempting to de-dupe this data.

.

So, how would I configure storage units for the above:

1) One storage unit group containing each MSDP pool from each appliance - (i.e. three storage units in one group)

2) Three separate storage unit groups, each containing just one appliance's MSDP pool. This way, if an appliance gets full or goes offline, you can easily redirect all backups for e.g. appliance B by changing the contents of storage unit group B to contain either or both of the storage units for appliances A and C.

3) One storage unit group containing all three Advanced Disk storage units.
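
The routing above can be sketched as a small lookup. This is only an illustration in Python, not NetBackup configuration: the class names and the storage unit group names (`stug_msdp_all`, `stug_msdp_a` and so on) are hypothetical.

```python
from enum import Enum


class DataClass(Enum):
    GLOBAL_COMMON = 1   # 1) much the same everywhere: C: drives, binaries
    SERVER_UNIQUE = 2   # 2) unique to one server, but slow-changing
    RANDOM_DAILY = 3    # 3) effectively random each day: compressed dumps


# Hypothetical storage unit group names, one entry per scenario above.
MSDP_ANY = "stug_msdp_all"          # group holding all three MSDP pools
MSDP_PINNED = {"A": "stug_msdp_a", "B": "stug_msdp_b", "C": "stug_msdp_c"}
ADV_DISK_ANY = "stug_advdisk_all"   # group holding all three Advanced Disk units


def pick_storage_unit_group(data_class, home_appliance=None):
    """Return the storage unit group a policy for this data class would use."""
    if data_class is DataClass.GLOBAL_COMMON:
        return MSDP_ANY
    if data_class is DataClass.SERVER_UNIQUE:
        if home_appliance not in MSDP_PINNED:
            raise ValueError("server-unique data needs a home appliance: A, B or C")
        return MSDP_PINNED[home_appliance]
    return ADV_DISK_ANY
```

For example, `pick_storage_unit_group(DataClass.SERVER_UNIQUE, "B")` returns the group that contains only appliance B's MSDP pool, which is the group you would later edit if appliance B filled up or went offline.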

.

The tricky part, which none of us can help you with because we have no visibility or understanding of your servers and their data contents, is how to identify which server backups land in scenarios 1), 2) or 3) from above.  It would seem to me that 600 servers and 140 TB of backup data is too much for any single backup admin to remember what they all are, and what their data behaviour patterns and uniqueness are.  So, my advice would be to identify a couple of colleagues/co-workers who do understand the nature and behaviour of the applications and data, and pepper them with questions to help you understand how to configure your backup policies into the three "uniqueness scenarios".

HTH.

Marianne
Level 6
Partner    VIP    Accredited Certified

I am wondering what kind of design/planning/sizing was done before the Appliances were purchased.

Are you aware of the fact that there is no global dedupe across Appliances? 
That each Appliance is in fact a 'dedupe island'? Each with own Dedupe/fingerprint db?
That having 3 small 28 TB Appliances is not the same as having one large Appliance?

It seems you want to back up all of your data to all 3 Appliances - were they sized to hold all client data (140 TB) on each of them?

You say the following :

"I want to use the 3 NetBackup Appliances so that any of the 3 appliances can do the backup and the restore as necessary"

When Monday's backup for Client1 writes to Appliance1, and Tuesday's backup to Appliance2, then only Appliance1 can restore Monday's backup and only Appliance2 can restore Tuesday's backup.

 


8 REPLIES 8

sdo
Moderator
Partner    VIP    Certified

I was wondering about this too.  I more or less assumed (on the side) that sizing and scoping had been done, and that three small appliances were chosen because possibly they were simply going to be used as advanced disk to duplicate to tape (I know the OP referenced MSDP, but it didn't seem right). I wasn't sure how you'd fit multiple weeks of (probably) fulls and multiple days of (probably differential) incrementals into 3 x 28 TB anyway. I suppose it could work if good (or really good) de-dupe rates are achieved, but that all depends upon the nature of the data factored by the rate of change. Then again, maybe three small appliances were chosen because of the "quoted" max re-hydration rates to tape... but aren't these supposed to be better now anyway?

.

What was missing from all this was how they arrived at 3 x 28 TB appliances.  If no justification, then at least a rationale.  We may never know.  From a scoping/sizing angle I'm confused.
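
To put a rough number on the sizing doubt, here is a back-of-envelope sketch in Python. It takes the 140 TB per week and 3 x 28 TB from the original post; the retention periods are pure assumptions, and it ignores the fact that the three pools do not share fingerprints, so treat the result as a lower bound on the de-dupe ratio needed.

```python
def required_dedupe_ratio(weekly_backup_tb, retention_weeks, usable_capacity_tb):
    """Logical data held on disk divided by the physical space available."""
    logical_tb = weekly_backup_tb * retention_weeks
    return logical_tb / usable_capacity_tb


capacity_tb = 3 * 28  # three appliances at 28 TB each, as stated by the OP

# Retention periods below are assumptions, not figures from the thread.
for weeks in (1, 2, 4):
    ratio = required_dedupe_ratio(140, weeks, capacity_tb)
    print(f"{weeks} week(s) on disk -> about {ratio:.1f}:1 needed")
```

Even holding a single week on disk needs better than 1:1, so without meaningful de-dupe the appliances could only act as short-term staging ahead of tape.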

RiaanBadenhorst
Moderator
Partner    VIP    Accredited Certified

One more thing to consider in terms of SDO's Type 2 data set.

If you've got, say, a 2 TB Oracle database, you don't want to be backing it up to all 3 appliances. If you back it up to each one you'll have used 6 TB by the end of the 3rd backup instead of (potentially) 2.1 TB.

You'll immediately be losing (wasting) 4 TB of your total 54 TB.

It's better to keep it backing up to a single appliance.
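
The arithmetic behind those figures can be sketched as follows. This is a simplified model in Python: it assumes backups rotate round-robin across the appliances and that each repeat backup on the same appliance adds 2.5% new data, a figure picked only because it reproduces the "potentially 2.1 TB" above.

```python
def msdp_space_used_tb(db_size_tb, num_backups, appliances_in_rotation,
                       unique_fraction_per_backup=0.025):
    """Estimate space consumed across MSDP pools that do not share fingerprints.

    The first backup an appliance sees is stored in full; each later backup
    on that same appliance adds only the assumed unique fraction.
    """
    total = 0.0
    for appliance in range(appliances_in_rotation):
        # Number of backups landing on this appliance under round-robin.
        landed = len(range(appliance, num_backups, appliances_in_rotation))
        if landed == 0:
            continue
        total += db_size_tb
        total += (landed - 1) * db_size_tb * unique_fraction_per_backup
    return total


spread = msdp_space_used_tb(2.0, num_backups=3, appliances_in_rotation=3)
pinned = msdp_space_used_tb(2.0, num_backups=3, appliances_in_rotation=1)
print(f"spread across 3: {spread:.1f} TB, pinned to 1: {pinned:.1f} TB")
```

With these assumptions the spread case stores three full copies (6 TB), while the pinned case stores one full copy plus two small increments (about 2.1 TB).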

rweiss
Level 2

I found out these 3 appliances are actually 40 TB each.  The databases we back up are database dumps.  All of the data we back up will also be going to tape, and we are not replicating to another site.  We copy from disk to tape, and then ship tapes offsite.  So the understanding I am generally getting is that for more static data we should be using Dedup, but for databases we should be using Advanced Disk?

 

So when configuring for Dedup, if we migrate a particular client (in testing) from our media server to an appliance, and back it up, it stores an initial fingerprint on that appliance.  Each time it backs up thereafter, it only backs up to that same appliance, and backs up just the changes since the stored fingerprint. Is this logic correct?

sdo
Moderator
Partner    VIP    Certified

For uncompressed database dumps you should be able to use MSDP, and for compressed database dumps I would use advanced disk as a staging area.

The target of subsequent backups is never based purely upon the location of previous backups.  The target of a backup is always based upon the storage unit specified in a backup policy or SLP. That could be a storage unit group pointing to several storage units, in which case other balancing algorithms come into play depending upon the nature of the storage units themselves.
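
To illustrate why the previous backup's location does not decide the next target, here is a toy model in Python of a storage unit group that simply rotates through its members. The storage unit names are made up, and real selection also depends on how the group is configured and on which storage units are available, which this sketch ignores.

```python
from itertools import cycle


def assign_targets(storage_units, backup_days):
    """Pair each backup with the next storage unit in simple round-robin order."""
    chooser = cycle(storage_units)
    return {day: next(chooser) for day in backup_days}


# Hypothetical storage unit names, one MSDP pool per appliance.
units = ["msdp_appliance_a", "msdp_appliance_b", "msdp_appliance_c"]
targets = assign_targets(units, ["Mon", "Tue", "Wed", "Thu"])
for day, unit in targets.items():
    print(day, "->", unit)
```

Under this model the same client lands on a different pool each day, so every pool ends up holding its own fingerprints for that client rather than just the changes.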

RiaanBadenhorst
Moderator
Partner    VIP    Accredited Certified

Using dumps for Oracle and then backing them up is very 1990.

If you're being pushed around by the DBAs to have it THEIR way, look up Oracle Co-pilot (a new feature of the 2.7.1 code release). At least that way you can have some sort of control over the Oracle backups whilst also keeping the DBAs happy.

Marianne
Level 6
Partner    VIP    Accredited Certified

Good dedupe rates for Oracle RMAN can be achieved if the guidelines in this doc are followed (look for Oracle):

Symantec NetBackup 52xx and 5330 Appliance Capacity Planning and Performance Tuning Guide: DOC8187 

It seems you may benefit from reading the rest of the doc as well.
You may also want to ask your Veritas Partner to assist with dedupe sizing, even after the fact.

Please do not try to configure backups to run to any/all Appliances (as per my previous post).