I have multiple appliances in our environment.
We have added multiple appliances to a storage group. I have a few queries:
1. If a server is backed up on one appliance, will dedup work across all appliances, or will it first take a full backup on every appliance in that SG before dedup takes effect?
2. As per my understanding, it will take a backup copy on all appliances and then dedup will work per appliance. If so, do we need to configure policies appliance-wise?
Regarding "added multiple appliances in storage group" - by storage group, do you mean a storage unit group?
Upon configuration, an appliance comes with both a dedup (MSDP) pool and an AdvancedDisk pool. You can have all of your backups go into the MSDP pool; the dedup operation works internally, and you don't have to worry about it. To configure the backup, create a policy and direct it to the storage unit you created for the MSDP pool.
Deduplication works across all backups on a single appliance. I have seen deduplication kick in on the very first backup to an appliance, from the very first client in an environment; it is not on a client-by-client basis. If client B has blocks matching those that client A had already pushed across to MSDP X, then those blocks will be discarded. However, the blocks need to be on MSDP X; even if they are on MSDP Y and MSDP Z, MSDP X needs to get them from somewhere in order for deduplication to work. So, to answer your questions:
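The per-pool behaviour described above can be sketched with a toy content-addressed store. This is purely illustrative - real MSDP fingerprinting and segmentation are far more sophisticated - but it shows why blocks dedup within one pool regardless of which client sent them, and never against another pool:

```python
import hashlib

class MSDPPool:
    """Toy model of a single MSDP pool: each unique block is stored once."""
    def __init__(self, name):
        self.name = name
        self.blocks = {}          # fingerprint -> block data

    def ingest(self, data, block_size=4):
        """Back up `data`; return (blocks_written, blocks_deduped)."""
        written = deduped = 0
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            fp = hashlib.sha256(block).hexdigest()
            if fp in self.blocks:
                deduped += 1      # block already in THIS pool: discard
            else:
                self.blocks[fp] = block
                written += 1
        return written, deduped

pool_x = MSDPPool("X")
pool_y = MSDPPool("Y")

# Even the very first backup to X dedupes internally if the data
# itself contains repeated blocks.
print(pool_x.ingest(b"AAAABBBBAAAA"))   # -> (2, 1)

# Client B shares a block with client A: deduped against pool X,
# regardless of which client originally wrote it.
print(pool_x.ingest(b"BBBBCCCC"))       # -> (1, 1)

# The same data sent to pool Y dedupes against nothing:
# Y has no knowledge of what X holds.
print(pool_y.ingest(b"BBBBCCCC"))       # -> (2, 0)
```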
Riann: In my environment we have 5220 and 5230.
Watson: Yes, it is a storage unit group.
I understand this, but my question is: we have specified the storage unit group in the policy, and that group contains appliances X and Y. Suppose client C's first backup lands on appliance X, and its second backup lands on appliance Y. Will the second backup dedup against the data on X or not?
As per my understanding it should not, but I just want to be sure.
Every single appliance is an MSDP pool by itself, so to your question: no, when client C backs up to appliance Y, it won't dedup against the data on X. It has to create a "baseline" in Y, which then becomes its future reference, just like on X.
The MSDP databases in X and Y do not communicate with one another, so there is no way Y knows what X has deduplicated.
Both backups will take up the full amount of space. No dedup, unless the data dedupes against like data from other clients already in the respective pool.
@adhinav - how many MSDP pools do you have?
I once had to balance several hundred policies and several thousand clients across 14 different (64 TB) MSDP pools - tricky - so I wrote a script to round-robin balance/distribute backup clients, from largest to smallest (also taking into consideration the times the schedules fired/initiated), so that we got a very even workload and a very even fit of clients across the MSDP pools. And I did NOT use just one storage unit group for all MSDP pools; instead I used one storage unit group (with one storage unit in it) per MSDP pool, so that in the event of an MSDP going down, all we had to do was change the storage unit group to point to a different storage unit, rather than change many SLPs and many hundreds of backup policies.
The script would generate all of the policy amendment commands to balance/distribute/update all schedules in all policies to use the correct SLP. It was written as a one-off hit, as you don't really want clients hopping around different MSDP pools. And, as part of writing it, I had it generate quite nice stats on pool de-dupe ratios, so that I could understand the balancing and consumption of pool space. It could also list such things as the top 20 largest policies/clients/SLPs, and it produced a form of customer charging totals at the same time.
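The core of the balancing sdo describes can be approximated with a simple greedy pass: sort clients largest-first and always assign the next client to the currently least-loaded pool. This is a minimal sketch with made-up client names and sizes, not sdo's actual script, and it ignores schedule timing entirely:

```python
from heapq import heappush, heappop

def balance(clients, pool_names):
    """Greedy largest-first balancing of clients across pools.

    clients: dict of client name -> backup size (TB)
    Returns {pool: [clients]} with roughly even total load per pool.
    """
    heap = [(0.0, name, []) for name in pool_names]   # (load, pool, members)
    for client, size in sorted(clients.items(), key=lambda kv: -kv[1]):
        load, pool, members = heappop(heap)           # least-loaded pool
        members.append(client)
        heappush(heap, (load + size, pool, members))
    return {pool: members for _, pool, members in heap}

# Hypothetical client sizes, loosely matching the 2-3 TB clients
# mentioned elsewhere in this thread.
clients = {"db01": 3.0, "db02": 2.5, "win01": 2.0, "win02": 1.5,
           "unix01": 1.0, "unix02": 0.5}
for pool, members in sorted(balance(clients, ["MSDP_X", "MSDP_Y"]).items()):
    print(pool, members)
```

Once each client has a target pool, generating the per-policy amendment commands (as sdo's code did) is a separate step that has to account for schedules, SLPs, and site specifics.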
Sdo: Could you please be more specific - e.g. which script did you create? We have 28 appliances, which means 28 PureDisk pools.
We are also using storage unit groups for all types of backups: Unix, Windows, DB, and so on.
But in this situation a Windows client backup can go to any media server, so we need a more specific approach. The clients in our environment are very large, like 2-3 TB each.
Re your use of STGs as backup targets to any/many MSDP storage units. My advice is to sit back for a few moments, and try to imagine where your data is de-duplicating. Several of us have already learnt not to use multiple MSDP STUs in STGs.
Re balancing policies and clients - I was only trying to give the briefest of outlines as to how I tackled the problem, in an effort to steer you in the right direction, and I also tried to remember some of the positive benefits (not so much side effects) of what could also be produced when the problem of balancing backup clients across MSDP pools is tackled programmatically.
It's a little bit unfair to say it was done with scripting, as the script wasn't really a script in the end - it became a large body of code, circa 3,600 lines (including comments and white space) - and should instead really be thought of as an applet or medium-sized application program to reconfigure the SLPs used by all backup policies.
It wasn't easy, and it took several man-weeks of coding effort to develop and debug, spread over two and a bit months. Personally, I don't think this is really something that the average scripter (doing 10-to-50-line scripting) could tackle, as you almost have to have a background in applications programming to do something like that - it involves quite a lot of stateful manipulation of data (using full lists from various sources: images, policies and schedules, clients, STUs, STGs, SLPs). I'm not sure that I've got the time to outline pseudo-code for you. And it would not be possible to develop this out of sight, because it is highly likely that a developer/programmer would have to nurture the code and grow/create the complexities to adapt to all manner of site-specific requirements for your environment.
Have you thought instead of perhaps carefully manually balancing, say, the top 10% to 20% (by size) of backup clients, and then just apportioning the rest of the backup clients as best you can, fine-tuning/tweaking afterwards? This way you could probably achieve something fairly useful within a moderate time frame. OK, it won't be perfect the way a program can do it, but at least it'll be something.
So - to get back to your original questions:
1. If a server is backed up on one appliance, will dedup work across all appliances
...sdo: No. De-dupe for an MSDP pool only works against what has previously been ingested by that MSDP pool. De-dupe is 'MSDP local' only, not 'NetBackup domain global'.
or will it first take a full backup on every appliance in that SG before dedup takes effect?
...sdo: I don't understand the question.
2. As per my understanding, it will take a backup copy on all appliances and then dedup will work per appliance. If so,
...sdo: I don't understand this part of the question.
do we need to configure policies appliance-wise?
...sdo: yes, this is what I would do.
I think this statement may help you too...
When a client is backed up to an STG containing multiple STUs for MSDPs, the decision as to which STU the backup client actually writes to is based upon resource allocation only - so...
...backup clients do not establish 'affinity' with STUs or MSDPs - they simply back up to whichever STU NBRB (the NetBackup Resource Broker) has decided upon.
It would be a very nice feature if we could have an SLP parameter along the lines of 'MINUTES_TO_WAIT_FOR_PREVIOUS_STU', to somehow get the resource broker to wait a while - and maybe another parameter like 'NUMBER_OF_PREVIOUS_STUS_TO_WAIT_FOR', such that NBRB would remember the 'n' previous STUs and wait for one of those. At least then we could use some STGs with maybe two, three, or four MSDP STUs in them, and not worry so much about loss of de-dupe ratios when backup clients appear to randomly hop (due to next immediate resource allocation) around the many STUs in an STG.