I would like to understand better how optimized duplication works between two 5240 Appliances.
I have a case of slow duplication between two sites, each with a 5240. A case was opened with Veritas, and it turned out that the pipe between the sites is not big enough to accommodate the heavy duplication traffic. That's fine and has more or less been accepted by the customer. However, one thing that does not make sense is how Backline explained the way data is transferred from the source Appliance to the target. Backline said that in the SLP the first job is the Backup, followed by the Duplication. Dedupe on the first Appliance happens when the Backup job runs; once the Backup job completes, the whole backup image is transferred to the target Appliance and is then deduped on that remote Appliance. For example, say the backup image is 1TB and has to be duplicated to the remote Appliance. The full 1TB passes through the link between the sites to the remote Appliance, and only then is it deduped by that remote Appliance. That means the full 1TB has already crossed the link, consuming that much bandwidth on a small pipe. I was surprised, but they said that is how it is designed.
My question is: is this really how optimized duplication works?
No, that's not how it works, unless something has radically changed recently. Duplication between MSDP pools (e.g. 2 x 5240 Appliances) simply checks whether the target pool already holds the segments within the images being duplicated; if it does, only references to those segments are sent. If it doesn't, the complete segment is sent. So when a new duplication target is set up, it does initially send each segment, but each segment is only sent once. Once the target is seeded with segments, deduplication rates for backups and duplications should be the same.
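To make the idea concrete, here is a rough conceptual sketch in Python. It is not how MSDP is actually implemented: the fixed segment size, the SHA-256 fingerprints, and the names Pool, backup and duplicate are all assumptions made up for illustration. It only shows the principle that segment data crosses the link solely when the target does not already hold that segment.

```python
import hashlib

# Assumed fixed segment size, for illustration only.
SEGMENT_SIZE = 128 * 1024


def split_segments(data: bytes, size: int = SEGMENT_SIZE):
    """Cut an image into fixed-size segments (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(segment: bytes) -> str:
    """Hypothetical fingerprint; SHA-256 is just a stand-in here."""
    return hashlib.sha256(segment).hexdigest()


class Pool:
    """Toy dedupe pool: unique segments plus per-image reference lists."""

    def __init__(self):
        self.segments = {}  # fingerprint -> segment bytes
        self.images = {}    # image name -> ordered list of fingerprints

    def backup(self, name: str, data: bytes) -> None:
        refs = []
        for seg in split_segments(data):
            fp = fingerprint(seg)
            self.segments.setdefault(fp, seg)  # store each segment only once
            refs.append(fp)
        self.images[name] = refs


def duplicate(source: Pool, target: Pool, name: str) -> int:
    """Copy an image to the target; return bytes of segment data sent."""
    sent = 0
    refs = source.images[name]
    for fp in refs:
        if fp not in target.segments:   # target lacks it: send the segment
            seg = source.segments[fp]
            target.segments[fp] = seg
            sent += len(seg)
        # otherwise only the reference (the fingerprint) needs to travel
    target.images[name] = list(refs)
    return sent
```

In this toy model, the first duplication to an empty target sends every unique segment, while a later image that differs in a single segment sends only that one segment.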
Note that this only applies when duplicating between compatible source and target types; if you try to duplicate between, say, MSDP and DataDomain, it will rehydrate the source images and send everything to the DataDomain for it to apply its own deduplication algorithms.
The above also applies to replication between different NetBackup domains (AIR).
Regarding your issue, be aware that a WAN link may have a given theoretical capacity, but often only part of it can be used because of latency, packet loss, etc. For example, see https://www.4bridgeworks.com/white-papers
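As a back-of-the-envelope illustration of why latency and loss matter, the sketch below estimates an upper bound for a single TCP stream. It uses two standard approximations: throughput cannot exceed window size divided by round-trip time, and under random loss it is roughly bounded by the Mathis formula, MSS/RTT x 1.22/sqrt(loss). The function names and example figures are assumptions for illustration, not measurements from your link or anything NetBackup-specific.

```python
import math


def window_limited_bps(window_bytes: float, rtt_s: float) -> float:
    """At most one window of data can be in flight per round trip."""
    return window_bytes * 8 / rtt_s


def mathis_limited_bps(mss_bytes: float, rtt_s: float, loss_rate: float) -> float:
    """Mathis et al. approximation for a loss-limited TCP stream."""
    return (mss_bytes * 8 / rtt_s) * (1.22 / math.sqrt(loss_rate))


def effective_bps(link_bps: float, window_bytes: float, rtt_s: float,
                  mss_bytes: float = 1460, loss_rate: float = 0.0) -> float:
    """Rough upper bound for one TCP stream: the smallest of the limits."""
    limits = [link_bps, window_limited_bps(window_bytes, rtt_s)]
    if loss_rate > 0:
        limits.append(mathis_limited_bps(mss_bytes, rtt_s, loss_rate))
    return min(limits)


# Hypothetical example: 1 Gbit/s link, 64 KiB window, 50 ms round trip.
estimate = effective_bps(1e9, 64 * 1024, 0.050)
print(f"{estimate / 1e6:.1f} Mbit/s per stream")
```

With those assumed numbers, a single stream tops out at around 10 Mbit/s no matter how large the link is, which is why a link can look underused while duplication still runs slowly.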