Deduplication strategy is dictated by the requirements of the organization, technical bottlenecks, and operating conditions. Ideally, you would deduplicate both at the source (at the client) and at the target (globally). This provides the best of both worlds: after the initial seeding, only unique or changed data consumes bandwidth in transit, and the data is then deduplicated globally at its final resting place. The debate over appliances, post-processing, and in-line processing is the same argument the industry has revisited throughout my career.
First it was the backup agent, then it was storage, and now it is deduplication-specific. The question becomes the weight of the processing and where to place it. While this is important, we need to take a step back and look at the larger picture: how does deduplication fit into the overall enterprise architecture, and where will its placement realize the greatest benefit? The processing must take place somewhere. If we transfer fully populated data sets and only then deduplicate them, we see partial benefit at best; the network across which we transfer the data gains nothing from post-processing. Only by considering the entire environment do we realize the greatest benefit.
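The source-versus-target trade-off above comes down to where chunk fingerprints are computed and compared. As a minimal sketch (not any vendor's implementation; the fixed-size chunking, SHA-256 fingerprints, and `DedupStore` class are illustrative assumptions, and real products typically use variable-size chunking), a content-addressed store shows why only unique chunks need to be stored or transferred:

```python
import hashlib

CHUNK_SIZE = 4096  # fixed-size chunking for simplicity; real systems often chunk by content


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks and fingerprint each one."""
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        yield hashlib.sha256(chunk).hexdigest(), chunk


class DedupStore:
    """Hypothetical target-side store keyed by chunk hash: duplicates land once."""

    def __init__(self):
        self.chunks = {}  # fingerprint -> chunk bytes

    def ingest(self, data: bytes):
        """Store unique chunks only; return (chunks_seen, chunks_stored)."""
        seen = stored = 0
        for digest, chunk in chunk_hashes(data):
            seen += 1
            if digest not in self.chunks:
                self.chunks[digest] = chunk
                stored += 1
        return seen, stored


store = DedupStore()
print(store.ingest(b"a" * 8192))  # two identical chunks: (2, 1)
print(store.ingest(b"a" * 8192))  # same data again: (2, 0), nothing new stored
```

Source-side deduplication moves the `chunk_hashes` step to the client, which first asks the target which fingerprints it already holds and then ships only the missing chunks. That is the bandwidth saving described above, at the cost of client-side processing.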
The market has matured and grown dramatically, so any comparison of products is only as good as the most recent information available. But products are not a solution; the people and process around the products make a solution. Examine the requirements in detail to make certain they are not skewed toward, or beneficial only to, a particular group within the enterprise. This will allow you to garner the greatest benefit for the organization.