Copy Data Management is not a Replacement for Backup
Copy Data Management (CDM) is a very hot topic in both the backup-and-recovery and storage-management markets today. Technology startups are coming out of stealth mode, investors are funding new ventures, and established hardware and software vendors are entering the market.
There are two key messages being promoted by many of the CDM vendors and even some analysts as they seek to disrupt the market and gain rack space in the datacenter:
- Copies of data are inherently bad
- Backup is a useless copy and therefore is inherently bad and you should stop doing backups
The idea that copies are inherently bad is a strong statement, and any technology that virtualizes or diminishes the resource tax associated with creating copies of data is going to immediately look positively disruptive, since anything that reduces storage workloads has to be a good thing, right? Maybe, but let’s not get too excited. You need copies of data to run your business – so not all copies are inherently bad. Having the right number of copies to keep your organization up and running is the better perspective.
This isn’t the first time we’ve heard this kind of bravado, remember. Snapshots, business continuance volumes, storage virtualization devices and other technologies have all made similar claims, with extremely limited success. So before we replace backup technology – a critical component of every infrastructure – with CDM, let’s look at things a little more closely.
At the heart of this market movement is the idea that creating copies of data has business value. Users need to do things, sometimes many different things, with the data created by business processes. Users have been asking for copies to work with and conduct analysis against since the ENIAC. One of the reasons computing is so powerful is that it allows us to create new information paradigms and reach new levels of understanding from the information we create. The need to copy data is a business requirement, plain and simple, and should be one IT is ready, willing and heartily able to accommodate. Unfortunately, it’s an expensive requirement to fulfill, and the tab for that expense rarely seems to be picked up by the people asking for 5 new copies of a 60TB database.
The promise of these emerging CDM technologies is that they give the perception to the user that they have a full copy of the data, without actually charging the tax of storing the copy. This is extremely powerful and has tremendous value. Many CDM companies would then have us believe that these copies are appropriate to be used as our backups.
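To put rough numbers on that promise, here is a minimal sketch in Python. It assumes a CDM platform stores only the blocks that each virtual copy changes; the 2% change rate is an invented figure for illustration, and the function names are hypothetical rather than taken from any product.

```python
def full_copy_storage_tb(source_tb: float, copies: int) -> float:
    """Storage consumed when every copy is a complete physical copy."""
    return source_tb * copies


def virtual_copy_storage_tb(source_tb: float, copies: int, change_rate: float) -> float:
    """Storage consumed when each virtual copy keeps only the blocks it changes.

    change_rate is the assumed fraction of the source that each copy modifies.
    """
    return source_tb * copies * change_rate


# The example from the text: 5 new copies of a 60TB database.
full = full_copy_storage_tb(60, 5)
# Assumption for illustration only: each copy changes 2% of the data.
virtual = virtual_copy_storage_tb(60, 5, 0.02)
print(f"full copies: {full:.0f} TB, virtual copies: {virtual:.0f} TB")
```

Under these assumptions the five full copies consume 300 TB, while the virtual copies consume about 6 TB. The real figure depends entirely on how much each copy diverges from its source.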
The promise sounds almost too good to be true. Think about it for a second. You can create copies of a data set, all with no actual storage being spent on the copies, and you can recover instantly from the virtual copy by just pointing your production server at the copy. Suddenly the whole world of backup and recovery has changed. Anything too good to be true generally is, and “CDM equals free backup” is no exception. Let’s be crystal clear on this point – in its current form, copy data management is not backup. To be a true backup replacement, CDM needs to encompass the entire datacenter in its scope, not just databases; it needs cataloging, search and versioning capabilities built into the system; and it needs to take incremental updates from all data sources natively. No CDM product today does that across the datacenter. Most don’t do it even for databases.
Could CDM eventually merge with “traditional” backup and become one product? Absolutely, and that is entirely likely. Can CDM technology as it exists today replace certain recovery processes? Yes. There is a time and a place where a CDM virtual copy could be used to recover a mission-critical database or virtual machine, for instance, and deliver an incredible RTO. There is no question that, like business continuance volumes or NAS-based snapshots, with the right preparation and in the right scenario, CDM technologies will prove to be a recovery technology of choice.
That doesn’t make them a backup replacement, however.
The fundamental premise upon which the “CDM equals backup” argument is made is that all copies of data are bad. This is simply inaccurate. The problem arises when you have too many copies. You have to find somewhere to store them, maintain secure access to them, keep the data within them compliant and make sure they don’t leak data out of the organization. Managing all that is tough. How many is too many, though? That gets the proverbial “it depends” answer.
“How many” probably isn’t even the right question to ask in this context, though. The question to ask is whether any given copy is more expensive to maintain than the business value it provides. If the answer is yes, then that copy should not be made. This doesn’t necessarily require detailed financial analysis. A simple smell test works – can the requestor of that copy get their job done without it? No? Then make the copy already, or provide a way for the requestor to make it themselves. Put another way – will the business stop without a given copy? If you can realistically say yes, then make the copy; otherwise move on. CDM allows us to more readily say 'yes' because we can virtualize those copies that make sense to make, reducing the burden and cost associated with them.
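The test described above can be written down as a small decision rule. The following Python sketch is one possible reading, not a formal method: the `CopyRequest` fields and the `should_make_copy` function are hypothetical names, and combining the smell test with the cost comparison in this order is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CopyRequest:
    name: str
    annual_cost: float        # hypothetical estimate: store, secure, keep compliant
    annual_value: float       # hypothetical estimate of the business value provided
    work_stops_without_it: bool  # the "smell test" from the text


def should_make_copy(req: CopyRequest) -> bool:
    """Return True if the copy should be made under the assumed rule."""
    # Smell test first: if the requestor cannot do the job without it, make it.
    if req.work_stops_without_it:
        return True
    # Otherwise, refuse any copy that costs more to maintain than it is worth.
    return req.annual_value >= req.annual_cost


requests = [
    CopyRequest("month-end reporting", 10_000, 2_000, True),
    CopyRequest("ad hoc sandbox", 10_000, 2_000, False),
    CopyRequest("analytics refresh", 3_000, 8_000, False),
]
for req in requests:
    print(req.name, "->", "make" if should_make_copy(req) else "skip")
```

Virtualizing a copy lowers `annual_cost`, which is how CDM moves more requests into the "make" column.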
Backup simply doesn’t fall into this category of conversation, though. There is obvious and inherent value in having an actual, complete backup copy of data that is physical, separate from production, and can be recovered from or to anywhere. “Anywhere” is a key element of the value a backup copy can provide. If you can’t take a physical copy of the data to any service provider, a datacenter on wheels, or any alternate location and recover without significant infrastructure building, the backup is of exceedingly limited utility, and perhaps your backup process is in need of a re-architecture. This need to get truly mission-critical systems back online anywhere, anytime is one of the reasons why so many highly sophisticated organizations still have an underpinning of tape in their recovery strategies. Does this mean there is inherent value in every backup copy you have? Absolutely not, and if CDM does nothing else in the market, it should have us asking why in the world we have 20-year-old tapes we’re paying to store when we know, in our heart of hearts, that they have absolutely zero value.
CDM simply doesn’t meet the inherent value proposition of backup, however. To be useful as a recovery platform, it requires like infrastructures in already prepared locations replicating the virtual copies. That isn’t cost effective or feasible for every environment, and it perpetuates a philosophy that can actually lead to data loss. What happens when – most certainly not if – that CDM platform fails and your virtual copies disappear? Where is the reliable, known-good copy you can turn to and get your production data back? And look, this isn’t fear-mongering by an incumbent data protection provider. This is about looking at things honestly and using the right technology for the right use case. CDM has great use cases. Backup just isn’t generally one of them.
Treat your information like your money and protect it accordingly. Don’t abandon that tried and true philosophy just because the hype around CDM is high and climbing. CDM has its place in many (if not all) environments, but not at the cost of truly and completely protecting your data. As with any technology, look at this emerging trend and put it where it belongs in your infrastructure: as an element of your recovery and information management strategy, not the be-all and end-all of it.
Our passion is protecting your data. Check out news and insights from the Veritas Protection blog addressing datacenter issues like disaster recovery, complete data management, backup, and recovery protection.