A CDP Journal...in reverse

Joe_Pfeiffer · ‎10-04-2010

One of the more intricate details when discussing CDP is how all the data is actually stored inside the CDP product. NetBackup RealTime is unique in the sense that it has a patented reverse-journal that stores data differently than other CDP offerings. The best way to explain it is with an example.

Let's say you have a 1TB application of some kind and over the course of time you have the following 3 changes:

Change 1, 100GB
Change 2, 100GB
Change 3, 100GB

This is a massive simplification, since CDP works at the block level, so these changes would not really be 100GB each but something like 4KB blocks. I'm assuming that change 1 came in first, followed by change 2 and finally change 3. So after these 3 changes the app is now 1.3TB in total.

Most CDP products take an initial full mirror of 1TB when you first start protecting the app and store it on a LUN or volume of some kind. A journal on a second LUN holds all the changes (100GB x 3 = 300GB in this case). When you do a CDP recovery, the product combines the two locations to create a new LUN that you can mount and read data from. The time to create that recovery LUN is proportional to how much journal data has to be combined with the mirror. If you pick the very first point in time, nothing has to be combined, so the recovery LUN is mounted right away. If you pick the very latest point in time (change #3 in the example), it has to combine all the journal data with the original LUN. That takes the longest, since it has to roll each 100GB change into the 1TB to get to the 1.3TB.

There are ways to optimize around this: you can roll those changes into the original mirror over time as "maintenance activity". In fact, maintenance activity is a requirement in this architecture, since you typically only want to keep a week or two in the journal. Once the oldest change expires it has to be merged into the original mirror, much like a synthetic backup combines a full with its incrementals, because throwing the change away would be throwing the data away.
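To make the forward-journal behavior concrete, here is a minimal toy sketch in Python. It is only an illustration under stated assumptions: the class and method names (ForwardJournalStore, write, recover, expire_oldest) are made up for this post, a "volume" is just a dict of block number to contents, and no real CDP product is claimed to be implemented this way.

```python
from typing import Dict, List, Tuple

Change = Tuple[int, str]  # (block number, new contents) - hypothetical toy format


class ForwardJournalStore:
    """Toy model: an untouched base mirror plus a forward journal of changes."""

    def __init__(self, mirror: Dict[int, str]) -> None:
        self.mirror = dict(mirror)        # initial full mirror (first LUN)
        self.journal: List[Change] = []   # changes in arrival order (second LUN)

    def write(self, block: int, data: str) -> None:
        # New changes only go to the journal; the mirror is left alone.
        self.journal.append((block, data))

    def recover(self, point: int) -> Tuple[Dict[int, str], int]:
        """Build the image as of the first `point` changes.

        Returns (image, number of journal entries rolled forward)."""
        if not 0 <= point <= len(self.journal):
            raise ValueError("no such point in time")
        image = dict(self.mirror)
        for block, data in self.journal[:point]:
            image[block] = data
        return image, point

    def expire_oldest(self) -> None:
        # "Maintenance activity": the oldest change must be folded into
        # the mirror, otherwise its data would be lost.
        block, data = self.journal.pop(0)
        self.mirror[block] = data
```

In this model recover(0) rolls nothing forward, while recovering to the latest point has to replay every journal entry, which matches the cost pattern described above.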

RealTime is different. It stores all the data in one LUN - the initial full mirror and the journal of changes, which is 1.3TB in the example. When a new change comes in to the CDP engine, it is applied to the volume immediately. The journal RealTime stores is called a reverse journal because it is a listing of all the changes that need to be undone to go back in time. If you pick the latest point in time for a recovery LUN, RealTime can present the whole volume right away since no changes need to be undone. If you want the initial full mirror as your LUN, it will take the longest since all the changes have to be unapplied. This is the polar opposite of the other method for storing a CDP journal.
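The reverse-journal idea can be sketched the same way. This is a toy Python model, not RealTime's actual implementation: the names (ReverseJournalStore, undo_log, recover) and the dict-of-blocks volume are assumptions made purely for illustration.

```python
from typing import Dict, List, Optional, Tuple

UndoEntry = Tuple[int, Optional[str]]  # (block number, previous contents or None)


class ReverseJournalStore:
    """Toy model: an always-current volume plus a journal of undo entries."""

    def __init__(self, volume: Dict[int, str]) -> None:
        self.volume = dict(volume)            # always holds the latest data
        self.undo_log: List[UndoEntry] = []   # what to restore to go back in time

    def write(self, block: int, data: str) -> None:
        # Save the old contents first, then apply the change immediately.
        self.undo_log.append((block, self.volume.get(block)))
        self.volume[block] = data

    def recover(self, point: int) -> Tuple[Dict[int, str], int]:
        """Build the image as of the first `point` changes.

        Returns (image, number of changes that had to be undone)."""
        if not 0 <= point <= len(self.undo_log):
            raise ValueError("no such point in time")
        image = dict(self.volume)
        undone = 0
        # Undo the newest changes first until we reach the requested point.
        for block, old in reversed(self.undo_log[point:]):
            if old is None:
                image.pop(block, None)   # block did not exist before the write
            else:
                image[block] = old
            undone += 1
        return image, undone
```

Here recovering to the latest point undoes nothing, and recovering to the initial mirror undoes every change, the mirror image of the forward-journal cost.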

So now ask yourself - are you more likely to want to recover to the latest point in time or the oldest point in time? That's why RealTime stores the journal in reverse - latest point in time recovery is why most people buy a CDP product. The other benefit is that it eliminates any maintenance activity: once an old change expires (say change #1) it can just be thrown away, since RealTime is only throwing away the "undo" entry in its journal. The actual 100GB was already absorbed into the volume.
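The difference at expiry time can be shown with two small Python functions. Again this is a hedged sketch: expire_forward and expire_reverse are hypothetical names, and the lists of (block, contents) tuples are an assumed toy format rather than anything taken from a real product.

```python
from typing import Dict, List, Tuple

Entry = Tuple[int, str]  # (block number, contents) - hypothetical toy format


def expire_forward(mirror: Dict[int, str], journal: List[Entry]) -> int:
    """Forward journal: merge the oldest change into the mirror.

    Returns the number of block writes the expiry cost."""
    block, data = journal.pop(0)
    mirror[block] = data  # the data lives only in the journal, so it must be kept
    return 1


def expire_reverse(volume: Dict[int, str], undo_log: List[Entry]) -> int:
    """Reverse journal: drop the oldest undo entry.

    The volume already holds the newer data, so nothing is written."""
    undo_log.pop(0)
    return 0
```

In both cases the oldest recoverable point in time moves forward by one change; only the forward journal has to touch the stored volume to get there.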

Cool stuff. I'd be interested in hearing from anyone looking at CDP products - drop a comment below if you are.
