It was a big effort, but we finally made it: PureDisk 6.6 (a.k.a. Darrieus) is out! It contains a bucketload of improvements:
Storage Pool Installer
New Administration User Interface
Context sensitive help
Remote office job progress information
Optimized deduplication for PDDO/VCB backups
Oracle Agent
Exchange Granular Restore
NTFS Special file type support
Virtual Synthetics
Storage Pool Consistency Checker
PureDisk Command Line
Multi Stream for Replication
Storage Pool Conversion tool (6.2 to 6.5 layout)
I won't go into detail on these since they are covered sufficiently in the documentation. Let's instead talk about a background process that is nevertheless crucial: data removal.
Data removal in a deduplicating environment is an underrated process. Most people, when asked, assume it is an afterthought in the design of a deduplication product, but nothing could be further from the truth: removing deduplicated data requires careful analysis. When deduplicating data during backup, you essentially remove duplicate parts of the data and replace them with references to a single copy. When removing that data, you have to update this reference structure again: PureDisk has to figure out which parts of the data are currently referenced just once (and would therefore be unreferenced after removal) and can thus be removed. PureDisk performs this task in the background, and it constitutes a large part of what is known as “queue processing”.
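To make the reference bookkeeping concrete, here is a minimal sketch of reference-counted removal in a content-addressed store. This is purely illustrative (the `DedupStore` class, its fingerprinting, and its in-memory dictionaries are my own invention, not PureDisk's actual on-disk structures): a segment is only reclaimed once its last reference goes away.

```python
import hashlib

class DedupStore:
    """Toy content-addressed store illustrating reference-counted removal.
    Illustrative only; PureDisk's real data structures differ."""

    def __init__(self):
        self.segments = {}   # fingerprint -> segment data (stored once)
        self.refcount = {}   # fingerprint -> number of references

    def add_backup(self, chunks):
        """Store a backup; duplicate chunks only add a reference."""
        fps = []
        for chunk in chunks:
            fp = hashlib.sha256(chunk).hexdigest()
            if fp not in self.segments:
                self.segments[fp] = chunk               # first copy: keep the data
            self.refcount[fp] = self.refcount.get(fp, 0) + 1
            fps.append(fp)
        return fps

    def remove_backup(self, fps):
        """Drop references; reclaim segments whose count hits zero."""
        for fp in fps:
            self.refcount[fp] -= 1
            if self.refcount[fp] == 0:                  # referenced nowhere else
                del self.segments[fp]
                del self.refcount[fp]

store = DedupStore()
b1 = store.add_backup([b"alpha", b"beta"])
b2 = store.add_backup([b"beta", b"gamma"])   # "beta" is deduplicated
store.remove_backup(b1)
# "beta" survives (b2 still references it); "alpha" is reclaimed
```

The point of the sketch: removal is not a simple delete but a reference update, which is why a real storage pool defers it to background queue processing.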
Now, why do I mention all of this? Because we've done some serious work on queue processing. In Caver (the previous major release of PureDisk), queue processing was structured in such a way that completely removing data from a PureDisk storage pool required running data removal on the MetaBase and then running queue processing four times. Considering that queue processing can take a couple of hours on a 16 TB content router, it is a big consumer of resources (disk bandwidth and CPU power).
In Darrieus, we've overhauled queue processing to make it “smarter”: by keeping track of the commands in the queue, it can analyze and update the state of references more intelligently. As a result, without any loss of correctness or functionality, data can be removed by processing the queue only twice. In Caver, the default queue processing frequency was four times a day; in Darrieus, we lowered this default to twice a day to achieve the same net effect. The cool thing is that an individual queue processing run is still just as fast as in Caver, so the total overhead of queue processing has been halved.
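One way to picture this kind of “smarter” pass (my own sketch; the command names and queue format here are hypothetical, not PureDisk's) is to aggregate the net reference delta per fingerprint in a single scan of the queue, instead of replaying each command against the store individually:

```python
from collections import Counter

def process_queue(commands, refcount):
    """One aggregated pass over a queue of reference-update commands.

    commands: list of ("add" | "del", fingerprint) tuples (hypothetical format)
    refcount: dict mapping fingerprint -> current reference count
    Returns the fingerprints whose count dropped to zero (i.e. reclaimable).
    """
    delta = Counter()
    for op, fp in commands:                 # single scan: collect net effect
        delta[fp] += 1 if op == "add" else -1

    removed = []
    for fp, d in delta.items():             # apply net deltas in one go
        refcount[fp] = refcount.get(fp, 0) + d
        if refcount[fp] <= 0:
            removed.append(fp)
            del refcount[fp]
    return removed

refs = {"A": 1, "B": 2}
queue = [("del", "A"), ("del", "B"), ("add", "C")]
gone = process_queue(queue, refs)
# gone == ["A"]; "B" keeps one reference, "C" gains its first
```

Because the net effect of many queued commands is computed before the store is touched, fewer full passes over the queue are needed to reach a consistent state, which is the flavor of optimization that lets a removal converge in two runs instead of four.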
Hope you enjoy Darrieus!