Well, I wrote an article a while back when I had started to think about disaster recovery, and had intended to follow it up shortly after. Work pressures and other things made sure I wasn't able to pursue the DR process. You can read up on the first part here:
Over the past 2 months I have started to get things in motion, so this article covers what I think you should consider when organising yourself for DR, as well as when testing your DR restores.
First up would be an accurate document outlining your backup environment. I work for an outsourcing company serving a number of clients, and over a period of 6 weeks I drew up a document outlining our various clients' backup environments. This included what software was running on what server (make/model/IP), and everything that was backed up. My reasoning behind this was to get sign-off. On a number of occasions I had been asked to restore data that was not in the standard location, and hence not on a backup. Documenting what data is being backed up, and how, covers you in the event of having to recover all data.
This document would also be a work-in-progress document, subject to any changes. Furthermore, it allows me to show any of our clients what we're protecting for them. Anything that needs to be added/removed would be done after looking through their respective section. It IS a long-winded process to manage, but if done correctly, it gives you and your clients an immediate indication of what’s going on in their environment.
The next thing…what to restore to? There are numerous ways to do this, depending on your infrastructure and your budget. If you have lots of cash to throw at DR, you can set up a nice environment. Similarly, you can get away with restoring to a removable drive, and using that as proof of being able to restore any data. File data would be a normal redirected restore, while Exchange can be duplicated to disk, for example.
At the very least invest in a large external drive. Restoring data is easy enough, and you’ll be generating logs to show that this was possible. A number of auditing firms require proof in the form of the logs themselves, and some are happy with a spreadsheet showing when, and how, this was done.
At the very top end, if you run large arrays, you can consider extra storage and pop in extra disks. Again, this can be used for restoring normal data, or recreating VMs if need be.
The choice of hardware to restore to would be entirely up to you…
Lastly…what to restore? What would be most important to your company? What would your company lose out on the most if a particular system/set of data was down for any length of time?
You need to consider what to restore very carefully. A lot of companies cannot cope with Exchange being down for too long. Being down for any length of time costs money.
From this perspective, it would be wise to restore the systems you can. For a file and print server, restore user data, application data, company data etc. and check that the files restored correctly and successfully, and that the permissions were restored too. No sense in not checking permissions, and ending up in a situation where everyone has access to company-sensitive data.
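As a rough illustration of the "check that the files restored correctly" step, here is a minimal Python sketch that compares a source folder against its redirected restore. The function names and folder layout are my own assumptions, not something the backup software provides. It also only compares POSIX-style mode bits, so on a Windows file server you would still need to check the NTFS ACLs with a separate tool.

```python
import hashlib
import os
import stat


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_restore(source_root, restored_root):
    """Compare every file under source_root with its restored copy.

    Returns a list of (relative_path, problem) tuples.
    An empty list means every file was found and matched.
    """
    problems = []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, source_root)
            dst = os.path.join(restored_root, rel)
            if not os.path.isfile(dst):
                problems.append((rel, "missing"))
                continue
            if file_digest(src) != file_digest(dst):
                problems.append((rel, "content differs"))
            # Mode bits only (assumption: POSIX-style permissions).
            # Windows NTFS ACLs are not visible here.
            src_mode = stat.S_IMODE(os.stat(src).st_mode)
            dst_mode = stat.S_IMODE(os.stat(dst).st_mode)
            if src_mode != dst_mode:
                problems.append((rel, "permissions differ"))
    return problems
```

Calling compare_restore with the live folder and the restore target gives you a list you can paste straight into your test record. Bear in mind that files changed since the backup ran will show up as differences, so this works best on data that is fairly static.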
At the same time, you can restore a couple of individual items in Exchange to live mailboxes or a test mailbox; duplicate to disk and bring online on another Exchange server; or restore into a Recovery Storage Group (RSG). This will prove Exchange restores work.
How often? That depends on what the auditing requirements are. It can be every month, every quarter, or every year. Find out what your company or the auditors require.
From my side, I’ve got a 96-page document listing everything in the selection lists of my media servers. At any stage I can go back to this to check it and make changes. I’ve given each of the companies we look after their specific section, so that they are aware of what gets done.
I will be implementing a regular quarterly restore test across the board this year, documenting what was restored, and the outcome. I am keeping track of this in a spreadsheet per client, as well as keeping a working document of the normal work-related restores that come my way outside of my DR testing.
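If you would rather script the tracking than fill in a spreadsheet by hand, something like the following Python sketch would do. The column names and the one-CSV-file-per-client layout are purely my own assumptions; adjust them to whatever your auditors want to see.

```python
import csv
import os
from datetime import date

# Hypothetical columns for the restore-test record.
FIELDS = ["date", "client", "system", "what_was_restored", "restored_to", "outcome"]


def log_restore_test(csv_path, client, system, what_was_restored,
                     restored_to, outcome, when=None):
    """Append one restore-test result to a CSV file.

    A header row is written first if the file is new or empty.
    """
    new_file = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({
            "date": (when or date.today()).isoformat(),
            "client": client,
            "system": system,
            "what_was_restored": what_was_restored,
            "restored_to": restored_to,
            "outcome": outcome,
        })
```

The resulting file opens directly in Excel, so it can still be handed over as the spreadsheet showing when, and how, each test was done.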
Restore testing will be done on file data (randomly selected), along with Exchange data (both duplicating the entire Information Store to disk, and individual mailboxes/items) and SQL (a redirected restore of the production Navision DB at a client to a standby server). This will hopefully satisfy the auditors and companies, and prove my backups are healthy!
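For the randomly selected file data, here is a small Python sketch of how the picking could be done. It walks a live folder, which is an assumption on my part; in practice you might pick from the backup catalogue instead. The function name is made up for illustration.

```python
import os
import random


def pick_restore_candidates(root, count, seed=None):
    """Pick up to `count` files at random from everything under `root`.

    Passing the same seed reproduces the same selection, which helps
    if you need to show afterwards how the sample was chosen.
    """
    all_files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            all_files.append(os.path.join(dirpath, name))
    # Sort so the selection does not depend on directory listing order.
    all_files.sort()
    rng = random.Random(seed)
    return rng.sample(all_files, min(count, len(all_files)))
```

Recording the seed alongside the test result means anyone can rerun the selection later and get the same list of files, provided the folder contents have not changed.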