What is in this article?
- The Smart DBA's Guide to SQL Server Disaster Recovery, Part 2
- Peer-Oriented Model
- The Fifth D: Day-to-Day
- Sidebar 1: KPIs, Outcome Indicators, and Activity Indicators
- Sidebar 2: Why Not Use Traditional Project Management for SQL Server Disaster Recovery?
- Sidebar 3: Quick-Start Checklist
In this conclusion of our two-part series, we show you how to develop, deploy, and perform day-to-day maintenance of your SQL Server disaster-recovery plan.
Even if your data is recoverable, you won’t have a SQL Server disaster-recovery program if all you have are backups.
You need a disaster-recovery program specific to your SQL Server assets—a program developed and maintained by SQL Server professionals. You're not necessarily looking for a general high-availability plan; that’s another topic for another time. No, you want a data-centric disaster-recovery program that concentrates on SQL Server availability, backup, and continuity. And this plan needs to have a laser focus on baking resiliency into your SQL Server systems so that, in the event of a local or regional disaster, your organization can continue its operations.
We’re applying our 7D Method to the SQL Server disaster-recovery process. This method is a decide-and-execute process consisting of seven stages, although we're using only the first five Ds here. In "The Smart DBA's Guide to SQL Server Disaster Recovery, Part 1," we covered Discover and Design. In this article, we continue through Develop, Deploy, and the Day-to-Day management of the SQL Server disaster-recovery program you create.
Get Robust!
In Part 1, we discussed the relative frequency and impact of four classes of interruption. Traditional disaster-recovery planning focuses on a “smoking pile of rubble” scenario, which assumes that if you prepare for the worst, all lesser incidents will be adequately addressed. We challenge that assumption!
Our 7D Method will help you outline a more robust program, one that includes disaster-recovery planning, testing, analysis, and continuous improvement. By the end of this series, your takeaway will be a broader understanding of SQL Server disaster recovery, not just a plan-development process. While you're developing your program, we suggest you divide it into two perspectives: one for emergencies that require the servers to run in production offsite (i.e., serious interruptions) and one for disruptions that don't require leaving the building (i.e., less significant interruptions).
Each of these perspectives has different triggers for invoking your disaster-recovery program’s written plan and beginning the recovery processes it specifies. Adopting two perspectives means applying new rules, deploying new resources, and determining whether you need external resources or can recover using the resources you have on hand.
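The two-perspective split can be written down as an explicit triage rule. The following Python sketch is a hypothetical illustration, not part of the 7D Method itself: it assumes an interruption can be summarized by two yes/no facts (whether the primary site is still usable and whether on-hand resources are sufficient), and the names `Interruption`, `Response`, and `triage` are invented for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Perspective(Enum):
    ONSITE = "onsite"    # recover without leaving the building
    OFFSITE = "offsite"  # servers must run in production at another site


@dataclass
class Interruption:
    # Hypothetical fields: record whatever facts your own triggers depend on.
    description: str
    primary_site_usable: bool
    local_resources_sufficient: bool


@dataclass
class Response:
    perspective: Perspective
    needs_external_resources: bool


def triage(event: Interruption) -> Response:
    """Decide which half of the written plan to invoke (illustrative rule only)."""
    perspective = Perspective.ONSITE if event.primary_site_usable else Perspective.OFFSITE
    # Separately from the perspective, decide whether on-hand resources are enough.
    return Response(perspective, needs_external_resources=not event.local_resources_sufficient)


# Example: a storage failure that keeps you in the building but needs outside help.
outage = Interruption("SAN controller failure", primary_site_usable=True,
                      local_resources_sufficient=False)
print(triage(outage))
```

Keeping the perspective and the external-resources decision as separate outputs mirrors the text: an onsite incident can still require outside help, and the real triggers in your plan will likely involve more than two facts.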
Sidebar 1: KPIs, Outcome Indicators, and Activity Indicators
The Third D: Develop
Preparing optimally for both of these perspectives requires that you measure and report using two types of key performance indicators (KPIs): Outcome Indicators (things that confirm that your program worked) and Activity Indicators (day-to-day protection practices, or “in process” metrics). Although it requires a shift in perception, your program will benefit if you borrow from the world of Maintenance, Repair, and Overhaul (MRO), whose focus is reliability. The MRO world classifies incidents as chronic or sporadic.
Chronic incidents. Chronic incidents are variations from an accepted range of performance. The desired outcome is to take the range of variance back to accepted tolerances. Chronic incidents often fall into an “accepted losses” category—they're negative deviations from performance norms but aren't emergencies in and of themselves. They require attention, but they don't warrant dropping everything to correct the situation. They might fall under a “planned downtime” category, in which systems need to be taken offline for patches, updates, hardware refreshes, and so on. They’re often accepted as part of the job, such as the warning messages from a failing disk in a RAID array or warnings that you’re running out of space in the SQL Server transaction log.
Sporadic incidents. Sporadic incidents are typically more severe, less frequent, and unpredictable. They demand urgent attention, are time-consuming to resolve, and almost always carry a high dollar loss or human cost. Because these occurrences are generally few and far between, we strongly encourage you to include near misses as well as actual incidents when you review sporadic incidents. You’ll want as much testing of your preparation for such events as you can get, so that when a real event occurs, everyone has already practiced their roles.
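To make the chronic/sporadic split concrete, here's a minimal Python sketch of how an incident log might be sorted for review. It's an assumption-laden illustration: the `Incident` fields, the rule that urgent events and near misses go to the sporadic set, and the sample categories other than the RAID and transaction-log warnings mentioned above are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    category: str            # e.g., "transaction log space warning"
    urgent: bool             # demanded dropping everything to respond
    near_miss: bool = False  # almost happened; reviewed with the sporadic set


def build_review(incidents):
    """Split incidents into a chronic tally and a sporadic review list (illustrative rule)."""
    chronic = Counter()  # recurring deviations: count them to spot trends
    sporadic = []        # severe or near-severe events: review each one individually
    for inc in incidents:
        if inc.urgent or inc.near_miss:
            sporadic.append(inc)
        else:
            chronic[inc.category] += 1
    return chronic, sporadic


log = [
    Incident("RAID disk warning", urgent=False),
    Incident("transaction log space warning", urgent=False),
    Incident("transaction log space warning", urgent=False),
    Incident("storage array failure", urgent=True),                       # hypothetical
    Incident("generator failed load test", urgent=False, near_miss=True), # hypothetical
]
chronic, sporadic = build_review(log)
print(chronic.most_common())  # [('transaction log space warning', 2), ('RAID disk warning', 1)]
print([i.category for i in sporadic])  # ['storage array failure', 'generator failed load test']
```

Tallying chronic incidents by category surfaces the recurring “accepted losses” worth fixing, while sporadic incidents and near misses stay as individual records for detailed review.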
Both chronic and sporadic incidents can serve as stepping stones toward higher levels of performance. Getting back to normal is adequate, but it isn't progress; in most cases, normal is the status quo, not continuous improvement.
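The two KPI types described earlier (Outcome Indicators and Activity Indicators) can be tracked side by side. This Python sketch is hypothetical: the `Indicator` structure, the target/actual comparison, and both sample metrics (a restore-drill duration and a backup-coverage percentage) are illustrative assumptions, not measures prescribed by the method.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    OUTCOME = "outcome"    # confirms the program worked (e.g., a restore met its target)
    ACTIVITY = "activity"  # day-to-day, "in process" protection practice


@dataclass
class Indicator:
    name: str
    kind: Kind
    target: float
    actual: float
    higher_is_better: bool = True

    def met(self) -> bool:
        """True if the actual value satisfies the target in the desired direction."""
        if self.higher_is_better:
            return self.actual >= self.target
        return self.actual <= self.target


def report(indicators):
    """Group indicators by kind and record whether each met its target."""
    summary = {kind: [] for kind in Kind}
    for ind in indicators:
        summary[ind.kind].append((ind.name, ind.met()))
    return summary


# Hypothetical sample metrics for illustration only.
kpis = [
    Indicator("Restore drill duration (minutes)", Kind.OUTCOME,
              target=60, actual=48, higher_is_better=False),
    Indicator("Databases backed up in last 24 hours (%)", Kind.ACTIVITY,
              target=100, actual=97),
]
for kind, rows in report(kpis).items():
    print(kind.value, rows)
```

Reporting the two kinds separately keeps a healthy-looking activity trend from masking a failed outcome, or vice versa.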
Let’s get started with activities that you can do within the next 30 days to jumpstart your SQL Server disaster-recovery program. When you finish, you'll be done with the Develop stage.
Figure 1 shows a SQL Server disaster-recovery program framework. It's a concept drawing that lays out the core processes and key activities of your program. The six columns represent the core processes; under each process, you should list the key activities necessary to successfully execute the process. Although they're presented here as sequential, these activities can be performed in parallel. For example, under Prepare, you can be budgeting at the same time that you’re forming your disaster-recovery committee. You’ll need two of these visual models—one for onsite-level incidents and one for offsite-level incidents. Try to limit the number of key activities to 10; otherwise, the model will be too complex to be useful as a tool. Each key activity will have one or more associated tasks (not shown).
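If you keep the framework in a script or tool rather than only as a drawing, a small data model can enforce the guideline on key activities. This is a minimal Python sketch under stated assumptions: it reads the limit of 10 as applying per core process (adjust the check if you apply it to the whole model), only the Prepare column and its two example activities come from the text, and the owners and class names are hypothetical.

```python
from dataclasses import dataclass, field

MAX_KEY_ACTIVITIES = 10  # guideline from the text; applied per core process in this sketch


@dataclass
class KeyActivity:
    name: str
    owner: str
    tasks: list[str] = field(default_factory=list)  # one or more tasks per activity


@dataclass
class CoreProcess:
    name: str
    activities: list[KeyActivity] = field(default_factory=list)

    def add(self, activity: KeyActivity) -> None:
        """Append a key activity, refusing to exceed the complexity limit."""
        if len(self.activities) >= MAX_KEY_ACTIVITIES:
            raise ValueError(
                f"{self.name}: more than {MAX_KEY_ACTIVITIES} key activities; simplify the model")
        self.activities.append(activity)


@dataclass
class Framework:
    scope: str  # "onsite" or "offsite"; keep one model per scope
    processes: list[CoreProcess] = field(default_factory=list)


prepare = CoreProcess("Prepare")
prepare.add(KeyActivity("Budgeting", owner="IT manager"))  # owner is hypothetical
prepare.add(KeyActivity("Form disaster-recovery committee", owner="DBA lead"))  # hypothetical
onsite = Framework("onsite", [prepare])  # add the other five columns from your Figure 1
```

Creating one `Framework` per scope matches the advice to maintain separate onsite and offsite models.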
Adopting this visual tool accomplishes several things. First, it gives business sponsors and the IT team an alignment tool: a common language that shows a general view of desired outcomes, deliverables, and performance directives. Second, it answers the question “Where are we?” at a glance, which reduces the need for lengthy status reports. We suggest coloring the activity/owner blocks red, yellow, or green to show progress (and quality, if you have best practices defined).
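The red/yellow/green coloring can also be rolled up per column to answer “Where are we?” in one line per process. This Python sketch assumes a worst-status-wins rule, which is our illustrative choice rather than something the framework specifies; the `Status` and `rollup` names are hypothetical.

```python
from enum import IntEnum


class Status(IntEnum):
    GREEN = 0   # on track
    YELLOW = 1  # at risk
    RED = 2     # blocked or behind


def rollup(board: dict[str, dict[str, Status]]) -> dict[str, Status]:
    """Summarize each core process as the worst status among its activity blocks.

    An empty column rolls up to GREEN here; flag empty columns separately if
    that would be misleading in your model.
    """
    return {process: max(blocks.values(), default=Status.GREEN)
            for process, blocks in board.items()}


board = {
    "Prepare": {"Budgeting": Status.GREEN,
                "Form disaster-recovery committee": Status.YELLOW},
}
for process, status in rollup(board).items():
    print(f"{process}: {status.name}")  # Prepare: YELLOW
```

Using `IntEnum` lets `max` order the colors directly, so a single red block turns its whole column red.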

