3 big-picture factors to consider in your disaster-recovery design
Many DBAs are unsure whether to replicate data by using SQL Server's built-in database replication or the software in their storage system. Unfortunately, there's no definitive answer. Most data centers use both storage replication and database replication because each method fulfills different replication requirements. Storage replication is easy to maintain, and database replication lets you meet specific replication requirements that storage replication can't address. Let's take a look at three of the most important factors to consider when deciding the best combination of database and storage replication for your organization.
Fulfilling Your SLA
The primary reason for replication is disaster recovery. To recover from a disaster, you need a copy of your database; in fact many businesses require a copy of the database, along with its application, at a remote location. Typically, the business owner of the application has a Service Level Agreement (SLA) that specifies how soon the owner's application will become available after a disaster (the RecoveryTime Objective—RTO) and the acceptable amount of data loss the application can suffer (the Recovery Point Objective—RPO). For example, if it takes three days to rebuild the hardware, software, network, and data to make the application available, the RTO is three days. If the database and application are fully backed up to tape and the tapes are shipped offsite every Sunday, the worst case RPO for the data and application would be seven days (i.e., the recovered data would be at most seven days old). To implement a realistic RTO and RPO for an application, you need to understand the timeline of a recovery. Figure 1 shows the timeline of a typical recovery scenario. The steps in the timeline are an overview of what needs to happen before the application can be available. Identified in the timeline is the time period in which the RTO is measured.The definition of the start and end time of where the RTO is within the timeline can be negotiated within the parameters of an SLA. Step 4 in Figure 1, is a reminder that a great deal of infrastructure needs to be in place before—or consequently, during—a disaster to facilitate restoring the data. In steps 5 through 8, data is rolled back to the consistency point of that level. For example, in step 5, the storage-level consistency requires incomplete data in the NTFS to be rolled back to make the NTFS consistent. Once the storage level is consistent, SQL Server attempts to roll back inconsistent transactions. This process happens at each level until it reaches consistency.
When data is asynchronously mirrored at the storage layer, the storage system's software updates the data from consistency point to consistency point. A consistency point is the state of the storage data structure when it's stable and complete. For example, a file system is stable and complete when all pointers to blocks exist and all the blocks those pointers reference exist. If a disaster were to occur while the storage at the destination point was being updated, the incomplete update would be backed out so that the data in storage would be consistent. Synchronously mirrored storage also has to roll back to the previous consistency point, although considerably less data would be rolled back because the consistency points are closer together in synchronous mirroring. With database mirroring, the storage consistency rollback is avoided for the contents of the database. Instead, database mirroring rolls back to create a transactional consistency point for the database. Any incomplete transactions will be rolled back to maintain database consistency.
Some applications work with the concept of application business consistency, which requires an application to roll back an application log, but typically most applications rely on the database software to maintain the application consistency through transactional consistency points. There is also a growing trend in which multiple applications are interdependent; the data from one application is regularly transported to another. For example, a sales order application could feed data into a manufacturing application. If you had to roll back one application, you would have to roll back all other interdependent applications to maintain consistency across all applications.
Database replication mirrors database objects and maintains database consistency between updates, whereas storage replication mirrors a lower level of data and maintains storage consistency. Therefore, at the block level, database replication results in less data loss because fewer incomplete transactions are rolled back than with storage replication. However, database replication can replicate only one database instance. If you have hundreds or even thousands of databases, you'll need a large number of database-replication relationships. Storage replication lets you put multiple databases that have the same disaster-recovery requirement in a single container and maintain just the storage container replication relationship. Even if you have fewer than a hundred databases in a data center, the data center could have a variety of database types (e.g., SQL Server, Oracle, IBM DB2). Learning how to configure and maintain each type of database replication increases the complexity of the task. In addition, database replication replicates only data within the database. Any supporting files, such as database programs, database error logs, and database configuration files, might not be replicated.
Storage replication provides a simple bulk transport that sends and updates data to the destination without concern for any modifications to the higher level data structures at the database and application level. As long as the data is contained in the storage structure slated for replication, the data will be at the destination. With fewer replication relationships to manage and fewer licenses to purchase, administration is simpler with storage replication.
I recommend using storage replication if possible because of its easy setup and maintenance. Database replication is more complex to set up and maintain because of the greater number of relationships that you have to establish between the production site and the disaster recovery site and because each database platform requires different database replication software. For the few databases that have unusual business requirements, such as a very short RTO and a very small RPO, use database replication.