Create a high-availability and disaster-recovery solution that’s easy to configure and support
As a senior consultant with the Financial Services Group of Microsoft Consulting Services, I help clients in the financial industry efficiently use Microsoft database technologies. I recently met with a client who wanted a high-availability and disaster-recovery solution. The client company's DBA had no practical experience supporting SQL Server high-availability solutions (including replication), so the client wanted a solution that would be simple to configure and support. I'll give you some background about this case and show you the steps I used to create a solution that doesn't require extensive scripting skills. If you're looking for an easy-to-use solution for your organization's high-availability and disaster-recovery needs, taking a look at this real-world example can help you decide whether merge replication is a good choice for you.
The client I worked with had two data-processing centers located miles apart and linked by a reliable T2 network connection line between the sites. The client needed a flexible high availability solution that would allow the addition of a third site in the future with minimum effort. Each existing site hosted a SQL Server 2000 database on a Windows 2000 server. Both Windows servers had two processors and 2GB of RAM. The client's database was about 25GB and growing slowly (200MB to 500MB per month). The database had a relatively low load (10 to 50 transactions per minute). The client's application is in use mainly from 7 a.m. to 9 p.m., 5 days a week, and the database needed be available at both sites at all times during business hours.
The client wanted to do data processing at one site and reporting at the other site. In case of disaster at one data center, the client wanted a failover system so that the remaining data center could quickly combine these functions without any reconfiguration. The client had considered splitting data-load operations between data centers but knew that data operators might try to update or delete the same record simultaneously on both sites. To resolve such possible conflicts between the two sites, the client wanted to merge changes that users performed on different sites in different columns, if possible. For example, if a user at SiteA changes column1 and another user at SiteB modifies column2, the client wants to merge both changes in the resulting table. Some tables have one or more text columns. Several tables have IDENTITY columns.
So, from these requirements, the client and I determined that the solution they needed should provide continuous data transfer, site autonomy (so that operations at one site will only minimally affect operations at the other site), processing of changes at both sites, and resolution of data conflicts.
Why Choose Merge Replication?
SQL Server 2000 offers several options that keep databases synchronized close to realtime, including two-phase commit, transactional replication with immediate updating subscribers, snapshot replication with immediate updating subscribers, transactional replication (including bidirectional replication), snapshot replication, transactional replication with queued updating subscribers, and merge replication. Based on the client's requirements, I recommended using SQL Server 2000 merge replication for several reasons. First, merge replication has a robust, built-in conflict-resolution mechanism that can be easily configured using Enterprise Manager. In simple cases, no scripting is required. Merge replication provides an out-the-box mechanism for resolving conflicts at the column level and a high transactional consistency of operations in situations like the one my client described, in which conflicts are rare and only two sites are currently involved in data processing. In addition, merge replication allows replication of text data, which was important to my client because the company had more than 10 database tables that contained text columns. Merge replication includes continuously-running Merge Agents that provide data transmission with acceptable latency. And using merge replication makes adding another server to the existing topology easy.
The other alternatives have their own benefits, but I couldn't recommend using immediate updating and queued updating subscriptions because those options can't update text data. Not all types of transactional replication can merge simultaneous changes in different columns on different sites. Snapshot replication and DTS simply don't work in an environment in which data modifications happen on both sides simultaneously. And two-phase commit is a rather theoretical solution because even the loss of network connectivity for a short time during a transaction would effectively shut down the application. Once I'd decided on the high-availability implementation to use for my client, I prepared the database for merge replication.
Prepare for Merge Replication
Before configuring merge replication, you must first carefully analyze your application and address specific merge-replication requirements. Because this brief article can't provide all the planning and preparation details, I recommend that you read the "Planning for Merge Replication" section in SQL Server 2000 Books Online (BOL) for details. Here are some of the things I had to do for my client's solution.
Merge replication replicates text columns only if they've been updated explicitly by an UPDATE statement, which causes a trigger to fire and update metadata, ensuring that the transaction gets propagated to Subscribers. WRITETEXT and UPDATETEXT operations don't propagate the change to other sites. To solve this problem, I modified a few stored procedures by adding a dummy UPDATE statement after the WRITETEXT or UPDATETEXT operations within the same transaction. BOL gives an example of how to do this type of modification.
To avoid recurring conflicts during merge replication, I configured all foreign key constraints and user-defined triggers (UDTs) with the NOT FOR REPLICATION option. To perform this configuration, select a table from Enterprise Manager, right-click, and select Design Table. Click Manage Relationships, then clear the Enforce relationship for replication check box on the Relationships tab of the Properties screen, as Figure 1 shows; repeat this step for each foreign key in the drop-down list. Next, click Close, then click Save. Click Yes when the confirmation window appears. Close the Design Table window, and repeat these steps for all tables that have foreign keys.
To change the NOT FOR REPLICATION option for the existing trigger, select a table in Enterprise Manager. Right-click, select All Tasks, then select Manage Triggers. From the drop-down list, select a user-defined trigger. As Figure 2 shows, type NOT FOR REPLICATION on the line preceding AS in the text box. Click Apply, then repeat these steps for other triggers in the same table. Finally click OK. You'll need to change the NOT FOR REPLICATION option for all tables that have triggers.
All IDENTITY columns must have the NOT FOR REPLICATION option. (This option is configured automatically when you set up merge replication.) IDENTITY values need to be partitioned by site. When you configure merge replication, to ensure that the identity values being assigned are within those allowed for the site range, SQL Server automatically creates CHECK constraints in each table that has an IDENTITY column. For my client's solution, I carefully planned the range for the IDENTITY column for each affected table. I recommend that you assign a high range of values for the site so that it is practically impossible to reach the limit. Because of some problems with this functionality, don't rely on the automatic identity range handling set by SQL Server. (For an explanation of these identity-range problems, see the Microsoft article "BUG: Identity Range Not Adjusted on Publisher When Merge Agent Runs Continuously" at http://support.microsoft.com/?id=304706.)
I decided to use a default mechanism for conflict resolution (in which the Publisher always wins). This is the simplest variant and, in this case with only two servers and rare conflicts, choosing a default variant was appropriate. Implementing a datetime-dependent conflict resolvers would require modifications to the table structure of many tables. Generally, SQL Server lets you choose from several predefined types of conflict resolvers, or you can write your own conflict resolver. In most cases, I'm sure you'll be satisfied with some type of predefined conflict resolver.
By design, merge replication adds a uniqueidentifier column to each replicated table. The total size of the row for each replicated table, including the one I added, should not exceed 6000 bytes. I worked with the client to change the structure of a few tables that had a larger row size.
SQL Server requires that all tables connected by foreign-key relationships be published together, in one publication. Before configuring merge replication, I created a database diagram for ClientDB in Enterprise Manager and identified all such cases so that I could see the relationships overall. Then, I prepared a spreadsheet with a list of my planned publications and articles to make configuration easier.
Design of Merge Replication
Figure 3 shows the overall design of merge replication for my client. I chose SQLServer1 at Site 1 as Publisher and Distributor. I specified SQLServer2 at Site2 as Subscriber. ClientDB is the database to be replicated. The Snapshot and Merge Agents implement merge replication. The Snapshot Agent prepares snapshot files that contain the schema and data of published tables, stores the files in a snapshot folder, then inserts synchronization jobs into the publication database. The Snapshot Agent also creates replication-specific stored procedures, triggers, and system tables. The Merge Agent merges incremental data changes that occur at the Publisher or Subscribers after the initial snapshot is created, and it reconciles conflicts according to defined rules.
The role of the Distributor is limited in merge replication, so it's common to implement the Distributor locally (on the same server as the Publisher), which is what I did in this solution. The distribution database on the Distributor stores only history and miscellaneous information (e.g., errors) from both servers about merge replication.
Configure Distribution and Publishing
Before configuring merge replication, make sure that you have enough disk space on all servers. In addition to space for the database (ClientDB in this case), you'll need space to store elements such as the Distribution database, snapshot files, and backup files. Next, verify that SQL Server in both locations runs under a domain account. The domain account must have access to both servers and have access to the file share that stores snapshot files.
The easiest way to set up the Distributor and configure the Publisher is by using Enterprise Manager. Make sure that both servers (Publisher and Subscriber) are registered in Enterprise Manager. In Enterprise Manager's left pane, click Databases for the server you want to configure as Distributor (SQLServer1 in this case). On the toolbar, click Tools, select Replication, then select Configure Publishing, Subscribers, Distribution. If replication isn't already configured, the Replication Wizard appears. Click Next, and the Select Distributor screen appears. (If replication is already configured, you will see the Publisher and Distributor Properties screen.) Select Make SQLServer1 its own Distributor, then click Next.
If it appears that SQL Server Agent isn't running, select Yes, configure SQL Server Agent to start automatically on the Configure SQL Server Agent screen, then click Next. If SQL Server Agent is already running, you'll skip this configuration step. In the Specify Snapshot Folder dialog box, confirm the default selection (or choose your own) for the snapshot folder and click Next. If a message appears allowing you to confirm your folder choice, do so.
Next, on the Customize the Configuration screen, select Yes, let me to manually configure the distribution database properties, then click Next. Provide the location of data and log files for the distribution database, then click Next. If possible, use different drives for data and log files. I recommend that you keep the default name for distribution database.
On the Enable Publishers screen, select only the required server (SQLServer1). Close the message if one appears. If the Publisher Properties screen appears, leave the default setting (use trusted connection) and clear the option that requests a password to connect the Publisher with the Distributor. Click OK to leave the screen and return to the Enable Publishers screen. If a message appears, click Yes to accept, then click Next.
On the Enable Publication Databases screen, select Merge for your database (ClientDB in this case), then click Next. On the Enable Subscribers screen, select your database (SQLServer2 in this case), click Next, then click Finish. Close the message window that describes Replication Monitor functionality.
Prepare for the Initial Data Synchronization (Snapshot)
After configuring foreign-key and trigger properties for my client's solution, I prepared to create the snapshot. Merge replication requires that you take an initial snapshot for all replicated tables. During the initial snapshot SQL Server adds merge triggers to each published table and adds a uniqueidentifier column to each table that doesn't already have one. SQL Server also creates a new index on the uniqueidentifier column and creates conflict and other system tables.
Merge replication adds a uniqueidentifier column with the ROWGUIDCOL property and the default value newid() to each published table and creates an index on the column. For large tables, this process can be time-consuming. It's faster to add new columns to already populated large tables, then run the initial snapshot for the publication with the affected tables. (For details, see the Microsoft article "HOW TO: Manually Synchronize Replication Subscriptions by Using Backup or Restore" at http://support.microsoft.com/kb/320499/en-us.)
In my case, data in the ClientDB database was disproportionately spread among almost 200 tables that needed replication. The majority (85 percent) of database space was concentrated in the BatchTrans table, which had more than 53 million records. The next largest table had about 5 million rows.
Table 1 shows how the data was spread among the tables in ClientDB. For my client, I wrote a DTS package that prepared structural changes in the seven largest ClientDB tables (each with more than a million rows). I ran this package only once. Even if new initial snapshots were required later, running the DTS package would be unnecessary because the tables would already have the required uniqueidentifier columns.
Configuring Publications and Generating the Initial Snapshot
After configuring the distributor and pulishers, it was time to generate the initial snapshot. To increase the performance of the Snapshot Agent for large publications (or for publications that have large tables), I recommend creating a new profile for the Snapshot Agent and using it instead of the default one. Figure 4 shows the process for creating a new Snapshot Agent profile.
In Enterprise Manger, select Replication on the Publisher (Distributor) server, then right-click and choose Configure Publishing, Subscribers, Distribution. On the Distributor tab, click Agent Profiles, select the Snapshot tab, and click New Profile. In this case, I changed values for a few parameters that affect the performance of Snapshot Agent. On the Replication Agent Profile Details screen, enter the values that Table 2 shows (or tune these values for your own environment as I did here; I tested these changes in the client's development environment and choose the combination that gave me the fastest time). When you've changed the values appropriately, click OK. Then, on the Agent Profiles screen, select the Snapshot_Speed profile and make it a default profile by selecting Change all existing Snapshot Agents to use the selected profile option. Click OK twice to finish the configuration.
At this point I was ready to configure publications. I decided to create 12 publications: 4 different publications that combine groups of related tables and 8 others in which I've combined nonrelated tables based on their business functions, frequency of changes, size, and other factors. By using several publications, you can configure each Merge Agent to run independently, on its own schedule, using different threads, which speeds up the replication process. You do need to balance the number of agents running with available resources. The sidebar "Steps for Configuring Publications," page 31, gives you the process you need to go through to configure each publication you create.
After generating snapshots for all publications, I made a backup of the ClientDB database on the Publisher server, then copied backup files to the Subscriber server and restored the database there.
Creating a New Merge Agent Profile and Configuring Subscriptions
To increase the performance of the Merge Agent and minimize the effect of nonconvergence when SQL Server processes child and parent records in separate generation batches, I decided to create a new profile for the Merge Agent. (To read about the problem of nonconvergence, see the Microsoft article "PRB: Non-Convergence When SQL Server Processes Child and Parent Generations in Separate Generation Batches" at http://support.microsoft.com/default.aspx?scid=kb;en-us;308266.) To create a new profile for the Merge Agent, start in Enterprise Manager and select the Replication node on the Publisher (Distribution) server. Then, right-click and choose Configure Publishing, Subscribers, Distribution. On the Distributor tab, click Agent Profiles, and on the Merge tab, click New Profile.
As Figure 5 shows, on the Replication Agent Profile Details screen, enter the values that Table 3 shows, then click OK and confirm changes if a message appears. On the Agent Profiles screen, select the Merge_Speed profile and make it a default profile by selecting Change all existing Snapshot Agents to use the selected profile. Click OK twice to finish the configuration.
Next, I configured Subscriptions. Starting in Enterprise Manager, select the Replication node of the Subscriber server, select Publications, right-click on a publication, and select Push New Subscription. Click Next, and in the Enabled Subscribers section of the Choose Subscribers screen, select the subscriber server (SQLServer2 in this case), and click Next. On the Choose Destination Database screen, verify the database name (ClientDB in this case), then click Next. On the Set Merge Agent Schedule screen, select Continuously (important to note that this is not a default option), and click Next.
When you see the Initialize Subscription screen, select No, the Subscriber already has the schema and data (again, this is not a default option) and click Next. You don't need to select this option because you've already restored a backup of the database to the target server. On the Set Subscription Priority screen, select Use the Publisher as a proxy, then click Next. Then, on the Start Required Services screen, accept all settings and click Next. Click Finish to complete the wizard, then click Close. In the right pane of Enterprise Manager, you'll see a new line with the newly configured subscription information. You'll need to repeat these steps for all configured publications
Test Merge Replication Before Going Live
Before implementing merge replication in your production environment, I recommend that you test its performance on similar hardware with the expected data load. It's possible that in your particular case, merge replication might not perform well enough to satisfy your specific business requirements. This is true especially in situations that have persistent high data loads. If this is your situation, consider transactional replication or other data transfer scenario. I know of one case when merge replication slowed down the data load on one of the servers by a factor of 2.
However, for small and midsized databases that have evenly spread and a not-very-aggressive transaction volume (as I described earlier), merge replication will usually work fine. Make sure that your user data, log, and TempDB and Distribution files are located on different drives. For additional explanation and advice, read the Microsoft article "SQL Server 2000 Merge Replicationand Optimization" at http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/mergperf.mspx.
This implementation of merge replication satisfied my client's requirements. This solution supports simultaneous data entry on different servers (including modification of text columns) and doesn't require special data partitioning. It avoids many conflicts at the row level by merging changes in different columns into one record. And this solution makes it easy to increase the number of subscribers. This solution doesn't require deep scripting skills, and it can easily be supported by a relatively inexperienced DBA who wants to learn new technology. If your situation is similar to the one I've described here, merge replication is a good choice.