Replicate Publisher changes to Subscribers and Subscriber changes back to the Publisher

SQL Server's advanced replication features let you synchronize data between databases, whether the databases reside on the same or different servers. But why not just back up the database and restore it on another server or use Data Transformation Services (DTS) to transfer tables and data? If you're working with read-only databases, these techniques work fine. But what if you have a large (100GB or larger) online transaction processing (OLTP) database that handles 100 or more transactions a minute? Or what if you have more than 1000 users running select, insert, update, and delete statements at least 8 hours a day? In these scenarios, running a backup and restore every 10 minutes—which would lock your tables and generate irate calls from users—is unacceptable. Using DTS or bulk copy program (bcp) to transfer that much data whenever you want to synchronize databases also isn't an option; the transfer would take far too long.

Although replication doesn't speed up data transfer, it does let you transfer data from one server to others all at once, then apply those changes to other databases. In short, replication can be the best choice for data transfer if you're looking for a happy medium between data concurrency and availability.

SQL Server 6.0 introduced replication support, and SQL Server 6.5 added minor enhancements. But SQL Server 7.0 takes replication to a higher level of functionality and ease of use. SQL Server 7.0 goes beyond snapshot and standard transactional replication to support two-way (or immediate updating subscription) replication and merge replication. It also lets you replicate data to and from non-SQL Server platforms. And SQL Server 7.0 automates all replication tasks, letting database administrators set up and administer replication through wizards without writing a single stored procedure. Through the replication wizards, you can even instruct SQL Server to write cleanup tasks and notify the administrator of errors by email or pager. Let's walk through SQL Server 7.0's wizards to set up an example immediate-updating subscription replication solution, examine what SQL Server does behind the scenes to implement this solution, then test the solution.

Publishers, Subscribers, and Distributors


Microsoft uses a Publisher-Subscriber metaphor to describe SQL Server's replication process. Publishers are servers that make data available for replication. Subscribers are servers to which SQL Server replicates the data. Note that each Publisher can simultaneously act as a Subscriber. On the Publisher, SQL Server generates publications, which can contain one or more articles. An article is simply a table or a subset of a table. You then create subscriptions by choosing the publications you want the Publisher to replicate to Subscribers. Each subscription can contain one or multiple articles, and to be replicated, articles must belong to subscriptions. The last piece of the replication puzzle is the Distributor—the server that tracks the data and transactions that you need to replicate. Note that the Distributor can be the Publisher server or a separate server.

SQL Server 7.0 supports three types of replication: snapshot, transactional, and merge. Snapshot replication transfers a "snapshot" of the data—a copy of the data at a specific point in time—from one database to another. Snapshot replication generates minimal additional workload on the Publisher and Distributor servers, but it doesn't give you a high level of data consistency among replication participants because it doesn't keep Publisher and Subscriber databases in synch. Thus, snapshot replication is best for situations where you don't need up-to-the-minute data synchronization.

In contrast, with transactional replication, several processes continuously monitor changes on the Publisher and periodically replicate those changes to Subscribers. With SQL Server 7.0, you can set up replication to also monitor Subscriber databases and replicate changes back to the Publisher. Microsoft calls this functionality immediate-updating subscription. Transactional replication is best for situations where you need constant synchronization between the Publisher and Subscriber servers.

SQL Server 7.0 also features support for the new merge replication function, which lets you merge data from databases on several servers to one database on the central server. Merge replication is best for environments in which a central server acts as the organization's main repository and must be regularly synchronized with all the Subscribers. Subscribers in merge replication also can modify data. If the Publisher and a Subscriber modify the same data at the same time, a conflict reader algorithm determines the winner.

When considering replication, you have to balance replication latency and data integrity. With minimal latency—that is, if you replicate transactions as soon as SQL Server commits them on the Publisher—you increase the workload for both the Publisher and Distributor. To handle the load, you might want a separate server acting as Distributor. However, if you accept some delay between transaction commitment and replication, the workload is more manageable but you risk not having up-to-the-minute data on the Subscribers. For example, consider a bank headquarters, which uses SQL Server to generate daily management reports, and the bank's branch offices, which use SQL Server to track ATM transactions. You could accept some delay in replicating transactions to the headquarters. However, you need to replicate withdrawal of money at one branch to the rest of the branch servers right away.

In this example, you would invest in a separate Distrib-utor server to forward transactions from each Publisher (each branch server) to the Subscribers (the rest of the branch servers). With SQL Server 7.0, you would set up a central Distributor and configure all branch offices as Publishers with immediate-updating subscription replication. In this configuration, all bank branches would have up-to-the-minute balances of each account at all times, regardless of the withdrawal location. You would set up headquarters, however, as a Subscriber that pulls subscriptions from all branch servers at the end of each business day. Managers, who don't need the latest account information for each customer, would then receive daily activity reports in the morning.

Creating a Publisher


SQL Server 7.0's replication wizards make setting up most replication tasks an easy matter. (You can also use stored procedures to implement replication processes. Although this article doesn't cover the process, "Custom Replication," page 37, shows how you can use stored procedures to implement custom replication solutions.) To see how to use the replication wizards, let's walk through a simple example that sets up an immediate-updating subscription replication process between databases on two servers.

To set up replication, you must use a login account that is a member of SQL Server's Process Administrators (or higher authority) server role. (SQL Server 7.0 has seven built-in server roles; for a list of these roles and their permissions, see Michael D. Reilly, Certifiably SQL, "Login Security," January 2000.) To assign a login to a particular server role, right-click the login under the Security tab in SQL Server Enterprise Manager, then choose Properties from the resulting pop-up menu. Click the Server Roles tab (as Figure 1 shows), select the appropriate role (I'm using the built-in sa account for this example), and click Apply.

You're now ready to run the Configure Publishing and Distribution Wizard, which you find under the Tools/Replication menu. This wizard lets you select the current or a different server as the Distributor and set up the Publisher, which by default is the current server. You also use this wizard to choose the servers you want as Subscribers to the current server's publications. The Configure Publishing and Distribution Wizard automatically creates snapshot, log reader, merge, and distribution agents as well as replication cleanup jobs.

After the introductory screen, the Configure Publishing and Distribution Wizard asks you to choose a Distributor. Let's select the TANYA server (the Publisher) as the Distributor server. The next wizard screen lets you accept a default name (distribution) and location (E:\MSSQL7\data) for the distribution database and enables the current server as Distributor. However, for demonstration purposes, let's select the option to customize the publishing settings. Figure 2 shows the resulting wizard screen, which lets you provide a name and location for the distribution database. For this example, let's leave these settings at their defaults.

Figure 3 shows the next wizard screen, which lets you enable one or more servers to use TANYA as their Distributor after you configure them as Publishers. Clicking the Properties button (...) next to the distribution database name in the Registered servers window brings up the screen in Figure 4. From here, you can change the snapshot folder location. The snapshot folder is where SQL Server stores the snapshot data before replicating it to the Subscribers. The default location is \\TANYA\E$\MSQL7\ReplData, where E$ is the share on which you installed SQL Server. This screen also lets you specify the security profile for the replication agents, which run on the Publisher. By default, replication agents impersonate the SQL Server Agent account on the Pub-lisher. For our exam-ple, let's enable only REBA to use TANYA as a Publisher, accept the default security profile, and accept the default location for the snapshot folder.

The next step is to enable the databases you want to publish. After you enable the database for publishing, the database owner can create a publication in this database. For this example, let's select SQL Server's Northwind sample database as our publication database, as Figure 5 shows. Note that this wizard screen also lets you enable transactional and merge replication. Let's select transactional replication only.

The wizard then asks you to select servers to subscribe to publications from the current Publisher; select TANYA and REBA. The last screen in this wizard gives you an overview of what you've instructed SQL Server to do: enable TANYA as a Publisher and Distributor, enable the Northwind database for publishing, and let the TANYA and REBA servers subscribe to publications on the TANYA server. When you click FINISH, SQL Server runs the code to set up these servers. When SQL Server is finished, Enterprise Manager pops up a message that says you've successfully configured TANYA for replication.

Creating a Publication


Now you need to run the Create Publication Wizard to create publications for replication. For this example, let's replicate Northwind's Customers table on the TANYA server. Figure 6 shows the wizard screen that lets you choose the type of replication you want to use with the publication. Select Transactional publication. Figure 7, page 34, shows the wizard's next screen, which lets you choose whether to allow immediate-updating subscriptions; select Yes to guarantee data consistency on both servers. By activating immediate-updating subscriptions, you instruct SQL Server to apply committed transactions at the Subscribers immediately after applying them at the Publisher. The wizard then asks whether all Subscribers are SQL Servers.

The next wizard screen lists tables that can't participate in transactional replication. To uniquely identify each modified row for replication, SQL Server requires primary keys on each table participating in transactional replication. So, in a transactional replication solution, you can't replicate tables that don't have a primary key. Another catch with SQL Server replication is that you can replicate identity column values but not the identity property. So if you want to set up two-way (or immediate-updating subscription) replication, you can't use tables with identity columns.

When you choose the Customers table, the wizard pops up the Enterprise Manager message in Figure 8, page 34, which says SQL Server is adding a timestamp column to the table. Contrary to common belief, the timestamp column isn't a date-and-time field; it's a field that tracks the sequence of modifications. Thus, if a table doesn't already have a timestamp column—as is the case with the Customers table—SQL Server adds the column as a way to identify the sequence of modifications that occur on the same table but on different servers.

The remaining wizard steps configure the Northwind publication on TANYA and let you define data filters to specify which data you want to replicate. For simplicity's sake, let's replicate all the data in the Customers table.

Configuring a Subscriber


With your Publisher, Distributor, Subscribers, and publications set up, you can now configure the type of subscriptions you want on the REBA Subscriber. You can select either pull or push subscriptions. Push subscriptions tend to work best for replicating frequently changed values; pull tends to work best for retrieving infrequent data updates. For this example, choose Push New Subscription from the Create and Manage Publications screen in Figure 9. The resulting Push Subscription Wizard screen lets you select which database you want to push the Customers table to; from a drop-down list of databases, select the Test database on REBA. The Push Subscription Wizard then asks you whether you want this to be an immediate-updating subscription; select Yes.

The next wizard screen lets you specify the acceptable latency—how often you want to deliver transactions from one server to another—by setting the Distribution Agent schedule. Choose continuously updating subscribers, which guarantees minimal latency. The wizard then lets you specify whether the Subscriber needs to receive table schema and data. Because REBA's Test database doesn't have any user tables, select the option to transfer table schema and data. You can now start the following required services on the participating servers: Microsoft Distributed Transaction Coordinator (MSDTC) on both the Publisher and the Subscriber, and SQL Server Agent on the Publisher. The last wizard screen lets you review the options you've chosen, then sets up the subscription on REBA.

Viewing Your Handywork


From Enterprise Manager, you can view the results of the replication wizards' work. First, notice in Figure 10 that the Northwind database is now available in the hierarchy of Replication Monitor, Publishers, TANYA. If you expand the Northwind database folder and look at the stored procedures, you'll see sp_MSsync_ins_customers_1, sp_MSsync_upd_customers_1, and sp_MSsync_del_customers_1. These procedures forward committed transactions from TANYA to REBA. Now look at the stored procedures in REBA's Test database. You'll notice similar procedures sp_MSins_customers, sp_MSupd_customers, and sp_MSdel_customers, which forward committed transactions from REBA to TANYA.

If you look at the tables on REBA, you'll see that the replication process has added the Customers table and system tables MSreplication_objects, MSreplication_subscriptions, and MSsubscription_properties to REBA. These tables track the Publisher and Subscriber, as well as replication type and frequency information.

The SQL Server Agents folder under TANYA shows that the replication process has added several replication jobs. Extended and system stored procedures handle these replication jobs, but how does SQL Server know which parameters to pass to the stored procedures? Notice in Figure 10 that TANYA now has a distribution database. When you configured TANYA as a Distributor, the replication wizard added to this distribution database 21 replication-specific tables—along with other system tables present in every database by default. These tables contain all the information the replication agents and replication-specific stored procedures need.

To test whether this replication process works, you can run the following query on the TANYA server:

Use NORTHWIND
update customers set contactname = "maria anderson" where customer_id = 'alfki'

This query modifies the top row of the Customers table. From Enterprise Manager, you can then look at TANYA's Log Reader Agents folder to see that 1 transaction(s) with 1 command(s) were delivered. The Status is Running. If you check the top row of REBA's Test database, you'll see that the replication process has set ContactName to maria anderson.

To investigate the cleanup jobs that the replication wizards created on TANYA, open the Miscellaneous Agents folder. Cleanup jobs remove the replicated transactions from the appropriate tables in the distribution database.

How does SQL Server implement this replication process? First, SQL Server generates table schema (.sch) files, which the database system uses to create tables. SQL Server also generates table index creation (.idx) files, which it uses to create indexes on Subscribers, and bcp files for all replicated tables. The database system then generates stored procedures for each insert, update, and delete action on the Publisher. SQL Server applies schema files on each Subscriber and applies index files to appropriate tables on each Subscriber. SQL Server then uses bcp to copy data into the Subscriber tables. Finally, SQL Server creates stored procedures for insert, update, and delete actions only on immediate-updating subscriptions.

If SQL Server automatically generates these replication tasks, why should you be interested in the behind-the-scenes details? Even though SQL Server 7.0 dramatically improves and simplifies replication setup and implementation, problems still crop up. The causes might have nothing to do with SQL Server. Perhaps an inexperienced DBA didn't answer a wizard question appropriately or didn't use the right security settings. Whatever the cause, when you're trying to troubleshoot replication problems, every piece of information you can bring to the investigation is valuable.

Without a doubt, replication is an advanced feature, but it offers exciting capabilities—especially if your organization needs to keep data on different servers in synch. SQL Server 7.0's new functionality makes replication even more powerful and flexible, and its wizards make setting up replication a simple matter of answering questions—just make sure you give the right answers.