Managing a distributed database system presents many challenges, not the least of which is the need to quickly synchronize updates. Transactional replication and merge replication are well-known, effective replication techniques, but each has limitations and can limit performance. If you're frustrated by the limitations of these replication methods, you might be overlooking another solution: bidirectional transactional replication. In some cases, implementing this replication technique provides a remarkably fast, geographically dispersed, multisite update solution—if you're willing to jump some hurdles. Microsoft's SQL Server Development Customer Advisory Team helps customers architect bidirectional transactional replication in markets such as finance, stock trading, and telecommunications. Most of these customers use the technology to create a database-reporting solution that they can quickly and easily convert into a failover or disaster-recovery site.
SQL Server has supported this fast, powerful form of transactional replication since SQL Server 7.0. However, few DBAs are aware of it or have harnessed its power, largely because of inadequate documentation and a general lack of understanding about its functionality. SQL Server 7.0 Books Online (BOL) includes a scanty section about bidirectional transactional replication, which Microsoft regrettably omitted from early SQL Server 2000 BOL editions. Microsoft has since included some documentation in the latest SQL Server 2000 BOL edition, which you can download at http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp.
Web Table 1 (which you can download at InstantDoc ID 42858) compares the features of four replication methods: bidirectional transactional, transactional queued, transactional immediate updating, and merge. Most DBAs use transactional replication to disseminate a flow of logged transactions from one source, the Publisher, to one or more non-updated, usually read-intensive target servers (e.g., reporting servers), the Subscribers. DBAs typically use this type of replication to increase aggregate read performance. Immediate updating subscriptions, which Microsoft introduced in SQL Server 7.0, lets transactional subscribers modify data on the Subscriber. However, subscribers can modify data only through a network connection to the Publisher (i.e., the Subscriber must be online). In SQL Server 2000, queued updating subscribers can modify data on the Subscriber without being connected to the Publisher. Both of these methods require you to add a column to user tables. Plus, these methods don't support updating of text or image columns and track changes by using automatically generated triggers.
Although merge replication (which Microsoft introduced in SQL Server 7.0) supports modification of text and image columns, it also uses triggers and requires that you add to tables a replication-specific column of data type uniqueidentifer (i.e., a GUID), which is an extra 16 bytes and is globally unique. Adding these triggers and columns can cause trouble for some applications, and many DBAs frown on the practice. Microsoft designed merge replication specifically for offline mobile users (e.g., users of sales-force automation applications), providing extensible change- and conflict-tracking and management capabilities as well as highly flexible partitioning and filtering functionality. However, to get these benefits, you have to accept the loss of transactional consistency, the addition of a GUID column and triggers, and storage of extra metadata (for example, the merge-generated triggers populate tracking tables to identify data modifications and possible conflicts).
If you want a high-performance, non-triggerbased way to keep two updateable but transactionally consistent sites in sync without adding extra columns, bidirectional transactional replication might be your solution. As the name implies, bidirectional transactional replication provides replication between two servers by using transactional replication agents: the Logreader Agent and the Distribution Agent. The Logreader Agent finds transactions for replicated articles in the database log, builds a corresponding SQL INSERT, UPDATE, or DELETE statement, and writes these statements as transactions to the distribution database. The Distribution Agent finds replicated transactions in the Distributor server database and writes them to the subscribing database. In this form of replication, the Publisher is also a Subscriber and the Subscriber is also a Publisher of the same data. For example, as Figure 1 shows, server LONDON transactionally publishes the Customer table as Customer_Publication_A and server CAPETOWN transactionally subscribes to server LONDON's Customer_ Publication_A. Server CAPETOWN now transactionally republishes the same Customer table as Customer_Publication _B, and server LONDON subscribes to Customer_Publication_B. Note that this method establishes these relationships without requiring you to add a replication-specific column to the Customer table, and you don't have to add triggers on either server to track changes.
You've probably figured out that the relationships I described won't work because any change on either server would cause an endless barrage of data between servers. If a user updates a Customer row on server LONDON, the LONDON Logreader Agent would detect the transaction and write it to its distribution database. The Distribution Agent belonging to server CAPETOWN would pick up the transaction from the distribution database on LONDON and write it to the Customer table on server CAPETOWN. However, the Logreader Agent on server CAPETOWN, which is monitoring the published database log, would also pick up the transaction and put it in its distribution database. The Distribution Agent on server LONDON would find the transaction in the distribution database on CAPETOWN and reapply it to server LONDON. Because neither server knows which server initiated the change, both servers would continually propagate the update. You wanted replication; you got Agent Smith from The Matrix Revolutions!
Thankfully, the solution is relatively simple. When you use the sp_addsubscription stored procedure to add the subscription to the Subscriber, you need to add the parameter @loopback_detection = 'true' to the stored procedure. (For more information about using sp_addsubscription, see the Microsoft article "HOW TO: Implement Bidirectional Transactional Replication" at http://support.microsoft.com/default .aspx?scid=kb;en-us;820675&Product=sql.) Adding this parameter correctly identifies each transaction so that the Distribution Agent doesn't redeliver it to the original Publisher. Unlike trigger-based merge replication, which requires some complex coding to figure out appropriate data row and column modifications as well as conflicts, this log-based transactional-replication method is relatively simple. Bidirectional transactional replication lets the Publisher read transactions to and apply transactions from Subscribers at a high rate of speed—often thousands of commands per second—and at low latencies (the time lag between when a change occurs on one server and when it's applied to the other server). Of course, these rates depend on network and disk read/write latency and the size of the rows being modified. Nevertheless, on a fast, reliable network, even servers as far apart as New York and Los Angeles could operate at only a few seconds (possibly less than 4 seconds) behind each other. Plus, unlike merge replication, transactional replication lets you replicate the execution of stored procedures. You might want to use stored-procedure replication when you need to modify a large number of rows. Instead of moving the resulting rows across the network, you replicate only the procedure execution statement that the Publisher issued. For example, you might replicate the statement
@month, @status, @value,
thus saving a lot of I/O, networking, and CPU time.
Is It for You?
You can realize many performance benefits without changing your schema when you use bidirectional transactional replication. But you must also evaluate the costs and disadvantages of this type of replication. The complexity of initial setup, difficulty of conflict management, and limited extensibility are all factors you need to weigh before deciding bidirectional transactional replication is right for your environment.
Complex setup. Because bidirectional transactional replication isn't fully UI-wizard driven, you have to create or modify some replication setup scripts by hand, and Subscriber initialization can be complex. For example, one method of setting up bidirectional transactional replication of a database between the two servers in Figure 1 requires the following steps:
- Detach a database from server LONDON, and attach it to server CAPETOWN.
- Use the Create Publication Replication Wizard to create a transactional publication on server LONDON and server CAPETOWN.
- Because you already have the data you want to replicate, use the no-sync option on server CAPETOWN when transactionally subscribing to the publication on server LONDON.
- Unfortunately, when subscribing, you can't add the @loopback_detection = 'true' parameter by using the Replication Add Subscription Wizard. So on server CAPETOWN, you must use the Generate Script Wizard to generate and save the replication script for the subscription. Drop the subscription from server CAPETOWN, modify the replication script you generated and saved, add the @loopback_detection = 'true' parameter to the call to the sp_addsubscription procedure, then execute the script on and against server CAPETOWN.
- You now modify the replication script again, this time adding the correct server and database names for server LONDON, and execute the script on and against server LONDON.
Voilà—you've installed bidirectional transactional replication! And, no, it wasn't easy.
Complex conflict management. The next factor to consider is the rather sticky area of conflict management. A conflict can arise when two servers modify the same row, causing data to be out of sync. Unlike other Subscriber-updateable forms of replication, bidirectional replication doesn't offer an out-of-box conflict-management solution. Instead, you're left to your own devices. To handle conflicts, merge replication provides rich out-of-box features to manage complex conflict cases (e.g., row-level and column-level conflicts) and offers developers and architects extended flexibility in the form of custom conflict handlers written in T-SQL or other programming languages. Immediate updating replication Subscribers avoid conflicts by using the Distributed Transaction Coordinator (DTC) to perform a two-phase commit. However, this technique requires an online, connected environment. Although the queued-updating solution supports offline data modification, queued updating offers a limited conflict-handling model; either the Subscriber or the Publisher changes will take precedence for the entire transaction batch. Furthermore, as I noted before, neither queued updating nor immediate updating supports modification of the text or image data type, whereas both merge and bidirectional transactional replication do.
Bidirectional transactional replication works best when you don't expect conflicts, when you can avoid conflicts by using application restrictions (e.g., users can modify data based on data ownership), when publications are filtered, or when only one server is modified at a time. (The last condition occurs most often in a high-availability or failover scenario.) You can avoid conflicts in bidirectional transactional replication in several ways.
Many database designs use the IDENTITY data type as a primary key. To effectively use this kind of table design with bidirectional transactional replication and avoid conflicts such as duplicate keys, you must include two steps in your setup process. First, you must define the IDENTITY column with the NOT FOR REPLICATION property. Second, if the design requires insertion of rows independently on both servers, you need to start each IDENTITY column at a different number. For example, you might specify that the Customer table on server LONDON starts at the IDENTITY value 0 and the Customer table on server CAPETOWN starts at the IDENTITY value 1,000,000. You achieve this separation by using the DBCC CHECKIDENT RESEED command directly or by using the sp_addscriptexec functionality to send the commands to replicated servers after the initial setup and synchronization phase.
Alternatively, you can avoid duplicate-key conflicts by designing your tables so that they use UNIQUEIDENTIFIER as a primary key. But this method increases overall storage-space requirements, and not everyone likes having a tongue-twisting report-spaceeating customer ID such as C677EF69 -8645-4C1D-8816-7E629F8B1AA0 (which is the ID SQL Server generates for this data type).
Tracking conflicts after they occur can also be difficult when you're using bidirectional transactional replication. Update conflicts are especially difficult to track because the last update to a specific column for a row overrides the previous result. For example, a conflict would occur if a record is updated on server LONDON, but when it arrives at the destination server CAPETOWN, the record has already been deleted. In this situation, the default behavior for transactional replication is to raise an error and stop processing. At this point, the administrator has a couple of options, including removing or modifying the entry in the replication system tables in the distribution database—thus breaking the commandment "Thou shalt not modify a system table"—or re-initializing the subscription. However, depending on the type of connection you have, re-initializing could be a glacial experience. Another alternative is to rerun the Distribution Agent with the additional parameter Skiperrors %error_number%. To further reduce conflict risk, you can filter publications on a field such as Region_ID. With this filter, server LONDON would publish only data that belongs to Region_ID=1, and server LONDON would subscribe to server CAPETOWN, which publishes only data from Region_ID=2.
You can also use one of the following nonpartitioned methods to track and manage conflicts: modifying the text or body of the replication-generated stored procedures to include custom code; adding tracking columns (e.g., universal datetime, unique identifier, or priority data type columns) or triggers to the replicated tables; selecting the XCALL option, which provides full access to a modified record's current and previous values; or some combination of these actions. For sample code and more information about these methods, see the "Bidirectional Replication" BOL topic at mk:@MSITStore:C:\Program%20Files\Microsoft%20SQL%20Server\80\Tools\Books\replprog .chm::/rp_replsamp_3ve6.htm. As you can see, handling conflict is the Achilles' heel of bidirectional transactional replication.
Limited extensibility. Another factor to consider when thinking about using bidirectional transactional replication is whether you'll want to extend the bidirectional design to include more than two servers. Adding filters for partitioning purposes to publications can extend bidirectional replication to include multiple servers. However, doing so can dramatically increase complexity, administration costs, and even the potential for performance degradation. Performance is highly dependent on filter complexity. For optimal performance and supportability, design your replication system so that it always has only one central hub and so that communication is always between no more than two servers. For example, as Figure 2 shows, server LONDON can bidirectionally replicate with server CAPETOWN and server NEWYORK, but server CAPETOWN and server NEWYORK can't replicate directly with each other. In this unfiltered design example, all changes would flow among all the servers: Changes in CAPETOWN would eventually arrive in NEWYORK, but they'd always go through the central LONDON server. If necessary, you could implement simple business rules in your user application to enforce modifications only to rows belonging to a specific location or server. This design is advantageous in applications that let users browse all data but modify only local records. However, if users frequently make numerous changes on each server, dramatically scaling out this unfiltered design can be challenging because the high number of transactions flowing between servers could increase the load on network performance.
Further Improving Performance
When you use bidirectional transactional replication, you can further decrease latency between servers by lowering the PollingInterval parameter of the Logreader and Distribution agents for each published and subscribed database. For example, setting PollingInterval to 2 will cause the agents to "sleep" for a maximum of 2 seconds after they find no transactions to replicate. So the Distribution Agent could be a maximum of only 2 seconds behind the transactions the Logreader writes to the distribution database. And the Logreader Agent would be a maximum of 2 seconds behind a transaction committed to the log. Of course, as long as transactions are waiting to be replicated, the agents don't enter into the PollingInterval logic.
The next release of SQL Server, SQL Server 2005, will offer a transactional-replication feature named Peer-to-Peer, which will expand on the current bidirectional offering by providing out-of-box support for multiple bidirectional subscribers. Using the example in Figure 2, Peer-to-Peer's enhanced loop-back detection algorithm will let CAPETOWN directly replicate to and from NEWYORK instead of depending on LONDON. Although this feature offers good scalability (depending on data-change volume, you could support between 10 and 20 peer-to-peer servers) and increased UI integration and supportability, this version still won't support out-of-box conflict management.
Bidirectional transactional replication offers exceptional delivery rates and very low latency, supports modification to text and image data types, maintains transactional consistency, and doesn't require additional columns or triggers. However, it lacks the highly advanced conflict-management and powerful dynamic-filtering support that merge replication offers. Is it for you? If you want high performance, bidirectional transactional replication has no equal.
Cedric Britt and Samuel Penn,|
"Transactional Replication Without the Snapshot Agent," May 2001, InstantDoc ID 20005
"Why Not Transactional Replication?" May 2002, InstantDoc ID 24519
"HOW TO: Implement Bidirectional Transactional Replication," http://support .microsoft.com/default.aspx?scid=kb;en-us ;820675&Product=sql
"Transactional Replication Performance Tuning and Optimization," http://support.microsoft.com/default