Peer-to-peer replication is a SQL Server 2005 technology option that can help you create a more available system and also let you scale your geographic workloads by replicating data between remote sites.This technology isn't completely new; it's based on the transactional-replication technology in SQL Server 2000. In fact, you might be using peer-to-peer replication technology today in SQL Server 2000 if you're using bi-directional transactional replication. But peer-to-peer replication in SQL Server 2005 significantly enhances this existing technology and also provides a series of wizards that can help you manage the setup and configuration of the Peer-to-Peer infrastructure. Here's an introduction to what peer-to-peer transactional replication is, what it looks like, and how to get the tools you need to use it.

What It Is


SQL Server provides two main types of replication: merge replication and transactional replication.You typically use merge replication between a database server and a client machine.This form of replication is great to use when you need to have client machines that aren't constantly connected to the main database. Merge replication includes features that perform conflict resolution to ensure that data always stays up to date in the database but lets you query and update local data, then merge it back into the main data store. I'll cover merge replication in more detail in an upcoming column.

You use transactional replication primarily between servers to keep sets of data up to date on multiple servers.You can also use transactional replication as a scalability solution because it lets you replicate data to multiple servers so that applications can hit various servers independently instead of overloading just one server.The ability to have multiple copies of your data also increases data availability; if one of your replication sites goes offline, all you need to do to access the data is switch to a different site. Be aware that SQL Server can't automatically perform this switch.You perform the switch at the application level by adding code that automatically redirects a connection if a particular database isn't available.

Peer-to-peer replication uses existing transactional replication technology, which lets you have identical databases continuously synchronizing in near real time, then enhances this functionality by providing bi-directional replication between all peers in a system. Peer-to-peer is a form of asynchronous replication, so it has some latency. In addition, it's possible that if one of your servers goes down, not all of the transactions will make it to the other servers. However, because peer-to-peer replication operates in near real time, the amount of latency and potential data loss is extremely low.

What It Looks Like


The most basic concept you need to understand in peer-to-peer replication is that every server contains all of the data but each server is responsible for updating only its own subset of data. Thus, every server carries the same schema and each server is a subscriber to all the changes that happen on the other servers while being a publisher of its own changed data. When data changes on one server, those changes go out to all the subscribers in the peer-to-peer network.

Imagine a typical peer-to-peer configuration in which three servers are distributed geographically, one in London, one in Los Angeles, and one in Taipei. Each server contains and updates data specific to its geographic location and also sees all the data from the other locations. A key part of peer-to-peer replication is that each server is responsible for changing its own data set and no other location can change any of the data in that data set. If this rule is violated, the data could be changed in two places, and because there's no data locking between sites, when the data is replicated, you can end up with inconsistent results. For example, in our three-server scenario, it's likely that you would divide the data based on the geographical location of the servers. Thus, the server in London would see and have access to query all the data in the organization, but would be responsible for updating just the local London data. Likewise, the server in Los Angeles could query the London data but couldn't update it.

This architecture gives you the ability to scale the number of users or processes running on each server while still giving users access to all the data from every server. It also provides high availability for the entire system; if one or more sites go offline, the other sites will continue to operate and replicate data.You can redirect the applications at the offline site so that they talk to one of the other replication sites in the network and continue to query and work with data, thus reducing possible downtime. Once the offline site comes back online, the replication process restarts and the site receives all of the changes that have happened at the other sites.

The Tools to Make it Work


The process of setting up and configuring peer-to-peer transactional replication is simplified in SQL Server 2005 Management Studio.You can find the steps for setup and configuration in the Microsoft article "How to: Configure Peer-to-Peer Transactional Replication (SQL Server Management Studio)" at http://msdn2.microsoft.com/en-us/ library/ms152536.aspx. In addition to making peer-to-peer replication easier to set up, SQL Server 2005 also introduces enhancements to the replication monitor that help you more easily understand and watch what's happening in your replication infrastructure.The replication monitor lets you keep track of the latency in your environment—right down to a publication level—and can help you understand and tune your system to reduce the latency. For more information about the replication monitor, read the Microsoft article "Monitoring Replication with Replication Monitor" at http://msdn2.microsoft.com/en-us/library/ms151780(en-US,SQL.90).aspx.

Peer-to-peer replication is an interesting technology that can bring significant scalability and availability benefits to many database environments, and in SQL Server 2005 it's simpler to set up, configure, and manage than in previous SQL Server releases. If you have multiple remote sites and the ability to split your data sets so that certain sites have unique sets of data, you can take advantage of all the great benefits of this next generation of replication.