Downloads
142171.zip

SQL Server replication encompasses a set of capabilities that have yet to be fully replaced by other features found natively in SQL Server. Some of the common use cases include the ability to distribute data to separate SQL Server instances and databases for reporting and for offloading activity away from a primary production database. Replication can bring data nearer to the users in distributed environments and enable scaling out activity by multicasting to multiple destinations. And more controversially, some have even found a place for replication from a disaster recovery or high availability perspective. (For more information about this particular subject, see Paul S. Randal's blog post " In defense of transactional replication as an HA technology .")

Although the learning curve for replication can be intimidating, the successful use of replication comes down to understanding the components involved. The best way to gain this understanding is to practice basic deployments, then investigate the associated user schema modifications, metadata objects, SQL Server Agent jobs, and associated agent settings. To help you, I'll first briefly go over replication terminology and describe how the various components work together toward a solution. I'll then step you through a standard deployment of transactional replication, which is a common type of replication.

Understanding the Basics of Replication

There are different types of replication, so I'll start off describing the feature set at a high level. Replication is a bundling of technologies that enable you to copy, distribute, and synchronize specific types of database objects and their associated data and dependencies from one database to one or more databases on the same SQL Server instance or different SQL Server instances (which is more common). The SQL Server instance hosting the database from which you'll be distributing data is called the Publisher. Within a user database, you can define one or more publications, which are a logical bundling of one or more articles. An article is a specific object within the publication database that you want to distribute to another database. Some of the permitted article object types include user-defined tables, stored procedures, and views.

Replication requires a separate database to house both metadata and in-flight data. This database is called the distribution database, and the SQL Server instance that hosts it is called a Distributor. The Distributor can be the same instance as the Publisher, a separate SQL Server instance, or the instance where the data is being distributed to. The decision of where to place the distribution database isn't always consciously considered, but when it is, it's typically due to performance-overhead considerations or availability concerns (e.g., if transactional replication is used in conjunction with database mirroring).

The entity that receives the data from the Publisher is called the Subscriber. The Subscriber can be the same SQL Server instance as the Publisher, the same SQL Server instance as the Distributor, or a separate SQL Server instance altogether. A specific Subscriber is defined by adding a subscription to a specific publication. The subscriber database can contain both replicated and nonreplicated objects and can host more than one subscription from different publications.

A subscription can be defined as push (data is pushed to the subscription database from the Distributor) or pull (data is pulled to the subscription database from the Subscriber). Push subscriptions are more common with snapshot and transactional replication deployments. Pull subscriptions are more common with merge replication because the subscriber might be disconnected more frequently and might need to control when data is refreshed on-demand. The push/pull decision is sometimes also based on the capacity and overhead of the various servers participating in the replication topology.

External executables, which are called replication agents, move the data from the Publisher to the Distributor and then to the Subscriber. The type of replication agent is dependent upon the type of replication being used.

Choosing a Replication Type

There are three major types of replication: snapshot, transactional, and merge. There are also variations based on these, such as peer-to-peer replication, but I won't be elaborating on them here.

Snapshot replication lets you distribute data from a specific point in time. That snapshot is distributed, but it's not updated further by the replication process unless a new snapshot is created and applied again to the Subscriber. Snapshot replication is most commonly used to establish the article schema and associated data on the subscribers for transactional and merge replication.

Merge replication allows Subscribers to directly modify the subscription data (articles) and then synchronize the modified rows with the Publisher. As mentioned previously, the Subscribers might be offline for periods of time, reconnect periodically, and synchronize on-demand. At this point, their changes are synchronized with the Publisher and vice versa. This introduces the possibility of data conflicts, where a Publisher and one or more Subscribers attempt to modify the same set of data.

This last type of replication -- and the one I'll be discussing from this point forward -- is transactional replication. When architected properly, transactional replication can provide low-latency streaming of data modifications made at the Publisher to one or more Subscribers. Replicated data is typically treated as read-only on the Subscriber. Modifications to Subscriber data will cause the overall data set to get out of sync with the Publisher data. Transactional replication offers the now deprecated "updatable subscriptions" feature, which allows Subscribers to replicate data from the Subscriber back to the Publisher. Peer-to-peer replication (a SQL Server Enterprise edition feature) also allows bi-directional replication of data modifications of transactional replication databases.