Continually monitor performance
SQL Server offers DBAs and database developers a vast number of features for designing, implementing, and maintaining stable solutions. Two features that you might not have considered combining, but that complement each other well, are SQL Server Integration Services (SSIS) and merge replication.
SSIS was introduced in SQL Server 2005 as a replacement for DTS. With SSIS, you gain the ability to implement enterprise solutions for extraction, transformation, and loading (ETL). It also gives you a robust automation and job services platform. A DBA can implement SSIS packages that automate high-level preventative tasks so that SQL Server instances run as well as they can in their environments.
In SQL Server, merge replication has evolved into a true enterprise data synchronization and distribution layer. As with many features in SQL Server, setting it up is easier than maintaining it. An improperly maintained merge replication system can lead to downtime, slower overall performance, and the need for redesigns. The business typically sees the cost of these problems as a loss that wasn't accounted for in the initial budget.
To prevent such problems, there are steps you can take to improve the processes needed to maintain merge replication. One of the most important steps is obtaining a performance baseline to assist with any problems that might arise later.
Knowing the current state of merge replication on the database server is crucial to being proactive and preventing problems. There are three primary areas that you should concentrate on for baseline information collection: the distribution, the publications, and the subscriptions.
Distribution statistics can be captured by monitoring the execution plan cache and index utilization. This involves monitoring fragmentation and other areas related to query tuning. Because most administrators and DBAs are familiar with this type of monitoring, I'll move on to the publication and subscription areas.
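As a quick illustration of the index side of that monitoring, the following is a minimal sketch rather than a complete baseline script. It assumes the distribution database uses the default name distribution, and the 100-page filter is an arbitrary threshold chosen only to hide very small indexes.

```sql
-- Assumption: the distribution database uses the default name "distribution".
USE distribution;
GO

-- Fragmentation and usage per index in the current database.
SELECT
    OBJECT_NAME(ips.object_id)         AS table_name,
    i.name                             AS index_name,
    ips.avg_fragmentation_in_percent,
    ips.page_count,
    ius.user_seeks,
    ius.user_scans,
    ius.user_updates
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
    ON  i.object_id = ips.object_id
    AND i.index_id  = ips.index_id
LEFT JOIN sys.dm_db_index_usage_stats AS ius
    ON  ius.database_id = DB_ID()
    AND ius.object_id   = ips.object_id
    AND ius.index_id    = ips.index_id
WHERE ips.index_id > 0        -- skip heaps
  AND ips.page_count > 100    -- arbitrary threshold, adjust to taste
ORDER BY ips.avg_fragmentation_in_percent DESC;
```

Saving the result set to a table on a schedule gives you a history to compare against when performance changes. Keep in mind that the usage counters in sys.dm_db_index_usage_stats are reset when the instance restarts.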
Both publications and subscriptions can be monitored using a joint set of operations. Several system stored procedures and tables are available to assist in collecting information about how the publications and subscriptions are functioning. For example, in a publisher and pull subscriber landscape, the following system stored procedures and tables provide performance indicators for baseline collections:
Note that the type of subscriptions being used in a replication landscape can affect where the merge replication tables are located and where merge-replication system stored procedures are executed. Pull versus push subscriptions cause differences in where the metadata is located and where work is performed when historical and active sessions are synchronized. For more information about the differences between pull and push subscriptions, see "Implementing Replication Overview" in SQL Server Books Online (BOL).
Gathering the Subscription Header Information
The first step in collecting the baseline data for the publications and subscriptions is gathering the subscription header information. You can use the code in Listing 1 to create the SubscriptionHeader table, which will contain the subscription information for each subscriber to the specified publication. Figure 1 shows the layout of this table.
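If you don't have Listing 1 at hand, a table along the following lines could serve the same purpose. This is only a sketch and not the article's listing: every column name here is an assumption, picked to hold values that MSmerge_agents and sysmergesubscriptions expose.

```sql
-- Hypothetical layout; the column names are illustrative assumptions.
CREATE TABLE dbo.SubscriptionHeader
(
    SubscriptionHeaderID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    AgentID              int               NOT NULL,  -- MSmerge_agents.id
    PublisherDB          sysname           NOT NULL,  -- MSmerge_agents.publisher_db
    Publication          sysname           NOT NULL,  -- MSmerge_agents.publication
    SubscriberName       sysname           NOT NULL,  -- MSmerge_agents.subscriber_name
    SubscriberDB         sysname           NOT NULL,  -- MSmerge_agents.subscriber_db
    SubscriptionID       uniqueidentifier  NULL,      -- sysmergesubscriptions.subid
    SubscriptionStatus   int               NULL,      -- sysmergesubscriptions.status
    LastSyncDate         datetime          NULL,      -- sysmergesubscriptions.last_sync_date
    CollectedAt          datetime          NOT NULL DEFAULT (GETDATE())
);
```

Keeping the merge agent ID in the header row gives later baseline queries a stable key for tying session statistics back to a specific subscriber.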
To populate the SubscriptionHeader table, you use MSmerge_agents (a table in the distribution database) and sysmergesubscriptions (a table in the publication database). The query in Listing 2 shows how to retrieve the required data. To use this code, you need to replace <publication_db>.<schema> in callout A with the name of your publication database and its schema.
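The general shape of such a query might look like the sketch below. It is not Listing 2. It assumes the Publisher and Distributor share one instance, that the distribution database has the default name distribution, and that PublicationDB.dbo stands in for <publication_db>.<schema>. MyPublication is likewise a placeholder. The sketch also brings in sysmergepublications, which the text above doesn't mention, to match the publication name stored in MSmerge_agents to the pubid stored in sysmergesubscriptions.

```sql
-- Assumptions: "distribution" is the distribution database name;
-- PublicationDB.dbo and N'MyPublication' are hypothetical placeholders.
SELECT
    a.id                AS agent_id,
    a.name              AS agent_name,
    a.publisher_db,
    a.publication,
    a.subscriber_name,
    a.subscriber_db,
    s.subid             AS subscription_id,
    s.status            AS subscription_status,
    s.subscription_type,
    s.last_sync_date
FROM distribution.dbo.MSmerge_agents AS a
JOIN PublicationDB.dbo.sysmergepublications AS p
    ON  p.name         = a.publication
    AND p.publisher_db = a.publisher_db
JOIN PublicationDB.dbo.sysmergesubscriptions AS s
    ON  s.pubid             = p.pubid
    AND s.subscriber_server = a.subscriber_name
    AND s.db_name           = a.subscriber_db
WHERE a.publication = N'MyPublication';
```

If the two databases use different collations, the string comparisons in the joins might need an explicit COLLATE clause.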
Using the query in Listing 2, you can also periodically perform an incremental update of the SubscriptionHeader table. The query checks to see whether subscriptions have been added, have been removed, or require updates. This is a crucial step. It ensures that the appropriate subscriptions are included in the baseline collection so that the statistics about those subscriptions can be gathered.
For example, suppose a new subscription is added to a publication landscape that covers the entire United States, and each state is set as a region. Each region has the potential to have varying latency based on connectivity, the amount of data (which can be affected by filtering in the publication), and concurrent subscriber connections. Regularly running the query will ensure that the new subscription is added to the baseline collection so that its statistics can be gathered. If the query isn't run regularly, the new subscription's statistics won't be gathered. Consequently, if there's a performance problem, it will be much more difficult to determine whether the new subscription is causing the problem.
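The add, remove, and update logic of an incremental refresh can be sketched with a single MERGE statement, which requires SQL Server 2008 or later. The example below is self-contained and deliberately simplified: the two table variables, their columns, and the sample rows are all hypothetical stand-ins for the stored baseline and for the current state returned by the system tables.

```sql
-- Hypothetical stand-in for the stored baseline.
DECLARE @SubscriptionHeader TABLE
(
    AgentID            int     NOT NULL PRIMARY KEY,
    SubscriberName     sysname NOT NULL,
    SubscriberDB       sysname NOT NULL,
    SubscriptionStatus int     NOT NULL
);

-- Hypothetical stand-in for what the system tables report right now.
DECLARE @CurrentSubscriptions TABLE
(
    AgentID            int     NOT NULL PRIMARY KEY,
    SubscriberName     sysname NOT NULL,
    SubscriberDB       sysname NOT NULL,
    SubscriptionStatus int     NOT NULL
);

INSERT INTO @SubscriptionHeader VALUES
    (1, N'SUB-OH', N'SalesOH', 1),
    (2, N'SUB-TX', N'SalesTX', 1),
    (3, N'SUB-NY', N'SalesNY', 1);

INSERT INTO @CurrentSubscriptions VALUES
    (1, N'SUB-OH', N'SalesOH', 1),   -- unchanged
    (2, N'SUB-TX', N'SalesTX', 0),   -- status changed
    (4, N'SUB-CA', N'SalesCA', 1);   -- new subscriber; agent 3 no longer exists

MERGE @SubscriptionHeader AS tgt
USING @CurrentSubscriptions AS src
    ON tgt.AgentID = src.AgentID
WHEN MATCHED AND (   tgt.SubscriberName     <> src.SubscriberName
                  OR tgt.SubscriberDB       <> src.SubscriberDB
                  OR tgt.SubscriptionStatus <> src.SubscriptionStatus)
    THEN UPDATE SET SubscriberName     = src.SubscriberName,
                    SubscriberDB       = src.SubscriberDB,
                    SubscriptionStatus = src.SubscriptionStatus
WHEN NOT MATCHED BY TARGET
    THEN INSERT (AgentID, SubscriberName, SubscriberDB, SubscriptionStatus)
         VALUES (src.AgentID, src.SubscriberName, src.SubscriberDB, src.SubscriptionStatus)
WHEN NOT MATCHED BY SOURCE
    THEN DELETE
OUTPUT $action AS change_type,
       COALESCE(inserted.AgentID, deleted.AgentID) AS AgentID;
```

With the sample rows shown, the OUTPUT clause reports one UPDATE (agent 2), one INSERT (agent 4), and one DELETE (agent 3). In a real baseline you might prefer to flag removed subscriptions as inactive instead of deleting them, so that their historical statistics remain available.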