Downloads
8759.zip

Transfer-delta procedures can give you the best of replication without the cost

SQL Server 7.0 has made replication easier than ever, but to maximize your expertise, consider all your options. You can use replication wizards, Data Transformation Services (DTS), the SQL Distributed Management Objects (SQL-DMO) interface, or even binary code if you want to. But if you're already comfortable with T-SQL, consider learning how to use custom stored procedures for copying data. In this article, I show you how and help you understand when to use this method.

According to SQL Server Books Online (BOL), replication copies and distributes data and database objects from one database to another, and certain types keep the copies consistent with the sources. Snapshot replication makes static copies of your choice of tables, columns, and records; transactional replication starts with a snapshot but periodically copies changes in the source tables to the target tables, making it dynamic. Because snapshots aren't updated regularly, they usually don't have performance problems. However, transactional replication keeps a log reader process running continually. This process tracks which changes have been copied and which changes are waiting for the next update. Because the log reader process uses system resources, the more changes users make to the source tables, the heavier the log reader's performance load is.

For transferring data, you can create custom stored procedures that combine some of the best features of snapshot and transactional replication, while avoiding some of their weaknesses. I refer to these custom procedures as transfer-deltas, because transferring only the changes to data is one of their most important features. Unlike when you use standard replication through the Enterprise Manager or directly execute some of the hundreds of replication procedures in SQL Server 7.0, when you write and maintain these custom stored procedures, you use T-SQL code.

Coding your own procedures has several advantages over using Enterprise Manager or system procedures to set up and maintain your replication needs for read-only destination tables (except in cases that require merge replication). First, like transactional replication, custom transfer-delta procedures move only data that has changed. But like snapshot replication, transfer-delta procedures require minimal processing overhead. (The custom procedures don't need a log reader process because they don't have to maintain information about which records have moved or need to move.) Second, like synchronization, these procedures can make all records match every time the procedure runs. Transactional replication only makes sure that records are moved once, which might be exactly what you need. But if you need destination records to always match the source exactly, transactional replication won't work. If records are changed on or deleted from the destination table, transactional replication doesn't make the changed records on the destination match the source, nor does it replace deleted records on the destination table. Another advantage with custom transfer-delta procedures is that source tables can undergo design modifications without requiring troublesome replication administration. Also, you can control the updates through conventional job scheduling.

Of course, custom code isn't always better than other types of replication. The performance advantage of these custom procedures over transactional replication diminishes the more frequently you need to transfer the data. For example, when you need constant updates to a hot backup server, running custom procedures repeatedly is a poor alternative to the performance of the log reader agent that transactional replication uses.

If your updates aren't too frequent and you want to try custom replication, start by deciding whether to push or pull records, and create a link for the remote server. If you execute the stored procedures on the server that holds the source tables, you're pushing the changes; if you execute them on the server that receives the records, you're pulling the changes. Create the link to the remote server by using sp_addlinkedserver or Enterprise Manager to add that server to the local server's linked servers list (SQL Server Group/servername/Security/Linked Servers). You can put your source and destination tables on the same server, but the examples in this article assume you have separate servers.

A typical transfer-delta procedure contains three parts. The first part deletes obsolete records in the destination tables, the second part modifies changed records, and the third adds new records. This order is important because it prevents the procedure from unnecessarily evaluating obsolete and new records.

Let's look at an example that shows the three parts individually. Listing 1 puts the three parts together into a complete procedure. The example uses a source table named Publishers that resides in a local database (the Publishers table in Microsoft's Pubs database) and a destination table, also named Publishers, in a remote database on a server named Server2. The table has a one-column primary key, pub_id, and four attribute columns: pub_name, city, state, and country.

The first part of the procedure consists of a DELETE query to remove any record that exists in the destination table but that no longer exists in the source table. If a table has more than one column in its primary key, this DELETE query must compare every primary key column against its match in the destination table. You could also use NOT IN with the correlated subquery, but doing so could make multicolumn key situations more complex. If you want to make the destination table a historical record instead of a duplicate, simply leave out the DELETE query.

The second part consists of an UPDATE query. If the source and destination tables have a timestamp column (set every time the source table is updated), this query only has to compare the primary key and timestamp columns to check whether destination records need updating. If the table has no timestamp column, the query must compare every attribute column in the source table with its matching column in the destination table (as a commented section in Listing 1 shows). If the timestamp column's value (or any attribute column's value) doesn't match its counterpart, the query updates the value for all attribute columns and the timestamp column, if one exists.

If the source and destination tables have more than one column as part of the primary key, drop the JOIN/ON clause (as Listing 1 illustrates), separate the table names with commas, and add key-column comparison statements to the WHERE clause. Listing 2 shows examples for tables with and without timestamp columns.

When you use a transfer-delta stored procedure for an intersection (many-to-many) table whose primary key comprises two columns (each a foreign key) and no attribute columns, you don't need an UPDATE query because only attribute columns get updated. Likewise, if the key has attribute columns, you need an UPDATE query.

The third and final part of a transfer-delta procedure is an INSERT statement that copies records to the destination table if the source table has a record with a primary key that isn't in the destination table. The statement copies every column, whether primary key or attribute, and must use every column in the primary key to compare the source and destination tables. When multiple columns make up the primary key, you can concatenate them to compare them all, as Listing 3 demonstrates.

You can modify the basic form of Listing 1 in variations to cover most situations. For example, Listing 3 is a pull procedure that uses casting and concatenation to combine two primary key columns with different data types. CAST (and CONVERT) changes a column's original data type to a different data type in the result set. Concatenating joins two or more values into one value, but their data types must be compatible. For instance, to concatenate values from a date column, a numeric column, and a text string, you need to change the date and number values to text by using CAST or CONVERT.

Custom transfer-delta stored procedures can also scrub and merge data as it's moving. For simple scrubbing to make data forms consistent, you can cast or convert data types, concatenate to combine multiple columns into one, or use CASE statements to conditionally replace values. You can accomplish simple merging by having multiple sets of source tables and one set of destination tables, omitting DELETE queries from your transfer-delta procedures. For example, if you have two Sales databases that you want to keep separate, but you want to combine historical sales records in a third location, you can use procedures that copy records from the two sources into a CombinedSales table. If one source table spells out state names and another abbreviates them, you can use a CASE statement in one table's transfer-delta procedure to make it match the other table's convention.

When you have a set of these procedures for each table, you can run them all by calling one global procedure. Listing 4 is an example global procedure, which expects individual table procedures to be named sp_TransferDelta_tablename. This approach can be useful during development and debugging. However, when you're scheduling a job to automate the updates, you assist future troubleshooting if you make each transfer-delta procedure a separate step, rather than calling a set of them from a global procedure.

When you're creating a new transfer-delta procedure, you can use the following query to generate a comma-delimited list of the column names that you'll need for the INSERT queries:

SELECT column_name + ',' FROM information_schema.columns
WHERE table_name = 'publishers'

To maximize your expertise and discover more advanced options for custom stored procedures, study the replication system stored procedures and extended stored procedures in the Master database. You can generate a list of them by running the following query:

"SELECT DISTINCT Name FROM SysObjects WHERE (Type = 'P' OR Type = 'X')
AND (Name LIKE '%agent%' OR Name LIKE '%repl%' OR Name LIKE '%dist%'
OR Name LIKE '%publ%' OR Name LIKE '%merge%' OR Name LIKE '%subs%'
OR Name LIKE '%sync%') ORDER BY 1"

If you can't see the Master database in Enterprise Manager, you can make it appear by right-clicking the server name, selecting Edit SQL Server Registration Properties, then selecting the Show system databases and system objects check box and clicking OK.

Note that the procedures in the Master database's Extended Stored Procedures folder are compiled objects (including some that start with sp_), so you won't be able to study their source code by double-clicking them as you can with other stored procedures. Fortunately, they all appear in BOL, so you can learn something about them.

You can create transfer-delta procedures when you need the benefits of customization, or you can use them as a base for even more sophisticated custom procedures. Developing your ability to go beyond the wizards built into SQL Server will give you more flexibility in transferring data and expand your options when you need the benefits of replication.