Log shipping is a robust yet low-tech, low-cost way to achieve high availability and disaster protection in your SQL Server environment. Log shipping automatically ships—copies and restores—a production server's transaction logs to a standby server, which stands ready to take the place of a failed production server. If you keep your standby server's copy of production data current (within minutes of the production data), you have a warm standby.
For years, SQL Server developers and DBAs have had to create customized scripts to implement log shipping. But with SQL Server 7.0, Microsoft provided undocumented hooks into the extended stored procedure xp_sqlmaint to help automate log shipping. Xp_sqlmaint calls the Sqlmaint.exe utility, as do jobs that the Database Maintenance Wizard creates. In fact, the Microsoft BackOffice 4.5 Resource Kit (BORK) Log Shipping utility consists of scripts and instructions that use these hooks into xp_sqlmaint for creating a mostly automated log-shipping solution. This article covers log shipping with SQL Server 7.0, as BORK defines the process. You can get BORK from Microsoft Press for $250 retail (about $130 street price).
Microsoft developed the BORK Log Shipping utility, as it did all its resource kit utilities, for internal use at Microsoft; Microsoft Product Support doesn't support the utility. And you'll find the BORK Log Shipping utility rough around the edges. Although it automates much of the log-shipping process, the utility requires a fair amount of manual setup and administration. However, SQL Server 2000 Enterprise Edition fully supports log shipping and provides a more polished interface than the BORK utility. For a summary of SQL Server 2000's log-shipping functionality, see the sidebar "Log Shipping with SQL Server 2000," page 46. Watch for an upcoming article in SQL Server Magazine that covers SQL Server 2000 log shipping in detail.
How Log Shipping Works
In a log-shipping implementation, failing over to the standby server is a manual process. If you require automated failover, a better solution is Microsoft Cluster Service (MSCS), which automatically fails over one cluster node to another. However, the clustering solution costs more than log shipping, requires additional specialized hardware, and forces a closer coupling, or dependency, of primary and secondary servers. To set up log shipping, all you need is the BORK utility, enough disk space on your production and standby servers to store the database and transaction log backup files, and a network connection between the servers.
Before jumping into how you set up log shipping, let's quickly look at the five basic steps of the log-shipping process, which Figure 1, page 46, shows.
Perform a full database backup of the production database. To initialize log shipping, you must first perform a full database backup of the production database. The production database must have the Truncate log on checkpoint and Select into/bulk copy database options off so that SQL Server will log all data changes. Transaction log backups are impossible to perform when you have these database options on.
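In T-SQL, turning off those two options and taking the initial backup looks something like the following sketch; the database name and backup file path are illustrative, not from the article:

```sql
-- Turn off the options that would break the transaction log chain
EXEC sp_dboption 'Pubscopy', 'trunc. log on chkpt.', 'false'
EXEC sp_dboption 'Pubscopy', 'select into/bulkcopy', 'false'

-- Take the initial full database backup to a disk file
BACKUP DATABASE Pubscopy
TO DISK = 'd:\mssql7\backup\pubscopy\pubscopy_full.bak'
WITH INIT
```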
Restore the full backup to the standby server without recovery. You must restore the full database backup to the standby server without recovery—that is, with either the NORECOVERY or the STANDBY option. The NORECOVERY mode prevents users from reading data from the database during the restore. But the STANDBY mode lets users read the data already on the standby server. Using STANDBY mode when you restore the full database backup to the standby server is fine. However, any transaction log restore requires exclusive use of the database, so letting users query a standby database can delay the log-shipping transaction log restore.
Perform transaction log backups to files on the production server. When using the BORK Log Shipping utility, you must name the transaction log backup files according to the convention: dbname_tlog_yyyymmddhhmm.trn (the default for Database Maintenance Plans). The timestamp in the file's name lets the log-shipping process restore the transaction log files in sequence.
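A transaction log backup that follows this naming convention might look like the following; the path and timestamp are illustrative (a Database Maintenance Plan generates the name automatically):

```sql
-- File name follows dbname_tlog_yyyymmddhhmm.trn
BACKUP LOG Pubscopy
TO DISK = 'd:\mssql7\backup\pubscopy\pubscopy_tlog_200104011205.trn'
```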
Copy the transaction log backup files to the standby server. A SQL Agent job running on the standby server uses xp_sqlmaint to copy the files to the standby server. (I tell you how to set up this job later.)
Restore the transaction log backup files to the standby server without recovery. Another SQL Agent job running on the standby server uses xp_sqlmaint to restore the transaction logs.
Log shipping is very reliable. After you have the previous steps in place—and as long as the transaction log backup chain remains unbroken—log shipping will run indefinitely. However, installing SQL Server service packs and recovering the standby database will interrupt the process and require you to reinitialize the standby server and restart log shipping.
For smaller databases, the reinitializing process goes quickly, but for larger databases, the process can be time-consuming. Still, after you have the log-shipping process running, you can continue to perform full and differential database backups of your production database, in accordance with your usual backup schedule, without interfering with log shipping. You can then archive these database backups without restoring them to the standby server.
Log shipping's only significant vulnerability lies in the transaction log chain. You must apply the transaction logs in sequence, and all restores must be successful, or the process will fail and you'll have to reinitialize the standby server. Ensuring that your backup and restore practices protect the continual log-shipping process is vital.
Installing Log Shipping
Now that you've seen an overview of how log shipping works, let's go back to the beginning. To install log shipping, complete the following six steps.
Perform a full database backup. As I noted earlier, you must make sure that the production server database has the database options Truncate log on checkpoint and Select into/bulk copy off. You then need to perform a full database backup, preferably to a disk file.
Apply the full database backup to the standby server. Restore the full backup to the standby server database without recovery. For example, if you're restoring a database called Pubscopy to a standby server, you can specify either Leave database nonoperational, but able to restore additional transaction logs or Leave database read-only and able to restore additional transaction logs in the Enterprise Manager Restore database dialog box, as Figure 2, page 47, shows. The Leave database nonoperational option is equivalent to the T-SQL command
and the Leave database read-only option is equivalent to the T-SQL command
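The two Enterprise Manager options map to RESTORE statements along these lines; the backup file and undo file paths are illustrative, not from the article:

```sql
-- Equivalent to "Leave database nonoperational":
RESTORE DATABASE Pubscopy
FROM DISK = 'd:\mssql7\backup\pubscopy\pubscopy_full.bak'
WITH NORECOVERY

-- Equivalent to "Leave database read-only":
RESTORE DATABASE Pubscopy
FROM DISK = 'd:\mssql7\backup\pubscopy\pubscopy_full.bak'
WITH STANDBY = 'd:\mssql7\backup\pubscopy\undo_pubscopy.dat'
```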
Create a job that periodically backs up production database transaction logs. Create a job on the production server to regularly back up transaction logs to disk. The easiest way to do this (and the method the BORK Log Shipping utility supports) is to create a Database Maintenance Plan for backing up the production database's transaction logs at a desired interval. Remember to note which directory you save the backups to and whether you specified a directory for each database's transaction log backup file. If you have many databases involved in the log-shipping process, specifying a subdirectory for each database's files (as Figure 3 shows) can help you manage the many backup files the system will produce.
Make sure you give the Database Maintenance Plan a clear, descriptive name—something other than the default Database Maintenance Plan 1—such as Pubscopy log shipping. You can specify that the job run as often as once per minute, but that setting means 60 transaction log files per hour, per database. Although you might be tempted to run the backup job that frequently to keep your server as warm as possible, for large, active databases, a job frequency setting in the range of 5- to 15-minute intervals is more manageable.
Apply the BORK Log Shipping scripts on the standby server. Using Query Analyzer or OSQL, execute the BORK Log Shipping install scripts in the msdb database on the standby server. The first script, instlog.sql, creates three tables in msdb and also creates two stored procedures: one to create the backup (log-shipping) plan and another to add a particular database to the plan. The second script, log_ship_sprocs.sql, creates stored procedures for monitoring log shipping (I cover these stored procedures a little later).
The instlog.sql script creates three tables: backup_movement_plans, backup_movement_plan_databases, and backup_movement_plan_history. Figure 4, page 48, shows the tables' columns and relationships. Backup_movement_plans contains a plan name, which should be the same as the name of the Database Maintenance Plan you created earlier. The table's primary key is an automatically assigned globally unique ID (GUID) called the plan ID. The table also contains the source and destination directory names for the transaction log files. Backup_movement_plan_databases contains the plan ID along with source and destination database names and status information about the last copy and last load. (Note that the backup_movement_plan_databases table doesn't have a primary key.) The third table, backup_movement_plan_history, contains a history of copy and load events for a given plan, including durations and error messages.
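The first two tables have roughly the following shape. This is only a sketch built from the columns the text names (plus plausible types); instlog.sql and Figure 4 are the authoritative definitions:

```sql
-- Approximate shape only; see instlog.sql for the real definitions
CREATE TABLE backup_movement_plans (
    plan_id uniqueidentifier PRIMARY KEY,  -- automatically assigned GUID
    plan_name sysname,         -- must match the Database Maintenance Plan name
    source_dir nvarchar(260),  -- where the log backup files are written
    destination_dir nvarchar(260),  -- where the files are copied to
    database_subdir bit        -- per-database subdirectory flag
)

CREATE TABLE backup_movement_plan_databases (
    plan_id uniqueidentifier,  -- no primary key, as noted above
    source_db sysname,
    destination_db sysname
    -- plus last-copy and last-load status columns
)
```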
BORK also comes with 46 pages of log-shipping documentation, but many people find some of the information redundant, confusing, or—in the case of some raw output—plain unhelpful. The documentation contains an example; however, the example shows how to set up log shipping between different databases on the same server instead of the more common production scenario of having a production server that has a database named the same as a database on a standby server.
Create a backup movement plan on the standby server. You create a backup movement plan on the standby server by executing the sp_create_backup_movement_plan stored procedure in the msdb database. (BORK's instlog script also provides this stored procedure.) The stored procedure's purpose is to add the backup plan name to the backup_movement_plans table and to create the transaction log copy and load jobs in SQL Agent.
Sp_create_backup_movement_plan's documentation can be misleading and doesn't distinguish required parameters from optional parameters. Here's a more standard description of the stored procedure's syntax:
sp_create_backup_movement_plan
    [@name =] plan_name,
    [@source_dir =] source_directory_name,
    [@dest_dir =] destination_directory_name
    [, [@sub_dir =] subdirectory_flag (bit) ]
    [, [@load_job_freq =] load_frequency (minutes) ]
    [, [@copy_job_freq =] copy_frequency (minutes) ]
The following statement adds a backup plan to the table for a database called Pubscopy:
EXEC msdb.dbo.sp_create_backup_movement_plan
    @name = 'Pubscopy log shipping',
    @source_dir = '\\srv1\d$\mssql7\backup\pubscopy',
    @dest_dir = '\\srv2\d$\mssql7\backup\pubscopy',
    @sub_dir = 0
The @name, @source_dir, and @dest_dir parameters are required. The plan name must be the same as the name of the Database Maintenance Plan you created on the production server. The directories tell Sqlmaint.exe where to find the transaction log files and where to copy them to. The directories also assume that the standby server's SQL Agent has appropriate read and write permissions to those directories.
The @sub_dir parameter is useful only if you don't supply subdirectory information in the directory parameters. The parameter defaults to 1, and this setting specifies that the SQL Agent jobs can find each transaction log in, and copy each log to, a subdirectory with the database name under the directories you specified. The load and copy frequencies default to 5 minutes, which is a good place to start.
In adding the plan name to the backup_movement_plans table, sp_create_backup_movement_plan obtains a GUID for the plan_id column and populates the plan_name, source_dir, destination_dir, and database_subdir columns with the information you pass to the stored procedure as parameters.
You use the @load_job_freq and @copy_job_freq parameters when creating the SQL Agent copy and load (restore) jobs. These jobs call the xp_sqlmaint extended stored procedure, then pass the procedure the plan name and a parameter identifying whether the action is to copy or to load. Xp_sqlmaint then reads the plan's row in the backup_movement_plans table to determine the source and destination directories.
Add the database to the backup movement plan. After you've created the SQL Agent jobs and the backup_movement_plans table has a row in it, you add a database to the movement plan by executing the sp_add_db_to_backup_movement_plan stored procedure (which the instlog script also supplies). This stored procedure lets you specify the database names for log shipping on the source and destination servers. Typically, these database names will be the same, but the stored procedure lets you specify different names (for example, to implement log shipping between databases on the same server). You can also specify a load delay (how long to wait before restoring) and a retention period (how long to keep restored transaction logs before purging them). The stored procedure's syntax is
sp_add_db_to_backup_movement_plan
    [ [@plan_id =] plan_id (uniqueidentifier) ]
    [,] [@plan_name =] plan_name,
    [@source_db =] source_db_name,
    [@dest_db =] destination_db_name
    [, [@load_delay =] load_delay (minutes) ]
    [, [@load_all =] load_all_flag (bit) ]
    [, [@source_server =] source_servername ]
    [, [@retention_period =] retention_period (hours) ]
The following statement adds the Pubscopy database to the backup movement plan:
EXEC msdb.dbo.sp_add_db_to_backup_movement_plan
    @plan_id = NULL,
    @plan_name = 'Pubscopy log shipping',
    @source_db = 'Pubscopy',
    @dest_db = 'Pubscopy',
    @load_delay = 0,
    @load_all = 1,
    @source_server = 'srv1',
    @retention_period = 6
The stored procedure requires either the plan ID or the plan name. I agree with the BORK documentation, which recommends specifying the plan name instead of the GUID for clarity. The source and destination database names are required, although the stored procedure doesn't explicitly test for them. The optional load delay defaults to 0 minutes, but you can force a longer delay before restoring the transaction log backups after a copy.
The optional load_all flag defaults to 1, meaning that each time the job runs, it will try to load all the transaction logs available. This option can help the system catch up with transaction log restores if it falls behind because, for example, you turned off the copy and load jobs for a while or the jobs took longer than expected. The source server name provides information for the backup_movement_plan_history table, and the retention period specifies how long to keep restored transaction logs. Fortunately, the log-shipping process doesn't delete any transaction logs that it hasn't restored, no matter how old the logs are.
After you've executed the sp_add_db_to_backup_movement_plan stored procedure, log shipping will start—unless you've made a mistake in the installation process. In my experience, the most common mistake is sending inaccurate parameter values, such as directories or maintenance plan names, to the stored procedures. But troubleshooting log shipping can be challenging because xp_sqlmaint's error messages aren't always clear; the transaction log restore (i.e., load) job history often records little more than a cryptic failure message.
Luckily, as BORK's Log Shipping documentation says, only a few things can go wrong with log shipping: the transaction log backup plan can fail, the copy job can fail, or the load (restore) job can fail.
The BORK documentation contains a helpful section about troubleshooting log shipping, and the BORK utility provides stored procedures that help you monitor and troubleshoot log shipping. After you apply the BORK instlog.sql script, you need to apply the log_ship_sprocs.sql script, which creates four stored procedures: log_ship_status, log_ship_entity_log, log_ship_history_purge, and log_ship_alert. Although the scripts don't require you to, creating and running these stored procedures in msdb is safest because you don't risk accidentally creating the procedures in another database.
Log_ship_status accepts optional primary server and primary database parameters and returns a set of useful status values about your log-shipping plans. Returned status values include a column called delta, which shows the delay, in minutes, between the current time and the last transaction log restore for a given plan. A large delta can indicate problems, such as a breakdown in the copy or load process.
Log_ship_entity_log returns a snapshot of the backup_movement_plan_history table, filtered by parameters that distinguish the source server and database and the destination server and database. Log_ship_history_purge removes history rows older than the number of days you specify in a parameter; the parameter value must be a number greater than 1. Finally, log_ship_alert generates a SQL Server error log and a Windows NT Application Event Log message when the log-shipping delta (less the load delay) exceeds an alert length that you specify.
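A typical monitoring session might look like the following sketch. I'm passing the parameters positionally because the exact parameter names aren't documented here; check log_ship_sprocs.sql for the real signatures:

```sql
-- Show status (including the delta column) for one plan's database
EXEC msdb.dbo.log_ship_status 'srv1', 'Pubscopy'

-- Purge history rows older than 7 days (the value must be greater than 1)
EXEC msdb.dbo.log_ship_history_purge 7
```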
These stored procedures can help you track down many log-shipping problems. For example, after log shipping is operational, you might have activity on the production server that makes your transaction logs so large that they don't restore quickly on the standby server. This problem might show up as large delta values from the log_ship_status stored procedure or as a large-delta alert from the log_ship_alert stored procedure.
When you first start using log shipping, you'll probably have to experiment before you fully understand the parameters involved and arrive at the best settings for your environment. And you'll probably have to delete your flawed log-shipping plans and start over. Unfortunately, the BORK utility doesn't provide a delete stored procedure. You can manually delete a log-shipping plan by truncating the tables (truncate backup_movement_plans last because it's part of a foreign key constraint), then manually deleting the jobs, but that method can become tedious. Listing 1 shows a simple stored procedure to delete backup plans and their jobs. To make the listing brief, I left out error handling after the delete statements; you'll want to add your own error-handling routines to the procedure.
Sooner or later, you'll need to use log shipping to fail over. Server failures and server hardware upgrades force a failover. Even service pack installations and other diagnostic activity against a production server can cause you to rely on your standby server. Testing your failover process and documenting the steps so that you and others can perform the process under stressful conditions is essential. Here are the three basic steps for failing over to a warm standby.
Capture the last good transaction log backup from the production server, then apply the backup to the standby server with recovery. If you can get one last transaction log backup from the production server before it goes down, you might be able to fail over to the standby server with almost no data loss. However, if the SQL Server is suddenly unavailable, you'll lose all transactions that weren't finished before the last good transaction log backup.
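If the production server's log file is still readable, the last-chance log backup and the final restore might look like the following sketch. WITH NO_TRUNCATE lets you back up the log even when the data files are damaged; the paths are illustrative:

```sql
-- On the production server: capture the tail of the transaction log
BACKUP LOG Pubscopy
TO DISK = 'd:\mssql7\backup\pubscopy\pubscopy_tlog_last.trn'
WITH NO_TRUNCATE

-- On the standby server: apply it and bring the database online
RESTORE LOG Pubscopy
FROM DISK = 'd:\mssql7\backup\pubscopy\pubscopy_tlog_last.trn'
WITH RECOVERY
```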
Apply server logins and database permissions on the standby server to match those of the production server. First, you must add the logins to SQL Server on the standby system or make sure that the logins are already there. Depending on your environment, you can try using the sp_change_users_login stored procedure with the AutoFix option to link the database users in the standby server to the logins. However, Microsoft doesn't recommend this stored procedure in "security-sensitive" situations. (For more information about potential problems, see the Microsoft article "FIX: Sp_Change_Users_Login with Auto_Fix Fails When Run with Local Cursor Database Option Set" at http://support.microsoft.com/support/kb/articles/Q247/2/35.asp.)
A surefire way to transfer logins and permissions is to apply a script just for that purpose. The sp_revokedbaccess stored procedure, which SQL Server 7.0 Service Pack 2 (SP2) fixed, removes orphaned users from a database. You can then write a script that uses the sp_grantdbaccess stored procedure along with the GRANT command to assign permissions. Don't forget to test the logins to make sure they can connect to and reach the data on the standby server.
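Such a script might follow this pattern; the login, user, and table names are illustrative, not from the article:

```sql
USE Pubscopy

-- Remove the orphaned database user left behind by the restore
EXEC sp_revokedbaccess 'appuser'

-- Link the standby server's login to a database user of the same name
EXEC sp_grantdbaccess 'appuser', 'appuser'

-- Reassign the object permissions the application needs
GRANT SELECT, INSERT, UPDATE, DELETE ON authors TO appuser
```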
Connect to the standby server. Unlike a server in a cluster, the standby server will have a different name and different IP address from the failed production server, so you'll have to change the client applications' data sources. To make this change easier, you can use a connection broker, which is a table or a file that tells the clients which SQL Server system they should connect to.
Speaking from Experience
I've been working with a log-shipping installation in production for more than 9 months on a set of databases holding more than 300GB of data. Some of the databases approach 35GB in size. Large databases present the greatest challenges when you're performing management tasks, including log shipping. Here are some of the things we've learned through experience.
Size matters. Databases larger than, say, 20GB can take a long time to back up, copy, and restore. A 30GB database can take more than an hour to back up and can produce a backup file nearly as large as the database itself. Copying a 30GB file to the standby server can take hours unless you have a fast connection between the servers. Restoring the database across the same link can take much longer than backing up the database, and the copy or restore process might generate extra workload for the production server's I/O subsystem. We found that, over time, setting up log shipping for large databases one by one was easier than trying to set them up all at once. With smaller databases (a few gigabytes or less), setting up multiple log-shipping processes goes quickly and isn't a problem.
Avoid interference. Long-running database backups block scheduled transaction log backups, producing error messages in the SQL Agent jobs. We decided to schedule transaction log backups only outside the database backup window.
The database as a stage. Loading large amounts of data into the production server's databases and performing massive updates on the loaded data often dramatically increased the size of the transaction log backup files. As a result, when we performed log backups during the data-loading and massaging process, the standby server restore jobs spent a lot of time in a rollback state, handling the massive inserts and updates. This situation caused an accumulation of backup jobs on the standby server and often delayed log shipping by up to 24 hours. Our solutions were to load data that needed massaging into a staging database that doesn't need log backups and to shut down the transaction log backup jobs on any other log-shipping databases during large data loads.
No reporting, please! You can manually restore the database and transaction logs to the standby server with STANDBY, which puts the database into read-only mode, or with NORECOVERY, which leaves the database in a non-operating mode. However, in STANDBY mode, if any user is querying the database or maintaining any kind of connection to it, the log-shipping load job can't restore any logs because restoring a transaction log requires exclusive use of the database. So, you might be tempted to restore the standby database with NORECOVERY to prevent users from reading the data. Unfortunately, when xp_sqlmaint restores the transaction logs, it restores them with STANDBY, making the databases read-only. Because our transaction log backup frequency is 5 minutes, we had to implement a policy that users (and developers) can't access the standby server databases.
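If you suspect a connection is blocking the load job, a quick check against sysprocesses shows who's holding the standby database open; whether you then kill those connections is a policy decision:

```sql
-- List connections using the standby database
SELECT spid, loginame, hostname
FROM master.dbo.sysprocesses
WHERE dbid = DB_ID('Pubscopy')

-- Then, for each offending connection: KILL <spid>
```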
Multiple databases with STANDBY. When you specify a database restore with STANDBY, you must also specify an undo file name. We found that if we tried to restore more than one database at a time, the databases couldn't use the same undo filename. If they did, the restores failed. We perform our full database restores with NORECOVERY instead, even though the standby databases ultimately end up in read-only mode when xp_sqlmaint restores the transaction logs.
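If you do restore multiple databases with STANDBY, give each database its own undo file, along these lines (the second database name and all paths are illustrative):

```sql
RESTORE DATABASE Pubscopy
FROM DISK = 'd:\mssql7\backup\pubscopy\pubscopy_full.bak'
WITH STANDBY = 'd:\mssql7\undo\undo_pubscopy.dat'

RESTORE DATABASE Sales
FROM DISK = 'd:\mssql7\backup\sales\sales_full.bak'
WITH STANDBY = 'd:\mssql7\undo\undo_sales.dat'  -- a different undo file
```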
Yet another service pack. When you install a SQL Server 7.0 service pack, all databases must be in a recovered mode. Consequently, to apply a service pack, you must recover the databases, apply the service pack, reinitialize the standby server's databases from a full database backup, then restart log shipping.
Don't detach. Using sp_detach_db on a standby database that is in NORECOVERY or STANDBY mode produces the error message
Error while closing database 'pubscopy' cleanly.
Successfully detached database 'Pubscopy'.
When you then reattach the database, it will be in a recovery mode. Subsequent attempts by the load job to apply transaction log backups will fail. Essentially, using sp_detach_db on a standby database that is in NORECOVERY or STANDBY mode breaks log shipping, and I haven't seen an explanation from Microsoft about the error message. Watch out, because the login that attached the database will also own the attached database.
With a little setup work, you can implement log shipping as an inexpensive and reliable high-availability solution for SQL Server 7.0 that provides nearly real-time disaster recovery. And log shipping is easier to set up and manage in SQL Server 2000. Watch SQL Server Magazine for an upcoming article about log-shipping enhancements in SQL Server 2000.