Real-world databases on the SQL Server platform continue to get larger, often ranging into multiple terabytes in many production environments. There has also been a trend toward consolidating SQL Server environments to reduce costs and wasted resources. Consolidated environments consist of multiple databases on a single instance, multiple instances on one server, or virtualized servers. These very large databases (VLDBs) and consolidated environments present their own performance and management challenges, one of which is ability to effectively provision for disaster recovery within the business service level agreements (SLAs) and maintenance windows. To help address some of these challenges, Microsoft introduced backup compression initially in SQL Server 2008, Enterprise Edition, then in SQL Server 2008 R2, Standard Edition. Backup compression provides significantly smaller and faster backups and restores in most environments.
Compression requires less disk space to store backups (and therefore the ability to store more backups locally), less network bandwidth to move backups around, less tape storage for offsite archive, and faster backups to reduce pressure on maintenance windows. Crucially, compressed backups also result in faster restores, which often have to be performed off hours in a high-pressure disaster recovery situation. However, if the databases contain precompressed data (e.g., BLOB formats like .pdf and .jpg files), the benefits of compression can be very limited if not negligible. In addition, compressed backups come with a CPU overhead, which can be a significant problem in SharePoint environments, for example.
SQL Server includes functionality for differential backups, which can provide additional disk and time savings for VLDBs and consolidated environments, particularly when they contain data that doesn’t compress effectively. DBAs have been reluctant to use differential backups in the past, but it’s a practice that DBAs need to understand and implement as databases grow larger and business SLAs can no longer be met. To this end, we’ll discuss the concepts behind differential backups, how to take and restore differential backups, and when to use them.
What Is a Differential Backup?
The key to establishing a solid differential backup strategy is the ability to have confidence and reliability in the base of the differential, or differential base. A differential base is a full, file, or file group backup of the database. It’s required for differential backups. Backup options such as Checksum and Restore with Verify are useful to build confidence in the differential base’s reliability.
There are two distinct types of differential backups: single-base (which is a differential backup that’s based on a single full backup) and multibase (which is a differential backup that’s based on different file or file group backups). The most commonly used type is the single-base differential backup. You must run a single-base differential backup on databases operating under the simple recovery model. If you try to run a multibase differential backup, you’ll receive an error and the differential backup will fail.
You can run either a single-base differential backup or multibase differential backup on databases operating under the full recovery model. (You can also run a multibase differential backup on bulk-logged recovery model databases that meet specific criteria.) However, multibase differential backups are very complex to administer and maintain. For this reason, Microsoft generally recommends that they be avoided (see “Working with Multibase Differential Backups”).
Differential backups are totally independent from the transaction log backups and won’t interrupt the log sequence number (LSN) chain or log shipping. Implementing differential backups can greatly reduce the number of required restores, as the volume of transaction log backups could be large over a week. Restoring a differential backup will bring the database up-to-date more quickly.
Copy-only backups don’t affect differential backups, so it’s possible to use the WITH COPY_ONLY option to perform full backups without interrupting normal differential backup management. A copy-only full backup can’t be used as a differential base for restores.
SQL Server allocates space in extents. An extent is eight physically consecutive pages of 8K storage. SQL Server maintains one differential bitmap page for every 64,000 extents, which is called a Global Allocation Map (GAM) interval. The differential bitmap page contains a bit for every extent in each GAM interval. Adding data to or changing data in a database changes the extent allocation. For each extent changed since the last full backup, the bit is set to 1 in the differential bitmap page. Only when a full backup is taken will all the values be reset to 0. When a differential backup is taken, the differential bitmap pages are queried to identify which extents need to be backed up. Even if only one page in an extent has changed, the differential backup will back up the extent containing it.
Differential backups are supported for read-only and read/write databases operating under the full, bulk-logged, or simple recovery model. They work on FILESTREAM data files and full-text catalogs (except on FAT32 file systems) so they can potentially add value in most backup scenarios. Differential backups aren’t supported for the master database.
The benefits of using differential backups can be huge. Most databases rarely change more than 3 percent to 5 percent per day, so the differential backup file (with or without backup compression) will be significantly smaller in size than a full backup file and will slowly increase in size each day as the data changes.
Information about differential backups is stored in the sys.database_files catalog view and in sys.master_files catalog view. If the database is read-only then only the sys.master_files catalog view is updated.
The code in Listing 1 (below) shows the time and LSN of the most recent differential base for each database. You can use this code to verify the existence of a differential base and how recently it was taken and evaluate whether it’s suitable to commence differential backups.
The code retrieves information from the following columns in sys.master_files:
- differential_base_lsn. This column contains the last LSN of the current base backup. Data extents changed after this LSN will be included in the next differential backup.
- differential_base_guid. This column contains the unique identifier of the current base backup on which the next differential backup will be based.
- differential_base_time. This column contains the date and time of the current base backup.
If a full backup has never been taken for a database, these values will be NULL.
How to Implement a Differential Backup Strategy
To demonstrate the differential backup process, we have chosen to implement a single-base differential backup on a database operating under the full recovery model. Figure 1 shows a typical backup strategy in which full and transaction log backups are performed. The automated backup job takes a full backup nightly at 6 p.m. and subsequent log backups every 3 hours. In a 24-hour cycle, one full backup file and seven transaction log backup files are created.
Suppose you want to change the strategy by replacing the daily full backups, excluding Sunday, with differential backups, as shown in Figure 2. The first step in implementing this differential backup strategy is to create the differential base with the BACKUP DATABASE command:
Next, you need to perform the incremental transaction log backups by running the BACKUP DATABASE command:
In this command, (x) indicates a unique monotonic number in the log backup set. We recommend that you use a unique monotonic number because appending backups to a single file can cause confusion and increase the possibility of accidently overwriting appended log backups.
The last step is to perform the differential backups. You need to add the WITH DIFFERENTIAL option to the BACKUP DATABASE command:
WITH DIFFERENTIAL, INIT
How to Restore Differential Backups
To demonstrate how to restore differential backups, let’s suppose that a system failure occurred at 2:30 p.m. on Tuesday. The first thing you’d want to do is perform a tail log backup, which is a transaction log backup that includes the portion of the log that hasn’t previously been backed up (known as the tail end or the active part of the log). The tail log backup doesn’t truncate the log and is used when you need to bring the database back to the point in time at which it failed.
To initiate a tail log backup, run the BACKUP LOG command:
(Note that a tail log backup isn’t possible when a database is running under the BULK_LOGGED recovery model and there has been a minimally logged operation performed since the previous log backup.)
Next, you need to restore the full backup using the RESTORE DATABASE command with the NORECOVERY option:
WITH REPLACE, NORECOVERY
The NORECOVERY option allows other backups to be applied. The best practice is to use the NORECOVERY option on every restore, then use a RESTORE statement with the RECOVERY option to bring the database online.
Although the command to restore the full backup uses the REPLACE option to overwrite the existing database, using this option in a production environment can be extremely dangerous. In a production environment, we recommend that you restore the backup on a new database to prevent the possibility of overwriting the database with a corrupt backup.
Now you need to restore the latest differential backup (the one taken at 6 p.m. on Monday) using the RESTORE DATABASE command with the NORECOVERY option:
After that’s done, you can restore the transaction log backups that were taken after the differential backup. So, for this example, you need to restore the log backups taken at 9 p.m. on Monday and at midnight, 3 a.m., 6 a.m., 9 a.m., and noon on Tuesday. The transaction log backups can be restored more quickly through SQL Server Management Studio (SSMS), as it will restore all of the log backups in sequence in a single action. Listing 2 (below) shows the code to run in SSMS. As you can see, you need to use the RESTORE LOG command with the NORECOVERY option.
At this point, you need to restore the tail log backup using the RESTORE LOG command with the NORECOVERY option to bring the database up to the point in time that the failure occurred (2:30 p.m. on Tuesday):
Finally, you can bring the database online using the RESTORE statement with the RECOVERY option:
When to Use Differential Backups
Many DBAs are reluctant to use differential backups because they don’t know when it’s best to use them. Here’s a quick initial guide that can help you make the decision:
- If your databases are small or compress effectively enough so that your full and transaction log backups fall within storage and SLA limits, differential backups are unnecessary.
- If your databases change a lot between backups, you might as well perform full backups.
- If the changes to your database are few and the transaction log backups would take longer to restore than the differential backups, using differential backups might make sense and are worth investigating.
If you’re unsure about how much data gets changed in your databases, there’s an undocumented command and a script that you can use to find out. You can then use this information to estimate the size of the differential backup. For more information about the undocumented command and script, see the sidebar “Determining How Much Data Has Changed.”
Another reason why some DBAs are reluctant to use differential backups is that they can complicate the backup management process. When the amount of data change (and therefore the size of the differential backups) is unpredictable, it’s more difficult to judge whether taking full and transaction log backups will be more effective than including differential backs.
How Differential Backups Stack Up
To determine the effectiveness of the differential backup process, we tested three backup scenarios, evaluating them on:
- The weekly disk space required to support the backup strategy
- The maximum number of restores required to recover the database back to the point in time at which it failed
The customer database we used for these tests was 156GB and backed up using SQL Native backup without compression. Note that for each scenario, the daily transaction log backup consisted of interim transaction log backups, which were taken at 3-hour intervals. The interim transaction log backups were retained for point-in-time recovery. A tail log backup was also taken during the restore process in each scenario. The maximum number of restores required includes the restore of the tail log backup.
Here are the three backup scenarios and their results:
Backup scenario 1. In this scenario, we scheduled daily full backups and daily transaction log backups every three hours, using a seven-day rolling retention period. As Table 1 shows, the current storage requirement for this scenario is 1336.5GB, which will rise as the database grows. However, the maximum number of restores required remains static.
Backup scenario 2. In this scenario, we scheduled a weekly full backup and daily transaction log backups every three hours, using a seven-day rolling retention period. As Table 2 shows, there’s a low 380GB storage requirement. Because the full backup is produced only once on Sunday and is retained to the following Sunday when it’ll be overwritten, this mechanism isn’t subjected to daily 156GB increments in storage. However, the time needed to apply maximum number of restores would devalue the storage gain.
Backup scenario 3. In this scenario, we scheduled a weekly full backup, daily differential backups, and transaction log backups every three hours, using a seven-day rolling retention period. The previous day’s differential backup is deleted after the current day’s differential backup is successfully completed and verified. As Table 3 shows, the differential backups reduced the maximum number of restores compared to scenario 2 and reduced backup storage space compared to the daily full backups in scenario 1.
- Scenario 1 versus scenario 2: Although Scenario 1 requires fewer restores, it requires significantly more disk space to retain the backups.
- Scenario 1 versus scenario 3: Although Scenario 3 requires about the same number of restores, it requires about 60 percent less disk space to retain the backups.
- Scenario 2 versus scenario 3: Although Scenario 2 requires less disk space, the number of restores as the week progress gets prohibitively higher.
Backup compression can provide additional disk space savings for full, transaction log, and differential backups, so the numbers in Table 3 might improve across the board. However, as we mentioned previously, compression isn’t available in all version and editions of SQL Server (which is why we didn’t include it in the scenarios) and its effectiveness can be limited in databases containing precompressed data.
When the Next Full Backup Should Be Taken
The next issue to face the DBA is when the next full backup should be taken, because without management, the rolling differential backups can soon exceed the size of the differential base. It’s important that you remember the reasons for using differential backups: to reduce the amount of storage space required to store the backups and to provide faster restores. For example, in scenario 3, the differential backup is more than half the size of the full backup, in which case the disk-space savings from using differential backups is no longer worth the increase in restore times. So, you can begin with a new differential base backup the following week.
There are certain circumstances in which this evaluation time isn’t sufficient to ensure that differential backups remain small enough to be managed effectively. If the database suddenly changes significantly during the week because of a large data load, large index defragmentation operation, widespread data updates, or another event, then an unscheduled full backup would be required. This high degree of monitoring and management is one of the main drawbacks preventing DBAs from rolling out differential backup solutions. Some third-party SQL Server backup solution providers have already included dynamic management functionality to address this drawback.
A Valuable Option
Differential backups have been present in SQL Server since the early days, although they appear to be a forgotten functionality. Differential backups are very powerful and, if used, can be a crucial part of the DBA's storage management strategy. Easily integrated into any backup scenario, DBAs can instantly realize savings in both the storage required for backups and the associated costs per gigabyte. Differential backups are an option for any DBA, as the functionality is supported in all product versions. When combined with the native compression in SQL Server 2008 R2 Standard Edition and SQL Server 2008 Enterprise Edition, differential backups can provide an even greater savings on the storage cost per gigabyte.
Listing 1: Code to Check the Most Recent Differential Base
mf.name as DatabaseFileName,
FROM sys.master_files mf
JOIN sys.databases d on mf.database_id = d.database_id
-- Exclude the master and tempdb databases.
WHERE mf.database_id not in ('1','2')
-- Exclude log files.
AND mf.file_id <> '2'
Listing 2: Code to Restore the Transaction Log Backups
RESTORE LOG Orders FROM DISK='d:\backups\Orders2.trn' WITH NORECOVERY GO
RESTORE LOG Orders FROM DISK='d:\backups\Orders3.trn' WITH NORECOVERY GO
RESTORE LOG Orders FROM DISK='d:\backups\Orders4.trn' WITH NORECOVERY GO
RESTORE LOG Orders FROM DISK='d:\backups\Orders5.trn' WITH NORECOVERY GO
RESTORE LOG Orders FROM DISK='d:\backups\Orders6.trn' WITH NORECOVERY GO