Minimally Logged Inserts

Improve your insert statements’ performance

 

Executive Summary:

The amount of logging can affect a SQL Server operation’s performance. Learn which insert methods can benefit from minimal logging, to improve your statements’ performance.

 

SQL Server supports several methods for inserting data into tables, including SELECT INTO, BULK INSERT, bcp.exe, INSERT SELECT FROM OPENROWSET(BULK …), and regular INSERT SELECT. Some of the insert methods perform minimal logging in certain circumstances. Because writes to the transaction log in SQL Server are sequential, the amount of logging can affect an operation’s performance. Operations that perform minimal logging run substantially faster than operations that perform full logging, as well as reduce the load on the disks holding the transaction log.

In this article I describe the insert methods that can benefit from minimal logging and the requirements they must meet. I provide you with tools to determine how much logging was involved in a particular operation and to discover what information was logged. I also cover some important enhancements in SQL Server 2008 in terms of minimally logged insert methods.

Note that SQL Server supports other types of minimally logged operations, such as index and large object operations, but this article focuses on insert methods. Also, SQL Server supports minimal logging for certain insert methods that don’t use T-SQL (e.g., that use SQL Server Integration Services—SSIS, or the bulk-copy API), but this article focuses on methods using T-SQL.

For more information about Bulk Import Optimizations, see the series of blogs written by Sunil Agarwal, who is with the SQL Server development team at Microsoft (blogs.msdn.com/user/Profile.aspx?UserID=13989).

Prior to SQL Server 2008


Prior to SQL Server 2008, only the following insert methods benefitted from minimal logging: SELECT INTO, BULK INSERT, bcp.exe, and INSERT SELECT FROM OPENROWSET(BULK …). Let’s discuss the SELECT INTO statement first, because the other insert methods (which I refer to as the bulk import methods) have different requirements for minimal logging.

One requirement that all insert methods share in order to allow minimal logging is that the database recovery model must be set to SIMPLE or BULK_LOGGED. If you set the database recovery model to FULL, all insert methods will incur full logging. Of course, before you set your database recovery model to something other than FULL, you need to make sure the setting is acceptable for your environment’s recovery needs. For more information about recovery models, see SQL Server Books Online.

SELECT INTO. The only requirement necessary to allow minimal logging for the SELECT INTO statement is to set the recovery model to SIMPLE or BULK_LOGGED. Prior to SQL Server 2008, the regular INSERT SELECT statement couldn’t benefit from minimal logging; therefore, SELECT INTO had an advantage over INSERT SELECT. However, SELECT INTO has several shortcomings compared with INSERT SELECT.

SELECT INTO creates the target table and populates it with data, combining the Data Definition Language (DDL) and the Data Manipulation Language (DML) activities into one statement. SELECT INTO doesn’t give you control over definition of the target table—this is dictated by the queried source. Also, in SQL Server—unlike in some other database platforms—both DDL and DML are transactional. The implication is that while a SELECT INTO statement is running, both the data involved in the DML part and the metadata involved in the DDL part (rows written to system tables) are exclusively locked. So if you try to query the system catalog while a SELECT INTO statement is in progress, and your query attempts to grab locks on metadata rows that are locked by the SELECT INTO transaction, your request will be blocked. Even though a minimally logged SELECT INTO statement can run substantially faster than a fully logged INSERT SELECT statement, the SELECT INTO statement can take a long time to run if you’re dealing with large volumes of data.

Bulk import methods. The requirements for minimal logging for the bulk import methods are different from those for the SELECT INTO statement. In addition, the requirements are somewhat tricky and require close attention to detail. As I already mentioned, the database recovery model must be set to SIMPLE or BULK_LOGGED to allow minimal logging. In addition, the following requirements must be met:

  • The target table must not be marked for replication.
  • The TABLOCK option must be specified.
  • If the target is a B-tree (clustered or nonclustered index), it must be empty. If the target is a heap, it doesn’t have to be empty.

A note regarding the last point: You might have gotten a recommendation to specify a value for the BATCHSIZE parameter to control the number of rows in each batch. Each batch is handled as a separate transaction. Keep in mind, however, that if the target is an empty B-tree to begin with and qualifies for minimal logging, only the first batch will benefit from minimal logging. As far as the other batches are concerned, the target is nonempty. Also, keep in mind that the table itself can be organized as a heap, with nonclustered indexes that are organized as B-trees. So for the same operation, the heap can behave in a certain way in terms of logging and the nonclustered index B-trees can behave differently.

To make things easier for myself, I use the following logical expression to summarize the requirements from the bulk import methods to allow minimal logging prior to SQL Server 2008:

       non-FULL recovery model<br>
AND not replicated<br>
AND TABLOCK<br>
AND (<br>
               Heap<br>
       OR (B-tree AND empty)<br>
       )

 

Before I discuss the enhancements in SQL Server 2008, it’s important to note that while a backup is running, minimal logging is temporarily disabled for the duration of the backup. The operation will still run, but it will perform full logging.

Enhancements in SQL Server 2008


SQL Server 2008 introduces a few important enhancements regarding minimally logged insert methods. The regular INSERT SELECT statement (as opposed to using the BULK rowset provider) can now also be handled with minimal logging. This improvement is important for two reasons. First, you can fully control the target table’s \\[schema\\] definition. Second, unless there are logical reasons for you to put the CREATE TABLE and INSERT SELECT statements in the same transaction, you can run the two in separate transactions. The CREATE TABLE statement will finish very quickly and release all locks on metadata, preventing the problem I described earlier with the SELECT INTO statement. So in SQL Server 2008, a best practice is to use CREATE TABLE with INSERT SELECT instead of SELECT INTO. For example, instead of

SELECT x, y, z INTO TargetTable FROM SourceTable;

 

use

CREATE TABLE TargetTable(x …, y …, z …);<br>
INSERT INTO TargetTable WITH (TABLOCK) (x, y, z)<br>
   SELECT x, y, z FROM SourceTable;

 

Note that the regular INSERT SELECT statement doesn’t support all the options that the other bulk import methods do. For example, it doesn’t support the BATCHSIZE option. Also, when you specify the TABLOCK option you get a full, exclusive table lock, which can be held by only one session at any given moment. With the other bulk import methods, the TABLOCK option represents a bulk update lock that multiple sessions can hold concurrently, supporting parallel loads from multiple clients. Still, the improvement in SQL Server 2008 regarding INSERT SELECT is a big step forward.

In SQL Server 2008 you can also turn on trace flag 610 to introduce new behavior concerning insertions into B-trees. To enable this trace flag, start the SQL Server service with the -T parameter and the trace flag number. Alternatively, run

DBCC TRACEON(610, -1);

 

With this trace flag on, insertions into a B-tree don’t have to use the TABLOCK hint to benefit from minimal logging. Also, insertions into nonempty B-trees can be done in minimally logged mode—at least partially. When new pages are allocated because of insertion of a new key range, the writes to the new pages can be minimally logged. Rows that are inserted into existing pages are still fully logged. An example of a new key range is inserting keys 1000001 through 2000000 into a table that already contains keys 1 through 1000000. To allow minimal logging in this special case, SQL Server needs to protect the key range that is added and does so by using a key-range lock.

Note that although trace flag 610 can improve the performance of certain operations, it might degrade the performance of other operations. You should thoroughly test using this trace flag in your environment to determine whether it improves the performance of your system overall.

Here’s the revised logical expression summarizing requirements for minimal logging in SQL Server 2008 that is applicable to bulk import methods, including the regular INSERT SELECT:

       non-FULL recovery model<br>
AND not replicated<br>
AND (<br>
                (Heap AND TABLOCK)<br>
       OR (B-tree AND empty AND TABLOCK)<br>
       OR (B-tree AND empty AND TF-610)<br>
       OR (B-tree AND nonempty AND TF-610 AND new key-range)<br>
       )

 

Determining the Amount of Logging


Determining when an insert method is handled with minimal logging can be tricky. In addition, whether minimal or full logging was used isn’t always clear—the scenario might be somewhere in between. When in doubt regarding the logging involved with a certain operation, I query an undocumented table function called fn_dblog. This function returns the transaction log records with log serial numbers (LSNs) in the range provided as the two input arguments. If you provide two NULLs as inputs, the function simply returns the whole transaction log of the current database. Unfortunately, the function itself and the output it returns aren’t documented, so you’ll have to rely on your experience and common sense to interpret the output.

The attributes of interest for our purposes are Operation, Context, \\[Log Record Length\\], and AllocUnitName. Operation contains the log operation (LOP) performed; for example, setting bits in a bitmap page (LOP_SET_BITS), modifying a row (LOP_MODIFY_ROW), and so on. Context provides the log context (LCX) for the operation—usually the type of affected object; for example, a GAM page (LCX_GAM), a heap (LCX_HEAP), and so on. By aggregating the \\[Log Record Length\\] attribute you can calculate the size of the data written to the log. And finally, by filtering the AllocUnitName attribute you can focus on the activity related to the specific object of interest.

In order to be able to test logging-related activities, you need a target database, as well as a source table to query. Run the code in Listing 1 to create a database called testlogging and a function called GetNums. The function accepts a number as input and returns a result set with a column called n that holds a sequence of integers in the range 1 through the input value.

The code in Listing 2 contains an example for a test I created that performs a certain insert method, using queries against the fn_dblog function to return information about the logging involved. You can use this code as a basis for your own tests. Simply replace the section in Callout A between the comments Start Preparation and End Preparation with your own code that prepares the environment for the test, and the section in Callout C between the comments Start Operation and End Operation with the code that represents the actual operation for which you’re analyzing the logging behavior.

The code in Listing 2 tests a regular INSERT SELECT statement against an empty heap, run in a database with a SIMPLE recovery model. The preparation part in Callout A sets the database recovery model to SIMPLE and creates a table called dbo.T1. (Note that if you want to run a test in a database with a FULL recovery model, make sure you run a full database backup after setting the RECOVERY option to FULL to make sure the database will get out of log truncate mode.) The code then issues a CHECKPOINT command to make sure that all dirty pages are flushed to disk, enabling truncation of the inactive portion of the log.

The code in Callout B collects current information related to the object before the test is issued: count of log records, total log record length, and current timestamp. The code then opens a transaction to ensure that as long as the transaction isn’t committed, the log records from this point will be part of the active portion of the log, preventing overwriting them.

The code in Callout C holds the actual operation that is the focus of the test. In this case, it’s an INSERT SELECT statement of 1,000,000 rows, each a bit over 2,000 bytes long, against an empty heap, run without the TABLOCK hint. By uncommenting the current comment that appears in the INSERT statement immediately after the table name, you can run the same test with the TABLOCK hint.

The code in Callout D collects aggregated information from fn_dblog again, only this time after the operation, and calculates differences from the measures collected before the operation. The output of the query gives you the count of log records, size written to the log in megabytes, and the duration in seconds.

The code in Callout E generates a histogram with 10 equally sized steps, showing the distribution of log record lengths among the steps. In the output you get the step number (n), the low boundary point of the step (lb), high boundary point (hb), and number of log records with lengths within the current step’s range. By analyzing this histogram, you can tell how many log records of different lengths you have.

The code in Callout F gives you the breakdown of the log records by operation, context, and log record length rounded to the nearest multiplication of 100. Finally, the code commits the transaction.

I ran the code in Listing 2 in SQL Server 2008 without the TABLOCK hint and got the output shown in Figure 1. Then I ran the code with the TABLOCK hint and got the output shown in Figure 2.

Figure 1 clearly shows that full logging took place. The first output shows that a total size of about 2GB (2044.24MB) was written to the log. If you do the math, you can see that it’s reasonable for full logging of 1,000,000 rows, each a bit over 2,000 bytes long. The histogram shown in the second output in Figure 1 tells you that there were about 500,000 very small records, and exactly 1,000,000 log records that are similar to the size of the data record. The breakdown in the third output in Figure 1 tells you that the small log records involve setting the bits of allocation and space usage bitmaps (GAM, IAM, PFS), and formatting of the heap pages that were allocated. The large log records were, of course, due to the fully logged actual row inserts.

Figure 2 clearly shows that minimal logging took place. The first output in Figure 2 shows a total size of only 6.25MB written to the log. The two other outputs tell you that all log records are very small, and that they include only updates to allocation and space usage bitmaps.

Improved Performance


Understanding SQL Server’s treatment of insert methods in terms of logging behavior can help you improve the performance of your insert statements. The requirements for minimal logging are subtle—if you neglect to follow one small requirement, your insert could end up performing full logging. You need to become familiar with the tools that will help you determine the logging that’s involved with the insert method you’re considering.

SQL Server 2008 enhances the methods that it can handle with minimal logging. It supports minimally logged regular INSERT SELECT statements. In addition, turning on trace flag 610 lets SQL Server 2008 support minimal logging against a nonempty B-tree for new key ranges that cause allocations of new pages.

Listing 1: Code to Create the testlogging Database and GetNums Function 

IF DB_ID('testlogging') IS NULL CREATE DATABASE testlogging;

GO

USE testlogging;

IF OBJECT_ID('dbo.GetNums', 'IF') IS NOT NULL DROP FUNCTION dbo.GetNums;

GO

CREATE FUNCTION dbo.GetNums(@n AS BIGINT) RETURNS TABLE

AS

RETURN

  WITH

  L0   AS(SELECT 1 AS c UNION ALL SELECT 1),

  L1   AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),

  L2   AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),

  L3   AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),

  L4   AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),

  L5   AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),

  Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS n FROM L5)

  SELECT TOP (@n) n FROM Nums ORDER BY n;

GO

Listing 2: Code to Perform a Test Insert Method

BEGIN CALLOUT A

-- *** Start Preparation *** --

SET NOCOUNT ON;

USE testlogging;

 

ALTER DATABASE testlogging SET RECOVERY SIMPLE;

 

IF OBJECT_ID('dbo.T1', 'U') IS NOT NULL DROP TABLE dbo.T1;

 

CREATE TABLE dbo.T1

(

  col1 INT NOT NULL,

  col2 BINARY(2000) NOT NULL

);

-- *** End Preparation *** --

END CALLOUT A

 

CHECKPOINT;

GO

 

BEGIN CALLOUT B

DECLARE @numrecords AS INT, @size AS BIGINT, @dt AS DATETIME;

 

SELECT

  @numrecords = COUNT(*),

  @size       = COALESCE(SUM(\\[Log Record Length\\]), 0),

  @dt         = CURRENT_TIMESTAMP

FROM sys.fn_dblog(NULL, NULL) AS D

WHERE AllocUnitName = 'dbo.T1' OR AllocUnitName LIKE 'dbo.T1.%';

END CALLOUT B

 

BEGIN TRAN

 

BEGIN CALLOUT C

-- *** Start Operation *** --

INSERT INTO dbo.T1 /*WITH (TABLOCK)*/ (col1, col2)

  SELECT n, CAST(0x01 AS BINARY(2000))

  FROM dbo.GetNums(100000);

-- *** End Operation   *** --

END CALLOUT C

 

BEGIN CALLOUT D

SELECT

  COUNT(*) - @numrecords AS numrecords,

  CAST((COALESCE(SUM(\\[Log Record Length\\]), 0) - @size)

    / 1024. / 1024. AS NUMERIC(12, 2)) AS size_mb,

  CAST(DATEDIFF(millisecond, @dt, CURRENT_TIMESTAMP)/1000. AS DECIMAL(12,3))

    AS duration_sec

FROM sys.fn_dblog(NULL, NULL) AS D

WHERE AllocUnitName = 'dbo.T1' OR AllocUnitName LIKE 'dbo.T1.%';

END CALLOUT D

 

BEGIN CALLOUT E

-- Histogram

DECLARE @numsteps AS INT = 10;

DECLARE @log AS TABLE(id INT IDENTITY, size INT, PRIMARY KEY(size, id));

 

INSERT INTO @log(size)

  SELECT \\[Log Record Length\\]

  FROM sys.fn_dblog(null, null) AS D

  WHERE AllocUnitName = 'dbo.T1' OR AllocUnitName LIKE 'dbo.T1.%';

 

WITH Args AS

(

  SELECT MIN(size) AS mn, MAX(size) AS mx,

    1E0*(MAX(size) - MIN(size)) / @numsteps AS stepsize

  FROM @log

),

Steps AS

(

  SELECT n,

    mn + (n-1)*stepsize - CASE WHEN n = 1 THEN 1 ELSE 0 END AS lb,

    mn + n*stepsize AS hb

  FROM dbo.GetNums(@numsteps)

    CROSS JOIN Args

)

SELECT n, lb, hb, COUNT(size) AS numrecords

FROM Steps

  LEFT OUTER JOIN @log

    ON size > lb AND size <= hb

GROUP BY n, lb, hb

ORDER BY n;

END CALLOUT E

 

BEGIN CALLOUT F

-- Breakdown of Log Record Types

SELECT Operation, Context,

  AVG(\\[Log Record Length\\]) AS AvgLen, COUNT(*) AS Cnt

FROM sys.fn_dblog(null, null) AS D

WHERE AllocUnitName = 'dbo.T1' OR AllocUnitName LIKE 'dbo.T1.%'

GROUP BY Operation, Context, ROUND(\\[Log Record Length\\], -2)

ORDER BY AvgLen, Operation, Context;

END CALLOUT F

 

COMMIT TRAN

 

 

Figure 1: Output from Running Listing 2 without the TABLOCK hint

numrecords  size_mb     duration_sec

----------- --------    -------------

1593768     2044.24     130.603

 

n           lb          hb          numrecords

----------- ----------  ----------  -----------

1           59          264.4       593768

2           264.4       468.8       0

3           468.8       673.2       0

4           673.2       877.6       0

5           877.6       1082        0

6           1082        1286.4      0

7           1286.4      1490.8      0

8           1490.8      1695.2      0

9           1695.2      1899.6      0

10          1899.6      2104        1000000

 

Operation         Context     AvgLen      Cnt

----------------  ---------   ----------- -----------

LOP_SET_BITS      LCX_GAM     60          31253

LOP_SET_BITS      LCX_IAM     60          31253

LOP_MODIFY_ROW    LCX_PFS     80          281253

LOP_FORMAT_PAGE   LCX_HEAP    84          250000

LOP_FORMAT_PAGE   LCX_IAM     84          1

LOP_MODIFY_ROW    LCX_IAM     88          8

LOP_INSERT_ROWS   LCX_HEAP    2096        1000000

Figure 2: Output from Running Listing 2 with the TABLOCK hint

numrecords  size_mb     duration_sec

----------- --------    -------------

94351       6.25        105.123

 

n           lb          hb          numrecords

----------- ----------  ----------- -----------

1           59          63.2        62592

2           63.2        66.4        0

3           66.4        69.6        0

4           69.6        72.8        92

5           72.8        76          0

6           76          79.2        0

7           79.2        82.4        1

8           82.4        85.6        1

9           85.6        88.8        31297

10          88.8        92          368

 

Operation         Context     AvgLen      Cnt

----------------  --------    ----------- -----------

LOP_SET_BITS      LCX_GAM     60          31342

LOP_SET_BITS      LCX_IAM     60          31342

LOP_FORMAT_PAGE   LCX_IAM     84          1

LOP_MODIFY_ROW    LCX_IAM     88          1

LOP_MODIFY_ROW    LCX_PFS     88          31665

Discuss this Article 6

sqlmagbm21
on Aug 15, 2009
I knew about the INSERT INTO performance improvement, but this articls disects and helps quantify one of the many benefits of moving into SQL 2008 from SQL 2005 or older versions.
lovalvob
on Sep 23, 2009
Very useful, but it appears to be a good deal less effective when there are indexes (even non-clustered). Can you comment?
smong
on Oct 1, 2009
I should clarify that all I care to minimally log are data warehouse INSERTs. Thus, I can get away with the simple rule of "...ignoring 'Log Record Length' and simply look at aggregate counts of LOP_INSERT_ROWS."
jvinualzar
on May 10, 2010
you talk about some times performance penalty when using the T-610 but don't specify when or why.

Some guidance?
smong
on Sep 30, 2009
lovalvob... Itzik mentioned TF 610, but his code does not use it. Giving credit where it's due, he also pointed to Sunil's blog, where you may find the following helpful: http://blogs.msdn.com/sqlserverstorageengine/archive/2008/03/23/minimal-logging-changes-in-sql-server-2008-par t-2.aspx TF 610 is responsible for virtually all the magic shown (INSERT INTO non-empty clustered index w/out TABLOCK, yet no meaningful logging!) BTW, I always have to keep in mind that BULK INSERT (my preference) behaves differently than INSERT INTO, with respect to locking. For more on BULK INSERT's behavior, go to part 3 of Sunil's blog post. As Byzantine as minimal logging is, the real trick is detecting minimal logging--or a lack thereof. Itzik's code makes a great contribution in this area. Personally, however, I ignore "Log Record Length" and simply look at aggregate counts of LOP_INSERT_ROWS. Anything less than the number of rows actually inserted means that minimal logging took place...or that you're pulling from a truncated log :)
jvinualzar
on May 10, 2010
Incredible article, as usually, but the links are bloken (figures and listings), aren't them?

Please or Register to post comments.

IT/Dev Connections

Las Vegas
September 30th - October 4th

Paul ThurottOur Experts will show you:
• Common SQL Server
Problems
• Best Practices for T-SQL
• SQL Server Integration
Services
• Database Development

Come See Michael Otey & Tim Ford in Person!

Early Registration Now Open

From the Blogs
May 21, 2013
blog

A Common Misconception about MAXDOP

Out of the box, SQL Server is (and has been) able to take advantage of multiple processors/cores without any effort on behalf of administrators....More
May 9, 2013
blog

My ISO 8601-Compliant Signature 2

My family recently just "officially" announced that we're in the process of adopting a child from South Africa. We're quite excited, of course, but there's a ton of paperwork to do—along with the need for gobs of signatures....More
May 8, 2013
blog

Use SSIS for ETL from Hadoop

In this blog post, Mark Kromer walks you through using SSIS as a way to use ETL techniques using Microsoft's Hadoop on Windows (HDInsight) as a source using Hive connectors...More
SQL Server Pro Forums

Get answers to questions, share tips, and engage with the SQL Server community in our Forums.