Different industries load different kinds of data. Telecommunications companies load high volumes of Call Detail Record (CDR) data, financial trading systems load trades or stock-ticker information, and Web hosters load IIS log files to support click-stream analysis. All real-life projects include some form of high-volume data loading. In my work with the Microsoft SQL Server Development Customer Advisory Team, I use some basic best practices to help customers in all kinds of industries implement high-volume data loading.

Maybe you need initial data from an outside system such as a mainframe or UNIX text files. Or maybe you must deal with a continuous stream of data—such as a stock ticker, news updates, or airline arrival and departure information—or a queue that receives messages that you need to load into a database table. Let's review a few basics of data loading and look at how to choose the best data-loading option. Then, let's see how to optimize performance during bulk loading in SQL Server 2000.

Data-Loading Options

When you need to load a large amount of data into SQL Server, you must first consider what kind of loading tool to use. When you load data into SQL Server from an external data source, you can choose between using INSERT statements or using a bulk loader such as BULK INSERT or bulk copy program (BCP). Alternatively, you can use Data Transformation Services (DTS) or XML Bulk Load, but these options, which use either the ODBC BCP API or the OLE DB IRowsetFastLoad interface, are subject to the same rules as BULK INSERT and BCP. I focus on BULK INSERT and BCP in this article because they both come with every edition of SQL Server, including Microsoft SQL Server Desktop Engine (MSDE).

What's Wrong with INSERT? Nothing, but...

In most cases, the preferred mechanism for adding new data into a table is the INSERT statement. However, high-volume data loading presents a performance challenge that you can best address by using streaming techniques rather than the traditional method of sending separate statements for each row or set of rows.

When you send an INSERT statement, you send metadata and data values over the wire for each statement or batch of statements. In streaming, you send the metadata once, followed by a continuous flow (stream) of data values. With streaming, the server can set up the receiving internal structures (such as buffers) one time, according to the data types for each column, and continue to accept values in those structures while the stream transfers rows (data values). With an INSERT statement, the server must repeat the initialization process for each statement. Although INSERT statements can reuse execution plans to lower the cost of setting up structures inside the server, a streaming architecture always outperforms a statement-based data-loading architecture.
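The contrast is easy to see side by side. In the following sketch, the table name, columns, and file path are purely illustrative; the point is that the INSERT approach repeats setup per statement, whereas the bulk load sends metadata once and streams the rows:

```sql
-- Statement-based loading: metadata and values travel with every statement,
-- and the server re-initializes its receiving structures each time.
INSERT INTO sales (order_id, amount) VALUES (1, 19.99)
INSERT INTO sales (order_id, amount) VALUES (2, 24.50)
-- ...repeated for every row or batch of rows.

-- Streaming alternative: metadata is sent once, then the rows flow as a stream.
BULK INSERT sales FROM 'f:\data\sales.char'
WITH (DATAFILETYPE = 'char')
```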

BULK INSERT and BCP both offer the ability to stream data into SQL Server, so which bulk loader should you use? Your choice depends on two factors: the number of concurrent sessions you need and whether you must perform remote bulk loads (i.e., loads made from a remote machine over the network into a SQL Server database). The main task of a bulk loader is to turn a data source (e.g., a flat file) into a rowset so that it can stream the data into SQL Server. The file parser reads the input file in 64KB chunks by using an asynchronous I/O read pattern. Then, the file parser hands the data to the rowset producer, which creates the rowset and streams it out.

At this point, BULK INSERT and BCP differ. BULK INSERT is an internal T-SQL command that runs inside the SQL Server process and address space. BCP doesn't run inside the SQL Server address space or process; it's a separate process that talks to SQL Server through an interprocess communications (IPC) mechanism. Therefore, BCP can run anywhere on the server or the client. As Figure 1, page 37, shows, BULK INSERT sends the rowset directly to the query processor. BCP, however, uses the Tabular Data Stream (TDS) application-level protocol to send the rowset over the wire. SQL Server receives the TDS rowset and turns it into an OLE DB rowset for the query processor.

You can use the SQL Trace tool to confirm this rowset conversion. For example, in Query Analyzer or through OSQL, run the following T-SQL BULK INSERT statement:

BULK INSERT [sales] FROM
  'f:\data\sales.char'
WITH (batchsize = 10000,
  datafiletype = 'char', tablock)

When you trace the results in SQL Server Profiler, you get the SQL Trace output that Figure 2 shows, which indicates that the INSERT statement is fed from an IRowset. To run the same test when you're using BCP, run the following statement from the command prompt:

BCP.EXE testdb.dbo.sales IN
  f:\data\sales.char -S
  (local)\dev -T -c -b 10000

Because BCP sends the rowset remotely over TDS, you get the result that Figure 3 shows. The BULK INSERT statement results in the SQL:BatchCompleted event INSERT FROM {IRowset 0xE0808A05}, whereas the BCP.EXE statement results in the SQL:BatchCompleted event INSERT BULK with ().

Besides the difference in implementation (in-process versus remote) between BULK INSERT and BCP, the two techniques differ in scale. The number of BULK INSERT sessions scales linearly with the number of available CPUs in the machine running SQL Server. Because BULK INSERT uses one thread per session in SQL Server, it never scales beyond the number of CPUs in the machine. So on an eight-CPU machine, you can have eight parallel BULK INSERT sessions (assuming you're using a separate User Mode Scheduler—UMS—for each session); if you initiate a ninth session, it won't start until a currently running session finishes. BCP doesn't have this limitation but is limited by network or IPC bandwidth.

Optimizing Performance by Limiting Logging

After you've decided which bulk-loading tool you want to use, you can optimize the bulk load's performance. The best way to improve your bulk-load performance is to minimize transaction logging. Typically, when SQL Server is loading data, the transaction log records each row insert, each page allocation, and each extent allocation. During a high-volume data load, recording all these transactions in the transaction log can seriously impede performance because the transaction log is the throughput bottleneck of your database. If you're adhering to best practices, you've already made sure that the SQL Server transaction log file for the database you're loading into resides on its own spindle or set of spindles and isn't shared with data files or load files. To further optimize performance, you can make SQL Server minimize the number of transaction log records it needs to write by making the inserts non-logged operations.

The term non-logged operation is a bit misleading—SQL Server always writes certain transaction log records so that database changes are durable and recoverable. But SQL Server can minimize the number of transaction log records by writing a record only for each extent allocation. (An extent is eight 8KB pages, or 64KB.) This practice reduces contention on the transaction log and increases overall throughput of the data load.

To get the best data-loading performance, make sure your data-loading strategy adheres to the following five minimal-logging rules:

  1. Set the database recovery model to Bulk-Logged or Simple.
  2. Don't replicate the target table; replication requires larger-than-normal transaction log records because the log must carry enough detail to reconstruct each operation for the subscriber. The larger records decrease the overall throughput of the transaction log and, with it, your data-load capacity.
  3. Be sure the target table doesn't contain any triggers because SQL Server uses transaction log information to create the Inserted and Deleted virtual tables inside a trigger. If you have triggers, SQL Server will log inserts as separate transaction log entries; you can't have triggers and have non-logged operations.
  4. Be sure the target table contains no rows or has no indexes associated with it. When a table contains rows but no indexes, SQL Server loads the data on a newly created extent so that it can improve efficiency by writing a transaction log record only for extent allocations. This way, in the case of a failure in which SQL Server needs to roll back a transaction, SQL Server can simply undo the extent allocations without destroying rows that already existed in the table when the load started. When the table contains both rows and indexes, the strategy of loading the data on a new extent makes optimization too complex. When the table doesn't contain rows, SQL Server can lock both the table and the index structures without harming concurrency—nobody will be operating against rows that don't exist.
  5. Use a TABLOCK hint with the bulk-load operation.
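Put together, the preparation might look like the following sketch. The database and table names are illustrative, and the sketch assumes the earlier example table; rules 2 and 3 (no replication, no triggers) are preconditions you verify rather than statements you run:

```sql
-- Rule 1: switch to a minimally logged recovery model.
ALTER DATABASE testdb SET RECOVERY BULK_LOGGED

-- Rules 2 and 3: confirm the target table isn't replicated and has no triggers.
-- Rule 4: make sure the table is empty (or has no indexes); here we empty it.
TRUNCATE TABLE sales

-- Rule 5: take a table lock for the duration of the bulk load.
BULK INSERT sales FROM 'f:\data\sales.char'
WITH (DATAFILETYPE = 'char', TABLOCK)
```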

When you fulfill these requirements, you can use any bulk-loading tool—including BCP, BULK INSERT, tools that use the OLE DB interface IRowsetFastload (including the DTS Data Pump and XML Bulk Load), or the ODBC BCP API—to load data with minimal logging.

Verify Minimal Logging

To verify that your operation is using minimal logging, you can use one of two techniques, depending on whether you're loading into a heap or a table that contains a clustered index. If you're loading into a heap (i.e., a table without a clustered index), you can easily see whether you're using minimal logging by checking the lock type that SQL Server uses. With minimal logging, SQL Server acquires a table lock of req_mode = 12. Information about the table lock resides in the syslockinfo system table. To access the information, wrap the BULK INSERT statement in a BEGIN TRAN statement, as Listing 1 shows. The code loads the data, then uses the sp_lock stored procedure to show the lock results, returning output similar to that in Figure 4. The presence of lock type TAB and mode BU indicates that SQL Server is performing a minimally logged bulk load into a heap.
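Listing 1 itself isn't reproduced here, but a minimal sketch of the technique, with illustrative table and file names, might look like this: wrap the load in a transaction, then inspect the locks your own session holds before committing.

```sql
BEGIN TRAN
BULK INSERT sales FROM 'f:\data\sales.char'
WITH (DATAFILETYPE = 'char', TABLOCK)

-- Show this session's locks; look for Type = TAB with Mode = BU.
EXEC sp_lock @@SPID
COMMIT TRAN
```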

When you're loading into a table that contains a clustered index, determining whether you're using minimal logging is more difficult. SQL Server doesn't use a BU lock with a clustered index, so you can't use the lock type to validate whether you're using minimal logging in the bulk operation. Instead, you need to examine the number and size of the log records. To do so, use the following query:

SELECT [# Log Records] = count(*),
       [# Log Bytes] = sum([Log Record Length])
FROM ::fn_dblog(NULL, NULL)

For example, our team used the code that Listing 2 shows to run a test of this method. We loaded a native BCP file containing 20,000 rows and 20,280,000 bytes, then changed the variables that influence the amount of logging to simulate different scenarios. We received the results that Table 1 shows, which reveal how changing variables affects logging.
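Listing 2 isn't reproduced here, but the shape of such a test is straightforward: capture the log-record count before and after the load and compare the difference to the number of rows loaded. The table and file names below are illustrative.

```sql
-- Snapshot the log before the load (run in the target database).
SELECT [# Log Records] = count(*),
       [# Log Bytes] = sum([Log Record Length])
FROM ::fn_dblog(NULL, NULL)

BULK INSERT sales FROM 'f:\data\sales.char'
WITH (DATAFILETYPE = 'char', TABLOCK)

-- Snapshot again; a minimally logged load adds far fewer records
-- than the number of rows loaded, because logging is per extent.
SELECT [# Log Records] = count(*),
       [# Log Bytes] = sum([Log Record Length])
FROM ::fn_dblog(NULL, NULL)
```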

After you've determined how to get the best data-load performance for one session, you can work on scaling out your data-load operation. Even when you know how to optimize one session, you can't simply start running multiple sessions in parallel to speed up your data load. The Customer Advisory Team will cover parallel data loading in a future column.

To maximize your data-loading throughput, you must understand how the transaction log, recovery model, and locking affect your data-load potential. The Customer Advisory Team works to help most of its customers improve their data-loading throughput because these days, almost every project starts with existing data.