When SQL Server reads data under both Read Uncommitted and Read Committed isolations, updates running during an index-order scan can result in reading the same row multiple times or skipping rows.
When SQL Server reads data under the Read Uncommitted isolation while another process is inserting data, read-consistency problems can develop. I explained these problems in “SQL Server Read-Consistency Problems, Part 1” and “SQL Server Read-Consistency Problems, Part 2.” Under Read Uncommitted, when all index leaf data needs to be scanned and the execution plan shows Index Scan with Ordered: False, the storage engine might use an allocation-order scan rather than an index-order scan. In such a case, the reader might get the same row twice or skip rows that existed when the scan started, as a result of split operations in the index leaf involving data movement. Under the Read Committed isolation or higher, the storage engine opts for an index-order scan to prevent those inconsistent reads (unless the data can’t be changed—e.g., when using TABLOCK or when the data resides in a READ_ONLY file group or database, where it’s “safe” to use an allocation-order scan).
So, an index-order scan prevents getting the same row twice or skipping rows when insertions are running during the read—but what about updates running during the read? In this article, I show that under both Read Uncommitted and Read Committed isolations, updates running during an index-order scan can result in reading the same row multiple times or skipping rows.
To illustrate how you can get inconsistent reads if updates are running during an index-order scan, I use a simple table called Employees with a clustered index on the salary column. Run the code in Listing 1 to create a sample database called testdb, and within it a table called Employees with four rows. Notice that the Employees table has a filler column that’s 2,500 bytes in size, meaning that a leaf page in the index will accommodate no more than three rows. Because the table has four rows, the clustered index leaf level currently has two pages. The employees D, A, and C with the respective salaries 1,000, 2,000, and 3,000 reside in the first page in the index leaf level’s linked list, and employee B with the salary 4,000 resides in the second page.
Suppose you need to query all rows in the Employees table, and SQL Server’s storage engine uses an index-order scan (reading the index leaf pages in linked list order). This can be the result of a request for data sorted by salary, or simply a request for all rows without an ORDER BY clause running under the Read Committed isolation or higher when the storage engine opts for an index-order scan to prevent inconsistent reads resulting from insertions. SQL Server can also use a partial-order scan at the leaf level of the index when you ask for a range of salaries.
But to simplify the explanation, suppose you query all rows and SQL Server uses an index-order scan. If you query the data under Read Uncommitted, no shared locks are acquired during the read. If you query the data under Read Committed, shared locks are acquired but are released as soon as you’re done reading the resource (i.e., the row or page, depending on the granularity of lock SQL Server decided to acquire). In other words, once your SELECT operation is done reading a row under both Read Uncommitted and Read Committed, no shared locks are held on the rows SQL Server read, even if the read is still in progress. This means that while the read is in progress, other processes can update rows that were already read, as well as rows that the read hasn’t yet reached.
Suppose that when your SELECT operation starts, it reads the rows in the first page in the linked list, returning employees D, A, and C with their respective salaries 1,000, 2,000, and 3,000. Before SQL Server reads the next page, another process modifies the salary of employee D from 1,000 to 5,000. This update will cause the row to move to the second page because of its new key value. Next, the read moves to the second page, returning employee B with the salary 4,000 but also employee D again with the new salary 5,000. In other words, employee D is returned twice, once with the salary 1,000 and once with the salary 5,000.
At this point employees A and C reside in the first page (with salaries 2,000 and 3,000) and employees B and D reside in the second page (with salaries 4,000 and 5,000). Suppose you start another SELECT operation. SQL Server reads the first page, returning employees A and C. Before SQL Server moves to the second page, another process modifies the salary of employee D back to 1,000, so the row moves from the second page to the first page. Your read moves to the second page and finds only employee B. So your query returns only employees A, C, and B, skipping the row of employee D.
To prove that you can get a row for the same employee twice and that you can skip an employee altogether, let’s open two connections (call them connection 1 and connection 2). Then, run the code in Listing 2 in connection 1 and the code in Listing 3 in connection 2. The code running in connection 1 executes an infinite loop that updates the salary of employee D from its current value to 6,000 minus the current value; namely, the salary will switch between the values 1,000 and 5,000 in each iteration of the loop. This in turn causes the row to move back and forth between the first and second pages of the index.
The code running in connection 2 queries all rows from Employees, using a SELECT INTO statement and populating a temporary table with the data that was read. Because the code runs under the Read Committed isolation, the storage engine uses an index-order scan. The code breaks from the loop as soon as the query finds five rows in the temporary table, and returns those rows in the output. Of course this result means that a row was read more than once. Table 1 shows the output I got from running Listing 3.
To prove that your query can skip rows, simply revise the number in the IF statement from 5 to 3 and run the test again. This time the code in connection 2 will break from the loop as soon as the query finds only three rows in the Employees table, meaning that one employee was skipped. Table 2 shows the output I got from running Listing 3 with a revised IF statement.
Several workarounds are available if you don’t want to allow your queries to skip rows or return rows multiple times. One option is to run the SELECT statement with the TABLOCK hint, which prevents other processes from changing data while a read is running. Of course, you need to consider the effect on concurrency—attempts to modify data during a read will be blocked.
Another option is to use a higher isolation level, such as Repeatable Read or Serializable. This method prevents getting the same row twice because shared locks are kept until the end of the transaction, so an update of a row that was already read will be blocked. However, rows that you haven’t yet read can still be modified and therefore skipped during a SELECT statement running under Repeatable Read because they aren’t yet locked. SQL Server uses keyrange locks in the Serializable isolation, which prevents skipping rows. Again, you need to consider the effect on concurrency to determine whether this workaround is worthwhile.
In SQL Server 2005, both snapshot-based isolations— Snapshot and Read Committed Snapshot—prevent a read from getting the same row multiple times or skipping rows. These problems are prevented because both isolations "freeze" a snapshot of the data in time by recording row versions in the tempdb database. The Read Committed Snapshot isolation is adequate when the original isolation the read was running under was Read Committed or Read Uncommitted. Read Committed Snapshot is logically more similar to Read Committed, whereas Snapshot is more logically similar to Serializable. In addition, Read Committed Snapshot uses less overhead than Snapshot. Of course you need to thoroughly test these isolations in your environment before you use them in production because versioning will create some overhead on the tempdb database.