Executive Summary:
When SQL Server reads data under both Read Uncommitted and Read Committed isolations, updates running during an index-order scan can result in reading the same row multiple times or skipping rows.
|
When SQL Server reads data under the Read
Uncommitted isolation while another process
is inserting data, read-consistency problems
can develop. I explained these problems in “SQL Server
Read-Consistency Problems, Part 1” and “SQL Server Read-Consistency
Problems, Part 2.” Under Read Uncommitted, when all index leaf
data needs to be scanned and the execution plan shows
Index Scan with Ordered: False, the storage engine might
use an allocation-order scan rather than an index-order
scan. In such a case, the reader might get the same row
twice or skip rows that existed when the scan started,
as a result of split operations in the index leaf involving
data movement. Under the Read Committed isolation or
higher, the storage engine opts for an index-order scan to
prevent those inconsistent reads (unless
the data can’t be changed—e.g., when
using TABLOCK or when the data
resides in a READ_ONLY file group
or database, where it’s “safe” to use an
allocation-order scan).
So, an index-order scan prevents
getting the same row twice or skipping
rows when insertions are running
during the read—but what about
updates running during the read? In
this article, I show that under both
Read Uncommitted and Read Committed
isolations, updates running
during an index-order scan can result
in reading the same row multiple times
or skipping rows.
Theory
To illustrate how you can get inconsistent
reads if updates are running
during an index-order scan, I use a
simple table called Employees with a
clustered index on the salary column. Run the code in
Listing 1 to create a sample database called testdb, and
within it a table called Employees with four rows. Notice
that the Employees table has a filler column that’s 2,500
bytes in size, meaning that a leaf page in the index will
accommodate no more than three rows. Because the table
has four rows, the clustered index leaf level currently has
two pages. The employees D, A, and C with the respective
salaries 1,000, 2,000, and 3,000 reside in the first page in
the index leaf level’s linked list, and employee B with the
salary 4,000 resides in the second page.
Suppose you need to query all rows in the Employees
table, and SQL Server’s storage engine uses an index-order
scan (reading the index leaf pages in linked list order). This
can be the result of a request for data sorted by salary,
or simply a request for all rows without an ORDER BY clause running under the Read Committed isolation or
higher when the storage engine opts for an index-order
scan to prevent inconsistent reads resulting from insertions.
SQL Server can also use a partial-order scan at
the leaf level of the index when you ask for a range of
salaries.
But to simplify the explanation, suppose you query
all rows and SQL Server uses an index-order scan. If
you query the data under Read Uncommitted, no shared
locks are acquired during the read. If you query the data
under Read Committed, shared locks are acquired but
are released as soon as you’re done reading the resource
(i.e., the row or page, depending on the granularity of
lock SQL Server decided to acquire). In other words,
once your SELECT operation is done reading a row
under both Read Uncommitted and Read Committed, no
shared locks are held on the rows SQL Server read, even
if the read is still in progress. This means that while the
read is in progress, other processes can update rows that
were already read, as well as rows that the read hasn’t yet
reached.
Suppose that when your SELECT operation starts, it
reads the rows in the first page in the linked list, returning
employees D, A, and C with their respective salaries 1,000,
2,000, and 3,000. Before SQL Server reads the next page,
another process modifies the salary of employee D from
1,000 to 5,000. This update will cause the row to move
to the second page because of its new key value. Next,
the read moves to the second page, returning employee B
with the salary 4,000 but also employee D again with the
new salary 5,000. In other words, employee D is returned
twice, once with the salary 1,000 and once with the salary
5,000.
At this point employees A and C reside in the first
page (with salaries 2,000 and 3,000) and employees B
and D reside in the second page (with salaries 4,000 and
5,000). Suppose you start another SELECT operation.
SQL Server reads the first page, returning employees A
and C. Before SQL Server moves to the second page,
another process modifies the salary of employee D back
to 1,000, so the row moves from the second page to the
first page. Your read moves to the second page and finds
only employee B. So your query returns only employees
A, C, and B, skipping the row of employee D.
Proof
To prove that you can get a row for the same employee
twice and that you can skip an employee altogether, let’s
open two connections (call them connection 1 and connection
2). Then, run the code in Listing 2 in connection
1 and the code in Listing 3 in connection 2.
The code running in connection 1 executes an infinite
loop that updates the salary of employee D from its current
value to 6,000 minus the current value; namely, the
salary will switch between the values 1,000 and 5,000 in
each iteration of the loop. This in turn causes the row to
move back and forth between the first and second pages
of the index.
The code running in connection 2 queries all rows
from Employees, using a SELECT INTO statement
and populating a temporary table with the data that was
read. Because the code runs under the Read Committed
isolation, the storage engine uses an index-order scan.
The code breaks from the loop as soon as the query finds
five rows in the temporary table, and returns those rows
in the output. Of course this result means that a row was
read more than once. Table 1 shows the output I got from
running Listing 3.
To prove that your query can skip rows, simply revise
the number in the IF statement from 5 to 3 and run the test
again. This time the code in connection 2 will break from
the loop as soon as the query finds only three rows in the
Employees table, meaning that one employee was skipped.
Table 2 shows the output I got from running Listing 3 with
a revised IF statement.
Workarounds
Several workarounds are available if you don’t want to
allow your queries to skip rows or return rows multiple times. One option is to run the SELECT statement with
the TABLOCK hint, which prevents other processes from
changing data while a read is running. Of course, you need
to consider the effect on concurrency—attempts to modify
data during a read will be blocked.
Another option is to use a higher isolation level, such as
Repeatable Read or Serializable. This method prevents
getting the same row twice because shared locks are kept
until the end of the transaction, so an update of a row that
was already read will be blocked. However, rows that you
haven’t yet read can still be modified and therefore skipped
during a SELECT statement running under Repeatable
Read because they aren’t yet locked. SQL Server uses keyrange
locks in the Serializable isolation, which prevents
skipping rows. Again, you need to consider the effect on
concurrency to determine whether this workaround is
worthwhile.
In SQL Server 2005, both snapshot-based isolations—
Snapshot and Read Committed Snapshot—prevent a read
from getting the same row multiple times or skipping rows.
These problems are prevented because both isolations
“freeze” a snapshot of the data in time by recording row
versions in the tempdb database. The Read Committed
Snapshot isolation is adequate when the original isolation
the read was running under was Read Committed or Read
Uncommitted. Read Committed Snapshot is logically
more similar to Read Committed, whereas Snapshot is
more logically similar to Serializable. In addition, Read
Committed Snapshot uses less overhead than Snapshot.
Of course you need to thoroughly test these isolations
in your environment before you use them in production
because versioning will create some overhead on the
tempdb database.
End of Article