Row-level versioning helps SQL Server 2005 manage competition
One new functionality in SQL Server 2005 that I believe will be very useful is row-level versioning (RLV). RLV lets SQL Server track older committed versions of rows and make those versions available to users through several features. SQL Server 2005 uses RLV to implement (among other things) a new isolation level and a variation of an existing isolation level, which provides an optimistic concurrency model. I can't give you the full details this month, but I hope to start you thinking about whether to use this new feature in your SQL Server 2005 applications.
The ability for multiple processes to access or change shared data simultaneously is called concurrency. The more user processes can execute without blocking each other, the greater the concurrency of the database system. Concurrency decreases when a process that's changing data prevents other processes from reading the data or vice versa. Multiple processes attempting to change the same data at the same time also degrades concurrency because they can't all succeed without sacrificing data consistency.
How an application balances concurrency and consistency depends partly on whether the database system is using pessimistic or optimistic concurrency control. Pessimistic control assumes that another user's data modification will likely affect any given read operation. That is, the system assumes a conflict will occur. The default behavior of pessimistic control is to use locks to block access to data that another process is using. Optimistic concurrency control assumes a low likelihood of conflict. It allows simultaneous access to the data and resolves conflicting modifications only when they occur.
Historically, SQL Server's concurrency control model was pessimistic, based on locking. Although locking is still the best concurrency-control choice for most applications, it can introduce significant blocking problems for some applications. The biggest problem arises when locking causes the writer-block-reader or reader-block-writer conflict. If a transaction changes a row, it holds exclusive locks on the changed data. SQL Server's default behavior is to prevent other transactions from reading the row until the writer terminates the transaction by committing or rolling back.
Although their default is to lock the changed data, SQL Server 2000 and 7.0 let you change the transaction isolation level to READ UNCOMMITTED, which the application can request by either setting the isolation level for the connection or specifying the NOLOCK or READUNCOMMITTED table hint. This setting lets a process read the changed data without blocking. But always consider this non-locking scan carefully before using it—it won't necessarily return transactionally consistent results because it's reading the data before the transaction commits.
In versions before SQL Server 2005, you can avoid having writers block readers if you're willing to risk inconsistent data. If your queries must always have access to committed data, you need to wait for changes to be committed (or for transactions to be rolled back). The reader-block-writer situation usually causes contention problems only if a process is in REPEATABLE READ or SERIALIZABLE isolation level and reads data within a transaction. In this case, SQL Server holds the shared locks on the read data until the end of the transaction and blocks any process trying to update the data.
RLV to the Rescue
Even if the application requires results based on committed data, you can obtain committed values in two ways. If the reader needs the latest committed value of the data, the application can use pessimistic concurrency control and readers can wait (on a lock) for writers to complete their transactions and commit their changes. Other times, any committed data values might be sufficient even if they aren't the most recent. In this case, a reader might be fine if SQL Server could provide a previously committed value of the row.
SQL Server 2005 introduces an isolation level called snapshot isolation (SI) and a new non-locking variant of read committed isolation called read committed snapshot isolation (RCSI). Together, these isolation levels are called snapshot-based isolation levels. Both allow a data reader to access a previously committed value of a row without blocking, thereby increasing concurrency. For snapshot-based isolation to work, SQL Server must keep old versions of all updated rows. Because it might need to keep multiple versions of a row, this new behavior is called multi-version concurrency control or row-level versioning.
To support storing multiple versions of rows, SQL Server utilizes additional disk space from the tempdb database, so you'll need to monitor and manage this disk space. Versioning makes any transaction that changes data save the old versions in tempdb so that SQL Server can construct a "snapshot" of the database (or part of it) from the old versions. Be aware that the very act of enabling RLV causes SQL Server to start saving row versions, even if you never read the older data.
As I mentioned, full SI and RCSI both use RLV. You enable them by using the ALTER DATABASE command:
ALTER DATABASE \[database_name\] SET ALLOW_SNAPSHOT_ISOLATION ON
Both options let you issue write operations that don't block readers, and both let readers see older committed versions of the data. However, the two options have some important differences. To demonstrate the differences, let's look at an example. First, create a new database, and allow both variations of snapshot-based isolation in it:
ALTER DATABASE testdb SET ALLOW_SNAPSHOT_ISOLATION ON
ALTER DATABASE testdb SET READ_COMMITTED_SNAPSHOT ON
Then, create a table t like this:
INSERT t VALUES (1,5)
To see both variations of snapshot-based isolation, you need three connections, as Listing 1 shows. This example shows that in versioning, readers (Connections 1 and 2) don't block writers (Connection 3) and vice versa. Versioning lets the transaction on Connection 1 read the value of A as it existed before Connection 3 made the change. However, Connection 2 behaves a bit differently.
The biggest difference between full SI and RCSI is that a transaction using SI accesses the row versions as of the beginning of the transaction. For RCSI, the connection accesses the versions as of the beginning of the statement. So if Connection 2's SELECT were a very long-running statement, possibly joining and aggregating millions of rows to produce a report, all the values it read would be consistent as of the time the SELECT started, no matter how many concurrent updates were happening. The concurrent updates wouldn't block the big SELECT, and the SELECT wouldn't block the updates.
A related issue that will take a lot of application developers and testers by surprise is that the start of the transaction, for the purpose of snapshot isolation, always means the first time the transaction accessed any data. If Connection 1 had issued the BEGIN TRAN statement but hadn't executed the SELECT before Connection 3 updated the data, Connection 1 would see only the updated data because its transaction didn't begin until after the update in Connection 3.
Another difference between SI and RCSI is that you must enable SI at the session level. Thus, a SQL Server 2000 application will require code changes to take advantage of SQL Server 2005's non-blocking read and write operations. RCSI might be considered weaker than SI because RCSI returns only the data that was committed as of the start of the statement, but using RCSI doesn't require any code changes to implement optimistic concurrency. Any connections that would normally run in the default read-committed isolation will run in RCSI once the READ_COMMITTED_SNAPSHOT option is turned on for a database.
One final difference between the two snapshot-isolation variations is how they handle update conflicts. Although readers and writers and don't block each other in snapshot isolation, two writers might still conflict. In the example above, suppose Connection 1 tried to update the row it read that Connection 3 has already updated:
UPDATE t SET A = 15 WHERE K = 1
Connection 1 receives the following error:
Snapshot isolation transaction aborted due to update conflict. You cannot use snapshot isolation to access table 'dbo.t' directly or indirectly in database 'testdb' to update, delete, or insert the row that has been modified or deleted by another transaction. Retry the transaction or change the isolation level for the update/delete statement.
RCSI can't result in update conflicts, so application developers need to include less error checking. And Connection 2 could successfully update any rows from the committed transactions in Connection 3, as could any connection running in the normal read-committed isolation.
The Other Side of the Coin
Although both snapshot-based isolation levels can provide increased concurrency in your applications, you need to carefully consider how and whether to use them. As soon as you set either option for a database, all updates become versioned, so update operations take longer and SQL Server needs space in tempdb. SQL Server 2005 supplies several dynamic management views to monitor the snapshot operations and the row versions.
In addition, SQL Server needs to store more information in each row. To enable access of older versions, each row involved in versioning needs an additional 14 bytes of information. Increasing the row size can affect tables that have little space left on the pages. If many rows grow, pages may have to split, which further degrades performance.
Start now to think about the tradeoffs in using snapshot-based isolation levels. By the time you start upgrading your applications or building new SQL Server 2005 solutions, you'll be able to make the best choice for your systems.