DBAs often have to update rows in tables. When the table is small, you might be able to update the entire table with a single statement that looks something like

UPDATE SomeTable
SET SomeColumn = 'something'
For a large table, however, such a simplistic approach is impractical: the statement runs as one large transaction, and the locks it holds for the duration can block users.
When you are faced with updating most or all rows in a large table and you want to avoid locking out users, you can use the TOP clause to limit the transaction size. However, there’s a trick to using this clause: You need to keep track of which rows have already been updated so that you don’t update them a second time.
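To see why tracking matters, consider a batched UPDATE that uses TOP alone. (This sketch uses illustrative names, not code from the article; SomeTable and SomeColumn stand in for your own objects.)

```sql
-- Illustrative only: without recording which rows have been processed,
-- each run can revisit rows that earlier batches already changed,
-- taking locks and writing log records for no benefit.
UPDATE TOP (1000) SomeTable
SET SomeColumn = 'something';
```

Nothing in this statement marks a row as done, so the batches never reliably make progress through the table.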
The OUTPUT clause in SQL Server 2005 and later provides a way to expose which rows are affected by a Data Manipulation Language (DML) statement. In SQL Server 2005, you can use the OUTPUT clause with the UPDATE, INSERT, and DELETE statements. In SQL Server 2008, you can also use the OUTPUT clause with MERGE statements. You declare a table variable to hold the primary key values of the table being updated, then use the OUTPUT clause to capture the primary key values of the rows each batch updates. Listing 1 shows what this code looks like.
For the WHILE loop to start, the @@ROWCOUNT function must return a value greater than 0. As callout A in Listing 1 shows, you can trick the WHILE loop into initially executing by including a meaningless SET statement right before the WHILE condition. After that, the WHILE condition is dependent on the UPDATE statement’s row count. The WHERE clauses in callout B prevent the same row from being updated twice.
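Listing 1 isn't reproduced here, but the pattern it describes can be sketched as follows. All table, column, and variable names below are illustrative assumptions, not the original listing's code.

```sql
-- Table variable to record the keys of rows already updated.
-- The PRIMARY KEY declaration gives it a clustered unique index.
DECLARE @Updated TABLE (ID int PRIMARY KEY);

-- Meaningless assignment: a simple SET sets @@ROWCOUNT to 1,
-- which lets the WHILE loop execute its first iteration (callout A).
DECLARE @Dummy int;
SET @Dummy = 1;

WHILE @@ROWCOUNT > 0
BEGIN
    UPDATE TOP (1000) SomeTable
    SET SomeColumn = 'something'
    -- Capture the keys of the rows this batch just updated.
    OUTPUT inserted.ID INTO @Updated (ID)
    -- Skip rows recorded in earlier batches (callout B).
    WHERE ID NOT IN (SELECT ID FROM @Updated);
END;
```

When no unprocessed rows remain, the UPDATE affects zero rows, @@ROWCOUNT returns 0, and the loop ends.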
The declared table has a clustered unique index (by virtue of the primary key declaration), so the UPDATE operation's performance should be acceptable. You can tune the process by changing the TOP value: the larger the value, the sooner the overall update finishes. The tradeoff is that each batch locks more rows for a longer period of time.
—Lawrence Rogers, senior consultant, Daugherty Business Solutions