Immediately access just-modified data for further processing
Have you ever needed to modify data and then—in the same transaction—access that just-modified data for further processing? For example, suppose you need to load multiple rows into an Orders table with one INSERT SELECT statement. An identity property automatically generates the order IDs. You need to access the just-loaded rows to get the newly generated order IDs, then use those IDs to generate order details. In SQL Server 2000, such a task is far from simple. SQL Server 2005, however, lets you specify an OUTPUT clause into INSERT, UPDATE, and DELETE statements so that the query returns the modified data. In this article, I describe how to use the new OUTPUT clause for modifications and I provide several practical scenarios in which the clause can be handy.
New OUTPUT Clause
You can specify the new OUTPUT clause in INSERT, UPDATE, and DELETE statements to obtain results such as the IDs I mentioned. Following the OUTPUT clause, you specify the attribute list that you want to return from the inserted and/or deleted table, much like you do in triggers. In the OUTPUT clause, deleted and inserted represent the old and new images of the rows, respectively. For example, if you want to delete rows from an Orders table and return the deleted rows, your code would look like
DELETE FROM Orders OUTPUT deleted.* WHERE ...
You can specify a list of attributes that you need to return (e.g., deleted.orderid, deleted.custid) rather than all the columns. In DELETE statements, you have access only to deleted, in INSERT statements only to inserted, and in UPDATE statements, you have access to both. You can return the output to the caller (client application), as in the above example. By using the INTO clause, you can also direct the output to a table variable, a temporary table, or a real table, as follows:
DELETE FROM Orders OUTPUT deleted.* INTO @DeletedOrders WHERE ...
Note that the target table must exist.
Those are the basics of working with the new OUTPUT clause. For the complete documentation and syntax, please consult SQL Server 2005 Books Online (BOL). In the next section, I cover several examples that show practical uses of DML with results.
The first scenario is, as I mentioned, returning identity values from a multi-row INSERT statement. Run the code in Listing 1 to create the Orders table, which I use in this example. You first need to use an INSERT SELECT statement to load multiple rows into the Orders table. Let's say the following code represents the multi-row INSERT SELECT statement you want to run (don’t run it yet):
INSERT INTO dbo.Orders(custid, orderdate) SELECT 'A' AS custid, '20030101' AS orderdate UNION ALL SELECT 'B', '20030101' UNION ALL SELECT 'A', '20040101' UNION ALL SELECT 'B', '20040101' UNION ALL SELECT 'A', '20050101' UNION ALL SELECT 'B', '20050101';
Your problem is that the identity property attached to the orderid column generates the order IDs, and you need to access the newly generated order IDs to obtain order details. To do so, run the code that Listing 2 shows. This code declares a table variable called @NewOrders, loads the data into the Orders table, and uses the new OUTPUT clause to direct a copy of the new rows to the table variable. You can then access the new order IDs by querying @NewOrders and perform any further processing you need.
Table 1 shows Listing 2's output. Note that in Listing 2's code, there’s no guarantee that rows will be inserted in any specific order. The identity values might get assigned in an order that's different from what you see in Table 1.
Another scenario in which the OUTPUT clause can be handy is archiving deleted data. Suppose you need to delete all orders placed before 2004 and archive those orders in the OrdersArchive table (which you can create by running the code in Listing 3). Using the new OUTPUT clause, you can delete and archive in one step:
DELETE FROM dbo.Orders OUTPUT deleted.orderid, deleted.custid, deleted.orderdate INTO dbo.OrdersArchive(orderid, custid, orderdate) WHERE orderdate < '20040101';
Query the OrdersArchive table after the deletion, and you'll get the output that Table 2 shows, containing all orders made before 2004:
SELECT * FROM dbo.OrdersArchive;
If you want to process deleted orders in the client application, instead of directing a copy to a table, simply omit the INTO clause. For example, to delete orders placed before 2005 and return the data to the caller, run
DELETE FROM dbo.Orders OUTPUT deleted.* WHERE orderdate < '20050101';
The caller will get the output that Table 3 shows.
You can also use the OUTPUT clause in modification statements that join tables, and to return attributes from any table in the join. In my example, I’ll join the existing Orders table to the Customers table (which you can create and populate by running the code in Listing 4).
Suppose you realize that all orders placed by Australian customers have been incorrectly entered with a US EST datetime value in the orderdate. You have to fix the error by incrementing all order datetime values by 14 hours so that they represent the correct time on the Australian eastern coast. You also need to return to the caller the order and customer information, as well as the old and new datetime values. The following UPDATE statement achieves this task and returns the output that Table 4 shows:
UPDATE O SET orderdate = DATEADD(hour, 14, orderdate) OUTPUT inserted.orderid, inserted.custid, C.custname, deleted.orderdate AS olddatetime, inserted.orderdate AS newdatetime FROM dbo.Orders AS O JOIN dbo.Customers AS C ON O.custid = C.custid WHERE C.country = 'Australia';
Notice that the OUTPUT clause refers to deleted and inserted—which represent the old and new images of the modified orders—as well as C, which represents the joined Customers table.
The final scenario I'd like to discuss is event- or message-queue processing. You can develop simple queue-processing applications that use the new OUTPUT clause, along with other T-SQL enhancements. Run the code in Listing 5 to create the MsgQueue table and populate it with sample data. The MsgQueue table represents a queue of messages. Suppose different components of the application periodically load new messages to the table. You need to develop T-SQL code for a component that processes new messages and deletes them from the queue. And you need to develop your code in such a way that you'll be able to run multiple instances of the component to provide load balancing. Here’s the main piece of code that will accommodate multiple processes (that don't conflict with each other) and provide load balancing:
DELETE TOP (1) -- or more than one FROM dbo.MsgQueue WITH(READPAST) OUTPUT deleted.*; -- possibly into table -- Process returned messages here
Each instance of the component deletes one or more messages and returns them to the caller (or to a table) for processing. SQL Server 2005 supports the TOP option for modifications, letting you limit the number of rows to delete. SQL Server 2005 also supports the READPAST option for modifications, letting you skip locked rows instead of getting blocked by other queries.
Simpler and Faster
I suspect that programmers will enthusiastically embrace the ability to generate results from DML. There are so many interesting ways to use this enhancement. For many tasks, code that was once complex and slow can now be rewritten with the OUTPUT clause, thereby reducing complexity and improving performance. I’ve showed four scenarios in which I find the OUTPUT clause extremely useful, but I suspect that I’ve merely scratched the surface.