Some data modification tasks involve merging source data into a target table. Before Microsoft SQL Server 2008 you had to carry out such tasks with multiple modification statements, requiring you to access both the source and the target multiple times, and causing complexities in terms of handling the modifications as an atomic unit. Microsoft SQL Server 2008 adds the standard MERGE statement to simplify and optimize these tasks.
Some data modification tasks involve merging source data into a target table. Such tasks are common in both OLTP and data warehouse environments. A simple example from an OLTP environment is modifying a target Customers table with more recent customer information from a source file. You’d need to overwrite the attributes of existing target customers with the newer information, as well as insert into the target any customers in the source that don’t already exist in the target. Examples from a data warehouse environment include handling slowly changing dimensions and merging fact additions and changes into a target fact table as part of your daily extraction, transformation, and loading (ETL) process.
Before SQL Server 2008 you had to carry out such tasks with multiple modification statements, requiring you to access both the source and the target multiple times, and causing complexities in terms of handling the modifications as an atomic unit. SQL Server 2008 adds the standard MERGE statement to simplify and optimize these tasks.
I’ll use a simple example to help familiarize you with the basic syntax and concepts of the MERGE statement. Suppose you have a target Customers table in your OLTP environment, and you want to merge more recent customer info from a staging table called CustomersStage into the Customers table. Run the code in Web Listing 1 above to create the Customers and CustomersStage tables and populate them with sample data. This code creates the tables in the tempdb database, for demonstration purposes only. (Under normal circumstances, these tables are in a user database.)
Table 1 and Table 2 show the current contents of the Customers and CustomersStage tables. I’ll refer to the Customers table as the target and the CustomersStage table as the source. Updating the Customers table with info from the CustomersStage table involves two actions:
- Identifying customers that appear in both the source and the target (based on a match between the source and target custid values), then updating target customers by overwriting all their attributes (except custid values) with the corresponding source customer attributes.
- Identifying customers in the source that don’t exist in the target (based on the same condition), and inserting those customer rows into the target.
To achieve this task in SQL Server versions prior to 2008, you’d probably use an UPDATE statement based on an INNER JOIN followed by an INSERT statement with a NOT EXISTS predicate or an OUTER JOIN filtering only outer rows. Such a solution would require you to access the tables twice. Also, such a solution would complicate things in terms of handling the operation as an atomic one—more specifically, in avoiding conflicts with other INSERT operations to the target that take place between your UPDATE and INSERT operations.
In SQL Server 2008 you can use the new standard MERGE statement, as the code in Listing 1 shows, to achieve this task simply and efficiently. The semantics of the MERGE statement are very similar to those of an OUTER JOIN, with sections defining the target, the source, and the action to take when a match occurs or doesn’t occur.
The MERGE INTO clause defines the target table (e.g., Customers) and optionally assigns it an alias (e.g., TGT). The USING clause defines the source table (e.g., CustomersStage) and optionally assigns it an alias (e.g., SRC). The ON clause defines the MERGE condition.
In the WHEN MATCHED THEN clause, you define an action to take against the target when the source row has a matching target row based on the ON predicate. In our example, the action is UPDATE— that is, update existing target customers (2, 3, and 5) with the more recent customer info from the source. Notice the syntax of the UPDATE statement is very similar to that of a plain UPDATE statement, with one difference: You omit the name of the target table because you already defined it in the MERGE INTO clause. You can think of the action in this clause as single row operation—one UPDATE per each source row that finds a match in the target based on the ON clause. However, in practice, the MERGE statement is applied as a single set-based operation for all rows.
In the WHEN NOT MATCHED THEN clause, you define an action to take against the target when the source row doesn’t find a matching target row based on the ON predicate. Going back to JOIN semantics, you can think of the operation as source LEFT OUTER JOIN target. The WHEN MATCHED THEN clause defines the action to take in case of matches—namely, with inner rows; the WHEN NOT MATCHED THEN clause defines the action to take in case of non-matches— namely, with outer rows. In our example, the action is INSERT—that is, insert into the target source customer rows that don’t exist in the target (6 and 7). The statement is similar to a plain INSERT statement, with one difference: Similar to the WHEN MATCHED THEN clause, you don’t need to specify the target table name because you already defined it in the MERGE INTO clause.
In order to always start with the same sample data, run the MERGE statement in each example in the context of an explicit transaction, then roll it back when finished (after querying the Customers table to see the changes). For example, run MERGE statements as follows:
BEGIN TRAN; MERGE ...; SELECT * FROM dbo.Customers;
To see the changes that occur, you can compare the output of this code with the contents of the Customers table before the change (which Table 1 shows).
Now, run the full MERGE statement from Listing 1, within the context of a transaction. Table 3 shows the updated contents of the Customers table. The attributes of customers 2, 3, and 5 were updated because those source customers existed in the target. Rows for source customers 6 and 7 were inserted because they existed in the source but not in the target.
OUTPUT Clause and $action Function
If you want your MERGE statement to output the rows that are modified, you can add the OUTPUT clause at the end of the statement. The semantics of the OUTPUT clause when used with the MERGE statement are similar to those when used with INSERT, UPDATE, and DELETE statements. Namely, in the OUTPUT clause you can refer to deleted. | *> and inserted. | *>. When a row is deleted by a DELETE action, deleted attributes in the output clause return attributes from the deleted row, and inserted attributes return NULLs. When a row is inserted by an INSERT action, deleted attributes in the output clause return NULLs, and inserted attributes return attributes from the inserted row. When a row is updated by an UPDATE action, deleted attributes in the output clause return attributes from the old image of the row (before the change), and inserted attributes return attributes from the new image of the row (after the change). You can also refer to the $action function in the OUTPUT clause to return a string representing the action that modified the row (i.e., INSERT, UPDATE, DELETE).
To illustrate using the OUTPUT clause and the $action function, edit the code in Listing 1 by deleting the final semicolon and adding the following text:
OUTPUT $action, deleted.*, inserted.*;
Figure 1 shows the output from running this modified MERGE statement. Note that three rows were updated (customers 2, 3, and 5). Attributes from the deleted table (i.e., the second through fifth columns in Figure 1) show the old image of the rows, and attributes from the inserted table (i.e., the sixth through ninth columns in Figure 1) show the new image. Two rows were inserted (customers 6 and 7). Attributes from the deleted table show NULLs, and attributes from the inserted table show the values in those new rows.
Ready, Set, MERGE
I’ve covered the fundamentals of the MERGE statement. For additional details, including some of MERGE’s more advanced aspects, as well as its restrictions, see the Web-exclusive sidebar “Beyond MERGE’s Basics.”
The new MERGE statement is undoubtedly powerful and elegant. Although my examples illustrate using the MERGE statement in an OLTP environment, you can easily adopt these examples in a data warehouse environment. You’ll probably find MERGE to be useful in both types of environment. In “Grouping Sets, Part 1” (October 2007) and “Grouping Sets, Part 2” (November 2007), I discussed SQL Server 2008’s new features related to grouping sets, and I provided example of using the new MERGE statement to handle incremental updates of aggregates stored in a data warehouse.