New tricks for an old tool
When a newfeature is introduced in SQL Server, it usually takes some time before users realize certain aspects of working with the feature. In time, people develop best practices for using the feature and discover interesting, nontrivial things about it. In this article, I discuss some tips concerning the MERGE statement that I discovered after several years of working with the statement. I cover four main tips: preventing MERGE conflicts, understanding that the MERGE ON clause isn't a filter, realizing that the MERGE USING clause is like the FROM clause in a SELECT statement, and, finally, referring to source table elements in the OUTPUT clause.
For the purposes of this article, I use a sample database called TSQL2012. You can download the source code to create the database and populate it with sample data. Some of my examples use the Sales.Customers table as the source table for the MERGE statement. All of my examples use a table called Sales.MyCustomers as the target table. Use the code in Listing 1 to create the table in the TSQL2012 database.
Preventing MERGE Conflicts
One of the classic tasks that's handled with the MERGE statement is update where exists and insert where not exists. Some people refer to this task as upsert. For example, suppose you have a stored procedure that accepts as input parameters the attributes of a customer: @custid, @companyname, @country, @phone. The procedure is supposed to check whether the customer ID already exists in the target table Sales.MyCustomers; if it does exist, update the target row, overwriting the nonkey columns with the new values, and if it doesn't exist, add a new row. This task can be accomplished with the following MERGE statement:
USING ( VALUES( @custid, @companyname, @country, @phone ) )
AS SRC( custid, companyname, country, phone )
ON SRC.custid = TGT.custid
WHEN MATCHED THEN UPDATE
SET TGT.companyname = SRC.companyname,
TGT.country = SRC.country,
TGT.phone = SRC.phone
WHEN NOT MATCHED THEN INSERT
VALUES( SRC.custid, SRC.companyname, SRC.country, SRC.phone );
The problem is that under the default isolation level, read committed, conflicts can occur if two processes execute the procedure at the same time with the same customer ID as input. If the input customer ID doesn't exist in the target table, the MERGE predicate in both cases will return a false, activating in both cases the WHEN NOT MATCHED action: INSERT. In such a case, one of the inserts will result in a primary key violation. To prove this, run the code in Listing 2 from two sessions.
The code uses an infinite loop that invokes the aforementioned MERGE statement. It uses the expression CHECKSUM(SYSDATETIME()) to generate the customer ID. There's a high likelihood that the resulting value doesn't already exist as a customer ID in the target table and that after some iterations both sessions will produce the same value. When I ran this code from two sessions, I got a conflict after a few seconds, and one of the sessions generated the error message in Figure 1.
Note that this isn't considered a bug. The atomicity of the MERGE transaction isn't violated -- it's still an all-or-nothing behavior. According to the read committed isolation level, such conflicts aren't supposed to be prevented. If you want to serialize access to the object throughout the transaction (both the check whether the customer exists and the insertion), you need to use the serializable isolation level. You can achieve this by either setting the session's isolation level to serializable (SET TRANSACTION ISOLATION LEVEL SERIALIZABLE), or by specifying the table hint HOLDLOCK or SERIALIZABLE (both have the same meaning) against the target table.
To test using the serializable isolation level alter the MERGE statement in Listing 2, replace the first line in the statement with the following:
MERGE INTO Sales.MyCustomers WITH (HOLDLOCK) AS TGT
Run the code from two sessions, and this time conflicts shouldn't occur.
The MERGE ON Clause Isn't a Filter
The MERGE statement uses an ON clause to define the MERGE predicate. However, this clause has a different meaning than the ON clause in a join -- which some people don't realize. In a join, the ON clause is used for both matching and filtering purposes. In a MERGE statement, the ON clause is used to identify matches and nonmatches and to accordingly determine which WHEN clause to activate. If you don't realize this, and you think of ON as a filter, you can end up writing MERGE statements with bugs. I'll demonstrate this problem with an example, and I'll provide a solution.
In my example, I'll use the Sales.Customers table as the source for the MERGE statement and the Sales.MyCustomers table as the target. First, run the following code to populate the Sales.MyCustomers table with some rows representing customers from Sweden and Italy:
INSERT INTO Sales.MyCustomers(custid, companyname, country, phone)
SELECT custid, companyname, country, phone
WHERE country IN (N'Sweden', N'Italy');