I'm responsible for some old T-SQL code, and I've noticed that a few queries return incorrect result sets. The queries I'm having trouble with all use outer joins and the non-ANSI syntax for expressing them, which was required in SQL Server 6.5—before ANSI-compliant outer-join syntax was added. For example, the following query against the Northwind sample database returns incorrect results:
- SELECT Customers.CustomerId
- FROM customers, orders
- WHERE customers.Customerid *= orders.CustomerId
- AND orders.CustomerId IS NULL
- ORDER BY Orders.CustomerId
I need to see a list of customers who don't have any orders, but this query returns all 91 customers, even though most of them have a record in the Orders table. How can I get the correct result set?
SQL Server lets you write an outer join two different ways. You can use the old syntax, which is proprietary to SQL Server, or you can use the new ANSI-compliant syntax for expressing the join. The old syntax places the JOIN expression in the WHERE clause and uses an asterisk (*) to indicate which table is the preserved table in the outer-join relationship. The new syntax places the JOIN expression in the query's FROM clause and explicitly uses the OUTER keyword, as the following query shows:
SELECT Customers.CustomerId FROM customers LEFT OUTER JOIN orders ON customers.CustomerId = orders.CustomerId WHERE orders.CustomerId IS NULL
You're probably familiar with how syntax varies depending on whether the JOIN expression is in the WHERE clause or the FROM clause. In most cases, each query will produce an identical, and correct, result set. However, you might not be aware of problems that can occur when you combine outer joins with an IS NULL check in the WHERE clause.
Let's examine the first query to understand why SQL Server returns an incorrect result set. Your goal in this query is to join the Customers table to the Orders table, then retrieve all records where the value for CustomerId in the Orders table is NULL. However, instead of getting the correct result of two rows, you get 91. You get the incorrect result because the OUTER JOIN condition is in the WHERE clause instead of the FROM clause, causing SQL Server to perform the IS NULL check against orders.CustomerId before the join. To get the answer you want, you need to test for IS NULL after the join. But if you use the old join syntax, SQL Server evaluates the test before it processes the join. Because no rows in the Orders table have a NULL value for CustomerId, SQL Server eliminates all the Orders table's rows before it processes the outer join. So when SQL Server processes the join, it doesn't see any rows in the Orders table and, thus, incorrectly returns all customers as having no orders.
The ANSI syntax for expressing a join removes the ambiguity about when the IS NULL check should happen. When you use the ANSI syntax, SQL Server evaluates JOIN expressions before WHERE conditions in all cases. I encourage you to always place JOIN conditions in the FROM clause instead of in the WHERE clause, especially when using outer joins, so that you don't have this kind of evaluation-order problem, which can lead to wrong answers.