Think differently about set relationships and member properties
Downloads |
---|
38812.zip |
Last month, in "Set Members and Relationships" (InstantDoc ID 38515), I presented a group of problems that dealt with identifying sets that have certain relationships with other sets. I mentioned that in relational algebra, the problems I discussed involve relational division. Using a common Orders-OrderDetails scenario, I showed how to find existing orders that had the same products as a new order entered into the system and other variations of the problem. I use these same scenarios again in this article's examples.
You'll need to first run the code I provided in last month's Listing 1 to create the Orders and OrderDetails tables and populate them with sample data. (You can download the code at InstantDoc ID 38515.) Orders contains one row for each order, and OrderDetails contains zero or more rows for each order, each with a different product. You also need to run the following code to create and populate the #ProdList table, which represents a new order containing a list of products:
CREATE TABLE #ProdList(productid int NOT NULL PRIMARY KEY) INSERT INTO #ProdList VALUES(2) INSERT INTO #ProdList VALUES(3) INSERT INTO #ProdList VALUES(4)
Last month, I used set terminology to present five variations of a problem in which you had to find orders that had certain relationships with #ProdList. I used the letter O to represent the set of products making up an order and the letter P to represent the set of products in #ProdList. The five problems were:
- P is a subset of O: Find the orders that contain all products that appear in #ProdList and possibly other products as well.
- P is equal to O: Find the orders that contain all of and only the products that appear in #ProdList.
- P is a proper subset of O: Find the orders that contain all products that appear in #ProdList and at least one other product as well.
- O is a subset of P: Find the orders for which all products appear in #ProdList, and return those even if #ProdList contains other products.
- O is a proper subset of P: Find the orders for which all products appear in #ProdList, and return those only if #ProdList contains at least one other product.
Last month's solutions used joins and aggregate functions. This time, let's use only the EXISTS() predicate and correlated subqueries, in which the subqueries refer to columns of an outer query. You might not find any performance improvements in this month's solutions, but they require "outside-the-box" thinking that might prove handy one day.
I also present some problems in which you need to identify sets that meet certain collective-member criteria such as finding orders that contain products 1 and 3 or products 2 and 4. To untangle such problems, you need to examine data that's spread across different table rows.
Without further ado, let's start with the first problem—remember, no aggregations! As a hint, a big part of solving a problem is how you phrase the request.
Task 1: P is a subset of O. Instead of phrasing the problem as "find the orders that contain all products that appear in #ProdList and possibly other products as well," think of it as "find the orders for which no product exists in #ProdList that doesn't exist in the order's details." The tough part here is to train yourself to think this way—to fully understand such a sentence with its subtle possible variations. Once you do, the translation to T-SQL is simple, as Listing 1 shows. Orders A and B meet the criteria because both contain all products from #ProdList.
Task 2: P is equal to O. The second problem, finding the orders containing all of and only the products that appear in #ProdList, can be phrased as "find the orders for which no product exists in #ProdList that doesn't exist in the order's details and for which no product exists in the order's details that doesn't exist in #ProdList." The translation to T-SQL requires you to write two sets of subqueries, each using the NOT EXISTS() predicate twice, as Listing 2 shows. Only order B meets the criteria.
Task 3: P is a proper subset of O. The third task is to find orders that contain all products that appear in #ProdList plus at least one other product. As in the previous solutions, you can rephrase the problem as "find the orders for which no product exists in #ProdList that doesn't exist in the order's details and for which one of the order's details contains a product that doesn't exist in #ProdList." Here you also write two sets of correlated subqueries: the first using NOT EXISTS() twice and the second using EXISTS() and then NOT EXISTS(), as Listing 3 shows. Only order A meets the criteria.
Task 4: O is a subset of P. Reversing the roles of P and O from the first problem, you're now looking for orders for which all products appear in #ProdList, regardless of whether #ProdList contains other products. You can think of this task as "find the orders for which no product exists in the order's details that doesn't exist in #ProdList." As Listing 4 shows, you write a similar query to the one in the first solution, reversing the correlated subqueries.
Note that you get orders B, C, D, and E in the result. Last month's solution to the fourth problem returned orders B, C, and D but not E. Order E is special in the sense that it's an empty order—that is, it has no related order details. Order E shows up in this case because the phrasings of the original problem and the new one aren't exactly the same. To fix this disparity, you need to also ensure that at least one order-detail row exists for the order, as Listing 5 shows. This time, the query doesn't return order E. Taking it a step further, if #ProdList were empty, several of these solutions would yield different results than last month's solutions to the same problems. Therefore, you should specify that you assume that #ProdList isn't empty or add the appropriate filters to cover such a scenario.
Task 5: O is a proper subset of P. The fifth problem involves finding the orders for which all products appear in #ProdList and returning them only if #ProdList contains at least one other product. Being careful, you can phrase the problem as "find the orders that have at least one order detail for which no product exists in the order's details that doesn't exist in #ProdList and for which at least one product exists in #ProdList that doesn't exist in the order's details." This phrase leads to the query that Listing 6 shows. As before, orders C and D meet the criteria.
Now that you've practiced "thinking differently," the next type of problem should be a piece of cake for you. When writing filters in T-SQL, you usually combine several simple logical expressions, examining different attributes from the same row with a combination of the AND, OR, and NOT operators. Less common are problems in which you have to combine logical expressions, examining different attributes from different rows within groups of rows. For example, you might need to write a query that returns orders containing products 2, 3, and 4.
You can solve this query in several ways, one of which is by using relational division. However, try thinking generally of combining logical expressions and examining different attributes from different rows, each having the same group ID (OrderID in this case). The trick here is to use a crosstab query, in which you "pivot," or rotate, rows, making them into columns. In this case, you need to write a query that generates, for each order, a result column for each product of interest containing 1 if such a product exists in the order's details and 0 if it doesn't, as callout A in Listing 7 shows. Figure 1 shows the results of this query.
After you have information about the existence of the different products of interest in a certain row, your task is easy—you write the query that Listing 7 shows. You can write a similar query, like the one Listing 8 shows, to return orders containing products 1 and 3 or 2 and 4. Or you can use the query that Listing 9 shows to retrieve orders containing products 3 and 4 but not 2.
In this month's example problems, the data wasn't convenient for querying, so you had to use tricks to convert the data to an easily queried form. Keep these tricks and techniques in mind; you never know when they might come in handy.