Common table expressions (CTEs) were introduced in SQL Server 2005. I covered both nonrecursive and recursive CTEs almost a decade ago in "Get in the Loop with CTEs."
As a reminder, a recursive CTE has at minimum one anchor member and one recursive member, using the following form:
The anchor member is a query without a recursive reference to the CTE name; it gets invoked only once. The recursive member is a query with a recursive reference to the CTE name; it gets invoked repeatedly until it returns an empty set. The reference to the CTE name in the recursive member represents the previous result set. The outer reference to the CTE name represents the unified results of the single invocation of the anchor member and the multiple invocations of the recursive member. For more details about the fundamentals of recursive queries and examples, see "Get in the Loop with CTEs."
I've written a lot of recursive CTEs since their inception in SQL Server. But recently it occurred to me that all the CTEs I'd written had a single recursive member. Knowing that CTEs support multiple recursive members, I went in search of such examples. I found several interesting use cases that I'll share with you in this article. I'll cover two examples: genealogy and the nested set model for representing graphs. Next month, I'll provide another example.
Genealogy-related queries can be implemented using CTEs with multiple recursive members. For sample data, I'll use part of Bilbo Baggins' family tree. Run the code in Listing 1 to create the table FamilyTree and fill it with sample data. (For a more complete family tree for Bilbo, see en.wikipedia.org/wiki/Bilbo_Baggins#Family_tree.)
Your task is to implement a solution that returns all ancestors of a given family member. For example, given as input @id = 16 (ID of Frodo Baggins), return the ID, name, and level (distance from input) of all ancestors of the input member. Figure 1 shows the desired output for Frodo.
You can implement a solution for this task by using a CTE with two anchor members and two recursive members. Remember that anchor members get invoked only once. One anchor member returns the mother of the input and another anchor member returns the father of the input. Remember that recursive members get invoked repeatedly until they return an empty set. The reference to the CTE name in the recursive member represents the previous result set. So you use one recursive member to return mothers of family members from the previous round and another recursive member to return fathers of family members from the previous round. In pseudo code, the solution CTE looks like this:
anchor member returning mother of input family member
anchor member returning father of input family member
recursive member returning mothers of family members from
recursive member returning fathers of family members from
Note that it's possible to implement a solution to this task using a CTE with a single anchor member and a single recursive member. However, such a solution is less natural and more complex. Listing 3 contains an example of such a solution. This solution uses a trick that prevents the necessity of two anchor members (for mother and father) and two recursive members (for mothers and fathers). Both anchor and recursive queries perform a cross join between the source data and a derived table that's made of two rows -- one representing the father (parent = 'F') and one representing the mother (parent = 'M'). Then, a CROSS APPLY operator uses a CASE expression to determine which column to return, based on the parental role; when parent = 'F' the CASE expression returns the father, and when parent = 'M' it returns the mother.
Again, this solution shows that it's possible to complete the task using a CTE with one anchor member and one recursive member. However, it's simpler and more natural to do so with multiple anchors and multiple recursive members.
The nested set is a model for representing graphs that's based on graph theory. It works well in certain scenarios -- mainly, for the special case of a graph called tree. A tree is a graph that's directed, acyclic, and where only one path leads to each node.
In the nested set model you assign two integer values called left and right to each node, such that the left value of the node is greater than the left value of its parent node and the right value of the node is less than the right value of its parent node. You rely on the left and right values to address common requests against graphs. For example, to return all descendants of a given node you use a query that returns all nodes with a left value that's greater than the left value of the input node and with a right value that's less than the right value of the input node. To return all ancestors of a given node, you use a query that returns all nodes with a left value that's less than the left value of the input node and with a right value that's greater than the right value of the input node. (For more details about the nested set model, see en.wikipedia.org/wiki/Nested_set_model.)
In this article, I demonstrate how to compute the left and right values efficiently by using a CTE with multiple recursive members. As sample data, I use a table called Employees. You can use the code in Listing 4 to create and populate the Employees table.
As you can see, the Employees table represents the tree of employees as an adjacency list. The empid and mgrid columns represent the child and parent IDs of the nodes in the tree, respectively. The task at hand is to query the Employees table and compute the left and right values for each node. Figure 2 is a graphical depiction of the tree of employees, including the left and right values that need to be computed.
In terms of order among siblings, in this example the order is based on empname and empid as a tiebreaker. Therefore, for example, Seraph appears to the right of Jiru, and Steve appears to the right of Seraph.
Figure 3 shows another graphical depiction of the nested set representation of the employees. Each node spreads two arms around the contained nodes. After arranging the nodes in order based on the containment relationship (with siblings ordered by empname), you assign row numbers to the arms in sequential order from left to right. The row numbers you assign will become the left and right values of each node. This idea is the basis for computing the left and right values using a CTE with multiple recursive members.
Using pseudo code, Listing 5 shows how you compute the left and right values. Listing 6 contains the T-SQL solution code that implements this pseudo code. The solution first defines a CTE called EmpsRN. The inner query uses the ROW_NUMBER function to compute numbers (call the column n), starting with 1 and incrementing by 2 (n = 1, 3, 5, . . . ). These numbers will be used to represent the left arms of the nodes below their respective parent. The gaps will be filled by the right arms, which will be computed as n + 1.
The key to computing the left and right values is constructing the binary sortpath values. The CTE C1 generates two rows for each node representing the left and right arms. It starts with anchor members that return the left and right arms of the root node. The anchor members assign the sort path 0x01 to the left arm and the sort path 0x02 to the right arm. Then the recursive members join the left arms of the members from the previous round to the EmpsRN CTE to return children. One recursive member generates left arms for the children, and the other recursive member generates right arms. The recursive member handling the left arms concatenates the sortpath value of the parent's left arm with n (converted to binary). The recursive member handling the right arms concatenates the sortpath value of the parent's left arm with n + 1 (converted to binary). Note that in this solution I used one byte in sortpath for each node, assuming there are no more than 255 direct subordinates to each manager. If there can be more, you should use more bytes accordingly.
Next, the query defining the CTE C2 queries C1 and computes row numbers based on sortpath ordering, naming the result column sortval. The values in the sortval column will eventually become the left and right values of the nodes. The values in the sortval column correspond to the numbers assigned to the arms in Figure 3. Figure 4 shows the result of the query defining C2, with lines added to connect the left and right arms of each node.
Finally, the outer query queries the CTE C2, groups the rows by empid, and returns the minimum sortval and maximum sortval as the left and right values, respectively. The code in Listing 6 generates the desired output shown in Figure 5.
Two Examples Down, One to Go
In this article, I covered two examples of CTEs with multiple recursive members: one for genealogy-related queries and one to compute left and right values for the nested set model for graphs. Next month, I'll continue with this topic and provide another example.
Also, see my article "Cycling with CTEs."