SQL Server 2008’s new HIERARCHYID data type lets you maintain and query hierarchical data. Learn how to define and index the table that will hold the hierarchy data, how to insert new nodes into the table, and how to query the hierarchy.
SQL Server 2008 introduces the HIERARCHYID data type, which is designed to support storing and manipulating data representing a hierarchy. Examples of hierarchies that you might need to represent in a SQL Server database include employee organizational charts, folder lists, forum discussions, and so on. This article is the first in a two-part series about the new HIERARCHYID data type. This month I’ll introduce you to the new data type, and I’ll explain how to define and index the table that will hold the hierarchy data, how to insert new nodes into the table, and how to query the hierarchy. Next month I’ll explain how to handle changes in the hierarchy (e.g., reparenting nodes) and how to convert a traditional parent-child hierarchy (aka an adjacency list) to a representation with the new HIERARCHYID data type.
Creating and Populating the Hierarchy
The HIERARCHYID data type provides the means to store and manipulate data representing a hierarchy. HIERARCHYID is a system-supplied CLR type with a set of methods that allow manipulation of the type. The methods supported by the type include GetLevel, GetRoot, GetDescendant, ToString, Parse, IsDescendantOf, GetAncestor, GetReparentedValue, Read, and Write. I’ll describe some of the methods this month and some next month.
The type internally holds a binary value as large as 892 bytes that represents the path of ancestors leading to the current node. The value positions the current node among other nodes in the hierarchy—both in terms of parent-child positioning, and in terms of positioning among siblings. For example, if you use the HIERARCHYID data type to represent a hierarchy of employees, the value holds the path of all managers leading to the current node; the value represents the correct manager-subordinate relationships, and even the correct position among direct subordinates of the same manager, in case there’s a need for it.
The HIERARCHYID type provides topological sorting—a term taken from graph theory—meaning that the value of a node is greater than the values of all ancestors of the node and less than the values of all descendants of the node. The fact that the type provides topological sorting means that when sorting by HIERARCHYID values, a child would always appear in the output after the parent.
I’ll use a hierarchy of employees in my examples. Run the code in Web Listing 1 to create the Employees table, as well as indexes on the table to support querying the hierarchy. The hid column in the Employees table (see Web Table 1) is of a HIERARCHYID data type, and it represents the position of the current employee in the hierarchy. Notice the computed persisted column called lvl that is defined by a call to the GetLevel method of the hid column. The GetLevel method returns the level of the node in the hierarchy, starting with 0 for the root, 1 for the direct subordinates of the root, and so on.
Besides creating the Employees table, Web Listing 1 also contains statements that create two indexes to support querying the hierarchy; the first is a unique clustered index created on the hid column and the second is a unique nonclustered index created on the lvl and hid columns. These indexes implement two indexing strategies called depth-first and breadth-first, respectively. The depth-first index organizes all members of a subtree (manager and all subordinates in all levels) close to each other; therefore it will support requests to return a whole subtree. This index will also support requests to sort the hierarchy by the hid column (in topological order). The breadth-first index organizes all members of the same level close to each other; therefore it will support requests for siblings from the same level (e.g., a request for all direct subordinates of a manager). Later in the article, I’ll demonstrate such requests and the way they are optimized.
To insert a new row into the Employees table representing a new employee in the hierarchy, you need to generate a HIERARCHYID value for the hid column. The value should represent the correct position of the employee in the hierarchy. You can use methods of the type to help in this task. For the root of the hierarchy—the big boss or the CEO—you can use the HIERARCHYID::GetRoot static method. This method simply produces a HIERARCHYID value that is internally an empty binary string representing the root of the tree. If the new node you are adding is not the root, you can produce the HIERARCHYID value for the new node by invoking the method GetDescendant on the hid value of the manager of the new node. This method accepts two HIERARCHYID inputs; I’ll call them @lft and @rgt. If both values are not NULL, the method produces a new value positioned between the two. If both are NULL, the method produces a value simply below the manager at hand. If @lft is not NULL and @rgt is NULL, the method produces a value greater than @lft. Finally, if @lft is NULL and @rgt is not NULL, the method produces a value smaller than @rgt.
Run the code in Web Listing 2 to create the usp_ AddEmp procedure that is in charge of adding a new node into the hierarchy. If the new employee is the root employee, the procedure uses the HIERARCHYID:: GetRoot static method to produce the HIERARCHYID value for the new employee. If the new employee is not the root, the procedure first obtains the hid value of the new employee’s manager, as well as the maximum hid value among the direct subordinates of that manager. The procedure then uses the GetDescendant method to produce the hid value of the new employee as the last node under the given manager.
To populate the Employees table with sample data, run the code in Web Listing 3. As long as the hierarchy doesn’t contain a large number of levels, the HIERARCHYID values are quite economical.
The HIERARCHYID data type also exposes a method called ToString that produces a canonical string representation of the hierarchical value as a path using a slash as a separator between levels. Unlike the raw binary representation of a HIERARCHYID value, the string representation produced by ToString is more meaningful to the naked eye. For example, the ToString method returns the string ‘/2/1/1/3/’ for the HIERARCHYID value 0x6AD6F0. You can easily see in the string representation of the value that the node is positioned at level 4, and that it’s the child of the parent node with the path ‘/2/1/1/’. To convert a given canonical string representation of a hierarchical value to HIERARCHYID, you can either use the static method HIERARCHYID::Parse, or simply use the CAST or CONVERT functions. For example, the expression CAST('/2/1/1/3/' AS HIERARCHYID) returns the HIERARCHYID value 0x6AD6F0.
To return the current contents of the Employees table, along with the logical string representations of the hid values, run the following query:
SELECT hid, hid.ToString() AS path, lvl, empid, empname, salary FROM dbo.Employees ORDER BY hid;
This query returns the output shown in Web Table 1.
Querying the Hierarchy
Now that I’ve covered how to create and populate the hierarchy, I’ll explain how to query the hierarchy to answer common requests such as sorting the hierarchy, and returning a subtree, a path, direct subordinates, and so on.
Remember that earlier I mentioned that the HIERARCHYID type provides topological sorting. So if you want to sort the employees such that a subordinate will be returned after a manager, you can simply sort by the hid column. You can use the lvl column to produce indentation, replicate some string lvl times, and concatenate the employee name. The following query demonstrates both sorting topologically and producing indentation:
SELECT empid, REPLICATE(‘ | ‘, lvl) + empname AS emp FROM dbo.Employees ORDER BY hid;
Web Table 2 shows the output of this query, and Web Figure 1 shows the execution plan. Notice in the plan that the depth-first index created on the hid column is used here efficiently, preventing the need for a sort operation.
To return a manager and all his or her subordinates in all levels, you can use the IsDescendantOf method. This method is invoked for a given HIERARCHYID value, and accepts a HIERARCHYID value as input. This method returns 1 if the input employee is a descendant of the given employee, and 0 if it isn’t. As an example, the following query returns Ina (employee 3) and all of her subordinates in all levels:
SELECT C.empid, C.empname, C.lvl FROM dbo.Employees AS P JOIN dbo.Employees AS C ON P.empid = 3 AND C.hid.IsDescendantOf(P.hid) = 1;
The query joins two instances of the Employees table, one called P, representing the parent/manager, and the other called C, representing children/subordinates. The ON clause filters only the row for employee 3 from the P instance, and using the IsDescendantOf method, filters all subordinates of employee 3 from the C instance. Web Table 3 shows the output of this query, and Web Figure 2 shows the execution plan.
Notice in the plan that the depth-first index created on the hid column is used here efficiently, scanning the consecutive range of rows at the leaf with Ina (employee 3) and all her subordinates in all levels. The range appears in the plan as >= P.hid and <= Expr1004. If you inspect the Compute Scalar operator that calculates Expr1004, you will find that it represents P.hid.DescendantLimit(), which is the maximum possible value under P.hid.
You can limit the number of levels of subordinates to return below the given manager by filtering the level difference between the subordinate and manager. For example, the following query returns Ina and all of her subordinates up to two levels below her:
SELECT C.empid, C.empname, C.lvl FROM dbo.Employees AS P JOIN dbo.Employees AS C ON P.empid = 3 AND C.hid.IsDescendantOf(P.hid) = 1 WHERE C.lvl - P.lvl <= 2;
To return all managers of a given employee in all levels, you need to make minor revisions to the query that returns a manager and all subordinates in all levels. Simply filter only the given employee from the C instance (subordinates) instead of the P instance (managers), and return the attributes from the P instance. For example, this query returns Didi (employee 14) and all her managers in all levels:
SELECT P.empid, P.empname, P.lvl FROM dbo.Employees AS P JOIN dbo.Employees AS C ON C.empid = 14 AND C.hid.IsDescendantOf(P.hid) = 1;
The output of this query (i.e., Didi and her managers) is shown in Web Table 4.
If you want to limit the number of levels to return, simply add a filter on the level difference between the subordinate and the manager. For example, the following query returns Didi and two levels of managers above Didi:
SELECT P.empid, P.empname, P.lvl FROM dbo.Employees AS P JOIN dbo.Employees AS C ON C.empid = 14 AND C.hid.IsDescendantOf(P.hid) = 1 WHERE C.lvl - P.lvl <= 2;
To get direct subordinates of a given manager, you will find the GetAncestor method useful. This method operates on a HIERARCHYID value (call it v), and accepts an integer (call it l) as input. This method returns the HIERARCHYID value of the ancestor who is one level above v. As an example, the following query shows how to return all direct subordinates of Eitan (employee 2):
SELECT C.empid, C.empname FROM dbo.Employees AS P JOIN dbo.Employees AS C ON P.empid = 2 AND C.hid.GetAncestor(1) = P.hid;
This query filters only the row for employee 2 from the instance P, and returns all employees from the instance C for whom the manager one level above is employee 2. The output for this query is shown in Web Table 5, and the execution plan is shown in Web Figure 3. Notice in the plan that the breadth-first index created on lvl and hid is used here.
Similarly, if you want to return all subordinates of Eitan, two levels below, simply specify 2 as the input to the GetAncestor function:
SELECT C.empid, C.empname FROM dbo.Employees AS P JOIN dbo.Employees AS C ON P.empid = 2 AND C.hid.GetAncestor(2) = P.hid;
More To Come
You can use SQL Server 2008’s new HIERARCHYID data type to maintain and query hierarchical data. In this article I discussed performance aspects of the data type and explained how different indexing strategies can support different types of requests. Next month I’ll explain how you can reparent nodes, move complete subtrees, and convert a traditional parent-child representation of a hierarchy to one that uses the new HIERARCHYID data type.