SideBar    The Logical Puzzle
DOWNLOAD THE CODE:
Download the Code 99036.zip

Executive Summary: SQL Server 2008’s new HIERARCHYID data type lets you maintain and query hierarchical data. Learn how to define and index the table that will hold the hierarchy data, how to insert new nodes into the table, and how to query the hierarchy.

SQL Server 2008 introduces the HIERARCHYID data type, which is designed to support storing and manipulating data representing a hierarchy. Examples of hierarchies that you might need to represent in a SQL Server database include employee organizational charts, folder lists, forum discussions, and so on. This article is the first in a two-part series about the new HIERARCHYID data type. This month I’ll introduce you to the new data type, and I’ll explain how to define and index the table that will hold the hierarchy data, how to insert new nodes into the table, and how to query the hierarchy. Next month I’ll explain how to handle changes in the hierarchy (e.g., reparenting nodes) and how to convert a traditional parent-child hierarchy (aka an adjacency list) to a representation with the new HIERARCHYID data type.

Creating and Populating the Hierarchy
The HIERARCHYID data type provides the means to store and manipulate data representing a hierarchy. HIERARCHYID is a system-supplied CLR type with a set of methods that allow manipulation of the type. The methods supported by the type include GetLevel, GetRoot, GetDescendant, ToString, Parse, IsDescendantOf, GetAncestor, GetReparentedValue, Read, and Write. I’ll describe some of the methods this month and some next month.

The type internally holds a binary value as large as 892 bytes that represents the path of ancestors leading to the current node. The value positions the current node among other nodes in the hierarchy—both in terms of parent-child positioning, and in terms of positioning among siblings. For example, if you use the HIERARCHYID data type to represent a hierarchy of employees, the value holds the path of all managers leading to the current node; the value represents the correct manager-subordinate relationships, and even the correct position among direct subordinates of the same manager, in case there’s a need for it.

The HIERARCHYID type provides topological sorting—a term taken from graph theory—meaning that the value of a node is greater than the values of all ancestors of the node and less than the values of all descendants of the node. The fact that the type provides topological sorting means that when sorting by HIERARCHYID values, a child would always appear in the output after the parent.

I’ll use a hierarchy of employees in my examples. Run the code in Web Listing 1 (www.sqlmag.com, InstantDoc ID 99036) to create the Employees table, as well as indexes on the table to support querying the hierarchy. The hid column in the Employees table (see Web Table 1) is of a HIERARCHYID data type, and it represents the position of the current employee in the hierarchy. Notice the computed persisted column called lvl that is defined by a call to the GetLevel method of the hid column. The GetLevel method returns the level of the node in the hierarchy, starting with 0 for the root, 1 for the direct subordinates of the root, and so on.

Besides creating the Employees table, Web Listing 1 also contains statements that create two indexes to support querying the hierarchy; the first is a unique clustered index created on the hid column and the second is a unique nonclustered index created on the lvl and hid columns. These indexes implement two indexing strategies called depth-first and breadth-first, respectively. The depth-first index organizes all members of a subtree (manager and all subordinates in all levels) close to each other; therefore it will support requests to return a whole subtree. This index will also support requests to sort the hierarchy by the hid column (in topological order). The breadth-first index organizes all members of the same level close to each other; therefore it will support requests for siblings from the same level (e.g., a request for all direct subordinates of a manager). Later in the article, I’ll demonstrate such requests and the way they are optimized.

To insert a new row into the Employees table representing a new employee in the hierarchy, you need to generate a HIERARCHYID value for the hid column. The value should represent the correct position of the employee in the hierarchy. You can use methods of the type to help in this task. For the root of the hierarchy—the big boss or the CEO—you can use the HIERARCHYID::GetRoot static method. This method simply produces a HIERARCHYID value that is internally an empty binary string representing the root of the tree. If the new node you are adding is not the root, you can produce the HIERARCHYID value for the new node by invoking the method GetDescendant on the hid value of the manager of the new node. This method accepts two HIERARCHYID inputs; I’ll call them @lft and @rgt. If both values are not NULL, the method produces a new value positioned between the two. If both are NULL, the method produces a value simply below the manager at hand. If @lft is not NULL and @rgt is NULL, the method produces a value greater than @lft. Finally, if @lft is NULL and @rgt is not NULL, the method produces a value smaller than @rgt.

Run the code in Web Listing 2 to create the usp_ AddEmp procedure that is in charge of adding a new node into the hierarchy. If the new employee is the root employee, the procedure uses the HIERARCHYID:: GetRoot static method to produce the HIERARCHYID value for the new employee. If the new employee is not the root, the procedure first obtains the hid value of the new employee’s manager, as well as the maximum hid value among the direct subordinates of that manager. The procedure then uses the GetDescendant method to produce the hid value of the new employee as the last node under the given manager.

To populate the Employees table with sample data, run the code in Web Listing 3. As long as the hierarchy doesn’t contain a large number of levels, the HIERARCHYID values are quite economical.

The HIERARCHYID data type also exposes a method called ToString that produces a canonical string representation of the hierarchical value as a path using a slash as a separator between levels. Unlike the raw binary representation of a HIERARCHYID value, the string representation produced by ToString is more meaningful to the naked eye. For example, the ToString method returns the string ‘/2/1/1/3/’ for the HIERARCHYID value 0x6AD6F0. You can easily see in the string representation of the value that the node is positioned at level 4, and that it’s the child of the parent node with the path ‘/2/1/1/’. To convert a given canonical string representation of a hierarchical value to HIERARCHYID, you can either use the static method HIERARCHYID::Parse, or simply use the CAST or CONVERT functions. For example, the expression CAST('/2/1/1/3/' AS HIERARCHYID) returns the HIERARCHYID value 0x6AD6F0.

To return the current contents of the Employees table, along with the logical string representations of the hid values, run the following query:

SELECT hid, hid.ToString() AS path, lvl, empid, empname,
  salary
FROM dbo.Employees
ORDER BY hid;
This query returns the output shown in Web Table 1.

Querying the Hierarchy
Now that I’ve covered how to create and populate the hierarchy, I’ll explain how to query the hierarchy to answer common requests such as sorting the hierarchy, and returning a subtree, a path, direct subordinates, and so on.

Remember that earlier I mentioned that the HIERARCHYID type provides topological sorting. So if you want to sort the employees such that a subordinate will be returned after a manager, you can simply sort by the hid column. You can use the lvl column to produce indentation, replicate some string lvl times, and concatenate the employee name. The following query demonstrates both sorting topologically and producing indentation:

SELECT empid, REPLICATE(‘ | ‘, lvl) + empname AS emp
FROM dbo.Employees
ORDER BY hid;

Continue on Page 2

   Prev. page   [1] 2     next page



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

There is no explanation of what the notation /2/1/1/2/ stands for. I can only guess that the numbers represent the ordinal position among siblings at that level. Is that so? The article doesn't say. Otherwise it's a good article. Respond to: charles@kangai.demon.co.uk.

ckangai

Article Rating 4 out of 5

Hi Charles!

The following is Itzik Ben-Gan's response your question:

“Hi Charles,

The documentation doesn’t elaborate on how exactly the logical string representation of the hierarchyid value should be interpreted. The important thing is that the value itself positions the node in a certain place in the tree, includes info on all ancestors, and positions the node in a certain place among siblings. We can try to figure out from observation and experimentation what the logical representation means.

The obvious part is that a forward slash is used as a separator between levels. As for the values within a level, it’s a bit more tricky than simply ordinal positions. If you provide the GetDescendants method two NULLs as input, you get 1 within the level:

select cast('/' as hierarchyid).GetDescendant(null, null).ToString(); -- /1/

If you provide an integer x as the left input and NULL as the right input, you get the left integer plus 1 (x + 1):

select cast('/' as hierarchyid).GetDescendant('/1/', null).ToString(); -- /2/ select cast('/' as hierarchyid).GetDescendant('/9/', null).ToString(); -- /10/

However, if you provide two consecutive integers within the level, x and x+1, as the left and right inputs, you get the left with a dot and an integer 1 after the dot (x.1):

select cast('/' as hierarchyid).GetDescendant('/1/', '/2/').ToString(); -- /1.1/

If you provide two consecutive values in the form x.y, x.y+1, you get x.y.1:

select cast('/' as hierarchyid).GetDescendant('/1.1/', '/1.2/').ToString(); -- /1.1.1/

Cheers, Itzik”

Please let me know if you have any more questions. Thanks!

Megan Bearly Associate Editor, SQL Server Magazine mbearly@sqlmag.com

meganbearly

Article Rating 5 out of 5

Is Hierarchyid a field or a data type. It appears in your SQL Scripts that hierarchyid is a field you are using. If I am correct doesn't a data type determine what type the database field is?

Forgive me if this is a stupid question

[Itzik: HIERARCHYID is a datatype. You can define columns, parameters and variables of this datatype. I like to think of a datatype as a class, and of a value of the type as an object (instantiation of the class). That is, data and behavior supporting it.]

smgdba

Article Rating 3 out of 5

It would be great if the advantages of storing data using HierarchyID vs. older e.g., storing EmpID, MgrID in the same table for Emp-Mgr hierarchy methods be discussed in the next article.

sqlmagbm21

Article Rating 5 out of 5