In my July 2001 column, "Are You in Tune?" which began a series about query tuning, I mentioned that one of the best ways to ensure that your queries are running as fast as possible is to check that your tables are properly indexed. However, I didn't discuss index structures in depth. A clear understanding of how SQL Server organizes indexes can be a big advantage in determining what kind of indexes can be useful for which queries. This month, I look at how the information in indexes is organized. And I discuss some of the tools you can use to examine the structures of your indexes and gain an even deeper understanding of the valuable information they contain.
Turn Over a New Leaf
SQL Server supports two kinds of indexes, clustered and nonclustered. SQL Server organizes indexes as trees, with one page at the root level, multiple pages at the leaf level, and zero or more levels in between. As Figure 1 shows, indexes are usually represented as an upside-down tree, with the single-page root at the top and the leaf level at the bottom. When I talk about one page being above another, I mean that it is closer to the root. The index pages above the leaf level are called node pages. Each index row in every node page contains two things: the index key value, which is the first key value on a page at the next level of the index, and the page address of that page, which starts with the specified key. In Figure 1, the intermediate level has three pages: 25, 57, and 102. The level above (which is the root) contains each of the first key values from these three pages plus the numbers of the pages that contain those values. SQL Server manages both clustered and nonclustered indexes as balanced trees, or B-trees, which means that all the branches always have the same number of levels as the other branches between the one-page root and the leaf level.
Clustered and nonclustered indexes have certain similarities in the content of their leaf levels. In both types, every key value appears in the leaf level, and all the keys are in order by their data types. For example, if the index is on the firstname column, the leaf level will have a row for every first-name value, in alphabetical order according to the collation of the column or database. If the index is on birthdate, every birth date will appear chronologically in the leaf level. Clustered and nonclustered indexes differ in what other values the leaf level contains. For a clustered index, the leaf level is the data, so the leaf level contains all the columns in the table. In a nonclustered index, each index row in the leaf level contains a pointer to the data.
The size of an index depends primarily on the size of the index key or keys. An index can have as many as 16 columns for its keys, and the key size is the sum of all the column sizes, measured in bytes. (See the "Data Types" section in SQL Server Books Online—BOL—for a list of how many bytes each type of data requires.) In addition to index keys, each node-level index row contains a pointer to a page at the next level of the index, and page pointers are 6 bytes each. Each index row contains some additional overhead bytes in the node levels. The amount of overhead varies depending on the type of index—clustered or nonclustered, unique or nonunique. If the table has a clustered index, each nonclustered leaf row contains the corresponding clustered key value as the pointer, so the size of your clustered key directly affects the size of every nonclustered index.
How Big Is Big?
SQL Server builds an index from the bottom up: It builds the leaf level first, then sorts it before building the next level. SQL Server doesn't have to build a clustered index's leaf level because the data already exists, but it does have to sort the entire data set. For a nonclustered index, SQL Server takes all the key values (including duplicates) from the data and stores them each in an index row along with the pointer to the location of the data row containing that key. A nonclustered index's leaf level can be quite large compared to the other levels of the index because the leaf level contains an entry for every key value in the underlying table. However, the nonclustered index's leaf level can be quite a bit smaller than the table itself.
For example, consider a table of 10,000 pages and 500,000 rows that has a clustered key of a 5-byte fixed-length character type. If this table also has a nonclustered index on a 5-byte fixed-length character column, the leaf-level rows (level 0) of the nonclustered index will be 11 bytes long because they'll each contain the nonclustered index key (5 bytes), the corresponding clustered index key (5 bytes), and 1 overhead byte. Because each page has 8096 bytes available and 8096 bytes divided by 11 index bytes per row is 736 index rows, an index page for this nonclustered index can hold 736 index rows at this leaf level. The index will need 500,000 index rows, one for each row of data, so the index will have 680 leaf pages.
The upper levels of the indexes contain pointers to pages in the next lower level. The level right above the leaf in a clustered index has a pointer to each of the table's 10,000 pages; for this index, the index rows at the upper levels will be 12 bytes each. Each index row contains the key value (5 bytes), a pointer to a page (6 bytes), and 1 byte of overhead. A total of 674 rows (8096 bytes available on a page divided by 12 bytes per row) will fit on a node-level index page. The index needs enough rows to point to all 10,000 pages, so it will have 15 index pages (10,000 divided by 674) at level 1. Now, index rows at level 2 point to all these 15 index pages, and since one page can contain 15 index rows, level 2 is the root. For the nonclustered index, the index rows at the higher levels will be 12 bytes each, so 674 index rows will fit per page, and you need two pages at level 1, with level 2 as the root page.
So, how big are these indexes compared to the table? The clustered index for a 10,000-page table had 16 index pages, which is less than 1 percent of the table size. I frequently use 1 percent as a ballpark estimate for the space requirement of a clustered index, although in this case, it's an overly large estimate. On the other hand, the nonclustered index for the 10,000-page table needed 683 pages, which is about 6 percent more space than the table data needs. Estimating the space necessary for nonclustered indexes is difficult. My example used a very small key. Nonclustered index keys are frequently much larger—or even composite—so keys of more than 100 bytes aren't unusual. In that case, you'd need many more leaf-level pages, and the total nonclustered index size could be 30 or 40 percent of the table size. (Remember that SQL Server lets you have as many as 249 nonclustered indexes!) Disk space is cheap, but is it that cheap? You need to plan your indexes carefully.
Several tools and commands let you examine index structures so that you can see exactly how much space an index needs and how many pages are at each level. The stored procedure sp_spaceused reports the total space used by all indexes on a table. Because the number is in kilobytes and each page holds 8KB, you need to divide by 8 to get the number of pages your indexes require. The value for the total index size includes the special allocation pages called index allocation maps (IAMs). Each table and each index has at least one IAM page. Note that no documented command can tell you the size of just one index. However, Enterprise Manager can show you the size of one index, so you can guess that SQL Server must have a procedure to return that information. Enterprise Manager actually calls the undocumented command sp_MSindexspace, which takes a table name and an index name as parameters. For example, the Northwind sample database has a 2055-row table called Order Details that contains one row for every item purchased in every order. The table has five indexes, including a clustered index called PK_Order_Details on the OrderID and ProductID columns. When I want to see the total space used just for the PK_Order_Details index, I can run the following command:
The value returned is simply the size of the index in kilobytes. You can run this command to examine the comparative sizes of different indexes on the table or see how an index will shrink or grow if you remove or add columns. Next month, I'll continue examining the structure of indexes, and I'll tell you about additional index information you can get from the sysindexes system table.