Get the lowdown on tables and indexes
The sysindexes system table contains information (such as disk space used) that isn't available through any supplied system procedure, function, or view. Many people might never need this information, but if your needs go beyond the tools Microsoft supplies, it can be helpful to understand the information in this table. Let's look at the columns that contain information about table and index size, then I'll give you some information about index-key columns.
The documented procedure sp_spaceused uses the columns containing size information to report the total size of all indexes on a table. The undocumented procedure sp_MSindexspace uses these same columns to provide information about index sizes and can report the size of each individual index. (For more information about sp_MSindexspace, see my November 2004 column, "Get Into Index Structures," InstantDoc ID 43939.) One benefit of understanding these columns is that you can use that understanding to adapt sp_MSindexspace if you want different kinds of information than the set it produces. For example, executing this procedure returns a row for each set of column statistics, which don't take up any space. You probably don't want that information if you're investigating space usage. However, sysindexes contains a row for each set of statistics and a row for each index, and sp_MSindexspace doesn't differentiate between the two types of data in the sysindexes table. In SQL Server 7.0, sp_helpindex also retrieves statistics, but Microsoft changed it so that it no longer retrieves statistics in SQL Server 2000. Microsoft didn't update sp_MSindexspace, however, so sp_MSindexspace may return more information than you want or need.
Another change you might want to make to the output of sp_MSindexspace is to include the full size of the clustered index. As the procedure is written, it returns only the size in kilobytes of the index tree above the data. As I've mentioned before, the clustered index's leaf level is the data itself, so you need to consider whether to include the data pages in the computation of the index size. The data pages will take up storage space in your database anyway, so some DBAs just want to know how much extra space the clustered index needs. The full size of the clustered index tree, including the leaf-level data is easier to compute than the value that sp_MSindexspace returns.
To see how you might make that change to the procedure, you need to know about three columns: reserved, used, and dpages. In "Get Into Index Structures," I discussed the difference between reserving space and using it, including the allocation of extents. If SQL Server allocates a uniform extent of eight pages to a table or index, that extent might not be completely used, but sysindexes counts the full eight pages as reserved space. Only the pages that contain data or index rows will count as used. In addition, any index allocation map (IAM) pages for the table or index count as used pages. For the sysindexes row that contains information about the clustered index, the dpages value represents the number of data pages and the used value is the total number of pages for all indexes, including the data in the clustered index's leaf level.
To compute the size of the clustered index tree above the leaf, sp_MSindexspace includes the query that Listing 1 shows. This code takes the value stored in the used column, subtracts the number of actual data pages (dpages), and subtracts the total of all the used pages for the nonclustered indexes. Nonclustered indexes have an indid value of more than 1 but less than 255. If you want to include the clustered index's leaf level, you can remove the dpages column from the second line of Listing 1.
SQL Server Books Online (BOL) is incomplete in its description of the dpages column in sysindexes. For a clustered index, the description is clear, but for nonclustered indexes, BOL gives identical meanings for the dpages and used columns. In sysindexes for any table's nonclustered indexes, these two values are different. The dpages value is the number of pages in the index's leaf level. So for the clustered index, the dpages value is the number of data pages. For nonclustered indexes, the leaf level is separate from the data. In addition to the leaf level index pages, SQL Server always uses at least one additional index page for each nonclustered index because there's always a root page that's separate from the leaf.
A Key Feature
One of the most mysterious columns in sysindexes is the keys column, which contains the index keys. BOL describes this column as a "list of the column IDs of the columns that make up the index key." However, the data stored in the keys column is indecipherable at first glance. SQL Server knows which columns are key columns for any index, and the sp_helpindex procedure lists them, so it must somehow decipher the data in the keys column. However, the code for sp_helpindex provides no clues about how to interpret the data in the keys column. The sp_helpindex procedure uses the index_col() function to extract the names of the index columns. Index_col() is a system function and thus isn't defined in T-SQL, so there's no way to see how this function works. The sp_helpindex procedure needs to know how many columns make up each index, so it calls index_col() for each column in the key. SQL Server needs to know how many columns are in the key, so it uses the sysindexes column called keycnt.
Keycnt seems like it should be a straightforward value: the number of keys in an index. However, the keycnt values in sysindexes might surprise you. Two features of SQL Server's index organization can produce unexpected results. First, clustered indexes have to be unique. If you don't declare an index to be unique, SQL Server makes it unique by adding an extra, hidden key field. Thus, any nonunique clustered index will have a keycnt value that's one greater than the number of columns you supplied in the index definition.
The second feature of index organization is that nonclustered indexes contain the clustered index key as part of the leaf level, and these columns are always part of the keycnt value. So, a nonclustered index will always have as a keycnt value the sum of the number of columns you defined for the index, plus the keycnt value for the clustered index, minus any columns that appear in both the nonclustered and clustered index definitions (unless the table has no clustered index).
Find Your Keys
Now let's look again at the keys column. As I mentioned, the data looks cryptic, but the keys column value does contain meaningful information. Let's look at the keys value for the pubs database's authors table, which has a clustered index on au_id (unique) and composite nonclustered index on au_lname, au_fname:
USE pubs SELECT keys, keycnt, datalength(keys) FROM sysindexes WHERE object_name(id)= 'authors' AND dpages > 0
The length of the keys column seems to be 32 times the number of keys, so you can deduce that information for each key takes up 32 bytes. If you examine the bytes independently and see where they differ, you can gather more information about how the key columns are represented, so you could write your own code that works with the key information, for example.
The following code lists the first two groups of 32 bytes from the keys column entries for the authors table:
SELECT substring(keys, 1,32), substring(keys, 33,32) FROM sysindexes WHERE object_name(id)= 'authors' AND dpages > 0
The strings for the first two columns of the nonclustered index are practically identical:
0xA703A7002800000008D00034000000000200 FFFF00000000FEFFFFFF00000000 0xA703A7001400000008D00034000000000300 FEFF00000000FDFFFEFF00000000
One difference is in the seventeenth byte, where the first index column has a 2 and the second index column has a 3. (The seventeenth byte of the second set of 32 bytes is actually byte 49.) I can extract just these seventeenth bytes by running the following query:
SELECT substring(keys, 17,1) col1, substring(keys, 49,1) col2 FROM sysindexes WHERE object_name(id)= 'authors' AND dpages > 0
The preceding code gives me this output:
col1 col2 ---- ---- 0x01 0x 0x02 0x03
Since the au_lname column has a column ID value of 2 and the au_fname column has a column ID value of 3, I can assume that the seventeenth byte of any 32-byte substring in keys indicates which column is part of the index.
Forward or Backward
There's obviously more information in the keys column, but the only other piece of information I decided to investigate was the data that shows whether the index key is stored in descending or ascending order. (Defining an index key as descending is a new feature in SQL Server 2000.) To find out where SQL Server stores the information, I created three new indexes on the authors table, using different ordering sequences for au_fname and au_lname:
CREATE INDEX desc_nameindex1 ON authors(au_lname DESC, au_fname) CREATE INDEX desc_nameindex2 ON authors(au_lname, au_fname DESC) CREATE INDEX desc_nameindex3 ON authors(au_lname DESC, au_fname DESC)
By using the previous query to display the 32 bytes for each key and comparing the difference in values when the key was stored ascending (the default) to when it was stored descending, I determined that a value of 4 in the 31st byte meant that the key was defined with the keyword DESC. A 0 in the 31st byte means that the key was defined as an ascending key.
I'll leave it as an exercise for you to tweak the sp_MSindexspace procedure or create a new system procedure that can extract whatever index information you're interested in. Knowing what the columns in the sysindexes table mean will help you create useful procedures for tracking and managing your own data and metadata.