Indexing is such a large subject that getting a handle on what to do and what not to do when you're developing your indexing strategies can be difficult. You create indexes to improve query response time. But indexing is a balancing act.

Related: Improving the Index_Evaluation_USP Sotred Procedure

Each index is a system-managed table, so every addition or modification to the data in a user table potentially involves updating the indexes, which can cause slower performance of data updates. Everyone—from product vendors to database-performance specialists to DBAs who work on the production front lines—has ideas about how and when to index. Here's my personal list of dos and don'ts.

Do Index the Primary and Unique Key Columns

Make certain that the primary key for each table and any unique key columns that exist are indexed. You don't have to index a table's primary key yourself because when you create a table and declare the primary key, SQL Server automatically creates a unique index on that column or set of columns. However, if another column also contains unique data and you want to maintain the uniqueness of each column entry, you should create a unique index or a unique constraint on that column as well. A secondary benefit from indexing the unique key column is the query performance boost this operation gives to the joins or searches executed on the unique key column.

Do Index the Foreign Key Column

If a table contains one or more foreign key columns that reference other tables, you need to index those columns. Foreign key columns are usually nonunique because they're on the "many" side of the one-to-many (1:M) relationship. Therefore, you expect to see duplicate values in the foreign key column every time one row in the master table (on the "one" side) is related to two or more rows in the detail table (on the "many" side). You need to create a unique index on a foreign key column only if you're enforcing a mandatory one-to-one (1:1) relationship between two tables. In a 1:1 relationship, the values in the foreign key column are a unique set that corresponds to the primary key values in the master table. When the foreign key column in a table is indexed, SQL Server can scan the indexes for the rows it needs instead of scanning both tables.

Do Consider Using a Covering Index for Queries That Return Few Columns

A covering index contains all columns in all parts of a query, including the FROM and WHERE clauses. If SQL Server can scan the index to find the data it needs, it doesn't have to read the associated data pages. Consequently, query performance improves. For example, in the Authors table in the Pubs database, the aunmind covering index consists of author last name (au_lname) and author first name (au_fname). Usually, unless you specify differently in your query, SQL Server will choose whether to use an index. When you're working with tiny tables—tables that fit on one or two pages (8K blocks of space on the hard disk)—SQL Server will choose to perform a table scan instead of using an index. For little tables, a table scan (which requires only one hard-disk access) is less costly than reading an index, finding the location of the data rows, then reading the data rows (which requires two hard-disk accesses). But for a query such as

SELECT au_lname, au_fname FROM AUTHORS

SQL Server can use the covering index to build the result set, even for the tiny 24-row Authors table, as Figure 1 shows. For more information about designing indexes to improve query performance, see Kalen Delaney, Inside SQL Server, "9 Burning Questions," October 2002.

Do Consider Using a Clustered Index for Large Tables and Range-of-Values Queries

A clustered index determines the storage order of the records on disk. The Authors table is clustered on au_id, so au_id is the clustered key. When you query the Authors table without specifying ordering criteria, as follows:

SELECT * FROM Authors

The results might display in clustered key order (e.g., the lowest value of au_id might appear at the top of the list, as Figure 2 shows). In SQL Server 7.0 and later releases, ordering of the result set can vary, depending on the retrieval strategy that the query optimizer chooses. Therefore, don't count on record ordering. Instead, use the ORDER BY option in your queries. However, if you have a large table, particularly a table that you often query by using a range-of-values condition on the clustered key (e.g., WHERE au_id between '400-00-0000' and '499-99-9999'), the query performance will improve when you use a clustered index. If you join this table to other tables and the clustered key is also a joined column, the JOIN query performance might be better than if the query executed the join without a clustered index.

Do Index Sorting, Grouping, and Aggregating Columns

You also need to consider indexing columns that you use to order by and those that you use in a grouping expression. You might benefit from indexing the columns that the MIN(), MAX(), COUNT(), SUM(), and AVG() functions use to aggregate the data. When you use the MIN() and MAX() functions, SQL Server does a simple lookup for the minimum and maximum values in the column, respectively. If an index's data values are arranged in ascending order, SQL Server can read the index to quickly determine the correct values of MIN() or MAX(). The range-of-values query incorporates a filter or constraint (expressed in the SELECT query's WHERE clause or HAVING clause) to limit the rows that the query returns. Similarly, when you have an index, you can optimize data sorting (by using the ORDER BY clause) and data grouping (by using the GROUP BY clause), especially if the table or tables you're querying contain many rows.

Do Consider a DSS

Indexing in a transactional environment is a balancing act. You try to enhance query response by creating indexes—but doing so decreases performance of inserts, updates, and deletes. If your querying compromises the database's transactional requirements, and if you decide that you can't resolve the two conflicting requirements, consider creating a separate decision support system (DSS). A DSS is essentially a database and a set of queries that produce reports. The DSS can be a simple copy of the transactional database enhanced by tables of summary data and views that make running reports simpler and quicker than using the transactional database. The DSS doesn't need to be a full-fledged data mart or data warehouse running on its own server. The DSS can share the same hardware platform that the transactional database is running on as long as the hardware is robust enough to support all the user connections.

And the DSS doesn't necessarily have to be read-only: You can create views on the DSS that are designed to facilitate report-writing and which, because of their complexity, make data seen through a view act as though it's read-only. When you implement a DSS, you can take the query load off your transactional database by analyzing the most commonly executed queries, creating covering indexes for the tables, and creating effective indexed views for the data.

Don't Over-Index Transactional Tables with Heavy I/O Activity

A transactional table is a table in an online transaction processing (OLTP) database. You can use transactional tables for reports or as input to programs that export data (e.g., in a SELECT ... FOR XML query). However, the OLTP database's primary purpose is capturing data, and if you over-index a transactional table, you might inhibit performance of INSERT, UPDATE, and DELETE operations. Each index is a database structure—a table that contains a column of values arranged in ascending order, with bookmarks that point to associated rows in the data pages. Therefore, when you add a record to a transactional table, an INSERT operation occurs not only on the data page, but also in all associated indexes. And when you delete a row from the data page, SQL Server removes the corresponding row from all associated indexes. Certainly you need to create some indexes on heavily used OLTP tables, but don't overdo it—an UPDATE operation involves not only the data page but also the number of index pages that matches the number of indexed columns that you need to modify.

Don't Index Wide ColumnsIf a table's columns are wide (i.e., the columns have a data type of char(25) or larger), indexes created on those columns might result in a large number of pages, thereby negating the benefit of using an index in the first place. Nonclustered indexes in SQL Server 2000 and 7.0 are structured differently from those in earlier releases. If you're not careful, you can create indexes that take more space on disk than the original data pages.

A Double-Edged Sword

Proper indexing can make a sluggish database perform better. Conversely, improper indexing can make a high-performing database run poorly. The difference depends on how you structure the indexes—and on your follow-up. You need to test and observe the effect of a newly created index on your production environment, and you need to remove indexes that compromise your database's performance. Your indexing strategy is an important part of your responsibilities as a physical data modeler and DBA.

Related: Time to Defrag You Indexes?