To get the best performance out of Microsoft SQL Server, you need to avoid these 10 common pitfalls related to AWE memory usage, normalization, clustered indexes, search arguments (SARGs), wasted network bandwidth, and more.
One of the primary things that I do as a consultant is to help companies evaluate the gap between where their current SQL Server deployments are and where they should be in relation to industry best practices. By offering a number of different auditing services and options, I help clients validate code, optimize performance, and evaluate their needs for future growth. And because many of my audits are holistic in nature, I’ve gained great insights over the years into some common developer pitfalls and mistakes that can adversely affect performance. This article focuses on 10 common performance problems (presented in no particular order) that developers can avoid with a modicum of effort.
1. Failure to Properly Configure AWE Memory Usage
Although configuring server memory is obviously an IT or DBA responsibility, I’ve seen too many environments where “poorly performing code” was actually the result of a system with 8GB (or more) of RAM restricting SQL Server to only 2GB. In fact, from what I’ve seen, I’d wager that this problem is so widespread that more than half of production SQL Server deployments suffer from it. Of course, by helping clients remedy this situation, I end up being a hero; adding more RAM is typically one of the easiest ways to boost performance, as long as SQL Server can use the memory.
Happily, you, too, can be a hero by enabling SQL Server to use more than 2GB of RAM. It’s a relatively simple operation—although I wish that the SQL Server installer offered to configure Address Windowing Extensions (AWE) during installation. To start, verify that SQL Server is using AWE, which you can do by executing the code shown in the first part of Listing 1. (Note that some lines in the listings are wrapped to fit on the printed page.) If you have more than 2GB of physical RAM and AWE isn’t enabled, you can use the remaining code in Listing 1 to quickly configure SQL Server to use the additional memory.
The trick to enabling AWE is to ensure that the service account used by SQL Server has been granted the Lock pages in memory user right, as Figure 1 shows. Without this critical configuration change, SQL Server can’t use more than 2GB of RAM, no matter how much physical memory is available. Also note that if you need to change this configuration, you need to restart the SQL Server service before you can realize the benefits. (For more information, see the Microsoft article “How to configure SQL Server to use more than 2 GB of physical memory” which applies to SQL Server 2005 and SQL Server 2000.)
2. Failure to Normalize
Many performance-tuning articles suggest that you intentionally denormalize databases for the sake of performance (to what I jokingly call 2 ¾ normal form). Although internal denormalization is a valid approach that’s covered elsewhere, it’s also a sad fact that databases are occasionally deployed by developers who either don’t know anything about normalization or who fail to grasp its importance.
In these cases, the extra business logic required to extract useful information from jagged Frankencolumns of tangled data adds significant processing overhead. However, this additional overhead usually doesn’t become a performance problem until the database becomes heavily used. Of course, by that time there’s typically no easy or cost-effective way to remedy the situation. Therefore, although it sounds trite, one of the best things that developers can do to ensure good performance is start with a solid, normalized foundation. The cost and headache of doing otherwise can quickly become a true performance nightmare.
3. Failure to Use Clustered Indexes
When you’re looking to squeeze seconds or milliseconds of additional performance out of your code, the prospect of shutting down your production database for a few hours to correct a performance problem probably doesn’t make much sense. But doing so is the only way to remedy common performance problems stemming from the lack of a clustered index.
As with normalization problems, failure to use clustered indexes is a performance issue that typically won’t surface until a table starts to get fairly large (say, a few million rows). But by that time, the table has typically become heavily fragmented (both physically and logically, through the use of forwarding records), and queries involving bookmark operations can become slow enough to draw the attention of end users and management. Sadly, at this point the only available options are to either re-create the table or try to slap a clustered index on it. Both operations can take hours (though recreating the table is typically faster), and end users can’t use the table while the data is being restructured— which in most cases renders your database useless for lengthy periods.
Therefore, it’s a good rule of thumb to make sure that all tables have a clustered index. Likewise, given the fact that a well-placed clustered index can drastically improve ranged queries, it’s also a good practice to ensure that any table with more than 20,000 rows has a well-considered clustered index.
4. Failure to Optimize
If you’re interested in improving the performance of your code, your efforts will only go so far if your database is missing key indexes, has heavy table or index fragmentation, or doesn’t have accurate statistics. Furthermore, since some DBAs are either too swamped to optimize index placement (or consider it a developer responsibility), it behooves developers to ensure that foreign keys and other joined columns are indexed on both sides of the join (i.e., in both tables).
Likewise, for developers whose code is being fingered for performance problems, it’s always good policy to verify that indexes are being regularly rebuilt and defragmented and that statistics are being regularly updated. Doing so can help you avoid wasted effort. Because I find that “failure to DBA” is a common problem in many engagements where I’m called in to help tune “poorly running code,” I’ve provided two scripts (Web Listings 1 and Web Listings 2) that you can use to help evaluate statistics and index fragmentation.
5. Failure to Create SARGable Queries
Highly optimized queries against large tables in SQL Server take advantage of index-seek operations instead of more expensive scanning operations involving tables and indexes. (The difference between these operations is analogous to seeking against the alphabetized listing of last names in your phone book to find every “Campbell” versus scanning the phone book for every “Michael”—the difference can be expressed in orders of magnitude.) Accordingly, queries that can take advantage of index-seek operations are commonly referred to as SARGable queries (where SARG is a word play on Search ARGument).
Creating SARGable queries might sound a bit spooky to the uninitiated, but once you learn a few simple syntax rules and ensure that a viable index is in place, creating SARGable queries actually becomes second nature. Happily, you can find a wealth of resources (both online and in print) that cover this topic in depth, but for a quick overview, see Listing 2.
6. Wasted Network Bandwidth
SQL Server’s default behavior is to constantly update clients about current execution status while a query, batch, or operation is being processed. Although beneficial in some cases, this chattiness adds additional network overhead. Therefore a good rule of thumb is to ensure that stored procedures called by applications take advantage of the SET NOCOUNT ON directive to turn off status messages. Doing so can cut back significantly on network traffic and, in the words of SQL Server Books Online (BOL), “provide a significant performance boost.” Web Listing 2 shows an example of using SET NOCOUNT on.
In a similar vein, developers should also avoid SELECT * queries when returning result sets, as these result sets typically return much more data than the calling application needs. Therefore, a great way to optimize code is to make sure that you’re returning only columns that you need; failing to do so results in additional network overhead with no benefits. Likewise, another good rule of thumb is to try to always horizontally restrict returned data to only what’s needed (through efficient use of the WHERE clause).
7. Improper Use of Cursors
Sadly, cursors are prone to abuse. But as long as cursor-based operations aren’t causing massive performance problems (or can’t be easily rewritten using set-based logic), I typically encourage clients to use optimized declaration syntax and ensure that cursors are closed and de-allocated as soon as possible. A key consideration that makes this recommendation possible is the fact that most developers intend to use lightweight, single-direction cursors for simple looping operations. However, although this might be their intention in most cases, developers will end up with a much more expensive cursor unless they use very explicit syntax.
The fix for this problem is usually simple: Use more verbose syntax (as Listing 3 shows). By explicitly declaring lightweight cursors designed for use by the current connection only, and by ensuring that cursors are closed and de-allocated (especially when nested) as soon as possible, you can overcome many of the problems typically associated with cursors without spending a lot of development effort.
8. Failure to Use Full-Text Indexing
In cases where SARGable queries aren’t possible due to a requirement to search for %conditions% in the middle text columns, full-text indexing can be a lifesaver. Without full-text indexing, the best you can hope for is an index scan (which requires less I/O than a full table scan). But against millions of rows, even an index scan can gobble up excessive amounts of CPU resource, which can adversely impact systemwide performance.
By tokenizing text data with a full-text index, not only can you drastically decrease the size of the indexes used for queries, but SQL Server can provide results much closer to those you’d anticipate with an index seek. The result is that, in certain scenarios, full-text indexing can yield queries that require substantially less processing and return results at a fraction of the cost imposed by their traditional counterparts. For example, the data in Figure 2 was taken from an environment where more than 8 million rows were being pummeled up to 12 times a second by a fairly complex query.
Best of all, these improvements can also free up existing I/O and CPU resources—which translates to better overall system performance. Therefore, although full-text indexing can’t solve every problem, you should evaluate it as a possible way to boost performance if you’re doing intensive text searches. There’s also a slight performance hit involved with turning full-text indexing on, so make sure you’re getting tangible benefits by going this route.
9. Failure to Use Multiple Data Files
A key performance enhancement that many organizations fail to take advantage of is SQL Server’s support for multiple data files. In scenarios in which a database is larger than the amount of physical memory available, consider using multiple data files, especially if the server in question has multiple processors and one or more RAID controllers. Furthermore, the benefits of using multiple data files are even more pronounced in cases where most of the database’s data is housed in a handful of heavily used tables. Finding the right number of data files to use can require some testing, but as long as you don’t go crazy adding secondary data (.ndf) files, it’s hard to go wrong. In fact, Microsoft recommends a ratio of roughly 0.25 to 1 data file per file group per processor. Why? Because adding more files lets SQL Server take advantage of parallel processing by delegating expensive read operations to dedicated I/O threads. The more threads—up to a point—the better. Therefore, moving a heavily used table to a dedicated file group, and its nonclustered indexes to another dedicated file group, can provide serious performance benefits—especially if you can place the files for these file groups on different physical disk arrays or spindles.
10. Failure to Properly Size Data Types
Failure to properly size data types can result in costly performance problems that can be difficult to diagnose and correct. This problem partly stems from the fact that many developers don’t realize that as far as SQL Server is concerned, a VARCHAR(20) is a different data type than a VARCHAR(30). Throwing NVARCHAR versus VARCHAR and NULL versus NOT NULL into the mix only complicates things further. Masking the problem, however, is the fact that SQL Server typically handles subtle differences by transparently converting one data type to another in most cases.
Where this can cause problems, however, is when coercion causes SQL Server to choose an index or table scan instead of a much better-performing index seek. On smaller tables, these effects are typically negligible; however, as tables begin to grow, the effect can become much more pronounced. In some cases, this means that coercion can be a “silent thief” that slows moderately sized queries to a degree that usually goes unnoticed. However, in extreme cases, I’ve seen examples where a stored procedure declared an NVARCHAR(50) parameter and used it in a WHERE clause against a VARCHAR(50) column, which resulted in an index scan that was making the query take four seconds to execute. By simply changing the parameter to match the table’s data type, SQL Server was able to revert to an index seek that completed in milliseconds. Therefore, although the frequency of performance issues caused by this problem is typically very low, it’s always a good idea to pay close attention to data-type declarations and sizing when coding to avoid having your code become a sad statistic in the war on optimizing database performance.
Optimizing SQL Server performance is a vast topic that you can approach from many different angles. This article focuses on a smattering of common problems that are typically fairly easy to remedy. My hunch is that most SQL Server deployments suffer from a handful of issues outlined in this article—but it’s my hope that this article has also gotten you to think a bit more about ways to tackle performance issues while you code, rather than after the fact.