My special area of interest and focus within the huge product that is SQL Server is the product's internals. I want to know how SQL Server does what it does, and I want to be able to use that knowledge to get SQL Server to do things better and faster. Furthermore,I want to know all this without having to read the source code. Knowing about SQL Server internals has given me this knowledge, which for the past seven years I've shared with readers to help you improve your own SQL Server installations.
How I Got Hooked
I first became interested in internals during my life as a technical support engineer for Sybase, about 18 years ago.The Sybase database engine, as you might know, was the original source for the Microsoft SQL Server product. I'd been providing technical support for Sybase customers all over the world, on all aspects of the product, after having had the bare minimum of training. After a year of fumbling my way through complex, but often fascinating database-performance issues, I attended a "Performance Tuning and Optimization" course through Sybase's education department. I learned an enormous amount of information that week, but my emotions as I sat through the course ran the gamut from elation to frustration. I was delighted to learn so many details about what the database engine was actually doing, but frustrated and at times even angry that I hadn't been given this information when I first started supporting customers. As I thought about it further, though, I realized that the concepts in the course wouldn't have made sense to me if I hadn't already had experience with some of the problems that I could use the course information to solve. In other words, I wouldn't have understood the point of learning details about the product's internal workings unless I could relate what I learned to specific situations I had encountered. At the end of the week, my overwhelming feeling was of excitement. I realized that it was possible to become an expert in this product, and I made it my goal to become one.
A few years later, I left Sybase and started working with the Microsoft database product. I started teaching the same course, and I continued my quest to learn everything I could about how the database engine really worked. By the time SQL Server 6.5 was in its full flower, I felt like an expert. But that feeling was short-lived; Microsoft introduced SQL Server 7.0, and I was back to square one. It seemed that everything about the internals changed in 7.0, including the structure of indexes, the possible query plans that the optimizer could devise, and the way that locks were acquired and held. In addition, the product became so much bigger with the introduction of DTS, merge replication, the beginnings of data warehousing and analysis services, and new programming interfaces such as ADO and OLE DB. I did find a renewed excitement in learning new concepts and behaviors, but I decided that I couldn't continue to be an expert in the entire product. I decided to specialize, and my focus became the core engine, in particular locking issues, indexes, and query optimization. I also specialize in SQL Server's metadata, which provides information about locking and indexes, as well as the metadata that describes the physical storage of the data and indexes.
Around that time, I was invited to update Ron Soukup's Inside SQL Server to cover version 7.0, and I also started writing this monthly column. Although this more public exposure has let me meet many people who have similar interests, I've also met many people who maintain or develop large SQL Server systems who profess to have no interest in, or need to know about, SQL Server internals. So I ask them, why not?
Why Not Internals?
I propose a quiz, such as one you might find in Reader's Digest or the Sunday newspaper supplement, to help you assess whether you're a prime candidate to learn about internals:
- Has an application or query performed well sometimes, but not always, and you couldn't figure out why?
- Has SQL Server seemed to run low on memory, although your system actually had plenty of memory?
- Have you ever encountered a SQL Server error message that you didn't understand, which contained the phrase Contact your system administrator—and you are the sys admin?
- Has an application performed perfectly in your development and testing environments, but when you tried to run it in production with dozens or hundreds of users, essential queries and processes seemed unable to run to completion?
- Do you schedule maintenance jobs for your SQL Server system by using the Maintenance Wizard and click Yes to questions about optimization and consistency checks, without really knowing what steps SQL Server would actually perform when running the job?
- Do you manually defragment or rebuild your indexes regularly without verifying that doing so is really necessary?
- Have you added indexes to try to speed up queries, then found that your data-modification operations are actually much slower?
- Have you ever followed someone's advice to change a database's recovery model without knowing what the full impact of the change would be?
Can you answer yes to any of these questions? If so, learning about SQL Server internals would be well worth your while.
Even if some aspects of SQL Server internals might not seem immediately valuable, that information might help you solve a performance problem sometime. For example, you might read one of my articles about database-recovery modes and how SQL Server uses the transaction log differently in each mode.You might run scripts to perform data-loading operations and index maintenance that work perfectly fine now, so you don't find the relationship between recovery modes and the kinds of work you do in your database to be particularly relevant. But if someday you find that your log has grown too big for the disk on which it resides, and you find that efforts to shrink it don't seem to work, a good knowledge of exactly what's logged in each mode might be useful for troubleshooting.
Top 4 Reasons to Study Internals
If you still aren't convinced that you need to know anything about SQL Server internals, here are my top four reasons why you should learn about them:
4. You can determine when a behavior is normal and when it's a bug. Here's an example in SQL Server 2000's Northwind database. Look at the plan for this query:
WHERE ProductID in ( 9, -9)
The query plan shows a nonclustered index on ProductID being used to find the relevant rows.
However, this query, which returns the same rows, doesn't use the index on ProductID. Instead it uses a clustered index scan, and if you have a basic understanding of index internals, you'll know that a clustered index scan is a table scan:
WHERE abs(ProductID) = 9
This query also uses a clustered index scan:
WHERE ProductID in ( 5, -5)
There are two different reasons for the nonuse of the nonclustered index on ProductID, and they're both valid. That is, both are normal and expected behavior. Once you understand how indexes are organized and how SQL Server decides to use them, you'll be able to expect it, too, and make the best choices in writing queries and designing indexes.
3. You can determine whether a problem is really a bug. I've talked about covered queries in several of my columns. A common place to see them is in simple aggregate queries, such as this one in the Northwind database:
Suppose you've built a nonclustered index on UnitsInStock (because the Northwind database doesn't come with this index). SQL Server wouldn't have to touch the data pages at all.All the data needed to compute the average is in the index. So what would you do if the plan for the previous query did a table scan and didn't use the nonclustered index? I have a table in one of my databases, and a query exactly like the previous one, which returns the average on a column with a nonclustered index and doesn't use the index unless I add this WHERE clause:
This condition is actually meaningless because every row in my table has a value for Quantity greater than or equal to 0, but I still need to add the WHERE clause in order for SQL Server to use my nonclustered index on Quantity. This is a bug.The more you know about what SQL Server should do, the faster you can tell whether it's really doing the wrong thing and avoid wasting time trying to fix it yourself.
2. You can explain your decisions with confidence. If you've been working with SQL Server for a while, you've probably learned from experience what works and what doesn't in many cases. But what do you do if your boss asks you to explain your reasoning or brings in an outside consultant who recommends a different solution. Can you explain why your ideas are valid?
A student from one of my recent classes sent me an email message after the class was over. He said that before he took the class he'd actually suggested to the database designers that they change their clustered index to columns other than the primary key. They ignored him. After taking the class, he was so sure he was correct, he told the engineers again that he thought they should change some of the clustered indexes.The designers didn't know he had just taken the class, but this time they took his advice. He felt he was more confident because he knew that he could explain his decision if asked about it.When you know what you're talking about,people listen to you.
1. You can get your own answers to questions. How can you find out everything you've ever wanted to know about SQL Server? Not by reading this one article, or even by reading one book. After you start developing your own expertise, you'll find that your knowledge will grow exponentially. The more you know, the more quickly you can find the best sources of information. You can describe your problems more precisely when you post them in a public Help forum. And you can quickly develop tests or write queries by using SQL Server metadata to provide even more information about what your SQL Server system is really doing.