Real-world experience shows the benefit of internals knowledge
I get a lot of nice feedback about my columns and classes, but some people ask me why they should bother learning how SQL Server works. I study internals often just for a deeper understanding of my favorite software and for the experience of learning something new. But many of my readers and students have specific reasons for learning about SQL Server internals. One of the most common reasons is that once you understand the way SQL Server works, you're better prepared to track down problems. If you understand the way indexes work, for example, you know that a query that takes 5 seconds to return 1 row, even from a 10 millionrow table, can probably be improved.
Recently, I saw a real-world example of how newly acquired knowledge of SQL Server internals helped a DBA solve a puzzling problem. In a class I taught about SQL Server internals and performance tuning, one student was extremely motivated. Instead of constantly picking my brain and asking me questions, as many students do, Danny tried to duplicate every demonstration to make sure he could produce the same behavior and get the same results I did. He asked questions only when he wanted to delve even deeper into a topic.
Danny installed software on the class computer that let him access his machines at work so that he could test the concepts he was learning on his own databases. Toward the end of the second day of class, he mentioned that he was having a problem with one of his SQL Servers at work and was trying to track down the cause. The main symptom was that when he clicked the plus sign (+) in Enterprise Manager to expand the database list, this machine took more than 15 minutes to enumerate the list of about 40 databases. The expansion didn't take nearly that long on the other SQL Server machines, so he felt confident that the underlying problem was solvable.
I first suggested that he determine which commands Enterprise Manager was sending to SQL Server when expanding the database list. We had talked about SQL Server Profiler briefly, so he could set up a simple trace. (For more information about using Profiler traces to improve performance, see Steven Berringer's article "9 Steps to an Automated Trace," InstantDoc ID 43014.) I recommended that he capture only the SQL:Batch Completed events in the TSQL event category. I further recommended that he filter the captured data on application name by specifying "MS SQLEM" as the value in the LIKE clause for the ApplicationName filter to capture events only from Enterprise Manager.
When he started the trace, he saw only the T-SQL batches sent to SQL Server through Enterprise Manager, just as he expected. He then opened Enterprise Manager and clicked the plus sign to expand his databases. The Profiler display showed the code in Listing 1. This code looked fine for Enterprise Manager to use to obtain information about every database, but Profiler was displaying this query as many times as there were databases. This repetition might not seem like a problem at first glance—after all, you need the properties for each database—but the SELECT statement has no WHERE clause. One execution of this statement retrieves the information for all databases, so Enterprise Manager doesn't need to send the same statement multiple times. This redundancy appeared to be a bug in Enterprise Manager, so we couldn't easily fix it.
However, Danny reminded me that he experienced much longer delays on one machine than on any others. I assumed that he meant one client machine, so I started researching how the network connectivity could be different between different clients and suggested that he investigate the protocol configuration for each client.
Meanwhile, Danny copied the code from the Profiler display into a Query Analyzer window, still connected to his at-work system. When he executed the statement, it took more than 30 seconds to run. Because he thought that a simple system-table access shouldn't be that slow, he started paring down the function calls from the SELECT list. He discovered that the last function in the list, has_dbaccess(name), was the culprit. When he removed that function from the SELECT list, the query completed almost immediately.
The has_dbaccess() function takes one parameter—the database name—and returns 1 if the user executing the function has access to the database, 0 if the user doesn't have access, and NULL if the database name is invalid. This check keeps Enterprise Manager from displaying names of databases that you don't have access to. Danny wanted to determine whether this function was just resource-intensive or whether some databases were slower to access than others. By running the following SELECT statement for each database, he could check the time required to access each database individually:
He ran this code for each database, plugging in one database name at a time, and noticed that several databases took many seconds to return results but others returned results immediately. Meanwhile, a classmate looked through Microsoft's Knowledge Base and found the article "FIX: Opening the Database Folder in SQL Server Enterprise Manager 2000 Takes a Long Time" (http://support.microsoft.com/default.aspx?scid=kb;en-us;282416&Product=sql). The title certainly seemed to reflect Danny's problem, but the article said that SQL Server 2000 Service Pack 1 (SP1) fixed this problem. Danny checked his version number and found that his problem server was running SP3. Because SP1 fixed the client's SQL-DMO library, he checked his client machines and found that they were still running the RTM version of SQL-DMO.
I then traced the actions SQL Server took when expanding the list of databases on one of the class machines, which had SP3 installed for all components, and didn't see SQL Server running the SELECT query once for each database. SP1 apparently fixed that problem, but the SELECT query in Listing 1 still took several seconds to run even once on the problem machine at Danny's office.
When Danny finished isolating the databases that were slower to return results for has_dbaccess(), we examined the properties of those databases by running sp_helpdb and looking at the status column in the results. We immediately noticed a similarity: All the slow databases had the AUTOCLOSE option set to ON. We hadn't yet discussed AUTOCLOSE in class, so Danny didn't know why this setting would cause a performance problem. According to SQL Server Books Online (BOL), if a database has this option on, the database shuts down cleanly and frees up its resources after the last user exits. At first, I rejected this setting as the cause of the problem because Danny had said the slowdown happened only when running Enterprise Manager from one client machine. But he explained that he'd meant that it happened on all clients that were accessing one server. Other servers, which had different database lists, didn't display this behavior.
The AUTOCLOSE option setting caused a problem because when SQL Server needs to check the value of has_dbaccess() for a user, it must access the database to examine the sysusers table in that database. If the database is closed (i.e., not in use), SQL Server must take time to open it. Of Danny's 40 databases, 15 had this property set to ON. When he used a simple SELECT statement that called getdate() before and after the SELECT has_dbaccess() statement, he discovered that accessing a closed database on his server took about 2 seconds. Running the complete command to check the has_dbaccess() status in all databases took more than 30 seconds because 15 databases had this option set to ON. And because of the bug, his version of Enterprise Manager ran the 30-second statement for each of his 40 databases! Setting the AUTOCLOSE option to OFF in all his databases reduced the time SQL Server took to expand his list of databases to less than 1 minute.
In this case, a combination of problems was causing slow performance. One problem was a bug with Enterprise Manager (or SQL-DMO, which runs under Enterprise Manager). Another was that the databases were set to AUTOCLOSE—an option not intended for use on a production SQL Server with many users. Note that miscommunication and confusion about the exact problem can hinder you in investigating problems such as this one, so make sure you understand your system's configuration fully.
Finally, to reward Danny for his hard work, I showed him an undocumented command that would let him write one statement to test the time SQL Server took to access each database through the has_dbaccess() function. The undocumented procedure sp_MSforeachdb automatically sets up a loop that lets you list each database name or use each database name in a query, using a question mark as a placeholder for the database name. For example, the following statement prints the name of each database on your SQL Server:
EXEC sp_MSforeachdb 'PRINT ''?'''
You can get the same results just by selecting all rows from the sysdatabases table, but I wanted to show you a simple example of how to use the procedure. This procedure is especially helpful when you want to use each database name as an argument to a function or procedure.
The following code returns the result of the has_dbaccess() function for each database, then selects the current date and time:
If you compare the milliseconds values in the date/time data for each database, you can tell which databases took longer than others to return the value of has_dbaccess(). Although using the undocumented procedure isn't necessary, it lets you learn something new about SQL Server and stretch your brain in figuring out other places where the procedure might be useful.