As a DBA, you probably hear from users whenever they experience sluggish application performance. Users expect you to twist some knobs and make the application fly. But many application performance problems aren't caused by the configuration of the database server, OS, hardware, or network, or even by gremlins in the wiring. Most often, the application causes poor performance. In particular, the way in which applications talk to the database through ADO can cause slow performance.
ADO, Microsoft's newest API for working with various data sources, is a feature-rich object model that you can use with any OLE DB provider to communicate with the data source that the provider supports from Visual Basic (VB), Visual C++, Active Server Pages (ASP), and other development tools. Figure 1 shows ADO's position in a client/server application. I've used ADO since its first release and watched it mature into the flexible, powerful, and relatively easy-to-use tool that it is now. But developers and DBAs need to consider the ways in which using ADO affects application performance—and make allowances if they can. Although DBAs and database developers can benefit from the following six tips, I'll take the view from the server room. Also, I'll specify any differences in behavior or terminology between SQL Server 6.5 and 7.0.
1. Start with a Good Design
So, how do you make an application fast and scalable? First, design the database for the task at hand. The normalization and indexing design for a transaction-processing system might be different from the design for a decision-support system. Next, keep conversations between the application and the database as short as possible, access the network and the database server as little as possible, and make transactions short. To accomplish these objectives, you can use WHERE clauses to limit result sets that you request or rows that you modify and you can create indexes to support these operations. (For more information on indexing strategies and tuning query performance, see "Tuning SQL Server 6.5 for High-Performance Queries," and "Which Is Faster: Index Access or Table Scan?" January 2000.)
2. Understand How ADO Gets the Data
The ADO Recordset object returns result sets in a usable structure in your application, and the properties of the Recordset object can dramatically affect how the application performs. (For more information, see Ken Spencer, "The ADO Programming Handbook," page 31.) The default cursor type for an ADO Recordset is forward-only and read-only, sometimes called a firehose cursor. This isn't a true cursor but a stream of data from the server to the client. This cursor type is not scrollable or updateable. (For more information, see Morris Lewis, "How ADO Uses Cursors," September 1999). You need to capture the data in some type of control or wrap it in HTML to provide it to users. The firehose cursor's method of streaming data from the server to the client is efficient, and although using the ADO Recordset object might require more coding, you can provide good functionality to users with this type of recordset.
Let's examine the effect of using various recordset properties. Screen 1 shows a SQL Trace (for information on SQL Trace and SQL Profiler, see "The Investigative Tools," page 27) of a VB and ADO application that sent the statement
to the server. You can see the statement highlighted in the screen shot in the sp_cursoropen call. If you run the statement again with a default ADO Recordset object, the SQL Trace, which Screen 2 shows, is much different from the one you saw in Screen 1. Listing 1 shows the VB code that produced the trace in Screen 2.
Using different Recordset properties produces different traces from the same SQL statement. The VB code that produced the trace in Screen 1 used a server keyset cursor, which caused the OLE DB provider to issue calls to sp_cursoropen, sp_cursorfetch, and sp_cursorclose, which are extended stored procedures in SQL Server that manage server-side cursors. Listing 2, page 26, shows the VB code.
The calls in the trace in Screen 1 give you information about the overhead of using server-side cursors. Note that the sp_cursorfetch calls have several parameters that identify the cursor and provide other internal data. The last number is the rowset size. The default rowset size for an ADO server cursor is 1, as you can see in the rows starting with the third sp_cursorfetch call. So for each row in the result set, the application calls sp_cursorfetch to retrieve the data from that row. Often referred to as a badminton cursor, the ADO server cursor goes back and forth between the client and server until it has fetched all the rows. The default network packet size for SQL Server is 4096 bytes, so this type of cursor can generate excessive network traffic. Such row-by-row operations are also CPU-intensive. The CacheSize property of the Recordset object controls the number of rows each sp_cursorfetch call fetches. When you use server-side cursors, you need to set the CacheSize property reasonably high, depending on the expected size of result sets, to reduce the number of row-by-row operations. (A value between 100 and 500 is a good starting point.) Screen 3 shows the trace of the SELECT statement, which uses a higher cache-size setting. Listing 3 shows the code that produced the trace you see in Screen 3.
The single-row fetching activity can significantly affect how an application performs. The duration values for each sp_cursoropen and sp_cursorfetch call in a SQL trace will usually be low. What SQL Trace doesn't show is the latency in these operations between the client and server. The client application waits while a badminton cursor does its back-and-forth act. A high number of users accessing the application can consume CPU resources that would be better spent managing I/O and other operations for the SQL Server application. To demonstrate this point, I conducted a test on a table with 172 rows. I ran Windows NT Task Manager to compare the server-side cursor with no value set for cache size with the default forward-only and read-only cursor. Using the server-side cursor caused a spike in my single-processor CPU usage, as the spike on the left in Screen 4 shows. Using the default cursor caused a much smaller spike in CPU usage, as the spike on the right in Screen 4 shows. The Task Manager shows the CPU-intensive nature of row-by-row operations. You would expect this level of CPU usage from an application that performs many inserts to a SQL Server application one row at a time rather than batching insert statements or using bulk copy program (bcp), or from an application that processes one row at a time with an ANSI ( ) cursor rather than performing set-based SQL operations.
Of course, you have limited control of an application's performance if it lets users run queries that return huge result sets. Many development teams' design requirements specify adding the capability to let users browse all the rows in a table or specify developing a report generator that lets users forgo using selection criteria. As a result, reports could return thousands or tens of thousands of rows. This scenario places excessive demand on the client or server, depending on which resource handles the result sets. And resource consumption is higher for Web applications than other applications.
Returning large numbers of rows is not a scalable solution. Handling large numbers of returned rows is particularly tricky when you use ADO. The ADO Recordset object has properties that let users page through a recordset a certain number of records at a time (e.g., 10 pages of 10 records each for a 100-row recordset). This technique is common for Web applications, but it compromises scalability in ADO. If you want to use the PageSize and AbsolutePage properties to page through an ADO recordset, then SQL Server needs to return all the rows that qualify in the SQL statement before the paging can begin. In addition, SQL Server needs to maintain the entire recordset in the client memory or in the SQL Server tempdb database, depending on the setting of the Recordset object CursorLocation property.
If your application needs to return large result sets to the client, it's worth the effort to develop a method of returning only the number of rows that can fit in the screen on the user's computer monitor at one time. For more information, see the Microsoft article "Just-In-Time Record Set Creation" on the Micro-soft Developer Network (MSDN) Web site at http://msdn.microsoft.com/ library/periodic/period98/html/sql98j5.htm.
3. Watch ADO Recordset Objects and Concurrency
Another performance factor of using ADO Recordset objects is concurrency. You can specify that SQL Server resources accommodate different concurrency and table design choices. Using a timestamp column for row versions can be important when you use optimistic concurrency because SQL Server uses the tempdb database for worktables that it generates to manage key sets and concurrency. If you use optimistic concurrency on a table with a timestamp, you need to check only the key values and timestamp value to determine whether someone else updated the row while a user had it open. If you don't use a timestamp, SQL Server needs to keep a row version worktable in the tempdb database and check every value in every column to determine whether a conflict exists. (For more information, see the Microsoft article "Understanding Worktables Used by Server Side Cursors" at http://support.microsoft.com/support/kb/articles/Q168/6/78.asp.) Applications can use client disconnected or persistent ADO recordsets to help offload work from the SQL Server, but you'll need to account for the load on the client machine, too.
4. Examine Executing Statements
You use the ADO Recordset object for retrieving data, but what about UPDATE, INSERT, and DELETE activity or Data Definition Language (DDL) statements? You can use the ADO Command object for such operations, which usually don't return a result set. You can use a Command object as the source for a Recordset object, but you can also set the SQL statement, active Connection object, and other recordset options in the Recordset object properties, thus avoiding the additional overhead of creating the Command object.
For operations that don't return recordsets, I'm a big fan of Command objects because they offer distinct advantages. You can set up Command objects to return values and output parameters from stored-procedure execution and to accept input parameters. If you set the CommandType property to adCmdStoredProc, you'll get remote procedure calls (RPCs) from the client to SQL Server that bypass parameter translation through the data-access layers. These RPCs can be 20 percent to 30 percent faster than SQL Batch calls to SQL Server that are generated by the adCmdText value for the CommandType property. Screen 5 shows a SQL Profiler trace of a call to a stored procedure that inserts a row into the Authors table of the Pubs sample database. Note the message RPC Completed. Listing 4 shows the code for the ADO Command object used to execute the stored procedure.
5. Beware of Prepared Statements and Temporary Stored Procedures
You can use prepared statements to execute statements that SQL Server might reuse several times during a user session but not enough times to warrant your creating a stored procedure. You use the ADO Command object and the Prepared property to set up prepared statements, as Listing 5 shows. But beware of the possible pitfalls of using prepared statements. Applications might needlessly generate prepared statements. In SQL Server 6.5, the SQL Server ODBC driver, which creates, executes, and drops temporary stored procedures, supports prepared statements. The creation of temporary stored procedures can cause error 1105, which means you've run out of space on the system segment of the tempdb database. To avoid this problem, make sure that the Create temporary stored procedures for prepared SQL Statements check box in the ODBC data source is unchecked. Note that this selection is unavailable in SQL Server 7.0. The overhead of creating, executing, and dropping the temporary stored procedures is high, and often this procedure doesn't provide a benefit. For example, while investigating an application deadlock problem, I found that the drop statements on the temporary stored procedures were also deadlocking in the tempdb database. If you see excessive statements referring to objects called #ODBC# in a SQL Server trace, then the application might be needlessly using prepared statements. I once traced an application that had an average of 3000 temporary stored procedures in the tempdb database at a given time, all of which were used once and dropped.
SQL Server 7.0 implements prepared statements internally. A SQL Profiler trace shows prepared statements as extended stored procedures sp_prepare and sp_execute. By setting the Prepared property of an ADO Command object to true (you need to set the CommandType property to adCmdText), you tell the data-access layers and the SQL Server that you plan to reuse the query with varying parameter values during the application session. In my experience, this method works well if the prepared statement is actually reused, but it can present pitfalls, too. At a customer site, developers used an old version of ODBC and the prepare and execute functions in the ODBC API to write a C++ application. When the company upgraded to SQL Server 7.0, the application slowed down. A SQL Profiler trace showed sp_prepare and sp_execute calls where simple SELECT statements had existed before, which hampered communications between the client and SQL Server. SQL Server 7.0 ships with a utility in the MSSQL7\Binn folder called ODBCCMPT.EXE. It implements the SQL Server 6.5 prepare and execute behavior for individual applications. After I applied this fix and made sure that the ODBC data source wasn't using temporary stored procedures, the application's performance improved. To set an ODBC Data Source Name (DSN) to not create temporary stored procedures when it points to a SQL Server 7.0 server, you need to add the UseProcForPrepare value, set to 0, to the Registry entry for that DSN. (SQL Server Books Online—BOL—documents the ODBCCMPT.EXE utility.) Similar to the situation with temporary stored procedures in SQL Server 6.5, excessive calls to sp_prepare and sp_execute in a SQL Trace warrant further investigation if you think the prepared statements aren't being reused. Listing 5 shows the code for the ADO Command object to execute the stored procedure.
6. Observe, Adjust, Observe
Identifying and fixing performance problems is an iterative process of observing, adjusting, and observing. Don't just guess what's going on. Gather some empirical data, analyze it, and use it to identify the likely causes of the problem. Of course, it's best to examine performance issues early in the application development cycle, not after you've deployed the application. Don't overlook the powerful SQL Trace and SQL Profiler tools. Trace data from the SQL Server has led me to the root of performance problems within a few hours, and, in one case, without my looking at the systems but only at a SQL Trace file that was e-mailed to me. Look for slow operations in terms of duration and operations that repeat in staggering numbers. And use this article to uncover ADO programming practices that are bogging down your application. Adding hardware and adjusting the knobs in SQL Server, especially in SQL Server 7.0, usually has little effect on a poorly designed client application.