“A good plan violently executed now is better than a perfect plan executed next week.”
- George S. Patton
Baselines and subsequent comparison metrics make it easy to determine whether index modifications or code changes are truly improving performance on heavily used servers. Yet many DBAs (especially reluctant DBAs) regularly fail to set up baselines before they undertake performance tuning efforts. In my experience, that's commonly because there's no 'perfect' solution out there that can EASILY account for all of the variables (differences in workload, utilization, and execution) that crop up from one 'run' to the next when collecting performance metrics.
Aggregate Values Are Good Enough—And Trivial to Obtain
From a performance standpoint, though, aggregate values are good enough. In other words, if you can trace the number of operations executed in a given amount of time and then determine the average reads, writes, CPU utilization, and duration per operation, then when you make changes it's fair to expect decreases in whatever metric or performance aspect you're targeting.
And, happily, those metrics are trivial to obtain. To do so, here's what I do:
- Open up SQL Server Profiler. (Start > Programs > Microsoft SQL Server > Performance Tools > SQL Server Profiler.)
- Create a new Trace. (File > New Trace – then connect to the server you'd like to 'baseline.')
- In the Trace Properties dialog, give your trace a name (like 'baseline' or 'post index changes'), make sure the Standard (default) trace template is selected, and instruct the trace to be saved to file (anywhere will work).
WARNING: Profiler is an insanely powerful tool. It IS gradually being replaced by Extended Events (which is good news), but it still remains very simple and easy to use. With all of the power it provides, however, it IS possible to do 'dumb' things with this tool. Specifically, every time you enable a trace on a server, you're imposing some additional code/processing and overhead (commonly referred to as the 'watcher effect'). Generally, you can assume that using Profiler will add something on the order of 1-3% processor overhead, and it should be pretty 'light' on your disks when saving to file because Profiler efficiently buffers writes to disk. However, IF you 'go nuts' with the events you're watching/profiling (or grab an event that occurs multiple/tens/hundreds of times per query), it IS possible to set up a situation where Profiler quickly generates HUGE amounts of data. Likewise, I recommend that you NEVER use the 'Save to table' feature, just because it's too easy to accidentally save trace data to a table on the server you're profiling, meaning that you can create a very NASTY 'feedback loop' that negatively impacts performance.
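As a quick sanity check before you add yet another trace, it's easy to see what traces (including the system default trace) are already running on an instance. This little query against the sys.traces catalog view is just a convenience I'd suggest, not part of the baseline process itself:

SELECT id, [path], start_time, event_count
FROM sys.traces; -- one row per active trace on the instance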
- On the Events Selection tab, uncheck everything but the RPC:Completed and SQL:BatchCompleted events (meaning that you only want to watch for T-SQL batches and RPC (remotely executed stored procedure) calls).
- Then click Run.
- On busy servers, a flurry of queries/sprocs will fly by in the trace window. On less busy servers you'll be able to see every batch/RPC call that is executed on the server (well, technically: on the instance you're profiling).
- Let the trace run for a while—where a 'while' is whatever you think makes sense. On heavily used servers a 'while' can be as little as 5-10 minutes. Just remember though, that you'll want to run this same trace for ROUGHLY the same amount of time the next time you need to gather results. (Though, as we’ll see in a second, the actual amount of time really doesn't matter—too much.)
- Once the trace completes (i.e., after it's run long enough for you to decide to stop it), you can then either copy/paste the .trc file(s) to another SQL Server (other than production) where you can run some queries directly against the files (less ideal in my mind) OR you can save the results of your trace to another SQL Server into something like a 'test' or 'dba' or 'meddling' database somewhere. (And, for the record, you don't HAVE to save these results to a server other than your production server, but I HIGHLY recommend that you avoid saving ANYTHING to/on/against your production server that isn't production related. So, worst case scenario: If you HAVE to save results on your production server, at least create a new/temporary database OTHER than your production databases to save this transient data rather than letting it gum up anything in true production databases; a minimal sketch of what I mean follows this list.) And, to SAVE data from Profiler, just use the File > Save As > Trace Table option (and specify the server, database, and table name you'd like to save to).
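To make that concrete, here's a minimal sketch of the kind of scratch database I'm describing. The DBA name (and the baseline table name) are just placeholders of my own; use whatever naming suits your environment:

-- a throwaway 'scratch' database for transient tuning data (name is arbitrary)
CREATE DATABASE [DBA];
GO
-- then, in Profiler: File > Save As > Trace Table, and target
-- database: DBA, table: baseline (or whatever name you prefer)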
Crunching the Numbers
Once you've collected data, it's pretty easy to run some queries over it and generate averages and other metadata that let you either establish a baseline or compare the latest execution details against said baseline. To pull this off I use the following query against my trace data stored in a table:
WITH Totals AS (
	SELECT
		DATEDIFF(MINUTE, MIN(StartTime), MAX(StartTime)) TotalMinutes, -- trace window, approximated from first/last event (assumes at least one full minute)
		COUNT(*) TotalOperations, SUM(CPU) TotalCPU, SUM(Reads) TotalReads, SUM(Writes) TotalWrites, SUM(Duration) TotalDuration
	FROM <source_table, sysname, Totals>
	WHERE EventClass IN (10, 12) -- keep just RPC:Completed (10) and SQL:BatchCompleted (12) rows
)
SELECT
	TotalMinutes [Total Mins],
	TotalOperations [Total Ops.],
	CAST((CAST(TotalOperations as decimal(18,4)) / CAST(TotalMinutes as decimal(18,4))) as decimal(18,4)) [Ops / Min.],
	CAST((CAST(ISNULL(TotalCPU,0) as decimal(18,4)) / CAST(TotalOperations as decimal(18,4))) as decimal(18,4)) [CPU / Op],
	CAST((CAST(ISNULL(TotalReads,0) as decimal(18,4)) / CAST(TotalOperations as decimal(18,4))) as decimal(18,4)) [Reads / Op],
	CAST((CAST(ISNULL(TotalWrites,0) as decimal(18,4)) / CAST(TotalOperations as decimal(18,4))) as decimal(18,4)) [Writes / Op],
	CAST((CAST(ISNULL(TotalDuration,0) as decimal(18,4)) / CAST(TotalOperations as decimal(18,4))) as decimal(18,4)) [Duration / Op]
FROM Totals;
And note that the query above uses T-SQL template syntax to specify the name of the table you want to query (i.e., you'll have to specify your table's name in place of the <source_table, sysname, Totals> place-holder; in SSMS, Ctrl+Shift+M will prompt you to fill in template parameters).
Likewise, you don't HAVE to save Profiler trace output to SQL Server in order to query it—as you can query fn_trace_gettable('FilePath_and_Name_here.trc') instead—but I find that typically ends up being more work than just saving results to a table within SQL Server. (This function is also deprecated—but so too is Profiler.)
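For completeness, here's roughly what that looks like; the file path and target table name below are placeholders of my own, not anything the trace itself dictates:

-- load a .trc file into a table on a non-production server for number-crunching
SELECT *
INTO dbo.baseline -- lands in whichever scratch database you're connected to
FROM sys.fn_trace_gettable(N'C:\Traces\baseline.trc', DEFAULT); -- DEFAULT follows any rollover files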
Otherwise, once you've crunched your numbers, you'll get back a single row of output: total minutes traced, total operations, operations per minute, and the per-operation averages for CPU, reads, writes, and duration.
Making Sense of the Output
Overall, the key to making this baseline work is the idea that averages are a decent enough indicator of overall performance: if averages go up or down from one reading to the next, you're able to make sense of whether any recent changes were good or bad from a performance standpoint. In other words, the expectation is that if you add indexes or make code 'tweaks' you should see decreased averages, but if new features or functionality for your app are released and averages for reads or writes or duration 'shoot up,' then you know that something ugly was introduced.
To that end, the primary thing you'll want to compare from one trace to the next is the Operations/Minute value, because that's going to be a decent indicator of whether you're comparing apples to apples or apples and oranges from one execution to the next (based on different workload characteristics, increases/decreases in activity, and so on). For example, a 10-minute baseline that captured 5,000 operations (500 ops/minute) compares cleanly against a later 20-minute trace that captured 10,200 operations (510 ops/minute), even though the windows differ. If Operations/Minute are roughly within the same ballpark, then it should be safe to assume that the other metrics line up well enough that any serious spikes or jumps in reads, writes, CPU, or duration might be a problem, whereas any decent decreases in those values should be an indicator that your performance tuning efforts have paid off (because, all things being more or less equal, if you're seeing roughly the same number of executions per minute but fewer reads and lower duration, for example, then those operations are much more efficient).
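If you'd rather make those comparisons mechanical than eyeball saved result sets, one option (purely a sketch of my own; the BaselineHistory table is not something the trace produces) is to persist each run's single output row with a label and date, then review runs side by side:

CREATE TABLE dbo.BaselineHistory (
	Label sysname NOT NULL, -- e.g., 'baseline', 'post index changes'
	CapturedAt datetime NOT NULL DEFAULT GETDATE(),
	TotalMinutes int NOT NULL,
	TotalOperations int NOT NULL,
	OpsPerMinute decimal(18,4) NOT NULL,
	CpuPerOp decimal(18,4) NOT NULL,
	ReadsPerOp decimal(18,4) NOT NULL,
	WritesPerOp decimal(18,4) NOT NULL,
	DurationPerOp decimal(18,4) NOT NULL
);

-- INSERT the row produced by the crunching query after each trace, then:
SELECT * FROM dbo.BaselineHistory ORDER BY CapturedAt;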
In the end, though, the primary benefit of the approach I've outlined here is that data collection is pretty much trivial. And that matters, because once you've established a baseline, that baseline is only useful IF you're regularly comparing it against future/additional checks.