Here’s a query that every DBA should have in their toolbox.
The short (or seriously over-simplified) version of how this query works is pretty simple. Each time SQL Server goes to create a new execution plan, it takes note of which indexes are available (and suitable) to help it most efficiently execute the query in question. As part of this process of evaluating which indexes exist and are suitable for use in the generation of an execution plan, SQL Server also keeps tabs of columns (or combinations of columns) where no index exists but where, if one did, SQL Server could create a better plan. Or, in other words, as SQL Server is busy creating execution plans, it’s almost like it says: "Hmm. Sure would be NICE if there were an index across this column—or across these 3 columns ordered like so and with this other column included."
But, while SQL Server IS able to take note of these recommendations, it’s not allowed to CREATE indexes by itself (and that’s a VERY good thing). Instead, details about these index recommendations end up getting dropped into the sys.dm_db_missing_index* DMVs. These recommendations are also incorporated into the execution plans themselves. As you might suspect, that means that there are potentially gobs of scripts or options out there for querying and analyzing this information.
In my case, all I’ve done is correlate information about missing indexes against execution costs and the number of times an execution plan has been run (i.e., execution counts). Grabbing costs can be a tiny bit of a pain to extrapolate because it requires the use of XPATH (meaning that the query below is going to be slow and clunky on production systems—but it’s designed not to block or cause problems). In many ways, this is really just an extension of the technique I outlined previously on how to find your most expensive queries. The difference, though, of course is that this query limits output to potential problem areas that SQL Server has already identified.
-- based off of a combination of: -- Jason Strate's Excellent Missing Indexes queries to get the Execution Plans: -- http://www.jasonstrate.com/2010/12/can-you-dig-it-missing-indexes/ -- And my own technique of grabbing the costs and multiplying them -- by the Execution Count to get Aggregate costs: -- http://sqlmag.com/blog/performance-tip-find-your-most-expensive-queries -- TODO: need to figure out a way to -- a) get a list of indexes by impact, then -- b) cross-apply their plans and get costs. SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; GO WITH XMLNAMESPACES(DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan'), PlanMissingIndexes AS ( SELECT query_plan, usecounts FROM sys.dm_exec_cached_plans cp CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp WHERE qp.query_plan.exist('//MissingIndexes') = 1 ), MissingIndexes AS ( SELECT stmt_xml.value('(QueryPlan/MissingIndexes/MissingIndexGroup/MissingIndex/@Database)' , 'sysname') AS DatabaseName, stmt_xml.value('(QueryPlan/MissingIndexes/MissingIndexGroup/MissingIndex/@Schema)' , 'sysname') AS SchemaName, stmt_xml.value('(QueryPlan/MissingIndexes/MissingIndexGroup/MissingIndex/@Table)' , 'sysname') AS TableName, stmt_xml.value('(QueryPlan/MissingIndexes/MissingIndexGroup/@Impact)' , 'float') AS Impact, ISNULL(CAST(stmt_xml.value('(@StatementSubTreeCost)' , 'VARCHAR(128)') as float),0) AS Cost, pmi.usecounts UseCounts, STUFF((SELECT DISTINCT ', ' + c.value('(@Name)', 'sysname') FROM stmt_xml.nodes('//ColumnGroup') AS t(cg) CROSS APPLY cg.nodes('Column') AS r(c) WHERE cg.value('(@Usage)', 'sysname') = 'EQUALITY' FOR XML PATH('')), 1, 2, '') AS equality_columns ,STUFF((SELECT DISTINCT ', ' + c.value('(@Name)', 'sysname') FROM stmt_xml.nodes('//ColumnGroup') AS t(cg) CROSS APPLY cg.nodes('Column') AS r(c) WHERE cg.value('(@Usage)', 'sysname') = 'INEQUALITY' FOR XML PATH('')), 1, 2, '') AS inequality_columns ,STUFF((SELECT DISTINCT ', ' + c.value('(@Name)', 'sysname') FROM stmt_xml.nodes('//ColumnGroup') AS t(cg) CROSS APPLY cg.nodes('Column') AS r(c) WHERE cg.value('(@Usage)', 'sysname') = 'INCLUDE' FOR XML PATH('')), 1, 2, '') AS include_columns ,query_plan ,stmt_xml.value('(@StatementText)', 'varchar(4000)') AS sql_text FROM PlanMissingIndexes pmi CROSS APPLY query_plan.nodes('//StmtSimple') AS stmt(stmt_xml) WHERE stmt_xml.exist('QueryPlan/MissingIndexes') = 1 ) SELECT TOP 200 DatabaseName, SchemaName, TableName, equality_columns, inequality_columns, include_columns, usecounts, Cost, Cost * UseCounts [AggregateCost], Impact, query_plan FROM MissingIndexes ORDER BY Cost * usecounts DESC;
Of course, with everything in SQL Server, there are a few caveats.
The first and most important caveat is that you should never EVER, EVER, EVER just assume that you can take the output of this query and build a routine that’ll dynamically process the results and start giving SQL Server all of the indexes it thinks it needs. Such a thought is ingenious—but, as my favorite bumper sticker says: "Don’t believe everything you think." Or in other words:
Finally, an obvious limitation of the query above—or the approach it uses—is that it only works against queries that are in the cache. If you’ve got queries that are somehow called at high rates and are missing indexes, but are also being regularly evicted from the cache (i.e., recompiles, stats changes, etc.), then you can MISS those missing indexes with the query above. So, use the query above as a tool (frankly, a powerful one), but don’t assume that it’s the ONLY tool you can or should use.