Editor's Note: Send your T-SQL questions to SQL Server MVP Itzik Ben-Gan at firstname.lastname@example.org.
I want to write a GROUP BY query against the Sales table in the Pubs database that returns the monthly sales total and the grand sales total in the same result. I used an expression both in the SELECT list and in the GROUP BY clause that formats the order date stored in the ord_date column as yyyy_mm (e.g., 2001_06 for June 2001). I also used the ROLLUP option to retrieve another row that holds the grand total. When I ran the query that Listing 1 shows, the statement executed successfully. However, the ROLLUP option, which generated the additional row containing the grand total, placed a NULL value in the YearMonth result column. To specify the word Total (instead of NULL) as the YearMonth value for the row that the ROLLUP option generated, I revised the query by using a CASE expression and the GROUPING() function in the SELECT list, as the query in Listing 2 shows. Typically, the GROUPING() function returns 1 when the ROLLUP option generates the aggregation and 0 when the row is a base aggregation that the GROUP BY option generates, so I expected that the revision would be fairly straightforward. However, Listing 2's query failed. How can I revise Listing 2's query to get the results I want?
First, let's look at two simple queries to explain why Listing 2's statement returns an error, then build the query you want. Consider the query against the Pubs database's Sales table that Listing 3 shows. The query, which performs a GROUP BY on the base ord_num column and concatenates the letter A to the ord_num column in the SELECT list, runs successfully. Note that the query specifies the ord_num column alone in the GROUP BY clause and is part of an expression in the SELECT list. However, Listing 4 shows a query that fails, even though the query is similar to Listing 3's query. Listing 4's statement generates the following error message:
<i>Column 'Sales.ord_num' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.</i>
The difference between the two queries is apparent. The query in Listing 3 specifies the ord_num base column alone in the GROUP BY clause, but the base column is part of a calculation in Listing 4's GROUP BY clause. T-SQL doesn't support embedding a non-base-column expression from the GROUP BY clause in a more complex expression in the SELECT list. You can use a derived table to circumvent this restriction, as Listing 5 shows. Now, the original expression in Listing 4's GROUP BY clauseâ€”LEFT(ord_num, 1)â€”is the base column lord_num in Listing 5's derived table, a subtle difference that lets Listing 5's query run successfully. To repair Listing 2's query, you can use the same derived-table trick, which Listing 6, page 14, illustrates. The T derived table generates the YearMonth result column, which the outer query considers a base column. Listing 5 and Listing 6 show that T-SQL lets you specify a base column in the GROUP BY clause and embed it in a more complex expression in the SELECT list. Figure 1, page 14, shows Listing 6's output.