BI projects benefit from aggregate and rank window functions
SQL Server 2005 introduces the notion of a window, which is a user-specified selection of rows that divides a result set into partitions.You can use these partitions to perform certain calculations in relation to the current row. The current row might be a summary row or contain row-ranking values. For this reason, there are two groups of functions that you can apply to windows: aggregate window functions and ranking window functions.You can use these new window functions in business intelligence (BI) projects.To demonstrate how to use them, I created a table called Employee, which Table 1 shows.You can download the T-SQL code that creates this table from the SQL Server Magazine Web site.
Aggregate Window Functions
You use the OVER clause to specify all window functions.This clause groups the result set of a query in partitions so that each row from a partition displays separately. For example, the OVER clause in Listing 1 specifies that the corresponding aggregate function (SUM) is a window function. In this case, the result set is partitioned into four groups by the department ID (dept_id). After that, the sum of salaries is calculated for each partition. Table 2 shows the results. Note that the main difference between using the OVER clause and GROUP BY clause is that the OVER clause displays each row of a partition separately, whereas the GROUP BY clause displays only one row for each group.
You can use aggregate window functions for more complex calculations. For example, you can use two columns in a table to build separate partitions, as the query in Listing 2 shows.This query has one partition for the values in the job column and another one for the values in the dept_id column. The former calculates the sum of salaries in relation to the different types of jobs.The latter calculates the average salary by department. Table 3 shows the results of the query in Listing 2.
Aggregate window functions can be especially useful when you use them in a subquery and subsequently apply the subquery in a comparison. For example, suppose you want to find the employee with the highest salary in each department. Listing 3 shows this query, and Table 4 shows the results. Using the MAX aggregate window function, the subquery in the FROM clause determines the maximum salary value (max_sal_dept) for each department. The query then compares each department member's salary against this value. As this example shows, the OVER clause lets you use the summary row at the same time as the detailed rows in each partition.
The query in Listing 4 finds the employee with the highest salary in each job type as well as the highest salary in each department.To do so, the query uses two columns (job and dept_id) from the employee table to build two partitions.The query uses the same aggregate window function but applies it separately to the two partitions.Two applications are necessary because the highest salary in each job has nothing in common with the highest salary in each department. Table 5 shows the results of the query in Listing 4.
Ranking Window Functions
SQL Server 2005's new ranking window functions return a ranking value for each row in a partition. T-SQL supports three such functions: RANK, DENSE_RANK, and ROW_NUMBER.
You can apply the RANK function to partitions to display the rank of the row in the result set. For example, the query in Listing 5 uses the RANK function to specify the rank of the three highest-paid employees in each department. As you can see by comparing the SELECT statements in Listing 4 and Listing 5, the only difference between the aggregate window functions and the RANK function is that the RANK function determines the ranking of rows within a window partition.
The RANK function uses logical aggregation—that is, when two or more rows in a result set have the same value in the ordering column, they'll have the same rank.The subsequent row will have a rank that is one plus the number of ranks that precede the row. For example, in Table 6, there are two employees (Austin and Peters) in department 20 with the same salary. Therefore, these two employees have the same rank (1). The rank of the other employee (Kimes) from that department is 3. Because of logical aggregation, the RANK function displays numbering gaps when two or more rows have the same rank. If you don't want to display row rankings with gaps, you have to use the DENSE_RANK function. The query in Listing 6 shows how to use this function, and Table 7 shows the results.The difference between the DENSE_RANK and RANK functions can be seen in the row that displays the data for the employee named Kimes. The dept_dense column in Table 7 ranks Kimes as 2 (as opposed to 3 in the dept_rank column in Table 6), so the gap has been eliminated.
The query in Listing 6 demonstrates how to use not only the DENSE_RANK function but also the ROW_NUMBER function.The ROW_NUMBER function returns the sequential number of a row within a result set, starting at 1 for the first row. As callout A shows, the ROW_NUMBER function can use an OVER clause that doesn't include the PARTITION option. In that case, the entire table is treated as one partition. Note that the value of 9 in the dept_row# column is missing. That row contains information for the employee named May (employee_id 1119 in Table 1), who works in department 30. This employee's information isn't included because it doesn't meet the condition in the WHERE clause of the SELECT statement (i.e., the employee isn't one of the three highest paid employees in that department).
A Standardized Solution
The introduction of the aggregate and ranking window functions in SQL Server 2005 isn't an afterthought. Microsoft implemented them to adhere to the SQL:1999 standard. IBM DB2 and Oracle databases also implement these functions, which means you can use them with any of these three database systems.