125370.zip

My friend Roy Harvey occasionally sends me interesting T-SQL challenges because he knows I enjoy them. Roy recently challenged me to write T-SQL code to return the Academy Award’s Best Picture winner, using the Academy’s new voting rules for 2010. In this article I describe the logic behind picking the winner, I provide sample data and desired results, and I include my own solution. As always, I urge you to come up with your own solution before looking at mine.

### The Rules

This year the Academy changed the rules from previous years for determining the Best Picture award—and the new rules are far from straightforward. The new rules are summarized in the Los Angeles Times article “Academy will use a new method to tabulate best-picture ballots”  and the BBC News article “Oscars 2010: Best picture voting changes explained.” My understanding of the rules based on these articles follows. Note that the articles fail to address a couple of scenarios, but for the sake of Roy’s challenge I just picked certain rules. Of course, it’s important as we implement these rules in T-SQL that we don’t leave any room for ambiguity in the solution and that we address even improbable cases.

In previous years there were five nominees for the Best Picture award, and the winner was simply the one that had the highest percent of the votes. Now there are 10 nominees, and the rules are far more complicated.

According to the new procedure, voters complete ballots on which they rank the nominated movies from 1 to 10—although they don’t have to fill all 10 slots, and many voters probably don’t.

The process the Academy uses to determine the winner is quite involved. To start, the Academy arranges the voters’ first choice in piles by movie. Note that the LA Times and BBC articles don’t indicate what happens if some movies are left out in the first round due to zero votes as voters’ first choice—perhaps because this scenario is very unlikely with 6,000 Academy members voting, or because the implication is obvious. But in order not to leave any room for doubt, let’s assume that in such a case those movies are removed from contention.

Back to the existing piles, if there’s a movie with more than 50 percent of the votes (which is unlikely in the first round), it’s declared the winner. Otherwise, the movie (or movies in case of ties) with the smallest pile is removed from contention. The votes associated with the removed movies are distributed between the remaining movies based on those voters’ next applicable vote (i.e., next vote on the ballot for a movie that remains in contention). If there’s no next applicable vote on the ballot, it isn’t assigned to a remaining movie.

This process is then repeated until one movie is left with more than 50 percent of the votes. That movie is declared the winner of the Best Picture award.

Note that the LA Times and BBC articles don’t discuss a scenario in which all the remaining movies in contention are tied, probably because this situation is very unlikely. But let’s assume that if such a situation arises, our solution is to stop the process, notify that all the remaining movies are tied, and announce those movies as winners.

### Sample Data and Desired Results

Use the code in Listing 1 to create the Movies table and the Votes table, and fill the Movies table with 10 movies (represented by the letters A through J).

Next, I provide three sets of sample data and describe the desired results your code should produce for each. Use the following code to populate the Votes table with Sample Data 1:

`TRUNCATE TABLE dbo.Votes;INSERT INTO dbo.Votes(voter, rnk, movie) VALUES  (1, 1, 'A'),(2, 1, 'A'),(3, 1, 'A'),(4, 1, 'B'),(5, 1, 'B'),  (6, 1, 'C'),(6, 2, 'D'),(6, 3, 'B'),  (7, 1, 'D'),(7, 2, 'E'),(7, 3, 'B'),  (8, 1, 'A'),(9, 1, 'B'),(10, 1, 'C'),(11, 1, 'E');`

The desired result from your code for Sample Data 1 is Winner is B with 60.00 percent after 3 iterations.

For Sample Data 2, just clear the Votes table to test how your code behaves when the input is an empty set:

`TRUNCATE TABLE dbo.Votes;`

The desired output is:

`Empty input or unknown error.`

For Sample Data 3, use the following code:

`TRUNCATE TABLE dbo.Votes;INSERT INTO dbo.Votes(voter, rnk, movie) VALUES  (1, 1, 'A'),(2, 1, 'A'),(3, 1, 'A'),(4, 1, 'B'),(5, 1, 'B'),  (6, 1, 'C'),(6, 2, 'B');`

In this case, the desired output is:

`Ties between the following movies after 2 iterations:movie-----AB`

### My Solution

Listing 2 contains my complete solution. The code in callout A declares a few scalar local variables, as well as table variables. In each round the scalar variables @top_pct, @top_movie, and @bottom_pct hold the percent associated with the top movie, the top movie, and the percent associated with the bottom movie, respectively. These variables allow you to determine whether a movie passed the 50 percent threshold, as well as whether all the remaining movies are tied. The variable @iteration keeps track of the number of iterations, to report how many iterations it took to determine the result.

As for the table variables, @CurrentVotes keeps track of the current applicable votes in each round. @BottomVotes holds the votes that are removed from contention in each round (associated with the smallest pile or piles). @MovieTotals holds a row for each movie with the calculated percent of the votes in each round.

The code in callout B processes the votes in the first round. The first statement in callout B inserts the voters’ first choice into @CurrentVotes. The second statement queries @CurrentVotes, calculates the percent of the votes for each movie, and inserts the results into @MovieTotals. Note the interesting use of the expression 1. * COUNT(*) / SUM(COUNT(*)) OVER() to calculate the percent of the votes per movie. Because the query groups the rows by movie, the COUNT(*) expressions are calculated per movie. But the SUM function is calculated in the context of a window defined by the OVER clause. An OVER clause with empty parentheses represents the entire result set, and therefore SUM(COUNT(*)) OVER() means the total of the counts for all groups. The use of 1. (read “one dot”) is to force implicit conversion of the following integer count to a numeric type; otherwise you get integer division. So the expression calculates the percent of the movie count of votes out of the total counts for all movies.

The third statement in callout B is an assignment SELECT query against @MovieTotals designed to assign the scalar local variables @top_pct, @top_movie, and @bottom_pct with the percent of the votes associated with the top movie, the top movie, and the percent associated with the bottom movie, respectively. The assignment of the minimum and maximum percents to @top_pct and @bottom_pct are straightforward. The assignment of the top movie to @top_movie is less trivial. The expression used for this purpose is RIGHT( MAX(CAST(pct AS CHAR(10)) + movie) ,  1). The idea is to convert the percent to a fixed-sized character string and concatenate the movie, then apply the MAX aggregate to get the concatenated percent and movie associated with the movie with the highest percent, then use the RIGHT function to extract the rightmost character representing the movie. In short, the expression returns the movie with the highest percent of the votes. All aggregates are calculated using one pass over the table (never mind that the table is tiny in this case). Finally, the last statement in callout B initializes the @iteration variable with 1.

The code in callout C enters a loop that iterates when a resolution isn’t determined, and in each iteration, removes the votes of the bottom movie(s) and redistributes those voters’ next applicable choice. The loop condition is to keep iterating when a clear winner isn’t found (i.e., the top percent is less than or equal to 50 percent), as well as when all the remaining movies aren’t tied (i.e., top percent is different than bottom percent).

The first statement in the loop’s body deletes all votes from @CurrentVotes that are associated with the movie(s) with the bottom percent. The DELETE statement uses the OUTPUT clause to insert the deleted rows into @BottomVotes.

The second statement in the loop’s body distributes the next applicable choice of the removed voters among the movies that remain in contention. The query joins Votes and @BottomVotes to identify the removed voters’ choices ranked after the removed votes, and then in the WHERE clause filters only the applicable ones (i.e., those associated with movies that remain in contention). Out of the applicable votes the query then uses a combination of TOP (1) WITH TIES and a ROW_NUMBER expression, partitioned by voter and ordered by rnk, in the query’s ORDER BY clause, to return for each voter only the first of the remaining applicable choices. This technique isn’t trivial—you might want to spend some time to figure out its logic.

The code in callout D processes the votes in the current round. It starts by clearing the @MovieTotals table, then issues identical INSERT and SELECT statements to the ones used in callout B to calculate the percent per movie and obtain the top percent, top movie, and bottom percent. The code in callout D then increments the iteration counter and closes the loop body.

The code in callout E announces the results. If the top percent is greater than 50 percent, the code announces the winner. If the top percent is equal to the bottom percent, the code indicates that all remaining movies are tied and presents those movies. Otherwise, the code announces that either the input was empty or some unknown error occurred.