Depleting Quantities

I’ll start with a challenge involving the computation of running totals that need to be depleted when a certain condition is met. Microsoft Data Platform MVP Geri Reshef presented the original challenge; later, Sharon Rimer of Naya Technologies presented a variation of the challenge.

The sample data for this challenge involves a table called Transactions, which you create and populate using the following code:

`                              SET NOCOUNT ON;                              USE tempdb;                                                            DROP TABLE IF EXISTS dbo.Transactions;                              GO                              CREATE TABLE dbo.Transactions                              (                                txid INT NOT NULL CONSTRAINT PK_Transactions PRIMARY KEY,                                qty  INT NOT NULL                              );                              GO                                                            TRUNCATE TABLE dbo.Transactions;                                                            INSERT INTO dbo.Transactions(txid, qty)                                VALUES(1,2),(2,5),(3,4),(4,1),(5,10),(6,3),                                      (7,1),(8,2),(9,1),(10,2),(11,1),(12,9);`

The transactions keep adding quantities of some item to a container based on txid ordering. As soon as the cumulative quantity exceeds the container’s capacity (given as input), the container needs to be depleted. Your solution needs to show the state of the container (total quantity) after every transaction, with 0 as the total quantity after depletion, as well as the depletion quantity when relevant. Here’s the desired result for the given sample data and an input container capacity of 5:

`                              txid        qty         totalqty    depletionqty                              ----------- ----------- ----------- ------------                              1           2           2           0                              2           5           0           7                              3           4           4           0                              4           1           5           0                              5           10          0           15                              6           3           3           0                              7           1           4           0                              8           2           0           6                              9           1           1           0                              10          2           3           0                              11          1           4           0                              12          9           0           13`

Solution Based on Cursor

It’s common for people to try and solve such tasks using recursive queries. Such solutions can be elegant, but aren’t very efficient. People also solve such tasks using a method known as “quirky update,” which is very efficient, but not guaranteed since it relies on physical processing order. So far, I haven’t found efficient, guaranteed, set-based solutions for this task, and instead have been reluctantly using iterative solutions (T-SQL- or CLR-based).

Here’s an example for a simple iterative T-SQL solution using a cursor:

`                              SET NOCOUNT ON;                                                            DECLARE @maxallowedqty AS INT = 5;                                                            DECLARE @C AS CURSOR, @txid AS INT, @qty AS INT, @totalqty AS INT = 0, @depletionqty AS INT = 0;                                                            DECLARE @Result AS TABLE                              (                                txid         INT NOT NULL PRIMARY KEY,                                qty          INT NOT NULL,                                totalqty     INT NOT NULL,                                depletionqty INT NOT NULL                              );                                                            SET @C = CURSOR FORWARD_ONLY STATIC READ_ONLY FOR                                SELECT txid, qty                                FROM dbo.Transactions                                ORDER BY txid;                                                            OPEN @C;                                                            FETCH NEXT FROM @C INTO @txid, @qty;                                                            WHILE @@FETCH_STATUS = 0                              BEGIN                                SELECT @totalqty += @qty, @depletionqty = 0;                                                              IF @totalqty > @maxallowedqty                                BEGIN                                  SET @depletionqty = @totalqty;                                  SET @totalqty = 0;                                END;                                                              INSERT INTO @Result(txid, qty, totalqty, depletionqty)                                  VALUES(@txid, @qty, @totalqty, @depletionqty);                                                              FETCH NEXT FROM @C INTO @txid, @qty;                              END;                                                            SELECT txid, qty, totalqty, depletionqty                              FROM @Result                              ORDER BY txid;`

The code defines a cursor variable and uses it to fetch the transactions one at a time in chronological order. It accumulates the quantities in a variable called @totalqty. After every row fetched, the code checks if the cumulative quantity exceeds the container capacity, in which case it sets a variable called @depletionqty to the current @totalqty value, and then zeroes the current @totalqty value. The code then writes the current transaction info (txid and qty) along with the current @totalqty and @depletionqty values to a table variable. Once done iterating through all transactions, the code queries the table variable to produce the desired result.

Solution Using RESET WHEN (Not Supported as of SQL Server 2017)

The downsides of using iterative solutions are well known. The question is always whether a descent set-based alternative exists. So far, I haven’t found one myself for the task at hand with the existing tools that we have in T-SQL, and would love to see one created some day. As mentioned, Teradata supports a windowing clause called RESET WHEN that resets the window partition when a certain condition is met. The beauty in this clause is that the condition can use a window function, so you can ask about what accumulated until the previous row. With our task you would reset the window partition when the sum of the quantities from the beginning of the partition and until the previous row exceeds the input container limit, like so:

(Remember, this code currently isn’t supported in SQL Server.)

`                              DECLARE @maxallowedqty AS INT = 5;                                                            SELECT txid, qty,                                SUM(qty) OVER(ORDER BY txid                                              RESET WHEN                                                -- reset window partition when                                                 -- running sum until previous row > @maxallowedqty                                                SUM(qty) OVER(ORDER BY txid                                                              ROWS BETWEEN UNBOUNDED PRECEDING                                                                       AND 1 PRECEDING)                                                  > @maxallowedqty                                              ROWS UNBOUNDED PRECEDING) AS runsum                              FROM dbo.Transactions;`

Had this code been supported, it would have produced the following output:

`                              txid        qty         runsum                                 ----------- ----------- -----------                              1           2           2                                        2           5           7                                        -----------------------------------                              3           4           4                                        4           1           5                                        5           10          15                                        -----------------------------------                              6           3           3                                        7           1           4                                        8           2           6                                        -----------------------------------                              9           1           1                                        10          2           3                                        11          1           4                                        12          9           13          `

As you can see, the window partition is reset after transaction IDs 2, 5 and 8.

To produce the final desired result, you set the total container quantity (call it totalqty) to zero when the running sum exceeds the container limit, and to the running sum otherwise. You then compute the depletion quantity (call it depletionqty) as the running sum minus the total quantity. Here’s the complete solution’s code:

`                              DECLARE @maxallowedqty AS INT = 5;                                                            WITH C AS                              (                                SELECT *,                                  SUM(qty) OVER(ORDER BY txid                                                RESET WHEN                                                  SUM(qty) OVER(ORDER BY txid                                                                ROWS BETWEEN UNBOUNDED PRECEDING                                                                         AND 1 PRECEDING)                                                    > @maxallowedqty                                                ROWS UNBOUNDED PRECEDING) AS runsum                                FROM dbo.Transactions                              )                              SELECT txid, qty, totalqty, runsum - totalqty AS depletionqty                              FROM C                                CROSS APPLY                                  ( VALUES( CASE WHEN runsum > @maxallowedqty THEN 0 ELSE runsum END ) )                                    AS A(totalqty);`

As you can see, the solution is simple, concise and elegant!

Recently, Sharon Rimer of Naya Technologies introduced a variation of this challenge based on one of the company's customer cases where the desired result was to compute how many times the container exceeded the input limit. To handle this need, you would keep the same definition for the CTE C, and use the following outer query to compute the count:

`                              SELECT COUNT( CASE WHEN runsum > @maxallowedqty THEN 1 END ) AS timesexceeded                              FROM C;`

The complete solution would look like this:

`                              WITH C AS                              (                                SELECT *,                                  SUM(qty) OVER(ORDER BY txid                                                RESET WHEN                                                  SUM(qty) OVER(ORDER BY txid                                                                ROWS BETWEEN UNBOUNDED PRECEDING                                                                         AND 1 PRECEDING)                                                    > @maxallowedqty                                                ROWS UNBOUNDED PRECEDING) AS runsum                                FROM dbo.Transactions                              )                              SELECT COUNT( CASE WHEN runsum > @maxallowedqty THEN 1 END ) AS timesexceeded                              FROM C;`

Again, short and elegant.

Complex Islands

There are many other challenges that perhaps do currently have a reasonable T-SQL set-based solution, but that can be solved more easily and elegantly with the RESET WHEN clause. A good example is handling complex islands tasks where you need to define a new island whenever a condition that compares something from the current row and something from the previous row is met. To demonstrate such a challenge, I’ll use a table called Stocks, which you create and populate by running the following code:

`                              SET NOCOUNT ON;                              USE tempdb;                                                            DROP TABLE IF EXISTS dbo.StockRates;                              GO                              CREATE TABLE dbo.StockRates                              (                                stockid  INT  NOT NULL,                                dt       DATE NOT NULL,                                val      INT  NOT NULL,                                CONSTRAINT PK_StockRates PRIMARY KEY(stockid, dt)                              );                              GO                                                            INSERT INTO dbo.StockRates VALUES                                (1, '2017-08-01', 13),                                (1, '2017-08-02', 14),                                (1, '2017-08-03', 17),                                (1, '2017-08-04', 40),                                (1, '2017-08-05', 45),                                (1, '2017-08-06', 52),                                (1, '2017-08-07', 56),                                (1, '2017-08-08', 60),                                (1, '2017-08-09', 70),                                (1, '2017-08-10', 30),                                (1, '2017-08-11', 29),                                (1, '2017-08-12', 35),                                (1, '2017-08-13', 40),                                (1, '2017-08-14', 45),                                (1, '2017-08-15', 60),                                (1, '2017-08-16', 60),                                (1, '2017-08-17', 55),                                (1, '2017-08-18', 60),                                (1, '2017-08-19', 20),                                (1, '2017-08-20', 15),                                (1, '2017-08-21', 20),                                (1, '2017-08-22', 30),                                (1, '2017-08-23', 40),                                (1, '2017-08-24', 20),                                (1, '2017-08-25', 60),                                (1, '2017-08-26', 80),                                (1, '2017-08-27', 70),                                (1, '2017-08-28', 70),                                (1, '2017-08-29', 40),                                (1, '2017-08-30', 30),                                (1, '2017-08-31', 10),                                (2, '2017-08-01', 3),                                (2, '2017-08-02', 4),                                (2, '2017-08-03', 7),                                (2, '2017-08-04', 30),                                (2, '2017-08-05', 35),                                (2, '2017-08-06', 42),                                (2, '2017-08-07', 46),                                (2, '2017-08-08', 50),                                (2, '2017-08-09', 60),                                (2, '2017-08-10', 20),                                (2, '2017-08-11', 19),                                (2, '2017-08-12', 25),                                (2, '2017-08-13', 30),                                (2, '2017-08-14', 35),                                (2, '2017-08-15', 50),                                (2, '2017-08-16', 50),                                (2, '2017-08-17', 45),                                (2, '2017-08-18', 50),                                (2, '2017-08-19', 10),                                (2, '2017-08-20', 5),                                (2, '2017-08-21', 10),                                (2, '2017-08-22', 20),                                (2, '2017-08-23', 30),                                (2, '2017-08-24', 10),                                (2, '2017-08-25', 50),                                (2, '2017-08-26', 70),                                (2, '2017-08-27', 60),                                (2, '2017-08-28', 60),                                (2, '2017-08-29', 30),                                (2, '2017-08-30', 20),                                (2, '2017-08-31', 1);`

In this table, you keep track of daily closing stock rates. Say you need to identify periods (islands) where the stock value is greater than or equal to 50, and for each period you need to show when it started and ended, as well as what the maximum stock rate was during that period. The catch is that you need to tolerate (ignore) gaps of up to 6 days. Here’s the desired result for the given sample data:

`                              stockid     startdate  enddate    maxvalue                              ----------- ---------- ---------- -----------                              1           2017-08-06 2017-08-18 70                              1           2017-08-25 2017-08-28 80                              2           2017-08-08 2017-08-18 60                              2           2017-08-25 2017-08-28 70`

Observe, for example, that for the first island for stock ID 1 we ignored the gap between Aug. 9, 2017, and Aug. 15, 2017, since it wasn’t longer than 6 days, but we didn’t ignore the gap between Aug. 18, 2017, and Aug. 25, 2017, since it was 7-days long.

Currently Supported T-SQL Solution

As mentioned, there are currently supported set-based solutions for islands challenges like this one, but they are longer and more complex than how you would handle such tasks with the RESET WHEN clause. With one of the currently supported solutions, as the first step you compute a flag (call it isstart) that you set to 0 when it’s not the beginning of an island by comparing something from the current row and with something from the previous row (obtained with the LAG function). Otherwise, you set the flag to 1. In our case, after filtering only rows where the stock value is greater than or equal to 50, you set the flag to 0 when the difference between the previous date and the current date is less than 7 days; otherwise, you set the flag to 1. Here’s the code that implements this step:

`                              SELECT stockid, dt, val,                                CASE                                  WHEN DATEDIFF(day, LAG(dt) OVER(PARTITION BY stockid ORDER BY dt), dt)                                         < 7                                    THEN 0                                  ELSE 1                                END AS isstart                              FROM dbo.StockRates                              WHERE val >= 50;`

This code generates the following output:

`                              stockid     dt         val         isstart                              ----------- ---------- ----------- -----------                              1           2017-08-06 52          1                              1           2017-08-07 56          0                              1           2017-08-08 60          0                              1           2017-08-09 70          0                              1           2017-08-15 60          0                              1           2017-08-16 60          0                              1           2017-08-17 55          0                              1           2017-08-18 60          0                              1           2017-08-25 60          1                              1           2017-08-26 80          0                              1           2017-08-27 70          0                              1           2017-08-28 70          0                              2           2017-08-08 50          1                              2           2017-08-09 60          0                              2           2017-08-15 50          0                              2           2017-08-16 50          0                              2           2017-08-18 50          0                              2           2017-08-25 50          1                              2           2017-08-26 70          0                              2           2017-08-27 60          0                              2           2017-08-28 60          0`

As a second step, you produce an island identifier by computing the running sum of the isstart flag. Finally, you group the data by stockid and isstart, and return the start and end dates of the island, plus the maximum stock rate during the period. Here’s the complete solution’s code:

`                              WITH C1 AS                              (                                SELECT *,                                  CASE                                    WHEN DATEDIFF(day, LAG(dt) OVER(PARTITION BY stockid ORDER BY dt), dt)                                           < 7                                      THEN 0                                    ELSE 1                                  END AS isstart                                FROM dbo.StockRates                                WHERE val >= 50                              ),                              C2 AS                              (                                SELECT *,                                  SUM(isstart) OVER(PARTITION BY stockid ORDER BY dt                                                    ROWS UNBOUNDED PRECEDING) AS grp                                FROM C1                              )                              SELECT stockid,                                MIN(dt) AS startdate,                                MAX(dt) AS enddate,                                MAX(val) as maxvalue                              FROM C2                              GROUP BY stockid, grp                              ORDER BY stockid, startdate;`

Solution Using RESET WHEN (Not Supported as of SQL Server 2017)

Using the RESET WHEN clause, it would have been easier to solve the task since you can simply reset the window partition when the condition for starting a new island is met. You could then use the minimum date in the partition is the island identifier.

Here’s an example of how you would compute the island identifier (call it grp):

`                              SELECT stockid, dt, val,                                MIN(dt) OVER(PARTITION BY stockid                                             ORDER BY dt                                             RESET WHEN DATEDIFF(day,                                                MIN(dt) OVER(                                                 PARTITION BY stockid ORDER BY dt                                                 ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING),                                               dt) >= 7                                             ROWS UNBOUNDED PRECEDING) AS grp                              FROM dbo.Stocks                              WHERE val >= 50;`

Here the code uses the MIN window function to obtain the date from the previous row by using the window frame extent ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING. Alternatively, you could get the date from the previous row with the LAG function, like so (showing only the alternative RESET WHEN clause):

`                              RESET WHEN DATEDIFF(day,                                 LAG(dt) OVER(PARTITION BY stockid ORDER BY dt),                                dt) >= 7`

Here’s what the output of this step would be, showing the computed group identifier for each island:

`                              stockid     dt         val         grp                              ----------- ---------- ----------- -----------                              1           2017-08-06 52          2017-08-06                              1           2017-08-07 56          2017-08-06                              1           2017-08-08 60          2017-08-06                              1           2017-08-09 70          2017-08-06                              1           2017-08-15 60          2017-08-06                              1           2017-08-16 60          2017-08-06                              1           2017-08-17 55          2017-08-06                              1           2017-08-18 60          2017-08-06                              1           2017-08-25 60          2017-08-25                              1           2017-08-26 80          2017-08-25                              1           2017-08-27 70          2017-08-25                              1           2017-08-28 70          2017-08-25                              2           2017-08-08 50          2017-08-08                              2           2017-08-09 60          2017-08-08                              2           2017-08-15 50          2017-08-08                              2           2017-08-16 50          2017-08-08                              2           2017-08-18 50          2017-08-08                              2           2017-08-25 50          2017-08-25                              2           2017-08-26 70          2017-08-25                              2           2017-08-27 60          2017-08-25                              2           2017-08-28 60          2017-08-25`

Then you would need just one more step to group this data by stockid and grp, and return the beginning and end of each period, plus the maximum stock rate within the period, like so:

`                              WITH C AS                              (                                SELECT *,                                  MIN(dt) OVER(PARTITION BY stockid                                               ORDER BY dt                                               RESET WHEN DATEDIFF(day,                                                  LAG(dt) OVER(PARTITION BY stockid ORDER BY dt),                                                 dt) >= 7                                               ROWS UNBOUNDED PRECEDING) AS grp                                FROM dbo.Stocks                                WHERE val >= 50                              )                              SELECT stockid,                                grp AS startdate,                                MAX(dt) AS enddate,                                MAX(val) as maxvalue                              FROM C                              GROUP BY stockid, grp                              ORDER BY stockid, grp;`

As you can see, this solution is shorter and simpler than the currently supported one.

Conclusion

Window functions are profound; there’s no question about it. Wouldn’t you like to see them becoming even more powerful? The RESET WHEN clause, if implemented in T-SQL, will doubtless increase its utility and enable us to replace existing iterative solutions with elegant and more efficient set-based ones. If you find that the addition of this feature would be beneficial to you, be sure to add your vote to the connect item requesting it.