The Pseudo-Parallel Approach to Purging Non-Partitioned Data

When you have related tables that need to be purged, particularly when foreign key constraints are involved, you typically need to run the purges for each table serially so that the data is deleted from the first table, then the second table, and so on, in the correct order. For small volumes of data, running purges serially is usually acceptable. However, for large volumes, you might need to break the work into pieces and purge the pieces concurrently (i.e., in parallel) where possible.

In Oracle, parallelizing operations is easy. SQL Server doesn't offer this capability, but you can run pseudo, or "fake," parallel operations. The pseudo-parallel approach works when you have only application-enforced foreign key relationships or very limited numbers of database-enforced foreign key relationships between the tables. In some systems, you might have a mix of both application-enforced and database enforced constraints, which would also allow you to use this approach.

The code in Listing 1 demonstrates the pseudo-parallel approach. (You can download this code by clicking the Download the Code Here button near the top of the page.) To run this code, you need to schedule multiple SQL Server jobs to run at the same time, with each job running a particular piece number.

With this design, you can greatly scale the purge process out rather than scaling it up—that is, you can spread the work across multiple threads of execution as opposed to throwing more and bigger hardware at the process. However, the pseudo-parallel approach has limited applicability due to the requirement that you must be able to split the DELETE operations into separate, unrelated, and preferably restartable processes. This approach doesn't work when there are significant numbers of foreign key relationships involved and the tables being purged don't stand alone without the presence of hard foreign key constraints, even though a given group of tables might all be part of a single database subsystem.

When you can deploy this method, though, you can achieve extremely high volume purges in a very short period of time. I've achieved delete throughput in the tens of millions of rows per hour with this design.



Web Listing 1: The Pseudo-Parallel Approach

CREATE PROCEDURE MyProc (@ParallelMode Bit = 0, @PieceNumber Tinyint = 0)                              AS                              SET NOCOUNT ON                              IF (@ParallelMode = 0)                                BEGIN                                  DELETE FROM Table1 WHERE SomeField = SomeValue                                  DELETE FROM ChildTable WHERE SomeField = SomeValue                                  DELETE FROM ParentTable WHERE SomeField = SomeValue                                END                              IF (@ParallelMode = 1)                                BEGIN                                  IF (@PieceNumber = 1)                                    BEGIN                                      DELETE FROM Table1 WHERE SomeField = SomeValue                                    END                                    IF (@PieceNumber = 2)                                        BEGIN                                          DELETE FROM ChildTable WHERE SomeField = SomeValue                                          DELETE FROM ParentTable WHERE SomeField = SomeValue                                        END                                END