The Pseudo-Parallel Approach to Purging Non-Partitioned Data

When you have related tables that need to be purged, particularly when foreign key constraints are involved, you typically need to run the purges for each table serially so that the data is deleted from the first table, then the second table, and so on, in the correct order. For small volumes of data, running purges serially is usually acceptable. However, for large volumes, you might need to break the work into pieces and purge the pieces concurrently (i.e., in parallel) where possible.

In Oracle, parallelizing operations is easy. SQL Server doesn't offer this capability, but you can run pseudo, or "fake," parallel operations. The pseudo-parallel approach works when you have only application-enforced foreign key relationships or very limited numbers of database-enforced foreign key relationships between the tables. In some systems, you might have a mix of both application-enforced and database enforced constraints, which would also allow you to use this approach.

The code in Listing 1 demonstrates the pseudo-parallel approach. (You can download this code by clicking the Download the Code Here button near the top of the page.) To run this code, you need to schedule multiple SQL Server jobs to run at the same time, with each job running a particular piece number.

With this design, you can greatly scale the purge process out rather than scaling it up—that is, you can spread the work across multiple threads of execution as opposed to throwing more and bigger hardware at the process. However, the pseudo-parallel approach has limited applicability due to the requirement that you must be able to split the DELETE operations into separate, unrelated, and preferably restartable processes. This approach doesn't work when there are significant numbers of foreign key relationships involved and the tables being purged don't stand alone without the presence of hard foreign key constraints, even though a given group of tables might all be part of a single database subsystem.

When you can deploy this method, though, you can achieve extremely high volume purges in a very short period of time. I've achieved delete throughput in the tens of millions of rows per hour with this design.

 

 

Web Listing 1: The Pseudo-Parallel Approach

CREATE PROCEDURE MyProc (@ParallelMode Bit = 0, @PieceNumber Tinyint = 0)

AS

SET NOCOUNT ON


IF (@ParallelMode = 0)

  BEGIN

    DELETE FROM Table1 WHERE SomeField = SomeValue

    DELETE FROM ChildTable WHERE SomeField = SomeValue

    DELETE FROM ParentTable WHERE SomeField = SomeValue

  END


IF (@ParallelMode = 1)

  BEGIN

    IF (@PieceNumber = 1)

      BEGIN

        DELETE FROM Table1 WHERE SomeField = SomeValue

      END


      IF (@PieceNumber = 2)

          BEGIN

            DELETE FROM ChildTable WHERE SomeField = SomeValue

            DELETE FROM ParentTable WHERE SomeField = SomeValue

          END

  END