Using Table Partitions to Archive Old Data in OLTP Environments

Save on costs and increase query performance

What is in this article?:

  • Using Table Partitions to Archive Old Data in OLTP Environments

SQL Server table partitioning is most often used to implement a sliding window. For that purpose, the table is most frequently partitioned on a date column. This allows for efficient access to the data in a given date range. It also lets you drop aged data from the table by just moving the partition as a whole to another table using the SWITCH option. A lot has been written about this scenario, so we won’t go into additional details here. Instead, we’ll explain how to use the partitioning feature when a date column isn’t the right choice.

If you have a large OLTP database, most of the queries probably access only a very small percentage of the data. To save on costs and increase query performance, it often makes sense to move the less frequently accessed data out of the way and onto cheaper storage with less frequent backups.

One way to achieve this is by creating two versions of each table, one for the active data and one for the less used (archived) data. However, this means that all queries that need to access the archived data must now select a union of the active and archive tables. In existing applications, this could mean a lot of code changes, which isn’t always an option.

There are several ways to address this problem. Partitioning the tables is one option. In this article, we’ll show you how to set up partitioning to separate data into active and inactive data sets, and we’ll explain some of the pitfalls of this method.

First, we must determine which column to partition on. Suppose we want to partition our Customer table into active and inactive customers. If there’s a customer in the database that hasn’t ordered anything in a long time, that customer might be “eligible” for archiving. The last order date is a natural choice on which to base this decision. However, depending on the database design, that field might not exist in the Customer table. In addition, there might be other factors to prevent this customer from being archived. For example, the customer might have an outstanding balance due.

Therefore, in this case, a date value isn’t really an appropriate choice for archiving. It would be nice if a customer record and all its child records could be marked as ready to be archived. The easiest way to achieve this is to add an ArchiveIndicator column to all tables, which should be included in the archiving process. This indicator would be set according to a set of business rules in a weekly or monthly running job. After that indicator exists, it can be used as the partition column.

 

Using an ArchiveIndicator as a Partitioning Column

Let's look at an example. First, we’ll create a simple partition function and partition scheme. Typically, you’d want to define the partition scheme to put the two partitions on different file groups, but for testing and proof of concept, we can create them both on the same file group. Listing 1 contains the code to create both partitions on the DEFAULT file group (whatever that happens to be in your database). We’ll assume you have a database called testdb; if you don’t, you can easily create one by running the command

CREATE DATABASE testdb;

After running the code in Listing 1, we need to create a Customer table to which we can add the ArchiveIndicator column. We’ll run the code in Listing 2 to try to create the table on the psArchive partition scheme, which hopefully you’ve already created. However, our table creation attempt results in the error message that Web Figure 1 shows.

Figure 1: Error message from trying to create Customer table

Figure 1: Error message from trying to create Customer table

If we attempt to resolve the error by adding the ArchiveIndicator to the primary key, we create another problem: Now the CustomerId is unique per partition—which means that after a customer is archived, a new customer with the same CustomerId could be created in the active partition. This probably isn’t the intended behavior.

The error message makes the assumption that the index will be partitioned using the same partitioning scheme as the table, and it says that the partitioning column needs to be part of all unique indexes. However, that’s true only for a clustered unique index, which must be partitioned the same way as the table. For a nonclustered index, you can tell SQL Server to not partition the index by placing it directly on a specific file group, as the code in Listing 3 does.

 »

Discuss this Article 1

ScottSchwarze
on Mar 1, 2011
Be aware, partitioned tables and indexes are available only on the Enterprise, Developer, and Evaluation editions of SQL Server. ( http://msdn.microsoft.com/en-us/library/ms188706.aspx)

Please or Register to post comments.

IT/Dev Connections

Las Vegas
September 30th - October 4th

Paul ThurottOur Experts will show you:
• Common SQL Server
Problems
• Best Practices for T-SQL
• SQL Server Integration
Services
• Database Development

Come See Michael Otey & Tim Ford in Person!

Early Registration Now Open

From the Blogs
May 21, 2013
blog

A Common Misconception about MAXDOP

Out of the box, SQL Server is (and has been) able to take advantage of multiple processors/cores without any effort on behalf of administrators....More
May 9, 2013
blog

My ISO 8601-Compliant Signature 2

My family recently just "officially" announced that we're in the process of adopting a child from South Africa. We're quite excited, of course, but there's a ton of paperwork to do—along with the need for gobs of signatures....More
May 8, 2013
blog

Use SSIS for ETL from Hadoop

In this blog post, Mark Kromer walks you through using SSIS as a way to use ETL techniques using Microsoft's Hadoop on Windows (HDInsight) as a source using Hive connectors...More
SQL Server Pro Forums

Get answers to questions, share tips, and engage with the SQL Server community in our Forums.