Save on costs and increase query performance
SQL Server table partitioning is most often used to implement a sliding window. For that purpose, the table is most frequently partitioned on a date column. This allows for efficient access to the data in a given date range. It also lets you drop aged data from the table by just moving the partition as a whole to another table using the SWITCH option. A lot has been written about this scenario, so we won’t go into additional details here. Instead, we’ll explain how to use the partitioning feature when a date column isn’t the right choice.
If you have a large OLTP database, most of the queries probably access only a very small percentage of the data. To save on costs and increase query performance, it often makes sense to move the less frequently accessed data out of the way and onto cheaper storage with less frequent backups.
One way to achieve this is by creating two versions of each table, one for the active data and one for the less used (archived) data. However, this means that all queries that need to access the archived data must now select a union of the active and archive tables. In existing applications, this could mean a lot of code changes, which isn’t always an option.
There are several ways to address this problem. Partitioning the tables is one option. In this article, we’ll show you how to set up partitioning to separate data into active and inactive data sets, and we’ll explain some of the pitfalls of this method.
First, we must determine which column to partition on. Suppose we want to partition our Customer table into active and inactive customers. If there’s a customer in the database that hasn’t ordered anything in a long time, that customer might be “eligible” for archiving. The last order date is a natural choice on which to base this decision. However, depending on the database design, that field might not exist in the Customer table. In addition, there might be other factors to prevent this customer from being archived. For example, the customer might have an outstanding balance due.
Therefore, in this case, a date value isn’t really an appropriate choice for archiving. It would be nice if a customer record and all its child records could be marked as ready to be archived. The easiest way to achieve this is to add an ArchiveIndicator column to all tables, which should be included in the archiving process. This indicator would be set according to a set of business rules in a weekly or monthly running job. After that indicator exists, it can be used as the partition column.
Using an ArchiveIndicator as a Partitioning Column
Let's look at an example. First, we’ll create a simple partition function and partition scheme. Typically, you’d want to define the partition scheme to put the two partitions on different file groups, but for testing and proof of concept, we can create them both on the same file group. Listing 1 contains the code to create both partitions on the DEFAULT file group (whatever that happens to be in your database). We’ll assume you have a database called testdb; if you don’t, you can easily create one by running the command
CREATE DATABASE testdb;
After running the code in Listing 1, we need to create a Customer table to which we can add the ArchiveIndicator column. We’ll run the code in Listing 2 to try to create the table on the psArchive partition scheme, which hopefully you’ve already created. However, our table creation attempt results in the error message that Web Figure 1 shows.
Figure 1: Error message from trying to create Customer table
If we attempt to resolve the error by adding the ArchiveIndicator to the primary key, we create another problem: Now the CustomerId is unique per partition—which means that after a customer is archived, a new customer with the same CustomerId could be created in the active partition. This probably isn’t the intended behavior.
The error message makes the assumption that the index will be partitioned using the same partitioning scheme as the table, and it says that the partitioning column needs to be part of all unique indexes. However, that’s true only for a clustered unique index, which must be partitioned the same way as the table. For a nonclustered index, you can tell SQL Server to not partition the index by placing it directly on a specific file group, as the code in Listing 3 does.