Creating Horizontally Partitioned Views

Improve query performance and maintain high data availability

Downloads
8234.zip

Every database management system (DBMS) developer eventually encounters a situation in which a table stores a huge amount of historical data but users retrieve only small, logically distinct portions at a time. For example, an international trading firm's database might have tables containing millions of records' worth of trades spanning a year or more, but users typically need to retrieve data for only one month at a time. To improve query performance, a DBA or a developer often splits a large table into separate tables with the same structure, each containing data restricted to a given logical separation. For instance, the trading firm's database could take the form of 12 tables, one for each month. If the yearly total was 120 million records, you might expect an average of 10 million records per table. Most users will issue queries that span a month or less, so they can choose the monthly table they want to query and sort through 10 million records instead of 120 million.

Querying smaller tables is more efficient, but it complicates the creation of front-end applications and reporting tools that need the hard-coded table names to relate to a particular query. Worse yet, if any queries span more than a month, creating programs in a robust fashion becomes almost impossible without arbitrarily applying UNION statements throughout the code.

Fortunately, SQL Server 7.0 has a feature known as horizontally partitioned views, which provides a highly effective alternative to that complexity. A horizontally partitioned view lets you divide a large table into smaller sub-tables yet provide only one view as the means of interacting with all the tables. A user or developer needs to query only one view to extract records from any of the sub-tables. In addition to the convenient interface, this type of partitioned view provides a considerable performance boost.

However, besides a brief description in SQL Server Books Online (BOL) and a small mention in the SQL Server resource kit, this useful and elegantly implemented feature is surprisingly sparsely documented. To remedy that lack, let's look at what horizontally partitioned views are, how to implement them, and how to distribute data into the correct tables automatically.

Create Horizontally Partitioned Views

To improve query performance, you could split a large table, such as the one with 120 million records, into 12 more-manageably sized tables, each containing one month's records. Because the table maintains the same structure but is partitioned along row-level boundaries—in this case, by month—it is horizontally instead of vertically partitioned. To treat all these tables as one object, you need to rejoin them with a partitioned view. The view definition in Listing 1 shows the fundamental structure of a partitioned view. Issuing queries against one object instead of 12 is one big advantage this method offers when you're creating applications and reports.

For the query plan to work most efficiently, the SQL Server engine needs a guarantee that each table contains only the data it was intended to contain. To obtain this guarantee, use check constraints. You need to create each table with the constraints in the manner that Listing 2 illustrates. The listing includes table and index creation data-definition language (DDL), but note that the check constraint is designed to ensure that the table includes only data for a particular month—January, in this example. This check constraint on the trade date guarantees that each table contains only trades for a given month. Now, when someone issues a query such as

SELECT * FROM V_TRADES
WHERE
TRADEDATE  BETWEEN '1/10/1999 AND '1/15/1999'
AND   TRADETYPE = 'B'

the SQL Server engine will quickly examine all the tables that compose the view and determine through the check constraints that only the TRADE_011999 table might contain records that match the date portion of the WHERE clause. The engine will then query the TRADE_011999 table without attempting to select any records from the other tables that form part of the union.

Populate the Tables

After you've established partitioning as a method for seamlessly improving query performance, you still need a way to place the trades in the correct table. Conceivably, trade data might come in from the end of the previous month, from the current month, or from deferred trades that occurred 2 months ago. For the table population to work as seamlessly as the queries, you must create a stored procedure that can tell which table the record belongs in based on information in the record—in this case, the trade date. Because the table name the insert statements will use is variable, the most efficient way to create a stored procedure for this purpose is by using the sp_executesql system stored procedure, as Listing 3 shows. The example in Screen 1 shows a query on data from February 1999. The query plan shows that SQL Server will scan only the February table and check only the month constraint for the other tables.

A partitioned structure for large tables provides some benefits that go beyond query performance. By partitioning tables, you can easily archive old data in removable media as the months progress. For example, if you divide a 7GB table into 12 tables of 600MB each, you can store each month on a CD-ROM. Without this partitioning, an organization faces the daunting prospect of copying data from the main table to an archive table and then deleting the copied rows. In a system that runs 24 x 7, the locks such an operation generates might render the database inaccessible for an unacceptably long time. Horizontal partitioning helps you maintain high availability without costly overhead.

Discuss this Article 5

marcelo (not verified)
on Sep 21, 2002
I created a partitioned view on 12 tables, one for each month. The primary key of each table is (Date, id) and is clustered. I created a check constraint on date for each table. I did some tests to check the execution time of the same query over the table with data of the whole year and the partitioned view created. For my surprise the second was much bigger! Checking the execution plan of the query over the view it makes a index seek for all the tables, including the ones with date out of the between range. Is ir right? How to make sure The view I created is considered partitioned?
Zeev (not verified)
on May 2, 2001
Where can i Fined Listing 1 and Listing 2 ??
david freibrun (not verified)
on Aug 28, 2000
Do horizontally partitioned views work with a large number of tables? Say, over 100? I created the view with just 10 tables and defined contraints on the tables and the results were considerably slower than when I was querying an individual table. I need to find out if I am missing something.
asomar (not verified)
on Jul 28, 2004
I creared partitioned view, i see performance gain for reading data but performance degrades for my inserts.
LUCY KHOVYEVA (not verified)
on Aug 20, 2001
Hi, My name is Lucy. I've read your article about creating horizontally partitioned views (article dated April 2000 in SQL Server magazine). I working with an application which has a similar problem- large table, but user retrieves only small amount of data. Of course performance is horrible. Partition views would be definitely a solution for our intent to improve performance. Your article was really helpful, but it leaves a few questions unanswered. I've tried to search for an answer on web, but could not find anything. You would be a real help if you just take a few minutes and answer on two of my questions. 1. What is the maximum number of tables can be used in single horizontally partitioned view? 2. Does horizontally partitioned view technique works with queries where inner join is used for specifying the criteria? Ex: select * from vwJobAnalysis inner join tblPeriod on vwJobAnalysis.per_end_dt =tblPeriod.per_end_dt It seems that SQL Server scans all tables even when only one table has a data that matches the criteria in table tblPeriod. (I've created a few tables, all tables are have the same structure and no indexes. Each table has a constraint on field "per_end_dt" equals to specific period). SQL Server scans only one table if a change my syntax to : SELECT * FROM vwJobAnalysis WHERE per_end_dt IN ("08/15/2001") Please let me know if I did something wrong. -Thanks in advance. Lucy Khovyeva.

Please or Register to post comments.

IT/Dev Connections

Las Vegas
September 30th - October 4th

Paul ThurottOur Experts will show you:
• Common SQL Server
Problems
• Best Practices for T-SQL
• SQL Server Integration
Services
• Database Development

Come See Mike Otey & Tim Ford in Person!

Early Registration Now Open

From the Blogs
May 9, 2013
blog

My ISO 8601-Compliant Signature 2

My family recently just "officially" announced that we're in the process of adopting a child from South Africa. We're quite excited, of course, but there's a ton of paperwork to do—along with the need for gobs of signatures....More
May 8, 2013
blog

Use SSIS for ETL from Hadoop

In this blog post, Mark Kromer walks you through using SSIS as a way to use ETL techniques using Microsoft's Hadoop on Windows (HDInsight) as a source using Hive connectors...More
Vision road sign
May 6, 2013
blog

Cheaters Never Win, Even in TPC Benchmarks

In this portion of the series on database benchmarking, I want to tell you about one of my favorite aspects of the TPC benchmarks – CHEATING....More
SQL Server Pro Forums

Get answers to questions, share tips, and engage with the SQL Server community in our Forums.