Downloads
37889.zip

SQL Server's partitioned views let organizations tune their systems to handle a lot of data. This feature lets you partition data across several tables; logically uniting the tables via a view makes the partitioning as transparent as possible to users and developers. Several previous articles (including two by Kalen Delaney and me, "Distributed Partitioned Views," August 2000, InstantDoc ID 9086, and "Querying Distributed Partitioned Views," September 2000, InstantDoc ID 9097) covered partitioned views. SQL Server Books Online (BOL) also covers the foundations. In this article, I share a few tips that you might not be aware of, including the negative effect of using Enterprise Manager to make schema changes to partitions, how to circumvent the limitation on how you can insert data into the tables through the partitioned view, and problems caused by using constants in the partitioning criteria.

Altering Partitions from Enterprise Manager


I'm an avid advocate of using T-SQL even for tasks that you can perform easily through Enterprise Manager. Besides improving my T-SQL skills, this approach lets me better understand what I'm doing and have more control over it. In addition, Enterprise Manager doesn't always perform a task the most efficient way. Making schema changes to partitions from Enterprise Manager, for example, can degrade partitioning performance and functionality. To walk through this problem, first run the script in Listing 1 to create the partitions Orders2000, Orders2001, and Orders2002 and the updateable partitioned view Orders.

First, you need to make sure that Orders provides full partitioning functionality in terms of updateability and partition elimination during retrieval. Partition elimination means that SQL Server accesses only the relevant partitions for your query's filter criteria. Populate Orders with the following three orders, each of which goes to a different partition because each partition hosts rows from a different year:

INSERT INTO Orders(orderid, customerid, orderdate)
  VALUES(1, 'aaaa', '20000109')
INSERT INTO Orders(orderid, customerid, orderdate)
  VALUES(2, 'aaaa', '20010118')
INSERT INTO Orders(orderid, customerid, orderdate)
  VALUES(3, 'aaaa', '20020212')

The Orders partitioned view meets all the requirements for updateability as described in BOL, so the INSERT statements are successful. Now, to verify that partition elimination is occurring for data retrieval, turn on STATISTICS IO and, from the Query menu in Query Analyzer, turn on the Show Execution Plan option. Note that by turning on this option, you request the actual plan that the optimizer generates and uses when you run a query (as opposed to the Display Estimated Execution Plan option, which shows an estimated plan without running the query).

To test partition elimination, issue the following SELECT statement against Orders to retrieve data only for January 2000:

SET STATISTICS IO ON

SELECT * FROM Orders
WHERE orderdate >= '20000101'
  AND orderdate < '20000201'

In the STATISTICS IO result, note that SQL Server physically accessed only Orders2000; the amount of I/O against all other partitions is 0:

Table 'Orders2002'. Scan count 0, logical reads 0, ...
Table 'Orders2001'. Scan count 0, logical reads 0, ...
Table 'Orders2000'. Scan count 1, logical reads 2, ...

In the graphical execution plan, hover your mouse pointer over the three clustered index seek operators. Notice in the yellow details boxes that only the branch in the plan for the Orders2000 partition actually executed (Number of Executes: 1); the others didn't (Number of Executes: 0).

Now, suppose you have to enlarge the customerid column from varchar(5) to varchar(10). Using T-SQL, you can write a short, efficient script that runs in a split second because SQL Server doesn't need to physically access every row to enlarge a varchar column; it just expresses the change in the metadata. Running the following code performs the change:

ALTER TABLE Orders2000
  ALTER COLUMN customerid VARCHAR(10) NOT NULL
ALTER TABLE Orders2001
  ALTER COLUMN customerid VARCHAR(10) NOT NULL
ALTER TABLE Orders2002
  ALTER COLUMN customerid VARCHAR(10) NOT NULL

Try retrieving data after running this script, then rerun the query against Orders and examine the plan and the results of STATISTICS IO as you did before:

SELECT * FROM Orders
WHERE orderdate >= '20000101'
  AND orderdate < '20000201'

Notice in the results that partition elimination is taking place. Issuing an INSERT query is also successful:

INSERT INTO Orders(orderid, customerid, orderdate)
   VALUES(4, 'aaaa', '20020828')

To see how Enterprise Manager handles the change, click the Tables folder under the tempdb database where the partitions were created, right-click the Orders2000 table, and select Design Table. In the Design Table dialog box, change the customerid column size to varchar(15). Repeat this process for the Orders2001 and Orders2002 partitions. Click Save Change Script before saving your changes in one of the partitions, and examine the script SQL Server generates. You should get the (abbreviated) script that Listing 2 shows for Orders2002.

Enterprise Manager changes the column size inefficiently. First, it creates a new table, Tmp_Orders2002, with the new size for customerid. Next, Enterprise Manager copies data from Orders2002 to Tmp_Orders2002, drops Orders2002, and renames Tmp_Orders2002 to Orders2002. Finally, it adds a primary key constraint, a unique constraint, and a CHECK constraint.

The CHECK constraint requires special attention when you're dealing with partitioning. The code that adds the CHECK constraint uses the WITH NOCHECK option. This option instructs SQL Server to skip checking whether existing data meets the constraint and check only future modifications. SQL Server has no guarantee that only data from 2002 exists in the partition. SQL Server tracks whether a constraint was added WITH NOCHECK by turning on or off the twelfth bit in sysobjects.status for the constraint's row. When the bit is on, SQL Server knows that it can't trust that all the rows in the table meet the constraint's criteria.

After changing all column sizes from Enterprise Manager, try retrieving data for only 2000:

SELECT * FROM Orders
WHERE orderdate >= '20000101'
  AND orderdate < '20000201'

SQL Server accessed all partitions because it couldn't be sure what data was in each table:

Table 'Orders2002'. Scan count 1, logical reads 2, ...
Table 'Orders2001'. Scan count 1, logical reads 2, ...
Table 'Orders2000'. Scan count 1, logical reads 2, ...

Try inserting a row into the view:

INSERT INTO Orders(orderid, customerid, orderdate)
  VALUES(5, 'aaaa', '20021224')

This time, the insert fails, resulting in the following error message:

<i>Server: Msg 4436, Level 16, State 12, Line 1</i>
<i>UNION ALL view 'Orders' is not updatable because
a partitioning column was not found.</i>

To fix the problem, drop and recreate all CHECK constraints by running the code that Listing 3 shows. Now, when you retry the SELECT and INSERT queries, the partitioned view works as expected. If you use T-SQL to perform future schema changes, you won't have to fix anything.

A Trick for INSERTs


One documented limitation of an updateable partitioned view is that regardless of whether a column allows NULLs in all partitions, SQL Server requires you to specify all values in your INSERT statements. This limitation can cause problems in client applications' backward compatibility. For example, suppose you developed three applications called APP1, APP2, and APP3 that retrieve and insert data into the Orders partitioned view. All the applications have INSERT and SELECT statements like this one:

INSERT INTO Orders(orderid, customerid, orderdate)
  VALUES(...)
SELECT orderid, customerid, orderdate
FROM Orders
WHERE ...

Now, suppose that you changed one of the applications, for example APP1, to require an additional column in Orders. You add a nullable column to all the partitions and refresh the view's schema by using the following code:

ALTER TABLE Orders2000 ADD shipperid INT NULL
ALTER TABLE Orders2001 ADD shipperid INT NULL
ALTER TABLE Orders2002 ADD shipperid INT NULL
EXEC sp_refreshview Orders

You might think that this change would let APP1 manipulate shipper ID data in Orders and that because the column is nullable, the change won't affect APP2 and APP3. But as soon as APP2 or APP3 attempts to insert data as usual:

INSERT INTO Orders(orderid, customerid, orderdate)
  VALUES(6, 'aaaa', '20020418')

you get the following error:

<i>Server: Msg 4448, Level 16,
 State 17, Line 1</i>

<i>Cannot INSERT into partitioned view
'Orders' because values were not
supplied for all columns.</i>

A look in BOL confirms that this behavior is documented. The developers of SQL Server took the conservative approach in this case, requiring you to specify data for all columns. The rationale is that each partition might have different nullability properties for the same column. However, you can force SQL Server to accept INSERT statements that don't provide values for nullable columns. Just create an INSTEAD OF INSERT trigger that reinserts the data from the inserted table back into the view. Sounds strange, but it works perfectly. Running the following code creates the trigger:

CREATE TRIGGER TRG_Orders_IOI_I_Insist ON Orders INSTEAD OF INSERT
AS
INSERT INTO Orders
  SELECT * FROM inserted
GO

Now issue the previous insert that failed; this time, it's successful:

INSERT INTO Orders(orderid, customerid, orderdate)
VALUES(6, 'aaaa', '20020418')

Constant Trouble


The last partitioning tip concerns constants in the partitioning criteria that you write in your CHECK constraints. To demonstrate the problem, run the script that Listing 4 shows to create the partitions T1, T2, and T3 and the partitioned view V1. Note that the partitioning column col1 is a bigint data type (8 bytes) and that some of the constants I specified in the CHECK constraint exceed the maximum possible regular integer value (4 bytes) of 2,147,483,647.

At first glance, the partitioned view seems to meet all the requirements for partition elimination and updateability, but there's still a problem. First, try inserting a row into the view:

INSERT INTO V1 VALUES(11)

You get the following error:

<i>Server: Msg 4436, Level 16,
 State 12, Line 1</i>
<i>UNION ALL view 'V1' is not updatable
because a partitioning column was not found.</i>

But you know that a perfectly valid partitioning column exists—or does it?

To check whether retrieval works efficiently, you can insert values directly into the partitions:

INSERT INTO T1 VALUES(10)
INSERT INTO T2 VALUES(3000000000)
INSERT INTO T3 VALUES(5000000000)

then try retrieving data that's supposed to be in only one partition:

SET STATISTICS IO ON
SELECT * FROM V1 WHERE col1 = 10

The results of the graphical execution plan and of STATISTICS IO show that SQL Server accessed all partitions:

Table 'T3'. Scan count 1, logical reads 2, ...
Table 'T2'. Scan count 1, logical reads 2, ...
Table 'T1'. Scan count 1, logical reads 2, ...

SQL Server has a certain way it interprets the data type of constants that you specify. For example, SQL Server considers 10 an integer. However, it considers a whole number outside the range of regular integers (e.g., 4,000,000,000) as numeric with a scale of zero, not bigint. Now, let's consider the following expression from one of the CHECK constraints:

col1 > 4000000000

Col1 is a bigint column, but 4000000000 is numeric, so SQL Server implicitly converts the value that has the lower precedence data type to the one with the higher precedence. Checking BOL under "Precedence, Data Type Precedence" reveals that numeric (decimal) has a higher precedence than bigint. So your partitioning expression effectively becomes

CONVERT(numeric, col1) > 4000000000

One rule about the partitioning column is that you can't specify it as an argument for a function in the CHECK constraint's expression. In this case, the column implicitly became the CONVERT() function's argument, so you can't use it for partitioning. To fix the problem, explicitly convert the constants to bigint. Drop the old CHECK constraints and recreate the correct ones by running the code that Listing 5 shows. Make sure that STATISTICS IO is turned on, then try to retrieve and modify the view—it works as expected:

INSERT INTO V1 VALUES(11)
SELECT * FROM V1 WHERE col1 = 10

Imparted Wisdom


I hope I've convinced you to use T-SQL when performing schema changes or at least to be cautious when using Enterprise Manager. Be aware of what happens behind the scenes, and make sure you use T-SQL when you find that Enterprise Manager doesn't perform efficiently.

In addition, when faced with a limitation that might compromise an important part of your application, such as backward compatibility, don't give up. Always look for a way to get what you want. If you look hard enough, you'll probably find a solution.