Editor's Note: This is the final article in a three-part series about SQL Server 2000's distributed partitioned views.
If you want to be able to update distributed partitioned views, the underlying tables and the views you built on them must meet certain requirements. Our first article in this series, "Distributed Partitioned Views in SQL Server 2000," August 2000, described those requirements. If you need to create partitioned views that don't meet the requirements for updates—for example, if the partitioning column isn't part of the underlying table's primary key or the view doesn't include all the base tables' columns—you can create INSTEAD OF triggers to modify the view. However, the query optimizer's plans for views with INSTEAD OF triggers might be less efficient than its plans for updateable views because some of the new optimizing techniques depend on the rules that make the views updateable.
Creating the Triggers
Suppose you don't want to include the customerid column in the primary key of the Orders view's underlying tables because you prefer a smaller primary key that consists of only the orderid column. Listing 1 shows the code that lets you drop and recreate the primary key in all the partitioned tables that participate in the Orders view. To improve query performance when you search by the customerid column, you probably still want an index on customerid, as Listing 1 shows. Now try to run the insert statement that Listing 2 shows. You'll receive the error message
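The following sketch shows the kind of DDL Listing 1 runs on each partitioned table. The table, constraint, and index names (OrdersAF, PK_OrdersAF, idx_OrdersAF_customerid) are placeholders for illustration, not the article's actual identifiers:

```sql
-- Run on each server, against that server's partition table
ALTER TABLE OrdersAF
  DROP CONSTRAINT PK_OrdersAF        -- old key included customerid

ALTER TABLE OrdersAF
  ADD CONSTRAINT PK_OrdersAF
  PRIMARY KEY (orderid)              -- smaller key: orderid only

CREATE INDEX idx_OrdersAF_customerid -- keep searches on the
  ON OrdersAF (customerid)           -- partitioning column fast
```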
<i>UNION ALL view 'Orders' is not updatable because a partitioning column was not found.</i>
The view is no longer updateable because it doesn't meet a main requirement of updateable partitioned views—that the partitioning column be part of the primary key.
To allow updates against the Orders view from all the nodes, you need to create a set of INSTEAD OF triggers on the Orders view on all the nodes. Listing 3 shows an INSTEAD OF INSERT trigger that fires instead of the insert statement.
The trigger examines the inserted virtual table, which holds all the rows that you want to insert into the view. The INSTEAD OF INSERT trigger first determines whether all the new rows are within the boundaries of the check constraints we placed on the customerid column in all of the partitioned tables. If rows exist outside those boundaries, the trigger rolls back the insert statement. Next, the trigger performs three insert statements, each of which inserts the relevant slice of rows into the relevant table by filtering the rows in the inserted table. The trigger's filter uses the same check constraint criteria as the table into which you're inserting rows. Before each insert, the trigger checks whether the qualifying rows exist. In our example, we perform the existence check locally to avoid performing an unnecessary remote query. Notice that all the modifications occur in a distributed transaction. Also notice that the trigger uses only the table name to reference the local table but uses the full, four-part table names to reference remote tables.
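The trigger in Listing 3 follows the pattern this sketch illustrates. The server, database, table, and range values (Server2, Server3, Federation, OrdersAF, 'FZZZZ', and so on) are assumptions for illustration; substitute your federation's actual names and check-constraint boundaries:

```sql
CREATE TRIGGER trg_Orders_insert ON Orders
INSTEAD OF INSERT
AS
SET XACT_ABORT ON  -- required when the trigger modifies linked servers,
                   -- promoting its transaction to a distributed transaction

-- Roll back if any new row falls outside all partitions' CHECK ranges
IF EXISTS(SELECT * FROM inserted
          WHERE customerid IS NULL
             OR customerid < 'AAAAA' OR customerid > 'ZZZZZ')
BEGIN
  RAISERROR('customerid outside the partition ranges.', 16, 1)
  ROLLBACK TRAN
  RETURN
END

-- Local table: referenced by name only; the existence check runs locally
IF EXISTS(SELECT * FROM inserted WHERE customerid <= 'FZZZZ')
  INSERT INTO OrdersAF
    SELECT * FROM inserted WHERE customerid <= 'FZZZZ'

-- Remote tables: referenced by their full, four-part names
IF EXISTS(SELECT * FROM inserted
          WHERE customerid BETWEEN 'GAAAA' AND 'PZZZZ')
  INSERT INTO Server2.Federation.dbo.OrdersGP
    SELECT * FROM inserted
    WHERE customerid BETWEEN 'GAAAA' AND 'PZZZZ'

IF EXISTS(SELECT * FROM inserted WHERE customerid >= 'QAAAA')
  INSERT INTO Server3.Federation.dbo.OrdersQZ
    SELECT * FROM inserted WHERE customerid >= 'QAAAA'
```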
Listing 4 shows an INSTEAD OF DELETE trigger. Because we aren't adding data that might be illegal, the delete trigger doesn't need to determine whether the delete statement deletes rows within the boundaries of all the check constraints on the customerid column in the underlying tables. The INSTEAD OF DELETE trigger includes three delete statements, one for each table in the distributed view. Each delete statement performs a join between one of the underlying tables and the deleted virtual table to ensure that only the orders deleted from the view are deleted from the underlying tables. For readability, we added to each delete statement a filter that matches the underlying table's check constraint, but the filter is logically redundant.
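A hedged sketch of that delete pattern follows, using the same placeholder server, database, table, and range names as before (Server2, Server3, Federation, OrdersAF, and so on); only the join to the deleted table is essential:

```sql
CREATE TRIGGER trg_Orders_delete ON Orders
INSTEAD OF DELETE
AS
SET XACT_ABORT ON

-- One delete per partition; the join to deleted restricts each
-- statement to the rows the user actually deleted from the view
DELETE OrdersAF
FROM OrdersAF AS O
  JOIN deleted AS D ON O.orderid = D.orderid
WHERE O.customerid <= 'FZZZZ'  -- redundant; matches the CHECK constraint

DELETE Server2.Federation.dbo.OrdersGP
FROM Server2.Federation.dbo.OrdersGP AS O
  JOIN deleted AS D ON O.orderid = D.orderid
WHERE O.customerid BETWEEN 'GAAAA' AND 'PZZZZ'

DELETE Server3.Federation.dbo.OrdersQZ
FROM Server3.Federation.dbo.OrdersQZ AS O
  JOIN deleted AS D ON O.orderid = D.orderid
WHERE O.customerid >= 'QAAAA'
```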
The INSTEAD OF UPDATE trigger is a combination of the INSTEAD OF INSERT and the INSTEAD OF DELETE triggers. As Listing 5 shows, the INSTEAD OF UPDATE trigger first deletes all the old rows by joining each underlying table to the deleted table in the same way the INSTEAD OF DELETE trigger used a join to delete rows in the view. The INSTEAD OF UPDATE trigger then inserts the new version of the rows in the same way that the INSTEAD OF INSERT trigger inserted rows into the view.
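This delete-then-insert combination can be sketched as follows, again with placeholder names and ranges standing in for the article's actual listings:

```sql
CREATE TRIGGER trg_Orders_update ON Orders
INSTEAD OF UPDATE
AS
SET XACT_ABORT ON

-- Phase 1: remove the old versions of the rows (join to deleted)
DELETE OrdersAF
FROM OrdersAF AS O
  JOIN deleted AS D ON O.orderid = D.orderid

DELETE Server2.Federation.dbo.OrdersGP
FROM Server2.Federation.dbo.OrdersGP AS O
  JOIN deleted AS D ON O.orderid = D.orderid

DELETE Server3.Federation.dbo.OrdersQZ
FROM Server3.Federation.dbo.OrdersQZ AS O
  JOIN deleted AS D ON O.orderid = D.orderid

-- Phase 2: insert the new versions; each row lands in whichever
-- partition matches its (possibly changed) customerid
INSERT INTO OrdersAF
  SELECT * FROM inserted WHERE customerid <= 'FZZZZ'

INSERT INTO Server2.Federation.dbo.OrdersGP
  SELECT * FROM inserted
  WHERE customerid BETWEEN 'GAAAA' AND 'PZZZZ'

INSERT INTO Server3.Federation.dbo.OrdersQZ
  SELECT * FROM inserted WHERE customerid >= 'QAAAA'
```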
The INSTEAD OF UPDATE trigger, however, has a limitation that you should be aware of. An update statement against the Orders view can fail if other tables have foreign-key constraints that reference the underlying tables. The mere presence of a foreign-key constraint won't cause the update to fail, but if the trigger's delete phase removes rows that have related rows in a referencing table, you'll get a foreign-key constraint violation, and the update will fail.
This limitation stems from the way the trigger works: we split the update operation into separate deletes and inserts and perform the deletes first, which can momentarily orphan child rows in the referencing table. However, we can't perform the inserts before the deletes because this order would violate the uniqueness of the primary keys in the partitioned tables. You could write an INSTEAD OF UPDATE trigger that would let you keep a foreign key, but such a trigger would be complex, especially if you allowed updates to the primary key. The trigger also probably wouldn't perform well, so you might prefer to simply drop or disable the foreign key.
If you aren't hampered by foreign-key references, our sample INSTEAD OF UPDATE trigger supports updates to the partitioning column. For example, the delete and insert operations simply transfer to the correct table any rows that have changed their customerid values and, as a result, belong to another underlying table.
If you want to update the Orders view from any node, you need to create similar triggers on all the nodes. The only difference between the triggers you create on the different nodes is which table is the local table and which tables are remote tables. Each node's trigger references the table on that node by the table's local name and references all other tables (remote tables) by their full, four-part names.
To create the set of triggers on the other nodes, you can run the scripts that Web Listing 1 shows. (You can download this listing by entering InstantDoc ID 9734 at http://www.sqlmag.com/ and opening the 9734.zip file.) Now try to issue the modifications that the script in Listing 6 shows. This script first clears all rows from the Orders view, then repopulates the view with all orders from the Northwind database. The INSTEAD OF DELETE and INSTEAD OF INSERT triggers take care of these modifications. Next, the script uses the INSTEAD OF UPDATE trigger to update all the order IDs by setting them to 21325 - orderid. (Northwind's order IDs run from 10248 through 11077; because 10248 + 11077 = 21325, the expression 21325 - orderid maps the range back onto itself in reverse.) The script then selects all rows from the Orders view; you should see the same orders that appear in Northwind's orders table, except that the order of all the order IDs will be reversed (e.g., the first order ID in Northwind will be the last order ID in the Orders view).
Almost all the techniques we cover here work for local partitioned views as well as for distributed partitioned views. The only exceptions are that for local partitioned views
- you need to create only one set of triggers on one server.
- you use regular transactions instead of distributed transactions.
- you reference all the tables locally instead of using their full, four-part names.
From the query tests we performed in "Querying Distributed Partitioned Views," September 2000, you can see that the types of systems that benefit most from distributed partitioned views are online transaction processing (OLTP) systems and Web site databases that serve many individual queries, each retrieving a relatively small amount of data. The most efficient way to handle such queries is to route each query to the server that contains most of the data that will satisfy that query.
Distributed partitioned views can also be your lifeline if you're experiencing an ever-increasing number of transactions on an ever-increasing amount of data. You can scale your system only so much by adding more processing power, and adding more resources to the server might not always be possible or affordable. With distributed partitioned views, you can achieve good performance with commodity hardware and simply add more servers to scale out almost linearly.
However, distributed partitioned views aren't typically suited to data warehousing environments, which usually have to process large amounts of data and access most nodes involved in the federation to satisfy a query. The cost of running distributed queries against most of the nodes might be too high to justify distributed partitioned views. In contrast, local partitioned views are well suited to data warehousing environments, and SQL Server 2000 provides a lot of features that enhance local partitioned views in data warehousing implementations.
Local partitioned views are more suitable for data warehouses because you don't have the extra overhead of marshaling large amounts of data among the nodes. Furthermore, by using distributed partitioned views, you limit the choice of query plans that SQL Server considers. For example, if you have a join of three tables, in which two tables are on the same remote server, the optimizer might choose to join the two remote tables first to reduce data transfer, whereas with a local partitioned view it might prefer a different join order.
If your partitioned system must be highly reliable, you can use Microsoft Cluster Server (MSCS) failover clustering. Each cluster consists of two to four servers that appear to the user or the application as one virtual server. You define one node as the primary node to service user requests.
Failover clustering doesn't provide load-balancing capabilities, but it does give you high availability. If the primary node fails, one of the other nodes in the cluster takes over and starts servicing user requests. The failover process is automatic and transparent to users. The recommended configuration for highly available distributed partitioned views is to have each node of the federation participate in a failover cluster consisting of at least two servers.
In multitiered environments, such as those built on Microsoft Windows Distributed interNet Applications (DNA), you should incorporate data-routing rules into the business-services tier. In the query tests we ran, the system processed queries most efficiently when we ran the query against the server that held most, or all, of the data that the query required. You can achieve such data routing by storing each server's keys in a routing table, which a COM+ business component checks to decide the destination server for a query. Applications can then call the data-routing component when they need to issue a query.
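One way to implement such a routing table is sketched below; the table name, columns, range values, and server names (KeyRoutes, Server1, and so on) are illustrative assumptions, not part of the article's listings:

```sql
-- Hypothetical routing table the business-services tier could consult
CREATE TABLE KeyRoutes
(
  lowkey     nchar(5) NOT NULL,
  highkey    nchar(5) NOT NULL,
  servername sysname  NOT NULL,
  CONSTRAINT PK_KeyRoutes PRIMARY KEY (lowkey, highkey)
)

INSERT INTO KeyRoutes VALUES (N'AAAAA', N'FZZZZ', N'Server1')
INSERT INTO KeyRoutes VALUES (N'GAAAA', N'PZZZZ', N'Server2')
INSERT INTO KeyRoutes VALUES (N'QAAAA', N'ZZZZZ', N'Server3')

-- The routing component looks up the destination server for a key;
-- for example, customer 'BERGS' routes to 'Server1'
SELECT servername
FROM KeyRoutes
WHERE N'BERGS' BETWEEN lowkey AND highkey
```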
Distributed partitioned views are a powerful feature for improving SQL Server performance and scalability, but they still have some limitations. For example, distributed partitioned views don't support parallel execution plans or bulk inserts through the view. However, Microsoft is considering adding support for auto-partitioning, in which the system decides on the range of keys for each partition and migrates the data to rebalance the partitions.