Building and maintaining logical relationships between tables are fundamental parts of working with relational databases. Most databases must maintain certain relationships or the data will be logically corrupt. When such relationships exist, we say that the data has referential integrity. One table is the referenced table and the other is the referencing table; values in the referencing table must match values in the referenced table. (Some people call these tables parent and child tables, but that terminology implies a hierarchy, which the relational model avoids.) SQL Server can automatically enforce referential integrity through foreign key constraints that you define. This feature is called Declarative Referential Integrity (DRI) because it's part of the table definition. You can also use other features, such as triggers, to enforce relationships; this use is procedural referential integrity. In this article, I look at how you can enforce referential integrity in SQL Server, paying particular attention to the relevant new features in SQL Server 2000.
SQL Server 7.0 and earlier releases have only one way to deal with attempted violations of foreign key constraints. If anyone tries to modify data on a table in a way that would violate the referential integrity (as defined through foreign keys), SQL Server disallows that modification and returns an error message. SQL Server 2000 has a new cascade function that can deal with referential integrity violations in another way, as I'll explain.
First, let's look at a quick example to clarify what referential integrity is all about. The Northwind database has a table called Orders and one called Order Details. In the Orders table, the OrderId column is the primary key that uniquely identifies each order. The Order Details table also has an OrderId column, but in this table, the column is a foreign key that must match an existing OrderId in the Orders table. In this example, the Orders table is the referenced table and the Order Details table is the referencing table. If you've set up a foreign key constraint to enforce the relationship between Orders and Order Details, SQL Server verifies that modifying either of those two tables doesn't violate the relationship. For example, if you try to delete a row from the Orders table when the OrderId in that row exists in the Order Details table, the deletion will violate the referential integrity constraint. Trying to update an OrderId column in the Orders table when the original value exists in Order Details but the new value doesn't is also a violation. And, SQL Server must verify every insertion into Order Details to ensure that the new OrderId exists in the Orders table and must verify every update of the OrderId column in Order Details.
The ANSI SQL-92 standard contains the concept of a referential action. Sometimes, instead of preventing a data-modification operation that would violate a foreign key reference, you might want the database system to perform another, compensating action that allows the modification and still honors the constraint. For example, if you delete an Orders table row that Order Details references, you could instruct SQL Server to automatically delete all related Order Details rows (i.e., cascade the delete to Order Details). That way, you can modify the Orders table without violating the constraint.
The ANSI standard defines four possible referential actions that apply to deletes from or updates to the referenced table: NO ACTION, CASCADE, SET DEFAULT, and SET NULL. The NO ACTION option, which is the ANSI-standard default, prevents the modification. CASCADE allows a delete or update of all matching rows in the referencing table. SET DEFAULT lets the delete or update occur but sets all foreign key values in the referencing table to a default value. And SET NULL allows the delete or update and sets all foreign key values in the referencing table to NULL.
Note that these actions apply only to modifications to the referenced table. When modifying the referencing table, you have only one possible referential integrity action: If you insert or update a value in the foreign key column, the new value must match a value in the referenced table, or SQL Server will reject the modification.
Enforcing Referential Integrity
SQL Server releases before SQL Server 6.0 don't let you declare primary and foreign key relationships in your table definitions. All referential integrity validation must take place through triggers, which you have to code in T-SQL. SQL Server 6.0 introduced constraints, including primary and foreign key constraints, but limits referential actions to NO ACTION. SQL Server 7.0 and 6.5 also offer only the NO ACTION option (without the NO ACTION syntax).
SQL Server 2000 has supported both CASCADE and NO ACTION referential actions since beta 1. Listing 1 shows the syntax for creating two simple tables: Table1 and Table2. Table2 has a foreign key that references Table1; the referential action is CASCADE for updates and NO ACTION for delete operations. NO ACTION is the default, so specifying it explicitly in the constraint definition is optional. The listing then attempts insertions into the two tables, including an insertion into Table2 that violates the foreign key relationship. SQL Server rejects the violation and returns the following error message:
Server: Msg 547, Level 16, State 1, Line 1INSERT statement conflicted with COLUMN FOREIGN KEY
constraint 'FK_Table2_Table1_a'. The conflict occurred in
database 'pubs', table 'Table1', column 'a'.The statement has been terminated.
The code then tries to delete a referenced row from Table1, fails, and returns the following error message:
Server: Msg 547, Level 16, State 1, Line 1
DELETE statement conflicted with COLUMN REFERENCE constraint
'FK_Table2_Table1_a'. The conflict occurred in database 'pubs',
table 'Table2', column 'a'.
The statement has been terminated.
The code then successfully deletes from Table1 a row that Table2 doesn't reference. Finally, the code updates all primary key values in Table1, cascading the changes to Table 2.
You can also define foreign keys through the SQL Server Enterprise Manager. In the left pane, open the database, then select Tables. When the right pane lists all the tables, right-click the referencing table, and select Design Table. Select the Table and Index Properties button on the toolbar, and select the Relationships tab. Figure 1 shows the definition of a foreign key from Table2 to Table1. The dialog box lets you name the constraint and choose the primary key (referenced) table and the foreign key (referencing) table. Drop-down lists let you choose which column from each table will make up the key. At the bottom of the dialog box, you can choose to maintain the relationship between the tables by applying a CASCADE operation for updates to the primary key, deletes to the primary key, or both. When a box is clear, the default behavior of NO ACTION (disallowing the change) is in effect.
As I mentioned, in releases earlier than SQL Server 6.0, triggers are the only way to perform referential actions. In SQL Server 7.0 and 6.5, you need to use triggers to perform any action other than the default NO ACTION. In SQL Server 2000, triggers still have some valuable functions for enforcing referential integrity.
Even with support for the CASCADE action, SQL Server 2000 can perform only two of the four ANSI-defined referential actions. To delete rows from a referenced table and have the referencing table replace the foreign key with a default value or NULL, you still must use triggers. Listing 2 shows a simple trigger that sets all referencing values in Table2 to NULL in the case of a delete from Table1.
You can also use triggers to make error messages friendlier. The preceding section showed the error messages that SQL Server generates automatically when you violate a foreign key constraint; they aren't pretty. If you write triggers to check referential integrity, you can produce any error message you want.
If you use triggers to enforce referential integrity, keep a couple of things in mind. First, you can't combine foreign key constraints and triggers for the same primary key/foreign key relationship. Referential integrity applies to modifications to both tables. Second, you can't use a foreign key constraint to ensure that new rows inserted into the referencing table are valid and also expect a trigger to cascade any updates or deletes to the referenced table. SQL Server lets you create both the foreign key constraint and the trigger. But if you try to delete a referenced row in the referenced table, SQL Server will check the constraint first, determine that deleting the row would violate the referential integrity, and disallow the delete. Thus, any delete trigger on the referenced table will never fire. If you decide to enforce referential integrity with triggers, you need to write two triggers: one on the referenced table to handle deletes and updates, and one on the referencing table to handle inserts and updates. You still might want to declare the foreign key relationship to clarify the relationship between the tables. Just use ALTER TABLE's NOCHECK option to ensure that SQL Server won't enforce the constraint; then the trigger will fire.
A drawback of using triggers to enforce referential integrity is that you can have problems handling updates to multiple rows in the referenced table. For example, suppose that you want all primary keys in Table1 to be multiples of 10 instead of single-digit integers. Issuing an update to Table1 to multiply all primary key values by 10 would be simple, but writing a trigger to propagate that change to all the foreign keys would be a difficult, or even impossible, task. SQL Server uses a pseudo-table called inserted to keep track of all new rows that an update statement generates. All old versions of the changed rows are in the pseudo-table deleted. These pseudo-tables are accessible only inside a trigger.
To write a trigger to propagate the changes, you need to know which rows in inserted correspond to which rows in deleted. An update trigger to propagate the updated values in Table1 would need to update Table2. And for each row in Table2 that has a value matching a value in deleted, you'd have to change the value in Table2 to the corresponding value in inserted. Deleted might have the values 1, 2, and 3, and inserted might have the values 10, 20, and 30.
Programmatically, you can't associate the rows in the two pseudo-tables with each other. Some people try to use cursors to step through the rows in both tables, assuming that the first row in inserted corresponds to the first row in deleted. However, relational database management systems (RDBMSs) such as SQL Server have no concept of a first row; to them, a table is an unordered set of rows. You can't use triggers to propagate multiple updates of a primary key from a referenced table to a referencing table, unless another column in the referenced table is also unique. You could then use this unique column to join the inserted and deleted tables to determine which old values correspond to which new values.
Don't let this limitation deter you from considering triggers if SQL Server's foreign key capabilities don't meet your needs. Only in rare situations will you need to update multiple primary keys in one update statement, and sometimes you might want to disallow all updates to primary keys. If you need to update multiple primary keys, you could write a special script for that purpose. You could use the ALTER TABLE command to temporarily disable your triggers, then write the code that updates both the referenced table and the referencing table to the new values.
SQL Server 2000 introduces a new trigger capability that also relates to foreign keys. Triggers in SQL Server 7.0 and earlier releases are after triggers—that is, they execute after the data-modification statement takes effect. If you have a delete trigger on Table1, for example, the deleted rows aren't part of Table1 while the trigger is active; they've already been deleted. Other RDBMSs have before triggers, which specify actions for the database system to perform before modifying the data. SQL Server 2000 has a similar feature, called instead-of triggers, which differ from before triggers in that they replace the data-modification operation. If you have an instead-of trigger for a delete on Table1 and you try to execute a delete statement on Table1, the instead-of trigger will fire. The delete won't happen unless the instead-of trigger reissues the delete statement.
I won't go into all the details of instead-of triggers, but they're intended primarily for modifying views that you can't otherwise change, such as views spanning multiple tables. Usually, you can't delete from a view that is based on a join. However, an instead-of trigger on the view lets you execute separate deletes on each table the view is based on.
Be aware that you can't combine instead-of triggers and foreign keys with the CASCADE referential action. If you define any foreign keys with CASCADE on a table, you get the following error message if you try to create an instead-of trigger for the same action.
Server: Msg 2113, Level 16, State 1, Procedure Table2_IOT_update, Line 5
Cannot CREATE INSTEAD OF DELETE or UPDATE TRIGGER 'Table2_IOT_update'
on table 'Table2' because the table has a foreign key with cascaded DELETE
This restriction means that if the CASCADE action is for updates, you can't also have an instead-of trigger for updates, but you can have instead-of triggers for inserts or deletes. Similarly, if you have an instead-of trigger on Table2, you can't alter the table to add a foreign key constraint with the CASCADE action for the same data-modification operation.
Working with Foreign Key Constraints
Here are a few additional details about foreign key constraints. In general, a foreign key in one table references a primary key in another table. You can use a foreign key constraint to reference a column with a Unique constraint (instead of one with a primary key constraint). However, referencing a primary key is more typical and is generally better practice.
Using identical column names in tables involved in a foreign key reference isn't necessary, but it's often good practice. Consider a Customers table that an Orders table references. The cust_id and location_num column names are defined in the Customers table. The Orders table could use the column names cust_num and cust_loc, but the different column names might make the tables more cumbersome to work with.
Although the names of related columns can differ, the data types of related columns must be identical except for nullability and variable-length attributes. For example, a column of type char(10) NOT NULL can reference one of type varchar(10) NULL, but it can't reference a column of type char(12) NOT NULL because the length definition is different in the two columns. Likewise, a column of type smallint can't reference a column of type int.
Another foreign key consideration is table ownership. I recommend having one owner, preferably the database owner (DBO), own all tables in a database, but in some cases this restriction might not be preferable. The owner of a table can't declare a foreign key reference to another table unless the owner of the other table has granted REFERENCES permission to the first table owner. The owner of the first table must have REFERENCES permission even if that owner already has permission to select from the table to be referenced. This restriction prevents another user from accessing your tables without your knowledge or consent and changing the way operations are performed. (If someone is able to create a foreign key that references one of your tables, you might be disallowed from modifying your own table because the change would violate the relationship that someone else set up.) You can grant any user REFERENCES permission even if you don't also grant SELECT permission, and vice versa. The only exception is that the DBO, or any user who is a member of the db_owner role, has full default permissions on all objects in the database.
When using foreign key constraints, also consider performance and indexing. In deciding how to use foreign key relationships, you need to balance the protection they provide against the corresponding performance overhead. Be careful not to add constraints that form logically redundant relationships. Excessive use of foreign key constraints can severely degrade the performance of seemingly simple operations.
The columns you specify in foreign key constraints are often strong candidates for index creation. If you build the index with the same key order you used in the primary key or the Unique constraint of the table that the foreign key references, SQL Server can perform joins efficiently. Also, a foreign key might be a subset of the table's primary key. In the Northwind database's Order Details table, OrderId is part of the primary key and is also a foreign key. Because OrderId is part of a primary key, it's already part of an index. In the Order Details table, OrderId is the index's lead column, so building a separate index on that column alone probably isn't warranted. However, if OrderId weren't the lead column of the index, building an index on it might make sense.
Enforcing Foreign Key Constraints
SQL Server offers three options for controlling enforcement of foreign key constraints. You can add constraints after adding data, temporarily disable checking of foreign key references, or use the bulk copy program (bcp) or BULK INSERT command to initially load data and avoid checking foreign key constraints.
First, if you want to add constraints after adding data, don't define your constraints in the CREATE TABLE statement. After loading the initial data, you can add constraints by using the ALTER TABLE command. When you use ALTER TABLE to add a new foreign key constraint for a table that already has data, SQL Server by default checks the existing data to verify the constraints. If constraint violations exist, the database won't add the constraint. With large tables, such a check can take a long time.
But you do have an alternative—you can add a foreign key constraint and omit the check by specifying the WITH NOCHECK option in an ALTER TABLE statement. SQL Server will check all subsequent operations but won't check existing data. This process is sometimes called deferring a constraint. Here's an example that adds a foreign key constraint to an existing table (notice the clause WITH NOCHECK).
ALTER TABLE table2
ADD CONSTRAINT FK_Table2_Table1_a FOREIGN KEY(a)
With the second option, the table owner can temporarily disable checking of foreign key references by using the
ALTER TABLE table_name NOCHECK CONSTRAINT
statement. You can reestablish the FOREIGN KEY constraint by using
ALTER TABLE table_name CHECK CONSTRAINT constraint_name
Note that when you use this method to reenable an existing constraint, SQL Server doesn't automatically check to see that all rows still satisfy the constraint. To check the existing rows, you can simply issue a dummy update by setting a column to itself for all rows (such as UPDATE ORDERS SET cust_id = cust_id), and if any constraint violations occur, you can fix them.
The two options above are similar in their syntax and can be confusing. The main difference between the two is whether you use the word WITH. To defer a constraint, you ALTER the table WITH NOCHECK. To disable a constraint, you ALTER the TABLE and specify NOCHECK followed by the name of the constraint.
Finally, you can use bcp or the BULK INSERT command to initially load data. The BULK INSERT command and the bcp program by default don't check any foreign key constraints. You can use the CHECK_CONSTRAINTS option to override this behavior. BULK INSERT and bcp are faster than regular INSERT commands because they generally bypass the usual integrity checks and most logging. If you choose to execute bcp or BULK INSERT without checking constraints, you could later perform a dummy update to flag any violations in the existing data, as I described in the case of reenabling a constraint. Otherwise, your users might see constraint error messages when they perform update operations on preexisting data, even if they haven't changed any values.
The ability to specify a CASCADE referential action was one of the most common feature requests that the SQL Server 2000 development team received. Other database systems, including Microsoft Access, support this capability. However, carefully scrutinize your application design before adding CASCADE actions to your tables, or even before defining foreign key constraints. You might find that using application logic to enforce referential actions other than NO ACTION is more often appropriate. Although referential actions might be intuitive, ask yourself how many applications could apply this feature? How many real-world examples involve an application that is so simplistic that you would unconditionally delete (or set to the default or to NULL) all matching rows in a related table? Most applications perform some additional processing, such as asking users whether they really want to delete a customer who has open orders. The declarative nature of referential actions doesn't provide a way to hook in application logic to handle cases like these. Probably the most practical course of action is to use SQL Server's foreign key constraints to guarantee that nothing violates relationships and have each application deal with CASCADEs and other special referential actions individually.