Easing database maintenance through abstractions
Most applications start with a clean, clear database model. But over time, the application specifications change. Perhaps a second application starts using the database and you need to add new fields, or one field value becomes a selection of several fields. As your applications' data needs grow and change, you eventually find that you need to change the database schema. But you have a problem: You need to maintain compatibility with existing applications or earlier application versions while supporting new requirements.
Most DBAs have dealt with this kind of database-evolution problem. In modifying a database, you create version 1 and version 2 tables. You orphan existing fields and replace them with new fields, and no one knows exactly where the data is. You create extension tables. Soon, your nice, clean database model is a confused mess of extraneous fields, tables, and indexes that no one quite knows how to deal with. And the chief reason you can't clean up the data model is that your existing applications are tied directly to the original database table schema.
You can solve many of these problems by taking advantage of a technique that many programming languages use: They create interfaces to mask the underlying code from parts of the program. You can use similar mechanisms—in this case, stored procedures and views—to create an interface to your data structures that abstracts (i.e., hides) the underlying schema from the programs that access that data. Using this method to abstract your database schema simplifies data processing and provides DBAs and developers with a cleaner-looking schema. In addition, abstraction improves application performance and lets you enhance security.
A Can of Worms
Let's look at an example that illustrates the kind of tangle you might encounter when making database changes. Say you maintain a database that has a table called Employees. Ten years ago, when you originally designed the database, the specification called for the table to hold one phone number in the Phone field, as Figure 1 shows. At the time, this schema made sense because most people had only one phone number. But employees now have home phones, cell phones, pagers, and fax machines. Some of your new applications require all those numbers, but you don't want to break the original application, which expects to find the old Phone field populated.
You can approach this problem in a few ways. One approach is to leave the Phone field in the Employees table and use that field to store employees' primary phone number. You can then create a new PhoneNumbers table to store secondary numbers such as fax, cell, and other alternatives. However, you now have phone numbers in two tables and must remember to gather data from both when creating a list of employee phone numbers. You also have to put the phone numbers in the correct tables. The main problem with this approach is synchronizing the data in the two tables.
You can solve the synchronization problem by copying the default number to the PhoneNumbers table. You can use triggers to maintain and synchronize both tables. The data will be denormalized (i.e., you'll have the same data in both the PhoneNumbers and Employees tables). But because an update to the default number in either table also updates the other table, you'll increase locking and slow data modification within the application. In the Employees table, this problem isn't likely to be severe, but in tables that have high transaction rates, the behavior could become more problematic.
A better approach might be to remove the Phone field from the Employees table, then create a separate PhoneNumbers table and store all phone numbers in that table, as Figure 2 shows. With this approach, applications need to access only one table to get phone numbers. You don't have the extra overhead of transaction locking, and your database model remains straightforward. The only problem with this approach is that the existing application will no longer find the Phone field in the Employees table. To ease this transition, you can hide the change from the existing application by creating an interface between the application and the base tables. That's where stored procedures and views can come to the rescue.
Using Stored Procedures with Applications
Your applications can execute T-SQL stored procedures to select, insert, update, and delete data. OLE DB, ADO, and ADO.NET support the use of stored procedures to store and retrieve data. Because the stored procedures can perform joins, they can mask changes to the underlying database schema. Stored procedures are precompiled in SQL Server, so they perform better than ad hoc SQL statements, which SQL Server has to parse and compile on the fly. Also, you can use stored procedures as a security mechanism. You can grant users permission to execute a particular stored procedure without granting them rights to select, insert, update, or delete data directly from the underlying table. This method lets you add business logic or check security entitlements before modifying data.
The key to using stored procedures without breaking existing applications is to maintain the same interface. That means the procedure must provide the same input parameters and output values or resultsets as the original database schema. The code in Listing 1 creates a GetEmployeeInfo stored procedure for the original application to retrieve information from the Employees table. You can modify this stored procedure to work with a new database schema without affecting the existing application; you just need to be sure the input parameters and the result set remain the same. As Listing 2 shows, the revised stored procedure takes the same input parameters as the stored procedure in Listing 1. Listing 2's procedure returns the same record set. The field names and positions are the same as in the original stored procedure. But the CASE statement you use for the phone number lets the procedure return a null value if no default phone number is set up for the specified employee. This change is transparent to the client.
Insert, update, and delete procedures would follow the same interface rules. For example, Listing 3 shows the code to create the original UpdateEmployeeInfo stored procedure; you can replace this code with the code that Listing 4 shows. The revised stored procedure uses a CASE statement to look for area codes because although the original application didn't store the area code as a separate field, the new PhoneNumbers table does. The CASE statement looks for an area code and extracts it if it's present. The stored procedure lets you modify the way data is stored without changing the way people currently use the application. The code in Listing 4 abstracts the complex maintenance of the normalized tables that store multiple phone numbers. As long as the stored procedures you use provide the same input and output as the original procedures, what they do internally doesn't affect the application directly. This method lets you change the schema of your database to meet changing requirements without recompiling your applications. You're investing some extra work in your revised stored procedures, but you don't need to modify and recompile the application that uses your database.
Substituting Views for Tables
The chief difficulty in using stored procedures in place of direct table access is that the application must be coded to take advantage of stored procedures. You might be able to recompile the application's code once you've written your stored procedures. But if you purchased shrink-wrapped software or don't have access to the application's source code, you can't easily shift from direct table access to stored procedures. The answer in this situation might be to use views.
Applications treat views as if they were tables. If you create a view that has the name of an existing table and the same columns as that table, an application will be none the wiser. Because the view lets you perform joins and calculate fields, it can abstract the data from the database schema. For example, you could replace the original Employees table with the Employees view that Listing 5 shows. To make this view work, you'd create a new table called Employee_Base and move the Employees table data to the new table. Employee_Base can have a different schema than Employees. The application accesses the Employee database object and is unaware of whether the object is a view or a table. The Employees view would then join all the rows in Employee_Base to the appropriate column in the PhoneNumbers table. Note that the view performs the join without making the application aware of the change. The view uses a left join rather than an inner join. If no default phone number is defined for the employee, the file will return a null.
The Employees view provides the same columns as the Employees table, but it has two drawbacks. First, the view won't have the same indexes as the table, which can impair performance. Second, the view doesn't let you use simple INSERT, UPDATE, or DELETE commands to automatically maintain the Employee_Base, EmployeePhoneNumbers, and PhoneNumbers tables.
You can solve the first problem by creating an indexed view. Indexed views are materialized in the database much as a table is: All the data in the view is written to the disk drive as an index. That means an indexed view is also a form of denormalization that eliminates the need to perform table joins when retrieving the data. Be aware that maintaining an indexed view will add overhead to any inserts, updates, or deletes you perform on the base tables that the view depends on. Column indexing is also limited in an indexed view, and you can't use some operators—such as TOP, UNION, or DISTINCT—in the SELECT statement that forms the view. The biggest advantage of the indexed view is speed. Because the view is materialized and indexed, it will respond as fast as a table. If your application is more heavily skewed to selecting data than modifying data, indexed views might be a viable option. To learn more about indexed views, see the "Creating an Indexed View" section of SQL Server 2000 Books Online (BOL).
To solve the second problem—modifying data—be aware that you can modify data in a view as long as that data is from only one table. The original application could insert a new row into the Employees table by executing the following T-SQL text:
INSERT INTO EMPLOYEES( Salutation, FirstName, LastName, Suffix, Address1, Address2, City, State, ZipCode, Phone, SSN) VALUE('Mr.', 'Mark', 'Scott', NULL, '123 Main Street', NULL 'Anytown', 'US', '000-555-1212', '123-45-6789')
So that your view will properly execute this INSERT statement, you must create INSTEAD OF triggers to handle the modification. The INSTEAD OF trigger lets you execute custom code when you execute an insert, update, or delete. Listing 6 shows an example of an INSTEAD OF INSERT trigger that you could apply to the Employees view.
The process of replacing a table with a view can be complicated. You must first determine which objects—including views, functions, triggers, stored procedures, and foreign key constraints—are dependent on the base table. Although you can get a list of dependencies from Query Analyzer, the list isn't always accurate or comprehensive. To locate dependencies, I typically script the entire database and perform text searches on the name of the table to locate references to it. After you know a table's dependencies, you can create a new table, then recode and recompile the dependent objects to reference the new table. Once the dependent objects point to the new base table, you can rename the old table and build the view with the old table's name.
This approach is not for the faint of heart. An error here will render your application dysfunctional. Make certain that you complete each step carefully. And be sure you're using the same best practices you use whenever you make significant changes to your database schema: Test each change, one at a time, and be sure you can reverse each change to go back to the point at which everything worked.
Keeping Databases in Shape
Maintaining databases is easier when you're working with simple, clean schemas. By abstracting data structures from the applications that reference them, you can modify your data tables without recompiling your applications. Microsoft uses a form of this methodology in the information schema views, which provide views of the data in SQL Server system tables. Maintaining the most effective schema for storing your data will reduce database maintenance and help you optimize performance. Abstracting the data table structures from your application code helps you keep optimal structures while limiting the effect on your applications.