How to bridge the gap

While developing SQL Server 7.0, Microsoft was so concerned about backward compatibility that one of its design goals included the capability for all existing application source code to run unmodified. However, as with most product upgrades, in order for the company to implement several new features, it had to sacrifice some backward compatibility. In this article, we explore some of SQL Server's internal implementation changes that resulted in potential compatibility problems and examine several situations that developers might face when upgrading applications.

Why All the Changes?

With any new software release, you expect change. SQL Server 7.0 provides change and more. Internally, SQL Server 7.0 is almost a completely new product. Why the drastic change? Put simply, the architecture and code base for SQL Server had reached their limits with the 6.5 release. The only way to achieve the long list of features and design goals for this release and still position SQL Server with enough growth room was to reengineer the internals completely. Doing so created an opportunity for Microsoft to enhance the efficiency of the query optimizer by adding new algorithms and to increase ANSI compatibility by tightening many of the Transact-SQL (T-SQL) constructs. These two changes most directly contribute to the compatibility conflicts in moving to SQL Server 7.0.

Let's examine some changes in SQL Server 7.0 and how they affect compatibility. We'll categorize the issues into three groups: system-level, syntactical, and behavioral.

System-Level Compatibility

Knowing that backward compatibility is a large concern for customers, Microsoft incorporated the concept of compatibility levels into SQL Server 7.0. SQL Server 7.0 has three compatibility levels: 60 for SQL Server 6.0 databases, 65 for SQL Server 6.5 databases, and 70 for SQL Server 7.0 databases. These levels let legacy databases in the 7.0 environment operate as if they were running under an earlier release of the product.

The system stored procedure sp_dbcmptlevel lets developers report on and change the compatibility level of their databases. For example, to set the Northwind database to level 65 compatibility, execute the following command:

sp_dbcmptlevel [Northwind], 65
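Called with only a database name, the same procedure reports the database's current level instead of changing it. A quick sketch of both forms (Northwind is the sample database shipped with SQL Server 7.0):

```sql
-- Report the current compatibility level of the Northwind database.
EXEC sp_dbcmptlevel [Northwind]

-- After the migration work is done, move the database
-- to the native 7.0 level to enable the new features.
EXEC sp_dbcmptlevel [Northwind], 70
```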

In SQL Server 7.0, the Master database must always be at the 70 compatibility level. If you add any user-defined objects to Master, you must test and ensure that they work correctly at the 70 level. However, you can set other user databases within the server to the 60 or 65 compatibility levels. This functionality lets developers migrate their existing databases to SQL Server 7.0 in phases. For example, a development environment might support a new project that uses a version 70 database while simultaneously supporting two older projects that use one database at the 60 compatibility level and another at the 65 compatibility level.

For databases at a 60 or 65 compatibility level, many of SQL Server 7.0's new features are not available. Also, although SQL Server 7.0 currently supports these legacy databases, future releases might not. Therefore, we strongly recommend that you upgrade all applications and databases to support 7.0 as soon as possible.

System Tables and Information Schema Views. Microsoft has always frowned on developers directly accessing SQL Server's internal system tables. With each release of SQL Server, the company provides a disclaimer that tells users to access system tables only via one of the provided APIs: system stored procedures, SQL Distributed Management Objects (SQL-DMO), or T-SQL. The reason for this warning is that the format of the system tables depends on the internal architecture of SQL Server and might change from release to release. However, until SQL Server 7.0, Microsoft made few changes to these tables, and consequently, many developers incorporated system tables into their applications.

Developers who did not heed Microsoft's warning are in for a rude awakening. Many of the system tables' structures have drastically changed. In addition, SQL Server 7.0's storage format is natively Unicode; consequently, system meta data now uses the new Unicode data format. For example, SQL Server 7.0 identifiers are all nvarchar(128). During beta testing, system-table changes gave sites the most problems in upgrading their applications. To ease the transition, SQL Server 7.0 provides system-level views that mimic the 6.x system tables. Although these views exist in all compatibility modes, they are for backward compatibility only and do not fully support many of the new features.

In SQL Server 7.0, meta data management is still available through system stored procedures and SQL-DMO. In addition to these two access methods, SQL Server now supports the SQL-92 definition for the Information Schema. The Information Schema is a series of read-only views that provide an internal, system table-independent view of the SQL Server meta data. Developers who need to access system information but are concerned about database portability can use these views instead of the system tables.
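For example, instead of querying sysobjects and syscolumns directly, a portable application can pull the same meta data from the Information Schema views. The sketch below assumes the Northwind sample database and its Orders table:

```sql
-- List the user tables in the current database, without
-- touching the system tables directly.
SELECT TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE'

-- Retrieve column meta data for one table.
SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'Orders'
```

Because these views follow the SQL-92 definition, the same queries run unchanged against any database platform that implements the Information Schema.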

Syntactical Compatibility

Many syntax changes that occurred in the 7.0 release (e.g., the addition of the TOP keyword, increased data-type sizes, and increased object-name size limits) involve the addition of new features. Fortunately, those changes have little impact on your migration strategy. However, a few other changes are worth mentioning.

UPDATE and Aliases. One significant difference between SQL Server 6.5 and 7.0 is the use of aliases in an UPDATE statement. When the FROM clause contains an alias and the SET clause references that alias, SQL Server 7.0 does not let the UPDATE clause reference the table name. The UPDATE clause must reference the alias instead. For example, Listing 1 shows a SQL statement that works in SQL Server 6.5, but when you execute it in SQL Server 7.0, the compiler reports an error. Listing 2 shows the syntax that works in both the 6.5 and 7.0 environments.
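Listings 1 and 2 are not reproduced here, but a hypothetical pair of statements (invented table and column names) illustrates the difference:

```sql
-- 6.5 accepts this, but 7.0 rejects it: the SET clause uses the
-- alias o while the UPDATE clause still names the table.
UPDATE Orders
SET o.ShippedDate = GETDATE()
FROM Orders o
WHERE o.OrderID = 1

-- Works in both 6.5 and 7.0: the UPDATE clause uses the alias.
UPDATE o
SET ShippedDate = GETDATE()
FROM Orders o
WHERE o.OrderID = 1
```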

Foreign Keys. The data types of foreign-key columns must be the same between tables. Before SQL Server 7.0, this requirement applied to numeric data, but SQL Server allowed char and varchar key combinations. Listing 3 is an example of a foreign-key declaration that is valid in SQL Server 6.5, but fails in SQL Server 7.0 (unless it's running in 65 compatibility mode). SQL Server 6.5 provided the implicit type conversion between the two types. In SQL Server 7.0, the types must exactly match. Updated versions of the creation statements appear in Listing 4.
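In the spirit of Listings 3 and 4, a hypothetical declaration (invented table names) shows the kind of mismatch that now fails:

```sql
-- Valid in 6.5, which implicitly converts between char and varchar;
-- fails in 7.0 unless the database runs in 65 compatibility mode.
CREATE TABLE Dept (DeptCode char(4) PRIMARY KEY)
CREATE TABLE Emp  (EmpID    int PRIMARY KEY,
                   DeptCode varchar(4) REFERENCES Dept(DeptCode))

-- 7.0-compatible version: the foreign-key column exactly
-- matches the type of the referenced primary key.
CREATE TABLE Emp2 (EmpID    int PRIMARY KEY,
                   DeptCode char(4) REFERENCES Dept(DeptCode))
```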

Deferred Name Resolution. Deferred name resolution allows procedure compilation without all table references being present. Deferred name resolution works in much the same way as the object-oriented concept of late binding. At compile time, the compiler attempts to resolve all table names that the procedure references. But if a table does not yet exist, the compiler defers this name resolution until execution time.

For developers who have used temporary tables within their stored procedures or triggers, this subtle new feature is long overdue. Although this feature is useful, it has a side effect that many developers might initially miss--the compiler no longer reliably catches table-name typos. Yes, you now must test your code. This statement might sound funny at first, but if you are not aware of deferred name resolution, you might find yourself wondering why the compiler missed this error.
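A small hypothetical procedure makes both sides of the feature visible. In 6.5, referencing a temporary table that a procedure creates for itself could complicate compilation; in 7.0, the procedure below compiles cleanly -- including the deliberate typo, which surfaces only when the SELECT actually runs:

```sql
CREATE PROCEDURE usp_demo AS
BEGIN
    -- #work does not exist at compile time; deferred name
    -- resolution postpones the lookup until execution.
    CREATE TABLE #work (id int)
    INSERT INTO #work VALUES (1)

    -- #wrok is a typo. The procedure still compiles; the error
    -- appears only when this statement executes.
    SELECT * FROM #wrok
END
```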

Outer Joins. In earlier versions of SQL Server, you use the *= and =* operators to specify the left and right outer join conditions in the WHERE clause. In some cases, this syntax results in an ambiguous query that might not return the expected results. For example, consider the two SELECT statements in Listing 5 and their output in Figure 1 on page 32. Logically, both statements are identical, but they return different data sets. The ambiguity exists under the old syntax because the optimizer cannot always distinguish the JOIN condition from the selection criteria. SQL Server 6.5 supports SQL-92-compliant outer joins using the LEFT OUTER JOIN and RIGHT OUTER JOIN extensions in the FROM clause, thus removing this ambiguity. For backward compatibility, SQL Server 7.0 continues to support the old-style syntax. However, Microsoft has stated that future releases of SQL Server will support only the SQL-92 syntax. Therefore, we strongly recommend that you convert any remaining queries that use the old-style syntax to a SQL-92-compliant form.
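The conversion is mechanical. A hypothetical pair of queries (invented table names, in the spirit of Listing 5) shows the old form and its SQL-92 equivalent:

```sql
-- Old-style left outer join: the join condition and any filter
-- conditions share the WHERE clause, which is the source of
-- the ambiguity.
SELECT c.CustomerID, o.OrderID
FROM Customers c, Orders o
WHERE c.CustomerID *= o.CustomerID

-- SQL-92 form: the join condition moves into the FROM clause,
-- cleanly separated from any WHERE-clause filters.
SELECT c.CustomerID, o.OrderID
FROM Customers c
LEFT OUTER JOIN Orders o ON c.CustomerID = o.CustomerID
```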

Behavioral Compatibility

The behavioral category contains some of the hardest-to-identify compatibility changes. Generally, these subtle changes cause problems only at execution time. Further complicating matters, some of the conflicts produce errors or warning messages, but others might go unnoticed and lead to data corruption. The following items highlight some of the more frequent compatibility challenges developers encounter.

The Query Optimizer. A major change that affects backward compatibility is the addition of several new JOIN algorithms to the query optimizer. Before SQL Server 7.0, only one algorithm, nested iteration, was available. Each time a query executed, the result sets always returned in the same order. In SQL Server 7.0, the optimizer has several options: a hash join (in-memory, grace, or recursive), a merge join, or the nested iteration. Each of these alternatives can produce the correct result; however, each has the potential to return that result in a different order. For more information on how the query optimizer can improve query processing, see Dusan Petkovic and Christian Unterreitmeier, "New Features for Query Processing," July 1999.

If you rely on the implicit ordering of your data, you need to be especially conscious of this change. The change is most visible when you use the GROUP BY clause. Including a GROUP BY clause in your SELECT statement does not imply any ordering of the result set. Before SQL Server 7.0, developers often took for granted that the data would return in the same order as the columns in the GROUP BY clause. Therefore, they omitted the ORDER BY clause. SQL Server 7.0, like the ANSI standard, does not guarantee this ordering. If you need ordered data, you must supply the ORDER BY clause.
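A sketch of the pattern, using the Northwind Orders table:

```sql
-- Relies on implicit ordering. Under the 6.x nested iteration
-- this happened to come back sorted by CustomerID; under 7.0's
-- hash and merge joins, the order is not guaranteed.
SELECT CustomerID, COUNT(*) AS OrderCount
FROM Orders
GROUP BY CustomerID

-- Correct in all versions: the ordering is explicit.
SELECT CustomerID, COUNT(*) AS OrderCount
FROM Orders
GROUP BY CustomerID
ORDER BY CustomerID
```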

Also, because of the new join algorithms and internal changes (e.g., row-level locking, increased page size, and I/O and optimizer improvements), consider removing optimizer hints from your queries unless SQL Server 7.0 still requires the hints. Because of query-engine inefficiencies, earlier releases of SQL Server sometimes require that you use hints to get the best execution plan for large queries. The many enhancements to SQL Server 7.0 change this requirement. For maximum query efficiency, leave query-plan determination to the optimizer and use hints only as a last resort.

Null Comparisons. Dealing with the three-valued logic that nulls introduce has always been difficult for developers. Not only do queries have to consider the equality of an attribute, but when the attribute allows nulls, queries must also deal with the issue of the attribute's existence. When an attribute is null, its value is unknown. Therefore, logical comparison to another value is not possible. To get around this problem, earlier versions of SQL Server implement a T-SQL extension that allows for the comparison of nulls. For example, value=null was always true when value was null. This capability changes with SQL Server 7.0. In adhering to ANSI standards, SQL Server 7.0 uses the SET ANSI_NULLS option to control null comparisons. When set to ON, which is the default value, the statement value = null equates to unknown, not true or false. To test for the presence of nulls, you must now use the IS NULL or IS NOT NULL clause. Consider the SELECT statements in Listing 6: Under SQL Server 6.x, the first SELECT statement returns row "BB"; under SQL Server 7.0, it returns no rows. The second SELECT statement returns row "BB" under both SQL Server 6.5 and SQL Server 7.0.
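In the spirit of Listing 6, a hypothetical table T1 -- whose row "BB" carries a null in col2 -- makes the difference concrete:

```sql
-- 6.x: returns row 'BB'. 7.0 with ANSI_NULLS ON (the default):
-- col2 = NULL evaluates to unknown for every row, so no rows return.
SELECT * FROM T1 WHERE col2 = NULL

-- Returns row 'BB' under both 6.5 and 7.0.
SELECT * FROM T1 WHERE col2 IS NULL
```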

Aggregation and Nulls. Another area involving nulls (and requiring good defensive programming) is aggregation. When SQL Server 6.5 encounters a null value in an aggregate function (such as AVG, MIN, and MAX), it skips the row and continues without generating any warning or error messages. Under SQL Server 7.0, the same statement completes but generates the message Warning: Null value eliminated from aggregate. For example, if you use the table definition and the first UPDATE statement Listing 7 shows, SQL Server 7.0 generates the warning. The second UPDATE statement correctly accounts for the nullability of col2 and is preferred in all SQL Server versions.
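A hypothetical pair of statements (invented table and column names, in the spirit of Listing 7) shows the defensive form:

```sql
-- If any row of T1 holds a null in col2, 7.0 completes this
-- statement but emits: Warning: Null value eliminated from aggregate.
UPDATE T1 SET col1 = (SELECT AVG(col2) FROM T1)

-- Defensive version: excludes the nulls explicitly, so the intent
-- is documented in the query and no warning is generated.
UPDATE T1 SET col1 = (SELECT AVG(col2) FROM T1 WHERE col2 IS NOT NULL)
```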

String Truncation. SQL Server 7.0 no longer allows implicit truncation of strings. For example, consider the code in Listing 8, page 34. When you execute the code, SQL Server 6.5 does not generate any errors. However, the result set from the SELECT statement returns two rows; each contains the value 0123456789. If you execute that same SQL script under SQL Server 7.0, the second INSERT statement generates an error message, String or binary data would be truncated, and SQL Server ignores the INSERT. The result set returns only one row. The good news is that implicit data loss is eliminated. However, when moving data, developers and designers now must be more conscious of field sizes. Note that even when running under 65 compatibility mode, SQL Server 7.0 does not support the 6.x behavior.
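A hypothetical script in the spirit of Listing 8 (invented table name):

```sql
CREATE TABLE T8 (val char(10))

-- Fits exactly: succeeds under both 6.5 and 7.0.
INSERT INTO T8 VALUES ('0123456789')

-- Eleven characters into char(10): 6.5 silently truncates and
-- stores 0123456789; 7.0 raises "String or binary data would be
-- truncated" and ignores the INSERT.
INSERT INTO T8 VALUES ('0123456789A')

SELECT val FROM T8   -- two rows under 6.5, one row under 7.0
```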

Arithmetic Overflow. In SQL Server 6.x, UPDATE and INSERT statements that encounter an arithmetic overflow condition set the value to null, or skip the action if the value belongs to a column that doesn't allow nulls. In SQL Server 7.0, however, the statement terminates and no modifications occur. For example, executing the SQL code in Listing 9, page 34, in SQL Server 6.x generates the warning Arithmetic overflow occurred, but the row still inserts with the col12 value set to NULL. Under SQL Server 7.0, the same SQL statements produce an error message, Arithmetic overflow error for data type tinyint, Value = 1600000000, and no row is inserted. As with string truncation, designers and developers must be conscious of this change and either adjust their data types or modify their calculations accordingly.
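A hypothetical sketch of the kind of statement Listing 9 describes (invented table and column names):

```sql
CREATE TABLE T9 (col1 int, col2 tinyint NULL)

-- 1600000000 overflows tinyint (0 through 255).
-- 6.x: warns "Arithmetic overflow occurred" but inserts the row
--      with col2 set to NULL.
-- 7.0: raises "Arithmetic overflow error for data type tinyint"
--      and inserts nothing.
INSERT INTO T9 VALUES (1, 1600000000)
```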

Empty Strings. Another new feature in SQL Server 7.0 is its ability to handle empty strings. Pre-7.0 releases interpret an empty string as either a null or a single blank character. In SQL Server 7.0, empty strings are treated as empty strings. This change affects all string-related functions. Some examples appear below; for a complete list of affected functions, consult SQL Server 7.0's Books Online (BOL).

Before 7.0:  RTRIM(' ') = NULL
With 7.0:    RTRIM(' ') = ''

Timestamps. Originally, the purpose of the timestamp data type was to support SQL Server recovery algorithms. Every time someone modified a page, SQL Server stamped it with the current @@DBTS value and incremented @@DBTS by one. This approach was sufficient to determine the relative sequence of page modifications for recovery, but the timestamp values have no relation to time. In SQL Server 7.0, the only remaining use of the timestamp data type is for concurrency, because the @@DBTS value is incremented only for use in timestamp columns. If a table contains a timestamp column, every time an INSERT, UPDATE, or DELETE statement modifies a row, the timestamp value in the row is set to the current @@DBTS value, and @@DBTS is incremented by one.
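The concurrency role looks like this in practice. A hypothetical table and update (invented names; the literal timestamp value stands in for whatever the application read earlier):

```sql
-- A timestamp column for optimistic concurrency checking. SQL Server
-- sets RowVer from @@DBTS on every INSERT, UPDATE, or DELETE of a row.
CREATE TABLE Inventory (
    ItemID   int PRIMARY KEY,
    Quantity int,
    RowVer   timestamp
)

-- The application rereads RowVer in the WHERE clause. If another
-- session modified the row since it was read, the values no longer
-- match, zero rows update, and the application knows to retry.
UPDATE Inventory
SET Quantity = Quantity - 1
WHERE ItemID = 1
  AND RowVer = 0x00000000000000AB   -- value captured at read time
```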

Also, you need to be aware of the effect of timestamps on replication. Merge replication does not support timestamp columns. Therefore, organizations that want to exploit SQL Server's merge replication features need to remove the timestamp column from any published table. In a merge replication scenario, changes to this column have no common meaning between servers, because local servers generate their own timestamp values, and the uniqueness of those values is guaranteed only within their specific databases.

Putting It to the Test

Each deployment and database is different. We have touched on just a few of the potential pitfalls you might encounter. Microsoft has invested in many resources to ensure a smooth upgrade path, including BOL, so be sure to check them out. BOL provides an excellent reference on compatibility issues and discusses many additional topics, such as ODBC and operational (DBA) changes.

As with any project, before deploying your databases to production, be sure to thoroughly test your applications in the SQL Server 7.0 environment. If your developers have good SQL writing habits, you should not encounter many problems. However, many of the changes are behavioral in nature, so they won't show until execution time. In the end, most of the compatibility conflicts are a direct result of stronger ANSI compliance. Therefore, resolving those conflicts will result in a more robust, portable, and stable database. Make the time and do the migration. SQL Server 7.0 is worth upgrading to.