Downloads
100761.zip

Organizations typically use the same column name across different tables to denote the same item of data. For example, if you have a column containing domain IDs in several tables in a database, the columns' names would be the same (e.g., DomainId) in all those tables. Those columns' data types should also be the same because mismatched columns can cause problems. For example, mismatched columns can

  • Decrease SQL Server's performance. Columns with the same name in different tables (and views) are typically used to link the tables together (e.g., as keys and foreign keys). If the data types aren't the same, SQL Server has to convert one of the data types to make the columns comparable. This additional work is unnecessary overhead. In some cases, it can also mean an otherwise useful index isn't used, resulting in a further decrease in performance.
  • Interfere with data integrity. Client applications (e.g., stored procedures, ADO.NET clients) often need to define data types. An incorrectly defined column can potentially invalidate both the underlying table and any client applications. Mismatched columns can also lead to client application errors. For example, if a column is defined as tinyint but should have been defined as int, a large value will cause an overflow error.

At my company, I discovered that our databases often contained columns with the same name but were defined with different data types. So, I created a utility to quickly identify which columns have mismatched data types across tables in a database. This utility uses the INFORMATIONAL_SCHEMA.COLUMNS view. This view provides all the information needed to determine whether a column has mismatched data types—namely, the column's name and data type (COLUMN_NAME and DATA_TYPE), the name of the table in which the column resides (TABLE_NAME), and the table's schema (TABLE_SCHEMA).

Here's how the utility works. The utility begins by first using INFORMATIONAL_SCHEMA.COLUMNS to calculate column name density—that is, how often (in terms of percent) each column name appears across all the tables and views in a given database. The column names (COLUMN_NAME) and column name densities (\[%\]) are stored in a temporary table named Prevalence.

Next, the utility then performs two INNER JOIN operations. It first joins the INFORMATIONAL_SCHEMA.COLUMNS view to itself, matching on the column name. The result set of the first join is then joined to the Prevalence table, matching again on the column name. Next, the utility searches the second join's results for data types that aren't the same to identify mismatched columns. The final result set is then sorted, with column name density being the primary sort criterion.

Figure 1 shows sample results from running the utility.

As you can see, the DomainId column has been defined as a smallint, tinyint, varchar, and int in various tables. Once identified, you can easily correct the data type for a given column. It might take a bit more effort to correct any client applications that use the column.

The utility should prove valuable to DBAs because they can use it to identify and correct mismatched columns and thereby improve data integrity and SQL Server performance. It should also prove valuable to developers and quality assurance (QA) specialists. They can use it to ensure that columns in new and amended tables are defined consistently across different tables before those tables are migrated to a production environment.

You can download the utility by clicking the 100761.zip hotlink at the top of the page. The utility works on SQL Server 2005 and later. It's currently set up to detect mismatched columns across tables in a single database and not across multiple databases. You might want to extend this utility to identify any mismatched columns across all the databases on a given server, perhaps leading to a basic data dictionary. You might also want to make other modifications, such as sorting the results by column name density and some other measure (e.g., the number of rows in the table, column usage) to give a column-importance metric that better meets your needs.

—Ian Stirk, freelance consultant, London