Choosing the right data type is an important part of table design
A future T-SQL for Starters column will address using the T-SQL syntax to build tables. But before I start building tables, I need to talk about data types. Every column in a table must have a defined data type, such as integer or character, that controls the values you can enter into that column. In this article, I look at the system-defined data types and explain how you can add your own data types. You need to add your own data types before you start building the tables that might reference them. User-defined data types can add a level of consistency and control to projects with several developers working on the same database.
Columns Need Data Types
When you create a table, you must list a data type for every column. Any data entered into a column must match the data type for that column. In some cases, you must also specify the length of the column. The SQL statement in Listing 1 uses only system-supplied data types to create the Authors table in the Pubs database.
Most of the columns in this table are text fields. The phone number, state, and ZIP are always the same length, so they use the char data type. The author's name, address, and city can vary in length and can use the varchar, or variable-length character, data type. The contract column is a yes/no column that tells whether the author has a contract; here a bit data type, with a value of 0 or 1, will suffice. Let's take a closer look at the system-defined data types.
System-Defined Data Types
SQL Server 6.x comes with about 19 predefined data types; SQL Server 7.0 added four more to handle Unicode data and to support data-warehousing applications. Each data type has a range of values it can contain, and knowing these values can help you choose among the various options.
Character data. Character, or char, is one of the most common data types. This type holds character data such as names and addresses. SQL Server 6.5 limits the size of any character column to 255 characters. If your data is longer—for example, if you want to add free-form notes about a contact—you must use the text data type. SQL Server 7.0 eliminates this limitation, and a character field can now occupy up to 8000 bytes. No single record can span pages, so the page size determines the upper limit. When a column has the char data type, you must surround the data with single quotes, as you see in this example:
UPDATE authors SET au_lname = 'Hemingway' WHERE au_id = '213-46-8915'
As I mentioned in "Working with Character Data" (May 1999), you can choose a fixed-length character data type, char, or a variable-length data type, varchar. The fixed length works better when the data is of the same or similar length, such as the author ID (which looks like the author's Social Security number, a bad choice for an identifier). You won't save much processing overhead in most cases by using a variable-length data type. But authors' last names might be long, so using a variable-length character data type might be worthwhile. For most names, a fixed-length field means a lot of wasted space, so a variable-length data type is better. You have a trade-off between using the extra space and incurring the processing overhead to handle variable-length fields.
SQL Server 7.0 supports the Unicode character set, so you need a data type that takes advantage of the extended capability Unicode offers. If you want to use a Unicode character data type, you denote it as Nchar, or if it's of variable length, Nvarchar. When you input Unicode data, you again put the data in single quotes, with N preceding the opening quotation mark. If your author's name is defined as a Unicode data type, you write the previous UPDATE statement as
UPDATE authors SET au_lname= N'Hemingway' WHERE au_id = '213-46-8915'
The Unicode data type limit is 4000 characters. Each Unicode character takes two bytes of storage, so you can fit only half as many Unicode characters on an 8K page as you can regular character data.
Integers. The basic integer data type can handle data values ranging from -2,147,483,638 through 2,147,483,647. Smallint covers -32,768 to 32,767 and is often a better choice when you know that your range of data will be limited. If I were to number my CD collection, I might use a smallint column for the primary key, assuming that my collection never exceeds 32,767 CDs. I could opt for tinyint, at one byte, but it handles only values of 0 through 255—which might work if I were tracking my certificates of deposit rather than my compact discs.
Exact numeric data. If you need to get back exactly the value you input, choose one of the exact numeric data types. Two exact numeric data types exist: decimal and numeric, which are virtually the same data type. You must specify the precision and scale for these data types, in the format decimal (p,s). The precision is the total number of digits stored on both sides of the decimal point, and the scale is the number to the right of the decimal point. You can store the number 123.4567 in a column using a numeric (7,4) data type. If the number to be stored has more decimal places than will fit, it is rounded to fit. Numeric data types can use anything from 2 to 17 bytes to store the data.
Approximate numeric data. Some numbers cannot be expressed exactly, such as one-third or pi; to express such numbers, use real or float data types. Real stores values to a precision between 1 and 7. Float, sometimes called double precision, can store data with a precision of 8 to 15. You might use real or float in scientific applications where you have to store numbers that do not lend themselves to exact binary representations. The last digit or two may not be exactly preserved when these numbers are converted to binary, so do not use them for exact comparisons in WHERE conditions.
Binary. Suppose you need to store binary data in your database. You have the choice of binary or varbinary data types, which are fixed and variable length, respectively.
Money. The two data types for monetary data are money and smallmoney. The money data type handles values up to plus or minus 922 trillion. Most of us can use the smallmoney data type, which covers the value range -214,748.3648 to +214,748.3647. By convention, money data types are stored to four decimal places and displayed to two. Note that many financial transactions don't use the money data types. For example, stock trades can involve 1/32nd fractions, which need five decimal places.
Dates. SQL Server stores both date and time in one column, so if you ask for the current date with the GETDATE() function, you also get the time. The two date formats are datetime and smalldatetime. The smalldatetime data type covers a range of January 1, 1900, through June 6, 2079, and includes the time to the minute. That range should be good enough for scheduling most projects. The datetime data type is good up to December 31, 9999 (watch for that Y10K problem), and starts at January 1, 1753. Why 1753? It's linked to the change from the Julian to Gregorian calendar. Although the Gregorian calendar was proposed some years before, during a period of approximately 30 years, some countries had adopted the new calendar and others had not. So you need to know the location for the date to be unambiguous. Also, the year started on March 1, not January 1. So a date such as January 15, 1492, can be interpreted as falling in either the first month of 1492 or the 11th month of 1493. SQL Server's creators chose to avoid this confusion by refusing to accept dates before 1753. Also, note that the datetime data type shows times down to the millisecond, but it is accurate only to 1/300th of a second.
Text and image. Use text for character data if you need to store more than 255 characters in SQL Server 6.5, or more than 8000 in SQL Server 7.0. Use image for binary large objects (BLOBs) such as digital images. With text and image data types, the data is not stored in the row, so the limit of the page size does not apply. All that is stored in the row is a pointer to the database pages that contain the data. You must follow special procedures to update this type of data, and they are beyond the scope of this article. (For more information on BLOBs, see Michael Otey, "Who's Afraid of the Big, Bad BLOB?" April 1999.) For now, be aware that changes to text or image data types might not be logged, and pointers in the row might not be updated; only the data in the text or image field changes.
Special Data Types
Some data types do not fit neatly into any specific category. One is the bit data type, an integer with a value of 0 or 1 (you can store eight of them in one byte). You often use the bit data type for true/false flags, such as to note whether an author has a contract or has written a bestseller, or for other similar values. One minor difference is that SQL Server 7.0 allows nulls in columns using the bit data type, whereas SQL Server 6.5 does not—the value has to be 0 or 1.
SQL Server uses the timestamp data type internally; this type provides a unique value that is updated every time the data in a table row is updated. Timestamps are internal values that SQL Server maintains and they do not translate to dates and times.
SQL Server 7.0 introduces two other new data types, and I will leave the details about them for future columns. The uniqueidentifier data type lets you assign a globally unique value to a column. Globally unique means unique not just within this database or this computer, but everywhere. This type is relevant if you plan to merge data from various sources into a data warehouse. The cursor data type is for cursor variables. The use of cursors is an advanced topic, which I will not attempt to cover here.
User-Defined Data Types
SQL Server lets you define your own data types, but they must be variations on the system-supplied data types. Usually they are based on a system-supplied data type with a predetermined size (in the case of character or numeric data) and associated defaults and rules. For example, you can define a zipcode data type as char(10) and a phone number as char(14). The zipcode data type will ensure that all zip code formats are consistent in all zip code columns, for any client, vendor, employee, or personal contact anywhere in the world.
SQL Server is not very helpful when you want to define data types from scratch. For example, you cannot define a new data type with associated properties, such as latitude and longitude data types for a GIS application, along with the spherical trigonometry methods necessary to compute the distance between two points. Perhaps one day SQL Server will offer more flexibility.
For now, you can add data types from the SQL Server Enterprise Manager or from the Query Analyzer (in SQL Server 6.5, the ISQL/w window). Each data type you add is good in only the current database. If you need to define a data type for all databases, put it in the Model database. The new data type will appear in all new databases from then on. Another approach is to write, or use SQL Server's Generate SQL Script option to produce, a script that you can run in each database. In Enterprise Manager, right-click on the database, select All Tasks, Generate SQL Scripts, then pick the option to create a script for the data types.
You can use the graphical interface you see in Screen 1 to add a data type. In Enterprise Manager, expand Databases, expand the database, and select User-Defined Data Types. Right-click and select New User-Defined Datatype Properties. You then define the data type.
As usual in SQL Server, you can also add a data type from the Query Analyzer. You can see the syntax in Screen 2. Notice that the quotes go around the system data type name, but not around the name of the new data type you are defining. Also, a good practice is to state whether the data type allows NULL values. You can override this setting when you use the data type for a specific column, but specifying what is allowed helps set standards, which is why you define data types.
If you define data types, the table creation statement might look like the code in Listing 2. You do not have to give the length of the column where you specified a user-defined data type, because the length is already implied. And you can let the null/not null default to the data type's specification, although it doesn't hurt to include it again.
Choose the Right Type
Picking the right data type is part of table design. In SQL Server 6.5, changing a column after you define it is difficult; in SQL Server 7.0, you can change a column's data type, but do so with caution.
User-defined data types are a good way to ensure consistency across applications and development teams, as long as everyone is using SQL Server. However, if you need to move the SQL code to a different database management system (DBMS), avoid user-defined data types.