More SQL Server datetime secrets revealed

In "Solving the Datetime Mystery," September 2000, I began a discussion of datetime values. In this issue, I conclude that discussion by telling you how SQL Server determines which century to assume when you enter a two-digit year, and why SQL Server won't accept a date before 1753. Also, I tell you about one more internal storage detail and show you code for searching for various datetime values in your tables.

When SQL Server stores a date, the program leaves no ambiguity about the year. The date is stored as the number of days from a base date of January 1, 1900, so a date in the 1900s would be a smaller value than a date in the 2000s, and a date in the 1800s would be a negative number. But what happens when a user or an application enters a character string with a two-digit year? How does SQL Server figure out which year is intended? SQL Server's default behavior is to interpret a two-digit year as 19yy if the value is greater than or equal to 50 and as 20yy if the value is less than 50. To verify this behavior, I used Query Analyzer to execute these two statements:

SELECT convert(datetime, '1/1/49')
SELECT convert(datetime, '1/1/50')

I got the following results:

2049-01-01 00:00:00.000
1950-01-01 00:00:00.000

SQL Server interpreted the two-digit 49 as 2049 and the two-digit 50 as 1950. That interpretation works now, but by the year 2051, you won't want SQL Server to interpret a two-digit year of 51 as 1951. With SQL Server 7.0, you can change the cutoff year that determines how SQL Server interprets a two-digit year. A two-digit year that is less than or equal to the last two digits of the cutoff year will have the same first two digits as the cutoff year. And for a two-digit year that is greater than the last two digits of the cutoff year, SQL Server interprets the first two digits as a number that is one less than the cutoff year's first two digits. To change the cutoff year, select the Server Properties tab in the Properties dialog box in the SQL Server Enterprise Manager. Or you can use the sp_configure stored procedure:

EXEC sp_configure 'two digit year cutoff', '2000'
RECONFIGURE

In this example, because the two-digit year cutoff is 2000, SQL Server interprets all two-digit years except 00 as occurring in the 1900s. When the default two-digit year cutoff value is 2049, SQL Server interprets the two-digit year 49 as 2049 and the two-digit year 50 as 1950.

Be aware that although SQL Server uses 2049 as the cutoff year for interpreting dates, OLE Automation objects use 2030. You can use the two-digit year cutoff option to provide consistency in date values between SQL Server and client applications. However, I strongly recommend that you always use four-digit years in your data to avoid ambiguity in your applications.

Limits of Date Values


SQL Server Books Online (BOL) defines the ranges of the two datetime data types this way: "Use datetime to store dates in the range from January 1, 1753, through December 31, 9999. Use smalldatetime to store dates in the range from January 1, 1900, through June 6, 2079." Two of these dates seem arbitrary. Why is the earliest datetime value in 1753? And why is the last smalldatetime value not at the end of a century, or even at the end of a month? I mentioned in my last column that datetime values use 4 bytes to store the number of days before or after the base date. Smalldatetime values use only 2 bytes. Generally, 2 bytes can hold numbers between -215 and +215 - 1, using 15 of the 16 bits for the data and one bit for the sign. If you ignore the sign and use that 16th bit for data, the range of possible values more than doubles. Instead of a maximum 2-byte value of 32,767, the value is 65,535. And 65,535 days after the base date of January 1, 1900, is June 6, 2079.

The reason for the early cutoff of possible datetime values isn't mathematical but historical. In September 1752, Great Britain adopted the Gregorian calendar, which differed from the one previously used in Great Britain and its colonies by 12 days. The change happened on September 2, 1752, so the next day was September 14, 1752.

So, with 12 days lost, how can you compute dates? For example, how can you compute the number of days between October 12, 1492, and July 4, 1776? Do you include those missing 12 days? To avoid having to solve this problem, the original Sybase SQL Server developers decided not to allow dates before 1753. You can store earlier dates by using character fields, but you can't use any datetime functions with the earlier dates that you store in character fields.

Datetime and smalldatetime values also differ in the precision of the time information they can store. Last month, I mentioned that datetime records time to the nearest tick, which is 3.3 milliseconds (ms). Smalldatetime stores time only to the nearest minute. But what happens when you try to convert a datetime value to smalldatetime? You might have read or assumed that SQL Server rounds smalldatetime values to the nearest minute, and the following queries seem to bear this idea out. For example, the query

SELECT CAST('2000-07-08 14:55:29' AS smalldatetime)
returns the time as 14:55, whereas
SELECT CAST('2000-07-08 14:55:30' AS smalldatetime)

returns the time as 14:56. However, times aren't always rounded to the nearest minute because SQL Server rounds all datetime values to the nearest .000, .003, or .007 seconds; therefore, 29.999 rounds up to 30.000 before being converted to smalldatetime, and 29.998 rounds down to 29.997. Thus, the query

SELECT CAST('2000-07-08 14:55:29.998' AS smalldatetime)

returns time as 14:55, whereas

SELECT CAST('2000-07-08 14:55:29.999' AS smalldatetime)

returns the time as 14:56.

Finding a Date


Suppose you want to find all the orders placed in August 1996 and stored in the Northwind database orders table. SQL Server automatically converts from character string constants to datetime values implicitly when it can figure out what the datetime value needs to be. But if you use wildcards in your character strings, SQL Server might not be able to convert properly. For example, I could use the following query to find the relevant orders:

USE Northwind
SELECT  * FROM orders WHERE OrderDate BETWEEN '8/1/96' and '8/31/96'

Remember that all datetime values hold both a date and a time component, so if any orders hold a datetime value falling after midnight on August 31 (where midnight is the first instant of a day, with a time of 00:00:00.000), the query I used won't find them. I'm assuming that all dates are entered into the table with a time of midnight, and checking the dates in the orders table confirms this assumption. SQL Server interprets the query above and returns 25 rows. After SQL Server converts the two string constants to datetime values, it can perform a proper chronological comparison. However, if your string contains a wildcard, SQL Server can't convert the string to a datetime; instead, SQL Server converts the datetime to a string. To perform this conversion, SQL Server uses its default date format, which is mon dd yyyy hh:miAM (or PM). You need to match this format in any character string that contains wildcards. Thus, the first query below will return the same 25 rows as the previous query, but the second query won't find any matches:

SELECT  * FROM orders
WHERE OrderDate LIKE 'Aug % 1996%'
SELECT  * FROM orders
WHERE OrderDate LIKE '8/%/96 %'

Although SQL Server is usually flexible about how you can enter dates, when you compare a datetime to a string that has wildcards, you must base the string on the default datetime format. Note that the default format uses two spaces before a single-digit day. So if you want to find all rows in the orders table with an OrderDate of July 8, 1996, you need to use the following query, making sure to put two spaces between Jul and 8:

SELECT  * FROM orders
WHERE OrderDate LIKE 'Jul  8 1996%'

New Tricks


Although I've worked with SQL Server for 13 years, I'm still learning new things, and not just about changes the new releases introduce. I always believed that the CONVERT() function's third argument, which controls the conversion style, is relevant only when you're converting datetime or money values to character strings. I recently found out that, when you convert a character string to a datetime, you can also use the style argument as a replacement for setting the DATEFORMAT value. You need to be careful to enter the date in the format that the style requires; see the BOL entry "Cast and Convert" for details about the possible styles. For example, these two statements will return different results:

SELECT convert(datetime, '10.12.99',1)
SELECT convert(datetime, '10.12.99',4)

The first statement tells SQL Server to assume the date is style 1 (mm.dd.yy), so SQL Server can convert the string to the corresponding datetime value and return the value to the client. With the second statement, SQL Server assumes the date is represented as dd.mm.yy, so it returns a different date. You'll get a conversion error if you try to use styles 102 or 104, which require a four-digit year, with the query above.

The second new trick I learned recently involves the way the DATEDIFF() function works. This function finds the difference between two dates in the unit you specify as the first argument. The date functions don't round any values; the DATEDIFF() function just subtracts from each date the components that correspond to the datepart you specified. For example, a query to find the number of years between New Year's Day and New Year's Eve of the same year would return a value of zero. SQL Server subtracts the year part of the two dates; because the year is the same, the difference is zero:

SELECT datediff(yy, 'Jan 1, 1998', 'Dec 31, 1998')

However, if you want to find the difference in years between New Year's Eve and the following New Year's Day (the next day), this query returns a value of 1 because the difference between the year parts is 1:

SELECT datediff(yy, 'Dec 31, 1998', 'Jan 1, 1999')

I always thought that I couldn't use DATEDIFF because two dates in different years could have the same "day of year" value. For example, if I use DATEPART() to extract the dy component in each of these queries, I get the same value:

SELECT datepart(dy, 'Oct 12, 1977')
SELECT datepart(dy, 'Oct 12, 1999')

Both queries return 285, so I assumed that to find the difference between these dates, SQL Server subtracted the dy components for each individual date to get zero. But you can use DATEDIFF() to discover whether two dates are the same, regardless of the time. SQL Server knows that if you want the difference in days between two dates that are in different years, the difference must be a nonzero value. Thus, the following query returns a value greater than zero:

SELECT datediff(dy, 'Oct 12, 1977', 'Oct 12, 1999')

A value of zero would be possible only if the years were identical. So how can you use this fact when working with your data? In the Pubs database, the titles table has a field called pubdate in which not all the values have a time of midnight. The pubdate column has a default value of getdate(), and two rows in the table use that default when the Pubs database is first created. The Pubs database on my SQL Server 7.0 machine was created on November 13, 1998, but not at midnight. I can use the following query with my SQL Server 7.0 system to see which books were published on November 13, 1998:

SELECT title_id, pubdate FROM titles
WHERE datediff(dy, pubdate, 'Nov 13 1998') = 0

You can find other clever ways to work with datetime data if you understand how SQL Server keeps track of datetime data and if you know how the various date-manipulation functions work. A good place to see some nice solutions to problems with datetime data is on the public newsgroups, which you can access with any newsreader through the msnews.microsoft.com server, or on the Web-based forums that SQL Server Magazine supports at http://www.sqlmag.com/.