SQL Server 2000 has a scalar data-type system that defines data types such as int or float, which DBAs commonly think of as "primitives" because they contain values that you can't break down further into components. However, SQL Server 2000 doesn't provide a practical way to extend this type system. When you extend a type, you add new types to it that let you work with more complex values. You might wonder why you'd want to extend a data type such as int, and in SQL Server 2000, you wouldn't. Instead of extending data types, you use SQL Server 2000's built-in constraints. However, a custom data type based on multiple fields—similar to SQL Server's built-in datetime multifield data type—might be useful. SQL Server 2005 (formerly code-named Yukon) lets you use its new user-defined data type (UDT) capability to create your own multifield scalar data types and treat them the same way you do built-in scalar data types.
For example, imagine you have a table column that holds a scalar data type. Given a sample set of data that contains the triples 12 4 9, 13 5 2, and 9 14 11, you might recognize each triple as an ordered collection of items known as a vector and decide that each triple needs three columns. You'd know from the experience of micro-parsing a field and from the rules of first normal form that the space you'd save squeezing each triple into one column would be a false economy.
However, not all triples are vectors. Consider this set of data: 12 1 1998, 3 5 1992, 1 4 1996. Writing the numbers in the format
12/1/1998, 3/5/1992, 1/4/1996
makes it clear that they're dates. Again, from experience, you know that storing a date in three columns would make it difficult to work with. In this case, the triple should use one column because storing a date in three columns makes no sense. You store a date in one column because even though it has three fields, it's a scalar value, and the SQL Server scalar data-type system provides the datetime data type to store it.
Other multifield values can be scalars, too. Angles such as latitude and longitude often contain four fields: degrees, minutes, seconds, and direction. These values look like 34°6'12"N or 61°35'19"W. Unfortunately, the SQL Server scalar type system doesn't have latitude and longitude data types.
SQL Server 2005 lets you create new scalar types by using any programming language that the .NET Common Language Runtime (CLR) supports. You can add Latitude and Longitude scalar data types to SQL Server and store instances of each in its own column, just as you store a date in its own column with the datetime data type in SQL Server 2000. Let's look at how you can create a UDT for latitude in SQL Server 2005 by using C#. Note that the sample code that accompanies this article uses the beta release of SQL Server 2005 that Microsoft distributed at the Professional Developers Conference (PDC) in October 2003. You can download the Latitude.cs file, which contains complete sample code for this article's example, at http://www.sqlmag.com, InstantDoc ID 42169.
Goals for the Latitude Data Type
Before you design a UDT, you need to decide what characteristics you want it to have. First, in this example, you want the UDT to store latitude to the nearest second of arc without any loss of data. Second, you want to be able to use in SQL expressions a string representation such as 34°6'12"N that has the fields degrees, minutes, seconds, and direction. Each field has a restricted range of values. Degrees are in the range 0 to 90, and minutes and seconds are in the range 0 to 59. The direction must be N or S. Third, you must be able to use Latitude values in magnitude comparisons and ORDER BY clauses.
Listing 1 shows a simple SQL batch that creates an environment for using the Latitude data type. Listing 1's code creates the Places table, in which each row has a place name and a latitude. This batch also creates the Tropics view, which lists places that are in the tropics (i.e., places that lie between 12° north and 12° south of the equator). The Places table contains a column named Lat that uses the Latitude type. You insert latitudes into the Places table by using the CONVERT() function. The Tropics view uses magnitude comparisons of latitudes to determine which rows contain values that are in the tropics range.
Each UDT needs a string representation, an internal representation, and a null representation. The first representation is obvious: You need to be able to describe a latitude as a string, such as 15°2'39"N. The internal representation is how SQL Server will store on disk the bits representing latitude. Last, because SQL Server scalar data types can be null, you need a way to handle that, too.
You implement UDTs in SQL Server 2005 by designing a public class that has several well-known methods, properties, and attributes. Listing 2 shows a skeleton implementation of the Latitude class that illustrates the basic features of a typical UDT.
The SqlUserDefinedType attribute in Listing 2 is a marker showing that the Latitude class is implementing a UDT. The angle_ field shows the internal representation that SQL Server will use to store the value. The ToString() and Parse() methods are for implementing the string representation. Finally, the IsNull and Null properties are for implementing the null representation. The minimum requirements for implementing a UDT are that the data type
- is a public class that has SqlUserDefinedType and Serializable attributes
- overrides the ToString() method to produce a string representation
- has a public Parse() method to interpret the string representation
- implements the IsNull property of the INullable interface
- implements the public static Null property to produce a null instance
- is added to the database by using CREATE ASSEMBLY and CREATE TYPE
The construction parameter of the SqlUserDefinedType attribute specifies the technique SQL Server will use to save and restore the internal representation on disk. You can choose one of several available formats, but Format.Native is more straightforward than the other available formats. This article doesn't cover using the other formats.
The SqlUserDefinedType attribute's IsByteOrdered parameter lets you know whether you can order the UDT. If IsByteOrdered is true, SQL Server lets you use the UDT in ORDER BY clauses and comparison expressions.
Now, you can use the C# code in Listing 2 to implement the Latitude UDT. As I mentioned, the Latitude.cs file online contains the source code for implementing Latitude. Latitude.dll is the assembly that the compiler produces when you compile Latitude.cs.
The Int32 angle_ field in the Latitude class is SQL Server's internal representation of the Latitude UDT. You can see in Listing 2 that the SqlUserDefinedType attribute uses Format.Native. The SqlUserDefinedType attribute also specifies IsByteOrdered=true. This specification means that SQL Server will let you use the Latitude type in magnitude comparisons and ORDER BY clauses.
The Latitude UDT has a helper method named Angle(), which Listing 3 shows. You use the Angle() method to encode degrees, minutes, seconds, and direction into an Int32 data type so that SQL Server can store the complete value in the angle_ field.
The encoding is straightforward except for one part: The code converts degrees and minutes to seconds and adds them to the total seconds so that the values will be properly ordered and fit into an int data type in .NET. If the value you're specifying is supposed to represent a point south of the equator, the direction is negative (S) and the code converts the result to a negative number. Before the code converts the value, it adds 1 to it to ensure that 0°0'0"N has a different internal representation than 0°0'0"S. This distinction is important because, although these values are the same, their lexical representations are different.
Latitude has another helper method, GetParts(), which is the complement of the Angle() method and decodes the angle_ field into degrees, minutes, seconds, and direction. The following code shows the signature for GetParts():
ref byte minutes,
ref byte seconds,
ref bool positive)
The source code in the Latitude.cs file contains the code you use to implement the GetParts() method.
The Latitude UDT's last helper method is called EncodeNullAngle(). This method sets the angle_ field to a value that represents a null. Latitude uses Int32.MinValue to represent a null; it can use this value because the Angle() method will never produce Int32.MinValue. You implement the EncodeNullAngle() method like this:
angle_ = Int32.MinValue;
The EncodeNullAngle() method makes implementing the IsNull and Null properties easy. Listing 4 shows the code you use to implement these properties. The IsNull property checks to see whether the angle_ field contains the value that represents a null, in this case, Int32.MinValue. The static Null property creates an instance of a Latitude UDT, then uses the EncodeNullAngle() method to set the instance's value to null. SQL Server internally uses the IsNull property to check for null Latitude values.
Listing 5 shows the C# code you use to implement the ToString() method, which checks to see whether the Latitude value is null and returns "null" if it is. If the Latitude value isn't null, the GetParts() method decodes the value_ field, then uses the results to build a Latitude string. Note that the string "\xB0" is the way you represent the degree character (°) in C#.
The complement of the ToString() method is the Parse() method, which Listing 6 shows. The Parse() method parses a string and returns a Latitude value that represents it—in this example, all four fields. This method must also validate the string passed in to it to be sure that the string is properly formatted. The Parse() method's input parameter is a SqlString, which is similar to the string type in C# except that SQLString represents a null string through its IsNull property.
The first thing the Parse() method does is determine whether SQL Server passed in a null string or a string whose text is "null" when it called Parse(). If so, the method returns a null for the Latitude value. Otherwise, Parse() uses a regular expression to make sure that the string is properly formatted. If the formatting isn't correct, SQL Server throws an error.
If SQL Server passed in a valid string, the code uses the same regular expression to parse the string and extract the values of the fields that make up the Latitude value, then returns the Latitude value. The code checks each extracted value to make sure that it's in the proper range; if any value is outside its specified range, the code throws an error.
The use of a regular expression greatly simplifies the implementation of the Parse() method. The regular expression is a private static member of the Latitude class, which means the regular expression is precompiled when you load the code, and SQL Server uses it in the compiled form repeatedly. Also note that the static member is marked as readonly. Without this designation, the Latitude.dll assembly could compromise SQL Server's security and reliability.
Compiling and Cataloging
You use the C# compiler that comes with SQL Server 2005's PDC bits to compile the Latitude.cs file you downloaded and produce an assembly. You have to add that assembly and the Latitude data type to a database before you can use them. This step is called cataloging. Listing 7 shows a T-SQL batch that catalogs the Latitude type. In this listing, the CREATE ASSEMBLY command loads the latitude.dll file into the database. PERMISSION_SET=SAFE means that this assembly contains only managed code and generally performs only computational kinds of operations. SQL Server will analyze the assembly when it loads it, and if the assembly doesn't meet the criteria for PERMISSION_SET=SAFE, SQL Server will raise an error and won't load the assembly. Assemblies that SQL Server loads with PERMISSION_SET=SAFE are as secure and reliable as any T-SQL batch.
Once SQL Server has cataloged the assembly, it catalogs the Latitude type. In Listing 7's code, the string following the CREATE TYPE command is the name of the type. The EXTERNAL NAME clause specifies an assembly and a class in that assembly. The string Latitude: means that the type is located in the assembly named Latitude that Listing 7's code added to SQL Server. The string \[Cartography.Latitude\] refers to the class name Latitude in that assembly's Cartography namespace. The names of the namespace and class are case-sensitive and must match the case in the implementation, even if you implemented the code in a language such as Visual Basic .NET, which ignores case.
This article shows how to implement a simple UDT in SQL Server 2005. Remember, for your UDT to work, you need a few basic elements: a class that has an internal representation of the type, a way to represent an instance of the type as a string, and a way to handle nulls in the data. SQL Server 2005 offers other innovations for UDTs such as ways to create a user-defined sort order, expose fields in the type directly to T-SQL, include arithmetic functions, and implement user-defined serialization. In addition, the next release of Visual Studio .NET, code-named Whidbey, directly supports creating UDTs. I hope this introduction will inspire you to explore these innovations.