3 ways to maximize the benefits of SQL Server 2005’s new features
SQL Server 2005 is bringing many enhancements our way, but from an application developer's perspective, .NET integration is the most anticipated new feature. SQL Server 2005 can execute user-defined functions (UDFs), stored procedures, user-defined aggregates (UDAs), user-defined types (UDTs), and triggers written in common language runtime (CLR) languages such as Visual Basic .NET and C#. The scope of this remarkable shift in database programming could fill a book, and much of the functionality that the CLR offers isn't possible in T-SQL today. Let's look at how the CLR works in SQL Server 2005 and how to start writing and deploying CLR code in SQL Server 2005. We'll work through three examples that show you how to use the CLR to replace extended stored procedures, work with CLR triggers, and use CLR UDAs. These simple examples will help you see how you can start taking advantage of SQL Server 2005's enhancements.
CLR Integration: A Closer Look
The greatest advantage of CLR integration is that CLR languages such as Visual Basic and C# provide ways to interact with APIs such as the .NET Framework Base Class Libraries—BCLs—for string manipulation, use the cryptography API, create and work with regular expressions, and call external resources such as the Windows file system, Web services, and the Windows Event Viewer.
Another benefit is that the SQL Server database environment hosts the CLR, so loading and unloading assemblies happens in the Appdomain that SQL Server hosts. Appdomains are basically security boundaries that facilitate isolating, loading, and unloading managed code (i.e., code that runs inside a runtime environment). Using Appdomains is cleaner in implementation than using sp_OA× extended stored procedures. For example, because sp_OA× procedures can instantiate COM objects, a component could access SQL Server's internal memory structures and cause the SQL Server service to shut down. Introduction of Appdomains has eliminated such risks because managed code is isolated inside the Appdomain boundary, which prevents code from accessing SQL Server memory structures. With the CLR's compiled code, you can access system resources outside the SQL Server boundary and perform complex mathematical calculations that aren't possible with T-SQL. And with the option of using both T-SQL and CLR languages in SQL Server 2005, you can now meet any business requirement.
The first time you execute a call to managed code in SQL Server 2005, the call prompts SQL Server to load the CLR. After that, any managed code invocation runs in the CLR. The managed environment performs tasks such as memory allocation, exception handling, and enforcing SQL Server and CLR security policies. As I mentioned, through the CLR, managed code can access the BCLs that provide complete access to string manipulation, regular expressions, Web services infrastructure, file I/O, and the cryptography API. The BCLs bring the power, scalability, and security of the .NET Framework to database applications.
Figure 1 shows the SQL Server 2005 CLR architecture. The SQL OS (the SQL Server abstraction layer over the underlying OS resources) is primarily responsible for handling key processes such as transactions; enforcement of atomic, consistent, isolated, durable (ACID) rules; lock management; memory management; and hosting of the SQL Server Engine. The CLR resides inside the SQL Server Engine environment. A hosting layer facilitates interaction between the SQL OS and the CLR and is primarily responsible for assembly loading and unloading, memory management, enforcing the security model, determining the number of allocated threads and fibers, lock detection, and detecting execution contexts. The introduction of CLR languages in SQL Server wouldn't be possible without the changes Microsoft made to the SQL Server Engine to accommodate the CLR runtime.
Using Visual Studio 2005
Although you don't have to use Visual Studio 2005 to develop CLR-based objects, Visual Studio makes developing and deploying CLR objects easier, so I use it for the examples in this article. To work with the CLR in Visual Studio 2005, open the New Project window and select Database from the Project Types list. Then, select SQL Server Project from the Visual Studio installed templates list, as Figure 2 shows. Right-click the project you want, and select a new item. You'll see a list of templates for CLR objects you can create in SQL Server 2005. The Visual Studio code templates greatly simplify writing code for CLR-based database routines. For example, to deploy any function or sub as a SQL Server 2005 CLR object, you must declare it as Public Shared within the .NET class that encapsulates the implementation. The templates automatically add the class attributes that are required to deploy CLR objects.
Once you've built a project, Visual Studio compiles it into an assembly that encapsulates all the referenced assemblies. Deploying the assembly uploads its binary code into the SQL Server database that's associated with the project. SQL Server 2005 also contains new DDL statements for deployment; I discuss deployment later in this article.
Replacing Extended Stored Procedures
Creating extended stored procedures—traditionally the domain of C++ gurus—can be difficult, and using them can be risky. As I mentioned earlier, extended stored procedures run under the same security context as SQL Server and run inside the SQL Server process space, so a badly written, unmanaged extended stored procedure could bring down the SQL Server service. (For information about the complexity of extended stored procedures, see the Microsoft article "Creating Extended Stored Procedures" in Related Reading.) However, .NET code is type safe (i.e., it accesses types only in well-defined ways to allow consistent data-type definitions and increase code interoperability). It also runs within the Appdomain boundary, so it can't access random SQL Server memory locations and other .NET code. .NET code is also safer to deploy and use than extended stored procedures. To better understand this difference and see the power of BCLs, consider the following extended stored procedure:
This undocumented extended stored procedure returns the partitioned hard disk drives (HDDs) and the amount of free space available in each drive. If you were to rewrite this stored procedure in a CLR language such as Visual Basic .NET or C#, you would need to use the System.IO namespace, which contains the DriveInfo class that houses the information you're looking for about the each drive's free space. The DriveInfo class also exposes other properties, such as AvailableFreeSpace, DriveFormat, TotalSize, and VolumeLabel. So, when you replace the xp_fixeddrives extended stored procedure, you also get extended functionality: the ability to access additional drive information. This capability isn't possible with the extended stored procedures in SQL Server 2000 and earlier releases—unless you write your own extended stored procedures, which could be difficult.
Let's look more closely at how to create the CLR stored procedure that Listing 1 shows. You must prefix all stored procedure classes in Visual Studio 2005 with SQLProcedure as the method attribute. The Visual Studio templates automatically create this attribute. Next, you need to get an instance of the SQLPipe class, which will let the managed stored procedure return result sets to the caller that instantiated the stored procedure. An instance of the SQLPipe class is available to managed stored procedures through the SqlContext class. The SqlContext class gives an instance of the current running context of the code to maintain the current connection properties and transaction context under which the CLR objects were invoked. Both the SQLPipe class and the SqlContext class are defined in the System.Data.SqlServer namespace, which the Visual Studio templates automatically reference. Next, you form the metadata collection that defines the column data types that you'll send to the SqlDataRecord method and later query using SQLPipe. You use the SendResultsStart, SendResultsRow, and SendResultsEnd methods to start, send, and terminate the data transmission, respectively.
Using CLR Triggers
Another useful feature of the CLR is the ability to create CLR-based triggers. Let's say you need to validate a piece of data at the point of insertion—for example, an email address that a user enters. Currently, applications rely heavily on front-end validation for such requirements and give you multiple ways to validate data from the front end or the application layer. But a SQL Server 2005 T-SQL enhancement lets you invoke stored procedures as a Web service, so SQL Server 2005 application developers must build validation into the back-end layers. The CLR can help you build robust back-end systems. In this example, I show you how to create the CLR ValidateEmail trigger, which Listing 2 shows.
To accomplish this task, you need to use the regular expressions that the code at callout A in Listing 2 shows, which are available in the .NET Framework. A regular expression (regex or regexp for short) is a special text string for describing a search pattern; you find regular expressions in the System.Text.RegularExpressions namespace. In addition, as with stored procedures, you must prefix the trigger classes with the <SQLTrigger(Event, Name, Target)> method attribute, which you can see in the code at callout B. In this attribute, Event describes the operation that will invoke the trigger. Name is the name of the trigger during deployment. And Target is the name of the table you'll attach the trigger to for the specified event. The Visual Studio templates add sample attributes to the class and we need to change the parameter appropriately; the attributes are important during the deployment phase when Visual Studio 2005 is used.
To validate the email address, you'll search for a pattern that looks like firstname.lastname@example.org. For stored procedures, you used SQLPipe to send data back to SQL Server. But to get a trigger context, you use the GetTriggerContext method that the code at callout C shows. Once you get this context, you have access to the INSERTED and DELETED tables in SQL Server's trigger-execution space. These pseudo tables are automatically populated as soon as a DML operation such as INSERT, UPDATE, or DELETE occurs on the base table. To access the pseudo tables, you need to create a SQLCommand object, as the code at callout D shows, and get the email address from the Email ID column in the INSERTED pseudo table.
Once you get the address from the EmailID column, you can use the Regex.IsMatch method that the code at callout E shows to validate all email addresses that have the string pattern you specified. No validation happens against any online mail server service; you don't need to use any connected system for validations, which is one of the advantages of using regular expressions.
Depending on the return value (a Boolean value of True or False), the trigger either raises an error, as the code at callout F shows, or lets the process insert the email address into the table. If the address is invalid, you'll get the error message that Figure 3 shows. (Web Listing 1, which you can download at InstantDoc 45673, shows the script to create this error message.) Notice that the error message includes the trigger name so that you can use the error message for corrective action. To add a bit more functionality to Listing 2's trigger code, if the email is valid, you can send an email message to notify the user that the address is accepted. The code in Web Listing 2 creates the acceptance message.
This email message code will run synchronously while the trigger code executes. With this extension to the trigger's functionality, you not only validate the email address but also add more value by using the CLR BCLs. More important, this back-end email address validation would have been next to impossible in previous SQL Server versions.
Using CLR UDAs
Another important SQL Server 2005 CLR innovation is the introduction of UDA functions. In addition to the standard SQL Server aggregate functions such as SUM, COUNT, MIN, and MAX, UDAs let you create your own aggregate functions for purposes such as consolidating values in a column, as well as other complex mathematical operations. For example, say you have a Department table that contains a product_key column and you need to find the product of all the values for a given department. This aggregate function isn't available in SQL Server 2000; you can write your own code to get the product, but most homegrown solutions are roundabout and inelegant, and can be achieved only by using temp-table explicit looping techniques or by using complex formulae with logarithmic expressions.
Listing 3 shows the Visual Basic .NET code to create a UDA that finds the product of the values in a column. Open Visual Studio 2005, and add User Defined Aggregate to the project. The template will list the framework methods for creating a UDA. The standard methods you need for this UDA are Init, Accumulate, Merge, and Terminate. These methods, which you see highlighted in Listing 3, are the invocation, the aggregate process, the merger (in case of parallelism), and the termination positions for the aggregate function. As with the other CLR objects we've created, the class needs to be prefixed with the SqlUserDefinedAggregate attribute, and in Visual Studio, the template automatically generates this attribute. You also need to mark the class as Serializable. A typical call to the new UDA would look like:
SELECT Dept_ID, dbo.clr_product(product_key) FROM #temp_dept GROUP BY Dept_ID
Figure 4 shows the output this query generates.
Let's look at another way you can unleash the power of UDAs. Say the Department table you're working with includes employee data, and you want a comma-separated list of all the employees for each department. If you had to write the T-SQL code for this request in SQL Server 2000, you'd have to create a temporary table and use all sorts of looping techniques to form the comma-separated values, as the script in Web Listing 3 shows. But with the CLR, you can use the concatenation UDA that Listing 4 shows and call it with the following simple statement:
SELECT Dept_ID, dbo.clr_concat(name) FROM #temp_dept GROUP BY Dept_ID
Figure 5 shows the output from this query. When you compare the code in Web Listing 3 and Listing 4, you might think that you had to write more code for the UDA. However, the UDA is a reusable code block that you can use across the database for multiple queries. Moreover, the UDA code is more elegant and has fewer string manipulations, which incur CPU overhead in SQL Server and don't perform as well as the .NET stringbuilder class. Using the stringbuilder class lets you utilize the existing .NET Framework's capability to optimize string manipulations.
Deploying CLR Objects
You can deploy CLR objects manually, or you can deploy objects automatically in Visual Studio 2005 by simply right-clicking the project and selecting Deploy from the resulting popup menu. The deployment operation in Visual Studio 2005 uses the custom attributes SqlProcedure, SqlFunction, SqlTrigger, and SqlUserDefinedAggregate to automatically create the routines in the database. To see a list of assemblies for a given database, go to Visual Studio's Object Explorer and select the database name, then select Programmability, Assemblies. For manual deployment, you can use DDL statements, as the code in Web Listing 4 shows. (For a brief explanation of the .NET security settings you use for assemblies, see the Web sidebar "Security Settings for CLR Assemblies in SQL Server" at InstantDoc ID 45948.) You can also optionally upload the assemblies' source code into SQL Server to use for debugging CLR objects.
You can also get information about CLR deployed assemblies from SQL Server's system tables. The sys.assemblies table provides information about the assembly; the sys.assembly_references table provides dependency references to other cataloged assemblies; and the sys.assembly_files table provides assembly CLR code (if you've uploaded the code).
A typical way to use system tables is to get information about the assembly attributes. For example, the statement that Web Listing 5 shows returns such assembly-related information as version information and the permission set under which this assembly is deployed. (For information about the security settings you can use for .NET assemblies, see the Web sidebar "Security for CLR Assemblies in SQL Server.") If you right-click the assembly and select View Dependencies in SQL Server Management Studio, you can view the list of CLR objects that you deployed with the code in Web Listing 4.
T-SQL Is Not Dead
Although CLR integration is generating a lot of excitement among SQL Server developers, the introduction of .NET languages into SQL Server doesn't mean you're going to have to replace all your T-SQL code. The CLR can boost productivity in some ways that are impossible in earlier SQL Server editions, so exploiting the advantages of the CLR is worth the challenge. But SQL Server 2005 also includes rich T-SQL enhancements that make T-SQL more robust, such as common table expressions (CTEs) that provide support for recursive queries; PIVOT and UnPivot, which provide the ability to transpose rows to columns and vice versa; ranking capabilities; and the addition of the XML data type and XQuery. These features take the implementation of set-based solutions to the next level.
To make a judicious choice about when to use T-SQL and when to use the CLR, remember that CLR extensions perform better when you need a high level of computation or text manipulation. For data-intensive tasks, T-SQL's set-based solutions perform better than CLR-based solutions because T-SQL works more closely with the data and doesn't require multiple transitions between the CLR and the SQL OS. In processes such as cryptography, text manipulation, I/O operations, and invoking Web services, the CLR offers rich APIs that open capabilities that aren't possible in SQL Server 2000 and earlier versions. Finally, test every piece of code for security, reliability, and performance. Any given task can have multiple implementations when you're using the .NET BCLs.
This article has only touched the surface of what is possible when you use the .NET 2.0 CLR objects in SQL Server 2005. Use the examples I've given you to imagine what you can do in your application's database tier, for example, now that you can access Web services from within the database logic. And explore the database objects that you can create to utilize the feature-rich .NET 2.0 Framework and programming languages.
"Creating Extended Stored Procedures," http://msdn.microsoft.com/library/
"Using CLR Integration in SQL Server 2005," November 2004, http://msdn.microsoft.com/library/
Preparing for SQL Server 2005, "Seeing the CLR's Potential," May 2005, InstantDoc ID 45753
Developer .NET Perspectives, "The CLR's Inclusion in SQL Server 2005," February 17, 2005, InstantDoc ID 45445 Developer .NET Perspectives, "Leveraging the CLR's Power," March 4, 2005, InstantDoc ID 45575