No matter what your job, you need to understand the CLR
One of the most innovative features in the upcoming SQL Server 2005 release, formerly code-named Yukon, is Common Language Runtime (CLR)-based code execution. If you're a developer or your job combines the duties of developer and DBA, you probably know about the CLR. But if you're a DBA who doesn't work with the development side of SQL Server, you might have tuned out the CLR, assuming it won't affect you. If you haven't been paying attention to the CLR, I have some advice for you: The roles of all people who work with SQL Server are changing, and you need to learn this new technology whether you call yourself a DBA, a developer, or a programmer.
In SQL Server 2000 and earlier releases of the database platform, you coded SQL Server procedures, triggers, functions, and other objects in. In SQL Server 2005, you can also create these objects in C# and Visual Basic .NET, and Microsoft plans to continue adding .NET Framework CLR languages to SQL Server. Microsoft even wrote significant parts of SQL Server 2005 in .NET languages such as managed C++, C#, and Visual Basic .NET. In general, SQL Server 2005 exposes .NET Framework functionality by letting you
- write SQL Server executable code such as stored procedures and triggers in CLR-based C# or Visual Basic .NET and invoke those stored procedures the same way you do T-SQL stored procedures
- write CLR-based functions in C# or Visual Basic .NET and invoke them the same way you do T-SQL functions
- write CLR-based user-defined data types (UDTs) and aggregates and use them and their associated code to define your tables and business rules
In SQL Server 2000 and earlier releases, you can call external code by using extended stored procedures, which call a precompiled DLL coded to the Open Data Services (ODS) API specification. You use these extended stored procedures to perform complex mathematical calculations, expose functionality in the OS, perform CPU-intensive operations, or just provide functionality that T-SQL can't perform without assistance. One problem with using extended stored procedures is that SQL Server must fully trust the code. Because you use unmanaged code to create extended stored procedures and they run in the same address space as SQL Server, a problem in an extended stored procedure could harm SQL Server—in fact, it could blue-screen the system. Not good. CLR-based code execution means SQL Server developers and DBAs can further and safely extend the power of their procedures.
To see the power of CLR-based code execution, let's look at how I created and deployed a semi-complex CLR assembly project designed to perform a common task: capture and encrypt credit card information. I used the bits of Visual Studio 2005 (formerly code-named Whidbey) and SQL Server 2005 that Microsoft released at its Professional Developers Conference (PDC) in October 2003, although you can create all my examples by using the current release of Visual Studio .NET. The pre-beta versions of Visual Studio 2005 don't integrate with SQL Server 2005 yet, but Microsoft plans to reintegrate them this summer. You'll need to construct the code and deploy the assembly DLLs to SQL Server 2005 by using T-SQL scripts, as SQL Server Books Online (BOL) explains.
A word of caution: Remember that the bits I use are alpha code. Thus, big hunks of it work, but Microsoft development teams are still ironing out the details and interfaces of the final product. It's also not fair to evaluate performance based on this build.
Understanding the CLR
The .NET Framework, which Figure 1 illustrates, is built on a language-neutral CLR architecture that lets developers use virtually any computer language to construct .NET applications. Think of the language as the human-to-computer interface—there are as many languages as there are types of programs and programmers. At this point, Microsoft expects SQL Server 2005 to support three languages: T-SQL, C# .NET, and Visual Basic .NET. The stored procedures you're using in SQL Server 2000 and earlier releases should continue to run, and you'll be able to create new stored procedures, functions, and UDTs in any of the three supported languages. Unlike T-SQL, the .NET languages can draw on any of the more than 5000 classes that the .NET Framework supports.
Figure 2 illustrates how the CLR works. Visual Studio launches the CLR compiler, which takes a block of code called an assembly and generates a DLL that contains an intermediate language (IL) form of the code. The CLR converts the code to native machine code and executes it.
I recently met with a group of Microsoft Regional Directors and magazine editors to discuss upcoming Microsoft technology. The group agreed that the CLR is exciting but that users need to understand it better. Many of us expressed the fear that developers might assume that CLR-based stored procedures would be faster, better, cooler, and otherwise superior to traditional T-SQL stored procedures. Sometimes, folks tend to oversimplify the viability and suitability of CLR-based executables. For example, I read an article that said, "A .NET stored procedure is no different in usage than a regular SQL stored procedure." Such broad statements extolling the virtues of CLR-based executables are dangerous. My concern is that developers might try to use CLR-based code to do something simple such as convert Fahrenheit to Celsius, and when they discover that the new code runs slower than the T-SQL it's replacing, they'll abandon the CLR. Let's look at the technology more closely and try to ground ourselves in reality.
Grounded in Reality
Consider that calling a DLL from T-SQL is similar to calling an external (out-of-process) COM object. Established best practices still apply: Call external routines fewer times while using more parameters, and spend as much quality time as possible in the invoked code. By "quality time," I mean that the time your application spends in the external code should be a productive and efficient use of CPU cycles. The first time SQL Server calls an external CLR DLL, SQL Server must load the code into memory, compile it into runtime code, and execute it. After SQL Server takes these steps, calling the external object is less costly than calling it without this preparation—but still more expensive than calling in-process code. The result? Calling a CLR object can be considerably slower than calling a T-SQL stored procedure.
The Microsoft Regional Directors and I came up with several ways that SQL Server-based CLR assemblies can be a viable alternative to T-SQL-based procedures. SQL Server developers have already implemented solutions for virtually all these examples by using extended stored procedures and COM DLLs, but each of the following solutions would likely benefit from a rewrite to CLR-based code:
- Replacing or augmenting the functionality of large, complex T-SQL stored procedures—especially those that perform mathematical calculations such as complex business or scientific formulas. T-SQL's math skills aren't much better than mine, so offloading that complex math to a CLR-based DLL might be beneficial.
- Accessing a Microsoft Project file to update complex scheduling tasks. The logic in the CLR procedure lets an application interact with the Project data structures that T-SQL can't access directly. This approach can apply to any external data structure that doesn't have a traditional data-access interface such as process-control instrumentation or non-mainstream files.
- Performing complex geographic-mapping algorithms. By using Geographic Positioning Satellite (GPS) coordinates, one of the Regional Directors was able to write a query for the request, "Show me all the airports within 5 miles of this city." The CLR procedure that executed the geometry functions simplified the rather complex process of writing intelligent queries against a mapping database.
- Performing complex engineering calculations. For example, my daughter George brought me a 6-inch thick book of chemical-engineering formulas. With the CLR, you could code these formulas and easily access them from T-SQL procedures as functions. Again, be careful that the amount of CPU time needed to compute the formula justifies the trip to the CLR code and back.
In each of these examples, CLR code extends the power and reach of the T-SQL query language. That's what CLR-based procedures are for.
Testing the Concept
To test the effectiveness of a basic CLR-based stored procedure, I wrote a simple .NET assembly in Visual Basic .NET to convert Fahrenheit to Celsius and back. I tried two approaches. First, I called the assembly from a T-SQL stored procedure and discovered that it took roughly eight times longer to execute than the equivalent conversion code in T-SQL.
Next, I modified the .NET assembly and coded it so that the test application called it as a CLR-based T-SQL function. The performance of this version was better than the first approach but still about five times slower than the equivalent T-SQL code. However, I wrote the T-SQL code in one simple expression, so I'm not surprised at the result. The T-SQL code didn't require any higher math to compute—just simple arithmetic.
Of course, simple conversion isn't what CLR-based stored procedures are for—they're intended to call routines that are difficult, expensive, or impossible to execute from T-SQL. CLR-based stored procedures are designed as a replacement for extended stored procedures—not simple expressions.
After consulting with Peter Blackburn, my UK-based technical guru, I decided to try something a bit more ambitious—and more realistic. This time, I chose to implement RSA encryption by using the RSACryptoServiceProvider to take a given string and return an encrypted byte array. Almost 3 weeks later, I was able to get the new application working. Figure 3 shows a diagram of how my application works. Along the way, I discovered several problems (which I describe in a moment) that forced me to create much of the project in T-SQL batches. I also discovered a new key to debugging, which I describe in the sidebar "T-SQL and SQL CLR Debugging," page 14. The benefit of this experience is that I'm more familiar with SQL Server 2005's new SQL Server Management Studio (formerly called Workbench), which replaces Enterprise Manager, Query Analyzer, and a few other utilities. Microsoft is currently integrating the Management Studio tools into Visual Studio 2005 to make managing SQL Server databases even easier.
The process I used to create this application was backward when compared to the way that many people design applications:
- I started with a paper design (a radical concept for some). This step helped me keep track of my work when my development process was interrupted.
- Next, I wrote the CLR-based encryption code in a Windows application and tested it using Visual Studio .NET 2003 and the .NET Framework 1.1. I reverted to this earlier release of the .NET Framework because of problems in the 2.0 release that prevented me from converting strings to byte arrays and back.
- Then, I loaded the Visual Studio 2005 PDC bits and imported the application that used the .NET Framework 1.1. I altered the code to pass a varbinary byte array from and back to the conversion routines. This method was more efficient than converting the byte arrays to Unicode strings before transport.
- I wrote scripts to create the T-SQL stored procedures that called the CLR-based encryption functions and deployed the CLR DLL assembly. This example let me deploy the assemblies directly from Visual Studio 2005, but when I called the encryption classes, I got protection violations. I had to hard-code the assembly code as UNSAFE to get it to work. Also, when I used the Visual Studio 2005 deploy feature, Visual Studio dropped any T-SQL stored procedures that depended on calls to the CLR assemblies. I expect Microsoft will solve this problem in later builds.
- Next, I wrote a script to create the test table CCards and set the table's access permissions. (This script and all the SQL scripts for this example are available for download at http://www.sqlmag.com, InstantDoc ID 42208.)
- I created scripts for T-SQL stored procedures to add rows to the test table. The scripts included code to set execute permissions on the procedures.
- Finally, I wrote a VB front end to capture parameters from the user and call the T-SQL procedures. The T-SQL procedures called the CLR-based code to encrypt the value and store that value as a varbinary data type in the test table. The application also included code to fetch a row from the test table and attempt to decrypt the value by using a selected private key.
As Figure 4 shows, the example program captures a user's credit card number and an expiration date. When the user clicks Submit, the code calls a stored procedure that calls a CLR assembly, which in turn generates a private RSA encryption key and uses the key to encrypt the credit card number. The stored procedure saves to the database the encrypted data (now in the form of a byte array) and the expiration date. The code saves the private encryption key and displays it in the client application to assist in debugging the application. However, the private encryption key isn't stored on the server; this way, only the user can decrypt the server-side value—and then only if the client-side application persists the value locally. I can't compare the example encryption and decryption routines with equivalent T-SQL code, which doesn't support this functionality, but this example shows the mechanics of creating CLR-based code, and my informal tests showed that it performed well. However, I suggest you experiment with your own CLR code in an environment that matches your production system—don't assume that your code will perform the same way mine did.
When the user wants to retrieve the credit card number so that some server-side function can use it, the client-side application calls a retrieve stored procedure that takes the encryption key, looks up the customer record by ID and expiration date, and decrypts the value by using another CLR-based procedure. To help debug the example application, the code returns the decrypted credit card number to the client application, but I expect that in a real-world solution, the server-side application would simply use the unlocked value to perform some other operation.
Note that this application calls the CLR stored procedures from T-SQL procedures. I found this design more convenient than calling the CLR procedures directly, and I expect that's what most developers will end up doing. Another approach would be to code the CLR procedures as functions and embed them in T-SQL code.
This article introduced you to some concepts and realities of CLR-based stored procedures. You've seen how the procedures work and where they fit in the bigger scheme of a high-performance database management system (DBMS). In an upcoming article, I'll show you how to use Visual Studio 2005 and SQL Server 2005 to code and test the CLR-based stored procedures that this article's test application invokes.