When you're deciding the best mechanisms for storing your user data, you often need to find a way to store large data objects. These objects could be character strings representing textual content (for example, book chapters), or they could be binary data storing bitmap images, sound, or video files. SQL Server releases before 7.0 limit the size of a data object in a SQL Server table to 255 bytes. With SQL Server 7.0, the limit increases to 8000 bytes. All releases can store larger data objects by using the special data types text and image, which let you store as much as 2GB of data outside the table on a set of special pages. Let's look at how SQL Server stores these special data types and discuss some factors that can affect your decision about whether to implement these data types. Finally, we'll look at a special mechanism in SQL Server 2000 that lets you store text data in the table when it will fit.
For this article, I refer to large data objects as LOBs. As of SQL Server 7.0, the LOB types are text, ntext, and image. The text data type contains ASCII characters, ntext contains Unicode characters, and image contains binary data. Images are sometimes called BLOBs (binary large objects).
When you include a column of LOB data in a table, the sp_help output for the table shows that the LOB data takes up only 16 bytes. Those 16 bytes hold a pointer to a location inside SQL Server—but outside the table—where the LOB data is stored. If you have more than one LOB column in the table, each row will contain multiple 16-byte pointers. In SQL Server 6.5, the LOB pointer is the address of a 2KB page that's the start of a linked list of pages (if more than one page is needed). Even if you need only 1 bit more space than the 255-bit maximum size, SQL Server 6.5 allocates a whole 2KB page and accesses that page in a separate read operation any time it needs the LOB data. If you have thousands or millions of bytes of data, the wasted storage isn't the most important problem, but the efficiency of finding data is limited. Although SQL Server 6.5 allows a substring search on LOB data, it has to follow each page chain to the end to determine whether the row contains what it's looking for. Searching for and replacing data in the middle of a LOB value is also problematic because there's no easy way to get to the middle of the data.
SQL Server 7.0 overcame some of these limitations by completely changing the way it organizes LOB data. One enhancement increased the maximum length of character or binary fields to 8000 bytes. (Note that for the ntext data type, you get 4000 characters because each Unicode character requires 2 bytes for storage.) When you need more than the maximum in-row data, you can still use the LOB types. Another improvement is that instead of storing the LOB data for each row in a separate linked list of pages, each table has only one collection of special pages to hold all the LOB data for that table. A page that stores the LOB data isn't limited to holding data for one LOB column. LOB data pages can hold data from multiple rows and can even have a mix of text, ntext, and image data on one page.
Although developers who work with LOB data often think of the data as being stored in one string of bytes, the data storage is completely different. SQL Server 2000 and 7.0 organize the LOB pages in a B-tree structure so that operations starting in the middle of the string are more efficient than in SQL Server 6.5. In addition, the potential for wasting large amounts of space isn't so severe because many short data values can use space on the same page.
When deciding whether to use SQL Server LOB data types, developers often consider avoiding them completely and instead storing a character string specifying an OS file outside the database that contains the desired data. For example, if each column of your LOB data is a photograph in bitmap form, you might store the location of an external OS file in every row, rather than store the pointers to additional pages within the database. When Microsoft first introduced the LOB data types to SQL Server, accessing LOB data was inefficient. If you frequently accessed a large percentage of the LOB information, you'd usually have to use the external files to get good performance. However, SQL Server has greatly improved its access methods in the last several releases, so now the choice isn't as easy. Which method is faster depends on how big your LOB data is, how often you need to access it, and what you want to do with the data after you access it.
SQL Server manages the special LOB data pages internally just as it does all other data pages. SQL Server brings the pages into its cache when they're first accessed; if your applications frequently access the same LOB data, those pages will stay in the cache, giving you much faster subsequent access. If the LOB data pages aren't reaccessed, SQL Server writes them to the list of free pages to be replaced in the cache eventually by other pages. So if you access a small subset of your LOB data frequently, your best choice might be storing the data in the database and using text, ntext, or image columns. In some situations—such as when there are only a few such pointers, when you don't care about backing up the LOB data, or when you know you'll never be moving the database—storing your own pointers to files outside the database still might be better. However, in most cases, using LOB data types on their own special pages inside SQL Server gives you the best performance. To know which method is faster for your applications and data, you need to run your own comparison tests.
However, performance might not be your main consideration in choosing a storage method. You also need to consider ease of management. Storing the LOB data outside SQL Server gives you a large number of separate files to manage. If you need to move or copy your SQL Server installation, you must also move thousands (or more) of external files. In addition, your SQL Server backups won't contain the LOB data. Admittedly, the LOB data might not be as volatile as the other data in your database, so you might not need to back it up. However, if you don't include the LOB data in the database, a database restore after a complete system failure won't include the LOB files.
SQL Server 2000 added functionality that lets you decide whether and when you want SQL Server to store small chunks of LOB data on the data pages. You can set the threshold for small at the time you enable this option. You must enable the option for each table separately by using the stored procedure sp_tableoption. The following example enables the text in row option for a table called texttable and lets you store as much as 300 bytes of LOB data in the data row:
Any LOB data in texttable that exceeds 300 bytes is stored outside the data row.
No data movement occurs as an immediate result of enabling the text in row option. However, after you turn on this option, whenever you insert or update a row containing less than 300 bytes of LOB data, SQL Server moves the LOB data from the special pages into the table row. Consequently, a very large table could see a lot of activity. When you disable the text in row option by running the sp_tableoption procedure and supplying a size limit of 0, the movement of LOB data back to external pages happens immediately.
The code in Listing 1 uses a simple table to demonstrate when the data movement occurs. The script builds a table with five columns, including one text column, and populates the table with one row. Running sp_spaceused shows that the table has four pages reserved, with three used for data. After the code enables the text in row option, the space-usage information doesn't change, but when the code updates the row, the report changes to show that only three pages are reserved for the table. Because the text data has moved into the row, the table no longer needs the additional text page. I won't discuss the exact meaning of the values that sp_spaceused shows; what's important is at what point the number of pages changes. Finally, you can see that when the code disables the text in row option, the number of pages in the table immediately increases to four because the data has moved off the row and back onto its own page.
Storing small LOB columns in the data rows can dramatically increase how efficiently SQL Server can manage your data. The more LOB data you can put in the data rows, the greater increase you'll see. However, keep in mind that if you're storing large chunks of LOB data in a data row, the row has less room for other data, so you'll need to limit the number of columns or the size of the data types you use in the other columns.
Remember, when determining how to store your LOB data—whether in separate files, in special pages in the database, or in the data rows themselves—there's no one right answer for everyone in every circumstance. Understanding the way SQL Server manages LOB data can help you make your decision, but there's no substitute for testing the various possibilities to see which method is best for you.