SQL Server 2000 and 7.0 can store large data objects (LOBs) by using the special data types text, ntext, and image, as I mentioned last month in "Off the Record." These data types let you store as much as 2GB of data outside the table—but inside the database—on a set of special pages. The special mechanism in SQL Server 2000 that lets you store this LOB data in the table if it will fit is called text in row. Let's look at a few other aspects of dealing with LOB data, including more differences between working with LOB data that's stored in its own pages and LOB data that's stored in the data row. I examine only text LOB data here because it's easiest to show in print.
You can read and modify text data by using the T-SQL commands SELECT, INSERT, UPDATE, and DELETE, but when text data is very large, using these commands is often awkward. So if you're setting a text field to a new value, your UPDATE statement might take up 2GB or more, and your application might not be able to pass a T-SQL query that big to the server. You could try to embed the UPDATE statement in a stored procedure and have the application call the stored procedure because you can use LOB data types as parameters for stored procedures and user-defined functions (UDFs).
If your text data grows very large, you might choose to deal with smaller chunks of data instead of the entire LOB column. T-SQL has three special statements for working with text data: WRITETEXT, READTEXT, and UPDATETEXT. Both READTEXT and UPDATETEXT let you work with chunks of a text column. WRITETEXT doesn't let you deal with chunks, but it can work with very large columns that might be awkward to modify by using the usual INSERT or UPDATE commands.
When working with WRITETEXT, READTEXT, and UPDATETEXT statements, you need to use a text pointer to access the LOB data. A text pointer is a unique varbinary(16) value that indicates for each text, ntext, or image column in each row which LOB value you're working with. You obtain a text pointer's value by using the function TEXTPTR() in a SELECT statement containing FROM and WHERE clauses that tell which row you're interested in. Make sure that the SELECT statement that retrieves the text pointer returns only one row, preferably by using an exact match on the primary key value in the WHERE clause; that approach ensures that, at most, one row meets the criteria.
Before using the READTEXT statement, you need to ensure that the text pointer you have is valid. If you find a row with the criteria you specified and the text pointer for that row has been initialized (given a value), the text pointer is valid. You can check a pointer's validity by using the TEXTVALID() function in a separate statement. Or you can use the code that Listing 1 shows to check that you don't have a NULL value in the variable you assigned to the text pointer.
The argument you supply to the TEXTPTR() function is the name of a column that contains LOB data. When using READTEXT, you can specify the starting position within the LOB data (the offset) and the number of bytes for SQL Server to read. You can use READTEXT with the PATINDEX() function to find the offset (in number of characters) at which some string or pattern exists. You can also use it with the DATALENGTH() function to determine the size of the text column in bytes. If you're working with ntext data, you have to divide the DATALENGTH() value by 2 to get the number of characters. But you can't directly use these functions as parameters for the READTEXT statement. Instead, you must obtain values for them beforehand and store those values in local variables, which you can then use with READTEXT. The code in Listing 1 finds the pattern "Chicago" in the pr_info column for pub_id 1622 in the Pubs database's pub_info table and returns the contents of that column from that point on.
If you run Listing 1's code in Query Analyzer, you might not see the entire result. By default, Query Analyzer can return 256 characters for a text column. Use the Tools, Options command and go to the Results tab to change this limit. Alternatively, you can use the system function @@TEXTSIZE, whose value is unrelated to the Query Analyzer setting for the maximum number of characters that can be returned. The @@TEXTSIZE function controls the serverwide limit on the number of bytes a LOB column can return, whereas the character limit in Query Analyzer controls the client tool's display. You can change the value of @@TEXTSIZE for your connection by executing SET TEXTSIZE n. The default value for @@TEXTSIZE is 64KB.
I won't cover all the details of using the READTEXT, WRITETEXT, and UPDATETEXT commands; SQL Server Books Online (BOL) describes them well. However, let me mention one often-overlooked fact about working with text pointers. Because you have to use multiple statements when you're working with text data (one to get the text pointer and another to retrieve the text data), SQL Server doesn't guarantee read repeatability. Even if you enclose both the SELECT and the READTEXT statements in a transaction, there's no guarantee that the value read by the SELECT will still be valid when READTEXT executes. Suppose you retrieved a text pointer for the row containing pub_id='1622', but before you could use it with READTEXT, another user deleted and reinserted the row for publisher 1622. The text pointer would no longer be valid. In Listing 1, the window for this occurrence is small because SQL Server executes READTEXT immediately after obtaining the text pointer. But a window of vulnerability still exists. In an application, the window might be wider. If the text pointer is invalid when the READTEXT operation occurs, you get an error message like this:
Invalid text, ntext, or image pointer value
To keep your text pointer from becoming invalid, put the two statements into a transaction and add the REPEATABLEREAD hint on the table in the SELECT statement you use for the text pointer, as Listing 2 shows.
Alternatively, you could change the isolation level of the connection to Repeatable Read, which isn't a bad solution. However, doing so changes the isolation behavior for all operations on that connection. To affect just the lock on the row that has the text pointer, the REPEATABLEREAD hint is a better choice. (Two other locking hints, HOLDLOCK and SERIALIZABLE, would also prevent the text pointer from being invalidated, but they're stronger than is necessary. Not only do these two hints, which are equivalent, prevent the retrieved data from changing, they can also prevent insertion of new rows into the table.)
When SQL Server acquires a text pointer for text data in a table that has text in row enabled, the data row is locked with a shared lock if the transaction's isolation level is the default READ COMMITTED or higher and the database isn't in read-only or single-user mode. The lock ensures that nobody else can modify or delete the row while you have a text pointer on a text value from that row. SQL Server releases this lock when the text pointer becomes invalid. At the end of the transaction, all text pointers are invalid, so the locks are all released. The shared locks on rows with in-row text are held until the end of a transaction, so you can't get a text pointer for in-row text except in a user-defined transaction.
Listing 3 creates a new table that has a text column and inserts one row into it. It then attempts to obtain a text pointer and use READTEXT to return some of the data in the text column. The code in Listing 3 doesn't work, and the error message you receive is misleading:
You cannot use a text pointer for a table with
option 'text in row' set to ON.
Server: Msg 7133, Level 16, State 1, Line 4
NULL textptr (text, ntext, or image pointer)
passed to READTEXT function.</i>
However, attempting to perform the same text pointer assignment and READTEXT operation inside a user-defined transaction, as Listing 4 shows, succeeds.
When the text data is outside the data row, you don't have to use a user-defined transaction except to guarantee that the text pointer remains valid. For text-in-row data, you can't use a text pointer unless you've defined a transaction.