How to prevent table scans when searching LOB data?

Question: Our schema involves a varchar (max) column which contains a set of data fields that come from another system. We can’t change that part of the system but we’re having problems with blocking when running update queries that involve searching that column. Is there anything we can do?

Answer: This is a question that came up today from one of the students in our current class.

The scenario is a fairly common one where a LOB column is being used to store a structured set of data (sometimes called a ‘property bag’) that is infeasible (for whatever reason) to break up into separate columns to store in a relational table. If a SELECT has the LOB column in the WHERE clause then the query plan will involve a table scan – as you cannot create a nonclustered index over a LOB column.

If a transaction involves UPDATEs and a SELECT that searches the LOB column, then there is the possibility of blocking occurring. Update operations require exclusive locks on the resource being updated (e.g. table rows) and these locks must be held until the transaction commits. If an UPDATE occurs, taking an exclusive lock, in the first part of a transaction and then a SELECT occurs that causes a table scan, that table scan might take a long time to execute. This makes the transaction take a long time overall and so the exclusive lock from the UPDATE operation is held for a long time. This may cause another transaction to block, waiting for the UPDATE’s exclusive lock to be released.

So how can you get rid of the table scan without completely shredding the LOB value into a table itself? Simple – you need to create a nonclustered index that allows the SELECT to be an index seek rather than a table scan.

But what to pick as the index key? You need to create a computed column that pulls a substring out of the LOB value, where the substring is one of the fields in your structured data inside the LOB value. Then you can create a nonclustered index over that computed column. This is a similar concept to creating a secondary XML index on an XML column – essentially pulling out one field as the index key.

This isn’t an ideal situation but it does solve the huge performance problem and I’ve seen this done several times successfully when there are no alternatives.

Discuss this Blog Entry 2

on Apr 27, 2012
Could a fulltext index work here?
on Apr 27, 2012
That's a good technique, provided (a) you're using a new enough SQL Server version so that you can have a computed column based on a LOB column; (b) you can figure out what value might be useful for searching within the LOB. Of course, there is going to have to be a table scan done (but only once) to read all values from the LOB column in order to build the index. I'd love to see you discuss a pattern that I've tried to use a few times -- not, however, specifically involving LOB data. You can use a combination of "top" and a where clause based on the clustered index column(s) to do the work (a glorified table scan) in "chunks", each as its own transaction, so that the locks are held for less time. If you definitely need the a single transaction to cover all the changes (so you don't partially update the table), and you're hopefully not updating a high percentage of rows, you can have each "chunk" be read-only against the "target" table, and write the needed data (primary key and new values for column[s] to change) to a temporary table. Then you update the target table from the temporary table in one shot. (Of course if the data could be changing out from under you, some additional work might be required to avoid problems.) Is this a standardized pattern? Is there something wrong with it?

Please or Register to post comments.

What's SQL Server Questions Answered?

Practical tips and answers to many of your questions about SQL Server including database management and performance issues.

Contributors

Paul S. Randal

Paul Randal worked on Microsoft's SQL Server team for nine years in development and management roles, writing many of the DBCC commands. Randal was ultimately responsible for SQL Server 2008'...

Kimberly L. Tripp

Kimberly L. Tripp has been working with SQL Server since 1990, and she’s worked as a consultant, trainer, speaker, and writer specializing in core SQL Server performance tuning and availability...
Blog Archive

Sponsored Introduction Continue on to (or wait seconds) ×