Improving Retrieval Performance

Downloads
27363.zip

Storing duplicate copies of the same word in the full-text table allows the possibility of duplicate results. Using the table FT_EmailSubject that Listing 2 in the main article creates, assume that FT_EmailSubject contains the row values that Table A shows. If you perform a search for the word Rose, the query returns MessageID 1 twice unless you collapse the MessageIDs by using a DISTINCT clause or a GROUP BY MessageID clause. But both of these methods cause an additional sorting step at query time.

To avoid this sorting cost, you can rewrite the queries to use an EXISTS predicate instead of a DISTINCT clause, as the query in Listing A shows. An alternative to this approach that reduces storage space and improves performance for some queries is to use a table like the one that Listing B creates, FT_EmailSubject2. MessageID and Word comprise a composite primary key on this table. In filling this table, the population process must filter out duplicate words. This filtering pushes the sorting cost onto the population process instead of making SQL Server perform a sort during each search.

Another improvement you can make is to include a NumAppearances field in the FT_EmailSubject2 table and store the number of times each word appears in the original string. You can use this field to indicate how relevant the results of a query are. Note that in both of these variations, the ability to perform nearness and phrase searches is lost because information about where the word appears in the original string is no longer available. If the position of individual words in a string isn’t stored, the searches available are effectively limited to OR and AND searches.

Please or Register to post comments.

IT/Dev Connections

Las Vegas
September 30th - October 4th

Paul ThurottOur Experts will show you:
• Common SQL Server
Problems
• Best Practices for T-SQL
• SQL Server Integration
Services
• Database Development

Come See Michael Otey & Tim Ford in Person!

Early Registration Now Open

From the Blogs
May 21, 2013
blog

A Common Misconception about MAXDOP

Out of the box, SQL Server is (and has been) able to take advantage of multiple processors/cores without any effort on behalf of administrators....More
May 9, 2013
blog

My ISO 8601-Compliant Signature 2

My family recently just "officially" announced that we're in the process of adopting a child from South Africa. We're quite excited, of course, but there's a ton of paperwork to do—along with the need for gobs of signatures....More
May 8, 2013
blog

Use SSIS for ETL from Hadoop

In this blog post, Mark Kromer walks you through using SSIS as a way to use ETL techniques using Microsoft's Hadoop on Windows (HDInsight) as a source using Hive connectors...More
SQL Server Pro Forums

Get answers to questions, share tips, and engage with the SQL Server community in our Forums.