T-SQL UDF Quickly Determines How Often a Substring Appears

Downloads
96786.zip

Executive Summary:

Microsoft SQL Server 2005 or Microsoft SQL Server 2000 database administrators sometimes come across the problem of hitting the 900-bytes limit for unique constraints. To work around the 900-byte limit, Microsoft SQL Server 2005 or Microsoft SQL Server 2000 database administrators can create an additional computed column that gives the checksum value of the string column. Because Checksum is an int data type, a unique index for the computed column can be created


I created a function, dbo.occurrences, that calculates how many times a substring appears in a source string. Rather than looping through the source string and checking whether the substring still exists in the part of the string that hasn’t been scanned yet, dbo.occurrences performs a simple arithmetic calculation. The number of occurrences of a substring within a string is expressed by the formula

Occur (s,sub ) =
\{ len(s) – len (replace(s,sub,’’) \}
/ len(sub) \{len(sub) 0\}

This formula calculates the number of occurrences of a string within a string by first replacing each substring inside the source string with an empty string. Next, the formula finds the difference between the length of the original source string and the length of the resulting string after you perform the replacement operation just described. Finally, the difference is divided by the length of the substring, providing the substring isn’t empty. Listing 1 shows how dbo.occurrences implements this formula.

To use dbo.occurrences, you pass in a source string and a substring as varchar(max) parameters. For example, if you want to see how many times the letter i is in the string Eli Leiba is the king, you use the call

select dbo.occurrences
(‘Eli Leiba is the king’,’i’)

which produces the result of 4. If you want to see how many times the substring ll is in the string Hello tell me hello, you use the call

select dbo.occurrences
(‘Hello tell me hello’, ‘ll’)

which produces the result of 3. If you try other pairs of source strings and substrings, you’ll realize that this simple calculation does the job!

The Occurrences.sql file contains the dbo.occurrences function. You can download Occurrences.sql from SQL Server Magazine’s Web site. (Go to www .sqlmag.com, enter 96786 in the InstantDoc ID text box, then click the 96786.zip hotlink.) I tested this function on SQL Server 2005, SP1. To make it work on SQL Server 2000, you need to use varchar(8000) instead of varchar(max) for the two parameters’ type.

—Eli Leiba, Senior Application DBA, Israel Electric Company

Discuss this Article 3

Janusz (not verified)
on Nov 6, 2007
Maybe it will be faster to instead of using your logic use: len(replace(s,sub,sub + '*') - len(s) It should work faster I supposed because you have no division and you do not have to check if you string has any characters inside.
ANTHONY (not verified)
on Feb 21, 2008
Another dead link. http://www.sqlmag.com/files/09/96786.Figure_01.jpg
AnneG_editor
on Feb 22, 2008
I've reported this issue to our Web team and we'll get it resolved ASAP. Thanks for your patience.

Please or Register to post comments.

IT/Dev Connections

Las Vegas
September 30th - October 4th

Paul ThurottOur Experts will show you:
• Common SQL Server
Problems
• Best Practices for T-SQL
• SQL Server Integration
Services
• Database Development

Come See Michael Otey & Tim Ford in Person!

Early Registration Now Open

From the Blogs
May 21, 2013
blog

A Common Misconception about MAXDOP

Out of the box, SQL Server is (and has been) able to take advantage of multiple processors/cores without any effort on behalf of administrators....More
May 9, 2013
blog

My ISO 8601-Compliant Signature 2

My family recently just "officially" announced that we're in the process of adopting a child from South Africa. We're quite excited, of course, but there's a ton of paperwork to do—along with the need for gobs of signatures....More
May 8, 2013
blog

Use SSIS for ETL from Hadoop

In this blog post, Mark Kromer walks you through using SSIS as a way to use ETL techniques using Microsoft's Hadoop on Windows (HDInsight) as a source using Hive connectors...More
SQL Server Pro Forums

Get answers to questions, share tips, and engage with the SQL Server community in our Forums.