Congratulations to Marcin Motyl, a computer programmer and analyst for Aster City Cable in Poland, and Juan C. Calderon, an IT security senior auditor for GE Ddemesis Servicios S.A. de C.V. in Aguascalientes, Mexico. Marcin won first prize of $100 for the best solution to the May Reader Challenge, "Query Strings." Juan won second prize of $50. Here’s a recap of the problem and the solution to the May Reader Challenge.
Maria is the database developer for a company that analyzes Web data and provides reports based on the traffic. One such report provides a list of the most popular URLs, based on number of hits. The company’s databases run on SQL Server 2000. Maria has a QueryStrings table that contains query strings from various URLs and a Urls table that maps the query strings to specific URLs. The schema for the tables and some sample data follows:
CREATE TABLE QueryStrings ( QSID int NOT NULL IDENTITY PRIMARY KEY, QSVal varchar(8000) NOT NULL ) INSERT INTO QueryStrings VALUES( 'a=1&b=2&c=3' ) INSERT INTO QueryStrings VALUES( 'a=1&b=4' ) INSERT INTO QueryStrings VALUES( 'b=2&a=1&c=3' ) INSERT INTO QueryStrings VALUES( 'a=1&b=24' ) INSERT INTO QueryStrings VALUES( 'a=11&b=24' ) CREATE TABLE Urls ( UrlID int NOT NULL IDENTITY PRIMARY KEY, QSID int NOT NULL REFERENCES QueryStrings( QSID ) ) INSERT INTO Urls VALUES( 1 ) INSERT INTO Urls VALUES( 1 ) INSERT INTO Urls VALUES( 2 ) INSERT INTO Urls VALUES( 2 ) INSERT INTO Urls VALUES( 3 ) INSERT INTO Urls VALUES( 4 ) INSERT INTO Urls VALUES( 5 ) INSERT INTO Urls VALUES( 5 )
Maria uses the QueryStrings table in several queries and JOIN conditions to filter the URLs based on the search string specified by the user requesting the report. The following sample query joins the QueryStrings table with the Urls table:
DECLARE @SearchString varchar(8000) SET @SearchString = 'a=1&b=2' SELECT u.UrlID, u.QSID, q.QSVal FROM Urls AS u JOIN QueryStrings AS q ON q.QSID = u.QSID WHERE '&' + q.QSVal + '&' LIKE '%&' + @SearchString + '&%'
In the sample query, Maria searches for URLs that contain only certain query-string parameters and values such as a=1&b=2. Given the sample data, the query will return two query strings with QSID=1 and QSID=3. Now she wants to modify the query so that it returns all query strings that contain the parameters being searched for irrespective of their order. Help Maria find a set-based solution for her new requirement.
Maria first uses a table-valued user-defined function (UDF) to split the query string, based on the delimiter, an ampersand - &. By doing this, she obtains a virtual table or result set that contains each query-string parameter and value as a row. The UDF that returns the query-string parameters and values as rows follows:
CREATE FUNCTION GetQSTokens ( @QueryString varchar(8000) ) RETURNS @Tokens TABLE( pos tinyint IDENTITY, VALUE varchar(8000) ) AS BEGIN /* Returns tokenized list of query-string parameters. The query string is split based on '&' as the delimiter. We can now use the UDF result to search each value efficiently. */ DECLARE @Pos int IF RIGHT( @QueryString, 1 ) <> '&' SET @QueryString = @QueryString + '&' WHILE( 1 = 1 ) BEGIN SET @Pos = charindex( '&', @QueryString ) IF @Pos = 0 BREAK INSERT INTO @Tokens VALUES( LEFT( @QueryString, @Pos - 1 ) ) SET @QueryString = substring( @QueryString, @Pos + 1, 8000 ) END RETURN END
The GetQSTokens UDF splits the search string based on the delimiter character and returns a list of parameters. For example, if you run the following query with the search string a=1&b=2, it returns two rows, one with a=1 and another with b=2:
DECLARE @SearchString varchar(8000) SET @SearchString = 'a=1&b=2' SELECT * FROM dbo.GetQSTokens( @SearchString )
Now Maria can use this user-defined function in the WHERE clause of the original query:
DECLARE @SearchString varchar(8000) SET @SearchString = 'a=1&b=2' SELECT u.UrlID, u.QSID, q.QSVal FROM Urls AS u JOIN QueryStrings AS q ON q.QSID = u.QSID WHERE NOT EXISTS( SELECT * FROM dbo.GetQSTokens( @SearchString ) AS v WHERE '&' + q.QSVal + '&' NOT LIKE '%&' + v.Value + '&%' )
The WHERE clause of the subquery uses the table-valued function to create a virtual table based on the search string, then uses the virtual table to determine whether at least one parameter exists that isn’t present in the query string. The search predicate
'&' + q.QSVal + '&' NOT LIKE '%&' + v.Value + '&%'
returns true if there is any value that isn’t present in the search string. Using NOT EXISTS with this search condition in the subquery, Maria filters out any URLs that don’t contain one of the search values. She could also formulate the search condition as a COUNT(*) subquery, as follows:
WHERE ( SELECT count(*) FROM dbo.GetQSTokens( @SearchString ) AS v WHERE '&' + q.QSVal + '&' NOT LIKE '%&' + v.Value + '&%' ) = 0
With the COUNT(*) approach, Maria can test whether all the search values are present in a particular query string. This method can be more expensive than the NOT EXISTS approach if you have a lot of parameters. The NOT EXISTS approach has the added advantage of returning true if the condition matches for one of the search values, a more efficient approach than using a scalar UDF because the UDF is executed once for the entire query instead of once for every row being searched. The NOT EXISTS method is also closer to a complete set-based solution that uses SQL Server 2000’s features. The search can also be achieved by having the client code dump the normalized search string into a temporary table or build the temporary table in a stored procedure, then joining with that too.
You can also rewrite the previous query as:
DECLARE @SearchString varchar(8000) SET @SearchString = 'a=1&b=2' SELECT u.UrlID, u.QSID, q.QSVal FROM Urls AS u JOIN QueryStrings AS q ON q.QSID = u.QSID JOIN dbo.GetQSTokens( @SearchString ) AS v ON '&' + q.QSVal + '&' LIKE '%&' + v.Value + '&%' GROUP BY u.UrlID, u.QSID, q.QSVal HAVING COUNT(*) = ( SELECT COUNT(*) FROM dbo.GetQSTokens( @SearchString ) )
This query is more expensive than the NOT EXISTS variant because it uses the GROUP BY clause on the columns to satisfy the search condition.
There are also several pure set-based solutions that don’t use any SQL Server 2000 feature. All of these involve using a Numbers table to normalize the query strings, then performing the search.
The final unconventional solution involves generating the WHERE clause for the query dynamically based on the search string. You can generate the WHERE clause in a stored procedure and use dynamic SQL to execute the query, or you can generate the clause on the client side and execute it by querying a view. The dynamic SQL approach has its own limitations and security implications, but it’s the fastest solution or as fast as the table-valued UDF approach because it requires no additional joins and the optimizer can evaluate the search predicates efficiently because it uses built-in functions only. With this approach, the dynamically executed query looks like the following:
SELECT u.UrlID, u.QSID, q.QSVal FROM Urls AS u JOIN QueryStrings AS q ON q.QSID = u.QSID WHERE '&' + q.QSVal + '&' LIKE '%&a=1&%' AND '&' + q.QSVal + '&' LIKE '%&b=2&%'
JUNE READER CHALLENGE:
Now, test your SQL Server savvy in the June Reader Challenge, "Gathering Statistics" (below). Submit your solution in an email message to firstname.lastname@example.org by May 15. SQL Server MVP Umachandar Jayachandran, a SQL Server Magazine technical editor, will evaluate the responses. We’ll announce the winner in an upcoming SQL Server Magazine UPDATE. The first-place winner will receive $100, and the second-place winner will receive $50.
Dan manages databases hosted on SQL Server 2000 and needs to write a script that gathers statistics on files and filegroups in every database. Dan wants to obtain filegroup attributes and statistics on the disk space that each filegroup uses. He’s also interested in the files that contain each filegroup, their attributes and space used, growth parameters for each file, reads performed on a file, and so on. He wants to use this information for capacity planning and monitoring purposes, such as keeping track of file growth. Help Dan write SQL code that will let him easily obtain this information.