Here's how you can build a simple but powerful Web search tool for your data warehouse or database. Unlike many Web front ends, this project, which WebSQL subscribers can download using the Download the Code" link, puts most of the search logic in the database. For this project, you use basic HTML input objects to collect the search criteria. After your Web server receives the parameters, they are included in an EXEC statement that calls a stored procedure. After executing, this stored procedure returns a recordset, which the server formats into HTML and sends to the browser.
The power in this approach resides in your use of the stored procedure. A server-based procedure provides precompiled access to all the data warehouse tables, thereby enhancing performance. And because one of the procedure parameters is the search type, one Web page calling one procedure can return a nearly limitless number of search types, plus column headings or the number of columns displayed.
In this article, I also show you how to format the text so that users can easily copy their results into any text editor. And I show you how the stored procedure can embed a hyperlink into the recordset sent back to the browser.
This project uses a three-tiered browser, Web server, database architecture. You need SQL Server 6.x or 7.0, Windows NT 4.0 with Internet Information Server (IIS) 3.0 or 4.0, and Active Server Pages (ASP) with ADO version 1.0, 1.5, or 2.x. And you need to be familiar with basic HTML commands and know how to build ASP pages and stored procedures in Transact SQL (T-SQL).
Getting Data via a Stored Procedure
First, you need to build the database stored procedure. Using the Pubs database, create a procedure that accepts three input parameters and outputs a recordset. The three inputs consist of the search type, the main search argument (SARG), and a second SARG to filter the recordset by year-to-date (YTD) sales. Then, build an IIS/ASP page to capture user inputs and display the recordset.
Choosing the Search Types. The first procedure parameter is the type of search the user requested. Using a search type parameter provides flexibility in the types of searches a Web page allows. This simple procedure offers only three search types: the author's last name (Author), the title of a book (Title), and the author's ID (Author_ID). If no value is provided for the search type, use a default value of Author. And because browsers manipulate only strings, you need to use a varchar() data type to pass in all the variables.
As you can see in Listing 1, I named the search type variable @strSearchType. Using @strSearchType and T-SQL flow-of-control commands, the procedure chooses which Begin...End block to execute. Each block contains a different search (SELECT statement) to return a recordset to IIS/ASP. The T-SQL flow-of-control commands are limited, so use the IF-ELSE command. You won't win any coding contests, but after this search compiles, the server processes it quickly. Notice that I did not use a CASE statement. T-SQL has a CASE expression, but you can use it only within a statement (e.g., SELECT, UPDATE, DELETE).
Establishing the Search Arguments. The first parameter identifies which block to process; the next parameters provide search arguments that specify what to search for. For this project, I used two SARG types. The first accepts a string the user provides. Users enter a name or title. The second search argument lets users further limit their searches based on YTD sales for any book title. This project uses only two SARGs, but you can use any number or none at all. If you want to build an elaborate search procedure for your data warehouse, you can use up to 255 parameters in SQL Server 6.x (1024 in SQL Server 7.0).
The first SARG in the project accepts a string, such as an author's name or a book title. To enable users to search for a book without knowing the author's last name or the full book title, use a LIKE clause with a wildcard at the end of the string to expand the search. Use this command to add the wildcard:
SELECT @NameSrch = @strUSERINPUT + '%'
If you use the LIKE clause and wildcard combination, you won't lose query performance. The query optimizer still uses your indexes (a column used this way might be a good candidate for a clustered index). But if you add the % wildcard at the beginning of a LIKE argument, you force the optimizer to use a table scan, which can decrease performance. (For more information, see Dusan Petkovic and Christian Unterreitmeier, "New Features for Query Processing," July 1999.)
The next SARG in the project runs against the ytd_sales column in the Titles table. YTD sales are stored as a money data type, so you need to convert the incoming search parameter into the money type. Also, you need the procedure to validate that all incoming string characters are numbers. If they aren't, set the parameter default to zero. (Alternatively, you can use VBScript in IIS/ASP to validate that string characters are numbers.) To check for numbers, use a CASE expression such as this one:
SELECT @moneyYTDsales = CASE WHEN @strYTDsales LIKE '%\[^0-9\]%' THEN 0 ELSE CONVERT(money,@strYTDsales) END
Using a LIKE clause when you want to know whether the string is not like a number might seem counterintuitive, but that's where the ^ symbol comes into play. In this command, LIKE '%\[^0-9\]%' resolves to TRUE if any character is not between 0 and 9.
The last objective for this procedure is to provide an HTML link to the author's name. To do so, include the HTML references as part of the author's name column returned with each record. This procedure creates a string with a value similar to: "<A HREF='PUBSlookup?author_id=363-45-1234'>author name</A>." The URL references the same page, so you begin HREF with just the ASP page name. Within a browser, you want to display the author's first and last names. The bottom status bar in Screen 1 displays the full hyperlink, which shows the author's ID.
To include a hyperlink to an author ID, you can use this T-SQL code:
SELECT Author = SUBSTRING("<A HREF='pubslookup.asp?Author_id=" + COALESCE(A.au_id,") + "'>" + SUBSTRING(COALESCE((au_fname + ' ' + au_lname),'- '),1,25) + "</A>" , 1,75) ,
The final SUBSTRING length of 75 represents a number equal to or greater than the combined length of all characters returned, including the embedded HTML tags. Link tags are useful because they don't show up when you copy the data from a browser.
Setting Up the Web Interface Basics
Let's review the objectives for this hypothetical Web interface. First, it's a single Web form solution for your data warehouse searches. When users initially hit your ASP page, they see a welcome greeting and the input selection form to run a search. After a search, the browser displays the results of the search and the input form, in case the user wants to run a new search.
Data warehouse users want data fast, and they often want to copy data into an email message, report, or spreadsheet for further analysis. They're less interested in fancy graphics, fonts, or other gimmicks. But you can't easily copy the data from an HTML table into an editor. So the second objective for the interface is to give users copy-out capability.
After the data is online, users often ask you to modify the page's format or add new data. So, you need a Web page that's fast and flexible, one that puts the database developer in charge while minimizing the amount of ASP programming you need to modify the page.
Browser Query or Form Request?
Browsers either request a query or use forms to make requests. This project uses both. You can expect that most user searches initiate from a posted form. However, the Author_ID hyperlink sends a query string to your page, so your ASP page must handle queries as well. Every time a user hits your page, your ASP code determines the type of search: posted form, author ID query, or empty query (from an initial hit or an unrecognized query). When the user first hits your Web page, the Querystring and the Form input object values are empty, so testing for an initial hit is easy. And when the user posts a form or makes a query, the ASP code evaluates the incoming Querystring and Form objects to determine the user's requested search type. (This project's form uses the Post method, which sends hidden values. If you use the Get method, the form's search values display in the query string.)
To test for a valid query, first declare the variables you need to capture the incoming Form and Querystring object values. After populating the variables, test each for a valid string. This project gives preference to the form, so first test for valid form values. If none exist, test for a valid query string value. If a Querystring object with the name Author_ID exists, you have a valid query. If both these tests fail, you have either an empty or an invalid query string. In this case, you simply present the generic welcome text and the empty search input boxes.
Gathering Search Criteria
You built your T-SQL procedure to accept three parameters: the search type; a user-provided string containing an author name, book title, or ID; and a YTD sales SARG. To capture the parameters in your Web page, use a combination of INPUT and SELECT boxes.
To capture the search type parameter, use an HTML <SELECT> tag (which programmers call a combo box). This combo box gives users a list of searches matching those in the procedure. To capture the user's search string, use an INPUT box with TYPE=text. Data warehouses often have less-than-accurate data, so don't include any validation on this control. Someone might be searching for bad entries, and you don't want to get in their way. And finally, use radio buttons to capture the YTD sales.
To enhance usability, the project's ASP code includes the search form at the bottom of every user request. You rarely need to force users back and forth between the search and results pages. Simply include the search form at the bottom of every ASP page, after displaying the requested recordset in the browser.
Another feature sometimes omitted from search forms is memory of the input search criteria. Every time the search form is returned to the user, your ASP code can return the user's search criteria. Although this functionality is not necessary, it adds a nice touch that users appreciate. To add this information to INPUT text boxes, use the following code to insert the value the browser sent last.
<INPUT TYPE = text NAME = 'USER_INPUT' VALUE = '<% =strUSER_INPUT %> ' >
To display the user selection for the SELECT box and the INPUT radio button, you need to compare the VBScript variable values against each possible HTML tag value. For the SELECT tag, you can use the following code:
Response.write("<SELECT NAME = 'SelectType'>") Response.write("<OPTION VALUE = 'Author'") If strSelectType = "Author" Then Response.write("SELECTED") End IF
Making the Connection
Now that it has the parameters, your ASP code needs to connect to the database and submit the procedure. You can use ADO to do so. (For details on using ASP with ADO, see Michael Otey, "The ABCs of ADO and ASP," page 30.) Because production environments are often slow to upgrade, I've kept the commands compatible with all versions of ADO. (In the downloadable project code, see the ADO commands for using the ADO 2.x native provider.) First create, then open, an ADO Connection object. Next, create a Recordset object, build the SQL string, set the connection as active, and open the connection. This example uses an ODBC Data Source Name (DSN), which you also need to set up. The code in Listing 2 connects the ASP code to the database and submits the procedure. Your stored procedure is sent to the server and executed when the recordset is set to open. The returned recordset goes into the Recordset object, which by default is an efficient forward-only, read-only recordset. Because the stored procedure ends with a SELECT statement, the procedure outputs a recordset, so you don't need to use an OUTPUT parameter as part of the procedure.
The SQL string (source) contains three components: the EXEC command, the name of the stored procedure, and one or more parameters. The EXEC command instructs the database engine that the following T-SQL is a procedure. Because the procedure is the first SQL statement to be executed, the EXEC keyword is not necessary, but using it is good form.
The next component of the SQL string is the name of the stored procedure. For this example, both the EXEC keyword and the name of the stored procedure are hardcoded into the ASP code. As you work with this template, you can easily make the name of the procedure an ASP variable, based on user input. The last aspect of the source string is the three procedure parameters. The ASP code populates these parameters based on the values the browser provides.
Presenting the Return Set
Opinions differ on how to present user-requested data. Client-side developers often insist on beautifully crafted Web pages, whereas server-side developers and financial analysts just want the data. In my experience, data warehouse users just want the facts. And usually, they want the ability to copy their results into a spreadsheet for further analysis. (Executive information systems—EISs—usually need graphs, charts, and other visually enhanced Web interfaces, though, so know your audience.)
Here's how to present the data in a basic manner. The only added options are embedded carriage returns, which provide the desired copy-out feature. Without extraneous HTML tags, the recordset races through the network. For one data warehouse I built using this approach, the server returns, on average, about 200 records per second to the requesting browser over a WAN.
You might want to adjust the display format if the recordset is empty or has only one record. First, test for an empty recordset using the following script:
IF (objRst.BOF = False OR objRst.EOF = False) THEN 'process the records
The Single-Record Return Set. Displaying single records in a different format is helpful. For this project, when the recordset contains only one record, the ASP code formats the record vertically rather than horizontally. To test for a single record, move to the second record and test the end-of-file status. If the test result is true, you have a single record. You could use the ADO RecordCount property, but it makes the server count each record, wasting valuable time. The following code provides one possible approach:
objRst.MoveFirst objRst.MoveNext IF objRst.EOF = True THEN flagSingleRecord = "OneRecord"
Screen 2 shows the result of HTML formatting for the single record in the sample project.
Multirecord Return Sets. An important aspect of this project is that the ASP code programmatically determines the number of columns returned, their order, their names, and the number of records in the recordset. This feature provides flexibility in making changes to the server-side stored procedure. You can add or drop columns, change column order, or adjust column size without ASP code changes.
The first step in setting the format for multirecord return sets is defining the HTML format. To enable users to copy information out, use an HTML preformatted tag <PRE>. <PRE> lets you send the recordset to the browser with virtually no HTML format tags. Although using table format tags makes the data look nice, it can increase file size and network overhead, thus slowing performance. I usually set up stored procedures to return a maximum of 1500 records, and with table format tags, performance is unacceptably slow.
After you establish the HTML layout, your next task is to identify the number of columns and their names. To get the name for each column, loop through the columns. Text with the <PRE> tag doesn't ignore spaces, unlike standard HTML formatting. Essentially, you are sending to the browser a standard ASCII fixed-width format recordset. So, you need to pad the column name with spaces equal to the full length of the column. Use the ADO DefinedSize property to get this value, then pad the column name with spaces to make up the difference. This VBScript command tells you the number of spaces you need to add to the column name:
intLength = CInt(objRst(i).DefinedSize + 1) - Len(objRst(i).Name)
Notice that I've added one (+1) to the defined size. This notation puts one space between each pair of columns to aid in readability.
After generating the column names, populate the rows in a similar manner. Using a standard loop within a loop, cycle through one row at a time, and for each row, through each column. For the column names, you used the VBScript LEN command. But for the records, you can rely on the ADO property ActualSize. The difference between the DefinedSize and the ActualSize gives you the exact amount of space you need to pad each column. (Note that if you are using ADO 2.x, the GetString method of the Recordset object might work more efficiently than looping.) At the end of each record, use this VBScript command to embed a carriage return and line feed so users can easily copy the records into any text editor:
strPrint = strPrint + chr(10) + chr(13)
You can embed a tab or another character after each column, making the data more easily portable into a spreadsheet. And finally, after each row has been written out, add an HTML <BR> tag to specify a line break. The resulting page display is in Screen 3.
After you have all the pieces working, you'll have a simple but fast and efficient data warehouse Web interface. The network staff will appreciate the minimized use of bandwidth, and the data analysts will love the fast response time. And because the power resides on the server in T-SQL stored procedures, you can expand and modify your data warehouse searches quickly and efficiently.