DTS DLL Debugging

Debugging Visual Basic (VB) COM DLLs that you use in Data Transformation Services (DTS) is easier than it sounds. First, open the DTS Package in the DTS Designer. Then, start the VB Integrated Development Environment (IDE) and open the WebDataFetcher project. Within the VB IDE, you can run the project with a full compile by pressing Ctrl+F5. Running the project with a full compile is a good way to check for blatant syntax errors; the program will highlight in blue any lines that contain syntax errors.

You can also step through the code to inspect how it works or to debug errors in the programming logic by pressing F8, which puts the VB IDE into debug mode. The VB IDE temporarily registers the WebDataFetcher project. In debug mode, the VB IDE stops on the first line of code in the project before executing it. When you run the DTS Package, the VB IDE isn't in view; after you run the Package, click the VB IDE icon to bring the IDE into view. The program will highlight in yellow the first line of the GetFile function, which is the line the VB IDE will execute next. Each subsequent time you press F8, the VB IDE executes the current line, moves to the next line, and highlights it. You can rest the mouse pointer over any variable, such as aURL, and after a brief pause, a window will pop up that contains the value of that variable. If you don't want to step through code, just press F5 and the VB IDE will run the rest of the code without stopping. When you want to stop the VB IDE from running in debug mode, choose End from the Run menu.

The Security Connection

Here's a summary of steps you can take to optimize SQL Server security and connectivity.

  1. Use Windows-only authentication with SQL Server.
  2. Use trusted connections instead of strings that pass SQL Server usernames and passwords.
  3. Put the connection objects in DLLs and put them in Microsoft Transaction Server (MTS).
  4. Set your code to use OLE DB instead of ODBC if you're using ADO. When ADO uses an ODBC data source, the calls pass through the OLE DB Provider for ODBC and then through the ODBC driver, so by using a native OLE DB provider directly, you improve performance by eliminating a processing layer.
  5. Use TCP/IP between your IIS and SQL Server machines, not the default Named Pipes, if IIS and SQL Server are on separate servers. As Microsoft article "PRB: 80004005 ConnectionOpen (CreateFile()) Error Accessing SQL" at http://support.microsoft.com/support/kb/articles/q175/6/71.asp states, "When Named Pipes are used to access the SQL Server, IIS tries to impersonate the authenticated user, but it does not have the ability to prove its identity."
  6. Put your connections and stored procedure calls into Visual Basic (VB) code DLLs, install them in MTS (which will automatically pool connections for you), and create server objects in VBScript to use the connections.
  7. Ask for the correct Microsoft department if you need help using ADO-based code to talk to SQL Server. Microsoft Technical Support not only has IIS and SQL Server experts; it also has ADO-to-SQL Server experts.

Fast Bulk Load in SQL Server 7.0

SQL Server 7.0 offers several high-speed mechanisms for loading data. Bulk copy program (bcp) is a high-speed file-import utility that SQL Server has supported since the early days of the database management system (DBMS). Bcp lets you quickly load large files and is often a good choice, but it's not user friendly.

In SQL Server 7.0, Microsoft extended SQL Server's data-import capabilities with DTS and the new T-SQL command BULK INSERT. DTS offers a tremendous amount of data-handling flexibility, but BULK INSERT can be twice as fast as either bcp or DTS when used in comparable circumstances.

Why is BULK INSERT so much faster? BULK INSERT is a T-SQL command, so it runs in-process with the SQL Server engine. Thus, SQL Server doesn't need to pass the data along the normal client API network-abstraction layer called a Network Library (NetLib). Bypassing the NetLib layer saves a huge amount of time.
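
For reference, here's a minimal BULK INSERT sketch; the staging table name, file path, and terminator options are placeholders rather than part of the original tip:

-- Load a tab-delimited text file into an assumed staging table
BULK INSERT Northwind.dbo.OrderStaging
FROM '\\FileServer\Loads\orders.txt'
WITH (
   FIELDTERMINATOR = '\t',  -- columns separated by tabs
   ROWTERMINATOR = '\n',    -- one row per line
   TABLOCK                  -- table lock helps SQL Server use its fastest load path
)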

In addition, SQL Server 7.0 supports a custom task add-on that lets you write a BULK INSERT task directly from a DTS package. (Microsoft also integrated this feature into SQL Server 2000.) If you're looking for the best combination of speed and programmatic workflow control, BULK INSERT from DTS might be the answer.

7 Cursor Cues

Proper API cursor use can make all the difference in application scalability. The Microsoft TechNet article "Top 10 SQL Server 7.0 Support Issues and Resolutions" gives several cursor tips, including

  • Use default settings for singleton SELECT statements (which return one row) or other small result sets. Caching a small result set on the client and scrolling through the cache is more efficient than asking the server to implement a cursor.
     
  • Use the default settings when fetching an entire result set to the client, such as when you're producing a report. Default result sets are the fastest way to transmit data to the client.
     
  • You have to use default result sets for any T-SQL statement or batch of T-SQL statements that generate multiple result sets.
     
  • In joins, static or keyset-driven cursors can be faster than dynamic cursors. You must use static or keyset-driven cursors when you want to do absolute fetches.
     
  • Static and keyset-driven cursors increase the usage of tempdb. Static server cursors build the entire cursor in tempdb; keyset-driven cursors build the keyset in tempdb.
     
  • In applications that use server cursors, each call to an API fetch function or method causes a round-trip to the server. Applications can minimize these round-trips by using block cursors with a reasonably large number of rows returned on each fetch. You can set the block size by using ADO's CacheSize property. I like to use batch sizes in the range of 100 to 500 rows.
     
  • Consider SQL Server 7.0's cursor performance optimization feature, FAST_FORWARD cursors. SQL Server Books Online (BOL) has a lot of information about FAST_FORWARD cursors, and a brief declaration sketch follows this list.
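
The sketch below is purely illustrative; the cursor name and query are placeholders:

-- Forward-only, read-only cursor declared with the FAST_FORWARD option
DECLARE @OrderID int
DECLARE curOrders CURSOR FAST_FORWARD FOR
   SELECT OrderID FROM Northwind.dbo.Orders WHERE CustomerID = 'VICTE'
OPEN curOrders
FETCH NEXT FROM curOrders INTO @OrderID
WHILE @@FETCH_STATUS = 0
BEGIN
   PRINT CAST(@OrderID AS varchar(10))  -- placeholder for per-row work
   FETCH NEXT FROM curOrders INTO @OrderID
END
CLOSE curOrders
DEALLOCATE curOrders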

Lost in Translation?

Q. I'm a Visual Basic (VB) developer and new to T-SQL programming. Does T-SQL have a function equivalent to VB's InStr() function?

A. Life would be much easier if functions had the same name across different programming languages.

T-SQL's charindex function, for example, does the same thing as VB's InStr(): it returns the position of the first occurrence of one string within another. Charindex's syntax is

CHARINDEX ( expression1 , expression2 [ , start_location ] )

Expression1 is a short character data type expression that contains the sequence of characters you want to find. Expression2 is the character string that you want to search. And start_location is the character position where charindex starts searching for expression1 in expression2. If you don't specify a start_location or if start_location is a negative number or zero, the charindex search starts at the beginning of expression2.
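
For example, the following comparison (the strings are arbitrary) shows the parallel between the two functions:

-- Returns 3, just as InStr(1, "abcdef", "cd") does in VB
SELECT CHARINDEX('cd', 'abcdef')

-- Returns 8; the search starts at position 6
SELECT CHARINDEX('o', 'Hello world', 6)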

No Datetime Primary Keys, Please

Q. I work on a payroll-management database and suggested to the development team that I use a datetime column as a table's primary key. Team members told me not to do this, but when I pressed for a reason, no one could provide a good answer. Why shouldn't I use a datetime column as a primary key?

A. I agree with your development team. In general, you should avoid using datetime as a primary key. First, datetime is an 8-byte data type, and narrow keys tend to be more efficient and faster than wider keys. If your table is going to be very large, a smaller integer-based data type, such as the 4-byte int or the 2-byte smallint, might be a better fit. Second, and much more important, datetime is accurate only to one three-hundredth of a second, or 3.33 milliseconds (ms). By definition, primary key columns must be unique, and you can't ensure that you'll have unique values in a datetime column. Your business rules might say that entering multiple records within 3.33ms of one another is impossible, but I think that making that assumption is dangerous. Business rules and technical assumptions can always change.
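
As a minimal sketch of the alternative (the table and column names are hypothetical), you can use a narrow surrogate key and keep the datetime as an ordinary, indexed column:

CREATE TABLE PayrollEntry (
   PayrollEntryID int IDENTITY(1,1) NOT NULL
      CONSTRAINT PK_PayrollEntry PRIMARY KEY,
   EmployeeID     int NOT NULL,
   EntryTime      datetime NOT NULL DEFAULT GETDATE()
)
CREATE INDEX IX_PayrollEntry_EntryTime ON PayrollEntry (EntryTime)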

Variant Behavior

SQL Server 2000 supports variants through the new sql_variant data type. Similar in many ways to the VB Variant data type, sql_variant can store values of any SQL Server data type except text, ntext, image, timestamp, and sql_variant. Actually, the sql_variant type isn't a base data type but a way to tell SQL Server that the data type will be set when the data is stored in that column. So the value in a sql_variant column can be of any of the supported data types. Although the implementation isn't the same as that of the VB Variant data type, the sql_variant data type is flexible enough to solve the changing-rule problem. Because sql_variant morphs itself into almost any SQL Server data type, SQL Server sets the column's "real" data type based on the data it writes to the column. For example, a sql_variant data type "becomes" an integer when SQL Server writes an integer value to the column. If you use sql_variant in an extended property table, the customer data properties now look slightly different, as Table A shows.

In this case, you can define the table with three columns: one to identify the object, another to name the property being managed, and a third to hold the data for that property. This table is an example of a sparse table, which lets you define as many properties as the current business rules dictate. The sql_variant Info column can hold any non-BLOB data type, so you can easily store almost any type of information, even if during the design phase, you didn't anticipate needing to handle that type of data.
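
Table A isn't reproduced here, but a minimal sketch of such a sparse property table (the table and column names are assumptions) might look like this:

CREATE TABLE CustomerProperty (
   CustomerID   int         NOT NULL,  -- identifies the object
   PropertyName varchar(50) NOT NULL,  -- names the property being managed
   Info         sql_variant NULL,      -- holds the data for that property
   CONSTRAINT PK_CustomerProperty PRIMARY KEY (CustomerID, PropertyName)
)

INSERT INTO CustomerProperty VALUES (1, 'CreditLimit', $5000.00)
INSERT INTO CustomerProperty VALUES (1, 'PreferredShipDay', CAST('20020715' AS smalldatetime))
INSERT INTO CustomerProperty VALUES (1, 'Region', N'Northwest')

-- SQL_VARIANT_PROPERTY reports the "real" data type SQL Server assigned to each value
SELECT PropertyName, Info,
   SQL_VARIANT_PROPERTY(Info, 'BaseType') AS BaseType
FROM CustomerProperty
WHERE CustomerID = 1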

Templates for Development

I use templates in two ways: to develop objects and to reduce the amount of typing that I have to do when I write a section of code. At the company I work for, I customize the templates that I use for development according to the company's development standard. The templates provide handy development snippets of code that I use when I compile a series of objects on a server. Figure A shows my code template for generic stored procedures. This template consists of four sections: drop, create, and permissions sections and a success-or-failure notification section.

Every stored procedure that I write includes these four sections because they speed development for everyone in the shop. For the DBAs, the stored-procedure template produces a script that runs and reports a success or failure; for the developers, the template produces consistent documentation. In addition, the "drop" section of the template eliminates errors that result when you try to create an object that already exists.
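
Figure A isn't reproduced here, but a stripped-down sketch of the same four-section idea, using a hypothetical Northwind procedure, looks like this:

-- Drop section: remove the object if it already exists
IF EXISTS (SELECT * FROM sysobjects
           WHERE name = 'usp_GetCustomerOrders' AND type = 'P')
   DROP PROCEDURE usp_GetCustomerOrders
GO

-- Create section
CREATE PROCEDURE usp_GetCustomerOrders
   @CustomerID nchar(5)
AS
SELECT OrderID, OrderDate
FROM Orders
WHERE CustomerID = @CustomerID
GO

-- Permissions section
GRANT EXECUTE ON usp_GetCustomerOrders TO public
GO

-- Success-or-failure notification section
IF EXISTS (SELECT * FROM sysobjects
           WHERE name = 'usp_GetCustomerOrders' AND type = 'P')
   PRINT 'usp_GetCustomerOrders created successfully.'
ELSE
   PRINT 'usp_GetCustomerOrders failed to create.'
GO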

Xp_sendmail Rules

You can use the xp_sendmail extended stored procedure in triggers, stored procedures, and batch commands to incorporate SQL Mail functionality into code. Xp_sendmail resides in the master database, but you can call it from any database by using the fully qualified name master.dbo.xp_sendmail followed by any of the input parameters in Table B (a list of these parameters is also available in BOL).
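
For example (the recipient address and query are placeholders; Table B isn't reproduced here), the following call works from any database because the procedure is fully qualified:

EXEC master.dbo.xp_sendmail
   @recipients = 'dba@example.com',
   @subject = 'Recent backups',
   @query = 'SELECT database_name, backup_finish_date
             FROM msdb.dbo.backupset',
   @attach_results = 'TRUE'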

You should be aware of the following rules when using the xp_sendmail extended stored procedure:

  • You must configure SQL Mail sessions before you use xp_sendmail.
  • Only one user can execute xp_sendmail at a time. If you execute long-running queries, be aware that other connections must wait until the first connection has completed processing.
  • Xp_sendmail queries are subject to locks and blocks just like any other query.
  • Xp_sendmail's Execute permissions default to members of sysadmin, but you can grant these permissions to other users.

With proper planning about when to send email messages and who should receive them, using xp_sendmail from SQL Server can simplify administration and troubleshooting.

Use Codebehind to Separate ADO.NET from HTML

Active Server Pages (ASP) developers need to be aware of one change that will break some existing ASP pages you migrate: ASP.NET doesn't let you put functions inside the <% %> render blocks. You need to place all server-side functions within a <script> block with the runat="server" attribute. Listing A shows both the ASP syntax and the required syntax for ASP.NET.

The elimination of functions in render blocks is helpful because it encourages developers to separate executable code from HTML presentation. But the most important method in ASP.NET for separating code from HTML is a new feature called codebehind. Codebehind lets you move your programmatic logic from the original .aspx files, where it would be subject to spaghettification and mangling by graphic designers who don't understand the importance of the <% %> tags, into separate class files. In ASP.NET, you use the src or the codebehind attribute of the @ Page directive to tell ASP.NET how to locate a codebehind file. Src tells ASP.NET to find the source file and compile it dynamically. Codebehind (the default in VS.NET) requires you to precompile your codebehind class into an assembly stored in your application's bin subfolder (the default location for assemblies in a Web application). The inherits attribute tells ASP.NET that the current page should inherit from the codebehind class file, which lets ASP.NET compile the page and codebehind through the CLR into one assembly. Listing B shows how a page that handles postbacks would look if you used codebehind. Listing C shows the codebehind file.

Judicious use of codebehind can help you create applications that are easier to understand at a glance, easier to maintain, and easier to debug. However, using this feature to its best advantage requires discipline and some up-front design work.

ASP to ASP.NET

Remember that because ASP and ASP.NET can run side by side, the decision about whether to migrate to ASP.NET needn't be all-or-nothing. You might be able to migrate your application page by page, keeping in mind that ASP and ASP.NET don't share the Application and Session collections. So if you use these collections for managing state, you need to write code to pass this state information between the part of your application that's running in ASP and the part that's running in ASP.NET.

What kind of challenges can you expect during migration? Several changes are likely to cause problems, some of them specific to ASP.NET, and some language-specific. In the former category are the following:

  • ASP.NET supports only one language per page, whereas ASP lets you use multiple languages in the same page.
  • As I mentioned, ASP.NET doesn't support functions in <% %> render blocks.
  • In ASP, the Request(), Request.QueryString(), and Request.Form() functions return arrays of strings. In ASP.NET, they each return a string.

The latter category includes these changes:

  • ByVal is the default parameter type for VB.NET. Your code won't work if it doesn't specify ByVal or ByRef for a method parameter but expects the receiving method to treat the parameter as ByRef.
  • Unlike VB 6.0 and earlier, VB.NET doesn't use Set and Let.
  • You must use parentheses for calling Sub procedures in VB.NET.

ASP pages that use COM components might also present problems because these pages don't work directly in ASP.NET. You can make them work, however, by using the .NET COM Interoperability layer. Through a utility called tlbimp.exe, this layer lets you import a COM object's type library to create a .NET assembly, then use that assembly to call the COM component from an ASP.NET page.

Resources for Developers

Unlike administrators, developers need to learn more about using SQL and T-SQL efficiently than they need to learn about system maintenance or network integration. When you're ready to go beyond The Practical SQL Handbook: Using Structured Query Language, the next book you should read is SQL Server 2000 Design and T-SQL Programming. The authors, SQL Server Magazine contributing editors Michael D. Reilly and Michelle A. Poolet, explain database design theory in an easy-to-follow style. As part of the explanation, they walk you through the steps for implementing a database design on SQL Server. The book concludes with advice about how to improve performance in multiuser applications through intelligent use of transactions, locking, indexing, and stored procedures. Most important, you can use this book to learn how improper transaction handling, including the implicit transactions in INSERT, UPDATE, and DELETE statements, can drastically affect not only application performance but also the overall capacity of the server. In general, the most common mistakes that degrade server performance occur because the application developer doesn't understand how locking and transactions affect the system; therefore, these areas should be high on your list of topics to study.

The next step toward mastering SQL Server development is to read Professional SQL Server 2000 Programming. This book addresses the client/server developer who needs to use SQL Server from within VB or ASP, including how to use distributed transaction services in a multitiered application. Because SQL Server is typically the back-end database for an application—not the platform running the application—you need different skills to write programs that use SQL Server efficiently. This book does a good job of helping you avoid common mistakes.

You might think the three books I've recommended overlap quite a bit, but they have different viewpoints. The Practical SQL Handbook: Using Structured Query Language teaches someone without database experience how to write basic and intermediate queries for most relational database systems. SQL Server 2000 Design and T-SQL Programming teaches future project managers and administrators how to build and manage a database and how to write T-SQL code. Professional SQL Server 2000 Programming targets the client/server developer who needs to use SQL Server from within VB or ASP, and includes information about how to use DTS in a multitiered application.

Microsoft offers two Microsoft Official Curriculum (MOC) courses for SQL Server 2000 developers. Course 2071: Querying Microsoft SQL Server 2000 with Transact-SQL teaches basic T-SQL syntax and programming. Course 2073: Programming a Microsoft SQL Server 2000 Database covers writing stored procedures and triggers; creating databases, tables, and indexes; and other programming topics. Again, if you take an instructor-led course, check the instructor's credentials before signing up. In both of these courses, you'll benefit from having an instructor with on-the-job experience programming SQL Server at the server level.

Implementing Drillthrough for Developers

To implement the drillthrough feature, you must perform both administrative and application-specific tasks. From an administrator's perspective, the first task is to enable the cube for drillthrough. Administrative control over this setting is crucial because cells often contain data from thousands or even millions of rows, so letting all users query this data unrestricted can lead to network bottlenecks. To enable the cube for drillthrough, start the Cube Editor and select Tools, Drillthrough Options. In the resulting Cube Drillthrough Options dialog box, simply select the Enable drillthrough check box, as Figure B shows. Analysis Services supports drillthrough for regular, virtual, or linked cubes, but Analysis Services won't let you use drillthrough on a cell whose values are based on calculated member cells or custom member formulas.

Note in Figure B that you must enable drillthrough and that the column list contains all the columns from the fact and dimension tables in the schema. In this example, the cube's name is Enrollment and the cube allows reporting against student enrollments in Quilogy's technical education courses. Depending on which columns you select in this dialog box, Analysis Services will join the appropriate tables when it queries the underlying data. The dialog box in Figure B also contains a Filter tab that can contain a WHERE clause to limit the number of rows in the result set. This WHERE clause is in addition to the WHERE clause that drillthrough generates dynamically.

In addition to enabling drillthrough at the cube level, you can modify the options on each individual partition in cubes that contain more than one partition. You can access a dialog box analogous to the one in Figure B through the Partition Wizard (right-click the partition and select Edit from the context menu), then click the Advanced Settings button and Drillthrough Options. The resulting Partition Drillthrough Options dialog box lets you change the columns and filter to use when creating drillthrough data from that partition. The ability to set drillthrough options on individual partitions means that if you execute a drillthrough operation on a cell that aggregates data from more than one partition, each partition returns its own result set, and the columns in those result sets can differ from one another. Application developers need to be aware of this behavior so that they can display the results accordingly.

The administrator's next task, configuring drillthrough security, requires using the Cube Role Manager dialog box and clicking the Drillthrough setting for the role you want to grant access to. Each role has a security setting, called Allow Drillthrough, that you can select to let that role execute the queries that perform a drillthrough operation. After you enable drillthrough and configure security, you can implement drillthrough for applications.

Think Big

Don't make the "little database" mistake. Thinking big when you design and test your SQL Server database applications will help you avoid unpleasant and unnecessary application rollout problems. When you start designing a new database, plan for growth from the very beginning, considering both the performance impact and the recoverability impact of growth. In today's fast-paced, competitive business environment, decision makers recognize information's strategic value and have elevated data to a corporate asset, collecting all the data they can. And as companies grow, so will their data banks. The best time to accommodate this growth is before deploying the database. Likewise, development groups need to perform usability and stress testing by using data sets that truly represent the live environment. Problems that you uncover before the application goes live aren't problems at all.

More Tips for Source Control

I agree wholeheartedly with Sameer Dandage and Wayne Kurtz's article "Stay in Control" (June 2002, InstantDoc ID 24696), which presented source control as a key to successful project rollout and maintenance. The article was a great introduction to the subject, but it could have emphasized the following points more:

  • Visual SourceSafe (VSS) is a multilanguage repository, and you can place all of a project's scripts (e.g., SQL, VB, ASP) in the same repository.
  • You can use VSS project labeling as a quick and easy way to extract all source related to a particular application build or version.
  • You can use keywords to enable VSS to write versioning information into source files.

The keyword capability can be extremely useful if you need to check the version of a file on a particular server. For example, the $Header keyword gives you the source file name, the VSS version, the date and time the file was extracted from VSS, and the user who extracted it. When you have this information in a piece of code such as a stored procedure or trigger, you can then use sp_helptext to ensure that the correct version of the file is in the database.
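
As an illustration (the procedure name, path, and user are hypothetical, and VSS expands the keyword only if keyword expansion is enabled for .sql files), the expanded header comment in a stored procedure might look like this:

CREATE PROCEDURE usp_ProcessPayroll
AS
/*
$Header: /Payroll/SQL/usp_ProcessPayroll.sql 4   8/15/02 10:12a DSmith $
*/
SELECT GETDATE() AS RunAt  -- placeholder body
GO

-- Later, on any server, check which version is installed
EXEC sp_helptext 'usp_ProcessPayroll'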

Arrays of Elements

Sometimes you need to deal with denormalized data from legacy systems. For example, sometimes you get comma-separated arrays of elements that you need to normalize so that you can manipulate the data with set-based queries. Run the script in Listing D to create and populate the Arrays table. Note that each group in the groupid column contains an array of elements in the string column. Suppose you want to normalize the data and write a query that produces the results in Figure C, where each element appears in a different row along with its group ID and position in the array.

First, you need to duplicate each row in Arrays as many times as there are elements in the array. For this purpose, first run the script in Listing E to create the Nums auxiliary table and populate it with a sequence of consecutive integers. The following query provides a solution to the problem:

SELECT
  groupid,
  n - LEN(REPLACE(LEFT(string, n), ',', '')) + 1 AS pos,
  CAST(SUBSTRING(string, n, CHARINDEX(',', string + ',', n) - n) AS int) AS val
FROM Arrays JOIN Nums
  ON n <= LEN(string) AND SUBSTRING(',' + string, n, 1) = ','
ORDER BY groupid, pos

The JOIN condition limits n to the length of the string and matches only those values of n at which an element starts (that is, position 1 or any position immediately following a comma). So the n value from Nums represents each element’s starting position. The SELECT list returns the group ID, calculates each element’s position by counting the commas that appear up to the element’s starting position, then extracts the element itself as the substring that starts at position n and ends just before the next comma. For more details about how to separate elements from arrays, see T-SQL Black Belt, "Separating Elements" (August 2002, InstantDoc ID 25678).

Hidden Power of UNION

I used to think that UNION was useful only when you needed to combine the results of two queries from different sources (e.g., tables, views) into one result set. However, using UNION is sometimes the quickest way to select from just one table.

Suppose you need to retrieve all OrderIDs from the Northwind Orders table where the CustomerID is VICTE or the EmployeeID is 5:

SELECT OrderID FROM Orders
WHERE CustomerID = 'VICTE' OR EmployeeID = 5

The Orders table has indexes on the CustomerID and EmployeeID columns, but SQL Server doesn't use them to execute the SELECT statement, as the following execution plan shows:

|--Clustered Index Scan(OBJECT:
  ([Northwind].[dbo].[Orders].[PK_Orders]),
  WHERE:([Orders].[CustomerID]='VICTE'
  OR [Orders].[EmployeeID]=5))

Instead, SQL Server scans the clustered index PK_Orders. SQL Server uses this plan because the OR operator in the query's WHERE clause means a row can qualify by meeting either condition, so SQL Server can't satisfy the query with a single index seek. Scanning a clustered index in this case is almost the same as scanning the entire table; the server goes through the table record by record and checks each row against the specified predicate.

To improve the query's efficiency, you could create a composite index on CustomerID and EmployeeID:

CREATE INDEX IdxOrders001 ON Orders
(CustomerID, EmployeeID)

The Index Tuning Wizard advises you to create just such an index to improve performance of the SELECT statement. SQL Server can then scan the new, narrower composite index instead of the clustered index (essentially the whole table) and evaluate both conditions against each index row, which reduces I/O. The estimated execution plan for the new query shows that SQL Server uses the composite index:

|--Index Scan(OBJECT:
  ([Northwind].[dbo].[Orders].[IdxOrders001]),
  WHERE:([Orders].[CustomerID]='VICTE'
  OR [Orders].[EmployeeID]=5))

But an employee can make thousands of deals with thousands of customers. So even if you create a composite index, SQL Server will need to scan a range of records in the index and won't perform a seek operation, which uses indexes to retrieve records and is the fastest way for SQL Server to find information.

You need to make SQL Server use a seek operation instead of a scan, but SQL Server won't perform a seek when an OR operator is in the WHERE clause. You can solve this dilemma by using UNION to rewrite the SELECT statement:

SELECT OrderID FROM Orders
WHERE CustomerID = 'VICTE'
UNION
SELECT OrderID FROM Orders
WHERE EmployeeID = 5

SQL Server 2000's estimated execution plan for this statement is

|--Merge Join(Union)
  |--Index Seek(OBJECT:
    ([Northwind].[dbo].[Orders].[CustomersOrders]),
    SEEK:([Orders].[CustomerID]='VICTE') ORDERED FORWARD)
  |--Index Seek(OBJECT:
    ([Northwind].[dbo].[Orders].[EmployeesOrders]),
    SEEK:([Orders].[EmployeeID]=5) ORDERED FORWARD)

This execution plan looks longer than the original one, but both operators are index-seek operators. SQL Server doesn't use the composite index; instead, it uses two single-column indexes. You might think that two seek operations would cost more than one seek, but performance improves when you use this method. You can check performance by using SQL Trace to analyze the three versions of the SELECT statement. For the UNION query, my SQL Server 2000 system performed four reads to return the result. The original SELECT query required 23 reads, and the same query rerun after I created the composite index required 11 reads.

This special use of UNION can help you avoid the OR operator's slow performance, but use it carefully. If you don't have the appropriate indexes (CustomersOrders and EmployeesOrders, in this example), the UNION approach will scan the table twice.

You Don't Have to Use SqlCommandBuilder

SqlCommandBuilder automatically generates INSERT, UPDATE, and DELETE statements that the DataAdapter uses to post DataSet updates to the source database. But you can boost performance by not using SqlCommandBuilder and instead coupling your own stored procedures to the DataAdapter's InsertCommand, UpdateCommand, and DeleteCommand properties.

.NET Don’ts

Don't treat ADO.NET like ADO. Unlike ADO, ADO.NET is designed to work in a disconnected mode, maintaining an independent copy of the data for each client. ADO.NET applications are designed to work with cached data stores, quickly connecting and disconnecting from the database to retrieve and update data.

Don't overlook the DataView. The DataView lets you bind data to both Web Forms and Windows Forms controls and can contain a custom subset of the information from the DataTable. You can also extend the data in a DataTable with calculated columns by using the DataColumn object's Expression property.

You don't need to use a DataSet. If you're retrieving data from a single table, you can avoid the DataSet's overhead by creating one instance of a DataTable independent of a DataSet. Then, you can access the data in the DataTable and bind to it without needing a DataSet.

Don't think of the DataSet as a small database. Sometimes, beginning ADO.NET programmers try to make the DataSet into a smaller version of the underlying database. You're better off thinking of the DataSet as a local data cache that contains only the data your application needs.

Don't embed connection strings. Embedding hard-coded connection strings in your application can compromise security. If you can, take advantage of integrated security. If you can't, store your application's connection strings externally in Active Directory (AD) or in a secured configuration file.

Don't retrieve unnecessary data. Retrieving too much data is the best way to kill your application's performance. Excessive data taxes the network and consumes valuable system resources that large-scale Web applications need. Review your SELECT statements to make sure they're using the appropriate WHERE clauses and retrieving only the columns they require.

Finding the Lost Identity

Here's a tip for anyone who needs to track down missing identity values. SQL Server generates identity values serially when you turn on the IDENTITY property for a particular column of a table, but you can lose parts of the sequence when rows are deleted from the table. To make recovering missing identity values faster, I developed the query in Listing F, which shows where the gaps fall in the IDENTITY column values if any part of the sequence is missing. To see how this code works, let's walk through an example. First, insert some additional values into the Region table in the Northwind database, as Table C shows. I've inserted four new rows. Next, use the following query to delete some of the rows from the table so that some identity values are missing:

DELETE FROM region WHERE regionid IN(3,6,7)

Now execute the query that Listing F shows. This query is especially helpful when you're working with a table that contains more than a million rows and is missing only one or two values from the identity sequence. In such a case, rather than scrolling through all the table's rows, you can use Listing F's query to automatically show the culprit rowset.
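
Listing F isn't reproduced here, but a common gap-finding pattern along the same lines (not necessarily identical to Listing F) is:

-- Returns the value that starts each gap in the RegionID sequence (3 and 6 in this example)
SELECT R1.RegionID + 1 AS GapStartsAt
FROM Region AS R1
WHERE NOT EXISTS (SELECT *
                  FROM Region AS R2
                  WHERE R2.RegionID = R1.RegionID + 1)
  AND R1.RegionID < (SELECT MAX(RegionID) FROM Region)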

Use Global Variables and the Dynamic Property Task

What complex DTS package is complete without the use of a few ActiveX scripts? ActiveX scripts perform data transformations, control the package flow and execution, and can test for the existence of certain package conditions. Most ActiveX scripts use variables extensively, but if you find that many of your tasks reference the same variable, consider using package global variables instead of local ones. You can reference package global variables in the same way that you reference local script variables. Moreover, by using global variables, you can centrally manage script variables, eliminating the need to modify multiple scripts when just one script variable needs changing.

By design, package global variables are static values, but you can make them dynamic by using the DTS Dynamic Property Task. Global variables are listed in the Browser section of the Package Object Browser; you can reference them in a script by using the syntax DTSGlobalVariables("XYZ").Value, where XYZ is the name of the global variable.

The Dynamic Property Task is possibly the most powerful DTS task. With this task, you can set or modify attributes for all package objects—tasks, steps, lookups, connections, and global variables. The task can set those attributes based on values from a variety of sources: a database query, an INI file, a data file, a global variable, or a constant.

By using a Dynamic Property Task and global variables, you can design DTS packages so that little or no modification is necessary as runtime conditions (such as where the package is running) change. For example, you can use a combination of the Dynamic Property Task and global variables to derive the current fiscal period and year of your financial systems. If the current fiscal year and month are stored in a table in the financial system, you can use a database query to look up these values so that you don't need to hard-code the values in the package. Otherwise, if the package can't dynamically assign these values, you have to modify the package every month.

Use Disconnected Edit

You can't always program a DTS package so that it's 100 percent portable. This means that occasionally, you'll need to make some design-time modifications to tasks by using the Package Designer before you can move a package from one server to another. To make these modifications, consider using DTS's Disconnected Edit feature.

When you're editing certain properties in certain tasks in the Package Designer, the designer will reset all dependent properties. For example, if you reset the Destination Server in a Transform Data Task, the Package Designer might reset any transformations referencing that server, and you'll lose all your development work. With Disconnected Edit, you can edit these properties at design time without affecting any dependent attributes.

Disconnected Edit differs from a Dynamic Property Task in that you use the Disconnected Edit feature at design time, whereas you use the Dynamic Property Task to modify package attributes at runtime. Disconnected Edit is similar to editing the Windows registry—mistakes you make in using this feature can break the package and cause it not to function properly. Therefore, be sure to back up the package before making any changes.

Also note that Disconnected Edit doesn't validate changes. For example, when you change a value for a data-source destination, SQL Server doesn't validate the change against the data source. So if the data source doesn't exist or the login information is incorrect, the Disconnected Edit feature won't alert you to the mistake. If your modifications are incorrect, the package might not function properly.

Use Universal Naming Conventions

Using Universal Naming Convention (UNC) paths can eliminate unnecessary troubleshooting and make your package more portable. When providing paths to source and destination files, always use a UNC path instead of a drive letter. Doing so is important because the likelihood of a server having the same drive-letter mappings as your workstation is very low; if you use drive letters instead of UNC paths, your DTS package will fail because it won't be able to locate the needed files. To use a UNC path, simply replace the drive letter with the server name. For example, you'd replace G:\<sharename> with \\<servername>\<sharename>.

VBA Macro

I'm not a Visual Basic for Applications (VBA) expert, but I do know an important trick for creating VBA macros: If you want to know how to perform an Excel task in VBA, pull down the Tools menu and select Macro, Record New Macro. Perform the task that you want to record in Excel, then stop the recorder. You can then open the Visual Basic Editor and inspect the macro that Excel created.

Use Member Properties to Store Starting and Ending Dates

If you develop your own analytic applications and you want to create an application that understands what calendar dates each time member corresponds to, you can include in the Time dimension member properties that return the starting and ending date of each time period. Client applications can tell the difference between members at different levels, such as months and years, but they can't tell for sure what dates map to each time period. For example, a time period might be called Week 5, but that name doesn't tell the client application how to map the time period to the calendar. Plus, you can create the Time dimension members in any language, which further complicates the problem. For example, a client application might not be able to recognize the word January in multiple languages. Having member properties that contain the starting and ending dates of the time period can help you, as a client-application developer, determine how to correlate the members of the Time dimension to an actual calendar. For example, the member property for Holiday Season might contain the specific dates November 27-January 6.

Use DIMENSION PROPERTIES to Show Member-Property Values

Because member properties contain information about dimension members, displaying member properties along with query results is helpful. When you create an MDX query, you can use the DIMENSION PROPERTIES keyword to specify that the query returns certain member properties with the dimension members on the rows or columns. Few third-party OLAP front-end products use this keyword to display member properties along with the query result, probably because the functionality is Microsoft-specific and can be tricky to implement if you don't plan it during application design. However, if you're developing a custom application that displays query results, using DIMENSION PROPERTIES is an efficient way to retrieve and display the values of the member properties. The following code example shows how to retrieve the type of member card that each of the top 10 FoodMart 2000 customers holds, based on unit sales:

SELECT { [Unit Sales] } ON COLUMNS,
  TOPCOUNT( Customers.[Name].Members, 10, [Unit Sales] )
  DIMENSION PROPERTIES Customers.[Member Card]
  ON ROWS
FROM Sales

To see how the DIMENSION PROPERTIES keyword works, first run the preceding query in the MDX Sample Application. Then, double-click a customer name in the grid that contains the query result. Along with the query result, the MDX Sample Application returns all the available information about that dimension member, including the member-card type for each of the top 10 FoodMart 2000 customers.

Use Calculated Members to Show Member-Property Values

Many applications don't support the DIMENSION PROPERTIES keyword, but Analysis Services and OLAP Services let you use calculated members to return member-property values along with query results. Just create a calculated member that has the same name as the member property, and use a formula like the following to return the value of the member property:

WITH MEMBER [Measures].[Member Card] AS
  'Customers.CURRENTMEMBER.PROPERTIES("Member Card")'
SELECT { [Member Card], [Unit Sales] } ON COLUMNS,
  TOPCOUNT( Customers.[Name].Members, 10, [Unit Sales] ) ON ROWS
FROM Sales

This query returns the top 10 customers based on unit sales. The query displays each customer's total unit sales value and the member-card type.

Use Member Properties to Store Alternative Language Names

These days, even small companies often have a presence in more than one country, and you might want or need to provide analytic applications that can display data in the appropriate language for the region where users are viewing it. When you create a member property in Analysis Manager, you can change the properties of the member property to reflect local data. For example, say your company is based in France and some of your product names appear slightly differently in English than in French. You can create a member property to contain all the English versions of your product names. You configure this property by setting the language property of the member property to English and the type to Caption. Setting the member property this way is a hint that will help the analytic application display the correct product caption for the locale where the application is running.

Understand the Requirements Before Coding

Before you model a database, you need to understand the business and system requirements. Data modelers should interview people within the organization who understand what kind of data needs to be captured and how, where, and when it will be used. If you don't have a full-time data modeler on your staff and there's no prospect of bringing in a consultant for this phase of the project, then you, as a programmer, DBA, or manager, will have to assume the responsibility for getting this part of the project done. You’ll encounter conflicting requirements; it’s your job to get those conflicts resolved somehow, so that the project can proceed and, with luck, succeed. Then, you need to match system specifications to the requirements. Without this information, data modelers can’t create a plan for a database that will work for the organization. For more advice about gathering business and system requirements, see "Data Modeling," April 2000.

Enumerated Paths Solution for Hierarchies

SQL Server 2000 and earlier releases don’t support recursive queries in T-SQL. Although you can’t query the hierarchy recursively, you can introduce additional data into your database that describes the hierarchy, then write efficient set-based queries against that data. You’ll also need to maintain the additional data to keep it in sync with the base data. To demonstrate the technique, first run the code that Listing G shows to create the Employees table that I’ll use in my examples.

I used the column pair empid and mgrid to express the hierarchy. Each manager is also an employee and can be in charge of direct and indirect subordinates. I added the lvl column, which represents the employee’s level in the hierarchy, with the highest level having a 0 value, the next a 1, and so on. I also added the path column, which represents the enumerated employee ID path of the management chain leading to the employee, using a dot as a separator between the IDs. For example, James (empid = 14) has a lvl value of 4 and a path value of .1.3.7.11.14. because the chain of IDs for his managers and himself is 1, 3, 7, 11, 14. James is in the fourth level in the hierarchy if you start counting with 0.

Now suppose that with each modification issued against the Employees table, you want a trigger to fire that calculates and stores the lvl and path values for the modified employees. Run the script that Listing H shows to create the trg_Employees_i_calcpath trigger, which handles inserts. The trigger’s code calculates the lvl value by adding one to the lvl value of the new employee’s boss, then adding the employee ID and a dot to the path value of the new employee’s boss. To test the trigger, run Listing I’s code, which populates the Employees table.
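
Listing H isn't reproduced here, but a minimal sketch of the same idea, assuming the column names described above and that managers are always inserted before their subordinates, might look like this (not necessarily identical to Listing H):

CREATE TRIGGER trg_Employees_i_calcpath ON Employees FOR INSERT
AS
UPDATE E
SET lvl  = CASE WHEN I.mgrid IS NULL THEN 0 ELSE M.lvl + 1 END,
    path = CASE WHEN I.mgrid IS NULL THEN '.' ELSE M.path END
           + CAST(I.empid AS varchar(10)) + '.'
FROM Employees AS E
   JOIN inserted AS I ON E.empid = I.empid
   LEFT OUTER JOIN Employees AS M ON I.mgrid = M.empid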

In a similar manner, you can write triggers that handle updates and deletes. Now that the lvl and path values are calculated automatically and are synched with the base data, you can start writing set-based queries to manipulate the hierarchical data. For example, to return the list of employees sorted and indented by their hierarchical dependencies, run the following query, whose result Figure D shows:

SELECT REPLICATE(' | ', lvl) + empname AS empname
FROM Employees
ORDER BY path

To get Robert’s row and those of his (direct and indirect) subordinates, run the following query:

SELECT empid, empname, lvl, path
FROM Employees
WHERE path LIKE
(SELECT path
FROM Employees
WHERE empid = 7) + '%'
ORDER BY path

The query looks for all employees whose path starts with Robert’s path. Table D shows the result.