In last month's column ("Put Your Database to the Test," InstantDoc ID 93533), I introduced the concept of formalized unit testing of database objects. In that article, I showed how Visual Studio Team Edition for Database Professionals (Team Data) can provide a rich infrastructure that lets you create a full set of unit tests for your database. However, database testing requires more than just the unit tests; in fact, one of the most important aspects of database testing is a good set of test data. Without good test data, how can you be sure that the tests you run find all the problems? In most cases, unless you test against the real production data, it's hard to know that the tests you run are meaningful. Team Data can help you create realistic test data so that you won't need to deal with unit-testing problems.

The Problem with Data


In many situations, you can't test against production data because you aren't allowed to see it. Real production data can contain credit card numbers, social security numbers, salary information, and many other types of personal information that you don't want everyone to see. So how do you get around this problem? You could use an ETL tool to take a cut of the production data and obfuscate the sensitive data. But this solution can lead to a random, inaccurate representation of your production systems and requires a significant amount of work to set up.

Another solution is to create a random set of test data by using union queries that will generate the required quantity of data. Unfortunately, this solution can result in repeated data and doesn't accurately represent the production system either.You can also use SQL Server 2005's TABLESAMPLE option to sample production environment data and copy it to the test environment. However, this solution also has limitations, such as difficulty with handling data that's related through foreign keys and production data exposure.

Even if you get past the first problem of generating adequate test data, you'll still run into a second problem: How do you ensure that every time you run a unit test— potentially many times—you're testing against exactly the same data each time? In many cases, the tests you run alter the data so that the next time you run the test, the data is different. In these cases, the test results might be different—or at the best inconclusive.

Start with a Plan


Team Data addresses these test-data problems by letting you create something called a Data Generation Plan, which defines how to create a set of test data. A functionality called DataGenerator runs this Data Generation Plan to create the test data.To create a meaningful plan, you must understand the database's types and ranges of data, the shape of the data, and the normal data distribution between tables. All of these things are adjustable in the DataGenerator interface, which Web Figure 1 shows. The interface lets you define relationships between tables and define relationship ratios. For instance, if you looked at the shape of the data and determined that, on average, a customer had 10 orders, you can define this ratio to create data that matches that definition.

In addition, the Data Generation Plan lets you define exactly what data is generated into each field for any particular table when the Plan runs.You can define this data by using either one of the nine built-in generators or by creating custom generators. The basic data generators let you create data for all the standard data types and include a regular expression generator that lets you use regular expressions (i.e., RegEX) to define how data is generated. You can use RegEX to define the format, then let the generator create random values for you.This functionality is useful for generating order numbers, zip codes, phone numbers, and the like.

The DataGenerator functionality also avoids the danger of changing the data every time you run a test by assigning a seed value to all the generated data.This seed value ensures that every time you run a given Data Generation plan, the Data-Generator will create exactly the same test data because it is based on the seed.

You can run these Data Generation Plans manually or combine them with the unit test functionality I explained in "PutYour Database to the Test." You can assign a Data Generation Plan to a particular unit test so that every time you run the unit test, the Data Generation plan runs and deploys the same test data to the database that the test is running against. Now that we've defined a baseline, created a set of tests to support that baseline, and ensured our data's reliability, next I'll look at how Team Data features can make it easier to change the database.