Diagram your data with Visio 2000

Many readers have asked me what kind of software package I use to create the data models that I include in my columns. So this month, I spotlight the CASE software tool I use most often—Microsoft Visio. Note that Visio encompasses many components, but I describe here only the components that help me do data and database modeling. For a full Visio product description, see http://www.microsoft.com/presspass/guides/visio.

Anatomy of a CASE Tool


Although Microsoft categorizes Visio as a drawing package, I use it as a CASE tool to help me conceptualize, plan, model, and implement programs as simple as databases or as complex as software systems. By using a CASE tool, you can drive through the stages of the systems-development life cycle, document your ideas and concepts, and stay current with your naming conventions. You can also generate objects such as databases and track changes you make to these generated objects. Most important, a CASE tool can lead you through flow diagrams (for processes, data, and systems), hierarchy and data models, and Web site and network diagrams. Before you begin using a CASE tool, you need to know why, when, and how to use those diagrams and models. The tool then helps you create diagrams and data models more quickly by guiding you and enforcing certain design rules.

Visio 2000 vs. Visio 5.0


Shortly after Visio became a Microsoft subsidiary in early spring 2000, Microsoft released the newest version, Visio 2000, to replace Visio 5.0. Visio 2000 carries a lot more power and capability than earlier versions, but figuring out how to accomplish the tasks in Visio 2000 that you glided through in Visio 5.0 can be a challenge.

The framework for database data modeling changed dramatically from Visio 5.0 to Visio 2000. For starters, complexity has increased significantly in Visio 2000. Visio 5.0 is elemental and easy to use. You can choose from only four database templates (notation styles). You can forward-engineer only to a Microsoft Access 7.0 database platform. And you can use only those database-specific functions that Figure 1 shows (in the menu bar, choose Tools, Database Design). Despite the seeming paucity of features in Visio 5.0, I was able to use this tool to create many data models and implement numerous databases.

In contrast, Visio 2000 is more full-featured than Visio 5.0. You can accomplish either ad hoc or formal database, software, Web, and network modeling. To enhance database modeling, Visio 2000 has incorporated the CASE tool InfoModeler into its architecture. If you've used InfoModeler in the past, the new Visio design framework will be familiar. If you haven't worked with InfoModeler, you have lots to learn. An InfoModeler feature that is especially helpful to data modelers is its ability to enforce consistency and protect users from design mistakes by flagging data model inconsistencies. Another plus for Visio 2000 is the increased prominence it gives to database modeling. The database-specific functions are more plentiful, and they're now visible at the head level in the menu bar, as Figure 2, page 76, shows.

Despite Visio 2000's expanded features, the CASE tool currently suffers from a few failings. Displaying the output from a reverse-engineering session is a significant challenge for data modelers. You can select all the tables at once from the Tables list and drag them over to the drawing page to display the output. But don't expect to produce a quality—or even readable—image with this move. In Visio's earlier versions, the output is usually a well-designed display. Visio 2000, however, dropped the object-positioning algorithms, and the resulting display—especially for large models—is disorganized and incomprehensible. For example, tables frequently slide off the edge of the paper and even the screen. (Microsoft assures me that the development team is working on the problem.) Your time-consuming alternative is to select the tables individually from the Tables list, as Figure 3 shows, and place them on the drawing page one at a time. As you place each table, the defined relationships between the objects automatically become clear.

The problems with Visio 2000 don't end with output display glitches. Microsoft still needs to resolve a few bugs in the interface between Visio 2000 and SQL Server 2000. Notice that the reverse-engineered data model in Figure 4 doesn't include the complete data type. For example, most of the varchar data types fail to list the data lengths, although most of the char data type fields list them. If you reverse-engineer from a SQL Server 7.0 data source or an Oracle8i data source, you won't encounter this limitation. The Visio development team says that it's delaying the release of the SQL Server 2000 driver until the team decides how to incorporate user-defined data types.

Visio 2000's documentation also leaves much to be desired. Microsoft slighted the self-training materials, user guides, and Help files in the rush to market. For a short list of Internet resources, see the sidebar "Finding Online Help."

Choosing a Visio Product


Notwithstanding its challenges, Visio 2000 is a powerful tool for a broad range of users. The application comes in four editions­Standard, for the business professional; Technical, for the engineer; Professional, for the IT professional; and Enterprise, for the IT specialist. The Professional and Enterprise editions work best for data and database modelers. Although Visio 5.0 Professional Edition permits reverse and forward engineering, Visio 2000 Professional Edition includes only the reverse-engineering capability. Former Visio 5.0 Professional Edition users who want both features must buy Visio 2000 Enterprise Edition. I don't see any real advantage to using Visio 2000 Professional for simple database reverse engineering; you can do that within Enterprise Manager by building a database diagram. Of course, Visio 2000 Professional Edition has many useful functions that aren't related to data modeling, such as AutoDiscovery of network components and file and Web structures, so I encourage you to investigate its capabilities for yourself.

If you decide to buy Visio 2000 Enterprise Edition, you'll appreciate its added functionality; the Enterprise Edition supports the entire conceptual-logical-physical modeling scheme. To begin the modeling process, you can create a concept model by using Object Role Modeling (ORM), a notation that conceptualizes an idea and presents it to nontechnical decision makers. Next, you can transfer the ORM into an entity relationship (ER) source model, a notation style that I use to display logical modeling. Then, you can build out the ER source model, adding attributes, data types, primary- and foreign-key definitions, indexes, triggers, check constraints, definitions, and referential integrity (RI) behaviors. If your project is large, you can even divide the ER source model into submodels and give each submodel to a different data modeler to work on. Then you can roll all the submodels into a single database model, which is the physical model that you can generate a SQL Server database from. (The Enterprise Edition is the only edition that features the Generate database function.) Visio 2000 Enterprise Edition also lets you retain live links between the database model and the generated database so that each picks up changes that you make to the other.

Sample Project: The Student Retention Data Mart


I realize that justifying the purchase of Visio 2000 Enterprise Edition could be difficult (Microsoft's price is $999), so I'm going to demonstrate why you might need it. Working from a prototype data mart that a colleague created, I show you how to reverse-engineer, analyze, and redesign a database, then forward-engineer a new database. I carried out this project on a server running Windows NT 4.0 Service Pack 6 (SP6), SQL Server 2000, and Visio 2000 Enterprise Edition. Because Microsoft hasn't yet released the Visio 2000 drivers for SQL Server, I copied the SQL Server 2000 database structure over to a SQL Server 7.0 environment by using Data Transformation Services (DTS) and reverse-engineered the SQL Server 7.0 version (risking loss of any SQL Server 2000-specific variations on data types). Alternatively, you can wait until Visio releases the drivers for SQL Server 2000, manually correct the reverse-engineered SQL Server 2000 model (yikes!), or import SQL Server 2000 tables into Access (2000 or 7.0) and reverse-engineer the Access database (but then you'd see Access data types).

My colleague developed the data mart I use in this project to help identify student retention problems for a local university. Because he's a business analyst, not a data modeler, he organized a set of five tables around the business requirements he had established for his new project. From his specifications, I transcribed the tables into SQL Server 2000. Then, using the Reverse Engineering feature, I migrated the table structure into Visio 2000.

Reverse engineering in Visio 2000 Enterprise Edition is a little different from previous Visio versions. You start by opening Visio 2000. When the welcome screen prompts you to create a new drawing, choose the Database Model Diagram drawing type, then click OK. From the top menu, choose Database, Reverse Engineer. When the Reverse Engineer Wizard opens, you should be able to select SQL Server as an installed Visio driver and the database you want to reverse-engineer as a data source. If your database isn't listed, you need to create a new data source for your database. After you identify and choose the data source, click Next. You might need to log in to the database.

Select the object types to reverse-engineer (typically, you select only tables for your first draft), and click Next. In the following window, you can select the tables and views you want to reverse-engineer. Click Next. In the last window, you can review your selections, then click Finish. The screen that appears shows a list of messages from the reverse-engineering operation in the Output window and a list of tables in the Tables window, which Figure 3 shows.

After reviewing the project's business requirements, I determined that a reengineering of the original reverse-engineered model would better serve the clients' interests. Consequently, I modified the following four features.

Elevating the site attribute. The data mart's simple design comprises several dimension tables (Term, Course, Section, and Student) that surround and describe the central fact table (Enrollment), as Figure 4 shows. Site is an attribute of the section dimension table. I decided to convert the site attribute to a separate dimension to make it easier for end users to query the data mart by site alone, or by section alone. Figure 5 shows these changes and the modifications I describe in subsequent paragraphs.

Limiting a primary key. I decided to limit the Enrollment fact table's primary key to a single field, EnrollmentID, for two reasons. First, when a primary key consists of several fields, I must write time-consuming join queries on each of the fields. Figure 4 illustrates the primary key's five-part composite in the original data mart. Second, if I declare an identity column as the fact table's primary key, then SQL Server has more options when it optimizes queries.

Removing the wagon wheel. In a wagon wheel schema, dependencies among the dimension tables form the wagon wheel rim. The relationships between the dimension tables and the central fact table become the spokes. A wagon wheel schema is too complex to maintain in a data mart or data warehouse, especially if you plan to enforce those rim relationships. New data loads continuously into each table according to different schedules, so interdimensional table dependencies could complicate table loading and maintenance. Such a wagon wheel schema is beginning to develop in Figure 4 between the course dimension and the section dimension, so I removed it in the final model by selecting the relationship line between those two dimension tables and pressing the Delete key.

Changing relationships. I decided to change all the relationships in the data model from identifying to nonidentifying for pragmatic reasons. Because data loading from the source databases occurs at varying intervals for the different tables, trying to enforce RI at the table level would inhibit the data mart's efficient operation. In addition, I changed the relational behavior from enforced RI to unenforced RI on update and delete of the dimension records. The original rule was "On delete or update of the dimension record, cascade the operation to the associated fact records." But my client plans to truncate and reload the dimension tables periodically according to the modification activity at the data source. If I had left that rule intact, the resulting repercussions—an enormous amount of delete and update activity in the data mart—would shut out all other processing. In the reengineered data mart, which Figure 5 shows, neither an update nor a delete in any of the dimension tables would cause a corresponding change in the fact table. Consequently, the client can update the fact table on a schedule that's appropriate to the operation.

When you're happy with the changes that you've made in a reverse-engineered data model, you can forward-engineer the new architecture to generate a new database and store the current database image in the model for future design synchronizations. But, as I mentioned earlier, you can forward-engineer only in the Visio 2000 Enterprise Edition. If you're working in Visio 2000 Professional, you must manually write a SQL script to modify the existing tables or create a new set.

A Powerful Modeling Tool


Visio 2000 is a substantial upgrade from previous versions. Although it might still be the best buy for your dollar in the CASE tool market, the logical upgrade for Visio 5.0 Professional Edition users—Visio 2000 Professional Edition—now lacks some of the functionality that data modelers need. Although this version of the product is short on user documentation, the new features are in place and work relatively well. Online Help is available, although it's a little disjointed, and the forums and newsgroups are active. But if you're a data modeler, acquiring Visio 2000 Enterprise Edition would be a worthwhile purchase, whether you're working on a team or alone.