In the past two issues, I've discussed data modeling, explaining how to gather requirements and model the business processes of a fictitious publishing company. Now let's look at the entity model, which is the third step in the data-modeling process. I'll create two types of entity relationship diagrams (ERDs), explain how each connects to the data-flow diagram (DFD) of the process model, and show how they provide a more detailed view of the company's data requirements.

A model is a representation of reality. The entity model (also called a conceptual model) displays the data a business collects and uses. The ERD, which represents the entity model, shows what the database or system will look like but doesn't explain how to implement it. An ERD is vendor-neutral. It groups data into entities and relates the entities to one another based on business rules. An entity is something that has substance and about which you want to store data. A business rule is a policy or procedure that a business enforces. It describes how the business operates.

Like architecture or engineering, data modeling is both art and science. Many solutions exist for each situation; personal or corporate preference plus experience determine the final design. For convenience, I'll use the common Crow's Foot notation, which many CASE software packages support.

Creating an ERD

You construct an ERD of graphical components, mostly rectangles (representing entities and their properties) and lines (representing relationships and connecting entities to each other). Some methodologies (e.g., Chen) use diamond-and-line combinations to represent the relationships between entities and ellipses to represent attributes. I'll use the DFD from "Process Modeling," May 2000, to help construct an enterprise ERD, which is the executive summary of entity modeling. Then I'll expand the enterprise ERD into a detail ERD, which presents the same information in more complexity and depth.

The Enterprise ERD

As Figure 1, illustrates, the enterprise ERD contains only entity names and relationships and typically doesn't resolve all many-to-many (M:N) relationships, as the M:N relationships between Employee and Skill and between Publisher and Author show. The diagram purposely omits entity properties because, like an executive summary report, an enterprise ERD provides a general perspective of the forthcoming database design. You could easily display this ERD at a meeting of people unfamiliar with a database's technical details.

I've taken some liberties with this enterprise ERD. I've tagged each relationship with the corresponding process number from last month's DFD. The entities in this ERD correspond to external entities or data stores in the DFD. For instance, locate the entities Employee and Skill on the ERD. Notice that theirs is an unresolved M:N relationship, and the corresponding DFD process is 3.1, Skills Evaluation. In the DFD, this process begins when an employee takes a skills evaluation, which produces an Employee Skills Listing.

The ERD relationship between Employee and Skill reads, "An employee can have zero or many skills; a skill can belong to zero or many employees." In the DFD, both Employees and Employee Skills Listing are data stores—one external, one internal. In the ERD, both Employee and Skill are entities. Be aware of this intimate correlation between the process model DFD and the entity model ERD. As you develop the ERD, you might uncover relationships that weren't obvious in the DFD, such as the direct relationship between Employee and Publisher. The process model (DFD) looks at an environment from the processing perspective; the entity model looks at the same environment from the data perspective. Together they reveal an inclusive picture of the situation you're modeling.

The Detail ERD

The detail ERD is an extension of the enterprise ERD. Based on your requirements analysis, you add properties (also called attributes) for each entity. Resolve all M:N relationships from the enterprise ERD by creating an associative entity—one that cross-references the two entities in the M:N relationship. The associative entity contains records that identify which instance of one entity relates to which instance(s) of a second entity and vice versa. In the detail ERD, you introduce reference entities, which will contain lists of values that limit the domain (the set of valid values) of an entity property. Reference entities will become lookup tables at a later stage of data model development. In Figure 2, the entity Pub_Category is a list of publication categories (e.g., book, print magazine, online magazine, online newsletter), and each instance of Publication must fall into one of these categories.

Finally, you can generalize or special-ize, creating supertype or subtype entities. (For a review of supertype and subtype entities, see "Supertypes and Subtypes," May 1999.) To generalize, analyze the contents of each entity for similarities that require the creation of a supertype entity. To specialize, determine whether you can categorize an entity into subtypes. In Figure 2, the new entity Person is a supertype that describes the Employee and Author subtypes. You'll store all characteristics common to both employees and authors as attributes of Person. Store employee-specific characteristics as attributes of Employee, and author-specific characteristics as attributes of Author. You can thus categorize an individual as a Person, an Author, an Employee, or an Author and an Employee.

Identify Entities, Relationships, Properties

The first step in building an ERD is to identify the major entities. As I mentioned, an entity is something that has substance and about which you want to store data. If you're working from a DFD, every data store will map to at least one entity. If you're using a text-based case study, such as the one I presented in April, look for a noun or a verb that might identify an entity. Noun entities represent people, places, and things. Verb entities represent actions and interactions between the noun entities. I used the DFD (a data store) and the requirements specified in the case study to identify each entity in Figure 1.

How does each entity relate to other entities in the model? Relationships fall into three categories: one-to-one (1:1), one-to-many (1:M), and M:N. You must break down each M:N relationship into two 1:M relationships, with an intervening associative entity acting as a cross-reference for records. As you construct the ERD, if you're not sure how one entity relates to another, review the requirements and, if necessary, analyze more deeply. Data modeling is a repetitive process of discovery and rediscovery, and you must sometimes revert to a previous step to resolve confusion that arises as you continue to explore the details. I identified each relationship in Figure 1 either by the DFD (a process) or by the requirements specified in the case study.

Each entity has properties and characteristics, which you implement as a set of attributes that describe each instance of an entity. If you identify an object as an entity but can't identify any of its attributes, you need to reconsider your evaluation. Generally, if you find an entity with no properties or characteristics, either it's actually an attribute or you don't know the data well enough yet and need to revisit the requirements.

While you're creating the detail ERD, you need to test each entity's group of attributes for proper normalization. (For information about data normalization, see "Why You Need Data Normalization," premiere issue.) Make sure that for each attribute group, one attribute exists that uniquely identifies each instance of the entity. This attribute will be your primary identifying attribute (primary key, at a later stage of data modeling). For example, PersonID in the entity Person is a primary identifying attribute. Make sure that all the other attributes depend on this primary attribute for their meaning. Avoid transitive dependencies, in which one non-key attribute defines the meaning of another non-key attribute. The following mnemonic device can help you remember the steps of normalization: The key (first normal form—identify a proper primary key), the whole key (second normal form—make all non-key attributes fully dependent on the whole primary key), and nothing but the key (third normal form—be sure no non-key attributes are defining values), so help me Codd! (Dr. E. F. Codd—father of the relational theory in database management systems).

To support the activities of a business, a database needs a good blueprint and foundation. As part of this essential foundation, the ERD maps a corporate environment from the data perspective. When you combine the ERD with a process model (DFD), you can generate a comprehensive picture of the corporate environment.