Entity modeling is a way of graphically representing a database's design, which organizes the tables that hold user data. Making entity relationship (ER) models (also called ER diagrams) isn't difficult; you just have to practice. You can group ER diagrams into categories that represent sales, scheduling, reservations, asset management, inventory control, and so on. And you can represent each category with a template ER diagram, or metamodel. Just as we define metadata as "data about data," in this article we can define "metamodel" as a "model of models."

To start data modeling, you can work on one metamodel. Then, when you're ready to proceed, you can combine multiple metamodels to form the basis of a more complex design project. When you started writing program code, you probably found one or more code templates and modified them to help you create your programs. Entity modeling is similar; you can start with one or more metamodel templates and modify them to help you create your database design.

The Sales Scenario

One of the most common business scenarios is retail sales, so let's develop a metamodel specifically for that scenario. Before you begin to draw this model, you need to decide on the four or five entities that are part of any retail sale. The CUSTOMER entity purchases, leases, or otherwise acquires products. The PRODUCT entity is an inventory listing of objects that are available for sale, lease, or acquisition, such as red sweaters, rototillers, or purple widgets. The SALESPERSON entity sells or leases one or more products to customers or assists customers during a sale event. SALE is the event—a transaction (for example, a sale, a lease, or an acquisition) between customer, salesperson, and product. LOCATION is the place where the sale event occurs—a physical location such as a brick-and-mortar store or a virtual location such as an Internet shopping site.

Defining the Relationships

First, you need to determine the relationships among these retail-sale entities. Figure 1 shows some of the most important relationships. CUSTOMER to PRODUCT is a many-to-many (M:N) relationship because a customer can buy one or many products and one or many customers can buy a product. CUSTOMER to SALESPERSON is also M:N because over a period of time, a customer can buy from one or many salespeople and a salesperson can sell to one or many customers. CUSTOMER to LOCATION is also M:N; a customer can shop at one or more locations, and each LOCATION services many customers.

You've covered customers, products, locations, and salespeople. But where does SALE fit into this picture? If you decompose each of the three M:N relationships into two-component one-to-many (1:M) relationships, you find SALE. As Figure 2 shows, the interaction of a CUSTOMER and a PRODUCT is a SALE. Likewise, a CUSTOMER and a SALESPERSON interaction leads to a SALE. Finally, a CUSTOMER's shopping experience at a LOCATION results in a SALE. SALE is the common activity of these three M:N relationships.

Figure 3 shows the three 1:M relationships from Figure 2 combined into one entity model. At the center of the model is the action entity, SALE. Each sale involves a CUSTOMER, a LOCATION, a SALESPERSON, and a PRODUCT. The metamodel that Figure 3 shows represents the simple retail-sale definition: A customer buys a product at a location from a salesperson. The database records the sale action in separate transactions. For example, you go into an office-supply store and purchase a case of printer paper, a roll of transparent tape, and a box of pens. According to this model, the database records your buying activity as three separate sales (transactions), one for each product you purchased, in the SALE table.

Refining the Business Model

From a data-model perspective, the simple retail-sale metamodel works. But from a business perspective, designating each item purchased as a separate sale might not be the method you want for recording your data. This approach is cumbersome; when you record your data by item purchased, you have to reconstruct each shopper's trip to the store by time- and resource-consuming sorts and data ordering. And the separate-sale model might actually tell you less about overall buying habits than you want to know. If you build the metamodel on the traditional sale definition, where SALE represents all items that one customer purchases at the same place, time, and location, you might gain a more useful perspective of customer behavior. According to the traditional sale definition, your shopping trip—the occasion when you bought a case of printer paper, a roll of transparent tape, and a box of pens—constitutes one SALE.

To accommodate the traditional SALE definition, you have to modify the simple retail-sale metamodel. As Figure 4 shows, I simply adjusted the SALE:PRODUCT relationship to accommodate the expanded definition of a sale. Now, many sales can involve each PRODUCT, and each SALE can involve one or many products.

Finally, you need to decompose the last remaining M:N relationship into its two-component 1:M relationships. To clarify the entities' meaning and purpose, you might also want to add a few attributes for each entity. Figure 5 illustrates what the final retail-sale metamodel should look like. Typically, you want to capture the name and address of every retail location. For payroll reasons, you need to capture each salesperson's name and Social Security number. And if you want to launch a targeted marketing campaign or start a buyers club, you need customer name and address information as well. For each product, you need a description, the quantity on hand, and the quantity-on-hand value that triggers refill orders with the product vendor. The wholesale and retail price per product unit are also handy items of information.

For each sale, you want to capture the date and time of the sale, applicable taxes, applicable discounts, and the total sale amount. But can't you just dynamically calculate totals in a transactional database? Theoretically, you can. But realistically, when the SALE and SALE_ITEM tables hold millions of records, recalculating the values every time you want to know a total sale amount is impractical. Plus, because wholesale and retail unit prices change, sometimes daily, and tax percentages vary geographically, you might not be able to reconstruct the original total amounts. To show that the total attributes are calculated values, I italicize them in Figure 5. Remember, in an ER diagram, you don't show foreign keys—the relationships imply them. Therefore, you can assume that SALE has three foreign key attributes that Figure 5 doesn't show—the location identifier, the salesperson identifier, and, optionally, the customer identifier.

One or many items, which are PRODUCT instances, comprise each sale. For each sale item, you want to capture the quantity sold (for example, two boxes of pens) and the price they were sold at. (The sale price doesn't have to be either the wholesale or the retail price listed in PRODUCT.) The PRODUCT foreign key attributes, which Figure 5 doesn't show, are the sale identifier, which links each item sold to its parent sale, and a product identifier, which labels each item.

A Flexible Design Foundation

A metamodel is a template that you can use as a beginning point for your own production modeling projects. You can create your own set of metamodels by following these simple steps: Identify the basic set of entities that are common to the scenario you're creating the metamodel for, define the relationships between the entities, and add simple and common attributes that describe each entity. Then, when you apply this metamodel to a real design project, you need to incorporate many more attributes and, undoubtedly, more entities. But the beauty of this retail-sale metamodel is that you can use it to jump-start your design projects, and it's totally customizable. Happy modeling!