Extend the Music Collection database to include storage-location data

One of the unexpected benefits of writing this column is reader feedback. Whether my readers find my columns helpful or disagree with what I say, I can count on a steady stream of lively and thoughtfully composed letters. One dissenting letter came from Paul Gooding of Phoenix, who has a different perspective of the Music Collection database I redesigned in "Real-Life Database Design," April 2001, InstantDoc ID 19948. Pointing out a valid shortcoming of my redesign, Gooding said that both the design I started with (the Microsoft Access Music Collection database) and the redesign example seem to keep track of CDs as if I were going to inventory and sell them. He said that in building a database for his own use, he would want to focus on music as music (and therefore tracks). And in his model, the CD would be a container or storage location and useful only when he needed to physically retrieve a track. For that situation, he would create a nested storage model that stores a track on a CD, which is in turn stored in a specific location.

Gooding is right, of course. When I was developing the Music Collection database, I was more concerned with entering data than with accurately defining the requirements—the first law of database design. In my original column, I failed to conduct a thorough analysis of what I wanted to do and how I wanted to do it. Figure 1 is a recap of the final data model from the April 2001 article, and as you can see, the model doesn't name a storage location for the music. The statement of requirements that I made in that article was as follows: "A music collection database should be able to completely inventory your music so that you can search by artist, by title, or by category. Then, you can assemble a playlist from your findings. For example, if you want to construct a playlist of all Alasdair Fraser tracks, you should be able to do that, even though the tracks might be scattered over many CDs, including compilation CDs."

The design that Figure 1 shows clearly lets you search by artist, by title, or by category code, and it lets you assemble a music playlist. However, what it doesn't do is tell you where this music is physically stored. If all your CDs are stored in a CD jukebox, you know where they're physically located. Or if all your music is in MP3 files stored on a music server, physical location of each track isn't a concern for you. However, for most people, like Gooding, the original Music Collection database isn't as useful as it could be.

In Gooding's opinion, you need to design an all-purpose database that can accommodate a variety of nested-storage schemes. Then, you can record the location in which you stored one or more kinds of items that might be anything at all (e.g., music tracks, pickles, old magazines). And when you're designing the database, you should concentrate on what you really want to keep track of. Again, Gooding has a point. If the main purpose of your database is to keep track of locations, the items you're trying to locate don't matter. The central tables in the database—the most important tables—are those that name the storage location of whatever you're storing. And, as Gooding said, the music CD database should contain only two items of value: the music track and its location.

Let's redefine the requirements and expand the data model to incorporate Gooding's suggestions. Figure 2, page 56, shows an extension of the original Music Collection data model. (See the sidebar "PowerDesigner's Conceptual Data Model," page 56, for more information about how to interpret this data model.) Like every other data model, Figure 2's model has its trade-offs. In either Figure 1 or Figure 2, an artist records zero to many tracks, but a track can be recorded by only one artist. However, that rule doesn't always hold: Two or more artists can record one piece of music (which the Track entity stands for). To represent that multiple-artist condition, I'd have to change the relationship between Track and Recording Artist to a many-to-many (M:N) relationship. And if I made that change, I'd complicate querying and data entry by adding an associative table into every query.

Using Figure 2's model simplifies the design, but I'll encounter data redundancy when I start to input data. For each Track that includes more than one artist, I'll have to create a separate row for each artist. For example, the Trio CD, featuring Linda Ronstadt, Dolly Parton, and Emmylou Harris, contains multiple tracks recorded by all three artists. Therefore, Figure 2's design places three records in the Track table for each song that this threesome recorded.

I made a judgment call on this design. Based on my knowledge of the data that's going to be entered into this database, I chose the data model in Figure 2. I know that in my music collection, the number of tracks recorded by multiple artists is very small—less than 2 percent. Therefore, I'm willing to incur a little data redundancy in favor of the simpler design. To offset the data redundancy, I'll write trigger code to handle modifications to the redundant data in the Track table so that my data won't be unsynchronized for more than a few microseconds. Then, I'll monitor database performance so that if the amount of data stored in the database rises substantially and the resulting increase in trigger firings begins to adversely affect database performance, I'll be aware of the problem. At that point, I'll reevaluate my design decisions and probably normalize more intensely.

Note that both the highly normalized data model (Figure 1) and the less-normalized data model (Figure 2) let you satisfy the initial requirements of the Music Collection database; that is, they both let you search by artist, by title, or by category, and they both let you assemble a music playlist. The new entity in Figure 2's model that lets you conform to Gooding's storage-locator scheme is Location (the place where the CD is physically stored), whose relationship to Recording is 1:M. The Recording entity has one foreign key—Location.LocationID. Now this model meets Gooding's requirement for a nested (actually a tri-level) storage model. I can find any Track in a Recording row, and I can find any Recording in a row in the Location table.

The dilemma that extending the original Music Collection data model illustrates is one of the biggest challenges to data modeling. Do you simplify the model and incur data redundancy (clearly a violation of the rules of data normalization), or do you normalize like crazy, reduce data redundancy to nil, but incur performance penalties and the wrath of the end users and database programmers? You might also run the risk of diminishing database performance because of the high overhead involved with joining so many small tables. I don't see an easy answer, at least not in the real world. I recommend that you take each situation that you encounter and evaluate it on its own merits.