Preserve your database's records without sacrificing performance

Maintaining history in a production (online transaction processing—OLTP) database can be tricky. You want to be able to retain readily available historical facts in the database while sustaining performance. If you don't correctly control the architecture of this historical data, not only can you significantly increase the number of records in your production tables but you can also cause queries that access these enlarged tables to return inaccurate results. A SQL Server Magazine reader, Mustufa Baig of Toronto, asked me to review a case study that involved maintaining history in his insurance brokerage's database. The problems that Baig encountered aren't unique to the insurance industry; they're common to any organization that needs to retain history in its database.

To understand this case study, let's look at the background. Suppose an organization wants to offer its employees a full insurance package. Because no insurance company offers all plans that the organization needs, the organization knows it has to purchase plans from several insurance companies. Figure 1 shows a schematic that illustrates this concept. Insurance company A offers life, accidental death and dismemberment, and short- and long-term disability insurance. Insurance company C offers health, dental, and vision insurance plans. Insurance companies B and D offer a combination of plans. If the organization deals directly with several insurance companies, it has to juggle plans, often providing overlapping or incomplete coverage and creating much confusion for employees when they file their claims. To avoid the hassle, the organization can contract with an insurance broker—an intermediary between the organization (client) and the insurance companies—to assemble a package of plans that meets the client's needs. By using the broker, the client can greatly simplify insurance administration and offer flexible plan options.

Table 1, page 60, is a simplified example of the kind of insurance-benefit menu that a broker's client might offer its employees. From this menu, an employee can choose one of three kinds of life insurance, one of two accidental death and dismemberment plans, one of two dependent group life plans, and one of two long-term disability plans. The employee never needs to know that these plans come from different insurance companies. The client simply pays the insurance broker a monthly fee for the insurance coverage. The broker's job is to make sure that the appropriate amount of money reaches each participating insurance company.

During the first year after the insurance broker sets up a client's account, record-keeping isn't difficult. But over time, as the client changes its mix of plans and as employees change their levels of coverage, keeping a history of the changes can become a logistical nightmare. Typically, an organization is divided into divisions, and a division's employees are broken down into classes. And if the organization offers different insurance plans to different classes of employees based on job title and status (e.g., full-time, part-time, temporary, permanent), record-keeping becomes even more complicated.

In Baig's case study, his brokerage's executive management wanted an end-of-month report that showed all changes made in the previous month. The changes could be reported at the employee-coverage level or at the client-plan level. Baig wanted to know how to track these changes in his production database while retaining quick access to the data and maintaining production performance. I suggested that Baig approach this situation by

  • making a new record each time a coverage change occurs
  • recording the beginning and ending dates and times of each plan
  • storing the historical records in a history table
  • unioning the historical records and the current records together to obtain a full report

Figure 2 shows an entity relationship diagram (ERD) that illustrates my suggestions. The diagram is a basic model of the solution; to avoid confusion, I've minimized the attribute set. Each of the entities that might be archived into history tables has a set of common attributes: a beginning datetime, an ending datetime, and an archive flag (which I've defined as a SQL Server bit data type with valid values of 1 or 0, representing yes or no, respectively). The solution can work in one of four ways:

Scenario 1. Suppose that in a given year, one of your organization's divisions selects an insurance-benefit package for a class of employee (e.g., part-time or full-time). You have to enter the datetime value for the DivPlanBeginDateTime attribute of the appropriate DIV_PLAN_OFFERING record. DivPlanEndDateTime will be NULL by default, and DivPlanArchive will be 0 (implying no by default).

Scenario 2. The following year, the same division chooses a different package for that same class of employee. To incorporate the change, you first have to modify the current DIV_PLAN_OFFERING record, inserting date and time for DivPlanEndDateTime. Next, you set DivPlanArchive to 1 (for yes). Then, you create a new record for the new DIV_PLAN_OFFERING division or class, entering the datetime value for DivPlanBeginDateTime. The value of DivPlanEndDateTime will be NULL, and the DivPlanArchive value will be set to 0.

Scenario 3. Employees select a plan option for themselves. For this scenario, you begin by creating a new EMPLOYEE_PLAN record for each employee. Then, you enter the datetime value for EmpPlanBeginDateTime. EmpPlanEndDateTime will be NULL by default, and the value of EmpPlanArchive will be 0 by default.

Scenario 4. Employees decide to change the makeup of their plans, effectively creating new plans. In this case, you need to modify each current EMPLOYEE_PLAN record, inserting the datetime value for EmpPlanEndDateTime, then set EmpPlanArchive to 1. Next, create a new EMPLOYEE_PLAN record to hold the new information, then set the datetime value for EmpPlanBeginDateTime to the current date and time. By default, EmpPlanEndDateTime will be NULL, and EmpPlanArchive will be set to 0.

To accommodate all the scenarios, create a DIV_PLAN_OFFERING_HISTORY table and an EMPLOYEE_PLAN_HISTORY table in the current database. Then, create a scheduled job that calls stored procedures to do the following tasks:

  1. Search the EMPLOYEE_PLAN table, then look for any records in which EmpPlanArchive (the archive flag) is set to 1.
  2. Copy the records found to EMPLOYEE_PLAN_HISTORY, appending an ArchiveDateTime to the record being copied.
  3. Delete the archived records from EMPLOYEE_PLAN.
  4. Search table DIV_PLAN_OFFERING, then look for any records that have DivPlanArchive set to 1.
  5. Copy the records found to DIV_PLAN_OFFERING_HISTORY, appending an ArchiveDateTime to the record being copied.
  6. Delete the archived records found from DIV_PLAN_OFFERING.

The benefit of removing records from the current tables after you've archived them is that you can keep the number of records in the current tables to a manageable size if the divisions and employees tend to change their plans often. This data-management philosophy is especially helpful if you begin with a large data set. In addition, reports that query current plan subscriptions will access a different table than the reports that access historical plan subscriptions. You can even place the current and history tables in different file spaces to further spread the I/O load. The disadvantage of using this technique is that if you want to generate the full historical record for each history report, you have to union the current table with the history table. This operation might result in degraded performance.

If changing insurance plans isn't a common event and your data set is small, you might decide that you don't need an extra set of history tables. The positive side of this decision is that all the records—current and past—will be in the same table. Reporting about change will be straightforward and fast. The negative side is that you'll have more than one insurance plan record for each division or class offering and for employee-plan choices. Because of this multiplicity of records, you have to make sure that every query that searches for current plan data is looking for records in which the PlanEndDateTime is NULL and PlanArchive is turned off (0), or you risk returning inaccurate data.

Remember, each design decision you make has trade-offs; you have to balance data-integrity considerations with performance needs. If you decide to create a set of history tables (or even a separate history database), the history tables don't have to duplicate the structure of the production tables. You can choose only the data you think you need to save for historical reporting.