Each data warehouse is a repository of corporate data, the historical data that defines an organization. As a data warehouse administrator, you’re responsible for its health and welfare. Did you realize that you’re also the keeper of the corporate secrets, the jewels of data that can make or break an organization and that will invariably be the subject of an audit?

Buried within the data warehouse are transactions that record events. Some of these events might end up as part of internal and external forensic investigations, also called electronic discovery (e-discovery) projects. “Discovery” (in law) is part of the pre-trial phase of a lawsuit; “e-discovery” refers to discovery of information that’s stored in electronic form, known as electronically stored information (ESI). (To learn more about ESI, see the sidebar “What is ESI?”)

Forensic investigations, both internal and external, are more and more common (think Goldman Sachs, AIG, or Bernie Madoff). At some point in your career you very well might get a request, possibly in the form of a court order, to hand over ESI that’s under your care. What can you do to prepare for this eventuality? The following Q & A list should help you get started.

Why Are You an E-Discovery Target?

You’re the keeper of the data warehouse. The data warehouse is populated with historical ESI that defines an organization, what it does and how it does it, and that’s what an e-discovery forensic team will be looking for.

Who Looks for ESI Data?

Your own legal staff and law firms your company might hire to defend itself could want to search through ESI. In addition, here are some examples of other entities who might want to search your company’s ESI: law firms representing anyone suing your company, the SEC (when it investigates Sarbanes-Oxley issues), and the Department of Justice (when it investigates whether your company has ever paid bribes to foreign officials, for instance).

What Data Does an E-Discovery Team Want to Review?

An e-discovery forensic team is going to ask to review ESI that’s produced by any and possibly all software applications that are used within a company. This includes but isn’t limited to email, instant messaging, information worker applications (such as Microsoft Word and Excel), and database systems that support corporate functions such as human resources, accounting, stock trading and options tracking, customer relationship and sales force management, enterprise resource planning, and supply chain management. The e-discovery team is going to ask for reports that are produced as part of normal (and abnormal) business operations. In addition to current data, they’re going to mine for gold in all backup media that they can identify.

What Is an E-Discovery Team Looking for?

Just as with any other type of forensic investigation, an e-discovery team will look for anomalies—anything that isn’t defensible or explainable. They’ll take a very systematic approach to investigating the data, and anything that relates to their case is fair game.

What’s the Scope of a Typical E-Discovery Investigation?

Scope varies depending on the magnitude and number of issues involved in the case. The e-discovery team will create a custodian list, which is a list of past and present employees who will be interviewed and who will have their data collected and reviewed as part of the investigation. The team will define the time frame for the investigation and the information that needs to be collected and reviewed. The IT organization is invariably involved in these investigations. It’s possible that the time period defined by the custodian list will include applications and even database systems that are no longer used, but the forensic investigators will want to review data from the backups that were taken, nonetheless.

How Will the Investigators Examine the Data?

Preserving ESI in its original form is of paramount importance because electronic evidence can so easily be modified. Members of the forensic team, experts in the use of forensic tools and techniques, will preserve the ESI by one of two methods: forensic imaging or making a forensic copy of the hard disk. A forensic image is a complete sector-by-sector copy of the hard disk, which includes all deleted files and slack space. (To learn more, see the sidebar “What is Slack Space?”) In a criminal investigation, the forensic image will be the technique most likely used. For civil investigations, such as normal internal auditing, the forensic copy technique is standard. The forensic copy method takes file-level copies of documents such as email messages, records of time worked, and prices paid, and makes copies of these documents without changing anything, including the date last accessed. Special software has to be used for both of these operations in order to preserve the original condition of the data as it appeared on the hard disk. (For additional information on forensic image and forensic copy, see the sidebar “What Are Forensic Images and Forensic Copies?”)

To record the preservation of the ESI that the investigators collect, the forensic expert must maintain detailed documentation from the time of collection through the investigation process. This is called documenting the chain of custody. If the chain of custody is broken, then questions and concerns can be raised about data tampering, which can lengthen the investigation and add to the total cost. In a worst-case scenario, a broken chain of custody could cause evidence derived from the ESI to be invalidated, possibly involving your company in another court case.
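To make the idea of an unbroken chain concrete, here is a minimal Python sketch of a custody log for one piece of evidence. It flags any hand-off where the person releasing the item is not the person who last received it. The `CustodyEvent` record, the `find_breaks` helper, and the sample names are hypothetical, chosen only for illustration; they are not taken from any forensic tool.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CustodyEvent:
    """One hand-off of an evidence item (hypothetical record layout)."""
    evidence_id: str
    released_by: str
    received_by: str
    timestamp: datetime


def find_breaks(events):
    """Return pairs of consecutive hand-offs where custody is not continuous.

    Events are ordered by timestamp; a break is reported when the person
    releasing the item is not the person who received it in the prior event.
    """
    ordered = sorted(events, key=lambda e: e.timestamp)
    breaks = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.released_by != prev.received_by:
            breaks.append((prev, cur))
    return breaks


# Illustrative log: the third entry is released by someone who never received the item.
log = [
    CustodyEvent("DISK-001", "dba", "collector", datetime(2010, 3, 1, 9, 0)),
    CustodyEvent("DISK-001", "collector", "analyst", datetime(2010, 3, 1, 15, 30)),
    CustodyEvent("DISK-001", "courier", "archive", datetime(2010, 3, 2, 10, 0)),
]
breaks = find_breaks(log)
for prev, cur in breaks:
    print(f"Break: {prev.received_by} held the item, but {cur.released_by} released it")
```

A real custody record would carry much more detail (location, reason for transfer, signatures), but even this simple continuity check shows the kind of question an investigator will ask of the documentation.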

Where Is My Data Going?

The collecting and processing of ESI is typically not done in-house because that would constitute a possible conflict of interest. You would turn over a copy of the ESI to outside investigators; they will conduct the analysis required for the case at a location of their choice.

What Data Will the Investigators Seek?

Typically, forensic investigators will be looking to reconstruct or deconstruct situations involving, but not limited to, determining legally privileged documents, assessing financial reports, building detailed reviews of individual transactions related to the case, and extracting emails, IMs, text messages, and phone logs relevant to the case.

What Will Investigators Do with the Data?

The forensic investigators will analyze the data. Obviously, they can’t sift through the mountains of ESI by hand (or eye). They’ll employ methods of data management such as advanced culling, de-duplication, conceptual clustering, native document review, TIFF conversion, and redaction. They’ll track email and IM threads and review text messages for relevancy. From what they find they’ll reconstruct the events that surround the particulars of the case. Invariably, they’ll build and maintain an ESI archive that can be searched throughout the life of the investigation and, if necessary, any subsequent litigation.
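Of the methods listed above, de-duplication is the easiest to picture. One common approach is to compare a hash of each document’s content and keep only the first copy seen. The following Python sketch assumes exact byte-for-byte duplicates and uses SHA-256 from the standard library; the `deduplicate` function and the sample file names are hypothetical, and commercial review tools may use different matching rules.

```python
import hashlib


def deduplicate(documents):
    """Split documents into unique items and exact duplicates.

    `documents` maps a document name to its raw bytes. Returns a tuple:
    the list of names kept as unique, and a dict mapping each dropped
    duplicate to the kept name whose content it matches.
    """
    first_seen = {}   # content digest -> first name seen with that content
    unique = []
    duplicates = {}
    for name, content in documents.items():
        digest = hashlib.sha256(content).hexdigest()
        if digest in first_seen:
            duplicates[name] = first_seen[digest]
        else:
            first_seen[digest] = name
            unique.append(name)
    return unique, duplicates


# Illustrative input: the same message appears in two mail folders.
docs = {
    "inbox/q3_forecast.msg": b"Q3 forecast attached.",
    "sent/q3_forecast.msg": b"Q3 forecast attached.",
    "inbox/lunch.msg": b"Lunch at noon?",
}
unique, duplicates = deduplicate(docs)
print(unique)
print(duplicates)
```

Keeping the mapping from each duplicate back to the retained copy matters: reviewers can then show where every copy lived without reading the same content twice.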

How Can I Prepare for E-Discovery?

Be organized, and document thoroughly. This sounds simple and, in an organization where you’re running short-staffed and overburdened, impossible. However, it’s very important that you strive for both organization and complete documentation. The volume of ESI is going to continue to grow. Governmental oversight and compliance requirements aren’t going away. Internal forensic investigations, if they’re not common now, will become so. Investigations cost money in terms of time and disruption to normal processing. You can minimize the cost of these investigations if you’re organized and if you have good documentation in place. Documenting the methods used while processing transactions can validate to the investigative team that information given to compliance regulators and outside auditors is both appropriate and accurate. Without this documentation, your company will be suspect, its output will be in doubt, and its methods will be under suspicion.

Document legacy systems, even those that are no longer in use. If the legacy software documentation was long gone by the time you arrived on the scene, do your best.

Backups to tape and to disk typically don’t have good documentation covering what data is stored on which piece of backup media. If the investigative team determines that it’s necessary to restore and review the contents of stacks of backup media, then this lack of documentation is going to cost your company big-time. Going forward, use a tape catalog scheme that provides detailed listings of the files stored on each backup tape. For legacy backups, find a vendor or software package that can do volume processing and content cataloguing, and reconstruct the legacy backup catalog.
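As a rough illustration of what a catalog buys you, the Python sketch below records which files were written to which piece of media and then answers the question an investigator is likely to ask: which tapes hold this file for this time period? The `BackupCatalog` class, the tape labels, and the file paths are all hypothetical; a production catalog would live in a database and be fed by your backup software.

```python
from collections import defaultdict
from datetime import date


class BackupCatalog:
    """In-memory index of which files live on which backup media (illustrative only)."""

    def __init__(self):
        # file path -> list of (backup date, media label)
        self._by_path = defaultdict(list)

    def record(self, media_label, backup_date, file_paths):
        """Register the files written to one piece of backup media."""
        for path in file_paths:
            self._by_path[path].append((backup_date, media_label))

    def media_for(self, path, start, end):
        """Return labels of media holding `path` backed up within [start, end], oldest first."""
        hits = [(d, m) for d, m in self._by_path.get(path, []) if start <= d <= end]
        return [m for d, m in sorted(hits)]


catalog = BackupCatalog()
catalog.record("TAPE-0142", date(2008, 6, 30), ["/dw/sales_fact.bak", "/dw/hr_dim.bak"])
catalog.record("TAPE-0187", date(2008, 12, 31), ["/dw/sales_fact.bak"])
catalog.record("TAPE-0231", date(2009, 6, 30), ["/dw/sales_fact.bak"])

# Which tapes must be restored to review the 2008 sales backups?
needed = catalog.media_for("/dw/sales_fact.bak", date(2008, 1, 1), date(2008, 12, 31))
print(needed)
```

With a listing like this, the team restores two tapes instead of working through the whole stack.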

Cooperate with the investigative team. Help them as they produce the e-discovery reports, which might consist of interview notes, ESI chain of custody tracking, file processing statistics, and production details. After all, who knows this stuff better than you? Help them develop counts of documents produced by custodians for each year, so that missing data can be easily identified or, conversely, to show that there is no missing data. If there are gaps in the document counts, help the investigators explain the anomalies. This can shorten the process of answering questions from an audit committee, from regulators, or from outside auditors.
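The per-custodian, per-year counts are simple to produce once the documents are catalogued. Here is a small Python sketch, assuming each produced document has already been reduced to a (custodian, year) pair; the function names and the sample custodians are hypothetical.

```python
from collections import Counter


def counts_by_custodian_year(documents):
    """Count documents per (custodian, year) from an iterable of such pairs."""
    return Counter(documents)


def missing_years(counts, custodians, first_year, last_year):
    """List (custodian, year) pairs inside the investigation window with no documents."""
    return [
        (custodian, year)
        for custodian in custodians
        for year in range(first_year, last_year + 1)
        if counts[(custodian, year)] == 0
    ]


# Illustrative production set covering 2007 through 2009.
produced = [
    ("alice", 2007), ("alice", 2007), ("alice", 2008), ("alice", 2009),
    ("bob", 2007), ("bob", 2009), ("bob", 2009),
]
counts = counts_by_custodian_year(produced)
gaps = missing_years(counts, ["alice", "bob"], 2007, 2009)
print(gaps)
```

An empty result supports the claim that nothing is missing; a non-empty one tells you exactly which custodian and year need an explanation.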

In our business, governmental oversight and compliance regulations are constantly increasing. As part of the IT group, and as the keeper of the corporate data warehouse, you’ll almost surely be caught up in a forensic investigation with either internal or external auditors. There is an e-discovery investigation in your future, so be prepared.

Many thanks to my colleague and former SQL Server Magazine and Windows IT Pro magazine contributing editor, Michael D. Reilly, JD and e-discovery specialist, for his contributions to this article.