Have you ever wanted to troubleshoot or performance tune a problem query, but the 500GB database and 250GB transaction log made it impossible, simply because of the query’s sheer size? Wouldn’t it be great if you could generate a facsimile of the database, keep it small, and still get realistic results from your query tuning?
Thanks to SQL Server experts like Lubor Kollar, Keith Elmore, and Bob Ward, it’s now possible, and in some cases even easy, to create a clone of a database (aka a statistics-only database). The SQL Server Customer Service and Support (CSS) team has long used cloned databases to reproduce a database and to see what estimated query execution plans look like without needing all of the data within the database, and the team is now making its process for cloning databases public for everyone to use. (Note that I previously reported on this type of utility, provided by consultant Joe Chang, in my In a Nutshell blog post "The Concept of a ‘shell’ database.")
In effect, a cloned database includes all of the schema objects of the database (e.g., tables, views, stored procedures), as well as the statistics and histograms (the so-called "statistics blob"). This metadata is quite small by volume but can tell you what estimated query plans look like outside of a large production environment and how those estimated query plans might change when SQL Server is upgraded. Cloned databases are especially useful when the data is confidential, classified, or subject to privacy laws.
New Scripting Features in SSMS
The SQL Server CSS team hasn’t released a database cloning tool or script per se. Instead, SQL Server Management Studio (SSMS) in SQL Server 2005 SP2 and later includes enhanced scripting features capable of providing enough information about the statistical distribution of data inside of tables and indexes to properly clone a database.
The SQL Server CSS team has published its process for cloning a database in the article "How to generate a script of the necessary database metadata to create a statistics-only database in SQL Server 2005 and in SQL Server 2008." It provides step-by-step instructions for scripting your entire database, as well as individual database objects. Also, this article notes that databases with thousands of schema objects can take a very long time to process, and that, because of the natural behavior of SSMS, some exceptionally large database scripts can be cumbersome to work with. Therefore, I recommend kicking off a database cloning session late in the day, before you head home for the evening.
In a nutshell, SSMS uses these scripts to recreate not only the objects in the database (e.g., tables, views) but also all the information in those objects that’s used by SQL Server’s query optimizer. In a hypothetical example, a 300GB database might be only 500MB when cloned, and yet the cloned database will still return the same query execution plans.
Start Cloning Databases Right Away
Because you can clone databases easily by following the SQL Server CSS team’s simple, step-by-step instructions, you’re ready to begin cloning databases right away. You can clone SQL Server 2008 and later databases (Developer Edition, Web Edition, Workgroup Edition, Standard Edition, and Enterprise Edition) and SQL Server 2005 SP2 and later databases (Developer Edition, Standard Edition, and Enterprise Edition).
| Summary |
Benefits: Database cloning is a technique used to create a small, working copy of a database in which the data might be too large or too sensitive for a development environment.
System Requirements: SQL Server Management Studio; SQL Server 2008 and later; SQL Server 2005 SP2 and later
How to Get It: You can download the article “How to generate a script of the necessary database metadata to create a statistics-only database in SQL Server 2005 and in SQL Server 2008” from support.microsoft.com/default.aspx?scid=kb;EN-US;914288.