When it comes to Microsoft network service applications, like SharePoint, a pretty package and simple interface mask the complexity of what the software is really doing. Many SQL Server DBAs have been challenged and confounded by their responsibilities to SharePoint databases that seem to magically appear overnight. This article will take some of the sting out of administering SharePoint content databases in a SQL Server instance. First we'll take a quick peek at the schema of the mysterious Windows SharePoint Services (WSS) content database and identify some objects ripe for querying. Then we'll explore specific T-SQL queries that can be used to garner useful details about a SharePoint environment. Finally, we'll take a look at the potential dangers of altering the content database via SQL Server.

It’s important to note that everything you're about to read goes against Microsoft best practices, and for good reason. Messing around with SharePoint databases directly in SQL Server can cause stability and security problems in SharePoint, as well as prevent successful troubleshooting and support when you need it from Microsoft. Carefully consider these risks before employing any of the outlined procedures in a production SharePoint environment. Remember, it's all fun and games until something (the server) gets hurt.

SharePoint's Content Database Schema

Of all the SharePoint databases in a WSS 3.0 or Microsoft Office SharePoint Server (MOSS) 2007 farm, the content database is by far the most volatile. In fact, the content database sees so much action its default recovery model (Full) makes its transaction log file a prime suspect when it comes to storage depletion. You would think a database that’s so popular among routine SharePoint transactions would have a wealth of documentation written about it. However, that’s not the case. In fact, because Microsoft recommends all interaction with SharePoint databases be conducted either through the SharePoint GUI or via programming against the SharePoint object model, there’s little explanation of the content database's structure.

Identifying SharePoint's content databases in a SQL Server instance is fairly easy if the default database name of WSS_Content(\{GUID\}) was generated by SharePoint Central Administration. However, administrators can also create custom names for content databases, so you might need to peek inside a database to determine if it is, in fact, a SharePoint content database. So let's break things down a bit by starting with some user-defined tables that can be found in a SharePoint content database. Each table serves a particular purpose and several of them would seem to be related, yet few referential integrity connections exist. For example, the three most important tables concerning document libraries all have primary keys and indexes but no foreign key relationships with one another (see Figure 1).

Suffice it to say that the WSS content database doesn’t use normalization to its advantage, which makes writing direct T-SQL queries into it a distinct challenge. Furthermore, most of the row data values are identification numbers (some object GUIDs, others internally generated), and although the columns containing these numbers make excellent reference choices for JOIN statements, the numbers themselves aren’t easily recognizable to users.

So how do you determine which tables hold the information you need to get out of SharePoint? Relying on table names isn't the best method because some of the names are misleading. For instance, the AllDocVersions table appears to contain different versions of documents held in a library that has versioning enabled. However, the actual documents are binary large objects of the image data type held in the AllDocStreams table's Content column with a reference to the version ID number from the AllDocVersions table that corresponds to the current version of the document. Talk about confusing! Table 1 outlines some of the more useful tables in SharePoint's content database that might be ripe for querying.

It would seem from Table 1 that SharePoint fails to follow a tried and true Microsoft best practice in SQL Server: Composite primary keys should be created sparingly and only when absolutely necessary. This SQL Server best practice stems from the theory that composite primary keys lengthen the key values of the corresponding index, resulting in less efficient query optimization and disk utilization. However, since it would be inadvisable to alter the table structure of SharePoint's content database, we'll just have to assume the SharePoint product team had their reasons for violating SQL Server best practices and leave the indexes as they lay. Furthermore, the tables listed with an asterisk next to their name in Table 1 have dependent View objects of the same name without the "All" prefix. For example, the Docs view retrieves rows from the AllDocs table. These views retrieve the entire column set of all rows not marked for deletion. So querying the view instead of the underlying table would retrieve a smaller result set but you could be missing desired rows during a salvage operation. These views support the relational data integrity of rows marked for deletion while also making it possible to recover previously deleted items such as by using the Recycle Bin.

None of the table objects listed in Table 1 contain any foreign key constraints, and as you can see from the Primary Key columns, almost all of them contain duplicate data. The content database is highly denormalized, a condition that enhances OLAP processing. Performance indicators will show more efficiency during read operations than during write operations. Furthermore, the dependencies of these tables list a bevy of stored procedures and functions that SharePoint employs for error control and row manipulation. Any direct queries into these tables should be written so as to eliminate duplicate or unrelated row data in the result set.

Querying the Content Database

Now that we've seen a few tables that contain useful information about our SharePoint environment, let's take a look at writing T-SQL queries directly into them. Keep in mind that the same results could be obtained programmatically by writing .NET code against the SharePoint object model, but for SharePoint administrators or SQL Server DBAs a new Query window in SQL Server Management Studio (SSMS) is sometimes quicker than tasking the developers in IT with a new project. Here are a just a few common scenarios that can be easily resolved by simple T-SQL queries.

Determining Which Site Template Was Used

In an existing SharePoint environment, it can often be difficult to determine which site templates were used to generate the sites, especially if administrators have customized the pages or added and deleted lists, libraries, and Web Parts. A quick query to the Webs table of the content database will reveal template and configuration information that can be translated by reading the appropriate XML file contained in SharePoint's "12 hive" (i.e., \%SystemDrive%\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12 of the SharePoint server). This information will reveal the templates used for every site in the database. Although SharePoint's default behavior is to generate a separate content database for each web application, it’s possible to span a single web application across multiple content databases or combine multiple web applications into a single content database. The following query reveals template information for all sites in the content database, regardless of web application assignment:

SELECT Title, WebTemplate, ProvisionConfig
FROM \[dbo\].\[Webs\]

To determine the template of a particular site, use a WHERE clause to filter the site by Title, Description, or Site ID number, as the following command shows:

WHERE Title = SiteA

The Web Template ID number and Provision Configuration ID number returned by the query might not be recognizable at first. The more friendly text name of the template can be garnered from one of the web configuration XML files located in the 12 hive on the SharePoint server. For example, WSS 3.0 site templates are listed in the webtemp.xml file, while many of the MOSS 2007 templates are defined in the webtempsps.xml file. With some experience, you'll be able to identify the template by its ID and Configuration Option numbers and not need to read these XML files.

Collecting Column Definitions

Imagine a scenario in which you suspect various lists have columns that are too generous with their storage size and you need to quickly determine the construction of all columns from a particular list, library, or gallery to prove it. Visiting the properties of each column individually in the GUI is too time consuming, and writing code might be too complex. However, simply querying the AllLists table in the content database can reveal column information about any and all lists throughout your SharePoint environment. The AllLists table contains a row for each list, library, and gallery throughout the logical portion of SharePoint that uses the given content database. Of the many columns in this table, the tp_fields column (ntext data type) contains detailed information about all of the columns in that row's particular list, library, or gallery. You can use the following code to query the tp_fields column:

SELECT tp_fields
FROM \[dbo\].\[AllLists\]

Figure 2 shows the output from this command laid out like an XML file, with a separate tag for each column divulging details such as column type, whether a value is required, size limit, and default assignment.

However, finding the column of interest can be a bit difficult if the same column name is used in more than one list. It’s best, if possible, to isolate this query to a particular list, library, or gallery by using a WHERE clause to avoid misinformation. Also keep in mind that although the values in the tp_fields column look like XML, they are actually ntext strings, so if you need to extrapolate only one column's worth of information you'll need an expression such as substring().

Gathering Security Information

It might be valuable to know who is accessing the databases you support. Although SharePoint maintains its own security architecture, from an audit and logging perspective, it would be nice to quickly see the security principals and permission assignments SharePoint is using to grant access to the content database. For example, say you just want a quick and dirty list of all users who have access to the sites in SharePoint—not individual permission levels or anything, just a list of names. Executing the query

SELECT dbo.Webs.FullUrl, dbo.Webs.Title, dbo.UserInfo.tp_ID, dbo.UserInfo.tp_Title
FROM dbo.UserInfo JOIN dbo.Webs
ON dbo.UserInfo.tp_SiteID = dbo.Webs.SiteId

returns results similar to those shown in Figure 3.

There are many more queries that will retrieve information directly from SQL Server about your SharePoint environment. Once you become familiar with the tables from the content database listed in Table 1, you'll find all kinds of new ways to decipher the myriad of ID numbers throughout the rows to join descriptive titles from other tables and produce recognizable result sets.

A Word of Warning

The T-SQL statements we've looked at so far have all been relatively harmless. The true "moving parts" of a SharePoint content database are the stored procedures and functions that manipulate the table rows. Although it might be tempting to get underneath the hood of these objects, be aware that altering a stored procedure could cause otherwise dormant triggers to fire and disabling corruption to ensue. The stored procedures and functions shown in Table 2 are best left alone in production SharePoint environments.If you really want to see what a stored procedure is doing, consider scripting it to a new query window or file in SSMS.

Explore These Objects in a Test Environment

This article has shed a bit of light on the schema of the WSS content database and identified some objects ripe for querying. There’s much more to the SharePoint databases, far more than can be covered in a single article, but the objects outlined herein should give you a starting point. Exploring these objects via SQL Server should always be performed in a development or testing environment and never in production. A simple slip of a mouse click could render a stored procedure capable of corruption, so be careful. For more information about these and other SharePoint objects in SQL Server, see the WSS library available at msdn.microsoft.com.