Large page allocations don't have to eat up all your space
For the release of SQL Server 7.0, Microsoft completely redesigned SQL Server's storage architecture, changing not only the size of its data and index pages but the way it keeps track of which pages are in use. The original (preSQL Server 7.0) algorithm caused few problems; the page size was 2KB, one-fourth the size of the 8KB pages in SQL Server 2000 and 7.0. Larger page sizes lead to the potential of more wasted space when SQL Server allocates pages to a table that doesn't use them. Let's examine in detail how SQL Server allocates space to a table, then look at an undocumented command for discovering a table's space allocation.
All About Allocations
In my April 1999 column, "The New Space Management" (InstantDoc ID 5110), I wrote about SQL Server 7.0's data-storage structures. Most of what I wrote is still valid, but SQL Server 2000 contains a few changes. The most obvious change in the space-management algorithm between SQL Server 7.0 and 2000 is that SQL Server 7.0 allocates space to an object as soon as you create it. Listing 1 shows a simple setup script I presented in April 1999. The script builds a table that has very large rows—one row fills an entire page. Because each new row requires a new page, you can see the effect of page allocation more easily.
In SQL Server 7.0, if you create the largerow table, then immediately run the sp_spaceused procedure, you get the output that Figure 1 shows. The output tells you that SQL Server has already allocated two 8KB pages for this table, even before you've inserted any data.
In SQL Server 2000, the same CREATE TABLE command followed by sp_spaceused returns the results that Figure 2 shows. You have to insert at least one row into the table to make SQL Server allocate the first page:
INSERT INTO largerow DEFAULT VALUES
Now if you run sp_spaceused again, you should see 8KB (one page) allocated for data and 8KB (again, one page) allocated for an index. That index page is the index allocation map (IAM) page, which keeps track of which pages and extents belong to the table. Every table and index in a SQL Server database has at least one IAM page. So, SQL Server has used a total of 16KB of space for the largerow table. At this point, the reserved space is equal to the used space.
Each subsequent insertion into this table will add one new page. If you add seven new rows, checking the reserved and data space values from the sp_spaceused output after each one, as Listing 2 shows, you'll see the space allocation increase by one page for each new row. After you insert the eighth row, sp_spaceused should show 72KB reserved and 64KB for data, plus the single index page, the IAM.
Most of the time, SQL Server allocates space in units called extents, which comprise eight contiguous pages. However, the first eight pages are usually allocated one at a time, as the preceding example illustrated. This single-page allocation is possible because extents come in two varieties. SQL Server allocates a uniform extent as an eight-page unit to one table or index, and it allocates a mixed extent one page at a time. When it uses pages from a mixed extent, SQL Server can allocate all the pages to the same table, one page to each of eight tables, four pages to one table and two pages to each of two other tables, and so on.
To see the effect of allocating an entire extent to a table, insert another row into the largerow table and check the sp_ spaceused results again:
INSERT INTO largerow DEFAULT VALUES EXEC sp_spaceused largerow, @updateusage = true
The reserved space has jumped 64KB, from 72KB to 136KB, and the 56KB (seven pages) that don't yet hold data show up as unused. Inserting one more row doesn't change the reserved space. It just increases the data space and decreases the unused space.
The undocumented command DBCC EXTENTINFO can show you which pages in a table come from mixed extents and which come from uniform extents. DBCC EXTENTINFO takes three parameters: a database ID or name, a table ID or name, and an index ID or name. If you specify -1 for the third parameter, you get back information about all indexes on the table you specified. The command returns one row for each allocation to the table or index. Single-page allocations from mixed extents have an ext_size value of 1, and allocations of uniform extents have an ext_size value of 8. Figure 3 shows the result of running this command on the largerow table. This figure shows only the first few columns of the output you'd receive when running the DBCC EXTENTINFO command. On your system, the file and page numbers might be different, but the pattern of page numbers and the last two columns of output should be similar.
To uniquely identify a database page, SQL Server 2000 and 7.0 need a file_id and page_id, so each page in the output has both of these values. The output in Figure 3 shows eight pages allocated individually and almost consecutively. The only missing page is 92, the IAM page, which DBCC EXTENTINFO doesn't show. The fourth column, ext_size, shows that these pages come from mixed extents. Because extents comprise eight pages, the first page of each extent always has a page number that's a multiple of 8. If you look at the page numbers for the eight individually allocated pages, you can determine that they came from two different extents. Pages 91, 93, 94, and 95 came from the extent that started on page 88, and pages 96, 97, 98, and 99 came from the extent that started on page 96.
Only the last row has an ext_size value of 8, which means that SQL Server allocated a uniform extent, but the pg_alloc column tells you that SQL Server has used only one page of that extent. If you insert one more row into this table and run DBCC EXTENTINFO again, you'll see the same nine rows of output, but the ninth row will show a pg_alloc value of 2. In this example, the uniform extent starts at page 120. You usually refer to extents by the page number of their first page, so the next contiguous extent in this database would be extent 128.
In a busy system, you'll rarely see so many pages from a mixed extent allocated to the same table because many processes frequently need single-page allocations for different tables. To see mixed extents sharing pages across multiple tables, run the code that Listing 3 shows. This script builds two tables similar to the largerow table and alternates inserting a row into each one. In the DBCC EXTENTINFO output for these tables, the page numbers alternate. On my system, one table uses pages 80, 82, 84, and 86 (and four other pages), and the other uses pages 81, 83, 85, and 87 (and four other pages). Together, they use all the individual pages from extent 80.
SQL Server 2000 allocates the first eight pages of a new table or index from mixed extents to avoid wasting space if you have many small tables. If SQL Server allocated a uniform eight-page extent to each of hundreds of small tables that used only part of one page, you'd have seven completely unused—and unusable—pages for each of those tables. To avoid this waste, SQL Server starts by giving tables one page at a time; then, after allocating eight pages, SQL Server realizes that this table might need a lot of space and starts allocating whole extents.
If SQL Server knows up front that it will need to allocate at least eight pages (as when you're rebuilding an index), it allocates only uniform extents to the table or index. For example, when you build a clustered index on a table, SQL Server allocates as much new space as already exists for the entire table. Because it has to read the whole table to build the index, it knows how many pages are in the table, so if the table already has more than eight pages, it allocates only uniform extents for the table. You can see this behavior by running the following commands:
CREATE CLUSTERED INDEX largerow_index ON largerow(a) DBCC EXTENTINFO(tempdb, largerow, -1)
The output should show two rows, representing two uniform extents. The ext_size should be 8 for each row. The first row will have a pg_alloc value of 8, and the second will have a pg_alloc value of 2.
Good News and Bad News
SQL Server's space-management algorithm tries to find a balance between efficient space allocation and avoiding wasted space. Usually, the algorithm works very well. However, in some cases, the overhead of managing the single-page allocations from mixed extents can be too resource-intensive. In my next column, I'll show you how to determine whether your system is experiencing this behavior and give you several suggestions for solving the problem.