Rebalancing data across files in a filegroup

Question: I’ve done some performance testing and determined that I need to add a couple more files to one of my filegroups (currently two files) to get better performance. The two existing files are very full so I’d like to add two more files and then have SQL Server rebalance the data across all four files. How can I do that?

Answer: Unfortunately there is no easy way to rebalance data across new files in a filegroup.

As some background for readers, it has long been known that for some workloads you can get a performance increase by adding multiple data files to a filegroup. It’s not guaranteed though – you have to perform some relevant testing to make sure you’ll see a performance boost in production. I’ve benchmarked the same workload on both SSDs and spinning disks and found that a performance boost was possible in both cases.

Blog posts: Benchmarking: Do multiple data files make a difference? and Benchmarking: Multiple data files on SSDs (plus the latest Fusion-io driver)

SQL Server’s allocation system uses a round-robin mechanism where allocations are spread over all the data files in a filegroup – an allocation from file 1, then an allocation from file 2, and so on, then back to file 1 again. Combined with that, it also uses an algorithm called proportional fill that aims to allocate data from the files according to how much free space they have relative to the other files in the filegroup.

The basic premise of proportional fill is that each file has a weighting, where files with less free space will have a higher weighting, and files with lots of free space will have a low weighting. The files with the lower weightings will be allocated from more frequently – i.e. those files with more free space will be allocated from the most.
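SQL Server doesn’t expose the proportional-fill weightings anywhere, but you can see the free-space picture they’re derived from. Here’s a simple query sketch against sys.database_files and FILEPROPERTY (both report sizes in 8KB pages, hence the divide by 128); the column aliases are just illustrative:

SELECT
    f.name AS logical_name,
    fg.name AS filegroup_name,
    f.size / 128 AS size_mb,
    FILEPROPERTY(f.name, 'SpaceUsed') / 128 AS used_mb,
    (f.size - FILEPROPERTY(f.name, 'SpaceUsed')) / 128 AS free_mb
FROM sys.database_files AS f
JOIN sys.filegroups AS fg
    ON f.data_space_id = fg.data_space_id   -- log files have no filegroup, so they drop out
ORDER BY fg.name, f.file_id;

The files showing the most free space relative to the others are the ones the proportional-fill algorithm will allocate from most heavily.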

This means that if you add a new data file to a filegroup that has mostly full data files, the proportional fill weightings will be such that the new file will be the one where allocations come from until it fills up to the same level as the older files. The new file in essence becomes an allocation hot spot.
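For example, if the questioner simply adds the two extra files with something like the statement below (all of the database, file, and filegroup names here are hypothetical), the two new, nearly empty files will immediately attract almost all new allocations until they fill up to the level of the existing files:

ALTER DATABASE SalesDB
ADD FILE
    (NAME = N'SalesData3', FILENAME = N'D:\Data\SalesData3.ndf', SIZE = 10GB, FILEGROWTH = 1GB),
    (NAME = N'SalesData4', FILENAME = N'D:\Data\SalesData4.ndf', SIZE = 10GB, FILEGROWTH = 1GB)
TO FILEGROUP [SalesFG];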

But is there a way to prevent this happening or work around it? Unfortunately not. The proportional fill algorithm cannot be circumvented, and the weightings cannot be changed. Rebuilding any indexes using the same filegroup won’t work either, as the space allocated to build the new index will mostly come from the new file.

If you’d like to rebalance over a certain number of files, the easiest method is to create a new filegroup with the desired number of files, then rebuild indexes using CREATE INDEX … WITH (DROP_EXISTING=ON) plus specifying the new filegroup, and then drop the old filegroup. That unfortunately means you need to provision even more space (easier said than done) but it will do the trick. Note that this method won’t work if you have any LOB data, as that cannot be moved without creating a new table and exporting the data into it.
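Here’s a rough sketch of that approach, with made-up database, filegroup, file, table, and index names – in practice you’d script this out for every table and index that lives on the old filegroup:

-- 1) Create the new filegroup with the desired number of files.
ALTER DATABASE SalesDB ADD FILEGROUP [SalesFG_New];

ALTER DATABASE SalesDB
ADD FILE
    (NAME = N'SalesNew1', FILENAME = N'D:\Data\SalesNew1.ndf', SIZE = 20GB),
    (NAME = N'SalesNew2', FILENAME = N'D:\Data\SalesNew2.ndf', SIZE = 20GB),
    (NAME = N'SalesNew3', FILENAME = N'D:\Data\SalesNew3.ndf', SIZE = 20GB),
    (NAME = N'SalesNew4', FILENAME = N'D:\Data\SalesNew4.ndf', SIZE = 20GB)
TO FILEGROUP [SalesFG_New];

-- 2) Rebuild each index onto the new filegroup. Rebuilding the clustered
--    index moves the table's data; with DROP_EXISTING the definition
--    (keys, uniqueness) must match the existing index.
CREATE UNIQUE CLUSTERED INDEX [PK_Orders]
ON dbo.Orders (OrderID)
WITH (DROP_EXISTING = ON)
ON [SalesFG_New];

-- 3) Once nothing is left on the old filegroup, remove its files and then
--    the filegroup itself (each file must be empty before REMOVE FILE succeeds).
ALTER DATABASE SalesDB REMOVE FILE [SalesData1];
ALTER DATABASE SalesDB REMOVE FILE [SalesData2];
ALTER DATABASE SalesDB REMOVE FILEGROUP [SalesFG];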

Discuss this Blog Entry (3 comments)

on Aug 5, 2011
I had some mild success with this by doing a reindex without the DROP_EXISTING option. I recently had to add files to existing filegroups on a vendor app as part of an upgrade, but no data would go to the new files. I reindexed and, like I said, had some mild success. But it seems like the root of it is the free-space weighting. If I had file_1 at 35GB with 10GB free and added file_2 at 5MB with 5MB free, file_1 still has the most free space and the lower weight and will get filled first. So it seems your new files need to be created with the same amount of free space as the other members of the group, evening out the weighting.
on Aug 4, 2011
So if I have a vendor app that wants me to stay in the same (primary) filegroup... can I do the following to rebalance?
1) Create a new file as large or larger than the current file (like tempfile.ndf, logical name temp)
2) DBCC SHRINKFILE w/ EMPTYFILE parameter on the original file (testdb.mdf, logical name testdata)
3) Make new files the size I want in primary (testdb2.ndf, testdb3.ndf etc.)
4) Make testdb.mdf the same size as the files created in step 3
5) DBCC SHRINKFILE w/ EMPTYFILE on tempfile.ndf
6) Remove tempfile.ndf from the filegroup
I thought this might allow me to rebalance across the new files since I'd empty the original into another temp file and then empty it back into the other files. I know this would require a significant amount of space (let alone time), but if it is available, is there some caveat to prevent this? -redi311
on Aug 5, 2011
I'm pretty sure I tried this method recently - it sounded like it'd work to me too! However, it appears you can't "emptyfile" the PRIMARY filegroup, so whilst some of the data moves over to your second temporary filegroup, not all of it does. So then you've got the same problem on the way back of having one file significantly larger than the others, and data's not striped across all files nicely.

Unless anyone else has some bright ideas (and I'd seriously appreciate a magic solution to this), all I can see is that you need to generate a new schema for your whole database on another server (or the same server if there's space) and use DTS/SSIS to migrate all the data, table by table, to the new schema which has 4 files in the primary filegroup instead of 2 (or however many you're dealing with).

There is the added benefit that when you're doing this, if you're on the right version/edition of SQL, you can create your new schema with some wisely chosen PAGE or ROW level compression. Then as the data comes in, it's stored compressed, which will improve your storage utilisation and performance as well as the multiple-data-files benefit. The difficulty here is working out how to import an entire set of tables to a new database as quickly as possible by doing things like turning clustered tables into heaps and disabling triggers during import, then putting these things back again afterwards.
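For reference, here's roughly what the sequence redi311 describes looks like in T-SQL, using the logical and physical names from that comment with made-up sizes – bearing in mind the caveat above that EMPTYFILE won't fully drain the primary data file:

-- 1) Add a temporary file at least as large as the current file.
ALTER DATABASE TestDB
ADD FILE (NAME = N'temp', FILENAME = N'D:\Data\tempfile.ndf', SIZE = 50GB)
TO FILEGROUP [PRIMARY];

-- 2) Try to push all the data out of the original file into the temp file
--    (this is the step that only partially succeeds on the primary data file).
DBCC SHRINKFILE (N'testdata', EMPTYFILE);

-- 3) Add the new permanent files at the desired size.
ALTER DATABASE TestDB
ADD FILE
    (NAME = N'testdata2', FILENAME = N'D:\Data\testdb2.ndf', SIZE = 20GB),
    (NAME = N'testdata3', FILENAME = N'D:\Data\testdb3.ndf', SIZE = 20GB)
TO FILEGROUP [PRIMARY];

-- 4) Shrink the original file to the same size as the new files (target is in MB).
DBCC SHRINKFILE (N'testdata', 20480);

-- 5) Empty the temporary file back into the others and remove it.
DBCC SHRINKFILE (N'temp', EMPTYFILE);
ALTER DATABASE TestDB REMOVE FILE [temp];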

