In previous posts within this series, we’ve already looked at how you can’t really prevent corruption. Instead, in order to be able to better deal with corruption, you need to be able to detect it early. To that end, there are actually a number of different ways to enable early (or early-ish) detection of corruption when it happens. And, in this post, we’ll look at the use of CHECKSUM Page Verification as the first of these methods.
Enabling CHECKSUM for Page Verification
When configured correctly, SQL Server can instantly spot when data that it is pulling off of disks was saved improperly – or corrupted. To enable this functionality, you just need to ensure that desired databases are configured to use the CHECKSUM option for page verification – which you can set on SQL Server 2005 and above.
To do this, simply query your server as follows, and review the names of all databases returned – as they are not configured for CHECKSUM verification:
FROM master .sys.databases
WHERE page_verify_option_desc != 'CHECKSUM'
Then, to enable (or toggle) these database to use CHECKSUM verification, execute the following statement per database that you wish to ‘convert’ to using CHECKSUM verification:
SET PAGE_VERIFY CHECKSUM
Obviously, if you’ve got a lot of databases that need to be ‘toggled’, you can wire up a simple cursor, or use a poor man’s script like the following:
SQLKeyword">Which you can then execute – and then copy/paste the output from that query into another (or the same) query window to execute the results – as this will give you a set of ALTER statements per database as needed.
CHECKSUM – Under the Covers Once PAGE_VERIFICATION is set to CHECKSUM, SQL Server will transition to a new
CHECKSUM Calculates a checksum over the contents of the whole page and stores the value in the page header when a page is written to disk. When the page is read from disk, the checksum is recomputed and compared to the checksum value stored in the page header. If the values do not match, error message 824 (indicating a checksum failure) is reported to both the SQL Server error log and the Windows event log. A checksum failure indicates an I/O path problem. To determine the root cause requires investigation of the hardware, firmware drivers, BIOS, filter drivers (such as virus software), and other I/O path components.
Of course, WHEN you switch a database over to using CHECKSUM there is NOT some magical process or operation that goes through and ‘writes’ CHECKSUMs into your EXISTING pages. If there were, toggling this option could obviously cause some ugly problems. But, instead, since this is effectively a ‘meta-data’ change that instructs SQL Server to start using a new behavior (i.e., CHECKSUM functionality for writes and reads of previously CHECKSUM’d pages), you can safely toggle this change at pretty much any time without fear of bogging down your systems.
Of course, since this, in turn, means that you may have gobs of existing/older data that’s been laying around without CHECKSUM configured, that data (or the data on those pages) will NOT end up getting CHECKSUM values until that data is finally pulled into SQL Server and then re-written. Consequently, if you want to ‘force’ CHECKSUMs into your pages, you’ll need to do full-blown index rebuilds or other modifications that force all data to be pulled in and then rewritten. Happily, though, CHECKSUM is already on by default in new SQL Server 2005 and beyond databases. As such, the worries/concerns listed in this post about ‘converting’ a database to use CHECKSUM verification only apply to databases that were intentionally set to use a PAGE_VERIFICATION option other than CHECKSUM (which is possible but makes no sense) or for SQL Server2000 and below databases that have been migrated up to SQL Server 2005 and above – because moving older databases to new hosts/servers does NOT result in any implicit changes to PAGE_VERIFICATION settings.
The Benefits of using CHECKSUM for PAGE_VERIFICATION
As Books Online states, with CHECKSUM verification in place, SQL Server will raise error message 824 (checksum failure) whenever it now detects CHECKSUM problems. Granted, that doesn’t mean that you’ll catch corruption immediately as it happens (since CHECKSUMs are re-calculated for this error when data is read back in – instead of when it is written as such an operation would have enormous IO consequences) – but you will be notified of it as soon as it happens. Provided, of course, that you’ve got some sort of mechanism in place to alert you to instances where error message 824 is raised – which we’ll address in our next post.
Part V: Storage Problem Alerts