Not All Data Compression Is Created Equal

While we like to say that storage is cheap, but the reality is that cheap is relative.  If you don’t have budget for more storage, suddenly storage isn’t very cheap.  Because of the absolutely massive amount of CPU power that we can pack into a server for relatively little money these days (again this is relative, but faster servers are much cheaper than SAN storage upgrades), storage compression is coming more and more the reality for more and more DBAs.

Related: Compression in SQL Server 2008

One big question is should I select ROW or PAGE level compression for my database tables.  While I can’t answer that question for you in a simple blog post, as the best way to figure that out is to try one then the other on your dev, test, and QA systems, I can give you a very helpful piece of advice.  When considering data compression one very key thing to remember is that out of row data (LOB data) isn’t compressed. If the LOB data is stored in row, then it will be compressed, but only when you compress the table with PAGE level compression.  If however you use ROW level compression then the LOB data will never be compressed, no matter if it is stored in row or if it is stored out of row.

We can see this if we do a little work with DBCC PAGE to look into the physical pages on the disk.  First we create a table using CREATE TABLE which isn’t compressed and insert a row into the table.

CREATE TABLE dbo.CompressTest1
(ID INT PRIMARY KEY,
StringValue VARCHAR(MAX))
GO
INSERT INTO CompressTest1
SELECT 1, 'Hello World!'
GO 1000

From where we look into a row on the first page and see that the data isn’t compressed.

Slot 4 Offset 0x0 LENGTH 0 LENGTH (physical) 0

KeyHashValue = (59855d342c69)      
Slot 5 Offset 0x123 LENGTH 39

Record TYPE = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS
Record SIZE = 39                    
Memory Dump @0x0000000017B0A123

0000000000000000:   30000800 06000000 0200c801 00270048 0065006c †0.........È..'.H.e.l
0000000000000014:   006c006f 00200057 006f0072 006c0064 002100††††.l.o. .W.o.r.l.d.!.

Now we can use the ALTER TABLE statement to compress the table, and then look at the contents of the page again (don’t forget that the page number will change when you rebuild the table to a different compression level). First we will use ROW compression. If we scroll down the output from the page the rows all appear a little different but we can clearly see that no compression is being used as we can read the output of the row (I’ve included the output from a couple of the rows below).

Slot 11 Offset 0x1d6 LENGTH 34

Record TYPE = (COMPRESSED) PRIMARY_RECORD                                Record attributes =  LONG DATA REGION
Record SIZE = 34                    
CD Array

CD array entry = COLUMN 1 (cluster 0, CD array offset 0): 0x03 (TWO_BYTE_SHORT)
CD array entry = COLUMN 2 (cluster 0, CD array offset 0): 0x0a (LONG)    
Record Memory Dump

000000001192A1D6:   2102a380 f0010100 18004800 65006c00 6c006f00 †!.£.ð.....H.e.l.l.o.
000000001192A1EA:   20005700 6f007200 6c006400 2100††††††††††††††† .W.o.r.l.d.!.

Record TYPE = (COMPRESSED) PRIMARY_RECORD                                Record attributes =  LONG DATA REGION
Record SIZE = 34                    
CD Array

CD array entry = COLUMN 1 (cluster 0, CD array offset 0): 0x03 (TWO_BYTE_SHORT)
CD array entry = COLUMN 2 (cluster 0, CD array offset 0): 0x0a (LONG)    
Record Memory Dump

000000001192A2C4:   2102a380 f7010100 18004800 65006c00 6c006f00 †!.£.÷.....H.e.l.l.o.
000000001192A2D8:   20005700 6f007200 6c006400 2100††††††††††††††† .W.o.r.l.d.!.

Slot 27 Offset 0x3f6 LENGTH 34

Record TYPE = (COMPRESSED) PRIMARY_RECORD                                Record attributes =  LONG DATA REGION
Record SIZE = 34                    
CD Array

CD array entry = COLUMN 1 (cluster 0, CD array offset 0): 0x03 (TWO_BYTE_SHORT)
CD array entry = COLUMN 2 (cluster 0, CD array offset 0): 0x0a (LONG)    
Record Memory Dump

000000001192A3F6:   2102a381 00010100 18004800 65006c00 6c006f00 †!.£.......H.e.l.l.o.
000000001192A40A:   20005700 6f007200 6c006400 2100††††††††††††††† .W.o.r.l.d.!.

Now we rebuild the table again using PAGE compression and look at the contents of the page again.  The first row of the page looks pretty normal.

AnchorRecord @0x000000001972A065

Record TYPE = (COMPRESSED) PRIMARY_RECORD                                Record attributes =  LONG DATA REGION
Record SIZE = 32                    
CD Array

CD array entry = COLUMN 1 (cluster 0, CD array offset 0): 0x00 (NULL)    
CD array entry = COLUMN 2 (cluster 0, CD array offset 0): 0x0a (LONG)    
Record Memory Dump

000000001972A065:   2102a001 01001800 48006500 6c006c00 6f002000 †!. .....H.e.l.l.o. .
000000001972A079:   57006f00 72006c00 64002100 †††††††††††††††††††W.o.r.l.d.!.

But as we move down through the page everything else looks a little "different."

Slot 0 Offset 0x85 LENGTH 9

Record TYPE = (COMPRESSED) PRIMARY_RECORD                                Record SIZE = 9

CD Array

CD array entry = COLUMN 1 (cluster 0, CD array offset 0): 0x02 (ONE_BYTE_SHORT)
CD array entry = COLUMN 2 (cluster 0, CD array offset 0): 0x01 (EMPTY)  
Record Memory Dump

000000001972A085:   01021281 20002800 31††††††††††††††††††††††††††.... .(.1

Here’s another row from the page.

Slot 3 Offset 0xa0 LENGTH 9

Record TYPE = (COMPRESSED) PRIMARY_RECORD                                Record SIZE = 9

CD Array

CD array entry = COLUMN 1 (cluster 0, CD array offset 0): 0x02 (ONE_BYTE_SHORT)
CD array entry = COLUMN 2 (cluster 0, CD array offset 0): 0x01 (EMPTY)  
Record Memory Dump

000000001972A0A0:   01021284 0049004e 00††††††††††††††††††††††††††....I.N.

And here’s a third row from the same page.

Slot 5 Offset 0x0 LENGTH 0 LENGTH (physical) 0

KeyHashValue = (74a6cc4021ee)      
Slot 6 Offset 0xbb LENGTH 9

Record TYPE = (COMPRESSED) PRIMARY_RECORD                                Record SIZE = 9

CD Array

CD array entry = COLUMN 1 (cluster 0, CD array offset 0): 0x02 (ONE_BYTE_SHORT)
CD array entry = COLUMN 2 (cluster 0, CD array offset 0): 0x01 (EMPTY)  
Record Memory Dump

000000001972A0BB:   01021287 20002000 20††††††††††††††††††††††††††... . .

As we can see the rows are clearly compressed.  Now let’s overflow the values so that the values are stored off page.  We’ll leave the table with the same PAGE compression we’ll just truncate it and populate it with a large string.  The string that I’m using is simply “Hello World! - “ repeated a few thousand times.

After the script has run and we look at the normal page we can see that the row is in fact compressed (I’m included some header information here that I didn’t include in the captures above).

Slot 0 Offset 0x60 LENGTH 57

Record TYPE = (COMPRESSED) PRIMARY_RECORD                                Record attributes =  LONG DATA REGION
Record SIZE = 57                    
CD Array

CD array entry = COLUMN 1 (cluster 0, CD array offset 0): 0x02 (ONE_BYTE_SHORT)
CD array entry = COLUMN 2 (cluster 0, CD array offset 0): 0x0a (LONG)    
Record Memory Dump

0000000017B0A060:   2102a281 01010030 80040000 11010000 005f6500 †!.¢....0........._e.
0000000017B0A074:   00681f00 00230100 00010000 00d03e00 00240100 †.h...#.......Ð>..$..
0000000017B0A088:   00010000 00f84c00 00210100 00010000 00††††††††.....øL..!.......

When we look in the LOB pages for that row (pages 291, 292 and 289 according to DBCC PAGE on my system) we see that indeed the information stored on the LOB page is not compressed. 

BLOB ROW at: Page (1:291) Slot 0 LENGTH: 8054 TYPE: 3 (DATA)

BLOB Id:1700724736

000000001972A06E:  00650048  006c006c  0020006f  006f0057 H.e.l.l.o. .W.o.
000000001972A07E:  006c0072  00210064  002d0020  00480020 r.l.d.!. .-. .H.
000000001972A08E:  006c0065  006f006c  00570020  0072006f e.l.l.o. .W.o.r.
000000001972A09E:  0064006c  00200021  0020002d  00650048 l.d.!. .-. .H.e.
000000001972A0AE:  006c006c  0020006f  006f0057  006c0072 l.l.o. .W.o.r.l.
... Snipped FOR SPACE
000000001972BF8E:  0064006c  00200021  0020002d  00650048 l.d.!. .-. .H.e.
000000001972BF9E:  006c006c  0020006f  006f0057  006c0072 l.l.o. .W.o.r.l.
000000001972BFAE:  00210064  002d0020  00480020  006c0065 d.!. .-. .H.e.l.
000000001972BFBE:  006f006c  00570020  0072006f  0064006c l.o. .W.o.r.l.d.
000000001972BFCE:  00200021  0020002d                     !. .-. .

You can also look at the number of pages which are returned by DBCC IND (which is the command that you use to see what pages are used by a specific table) to see this work in action.  When my table isn’t compressed the table contains 8 pages (6 are in row data pages).  When the table is ROW compressed the table is 7 pages (6 are in row data pages).  When the table is PAGE compressed the table is 4 pages (2 are in row data pages).

Another item of importance to note about page level compression is that the page must be full before the page is compressed.  This really makes sense as you wouldn’t want to be compressing the data in the page over and over again every time it is written to.

Hopefully through reading this post I’ve shown you the power of compression as well as some potential sticking points if you use LOB values in the tables which you plan on compressing data.  Data compression can save you real money on storage when used correctly.  And in today’s economy saving real money is a very good thing.

Please or Register to post comments.

What's Troubleshooting SQL Server Storage Problems?

Practical advice, insight, and help for core SQL Server considerations.

Contributors

Denny Cherry

Denny Cherry is the owner and principal consultant for Denny Cherry & Associates Consulting and has over a decade of experience working with platforms such as Microsoft SQL Server, Hyper-V,...
Blog Archive

Sponsored Introduction Continue on to (or wait seconds) ×