Use bitwise operators to manipulate two-valued flags
As a SQL Server database programmer or implementer, you sometimes need to accommodate several two-valued entity attributes—or flags—when you create tables. Some examples of two-valued attributes are on/off, true/false, male/female, and married/single. SQL Server supplies the data type bit with which you can represent two-valued attributes—or three-valued, if you count NULL as a value. However, if you have many flags of this nature to store, you can end up with a lot of bit columns in your table and a lot of parameters in your stored procedures or user-defined functions (UDFs) that manipulate them. Furthermore, SQL Server 7.0 doesn't let you create an index on a bit column; this limitation in 7.0, among other reasons that apply to all versions, might mean that you'd choose to store flags in integer columns instead of storing them in bit columns. Fortunately, SQL Server 2000 lets you create indexes on bit columns, so indexing isn't a concern if you use SQL Server 2000.
If you have lots of flag values to store and you don't want so many columns in your tables, you might prefer to store several flag values in one binary or integer column. SQL Server stores integer or binary values (as it stores values of any other data type) internally as a series of bits in which each bit is either turned on (1) or turned off (0). In fact, SQL Server stores many flag values in various columns in system tables. For example, the status column in the sysindexes system table is an integer column that stores information about each index's characteristics in several bits (a bitmap). One bit in the status column specifies whether the index is unique, another bit specifies whether the index is clustered, and so on. If you want to manipulate your own bitmaps (e.g., to determine whether a certain bit is on or off or to update the value of a certain bit), you need to use bitwise operators.
Bitwise operations perform operations on bits. The bitwise operators you use in T-SQL are the same operators that you use in electronics: the binary AND (&), OR (|), and XOR (^) operators and the unary NOT (~) operator. Binary bitwise operators (AND, OR, and XOR) work on (i.e., involve in the calculation) two bits and return a result bit according to the truth table defined for them in SQL Server. A truth table contains all possible combinations of the input bits, and the result of the bitwise operation for each combination. Unary operators—in T-SQL, only the NOT operator—operate on one bit and return a result bit according to the truth table defined for them in SQL Server.
Bitwise Operators' Validity
Before you try to use bitwise operators, you need to know that they don't accept all combinations of data types for operands (arguments that the operator works on). Tables 1 and 2 list the data-type combinations that bitwise operations accept.
For example, suppose you want to determine whether you can perform a bitwise AND operation between two binary values. Table 1 shows an error at the intersection of binary and binary—so, an attempt to perform a bitwise operation between two binary values will result in an error. However, the intersection of int and binary is valid—so a bitwise AND operation between an integer value and a binary value produces a result value.
As surprising as it might seem, the binary bitwise operators (AND, OR, and XOR) don't allow both arguments to be binary data types, and the unary bitwise operator (NOT) doesn't allow its argument to be a binary data type. The fact that you can't perform a bitwise operation between two binary values can be a limitation in some situations. For example, you can't perform a bitwise operation between two values when one value is larger than 8 bytes (which is the largest integer to which a binary value can be converted) without truncating some of the larger value's data. To overcome this problem, you need to convert one of the input binary arguments to a data type that will result in one of the valid argument combinations.
Consider the following example. Suppose you want to perform a bitwise AND operation between two binary values. (Note that the binary values in the following query are represented in hexadecimal base; four binary bits are represented by one hex digit; a pair of hex digits make a byte.) The following statement will fail:
SELECT 0x00000001 & 0x00000001
Consulting the validity tables, you discover that you can't perform a bitwise AND between two binary data types. However, you can perform the AND operation between a binary data type and an integer data type, so you can convert one of the arguments to an integer data type, as follows:
SELECT CAST(0x00000001 AS int) & 0x00000001
Regardless of the arguments' data types, the result of a bitwise operation in SQL Server always returns a value that's an integer data type. In next month's T-SQL Solutions feature article, I'll show you how to use UDFs to solve this problem.
SQL Server doesn't support bitwise operations between two binary values; one of the values has to be an integer data type. The reason for this limitation is probably that the designers included support for bitwise operations so that you can manipulate strings of bits stored in individual integer columns instead of in multiple bit columns. But as soon as Microsoft added support for bitwise operations to SQL Server, programmers wanted to use the operators for purposes that SQL Server wasn't designed for. For example, programmers might want to manipulate bitmaps longer than the largest integer or use the bitwise XOR operator to encrypt binary values using a binary key that's longer than the largest integer value.
Bit Values According to Position
As I mentioned earlier, SQL Server stores integer values as a series of bits (1s or 0s). An integer data type, for example, consumes 4 bytes, which is equivalent to 32 bits. SQL Server uses an internal storage format called twos-complement for integer values. A complete discussion of the twos-complement format is outside the scope of this article. But you should know that when you're dealing with positive values, each of the first 31 bits (starting with the rightmost bit in a string of bits) represent the value 2 to the power of (bit number - 1). The integer value is the sum of all of the values that those 31 bits represent. For example, the binary depiction of the integer value 13 is 00000000000000000000000000001101. The first, third, and fourth bits are turned on, and all the other bits are turned off. If you sum all the values that the first, third, and fourth bits represent, you get 2(1 - 1) + 2(3 - 1) + 2(4 - 1) = 13. When you simply want to store numbers that represent quantities, you won't use this information. However, knowing this detail is useful when you want to manipulate a series of flags stored in a single column or variable. To make your life easier when you need to manipulate a particular bit in an integer data type, see Table 3, which shows the values of the bits at each position for positive integer values.
The bitwise AND operator returns 1 only when both of its arguments are 1; it returns 0 in all other cases. (This operation is determined on a bit-by-bit basis, meaning that the operator determines the result for each bit separately—one bit at a time.) You use bitwise AND to verify whether a particular flag bit in an integer or binary column is 1 or 0 (turned on or off). Such an operation is known as bit masking. Table 4 shows the truth table for the bitwise AND operator.
To understand how the bitwise AND operator helps you determine whether a given bit is turned on, consider the string of bits "10101010" (the number 170). To find out whether the first bit from the right is turned on, you apply the AND operator to it and the number 1, which represents the first bit:
SELECT 170 & 1
Take a look at the following representation of the processing of the bitwise AND operation (I show only the last byte here:
...10101010 — 170 & ...00000001 — 1 ————————— ...00000000 — 0
As I mentioned, the bitwise AND operation is performed on a bit-by-bit basis. Each pair of bits from the two operands (170 and 1) is examined, and a result bit is generated according to the bitwise AND truth table that Table 4 shows. Notice that the result is 0, not 1, so you don't have a match. To find out whether the second bit is turned on, you apply the AND operator to it and the number 2, which represents the second bit:
SELECT 170 & 2
The following representation of the operation's internal processing shows that the result is 2—you have a match:
...10101010 — 170 & ...00000010 — 2 ————————— ...00000010 — 2
You could continue applying AND, using 4 for the third bit, 8 for the fourth bit, and so on.
Suppose you want to provide the index properties that are stored as a string of bits in the sysindexes table. To do so, you can embed bitwise AND operations in a query. First, you need to decipher the status column of each index according to the following bit values:
- 2 (bit 2) = Unique index
- 16 (bit 5) = Clustered index
- 2048 (bit 12) = Index used to enforce PRIMARY KEY constraint
- 4096 (bit 13) = Index used to enforce UNIQUE constraint
You must know the above information to create the query that Listing 1 shows.
To turn off certain flags in an input value, you need to apply the AND operator to the flag and a specially constructed bitmap, also called a bitmask, in which all the bits are turned on except the ones in the bit positions that you want to turn off in the input value. For example, suppose you have an integer column called flags that holds a certain string of flag bits in a table called T1, and regardless of which bits are currently turned on or off, you want to turn off the third and seventeenth bits. Generate a bitmask in which all the bits are turned on except the third and seventeenth bits (in hex notation, such a bitmask would be 0xFFFEFFFB), then apply the AND operator to the bitmask and the column flags. In your T-SQL code, include the following UPDATE statement:
UPDATE T1 SET flags = flags & 0xFFFEFFFB WHERE key_col = <key_value>
After you run this UPDATE statement, the third and seventeenth bits in the column flags in the row in which key_col is equal to <key_value> are set to 0 (turned off).
Bitwise OR (|)
The bitwise OR operator returns 1 if either of its arguments is 1. This operator, too, processes bits one at a time. OR returns 0 only if both arguments are 0. Table 5 shows the bitwise OR truth table.
You generally use the bitwise OR operator to form a string of flag bits that you store in a variable or column. For example, suppose you want to create a bit string with bits 1, 2, 3, and 4 turned on. You can apply bitwise OR operations to the values that the specified bits represent—1, 2, 4, and 8 respectively—as follows:
SELECT 1 | 2 | 4 | 8
So, when you say "give me all the bits in positions 1 or 2 or 3 or 4," you actually get all four bits (1 and 2 and 3 and 4).
The following representation of the OR operation's internal processing shows that you can perform several bitwise OR operations in a single scalar computation:
...00000001 — 1 | ...00000010 — 2 | ...00000100 — 4 | ...00001000 — 8 ————————— ...00001111 — 15
However, T-SQL doesn't include support for an aggregate bitwise OR operation on values stored in different table rows. You can use only the built-in aggregate functions that T-SQL supplies, some of which are the SUM(), MIN(), and MAX() functions. For more information about why you'd want to perform aggregate bitwise OR operations and how to perform them in T-SQL, see the SQL Server Magazine article T-SQL Black Belt, "Auxiliary Tables, Bit by Bit," July 2001, InstantDoc 21079, at http://www.sqlmag.com.
Earlier in the article, I explained how to use the bitwise AND operator with a bitmask to turn off certain flag bits in a bit string. You can use a similar technique with the bitwise OR operator and a bitmask to turn on flag bits. Suppose you want to turn on the third and seventeenth bits of a value. If you use the table T1 with the flags column from the previous example, you can generate a bitmask in which all the bits are turned off except the third and seventeenth bits, and apply the OR operator to the bitmask and the column flags. In hex notation, such a bitmask would be 0x00010004. The UPDATE statement in your T-SQL code would look like this:
UPDATE T1 SET flags = flags | 0x00010004 WHERE key_col = <key_value>
An annoying fact about using bitmasks is that to turn on or off certain flag bits in an existing bit string, you have to express the bitmasks in either hex (for binary data types) or decimal (for integer data types) notation, instead of expressing the bitmasks as strings of bits (base 2), which is more intuitive. One way to translate the more intuitive base 2 representation of bitmasks to decimal or hex notation is to use the Windows Calculator in the scientific view (choose Scientific from the View menu). You can write your value in binary (base 2), oct (base 8), decimal (base 10), or hex (base 16) notation, and the calculator will translate the value to your desired notation. For example, suppose you want to use the Windows Calculator to find out the hex notation of a value that has the third and seventeenth bits turned on. Open the Windows Calculator and make sure it's set to the scientific view. Select the Bin option in the top-left option group to specify that you're going to input a value in a binary notation. Type the bit string 10000000000000100, which has the third and seventeenth bits turned on, then select the Hex option to specify that you want to translate the value to its hex notation. You should get the value 10004, which is the value I used in the above query. In next month's feature article, I'll show you how to write functions that will perform the translation for you.
Your knowledge of bitwise operators can come in handy in many situations. For example, you might want to write a trigger to identify which columns an UPDATE statement has modified. For information about creating such a trigger—and for a puzzle that will test your new knowledge—see the Web sidebar "Identifying Modified Columns Puzzle." (To access the Web sidebar, go to http://www.tsqlsolutions.com and enter 26629 in the InstantDoc ID text box.)
In this article we've examined the basics of bitwise operations and discussed scenarios in which you can use them. Bitwise operations are handy when you want to manipulate bit strings—for example, to determine the state of a certain bit or to set a bit's value. Also, SQL Server sometimes provides information in bit strings (e.g., the return value of the COLUMNS_UPDATED() trigger function) that you have to manipulate by using bitwise operations. In next month's T-SQL Solutions feature article, I'll discuss some more aspects of bitwise operations such as bitwise XOR, bitwise OR, and also provide helper UDFs that can make manipulating bit strings easier.