Congratulations to Chad Boyd, who won first prize of $100 for the best solution to the October Reader Challenge, "Enforcing Uniqueness Without the Unique Constraint." Chad graciously donated his prize to the Make-a-Wish Foundation (http://www.wish.org). Here’s a recap of the problem and its solution.

Paul is a database architect at a company that provides Web-based message services. He must design a schema to store and retrieve forum messages, and he plans to use SQL Server 2000 as the database server. Among the tables that store the messages is one with a unique-per-message identifier column. The table's schema is shown in the following code:

CREATE TABLE messages (
msg_id INT NOT NULL PRIMARY KEY,
msg_hdr VARCHAR(1024) NOT NULL
)

Paul wants to enforce uniqueness on the msg_hdr column and tries to define a unique constraint on the column by using the following script:

ALTER TABLE messages ADD CONSTRAINT uq_messages_id UNIQUE(msg_hdr)

The ALTER TABLE statement produces the following warning message:

Warning! The maximum key length is 900 bytes. The index 'uq_messages_id' has maximum length of 1024 bytes. For some combination of large values, the insert/update operation will fail.

Using generated sample data for the table, Paul performs tests that reveal that the msg_hdr value might exceed 900 bytes, so he can't use the unique constraint approach. (A unique index in SQL Server enforces a unique constraint, and, as the warning message says, index keys are restricted to a maximum of 900 bytes.)
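
To see the limit in action, suppose the constraint were left in place despite the warning. Any row whose msg_hdr value exceeds 900 bytes is then rejected at insert time, as this minimal sketch with made-up data shows:

-- With uq_messages_id still in place, a msg_hdr value longer than
-- 900 bytes can't fit in the index key, so the insert fails:
INSERT INTO messages (msg_id, msg_hdr)
VALUES (1, REPLICATE('x', 901))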

Help Paul to efficiently enforce uniqueness on the msg_hdr column in the Messages table without the unique constraint.

Solution:

Paul can enforce uniqueness on the msg_hdr column by using a trigger with the following implementation:

CREATE TRIGGER check_uq_msg_hdr
ON messages
AFTER INSERT, UPDATE
AS
BEGIN
   IF EXISTS(SELECT 1
             FROM inserted AS i
             JOIN messages AS m
             ON m.msg_hdr = i.msg_hdr
             GROUP BY m.msg_hdr
             HAVING COUNT(m.msg_hdr) > 1)
    BEGIN
        RAISERROR('Unique key violation on [messages].[msg_hdr] column!', 16, 1)
        ROLLBACK
    END
END
GO

-- DML statements to check the trigger logic:
-- Populate sample data using the pubs.dbo.pub_info table:
INSERT INTO messages (msg_id, msg_hdr)
SELECT 1, SUBSTRING(pr_info, 1, 1024)
FROM pubs.dbo.pub_info
WHERE pub_id = '0736'
INSERT INTO messages (msg_id, msg_hdr)
SELECT 2, SUBSTRING(pr_info, 1, 1024)
FROM pubs.dbo.pub_info
WHERE pub_id = '1756'
INSERT INTO messages (msg_id, msg_hdr)
SELECT 3, SUBSTRING(pr_info, 1, 1024)
FROM pubs.dbo.pub_info
WHERE pub_id = '9999'
GO

-- Insert duplicate rows:
DECLARE @v VARCHAR(1024)
SET @v = REPLICATE(CAST(NEWID() AS VARCHAR(36)), 20)
INSERT INTO messages (msg_id, msg_hdr) VALUES( 4, @v )
INSERT INTO messages (msg_id, msg_hdr) VALUES( 5, @v )
GO
-- Update an existing row with the same msg_hdr value as another:
UPDATE messages
SET msg_hdr = (SELECT msg_hdr FROM messages WHERE msg_id = 2)
WHERE msg_id = 3
GO

-- Check the data in the table:
SELECT msg_id, msg_hdr
FROM messages
GO
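
If the trigger behaves correctly, the second INSERT in the duplicate-row batch and the subsequent UPDATE each raise the 'Unique key violation' error and roll back, so the final SELECT should return four rows (msg_id 1 through 4) with no duplicate msg_hdr values.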

However, the trigger's query showplan output shows that the trigger logic is inefficient: to check for a violation, SQL Server scans the entire messages table for every row that an insert or update operation affects, as you can see in the following statistics profile output:

-- Partial statistics profile output of the query in the trigger:
  |--Compute Scalar(DEFINE:([Expr1005]=CASE WHEN [Expr1006] THEN (1) ELSE (0) END))
       |--Nested Loops(Left Semi Join, DEFINE:([Expr1006] = [PROBE VALUE]))
            |--Constant Scan
            |--Filter(WHERE:([Expr1003]>(1)))
                 |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1011],0)))
                      |--Stream Aggregate(DEFINE:([Expr1011]=Count(*)))
                           |--Nested Loops(Inner Join, WHERE:([MyDB].[dbo].[messages].[msg_hdr] AS [i].[msg_hdr]=[MyDB].[dbo].[messages].[msg_hdr] AS [m].[msg_hdr]))
                                |--Inserted Scan(OBJECT:([MyDB].[dbo].[messages].[PK__messages__2334397B] AS [i]))
                                |--Clustered Index Scan(OBJECT:([MyDB].[dbo].[messages].[PK__messages__2334397B] AS [m]))

Paul can optimize the search by adding a computed column, msg_hdr_hash, that derives a hash value from the msg_hdr column, then indexing that column. Paul can compute the hash by using the built-in CHECKSUM or BINARY_CHECKSUM function. The following code adds the computed column and the index:

ALTER TABLE messages
ADD msg_hdr_hash AS ISNULL(CHECKSUM(msg_hdr), 0)
GO
CREATE INDEX idx_messages_hdr_hash ON messages(msg_hdr_hash)
GO
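
Note that CHECKSUM isn't collision-free, so the hash can only narrow the search; the trigger must still compare the full msg_hdr values, as the modified code below does. For example, under a default case-insensitive collation, CHECKSUM hashes character strings that differ only in case to the same value:

-- Both expressions return the same checksum under a
-- case-insensitive collation, even though the strings differ:
SELECT CHECKSUM('duplicate header'), CHECKSUM('DUPLICATE HEADER')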

Now, Paul can modify the trigger query that checks for the uniqueness violation so that it uses the computed column. This change makes the search faster and more efficient because rows are matched first on the indexed msg_hdr_hash integer column and only then on the full msg_hdr value. The modified trigger code looks like this:

ALTER TRIGGER check_uq_msg_hdr
ON messages
AFTER INSERT, UPDATE
AS
BEGIN
    IF EXISTS(SELECT 1
              FROM INSERTED AS i
              JOIN messages AS m
              ON m.msg_hdr_hash = CHECKSUM(i.msg_hdr)
              AND m.msg_hdr = i.msg_hdr
              GROUP BY m.msg_hdr
              HAVING COUNT(m.msg_hdr) > 1)
    BEGIN
        RAISERROR('Unique key violation on [messages].[msg_hdr] column!', 16, 1)
        ROLLBACK
    END
END
GO

Paul can now examine the showplan output of the trigger's query and verify that SQL Server uses the index on the computed column; note the Index Seek on idx_messages_hdr_hash where the original plan performed a Clustered Index Scan. The statistics profile output for the modified trigger code looks like this:

|--Compute Scalar(DEFINE:([Expr1005]=CASE WHEN [Expr1006] THEN (1) ELSE (0) END))
     |--Nested Loops(Left Semi Join, DEFINE:([Expr1006] = [PROBE VALUE]))
          |--Constant Scan
          |--Filter(WHERE:([Expr1003]>(1)))
               |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1014],0)))
                    |--Stream Aggregate(DEFINE:([Expr1014]=Count(*)))
                         |--Filter(WHERE:([MyDB].[dbo].[messages].[msg_hdr] AS [m].[msg_hdr]=[MyDB].[dbo].[messages].[msg_hdr] AS [i].[msg_hdr]))
                              |--Nested Loops(Inner Join, OUTER REFERENCES:([m].[msg_id]))
                                   |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1009]))
                                   |    |--Compute Scalar(DEFINE:([Expr1009]=CHECKSUM([MyDB].[dbo].[messages].[msg_hdr] AS [i].[msg_hdr])))
                                   |    |    |--Inserted Scan(OBJECT:([MyDB].[dbo].[messages].[PK__messages__2334397B] AS [i]))
                                   |    |--Index Seek(OBJECT:([MyDB].[dbo].[messages].[idx_messages_hdr_hash] AS [m]), SEEK:([m].[msg_hdr_hash]=[Expr1009]) ORDERED FORWARD)
                                   |--Clustered Index Seek(OBJECT:([MyDB].[dbo].[messages].[PK__messages__2334397B] AS [m]), SEEK:([m].[msg_id]=[MyDB].[dbo].[messages].[msg_id] AS [m].[msg_id]) LOOKUP ORDERED FORWARD)

NOVEMBER READER CHALLENGE:

Now, test your SQL Server savvy in the November Reader Challenge, "Parse and Insert Data from Uploaded Image Files" (below). Submit your solution in an email message to challenge@sqlmag.com by October 19. Umachandar Jayachandran, a SQL Server Magazine technical editor, will evaluate the responses. We’ll announce the winner in an upcoming SQL Server Magazine UPDATE. The first-place winner will receive $100, and the second-place winner will receive $50.

Problem:

Denny is a database developer who needs to develop a Web portal for intranet use. This Web portal will use a SQL Server 2000 database as the data store. The portal has functionality that lets users upload binary data, such as image files and Microsoft Office Outlook messages, to the database. The uploaded files are stored in a table called ItemChunks with the following schema:

CREATE TABLE ItemChunks (
 ItemChunkId INT NOT NULL PRIMARY KEY,
 ItemChunk image NOT NULL
)
GO

The Web portal page that lets users upload binary data has the following characteristics:

  1. The page sends the binary data as a single BLOB to SQL Server.
  2. One or more binary data files can be uploaded from the page in one transaction.
  3. The file sizes can range from 1KB to 20KB.

Denny is designing a stored procedure that inserts the data from the BLOB into the ItemChunks table. The skeleton stored procedure looks like the following code. For simplicity, assume that the data offsets and lengths of the binary data files contained in the parameter value are known.

CREATE PROCEDURE InsertItemChunks (@data image)
AS
BEGIN
    -- For each file packed into @data: its item number, 1-based
    -- starting offset, and length in bytes (assumed known)
    DECLARE @data_offsets
        TABLE (item INT NOT NULL, offset INT NOT NULL, length INT NOT NULL)
    INSERT INTO @data_offsets (item, offset, length) VALUES(1, 1, 200)
    INSERT INTO @data_offsets (item, offset, length) VALUES(2, 201, 10000)
    INSERT INTO @data_offsets (item, offset, length) VALUES(3, 10201, 30)
END

Help Denny write the stored procedure logic to parse and insert the data from the image value into the ItemChunks table. Denny can use the item column in the @data_offsets table as the primary key value for the ItemChunks table.
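
As a starting point (not a complete solution), note that in SQL Server 2000 the SUBSTRING function accepts an image value and returns a varbinary result. Inside the procedure body, the chunks that fit within varbinary's 8,000-byte limit could therefore be extracted and inserted directly, as the following sketch shows; the 10,000-byte chunk in the sample offsets exceeds that limit and needs additional handling:

-- Sketch only, inside InsertItemChunks: SUBSTRING on an image value
-- returns varbinary, which accommodates chunks up to 8,000 bytes.
INSERT INTO ItemChunks (ItemChunkId, ItemChunk)
SELECT item, SUBSTRING(@data, offset, length)
FROM @data_offsets
WHERE length <= 8000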

Denny can test the stored procedure by using the following code. (The batch generates a binary data value of 10,230 bytes comprising three chunks of 200 bytes, 10,000 bytes, and 30 bytes of data, respectively.)

DECLARE @chunk1 VARCHAR(400)
DECLARE @chunk2_1 VARCHAR(8000)
DECLARE @chunk2_2 VARCHAR(8000)
DECLARE @chunk2_3 VARCHAR(4000)
DECLARE @chunk3 VARCHAR(60)
SET @chunk1 = replicate('af', 200)
SET @chunk2_1 = replicate('be', 4000)
SET @chunk2_2 = replicate('be', 4000)
SET @chunk2_3 = replicate('cd', 2000)
SET @chunk3 = replicate('dc', 30)

EXEC
('EXEC InsertItemChunks 0x' + @chunk1 + @chunk2_1 + @chunk2_2 + @chunk2_3 + @chunk3)

SELECT *, datalength(ItemChunk)
FROM ItemChunks
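
For reference, a correct implementation should return three rows whose DATALENGTH values match the lengths in @data_offsets (byte patterns abbreviated here):

ItemChunkId  ItemChunk  (DATALENGTH)
-----------  ---------  ------------
1            0xAFAF...           200
2            0xBEBE...         10000
3            0xDCDC...            30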