I have a number of clients that I spend a day or two with each month. I like this style of engagement as I get to know the staff and their systems over a period of time, can see the improvements that we make over time. The staff members also know that if they have issues that aren't desperate, they can save them up for the days that I am onsite. When I arrived at one of these customer sites recently, several of the staff members had grins on their faces, and one told me that Terry (well let's call him Terry anyway) had really broken something.
Terry thought he had a simple problem. He needed to set up a SQL Server Agent job that would periodically copy the contents of a table on a remote server to his local server. He was using a linked server to access the remote table. He thought that he had a simple plan:
He created a schema named [MediaServer] that would be used to contain synonyms for the remote objects:
CREATE SCHEMA MediaServer AUTHORIZATION dbo;
He created a synonym for the table on the remote server:
CREATE SYNONYM MediaServer.MediaEntries
I liked the fact that he had set up a synonym for the remote table. I regularly see linked server names sprinkled endlessly throughout T-SQL code. That isn't great from a maintenance point of view. I really prefer to see a few synonyms set up, so that all the definitions are in a single location. If you ever need to move the database to another linked server, you'll be glad that you did this.
He scripted the table on the remote server and recreated it locally. I've simplified it here but kept the main parts:
CREATE TABLE dbo.MediaEntries
MediaEntryID int CONSTRAINT PK_MediaEntries PRIMARY KEY,
He set up a SQL Server Agent job to periodically execute two commands:
TRUNCATE TABLE dbo.MediaEntries;
INSERT dbo.MediaEntries (MediaEntryID, MediaData)
SELECT MediaEntryID, MediaData
While it seemed to be a simple plan, all the other staff were giving him grief because he had been spending days trying to make it work. His problem was that it almost worked. Anyone who has been in the industry for any length of time knows that systems that almost work can be the stuff of nightmares. It's far easier to fix something that never works. In the 1980's, I maintained large minicomputers and one of my colleagues (Phil) used to say "The more smoke and flames the better. At least you know where the problem is". There is a lot of truth in that.
At least Terry had narrowed down the problem by the time I arrived. What was happening was that the binary data was being truncated in some rows.
The first thing that he had checked was whether a simple truncation was happening. However, while the length of the data in the truncated column varied a great deal, the length of the data wasn't related to which rows were being truncated. Closer inspection showed that the truncation only happened when there were trailing zeroes in the data. Terry was worried that there was a problem with the linked server drivers.
That part was easy to check. Querying the remote data returned all entries as expected:
SELECT MediaEntryID, MediaData
Clearly the remote data was being received locally without an issue. However, querying the local data showed the problem:
Note that in the row for MediaEntryID 2, the MediaData column contains truncated data. The likely outcome was that the data was being lost while being inserted into the local table. But why wasn't this happening when it was inserted into the remote table in the first place?
While the column definitions both locally and remote were identical, what was not identical were the ANSI_PADDING settings at the time the table was created. When the table was scripted using SQL Server Management Studio, the following script was obtained:
SET ANSI_NULLS ON
SET QUOTED_IDENTIFIER ON
SET ANSI_PADDING ON
CREATE TABLE [dbo].[MediaEntries]
[MediaEntryID] [int] NOT NULL,
[MediaData] [varbinary](4000) NULL,
CONSTRAINT [PK_MediaEntries] PRIMARY KEY CLUSTERED
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
SET ANSI_PADDING OFF
Note all the SET options that were scripted along with the table. The tricky one here is ANSI_PADDING. Had Terry run the entire script, the table would have been fine. However, what he had done was to highlight just the CREATE TABLE statement and execute that. Even that would have been ok if at the time that he created the table, the ANSI_PADDING option had not still been OFF from running a previous script in the same session.
ANSI_PADDING affects how certain columns work. The documentation for this option can be found here:
This article includes an important note: "We recommend that ANSI_PADDING always be set to ON."
Unfortunately, many scripts that are generated by SQL Server Management Studio include a statement to turn this option off, as you can see in the script above. ANSI_PADDING affects different column data types in different ways but the important issue for Terry was the note for varchar(n) or varbinary(n) which says that when the option is OFF: "Trailing blanks in character values inserted into a varchar column are trimmed. Trailing zeros in binary values inserted into a varbinary column are trimmed."
And there was Terry's issue. Had the data type been varbinary(max), he would not have seen the problem.
As a final note, I should mention that even though the ANSI_PADDING setting for an individual column does not appear beside the column name in the script for a table, you can easily check whether or not you have any columns that were created with ANSI_PADDING off:
SELECT t.name AS TableName
, c.name AS ColumnName
, typ.name AS DataType
FROM sys.columns AS c
INNER JOIN sys.tables AS t ON c.object_id = t.object_id
INNER JOIN sys.types AS typ ON c.system_type_id = typ.system_type_id
AND c.user_type_id = typ.user_type_id
WHERE t.type = N'U'
AND c.is_ansi_padded = 0;