Bug or Feature: REPLACE Puzzle

Davide Mauri, a colleague from SolidQ, brought up an interesting discussion in our company, which originally started in the Italian SQL Server forums. The topic was certain behavior of the T-SQL REPLACE function that some people thought was a bug and others a feature. I will present the example in question here as a puzzle, and before describing the correct answer and the reasoning behind it, I’ll let you try and figure out what you think the correct answer is.

See if you can guess what the output of the following code is going to be before executing it (empty space left intentionally before presenting the answer):

DECLARE @s AS VARCHAR(1000) = '.x.x.x.';

SELECT REPLACE(@s, '.x.', '.y.');

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Books online doesn’t provide much details about the REPLACE function. All it has is just one sentence saying: “Replaces all occurrences of a specified string value with another string value.” With this in mind, it’s interesting that for many people it seems intuitive that the correct answer should be .y.y.y., but in practice, it is, .y.x.y. (read: dot y dot x dot y dot). If you guessed incorrectly, now that you know what the correct answer is, see if you can explain the reasoning behind it.

Given the expression REPLACE(@s, @find, @replacement), the REPLACE function was designed to replace in @s nonoverlapping occurrences of @find with @replacement, scanning the string from left to right. So apply this logic to our example, where @s is initially .x.x.x.. You can express the original string as x, and after the replacement, the result is x. Translating the result to the actual string, you get .y.x.y..

For some this behavior seems like a bug for two main reasons:

1. You’re thinking of overlapping cases.

2. You’re thinking of the all-at-once concept that is supposed to be implemented by SQL, where all calculations are supposed to be handled in an all-at-once manner, as opposed to being handled from left to right.

Regarding point 1, the REPLACE function simply wasn’t designed to deal with overlapping cases, rather nonoverlapping ones.

Regarding point 2, the all-at-once concept is applicable to the set operations that apply to the tables (relations) involved in the query. All expressions that participated in the same logical query processing phase (e.g., the SELECT phase, the WHERE phase, the UPDATE SET phase, etc.), are conceptually evaluated at the same point in time. The concept is not applicable to the logic of a scalar expression operating on an input value expression, which is our case.

So, now that you know what the correct answer is, and that this behavior is in fact a feature and not a bug, how do you change the original expression such that it will handle overlapping cases? That is, can you think of an expression that when applied to the input .x.x.x., requesting to replace all ocurrences of .x. with .y., you will get the result .y.y.y.?

Cheers,

BG

 

Discuss this Blog Entry 5

on Jun 2, 2011
Like this?

DECLARE @s AS VARCHAR(1000) = '.x.x.x.';
SELECT REPLACE(REPLACE(@s, '.x.', '.y.'), '.x.', '.y.');

Assuming this is the expression your referring to, does this provide any performance gain over splitting the expression (other than the use of 1 less variable)?

I like using 2 expressions to make the intent/example more readable, and of course that is purely my personal preference.

Take care!








on May 31, 2011
To me, and this is only my personal opinion, this all makes perfect sense with how REPLACE works.

If you were to try to break the string into separated columns for each value of ".x." you'd only have 2 columns that would have that value. And when viewed this way it seems obvious that REPLACE should only have replaced the first and last values. Example of a table representation would look like:

|.x.|x|.x.|

I know this isn't how the function works; but, it's a way that I've used in the past to figure out why REPLACE has worked for in overlapping instances.

With all of this in mind, I'd solve it via a recursive method. Have the resulting REPLACE assigned to a second variable, then re-run the same REPLACE query on the new result to achieve the final result.

Below is a very crude line by line example:

DECLARE @s AS VARCHAR(1000) = '.x.x.x.';
DECLARE @s2 AS VARCHAR(1000) = REPLACE(@s, '.x.', '.y.');
SELECT REPLACE(@s2, '.x.', '.y.');

Is there a way to see the inner-workings of these types of functions, REPLACE in particular, or is this a closed source issue that prevents it?

Thanks for the excellent explanation as always Itzik!!

















on Jun 1, 2011
Hi jrea8830,

No, the source code for these functions isn't exposed anywhere.

BTW, there is a way to handle the task at hand with one expression...

Cheers,
Itzik






on Jun 26, 2012
For those that think this is a bug, what do you expect as an output from REPLACE('aaabbb','ab','ba')? DECLARE @s AS VARCHAR(8000) = 'aaabbb'; DECLARE @p1 AS VARCHAR(8000) = 'ab'; DECLARE @p2 AS VARCHAR(8000) = 'ba'; WITH t AS ( SELECT 0 AS i, @s AS s UNION ALL SELECT i+1, REPLACE(s,@p1,@p2) FROM t WHERE s <> REPLACE(s,@p1,@p2) ) SELECT i,s FROM t
on Jun 2, 2011
That's one option; another is:
SELECT REPLACE(REPLACE(REPLACE(@s, '.', '..'), '.x.', '.y.'), '..', '.');

As for why use a single expression, it's not just about performance. It's also the ability to embed such expressions inline in a query.

Cheers,
Itzik





Please or Register to post comments.

What's Puzzled By T-SQL Blog?

T-SQL tips and logical puzzles from Itzik Ben-Gan.

Blog Archive

Sponsored Introduction Continue on to (or wait seconds) ×