7

Ok so I have a table with three columns:

Id, Key, Value

I would like to delete all rows where Value is empty (''). Therefore I wrote the query to select before I delete which was:

Select * from [Imaging.ImageTag] where [Value] = ''

all pretty standard so far...

Now heres the strange part. This query returned two rows shown below with commas seperating columns:

CE7C367C-5C4A-4531-9C8C-8F2A26B1B980,   ObjectType,  
F5B2F8A8-C4A8-4799-8824-E5FFEEDAB887,   Caption,    

Why are these two rows matching on ''?

Extra Info

I am using Sql-Server, The [Value] column is of type NVARCHAR(300) and yes the table name really is [Imaging.ImageTag]

Solomon Rutzky
  • 41,664
  • 6
  • 112
  • 149
user1
  • 15,594
  • 12
  • 96
  • 166
  • 2
    What database platform are you using & what it the exact type of `Value`? – Alex K. Nov 29 '17 at 11:18
  • Can you do something like this: `Select *, CAST([value] as VARBINARY) from [Imaging.ImageTag] i where [Value] = ''` (SQLServer notation..) and show us? I suspect that whatever bytes that make up the emoji are naively being treated as equal to an emtpy string by the compare, perhaps because they start with an ascii nul 0x00.. (perhaps the emoji is being converted to ascii to compare, and teh conversion is reducing it to '') – Caius Jard Nov 29 '17 at 11:21
  • In SQL Server at my default collation `SELECT 1 where '' = N''` returns 1 - so it does match empty string for some reason. – Martin Smith Nov 29 '17 at 11:21
  • ...and are these Emojis actually a _bad_ thing? I wish my queries at work were this colorful ^ ^ – Tim Biegeleisen Nov 29 '17 at 11:22
  • `[Imaging.ImageTag]` looks wrong. Do you really have a table with the name `"Imaging.ImageTag"`? – a_horse_with_no_name Nov 29 '17 at 11:23
  • But at `latin1_general_100_ci_as` collation it doesn't match. So if SQL Server use a modern collation. – Martin Smith Nov 29 '17 at 11:24
  • I have added some extra info to the question to try help – user1 Nov 29 '17 at 11:30
  • @CaiusJard You are close. Emojis are Supplementary Characters, and Supplementary Characters did not have any sort weights defined until the version 90 collations (that started in SQL Server 2005). For more info, please see: [How Many Bytes Per Character in SQL Server: a Completely Complete Guide](https://sqlquantumleap.com/2019/11/22/how-many-bytes-per-character-in-sql-server-a-completely-complete-guide/#utf16_support_and_sc_collations). – Solomon Rutzky Feb 13 '20 at 19:01

4 Answers4

8

This is collation dependant.

Matches empty string

SELECT 1 where N'' = N''  COLLATE latin1_general_ci_as

Doesn't match empty string

SELECT 1 WHERE N'' = N''   COLLATE latin1_general_100_ci_as

The 100 collations are more up-to-date (though still not bleeding edge, they have been available since 2008) and you should use more modern collations unless you have some specific reason not to. The BOL entry for 100 collations specifically calls out

Weighting has been added to previously non-weighted characters that would have compared equally.

Martin Smith
  • 402,107
  • 79
  • 682
  • 775
  • @martin_smith Very interresting. Can you explain a bit about what the collate is actually doing? and if there a way to set this at a database level rather than query level? – user1 Nov 29 '17 at 11:34
  • 1
    The collation sets the string comparison semantics. You can set it at database and column level as well as query level but setting it at database level won't cascade down to existing columns. If this is an existing application I wouldn't change it at database level unless you have good test coverage as you may find you end up getting collation mismatch errors for comparisons. – Martin Smith Nov 29 '17 at 11:37
  • @user1 for more info, please see: [How Many Bytes Per Character in SQL Server: a Completely Complete Guide](https://sqlquantumleap.com/2019/11/22/how-many-bytes-per-character-in-sql-server-a-completely-complete-guide/#utf16_support_and_sc_collations) – Solomon Rutzky Feb 13 '20 at 18:58
2

It's not an answer to your "why", but in terms of your overall goal, perhaps you should alter your strategy for searching for empty values:

Select * from [Imaging.ImageTag] where LEN([Value]) = 0

As per the comments (thanks Martin Smith for providing some copy/pastable emoji):

SELECT CASE WHEN N'' = N'' then 1 else 0 end --returns 1, no good for checking

SELECT LEN(N'') --returns 2, can be used to check for zero length values?
Caius Jard
  • 47,616
  • 4
  • 34
  • 62
0

Google send me here looking for a way filter all rows with an emoji on a varchar column. In case that your looking for something similar:

SELECT mycolumn
FROM mytable
WHERE REGEXP_EXTRACT(mycolumn,'\x{1f600}')  <> ''
--sqlserver WHERE SUBSTRING(MyCol, (PATINDEX( '\x{1f600}', MyCol ))) <> ''

the \x{1f600} is the char code for the searched emoji, you can find the emoji codes here

Pablo
  • 1,876
  • 2
  • 19
  • 36
0

Complementing this answers When you need use 'like' at sql

WHERE
N'' + COLUMNS like N'%'+ @WordSearch +'%' COLLATE latin1_general_100_ci_as