MySQL equivalance comparison with varchars and literals -- not case sensitive with auto-trimming

Question

I was doing some query debugging and have found that I was getting unexpected (though apparently correct) TRUEs for when comparing a varchar field to a literal. Specifically the following:

the row in question is just an auto-increment int primary key, and a varchar(255)
to set up add a single row: insert into comp_test(test_string) values('TestString');
where test_string='tESTsTRING' clause is true
where test_string='TestString ' clause is true (it's padded with blank space at the end)

So while constructing my questions I can across a similar post post describing the cause and how to force case sensitivity (with BINARY and COLLATE among others). Will the BINARY and COLLATE solutions also cause blank padding to make the clause false?

I now have part of the solution, but can anyone explain why equivalance comparison is so sloppy? In the case above if the value in test_string is an 8 character string, there are ~64,000 literals that will cause the comparison to evaluate to true. What kind of equivalance is that? It seems wrong and nearly all other languages would never permit anything but 1 to 1 equivalance.

Thanks in advance.

It's actually not at all that sloppy. It's very well defined. That's why the collation is important. http://dev.mysql.com/doc/refman/5.0/en/charset-collation-expressions.html — Cylindric, May 24 '12 at 15:05
Yes, basically SQL string comparisons are designed to pull out text, so takes account of things like accent-sort-order and case-sensitivity etc. Programming languages tend to be more literal, although this isn't always the case. — Cylindric, May 24 '12 at 15:16

O. Jones · Accepted Answer · 2012-05-24T16:16:12.527

Notwithstanding the behavior of older languages like C and FORTRAN, and older DMBS systems like Oracle, MySQL's built-in string collation system allows the end user to specify language-specific collation rules. (So, by the way, does the string handling in systems like Java and DotNet.)

This is a really cool feature. It lets you ORDER BY the appropriate alphabetizing (===collating) rules for lots of different languages.

You can issue this search clause to get the kind of match you want.

WHERE BINARY test_string = 'TestString '

or

WHERE test_string = 'TestString ' COLLATE utf8_bin

or

WHERE test_string = 'TestString ' COLLATE utf8_swedish_ci

if your data happen to be in Swedish and stored in the UTF8 character set.

See http://dev.mysql.com/doc/refman/5.5/en/charset-collate.html

But you need to be careful with this. If you ask for a collation in a WHERE clause that doesn't match the collation in the table, your SQL may run inefficiently.

It's best to declare your columns with the correct character set and collation. If you do that then your tables indexes will be set up to grab the data you need quickly. If your data really are binary data (only you know that) you can declare the table, or the column, with the

  COLLATE BIN

modifier.

This part of MySQL is worth your effort to figure out.

Thanks Ollie. Excellent post, quite thorough. Upvoted/solved. — kmarks2, May 24 '12 at 17:26

MySQL equivalance comparison with varchars and literals -- not case sensitive with auto-trimming

1 Answers1