MySQL collate utf8 style string comparison in Python

Question

I have the following MySQL table:

mysql> show create table names;
+-------+-----------------------------------------------------+
| Table | Create Table                                        |
+-------+-----------------------------------------------------+
| names | CREATE TABLE `names` (
  `name` varchar(20) COLLATE utf8_unicode_ci NOT NULL,
  UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci  |
+-------+-----------------------------------------------------+

The table currently has the following record:

mysql> select * from names;
+--------+
| name   |
+--------+
| Luísa  |
+--------+

Note that the entry is Luísa, with an accented 'í'. As you can see, I have set the collation of the name field to utf8_unicode_ci. I have a Python script that loads names into this table. Because the name field is unique and uses utf8_unicode_ci, I'm unable to insert Luisa: the collation considers i and í to be the same.

To check whether an entry is already present in the table, my Python script first loads all existing names into a set and inserts a name only if it is not already in that set. The problem is that Python treats i and í as different, so Luisa passes the check and the insert then fails.

I read at http://www.cmlenz.net/archives/2008/07/the-truth-about-unicode-in-python that Python doesn't support collation, and that there is a Python implementation of the UCA (Unicode Collation Algorithm) written by James Tauber. However, that helps with sorting, not with checking whether two strings will be treated as equal by MySQL under the utf8_unicode_ci collation.

Is there a way in Python to compare these two strings the MySQL way?
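Python's standard library has no MySQL collations, but for a duplicate check a common approximation of utf8_unicode_ci equality is to compare normalized keys: decompose with unicodedata (NFKD), drop combining marks, case-fold, and ignore trailing spaces. The sketch below assumes that approximation is good enough; it is not MySQL's actual UCA 4.0.0 weight table, so some characters (for example ligatures or characters UCA treats as ignorable) may compare differently than they do in MySQL. The helper name `mysql_ci_key` is hypothetical.

```python
import unicodedata


def mysql_ci_key(s: str) -> str:
    """Hypothetical helper approximating utf8_unicode_ci equality.

    Assumption: decomposing, dropping combining marks, case-folding and
    ignoring trailing spaces is close to, but NOT identical to, MySQL's
    UCA-based comparison.
    """
    # NFKD splits 'í' into 'i' followed by U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFKD", s)
    # Drop the combining marks so accented and plain letters match.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a stronger lower(); it also maps 'ß' to 'ss'.
    # rstrip(" ") mimics MySQL's PAD SPACE handling of trailing spaces.
    return base.casefold().rstrip(" ")


# Build the set from keys, not raw names (one existing row is assumed here).
existing = {mysql_ci_key(name) for name in ["Luísa"]}

for candidate in ["Luisa", "LUÍSA", "Louisa"]:
    key = mysql_ci_key(candidate)
    if key in existing:
        print(f"skip {candidate!r}: duplicate under the approximation")
    else:
        existing.add(key)
        print(f"insert {candidate!r}")
```

Storing keys instead of raw names keeps the set lookup cheap; the UNIQUE index in MySQL remains the final authority for anything the approximation gets wrong.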

score 0 · Answer 1 · answered Dec 26 '12 at 08:30

"Now, to check if the entry is present already in the table in python, I'm initially loading all the names present in the table in a set and I try to insert only if it is not present in the table already."

You're doing it wrong. Either perform a query against the table to see if the entry already exists, or try to insert regardless and catch the exception.

answered Dec 26 '12 at 08:30

Ignacio Vazquez-Abrams

Yes, either of them should fix the problem, although I believe trying to insert and then catching the exception will be more efficient, as it reduces the number of queries to the DB. However, I also wanted to know whether Python provides some way to enforce this kind of string comparison. – Arun Kumar Nagarajan Dec 26 '12 at 09:16
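A runnable way to see both suggested approaches without a MySQL server is to let SQLite enforce the uniqueness. This sketch swaps in Python's built-in sqlite3 module and registers a custom collation (the name `unicode_ci_approx` and all helper functions are hypothetical) that approximates utf8_unicode_ci by stripping accents, case-folding and ignoring trailing spaces. Against MySQL, the same pattern would typically catch the driver's IntegrityError (MySQL error 1062, duplicate entry) instead of sqlite3.IntegrityError.

```python
import sqlite3
import unicodedata


def ci_key(s: str) -> str:
    # Approximation of utf8_unicode_ci (an assumption, not MySQL's exact
    # UCA weights): drop accents, casefold, ignore trailing spaces.
    decomposed = unicodedata.normalize("NFKD", s)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold().rstrip(" ")


def ci_collation(a: str, b: str) -> int:
    # SQLite collation callables return a negative, zero or positive int.
    ka, kb = ci_key(a), ci_key(b)
    return (ka > kb) - (ka < kb)


conn = sqlite3.connect(":memory:")
conn.create_collation("unicode_ci_approx", ci_collation)
with conn:  # commit the setup so later rollbacks cannot undo it
    # The UNIQUE constraint uses the column's collation, like the MySQL table.
    conn.execute(
        "CREATE TABLE names (name TEXT NOT NULL COLLATE unicode_ci_approx UNIQUE)"
    )
    conn.execute("INSERT INTO names (name) VALUES (?)", ("Luísa",))


def name_exists(conn: sqlite3.Connection, name: str) -> bool:
    """Option 1: ask the database; '=' uses the column's collation."""
    row = conn.execute(
        "SELECT 1 FROM names WHERE name = ? LIMIT 1", (name,)
    ).fetchone()
    return row is not None


def insert_name(conn: sqlite3.Connection, name: str) -> bool:
    """Option 2: insert regardless and treat a constraint error as 'exists'."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO names (name) VALUES (?)", (name,))
        return True
    except sqlite3.IntegrityError:
        return False


print(name_exists(conn, "Luisa"))   # True
print(insert_name(conn, "Luisa"))   # False: rejected by the UNIQUE index
print(insert_name(conn, "Louisa"))  # True
```

In both options the comparison happens inside the database using the column's collation, which is why they sidestep the mismatch between a Python set and MySQL.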

score 0 · Answer 2 · answered Dec 26 '12 at 08:31

What about COLLATE utf8_bin?

It compares characters by their binary values (strict comparison).
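For contrast, a binary collation compares characters by code point, which is essentially what Python's `==` on str already does. The hypothetical helper below models utf8_bin equality under one stated assumption: PAD SPACE behaviour, as documented for MySQL's utf8 collations, so trailing spaces are ignored.

```python
def mysql_bin_equal(a: str, b: str) -> bool:
    """Hypothetical model of equality under utf8_bin.

    Code points are compared exactly (so case and accents matter), which is
    what Python's str == already does. Assumption: PAD SPACE semantics, so
    trailing spaces are ignored.
    """
    return a.rstrip(" ") == b.rstrip(" ")


print(mysql_bin_equal("Luísa", "Luisa"))    # False: accent differs
print(mysql_bin_equal("Luisa", "luisa"))    # False: case differs
print(mysql_bin_equal("Luisa", "Luisa  "))  # True: trailing spaces ignored
```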

answered Dec 26 '12 at 08:31

Paul T. Rawkeen

Yes, but I want the field to keep the utf8_unicode_ci collation. – Arun Kumar Nagarajan Dec 26 '12 at 09:14
@ArunKumarNagarajan, and what does `STRCMP(str1, str2)` return if you compare the strings manually? It returns 0 if the strings are the same, -1 if the first argument is smaller than the second according to the current sort order, and 1 otherwise. – Paul T. Rawkeen Dec 26 '12 at 09:29
@ArunKumarNagarajan, what is the problem with switching the field to that collation? – Paul T. Rawkeen Dec 29 '12 at 10:41
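A Python counterpart to STRCMP can be sketched on top of a normalized key. Here `strcmp_ci` and `ci_key` are hypothetical helpers; the key strips accents, case-folds and drops trailing spaces to approximate utf8_unicode_ci. Only the 0 result (equal) is expected to track MySQL closely; the -1/1 ordering compares folded code points and can differ from UCA ordering for punctuation and some letters.

```python
import unicodedata


def ci_key(s: str) -> str:
    # Approximation of utf8_unicode_ci: decompose, drop accents, casefold,
    # ignore trailing spaces. Not MySQL's exact UCA weights.
    decomposed = unicodedata.normalize("NFKD", s)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold().rstrip(" ")


def strcmp_ci(a: str, b: str) -> int:
    """Return 0, -1 or 1, following the convention of MySQL's STRCMP()."""
    ka, kb = ci_key(a), ci_key(b)
    return (ka > kb) - (ka < kb)


print(strcmp_ci("Luísa", "Luisa"))  # 0: same under the approximation
print(strcmp_ci("Ana", "Luísa"))    # -1
print(strcmp_ci("Luísa", "ANA"))    # 1
```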
