
I have the following MySQL table

mysql> show create table names;
+-------+-----------------------------------------------------+
| Table | Create Table                                        |
+-------+-----------------------------------------------------+
| names | CREATE TABLE `names` (
`name` varchar(20) COLLATE utf8_unicode_ci NOT NULL,
 UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci  |
+-------+-----------------------------------------------------+

Now, the table has the following record

mysql> select * from names;
+--------+
| name   |
+--------+
| Luísa  |
+--------+

Note that the entry is Luísa, with an 'í' (i with an acute accent). As you can see, the name column uses the utf8_unicode_ci collation. I have a Python script that loads names into this table. Because the column is unique and its collation is utf8_unicode_ci, I cannot insert Luisa: MySQL considers i and í to be the same.

To check in Python whether an entry is already in the table, I first load all the existing names into a set, and I insert a name only if it is not in that set. The problem is that Python treats i and í as different characters.

I read at http://www.cmlenz.net/archives/2008/07/the-truth-about-unicode-in-python that Python doesn't support collation, and that there is a Python implementation of the UCA (Unicode Collation Algorithm) written by James Tauber. However, that helps with sorting, not with checking whether two strings will be treated as the same by MySQL under the utf8_unicode_ci collation.

Is there a way in Python to compare these two strings the MySQL way?

halfer
Arun Kumar Nagarajan

2 Answers


To check in Python whether an entry is already in the table, I first load all the existing names into a set, and I insert a name only if it is not in that set.

You're doing it wrong. Either perform a query against the table to see if the entry already exists, or try to insert regardless and catch the exception.
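The "insert regardless and catch the exception" route could look roughly like the sketch below. It is only an illustration under stated assumptions: it uses the standard-library sqlite3 module as a stand-in for MySQL so that it runs without a server, and the `accent_ci` collation, the `fold` helper, and `insert_name` are hypothetical names that merely imitate the accent- and case-insensitive behaviour of utf8_unicode_ci. With a real MySQL driver the table's own collation does this work, and the exception to catch would be that driver's DB-API `IntegrityError`.

```python
import sqlite3
import unicodedata


def fold(s: str) -> str:
    """Strip combining marks and case (rough imitation of utf8_unicode_ci)."""
    decomposed = unicodedata.normalize("NFKD", s)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


def accent_ci(a: str, b: str) -> int:
    """Collation callback: negative, zero, or positive, like a comparator."""
    fa, fb = fold(a), fold(b)
    return (fa > fb) - (fa < fb)


def insert_name(conn: sqlite3.Connection, name: str) -> bool:
    """Try the insert; report False if the unique key rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO names (name) VALUES (?)", (name,))
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.create_collation("accent_ci", accent_ci)
conn.execute("CREATE TABLE names (name TEXT NOT NULL COLLATE accent_ci UNIQUE)")

print(insert_name(conn, "Luísa"))  # True: the first insert succeeds
print(insert_name(conn, "Luisa"))  # False: rejected by the unique key
```

The point of this design is that the database stays the single authority on what counts as a duplicate, so the Python side never has to reproduce the collation.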

Ignacio Vazquez-Abrams
  • Yes, either should fix the problem, although I believe inserting and catching the exception will be more efficient, as it reduces the number of queries to the DB. However, I also wanted to know whether Python provides some way to perform this kind of string comparison. – Arun Kumar Nagarajan Dec 26 '12 at 09:16
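On the comparison itself, one approximation in pure Python is to fold each name before putting it in the set: decompose it with `unicodedata.normalize`, drop the combining marks, and case-fold the result. The sketch below assumes that this is close enough for names like the one in the question. It is not the UCA weight table that utf8_unicode_ci uses, so some characters may still compare differently in MySQL. The `fold` helper and the sample names are hypothetical.

```python
import unicodedata


def fold(name: str) -> str:
    """Approximate an accent- and case-insensitive key for a name."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


existing = ["Luísa"]  # stands in for the names loaded from the table
seen = {fold(n) for n in existing}

for candidate in ["Luisa", "LUÍSA", "Luiz"]:
    key = fold(candidate)
    if key in seen:
        print(f"skip {candidate!r}: already present under the folded key")
    else:
        seen.add(key)
        print(f"insert {candidate!r}")
```

Because this is only an approximation, it seems safer to keep the unique key in MySQL as the final check and treat the folded set as a way to avoid most of the doomed inserts.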

What about COLLATE utf8_bin?

It compares strings by the binary values of their characters (a strict comparison), so i and í are treated as different.
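To illustrate what a binary comparison means on the Python side, the sketch below compares the UTF-8 bytes of two strings, which is an assumption about how a binary collation such as utf8_bin decides equality. Under that assumption Python's plain equality lines up with the database, provided both sides use the same Unicode normalization form. Note that a binary collation is also case-sensitive. The `binary_equal` helper is a hypothetical name.

```python
import unicodedata


def binary_equal(a: str, b: str) -> bool:
    """Compare two strings by their encoded UTF-8 bytes."""
    return a.encode("utf-8") == b.encode("utf-8")


composed = "Lu\u00edsa"  # 'í' as a single code point
decomposed = unicodedata.normalize("NFD", composed)  # 'i' plus a combining acute

print(binary_equal("Luisa", composed))      # False: i and í differ bytewise
print(binary_equal(composed, decomposed))   # False: same text, different bytes
print(binary_equal(composed, unicodedata.normalize("NFC", decomposed)))  # True
```

Normalizing names to one form (for example NFC) before storing and before comparing avoids the second case.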

Paul T. Rawkeen