I have the following MySQL table:
mysql> show create table names;
+-------+-----------------------------------------------------+
| Table | Create Table |
+-------+-----------------------------------------------------+
| names | CREATE TABLE `names` (
`name` varchar(20) COLLATE utf8_unicode_ci NOT NULL,
UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci |
+-------+-----------------------------------------------------+
The table currently contains the following record:
mysql> select * from names;
+--------+
| name   |
+--------+
| Luísa  |
+--------+
Note that the entry is Luísa, with an accented 'í'. As you can see, the name column uses the utf8_unicode_ci collation. I have a Python script that loads names into this table. Because name has a unique key and utf8_unicode_ci treats i and í as equal, I'm unable to insert Luisa: MySQL considers it a duplicate of Luísa.
To check whether an entry is already present, my Python script first loads all the names from the table into a set and inserts a name only if it is not in that set. The problem is that Python treats i and í as different characters, so Luisa passes the check and the insert then fails in MySQL.
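Here is a minimal sketch of that check. The database query is replaced by a hard-coded set, and the variable names are only for illustration:

```python
# Names already in the table. In the real script these come from
# "SELECT name FROM names"; hard-coded here for illustration.
existing_names = {"Lu\u00edsa"}  # "Luísa", with U+00ED (í)

candidate = "Luisa"  # plain ASCII 'i' (U+0069)

# Python compares strings code point by code point, so 'i' and 'í'
# are different and the membership test finds no match.
is_new = candidate not in existing_names

print(is_new)  # the script would now attempt an INSERT that MySQL rejects
```

Under utf8_unicode_ci the two names are equal, so the unique key rejects the row even though the Python-side check passed.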
I read at http://www.cmlenz.net/archives/2008/07/the-truth-about-unicode-in-python that Python doesn't support collation, and that there is a Python implementation of the Unicode Collation Algorithm (UCA) written by James Tauber. However, that helps with sorting, not with checking whether two strings will be treated as equal by MySQL under the utf8_unicode_ci collation.
Is there a way in Python to compare these two strings the MySQL way?