Comparing string and unicode in Python 2.7.5

Question

I wonder why, when I define:

a = [u'k',u'ę',u'ą']

and then type:

'k' in a

I get True, while:

'ę' in a

will give me False?

It really gives me a headache, and it seems someone did this on purpose to drive people mad...

For what it's worth, this behaves as you expect in Python 3. — bbayles, Nov 14 '13 at 00:44
On my Python (2.7.2), this raises the warning `UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal` before returning `False`, which is the reason for it. Using `u'ę' in a` works as expected. — Waleed Khan, Nov 14 '13 at 00:45
I'm using python `2.7.15`, `'ę' in a` is True, which is strange... — Bin, Dec 24 '18 at 04:14

aIKid · Accepted Answer · 2014-03-17T22:07:48.387

And why is this?

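In Python 2.x, comparing a unicode object to a str makes Python decode the str with the default codec, which is ASCII. For non-ASCII characters that decode fails, so Python raises a warning and treats the two as unequal: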

Warning (from warnings module):
  File "__main__", line 1
UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal

In Python 3.x the warning does not appear, because every str is already Unicode text.
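
As a rough illustration of the failing step, the sketch below performs the ASCII decode explicitly instead of leaving it to the comparison. The byte values assume the str was typed in a UTF-8 terminal, and the variable names are only illustrative.

```python
# Sketch: reproduce by hand the ASCII decode that Python 2 attempts when a
# byte string is compared to a unicode object.
raw = b'\xc4\x99'    # bytes for e-ogonek, assuming a UTF-8 terminal
text = u'\u0119'     # the unicode object for the same character

try:
    raw.decode('ascii')          # the implicit step, done explicitly
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False             # non-ASCII bytes cannot be decoded as ASCII

# Decoding with the encoding the bytes were actually written in succeeds.
decoded_matches = raw.decode('utf-8') == text
print(ascii_ok, decoded_matches)
```

The same lines run on Python 2 and Python 3, because both sides are spelled with explicit `b''` and `u''` literals.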

Solution?

You can either make the string unicode:

>>> u'ę' in a
True

Now you're comparing two unicode objects, not a unicode object to a byte string.

Or encode both sides with the same encoding, for example UTF-8, before comparing:

>>> c = u"ç"
>>> u'ç'.encode('utf-8') == c.encode('utf-8')
True
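
A minimal sketch of normalizing values before comparing, here by decoding byte strings to unicode rather than encoding. `to_text` is a hypothetical helper, not a standard library function, and UTF-8 is an assumed input encoding.

```python
def to_text(value, encoding='utf-8'):
    """Return value as a unicode object, decoding it if it is a byte string."""
    if isinstance(value, bytes):   # on Python 2, bytes is an alias of str
        return value.decode(encoding)
    return value

a = [u'k', u'\u0119', u'\u0105']
needle = b'\xc4\x99'               # UTF-8 bytes for e-ogonek (assumed encoding)

# After decoding, the membership test compares unicode with unicode.
found = to_text(needle) in a
print(found)
```

Decoding once at the point where bytes enter the program tends to be simpler than encoding at every comparison, but either direction works as long as both sides end up the same type.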

Also, to use non-ASCII characters in your source file, you have to declare its encoding at the top of the file:

# -*- coding: utf-8 -*-

# the whole program

Hope this helps!

score 4 · Answer 2 · answered Nov 14 '13 at 00:47

You need to explicitly make the string unicode. The following shows an example, and the warning given when you do not specify it as unicode:

>>> a = [u'k',u'ę',u'ą']
>>> 'k' in a
True
>>> 'ę' in a
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False
>>> u'ę' in a
True

jordanm

@KulawyKrul See my answer for that – aIKid Nov 14 '13 at 01:01

score 1 · Answer 3 · answered Nov 14 '13 at 00:49

u'ę' is a unicode object, while 'ę' is a str object: a sequence of bytes in whatever encoding your terminal or source file uses. Depending on that encoding and on how Python is configured, the two may or may not compare equal.

One of the nice things about Python 3 is that all text is unicode, so this particular problem goes away.
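
To make the dependence on encoding concrete, this sketch encodes the same character two ways. The choice of UTF-8 and ISO-8859-2 is only an example of two encodings a terminal might use.

```python
text = u'\u0119'                            # e-ogonek as a unicode object

utf8_bytes = text.encode('utf-8')           # what a UTF-8 terminal would produce
latin2_bytes = text.encode('iso-8859-2')    # what an ISO-8859-2 terminal would produce

# One character, two different byte strings.
same_bytes = utf8_bytes == latin2_bytes

# Decoding with the matching codec recovers the original text.
round_trip = latin2_bytes.decode('iso-8859-2') == text
print(len(utf8_bytes), len(latin2_bytes), same_bytes, round_trip)
```

A plain 'ę' literal in Python 2 holds one of these byte strings, which is why its behaviour can differ from one environment to another.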

Ethan Furman

Seems I need to start using Python 3 immediately! :) Thanks! – Kulawy Krul Nov 14 '13 at 00:52

score 0 · Answer 4 · answered Nov 14 '13 at 00:53

Make sure that you declare the source code encoding and put a u prefix in front of unicode literals.

This works on both Python 2 and Python 3 (3.3 and later, which accept the u prefix again):

#!/usr/bin/python
# -*- coding: utf-8 -*-

a = [u'k',u'ę',u'ą']

print(u'ę' in a)
# True

dawg
