10

I wonder why, when I define:

a = [u'k',u'ę',u'ą']

and then type:

'k' in a

I get True, while:

'ę' in a

will give me False?

It really gives me a headache, and it seems someone did this on purpose to make people mad...

aIKid
Kulawy Krul
  • For what it's worth, this behaves as you expect in Python 3. – bbayles Nov 14 '13 at 00:44
  • On my Python (2.7.2), this raises the warning `UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal` before returning `False`, which is the reason for it. Using `u'ę' in a` works as expected. – Waleed Khan Nov 14 '13 at 00:45
  • @alKid, I just pasted it in my interpreter. – bbayles Nov 14 '13 at 00:45
  • Does the interpreter handle unicode input? – RyPeck Nov 14 '13 at 00:46
  • I'm using python `2.7.15`, `'ę' in a` is True, which is strange... – Bin Dec 24 '18 at 04:14

4 Answers

15

And why is this?

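In Python 2.x, you can't directly compare a unicode object to a byte string (str) that contains non-ASCII characters. Python tries to decode the byte string with the default ASCII codec, fails, treats the two values as unequal, and raises a warning: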

Warning (from warnings module):
  File "__main__", line 1
UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal

In Python 3.x the warning doesn't appear, because all string literals are unicode objects.
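
To make the failed conversion concrete, here is a minimal sketch of roughly what Python 2 does during the comparison, written so that it also runs on Python 3. The helper name `py2_style_equal` is hypothetical, and the sketch assumes the terminal or source file encodes 'ę' as UTF-8 and that the default codec is ASCII:

```python
# -*- coding: utf-8 -*-

def py2_style_equal(byte_string, text):
    """Mimic Python 2's str == unicode comparison (hypothetical helper)."""
    try:
        # Python 2 implicitly decodes the byte string with the default
        # codec, which is assumed to be ASCII here.
        return byte_string.decode('ascii') == text
    except UnicodeDecodeError:
        # Python 2 emits a UnicodeWarning at this point and treats
        # the two operands as unequal.
        return False

a = [u'k', u'ę', u'ą']

# Assumption: the typed characters arrive as UTF-8 bytes.
k_bytes = u'k'.encode('utf-8')   # a single ASCII byte
e_bytes = u'ę'.encode('utf-8')   # two non-ASCII bytes: 0xC4 0x99

k_found = any(py2_style_equal(k_bytes, item) for item in a)
e_found = any(py2_style_equal(e_bytes, item) for item in a)
print(k_found)  # True
print(e_found)  # False
```

The ASCII byte for 'k' decodes cleanly, so it matches; the bytes for 'ę' cannot be decoded as ASCII, so every comparison against them is reported as unequal.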

Solution?

You can either make the string unicode:

>>> u'ę' in a
True

Now you're comparing two unicode objects, not a unicode object to a byte string.

Or encode both sides to the same encoding, for example UTF-8, before comparing:

>>> c = u"ę"
>>> u'ę'.encode('utf-8') == c.encode('utf-8')
True
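
When the byte strings come from outside the program (a file, a socket, raw input), another option is to decode them once and then compare text to text. A minimal sketch, assuming the bytes are UTF-8; `to_text` is a hypothetical helper, not a standard library function:

```python
# -*- coding: utf-8 -*-

def to_text(value, encoding='utf-8'):
    """Return value as text, decoding byte strings (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

a = [u'k', u'ę', u'ą']
raw = b'\xc4\x99'  # the UTF-8 bytes for 'ę' (assumed input encoding)

found = to_text(raw) in a
print(found)  # True
```

Decoding at the boundary keeps every later comparison between objects of the same type, so no implicit conversion is involved.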

Also, to use non-ASCII characters in a Python 2 source file, you have to declare the encoding at the top of the file:

# -*- coding: utf-8 -*-

# the whole program

Hope this helps!

aIKid
4

You need to explicitly make the string unicode. The following shows an example, and the warning given when you do not specify it as unicode:

>>> a = [u'k',u'ę',u'ą']
>>> 'k' in a
True
>>> 'ę' in a
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False
>>> u'ę' in a
True
jordanm
1

u'ę' is a unicode object, while 'ę' is a str object: a byte string in whatever encoding your terminal or source file uses. Depending on your environment, the two will sometimes compare equal and sometimes not.

One of the nice things about Python 3 is that all text is unicode, so this particular problem goes away.
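
A short check of that claim, as a sketch meant for Python 3 only. Bytes and text are distinct types there, so a bytes value never matches a str in the list (UTF-8 is an assumed encoding for the bytes):

```python
a = ['k', 'ę', 'ą']

# Text compared with text: works without any u prefix in Python 3.
plain_match = 'ę' in a

# Bytes compared with text: never equal in Python 3, no implicit decoding.
bytes_match = 'ę'.encode('utf-8') in a

print(plain_match)  # True
print(bytes_match)  # False
```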

Ethan Furman
0

Make sure that you specify the source code encoding and put u in front of unicode literals.

This works on both Python 2 and Python 3 (3.3 or later, where the u prefix is accepted again):

#!/usr/bin/python
# -*- coding: utf-8 -*-

a = [u'k',u'ę',u'ą']

print(u'ę' in a)
# True
dawg