I wonder why, when I write:
a = [u'k',u'ę',u'ą']
and then type:
'k' in a
I get True, while:
'ę' in a
gives me False?
It really gives me a headache, and it almost seems someone did this on purpose to drive people mad...
And why is this?
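In Python 2.x, you can't directly compare a unicode object with a byte string (str) that contains non-ASCII characters. Python 2 tries to decode the str with the default ASCII codec, that fails, and you get this warning: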
Warning (from warnings module):
File "__main__", line 1
UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
However, in Python 3.x this doesn't appear, as all strings are unicode objects.
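To make the mechanism concrete, here is a rough model, written in Python 3, of what Python 2 does when it compares a byte string with a unicode object: it tries to decode the bytes with the ASCII codec and, if that fails, warns and treats the two as unequal. The helper name `py2_style_equal` is hypothetical, and this is only an approximation for illustration, not CPython's actual implementation.

```python
import warnings


def py2_style_equal(byte_str: bytes, text: str) -> bool:
    """Approximate Python 2's str == unicode comparison (hypothetical helper).

    Python 2 implicitly decodes the byte string with the default
    ASCII codec before comparing it with the unicode object.
    """
    try:
        decoded = byte_str.decode("ascii")
    except UnicodeDecodeError:
        # Non-ASCII bytes cannot be decoded: warn and report "not equal".
        warnings.warn(
            "Unicode equal comparison failed to convert both arguments "
            "to Unicode - interpreting them as being unequal",
            UnicodeWarning,
        )
        return False
    return decoded == text


a = ["k", "ę", "ą"]

# 'k' is plain ASCII, so the implicit decode succeeds and a match is found.
print(any(py2_style_equal(b"k", item) for item in a))

# 'ę' typed in a UTF-8 terminal arrives as two bytes, so decoding fails,
# a UnicodeWarning is emitted and no match is found.
print(any(py2_style_equal("ę".encode("utf-8"), item) for item in a))
```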
Solution?
You can either make the string unicode:
>>> u'ę' in a
True
Now you're comparing two unicode objects, not unicode to str.
Or encode both sides to the same encoding, for example UTF-8, before comparing:
>>> c = u"ę"
>>> u'ę'.encode('utf-8') == c.encode('utf-8')
True
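A related option, sketched here in Python 3, is to decode any byte string into text before the membership test so that both sides have the same type (in Python 2 with a UTF-8 source file, the one-line equivalent would be `'ę'.decode('utf-8') in a`). The helper `contains_text` and its UTF-8 default are assumptions for illustration; use whatever encoding the bytes were actually produced in.

```python
def contains_text(needle, items, encoding="utf-8"):
    """Return True if needle is in items, decoding bytes to text first.

    Hypothetical helper: assumes the needle and any byte-string items
    were all encoded with `encoding`.
    """
    if isinstance(needle, bytes):
        needle = needle.decode(encoding)
    # Normalize every item to text so that str is compared with str.
    decoded = [i.decode(encoding) if isinstance(i, bytes) else i for i in items]
    return needle in decoded


a = ["k", "ę", "ą"]
print(contains_text("ę".encode("utf-8"), a))  # True
print(contains_text("ę", a))                  # True
print(contains_text(b"x", a))                 # False
```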
Also, to use non-ASCII characters in a Python 2 source file, you have to declare its encoding at the top of the file (Python 3 assumes UTF-8 when no declaration is present):
# -*- coding: utf-8 -*-
#the whole program
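To see how such a coding line is picked up, Python 3's standard `tokenize.detect_encoding` applies the PEP 263 rules to the first lines of a source file. This sketch just feeds it a few in-memory sources; the helper name `declared_encoding` is hypothetical.

```python
import io
import tokenize


def declared_encoding(source: bytes) -> str:
    """Return the encoding Python would use to read `source` (hypothetical helper)."""
    encoding, _first_lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


with_cookie = "# -*- coding: utf-8 -*-\na = [u'k', u'ę', u'ą']\n".encode("utf-8")
latin2_cookie = b"# -*- coding: iso-8859-2 -*-\nprint(1)\n"
no_cookie = b"print(1)\n"

print(declared_encoding(with_cookie))    # utf-8
print(declared_encoding(latin2_cookie))  # iso-8859-2
print(declared_encoding(no_cookie))      # utf-8 (the Python 3 default)
```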
Hope this helps!
You need to explicitly make the string unicode. The following shows an example, and the warning given when you do not specify it as unicode:
>>> a = [u'k',u'ę',u'ą']
>>> 'k' in a
True
>>> 'ę' in a
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False
>>> u'ę' in a
True
u'ę' is a unicode object, while 'ę' is a str object (a byte string) encoded according to your current locale. Depending on the locale, the two will sometimes compare equal and sometimes not.
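A small Python 3 illustration of why the byte string depends on the environment: the same character encodes to different bytes under different encodings, while ASCII characters such as 'k' encode identically. The encodings chosen here are just common ones for Polish text.

```python
# Encode the same characters under a few encodings used for Polish text.
for ch in ["k", "ę", "ą"]:
    for enc in ["utf-8", "iso-8859-2", "cp1250"]:
        print(ch, enc, ch.encode(enc))

# ASCII characters encode identically everywhere, non-ASCII ones do not.
print("k".encode("utf-8") == "k".encode("cp1250"))  # True
print("ę".encode("utf-8") == "ę".encode("cp1250"))  # False

# Decoding with the wrong encoding yields different text
# (errors="replace" avoids an exception on undefined bytes).
misread = "ę".encode("utf-8").decode("cp1250", errors="replace")
print(misread == "ę")  # False
```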
One of the nice things about Python 3 is that all text is unicode, so this particular problem goes away.
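For comparison, here is the question's session under Python 3, where plain literals are already text. Bytes and text are still distinct types there: comparing them simply returns False without a warning (unless the interpreter is started with the -b flag).

```python
a = ["k", "ę", "ą"]

print("k" in a)                  # True
print("ę" in a)                  # True: both sides are str (text)
print("ę".encode("utf-8") in a)  # False: bytes never equal str in Python 3
```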
Make sure that you specify the source code encoding and put u in front of unicode literals.
This works on both Python 2 and Python 3 (3.3 and later, which accept the u prefix again):
#!/usr/bin/python
# -*- coding: utf-8 -*-
a = [u'k',u'ę',u'ą']
print(u'ę' in a)
# True