Transform unicode \xe9 to é (python 2.7)

Question

I'm trying to transform this unicode value:

string_value = u'd\xe9cid\xe9'

to

string_value = u'décidé'

I feel like I've tried everything:

decoded_str = string_value.decode('utf-8')

or

string_value = str(string_value)
decoded_str = string_value.encode('latin1').decode('utf-8')

or

string_value = string_value.decode('latin-1')

For this one, the result is:

d\xc3\xa9cid\xc3\xa9

I have the same result if I do:

string_value = string_value.encode('utf-8')
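For reference, here is a small sketch of what that encode('utf-8') call does. It is an illustration added for context and not one of the original attempts. encode() turns unicode text into UTF-8 bytes, and UTF-8 stores é (U+00E9) as the two bytes \xc3\xa9. That is where d\xc3\xa9cid\xc3\xa9 comes from, so encoding moves away from the text rather than towards it. The code uses only u'' literals and ASCII source, so it should run unchanged on Python 2.7 and Python 3.

```python
# The value from the question: "d", e-acute, "cid", e-acute (6 code points)
string_value = u'd\xe9cid\xe9'

# encode() goes from text to bytes; UTF-8 stores U+00E9 as two bytes, 0xC3 0xA9
encoded = string_value.encode('utf-8')
print(repr(encoded))

# decode() reverses the step and gives back the original unicode text
decoded = encoded.decode('utf-8')
```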

I've already read: How do I convert 'blah \xe9 blah' to 'blah é blah'

as well as: Why does Python print unicode characters when the default encoding is ASCII?

and: How do I convert a unicode to a string at the Python level?

EDIT:

My problem is that I need to work with the data. For example, if I have:

string_value = u'mai 2017 \u2013 Aujourd\u2019hui'

which is:

mai 2017 – Aujourd’hui

I want to do:

string_list = string_value.split('-')

But the result is:

[u'mai 2017 \u2013 Aujourd\u2019hui']

And I would like:

['mai 2017', 'Aujourd’hui']

NEW EDIT:

Thanks to your answers, I understand now that I was going in the wrong direction: \xe9 is the right representation of 'é', and that's not a problem. My real question is: why does json.loads() transform 'mai 2017 – Aujourd’hui' into 'mai 2017 \u2013 Aujourd\u2019hui'?

Why do you care how a string is represented in your source code? Does it not come out correctly when you `print` it? — Jongware, Mar 20 '18 at 10:38
string_value = string_value.encode('utf-8') is working for me in python2.7 — Rakesh, Mar 20 '18 at 10:38
`u'd\xe9cid\xe9'` already represents `u'décidé'`. You don't need to do anything. — deceze, Mar 20 '18 at 10:43
Thanks for your answer @usr2564301. The problem is that I need to format the data. For example, I have this unicode string u'mai 2017 \u2013 Aujourd\u2019hui', which is 'mai 2017 – Aujourd’hui', and I want to split it at '-', but it's not working. I'm going to edit my question with this example — PAscalinox, Mar 20 '18 at 10:43
Don't edit your question in a way that invalidates all existing answers. — deceze, Mar 20 '18 at 11:21

Florian Brucker · Answer 1 · score 2 · 2018-03-20T10:49:57.310

I am not sure what you're asking: \xe9 is a representation of the code point 233 (e9 in hexadecimal), which is simply the letter "é":

>>> u'é' == u'\xe9'
True
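The same point as a short, runnable check. This is a sketch that relies only on the standard \x, \u and \N{...} escapes of Python unicode literals. All three spellings denote a single code point, 233, and the value from the question is already the six-character accented word.

```python
# \xe9 inside a unicode literal is only an escape for code point 0xE9 == 233
assert ord(u'\xe9') == 233 == 0xe9

# Three spellings of the same one-character string (LATIN SMALL LETTER E WITH ACUTE)
assert u'\xe9' == u'\u00e9' == u'\N{LATIN SMALL LETTER E WITH ACUTE}'

# So the value from the question already holds the six letters of the accented word
word = u'd\xe9cid\xe9'
print(len(word))
```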

Your confusion might stem from the fact that in Python 2 the repr of a string is pure ASCII, so non-ASCII characters are escaped. The Python console displays a value using repr if you do not print it explicitly:

>>> print(repr(u'é'))
u'\xe9'

>>> print(repr(u'\xe9'))
u'\xe9'

However, when you print the value, that conversion doesn't happen and everything works as expected:

>>> print(u'é')
é

>>> print(u'\xe9')
é

Also note that in Python 3, repr no longer escapes printable non-ASCII characters:

Python 3.5.2 (default, Nov 23 2017, 16:37:01) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> print(repr(u'\xe9'))
'é'

Update after the question was edited:

As pointed out in the comments, \u2013 (EN DASH) is not the same character as - (HYPHEN-MINUS), just as a and b are separate characters. So you'll need to split on \u2013 instead of on -.
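A minimal sketch of that fix applied to the string from the question. The .strip() call is an addition here: it drops the spaces around the dash so the result matches the list the asker wanted.

```python
string_value = u'mai 2017 \u2013 Aujourd\u2019hui'

# u'-' is HYPHEN-MINUS (U+002D); the text contains EN DASH (U+2013) instead
assert u'-' not in string_value
assert u'\u2013' in string_value

# Split on the en dash, then strip the spaces that surrounded it
string_list = [part.strip() for part in string_value.split(u'\u2013')]
print(len(string_list))
```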

edited Mar 20 '18 at 10:49

answered Mar 20 '18 at 10:45

Florian Brucker

Thanks a lot @Florian for your answer, I've edited my question to better explain why it is a problem for me – PAscalinox Mar 20 '18 at 10:47
See my updates. – Florian Brucker Mar 20 '18 at 10:50
OK, thanks, I understand perfectly now, but I have a new question: why does json.loads() replace ' or - with \u? – PAscalinox Mar 20 '18 at 10:57
Please post additional questions as new posts if they are not closely related to the original question (which this one isn't). – Florian Brucker Mar 20 '18 at 10:58
@PAscalinox: indeed, why would it? Best guess is that it does *not* do so. – Jongware Mar 20 '18 at 20:30
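A sketch supporting that guess. The question does not show the actual JSON or code, so both input strings here are assumptions. json.loads() returns the same unicode value whether the JSON text spells the characters as \u escapes or writes them directly. Plausible sources of the \u sequences are looking at the repr of the result, or json.dumps(), which escapes non-ASCII characters by default (ensure_ascii=True).

```python
import json

# The same JSON string value written two ways (both are assumed inputs):
# with \u escapes, which JSON allows...
escaped_json = u'"mai 2017 \\u2013 Aujourd\\u2019hui"'
# ...and with the en dash and right single quotation mark written directly
literal_json = u'"mai 2017 \u2013 Aujourd\u2019hui"'

# json.loads() decodes both to one and the same unicode value
value = json.loads(escaped_json)
assert value == json.loads(literal_json) == u'mai 2017 \u2013 Aujourd\u2019hui'

# json.dumps() escapes non-ASCII characters by default (ensure_ascii=True)...
dumped_default = json.dumps(value)
# ...while ensure_ascii=False keeps them as real characters
dumped_plain = json.dumps(value, ensure_ascii=False)

print(dumped_default)  # ASCII only, so safe to print on any console
```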

score 0 · Accepted Answer · answered Mar 20 '18 at 10:55

Splitting a string on a unicode delimiter? Then pass that delimiter to split():

print string_value.split(u"\u2013")
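If the input might contain either a plain hyphen or a dash (an assumption; the question only shows the en dash), a regular expression can split on any of them at once and drop the surrounding spaces. SEPARATOR and split_on_dash are hypothetical names for this sketch.

```python
import re

# Assumption: the separator may be a hyphen (U+002D), an en dash (U+2013)
# or an em dash (U+2014), with optional whitespace around it.
# The hyphen is placed last in the character class so it is taken literally.
SEPARATOR = re.compile(u'\\s*[\u2013\u2014-]\\s*', re.UNICODE)


def split_on_dash(text):
    """Split text on any of the dash characters above, dropping surrounding spaces."""
    return SEPARATOR.split(text)


parts = split_on_dash(u'mai 2017 \u2013 Aujourd\u2019hui')
```

Note that this also splits values such as 2017-05, so it only fits data where dashes never appear inside a field.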

Chris Curvey
