Unicode handling in Python 2

Question

>>> cmd="echo ö"
>>> type(cmd)
<type 'str'>
>>> print cmd
echo ö
>>> chan.exec_command(cmd)

I am getting a string with some Unicode characters from an external application. How should I handle this string properly in my Python code? I get the exception below when I send it to Paramiko's exec_command method. Here chan is my Paramiko object.

'ascii' codec can't encode character u'\xfc' in position 136: ordinal not in range(128)

I think I need to encode or decode this string before sending it to Paramiko. I am new to Python, so any help would be really appreciated. This is the string I am adding, followed by the traceback:

X0A3549029:[u'Uni\xf3n de Cr\xe9', u'DemoModel', 'NA']
Traceback (most recent call last):
  File "updateTelemetry.py", line 98, in <module>
    query="insert into record_tmp(sn,cname,model,product) values('"+key+"','"+value[0].decode('utf8')+"','"+value[1]+"','"+value[2]+"')"
  File "/usr/lib64/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 3: ordinal not in range(128)

Your example is a byte string that could be any encoding. Byte strings don't return `can't encode` error messages since they are already encoded. What is the *actual* string and data type that generated that error? Please post an [MCVE](http://stackoverflow.com/help/mcve). — Mark Tolonen, Jun 09 '16 at 03:20
Ok, you need to slow down and learn about Unicode in Python 2. See my quick summary here: http://stackoverflow.com/questions/21129020/how-to-fix-unicodedecodeerror-ascii-codec-cant-decode-byte/35444608#35444608. The stacktrace you've pasted seems to bear no resemblance to the lines of code at the top. The best thing to do is to close this question and raise a new one with the full source, the input and the stacktrace given. — Alastair McCormack, Jun 11 '16 at 11:07

Günther Jena · Answer 1 · 2016-06-08T21:15:07.640

score 1

Use .decode('utf8') to turn it into a unicode type:

>>> cmd="echo ö"
>>> type(cmd)
<type 'str'>
>>> cmd_unicode=cmd.decode('utf8')
>>> type(cmd_unicode)
<type 'unicode'>

PS: Unicode handling differs between Python 2 and 3.

edited Jun 08 '16 at 21:15

answered Jun 08 '16 at 20:26


This is not working with python 2.7. Getting UnicodeDecodeError: invalid start byte – Apoorv Gupta Jun 09 '16 at 11:45
Tested on Python 2.7.11+ on 64-bit Ubuntu 16.04. Have you tested my given example or have you tried it on your code? – Günther Jena Jun 09 '16 at 11:48
Tested your example only. When tested on Windows, it didn't work. However, when tested on CentOS, it worked :) – Apoorv Gupta Jun 09 '16 at 11:51
`cmd_unicode=cmd.decode('utf8')` only works if `cmd` is `utf-8` encoded. In an interactive session on Windows, it will be codepage encoded. On a Unix box, it will be encoded to whatever your terminal emulator is configured for. Basically, you can't make any guarantees about encoding and you should use more robust ways to solve it – Alastair McCormack Jun 11 '16 at 11:13
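
A minimal sketch of a more defensive conversion, in the spirit of the comment above. It assumes the external application emits either UTF-8 or a Windows code page such as cp1252; both candidates are assumptions and should be replaced with the encodings your source really uses. The helper name to_text is hypothetical. It leaves values that are already Unicode untouched and decodes byte strings by trying each candidate encoding in turn:

```python
def to_text(value, encodings=('utf-8', 'cp1252')):
    """Return value as text, decoding byte strings with the first encoding that fits.

    The default encodings are an assumption, not something the question confirms.
    """
    if not isinstance(value, bytes):
        # Already text (unicode on Python 2, str on Python 3): nothing to decode.
        return value
    for encoding in encodings:
        try:
            return value.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: keep going, but mark undecodable bytes with U+FFFD.
    return value.decode(encodings[-1], 'replace')


utf8_cmd = to_text(b'echo \xc3\xb6')     # UTF-8 bytes for "echo ö"
cp1252_cmd = to_text(b'echo \xf6')       # cp1252 bytes for "echo ö"
unicode_cmd = to_text(u'echo \xf6')      # already Unicode, returned unchanged
```

Guessing encodings like this is only a fallback. If you know which encoding the external application uses, decode with exactly that one at the point where the data enters your program.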

score 1 · Answer 2 · answered Jun 12 '16 at 17:28

A UnicodeEncodeError raised while calling .decode() on Python 2 indicates that the input is already a Unicode string. Python therefore first tries to encode it using sys.getdefaultencoding(), which is ASCII by default on Python 2, before passing the result to the .decode() method.

Drop the .decode('utf8') call: value[0] is already Unicode.
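
As a sketch of the mechanism described above, the implicit step can be written out by hand. The snippet spells out the ASCII encode that Python 2 performs behind the scenes when .decode() is called on a unicode object, using the value from the question. The variable names message and cname are chosen for illustration only:

```python
value = u'Uni\xf3n de Cr\xe9'  # already Unicode, as in the question's data

# Calling .decode('utf8') on a unicode object in Python 2 behaves like
# value.encode('ascii').decode('utf8'). Written out explicitly, the
# encode step is the one that fails on the non-ASCII character.
try:
    value.encode('ascii').decode('utf8')
    message = None
except UnicodeEncodeError as exc:
    message = str(exc)

# The fix: use the Unicode value directly, with no .decode() call.
cname = value
```

The error message stored in message names the 'ascii' codec and position 3, matching the last line of the traceback in the question.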

Unrelated: do not use string formatting to create SQL queries; use parametrized SQL queries instead:

db.execute("insert into record_tmp(sn,cname,model,product) values(?,?,?,?)",
           [key] + value)

The placeholder syntax depends on the Python DB-API module that you use; for example, it could be %s instead of ?.
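
A self-contained sketch of the parametrized insert, using the standard-library sqlite3 module and an in-memory database purely for illustration. The question does not say which database is in use, and the column types in the create table statement are assumptions. sqlite3 uses the ? placeholder style:

```python
import sqlite3

# Sample data taken from the question's output.
key = u'X0A3549029'
value = [u'Uni\xf3n de Cr\xe9', u'DemoModel', 'NA']

db = sqlite3.connect(':memory:')  # throwaway database for the example
db.execute("create table record_tmp(sn text, cname text, model text, product text)")

# The driver handles quoting and Unicode conversion of the parameters,
# so no manual string concatenation or .decode() call is needed.
db.execute("insert into record_tmp(sn,cname,model,product) values(?,?,?,?)",
           [key] + value)
db.commit()

row = db.execute("select sn, cname, model, product from record_tmp").fetchone()
db.close()
```

Besides sidestepping the encoding error, passing the values as parameters also protects the query against SQL injection.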
