30

I'm working on a Python script that uses the scissor character (9986 - ✂) and I'm trying to port my code to Mac, but I'm running into this error.

The scissor character shows up fine when run from IDLE (Python 3.2.5 - OS X 10.4.11 iBook G4 PPC) and the code works entirely fine on Ubuntu 13.10, but when I attempt to run this in the terminal I get this error/traceback:

Traceback (most recent call last):
  File "snippets-convert.py", line 352, in <module>
    main()
  File "snippets-convert.py", line 41, in main
    menu()
  File "snippets-convert.py", line 47, in menu
    print ("|\t ",snipper.decode(),"PySnipt'd",snipper.decode(),"\t|")
UnicodeEncodeError: 'ascii' codec can't encode character '\u2702' in position 0: ordinal not in range(128)

and the code that is giving me the problem:

print ("|\t ",chr(9986),"PySnipt'd",chr(9986),"\t|")

Doesn't this signal that the terminal doesn't have the capability to display that character? I know this is an old system, but it is currently the only system I have to use. Could the age of the OS be interfering with the program?

I've read over these questions:

What's causing this error? Is it the age of the system/OS, the version of Python, or some programming error?

EDIT: This error crops up later with this duplicate issue (just thought I'd add it as it is within the same program and is the same error):

Traceback (most recent call last):
  File "snippets-convert.py", line 353, in <module>
    main()
  File "snippets-convert.py", line 41, in main
    menu()
  File "snippets-convert.py", line 75, in menu
    main()
  File "snippets-convert.py", line 41, in main
    menu()
  File "snippets-convert.py", line 62, in menu
    search()
  File "snippets-convert.py", line 229, in search
    print_results(search_returned)      # Print the results for the user
  File "snippets-convert.py", line 287, in print_results
    getPath(toRead)                                             # Get the path for the snippet
  File "snippets-convert.py", line 324, in getPath
    snipXMLParse(path)
  File "snippets-convert.py", line 344, in snipXMLParse
    print (chr(164),child.text)
UnicodeEncodeError: 'ascii' codec can't encode character '\xa4' in position 0: ordinal not in range(128)

EDIT:

I went into the terminal character settings and it does in fact support that character (screenshot not included here).

When I insert it into the terminal it prints out this: \342\234\202 and when I press Enter I get this: -bash: ✂: command not found

EDIT Ran commands as @J.F. Sebastian asked:

python3 test-io-encoding.py:

PYTHONIOENCODING:       None
locale(False):  US-ASCII
device(stdout): US-ASCII
stdout.encoding:        US-ASCII
device(stderr): US-ASCII
stderr.encoding:        US-ASCII
device(stdin):  US-ASCII
stdin.encoding: US-ASCII
locale(False):  US-ASCII
locale(True):   US-ASCII

python3 -S test-io-encoding.py:

PYTHONIOENCODING:       None
locale(False):  US-ASCII
device(stdout): US-ASCII
stdout.encoding:        US-ASCII
device(stderr): US-ASCII
stderr.encoding:        US-ASCII
device(stdin):  US-ASCII
stdin.encoding: US-ASCII
locale(False):  US-ASCII
locale(True):   US-ASCII

EDIT Tried the "hackerish" solution provided by @PauloBu:

As you can see, this caused one (Yay!) scissor, but I am now getting a new error. Traceback/error:

+-=============================-+
✂Traceback (most recent call last):
  File "snippets-convert.py", line 357, in <module>
    main()
  File "snippets-convert.py", line 44, in main
    menu()
  File "snippets-convert.py", line 52, in menu
    print("|\t "+sys.stdout.buffer.write(chr(9986).encode('UTF-8'))+" PySnipt'd "+ sys.stdout.buffer.write(chr(9986).encode('UTF-8'))+" \t|")
TypeError: Can't convert 'int' object to str implicitly

EDIT Added results of @PauloBu's fix:

+-=============================-+
|
✂ PySnipt'd 
✂       |
+-=============================-+

EDIT:

And his fix for his fix:

+-=============================-+
✂✂|       PySnipt'd     |
+-=============================-+
RPiAwesomeness
  • What output exactly do you get when using `.encode('UTF-8')`? –  Jan 04 '14 at 16:45
  • @delnan It returns: `b'\xe2\x9c\x82'` – RPiAwesomeness Jan 04 '14 at 16:46
  • Ah, of course. You need to output bytes then, but I'm not sure how to do that reliably and it would only solve the problem if the console is *actually* using UTF-8 and Python just doesn't realize that. –  Jan 04 '14 at 16:54
  • This answer is for Python 2 but it might help: http://stackoverflow.com/a/1169209/5987 – Mark Ransom Jan 04 '14 at 17:23
  • @MarkRansom Yes, looked at that. I plan to try some of it if I can... – RPiAwesomeness Jan 04 '14 at 17:26
  • @RPiAwesomeness I edited my answer with a little more info in case you haven't solved it yet. Good luck! – Paulo Bu Jan 04 '14 at 21:12
  • In my case the problem was caused by `export LC_ALL=C` in my `.bash_profile`. Changing it to `export LC_ALL=en_US.UTF-8` (and restarting the terminal) made the unicode error go away. – ccpizza Sep 28 '17 at 16:05

4 Answers

24

When Python prints an output, it automatically encodes it to the target medium. If it is a file, UTF-8 will be used as default and everyone will be happy, but if it is a terminal, Python will figure out the encoding the terminal is using and will try to encode the output using that one.

This means that if your terminal is using ASCII as its encoding, Python tries to encode the scissor character to ASCII. Of course, ASCII doesn't support it, so you get a UnicodeEncodeError.
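To see this on your own machine, here is a minimal sketch, assuming Python 3. It reports the encoding Python picked for stdout and checks whether the scissor character fits in a given codec. The helper name `can_encode` is made up for illustration, and nothing non-ASCII is printed, so it should run even on an ASCII-only terminal.

```python
import locale
import sys


def can_encode(text, encoding):
    """Return True if `text` can be represented in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


scissors = chr(9986)  # U+2702, the character from the question

# What Python chose for stdout, and what the locale reports.
print("stdout encoding: ", sys.stdout.encoding)
print("locale preferred:", locale.getpreferredencoding(False))

# The ASCII codec cannot represent U+2702; UTF-8 can.
print("ascii ok:", can_encode(scissors, "ascii"))
print("utf-8 ok:", can_encode(scissors, "utf-8"))
```

If the first two lines report US-ASCII (as in the question's test output), any `print` of a non-ASCII character will fail the same way.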

This is why you always have to explicitly encode your output. Explicit is better than implicit, remember? To fix your code you may do:

import sys
sys.stdout.buffer.write(chr(9986).encode('utf8'))

This seems a bit hackerish. You can also set PYTHONIOENCODING=utf-8 before executing the script. I'm uncomfortable with both solutions. Your console probably doesn't support UTF-8, so you may see gibberish, but your program will be behaving correctly.

What I strongly recommend, if you definitely need to show correct output on your console, is to set your console to use another encoding, one that supports the scissor character (UTF-8, perhaps). On Linux, that can be achieved by setting a UTF-8 locale, e.g. export LANG=en_US.UTF-8. On Windows you change the console's code page with chcp. Just figure out how to set UTF-8 in yours; IMHO that'll be the best solution.


You can't mix print and sys.stdout.buffer.write in one expression: write() returns the number of bytes written (an int), not a string, which is what caused the TypeError. Regarding your code, the hackerish way would be like this:
sys.stdout.buffer.write(("|\t "+ chr(9986) +" PySnipt'd " + chr(9986)+" \t|").encode('utf8'))
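The same idea can be wrapped in a small helper. This is only a sketch: `write_utf8` is a hypothetical name, and the demo writes to an in-memory `io.BytesIO` instead of the real terminal so that it runs anywhere. Note that `write()` on a binary stream returns a byte count.

```python
import io
import sys


def write_utf8(line, stream=None):
    """Encode `line` as UTF-8 and write the bytes to a binary stream.

    `stream` defaults to sys.stdout.buffer. Returns the number of
    bytes written, which is what the underlying write() returns.
    """
    if stream is None:
        stream = sys.stdout.buffer
    data = (line + "\n").encode("utf-8")
    return stream.write(data)


banner = "|\t {0} PySnipt'd {0} \t|".format(chr(9986))

# Demonstrated against an in-memory buffer rather than the terminal.
demo = io.BytesIO()
written = write_utf8(banner, demo)
print(written, "bytes written")
```

If you call this on the real `sys.stdout.buffer` in between ordinary `print()` calls, it is probably worth calling `sys.stdout.flush()` first. The text layer keeps its own buffer, so raw bytes can otherwise show up ahead of text that was printed earlier.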

I suggest you read the docs to see what's going on under the hood with the print function and with sys.stdout: http://docs.python.org/3/library/sys.html#sys.stdin

Hope this helps!

Paulo Bu
  • That makes sense. How do I fix it? Sweet! I will try this out! – RPiAwesomeness Jan 04 '14 at 16:52
  • @RPiAwesomeness added some more explanation. Let me know if you get it. – Paulo Bu Jan 04 '14 at 16:54
  • In Python 3, this just outputs bytes, so code like this will output the `str` `"b'\\xe2\\x9c\\x82'"`. –  Jan 04 '14 at 16:55
  • @delnan: Python **always** outputs bytes. The console reads the bytes and turns them into chars if it recognizes them. Do this test: `s=chr(97);s.encode('utf-8');` You will see an `a` printed on the console :) which is byte 97 :) – Paulo Bu Jan 04 '14 at 16:57
  • Yeah, the terminal is just displaying this: `| b'\xe2\x9c\x82' PySnipt'd b'\xe2\x9c\x82' |` – RPiAwesomeness Jan 04 '14 at 16:57
  • @PauloBu Python ultimately writes bytes to the console, but since `sys.stdout` is a TextIO thing, `print` will convert its arguments to unicode strings to write to this file object, and this object will encode the unicode strings in the encoding of the underlying byte stream and write those bytes to the stream. Also, your example does *not* work, `encode` always results in `bytes` and those are never implicitly decoded, so handing them to anything which does text I/O results in the repr of bytes (`b'...'`) to be printed. Please keep in mind that we're talking Python **3**. –  Jan 04 '14 at 17:00
  • @delnan you're right. Although it tested in my console: `python3 -c "s=chr(97);print(s.encode('437'))"` it printed `b'a'`. That wasn't what I expected. I'm not python3 user although I know the basics but seems to me this one failed. Sorry, then how would I explicitly ask for an output encoding in Python3? – Paulo Bu Jan 04 '14 at 17:04
  • *"you always have to explicitly encode your output"* is incorrect. You don't want to use scripts that are complete duplicates except for the output character encoding to be able to run them in different environments. – jfs Jan 04 '14 at 17:18
  • No, you can write the script output explicitly and use it everywhere. They are not mutually exclusive options. – Paulo Bu Jan 04 '14 at 17:22
  • It works! kinda. I'm getting one (Yay!) scissor, but I'm also getting a new error, I've edited my question. – RPiAwesomeness Jan 04 '14 at 21:26
  • I'm a little lost now. What is the error? The last one edited? – Paulo Bu Jan 04 '14 at 21:32
  • @RPiAwesomeness now I see. You can't mix writes and print. Take a look at the end of my answer, I'll edit your code. – Paulo Bu Jan 04 '14 at 21:36
  • Edited the question with the results of this, it has both of the scissors (YAY!!) but now the formatting is off. – RPiAwesomeness Jan 04 '14 at 22:43
  • @RPiAwesomeness Fixed. Added an `end` argument to the two first `print` functions. That should do it. – Paulo Bu Jan 04 '14 at 23:11
  • @PauloBu It works, again kinda. It gets the lines back to where they were before, but now the scissors are on top of each-other...edited my question w. output. – RPiAwesomeness Jan 05 '14 at 02:29
  • @RPiAwesomeness ok try again. Changed it for one line of code. – Paulo Bu Jan 05 '14 at 14:51
  • @PauloBu You should not *"always have to explicitly encode your output"* - this makes your code brittle and platform specific. It will not work everywhere - it does not render on my Windows terminal or my old Solaris box with a `C` locale. The solution is **to fix the terminal**. `export LANG=UTF_8` is also **not a solution** as it's not a proper locale for most people - it will result in the preferred encoding being `'US-ASCII'` and will result in a `UnicodeEncodeError` – Alastair McCormack Jul 18 '16 at 14:46
  • *"If it is a file, UTF-8 will be used as default and everyone will be happy,"*. wrong - Py3k uses your locale to determine the default encoding codec. It will only be "utf-8" if your locale is utf-8 based. – Alastair McCormack Jul 18 '16 at 14:48
  • @AlastairMcCormack Encoding has nothing to do with platform independency. If you don't explicitly encode your output it will be encoded anyway in whatever the default is (and you might not know which one and that's the source of errors). "The solution is to fix the terminal"... so, your program breaks and now is the terminal's fault :) – Paulo Bu Jul 18 '16 at 15:30
  • So you'd rather print UTF-8 to my console even if my console doesn't and never will support UTF-8? – Alastair McCormack Jul 18 '16 at 15:55
  • I get your point. But it's also a chicken egg problem. If the program *wants* to write UTF-8 only characters... changing the encoding won't do either, and your terminal will never support it then the program will never run correctly there. One solution could be ask what locale the medium is using and encode in that locale, but what if is using ASCII and it has some Latin-1 chars? I just wrote the answer (2 years ago) to help the OP, not as a generic solution. – Paulo Bu Jul 18 '16 at 15:59
  • :) I think the generic solution would be to try to `print` Unicode and catch any `UnicodeEncodeError` exceptions. The `repr()` of the string could then be printed. Just a thought :) – Alastair McCormack Jul 18 '16 at 16:04
  • That answer wouldn't have helped the OP but ... be my guest ;) – Paulo Bu Jul 18 '16 at 17:10
  • @PauloBu: it is wrong to print bytes and to suggest that it is necessary to use bytes, you should use Unicode instead and configure your environment appropriately. e.g., on Windows, Unicode API may be used and therefore bytes are never materialized in Python ([win-unicode-console, PEP 528](http://stackoverflow.com/a/32176732/4279)). `print(unicode_string)` works across platforms and different environments. – jfs Oct 06 '16 at 20:15
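The fallback floated in the comments above (try to print Unicode, catch `UnicodeEncodeError`, print an escaped form instead) could be sketched roughly as follows. This is not anyone's posted code: `safe_print` is a hypothetical helper, and it uses `ascii()` rather than `repr()`, because in Python 3 `repr()` leaves printable non-ASCII characters as they are and would fail on the same stream.

```python
import io


def safe_print(text, stream):
    """Write `text` to a text stream, falling back to ASCII escapes.

    If the stream's encoding cannot represent `text`, the escaped
    form produced by ascii() (quotes included) is written instead.
    """
    try:
        stream.write(text + "\n")
    except UnicodeEncodeError:
        stream.write(ascii(text) + "\n")
    stream.flush()


# Simulate an ASCII-only terminal with an in-memory stream.
raw = io.BytesIO()
ascii_stream = io.TextIOWrapper(raw, encoding="ascii", newline="\n")
safe_print(chr(9986), ascii_stream)
print(raw.getvalue())
```

With a real terminal you would pass `sys.stdout` as the stream. The output stays readable (as `'\u2702'`) rather than crashing, but the terminal still has to be fixed if you want to see actual scissors.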
17

test_io_encoding.py output suggests that you should change your locale settings e.g., set LANG=en_US.UTF-8.


The first error might be because you are trying to decode a string that is already Unicode. Python 2 tries to encode it using a default character encoding ('ascii') before decoding it using a (possibly) different character encoding. The error happens at the encode step:

>>> u"\u2702".decode() # Python 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2702' in position 0: ordinal not in range(128)

It looks like you are running your script using Python 2 instead of Python 3. Under Python 3 you would get a different error:

>>> "\u2702".decode() # Python 3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'

Just drop the .decode() call:

print("|\t {0} PySnipt'd {0} \t|".format(snipper))
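If it isn't clear whether `snipper` is `bytes` or `str` at that point, a small normalising helper avoids the guesswork. This is a sketch only: `to_text` is a made-up name, and UTF-8 is assumed as the encoding of the bytes.

```python
def to_text(value, encoding="utf-8"):
    """Return `value` as str, decoding it only if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


snipper = b"\xe2\x9c\x82"  # UTF-8 bytes for U+2702 (assumed input)
line = "|\t {0} PySnipt'd {0} \t|".format(to_text(snipper))

# ascii() keeps this demo safe to run on an ASCII-only terminal.
print(ascii(line))
```

Building the line this way works for both input types. Actually printing `line` still needs an output encoding that can represent the character.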

The second issue is due to printing a Unicode string into a pipe:

$ python3 -c'print("\u2702")'
✂
$ python3 -c'print("\u2702")' | cat
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\u2702' in position 0: ordinal not in range(128)

Set the PYTHONIOENCODING environment variable to a value appropriate for your purpose:

$ PYTHONIOENCODING=utf-8 python3 -c'print("\u2702")' | cat
✂
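The effect of PYTHONIOENCODING can also be checked from Python itself by running a child interpreter with its stdout piped and looking at the raw bytes. This is a rough sketch: it assumes Python 3.5 or newer for `subprocess.run`, and `run_child` is a hypothetical helper name.

```python
import os
import subprocess
import sys


def run_child(io_encoding):
    """Run a child Python that prints U+2702 into a pipe.

    The child gets PYTHONIOENCODING set to `io_encoding`; the raw
    bytes it wrote to stdout are returned.
    """
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", r'print("\u2702")'],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


# UTF-8 yields the three-byte sequence for the scissors; the
# backslashreplace error handler yields an ASCII escape instead.
print(run_child("utf-8"))
print(run_child("ascii:backslashreplace"))
```

The `encoding:errorhandler` form is handy on a system that really is ASCII-only, since the program keeps running and the output stays unambiguous.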

the terminal is just displaying this: | b'\xe2\x9c\x82' PySnipt'd b'\xe2\x9c\x82' |

If snipper is a bytes object then keep the snipper.decode() calls.

$ python3 -c"print(b'\xe2\x9c\x82'.decode())"
✂
$ python3 -c"print(b'\xe2\x9c\x82'.decode())" | cat
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\u2702' in position 0: ordinal not in range(128)

The fix is the same:

$ PYTHONIOENCODING=utf-8 python3 -c"print(b'\xe2\x9c\x82'.decode())" | cat
✂
jfs
  • Good answer, but it gives me the same thing as here: http://stackoverflow.com/questions/20923663/unicodeencodeerror-ascii-codec-cant-encode-character-in-position-0-ordinal/20923794?noredirect=1#comment31415491_20923794 – RPiAwesomeness Jan 04 '14 at 17:05
  • It's kind of overkill to set an environment variable. In Python 2.7 it is as simple as specifying an output encoding. I just realized that in Python 3 this won't work exactly the same. There's got to be another simpler (pythonic) way to do it. – Paulo Bu Jan 04 '14 at 17:09
  • @RPiAwesomeness: I've updated the answer for "`snipper` is a `bytes` object" case – jfs Jan 04 '14 at 17:14
  • @PauloBu: do you want to change the source code of your programs every time you run them in a terminal that has different character encoding? – jfs Jan 04 '14 at 17:15
  • `snipper.decode()` just gives me the same error as when I originally started. I edited my question with some new info. – RPiAwesomeness Jan 04 '14 at 17:18
  • @J.F.Sebastian I just try to make them general purpose. I normally stick to UTF-8 as output and if the console doesn't support it I just deal with gibberish. What I normally do is adapt the console encoding to the output encoding of the program. – Paulo Bu Jan 04 '14 at 17:20
  • @RPiAwesomeness: Could you update your question and include the results of running the code snippets from my answer e.g., what do you get if you run `python3 -c"print(b'\xe2\x9c\x82'.decode())"`? – jfs Jan 04 '14 at 17:24
  • @J.F.Sebastian by the way, if he **needs** to show the output correctly in the terminal (and not just because he's testing the program), then he actually needs to change his code, because if he doesn't he'll never get the program right, will he? – Paulo Bu Jan 04 '14 at 17:24
  • @J.F.Sebastian Sure thing. I get this: `UnicodeEncodeError: 'ascii' codec can't encode character '\u2702' in position 0: ordinal not in range(128)` – RPiAwesomeness Jan 04 '14 at 17:27
  • @RPiAwesomeness: How do you run it? Are you sure you copy-pasted it as is? I don't believe your terminal uses `ascii` as a character encoding. Have you changed `site.py` or `sitecustomize.py` or `usercustomize.py` modules? What do you get if you run: `PYTHONIOENCODING=ascii:backslashreplace python3 -c"print(b'\xe2\x9c\x82'.decode())"`? – jfs Jan 04 '14 at 17:39
  • @J.F.Sebastian I copy-pasted it directly, No I haven't used any of those modules. If I run that last command I get `\u2702` – RPiAwesomeness Jan 04 '14 at 17:50
  • @RPiAwesomeness: What is the output of `python3 -S test_io_encoding.py` where [`test_io_encoding.py`](https://gist.github.com/zed/5898423)? Is it different from mere `python3 test_io_encoding.py`? – jfs Jan 04 '14 at 18:54
  • @J.F.Sebastian Just edited my question with the output. – RPiAwesomeness Jan 04 '14 at 19:23
  • @RPiAwesomeness: what is the output of `locale` command? Try to set `LANG` environment variable e.g., `LANG=en_US.UTF-8` – jfs Jan 04 '14 at 19:31
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/44515/discussion-between-rpi-awesomeness-and-j-f-sebastian) – RPiAwesomeness Jan 04 '14 at 19:36
0

My locale is set to de_AT.UTF-8 but these lines in /etc/profile were missing:

export LANG=de_AT.UTF-8
export LANGUAGE=de_AT.UTF-8
export LC_ALL=de_AT.UTF-8

Log out and back in, and your problem should be solved.

To verify that all locales are set correctly, type locale in your terminal.

The output should be similar to this:

LANG=de_AT.UTF-8
LANGUAGE=de_AT.UTF-8
LC_CTYPE="de_AT.UTF-8"
LC_NUMERIC="de_AT.UTF-8"
LC_TIME="de_AT.UTF-8"
LC_COLLATE="de_AT.UTF-8"
LC_MONETARY="de_AT.UTF-8"
LC_MESSAGES="de_AT.UTF-8"
LC_PAPER="de_AT.UTF-8"
LC_NAME="de_AT.UTF-8"
LC_ADDRESS="de_AT.UTF-8"
LC_TELEPHONE="de_AT.UTF-8"
LC_MEASUREMENT="de_AT.UTF-8"
LC_IDENTIFICATION="de_AT.UTF-8"
LC_ALL=de_AT.UTF-8
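Which of those variables actually decides the character encoding follows the usual POSIX precedence: LC_ALL wins over LC_CTYPE, which wins over LANG. As an illustration only (both function names are made up, and the UTF-8 check just looks at the codeset suffix of the locale name), that lookup can be sketched as:

```python
def effective_ctype(env):
    """Return the locale name that decides the character encoding.

    Precedence: LC_ALL, then LC_CTYPE, then LANG. Unset or empty
    values are skipped; "C" is the fallback.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"


def looks_utf8(locale_name):
    """Rough check for a UTF-8 codeset suffix such as '.UTF-8'."""
    codeset = locale_name.partition(".")[2].partition("@")[0]
    return codeset.lower().replace("-", "") == "utf8"


# A stray LC_ALL=C overrides an otherwise correct LANG.
env = {"LANG": "de_AT.UTF-8", "LC_ALL": "C"}
print(effective_ctype(env), looks_utf8(effective_ctype(env)))
```

Passing `os.environ` instead of the sample dict shows what your own shell hands to Python. This also explains the `export LC_ALL=C` case mentioned in a comment on the question.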
Mike Mitterer
-4

On the first line of your .py file, add this string:

# -*- coding: utf-8 -*-

and you can also try this:

print ("|\t ",unichr(9986),"PySnipt'd",unichr(9986),"\t|")

archetipo