Happy examples:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

czech = u'Leoš Janáček'.encode("utf-8")
print(czech)

pl = u'Zdzisław Beksiński'.encode("utf-8")
print(pl)

jp = u'リング 山村 貞子'.encode("utf-8")
print(jp)

chinese = u'五行'.encode("utf-8")
print(chinese)

MIR = u'Машина для Инженерных Расчётов'.encode("utf-8")
print(MIR)

pt = u'Minha Língua Portuguesa: çáà'.encode("utf-8")
print(pt)

Unhappy output:

b'Leo\xc5\xa1 Jan\xc3\xa1\xc4\x8dek'
b'Zdzis\xc5\x82aw Beksi\xc5\x84ski'
b'\xe3\x83\xaa\xe3\x83\xb3\xe3\x82\xb0 \xe5\xb1\xb1\xe6\x9d\x91 \xe8\xb2\x9e\xe5\xad\x90'
b'\xe4\xba\x94\xe8\xa1\x8c'
b'\xd0\x9c\xd0\xb0\xd1\x88\xd0\xb8\xd0\xbd\xd0\xb0 \xd0\xb4\xd0\xbb\xd1\x8f \xd0\x98\xd0\xbd\xd0\xb6\xd0\xb5\xd0\xbd\xd0\xb5\xd1\x80\xd0\xbd\xd1\x8b\xd1\x85 \xd0\xa0\xd0\xb0\xd1\x81\xd1\x87\xd1\x91\xd1\x82\xd0\xbe\xd0\xb2'
b'Minha L\xc3\xadngua Portuguesa: \xc3\xa7\xc3\xa1\xc3\xa0'

And if I print them like this:

jp = u'リング 山村 貞子'
print(jp)

I get:

Traceback (most recent call last):
  File "x.py", line 5, in <module>
    print(jp)
  File "C:\Python34\lib\encodings\cp850.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position
0-2: character maps to <undefined>

I've also tried the following from this question (and other alternatives that involve sys.stdout.encoding):

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from __future__ import print_function
import sys

def safeprint(s):
    try:
        print(s)
    except UnicodeEncodeError:
        if sys.version_info >= (3,):
            print(s.encode('utf8').decode(sys.stdout.encoding))
        else:
            print(s.encode('utf8'))

jp = u'リング 山村 貞子'
safeprint(jp)

And things get even more cryptic:

リング 山村 貞子

And the docs were not very helpful.

So, what's the deal with Python 3.4, Unicode, different languages and Windows? Almost all the examples I could find deal with Python 2.x.

Is there a general and cross-platform way of printing ANY Unicode character from any language in a decent and non-nasty way in Python 3.4?

EDIT:

I've tried typing at the terminal:

chcp 65001

to change the code page, as proposed here and in the comments, and it did not work (including the attempt with sys.stdout.encoding).

Ericson Willians
  • [python3 print unicode to windows xp console encode cp437](http://stackoverflow.com/q/28521944) looks applicable, as does [python 3.0, how to make print() output unicode?](http://stackoverflow.com/q/507123). – Martijn Pieters May 29 '15 at 22:15
  • Your console is not configured for Unicode output; [CP850](http://en.wikipedia.org/wiki/Code_page_850) cannot handle all that much. – Martijn Pieters May 29 '15 at 22:16
  • I think the problem is not in Python but in the Windows console which will have only one code page, which is by default not a unicode one. Try `chcp 65001` to set it to UTF-8 code page. Taken from [Unicode characters in Windows command line](http://stackoverflow.com/questions/388490/unicode-characters-in-windows-command-line-how). – GolezTrol May 29 '15 at 22:16
  • I expect to get the same thing I've put inside the strings. – Ericson Willians May 29 '15 at 22:37
  • @Padraic, the problem is windoze, but the blame lies with compatibility requirements, a.k.a. [historical reasons](http://blogs.msdn.com/b/oldnewthing/archive/2005/03/08/389527.aspx). – TigerhawkT3 May 29 '15 at 23:39
  • The Windows console can print Unicode strings just fine (assuming the font supports it), using `WriteConsoleW`. But Python doesn't use that, see https://bugs.python.org/issue1602. – Philipp May 30 '15 at 22:17
  • I solved crash issues with the cmd console with that: right click on the top of the cmd console windows, on the tab `font` chose lucida console. – J. Does May 11 '17 at 20:11

2 Answers

Update: Since Python 3.6, the code example that prints Unicode strings directly should just work now (even without py -mrun).


Python can print text in multiple languages in Windows console whatever chcp says:

T:\> py -mpip install win-unicode-console
T:\> py -mrun your_script.py

where your_script.py prints Unicode directly e.g.:

#!/usr/bin/env python3
print('š áč')      # cz
print('ł ń')       # pl
print('リング')     # jp
print('五行')      # cn
print('ш я жх ё') # ru
print('í çáà')    # pt

All you need is to configure a font in your Windows console that can display the desired characters.

You could also run your Python script via IDLE without installing non-stdlib modules:

T:\> py -midlelib -r your_script.py

To write to a file/pipe, use PYTHONIOENCODING=utf-8 as @Mark Tolonen suggested:

T:\> set PYTHONIOENCODING=utf-8
T:\> py your_script.py >output-utf8.txt 

Only the last solution supports non-BMP characters such as 😒 (U+1F612 UNAMUSED FACE). py -mrun can write them, but the Windows console displays them as boxes even if the font supports the corresponding Unicode characters (though you can copy-paste the boxes into another program to get the characters).
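
As a rough illustration of what "non-BMP" means here, the sketch below checks whether a character lies above U+FFFF and counts how many UTF-16 code units it needs. The helper name is_non_bmp is hypothetical, and the idea that the two-unit (surrogate pair) form is what trips up the console is an assumption, not something established in this answer.

```python
def is_non_bmp(ch):
    """Return True if the single character lies outside the Basic Multilingual Plane."""
    return ord(ch) > 0xFFFF

face = '\U0001F612'  # UNAMUSED FACE, written as an escape so the source stays ASCII

# A non-BMP character needs two 16-bit code units in UTF-16 (a surrogate pair)
# and four bytes in UTF-8.
utf16_units = len(face.encode('utf-16-le')) // 2
utf8_bytes = len(face.encode('utf-8'))

print(is_non_bmp(face), utf16_units, utf8_bytes)
```

All the characters in the six example strings above are inside the BMP, so they each fit in a single UTF-16 code unit.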

jfs
  • How would you do the interactive versions? I guess Python is `python -i -m run`, but I cannot figure out ipython, even though it's stated on win-unicode-console's page that it's integrated. – hyperknot Aug 07 '15 at 21:48
  • @zsero: [the docs show several approaches](https://github.com/Drekin/win-unicode-console/tree/9652bb146379485d78f6c7534638f3701d651324#usage) e.g., `py -i -m run c:\path\to\ipython`. You could also use qtconsole interface or [a web-browser-based notebook](http://ipython.org/notebook.html). If it doesn't work for you; ask a separate question about what do you want to do with `ipython` and what fails exactly. – jfs Aug 07 '15 at 22:14
  • @eryksun: no. Notice that `py -mrun` is used. – jfs Aug 24 '15 at 06:15
  • @sebastian I guess I solved my issue with your help. Your answer is a bit confusing: as a Python 3.6 user I did not understand whether I should ignore or take into account what you wrote below it. If so, a note such as "for previous versions:" would make it clearer. Thanks for your patience! – JinSnow Jan 13 '17 at 20:36
  • "the default console on Windows will now accept all Unicode characters" **BUT** you need to configure the console: right click on the top of the windows (of the cmd or the python IDLE), in default/font choose the "Lucida console". – JinSnow Jan 13 '17 at 20:46
  • @Guillaume: Thank you for your feedback. 1- did you miss the word "Update" in bold and the horizontal rule? It says *"Since Python 3.6, the code example ..should just work now .."* How would you formulate it clearer? What should it say instead? 2- The answer says explicitly *"configure the font in your Windows console"*. *How* to configure the font is a separate question (the exact GUI steps might change between Windows versions). You don't need to configure the font in IDLE (it should work by default). – jfs Jan 13 '17 at 21:01
  • Lucida console doesn't support Chinese or Japanese either. – Mark Tolonen Jan 13 '17 at 23:38
  • @J.F.Sebastian thanks for your great help! So future users will have several version of the same informations. – JinSnow Jan 14 '17 at 11:57

The problem was (see Python 3.6 update below) with the Windows console, which supports an ANSI character set appropriate for the region targeted by your version of Windows. Python throws an exception by default when unsupported characters are output.
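
The default behaviour can be reproduced without a console at all. This is a minimal sketch, assuming code page 850 as in the question's traceback: it encodes the Japanese string the way print() would have to and records where the encoder gives up.

```python
text = 'リング 山村 貞子'

try:
    # Strict error handling is the default, just as it is for print() on a
    # console whose encoding is cp850.
    text.encode('cp850')
except UnicodeEncodeError as exc:
    # exc.start and exc.end delimit the first run of unencodable characters.
    failure = (exc.start, exc.end, text[exc.start:exc.end])
    print('cannot encode characters at positions', exc.start, 'to', exc.end - 1)
else:
    failure = None
```

The reported positions correspond to the "position 0-2" in the question's traceback: the first three characters have no mapping in that code page.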

Python can read an environment variable to output in other encodings, or to change the error handling default. Below, I've read the console default and changed the default error handling to print a ? instead of throwing an error for characters that are unsupported in the console's current code page.

C:\>chcp
Active code page: 437   # Note, US Windows OEM code page.

C:\>set PYTHONIOENCODING=437:replace

C:\>example.py
Leo? Janá?ek
Zdzis?aw Beksi?ski
??? ?? ??
??
?????? ??? ?????????? ????????
Minha Língua Portuguesa: çáà

Note the US OEM code page is limited to ASCII and some Western European characters.
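
The effect of the replace error handler can be previewed in pure Python. The following is a sketch only: the helper name console_preview is hypothetical, and it imitates the substitution by encoding and decoding with the code page rather than going through the console.

```python
def console_preview(text, codepage='cp437'):
    """Show how text would look after unsupported characters become '?'."""
    # errors='replace' substitutes '?' for every character the code page lacks.
    return text.encode(codepage, errors='replace').decode(codepage)

czech = console_preview('Leoš Janáček')
chinese = console_preview('五行')
pt = console_preview('Minha Língua Portuguesa: çáà')

# Compare against the first line of the console transcript above.
print(czech == 'Leo? Janá?ek')
```

Other handlers such as backslashreplace can be used the same way if a visible escape is preferable to a question mark.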

Below I've instructed Python to use UTF8, but since the Windows console doesn't support it, I redirect the output to a file and display it in Notepad:

C:\>set PYTHONIOENCODING=utf8
C:\>example >out.txt
C:\>notepad out.txt

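
If the script itself owns the output file, an alternative to setting PYTHONIOENCODING and redirecting is to open the file with an explicit encoding. This is a minimal sketch; the temporary directory and the file name out.txt are illustrative choices, not part of the original example.

```python
import os
import tempfile

lines = ['Leoš Janáček', 'Zdzisław Beksiński', '五行']

# Hypothetical location; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), 'out.txt')

# The explicit encoding makes the result independent of the console code page.
with open(path, 'w', encoding='utf-8') as f:
    for line in lines:
        f.write(line + '\n')

# Read the file back to confirm that nothing was lost.
with open(path, encoding='utf-8') as f:
    round_trip = f.read().splitlines()

print(round_trip == lines)
```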

On Windows, it's best to use a Python IDE that supports UTF-8 instead of the console when working with multiple languages. If you use only one language, select it as the system locale in the Region and Language control panel, and the console will support the characters of that language.

Update for Python 3.6

Python 3.6 now uses Windows Unicode APIs to write directly to the console, so the only limit is the console font's support of the characters. The following code works in a US Windows console. Because I have a Chinese language pack installed, it even displays the Chinese and Japanese text if the console font is changed. Even without the correct font, replacement characters are shown in the console, and cutting and pasting them into an environment such as this web page displays the characters correctly.

#!python3.6
#coding: utf8
czech = 'Leoš Janáček'
print(czech)

pl = 'Zdzisław Beksiński'
print(pl)

jp = 'リング 山村 貞子'
print(jp)

chinese = '五行'
print(chinese)

MIR = 'Машина для Инженерных Расчётов'
print(MIR)

pt = 'Minha Língua Portuguesa: çáà'
print(pt)

Output:

Leoš Janáček
Zdzisław Beksiński
リング 山村 貞子
五行
Машина для Инженерных Расчётов
Minha Língua Portuguesa: çáà
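
For scripts that must also run on older versions or with redirected output, it can help to check up front whether a given encoding can represent the text. A minimal sketch follows; the helper name can_encode is hypothetical, and falling back to 'utf-8' when the stream reports no encoding is an assumption made for illustration.

```python
import sys

def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

# Some replacement streams have no encoding attribute, or set it to None.
stdout_encoding = getattr(sys.stdout, 'encoding', None) or 'utf-8'

print(sys.version_info >= (3, 6), stdout_encoding)
print(can_encode('Minha', stdout_encoding))
```

A script could use such a check to decide between printing directly and writing to a UTF-8 file instead.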
Mark Tolonen
  • The Windows console can print arbitrary Unicode strings using `WriteConsoleW` (limited by font support and not handling non-BMP characters correctly, though). Python doesn't use that function; see https://bugs.python.org/issue1602 for some discussion. – Philipp May 30 '15 at 22:19
  • Python 3.6: you need to configure the console: right click on the top of the windows (of the cmd or the python IDLE), in default/font choose the "Lucida console". – JinSnow Jan 13 '17 at 20:52
  • @Guillaume That won't help for Chinese/Japanese. I installed the Chinese Language pack in Windows 10 and then new console fonts were available. The SimSun fonts looked good and supported all six of the languages above. – Mark Tolonen Jan 13 '17 at 23:36