
I am running Windows 7, and its console has been configured to use the Consolas font, which makes Unicode output possible. I have confirmed many times that the console can display Unicode with programs such as Far Manager: both Cyrillic and German äöü letters can be read in the same console, on the same line, without switching encodings.

Now about Python.

I am trying very hard, but I can't see Unicode in its output. By default, print(sys.stdout.encoding) prints cp866, and stdout cannot output any characters except ASCII and Cyrillic.

It gives me the following results:

print("Ля-ля äöüÄÖÜß")

UnicodeEncodeError: 'charmap' codec can't encode characters in position 6-12: character maps to <undefined>
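The error can be reproduced without a console at all. A minimal sketch, assuming the console really uses code page 866 as sys.stdout.encoding reports: encode the same string with Python's built-in cp866 codec, which has Cyrillic letters but no äöüÄÖÜß. The variable names are illustrative only.

```python
text = "Ля-ля äöüÄÖÜß"

try:
    # Roughly what print() does when sys.stdout.encoding is cp866
    text.encode("cp866")
    bad_span = None
except UnicodeEncodeError as exc:
    # exc.start/exc.end delimit the run of characters cp866 cannot represent
    bad_span = (exc.start, exc.end)
    print(exc)

# errors="replace" swaps each unencodable character for '?' instead of raising
replaced = text.encode("cp866", errors="replace").decode("cp866")
print(replaced)
```

The failing span starts at index 6, the first German letter, which matches the "position 6-12" in the traceback.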

print("Ля-ля äöüÄÖÜß".encode("utf-8"))

b'\xd0\x9b\xd1\x8f-\xd0\xbb\xd1\x8f \xc3\xa4\xc3\xb6\xc3\xbc\xc3\x84\xc3\x96\xc3\x9c\xc3\x9f'

OK, I've set the PYTHONIOENCODING environment variable in a batch file:

SET PYTHONIOENCODING=UTF-8

and got:

print(sys.stdout.encoding)
UTF-8

print("Ля-ля äöüÄÖÜß")
╨Ы╤П-╨╗╤П ├д├╢├╝├Д├Ц├Ь├Я
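The garbled line is what UTF-8 bytes look like when decoded as cp866: PYTHONIOENCODING changes which bytes Python writes, but the console still interprets them with its own code page (866 here). A minimal sketch reproducing the exact garbled output shown above:

```python
original = "Ля-ля äöüÄÖÜß"

# With PYTHONIOENCODING=UTF-8, Python writes these bytes to the console...
utf8_bytes = original.encode("utf-8")

# ...but a console on code page 866 renders each byte as one cp866 character
as_seen_in_console = utf8_bytes.decode("cp866")
print(as_seen_in_console)

# cp866 maps all 256 byte values, so the damage is reversible
assert as_seen_in_console.encode("cp866").decode("utf-8") == original
```

For example, "ä" is encoded as the two bytes C3 A4, which cp866 displays as "├" and "д".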

print("Ля-ля äöüÄÖÜß".encode("utf-8"))
b'\xd0\x9b\xd1\x8f-\xd0\xbb\xd1\x8f \xc3\xa4\xc3\xb6\xc3\xbc\xc3\x84\xc3\x96\xc3\x9c\xc3\x9f'

What to do?

Paul
  • The Windows console is notoriously difficult to print higher-codepoint Unicode values to. – Martijn Pieters Jul 27 '13 at 16:28
  • @Martijn Pieters: I am not sure what you mean by "higher codepoint" values. I need at least Russian and German, and the Windows console has proved that it CAN do it. – Paul Jul 27 '13 at 16:44
  • The problem is that your console codepage needs to be switched, but the only UTF-8 codepage Microsoft offers is cp65001, their idea of UTF-8, which is rather full of bugs. See http://alfps.wordpress.com/2011/11/22/unicode-part-1-windows-console-io-approaches/ for example. – Martijn Pieters Jul 27 '13 at 17:07
  • I used the term 'higher-codepoint Unicode values' to differentiate them from ASCII and Latin-1 codepoints; a surprising number of people do not count those as Unicode for some reason. – Martijn Pieters Jul 27 '13 at 17:08

1 Answer


Actually, there is a bug, of sorts, in the interaction between Python and the Windows console (see http://bugs.python.org/issue1602). It is possible to read and write Unicode in the Windows console using the C functions ReadConsoleW and WriteConsoleW instead of ReadConsole and WriteConsole. So one seemingly working solution is to write your own stdout and stdin objects that call ReadConsoleW and WriteConsoleW via ctypes. For output this works, but for input there is a problem: the Python interactive interpreter doesn't actually use sys.stdin to get input (though calling the input() function works); see http://bugs.python.org/issue17620.

Many people say that the problem lies with the Windows console. But you can actually type Unicode characters (if you have the proper keyboard layout) without any problem, and they are displayed correctly. You can even run a file called “∫.py” with some Unicode arguments; it runs correctly, and the arguments arrive intact as sys.argv strings.
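One way to check that sys.argv holds the right characters without depending on the console's code page is to print the arguments through ascii(), which escapes every non-ASCII character. A minimal sketch; show_args is a hypothetical helper name:

```python
import sys


def show_args(argv):
    # ascii() escapes non-ASCII characters, so the result is printable on any code page
    return [ascii(arg) for arg in argv]


if __name__ == "__main__":
    for line in show_args(sys.argv):
        print(line)
```

Running it as “∫.py” would print the script name as '\u222b.py', showing that the character itself reached Python intact.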

Update: I have built a Python package to deal with these issues. See https://github.com/Drekin/win-unicode-console and https://pypi.python.org/pypi/win_unicode_console. Install it with pip install win_unicode_console. It works, at least for me, on Python 3.4, Python 3.5, and Python 2.7.
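A minimal sketch of opting into the package only where it is likely to help. win_unicode_console.enable() is the entry point the package's README documents; the version check reflects that Python 3.6 (PEP 528) switched the Windows console streams to the wide-character API on its own. The helper name enable_unicode_console is hypothetical.

```python
import sys


def enable_unicode_console():
    """Enable win_unicode_console where it is needed; return True if enabled."""
    if sys.platform != "win32" or sys.version_info >= (3, 6):
        # Not Windows, or PEP 528 already makes the console use WriteConsoleW
        return False
    try:
        import win_unicode_console
    except ImportError:
        return False  # package not installed
    win_unicode_console.enable()
    return True


enable_unicode_console()
print("Ля-ля äöüÄÖÜß")
```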

user87690