
I failed to set the character encoding in the Python terminal on Windows. According to the official guide, it's a piece of cake:

# -*- coding: utf-8 -*-

OK, now testing:

print 'Русский'

It produces mojibake. What am I doing wrong?

P.S. The IDE is Visual Studio 2010, if it matters.

Arnthor

4 Answers


You should use a Unicode string:

print u'Русский'

or switch to Python 3, where string literals are Unicode by default.
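
A minimal sketch of the distinction, assuming Python 3 (where a plain literal behaves like Python 2's `u''`). The variable names are illustrative only.

```python
# str: a sequence of Unicode code points (what u'' gives you in Python 2)
text = 'Русский'
# bytes: an encoded byte sequence (what a plain '' literal holds in Python 2)
data = text.encode('utf-8')

print(type(text).__name__, type(data).__name__)  # str bytes
# Decoding with the same codec recovers the original text
print(data.decode('utf-8') == text)              # True
```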

JBernardo
  • `from __future__ import unicode_literals` enables Unicode literals on Python 2 too. Printing Unicode won't work by default if the console code page (chcp) can't represent the given Unicode characters or if the output is redirected (Python 2 uses `ascii` in this case). See [possible solutions](http://stackoverflow.com/a/29352343/4279). – jfs Dec 20 '15 at 11:44

It produces mojibake because '' is a bytestring literal in Python 2 (unless `from __future__ import unicode_literals` is used). You are printing UTF-8 bytes (the source code encoding) to a Windows console that uses a different character encoding (if you see mojibake, the encodings differ):

>>> print(u'Русский'.encode('utf-8').decode('cp866'))
╨а╤Г╤Б╤Б╨║╨╕╨╣
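
The one-liner above can be turned into a pair of helpers that both simulate the mojibake and undo it, assuming Python 3 and that the console really is cp866. The helper names `make_mojibake` and `undo_mojibake` are hypothetical; the repair works only because cp866 maps all 256 byte values, so no bytes are lost in the round trip.

```python
def make_mojibake(text, source_enc='utf-8', console_enc='cp866'):
    """Simulate source_enc bytes being displayed by a console that assumes console_enc."""
    return text.encode(source_enc).decode(console_enc)


def undo_mojibake(garbled, source_enc='utf-8', console_enc='cp866'):
    """Recover the original bytes with console_enc, then decode them correctly."""
    return garbled.encode(console_enc).decode(source_enc)


garbled = make_mojibake('Русский')    # same 14 characters as the output above
restored = undo_mojibake(garbled)     # back to the original 7 letters
```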

The solution is to print Unicode instead as @JBernardo suggested:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
print(u'Русский')

It works if the console encoding supports Cyrillic letters, e.g., if it is cp866.
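
Whether printing will work can be checked up front by trying to encode the text with the console encoding. A minimal Python 3 sketch; `can_represent` is a hypothetical helper, and the `'ascii'` fallback mirrors the Python 2 behaviour for redirected output mentioned in the comments, not a Python 3 default.

```python
import sys


def can_represent(text, encoding):
    """Return True if every character of text has a byte sequence in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# sys.stdout.encoding can be None if stdout was replaced; assume 'ascii' in that case
console_encoding = getattr(sys.stdout, 'encoding', None) or 'ascii'
print(console_encoding, can_represent('Русский', console_encoding))
```

For example, `can_represent('Русский', 'cp866')` is true, while `'cp1252'` (Western European) and `'ascii'` have no Cyrillic letters.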

If you want to redirect the output to a file, you could use the PYTHONIOENCODING environment variable to set the character encoding that Python uses for I/O:

Z:\> set PYTHONIOENCODING=utf-8
Z:\> python your_script.py > output.utf-8.txt
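
The same effect can be observed from Python by running a child interpreter with a pipe as stdout (like the `>` redirect) and `PYTHONIOENCODING` set. A sketch assuming Python 3.5+ for `subprocess.run`; the child script is an illustrative stand-in for `your_script.py`.

```python
import os
import subprocess
import sys

# Child script that prints 'Русский'; \u escapes keep the -c argument pure ASCII
child = "print('\\u0420\\u0443\\u0441\\u0441\\u043a\\u0438\\u0439')"

# Copy the current environment and force UTF-8 for the child's stdin/stdout/stderr
env = dict(os.environ, PYTHONIOENCODING='utf-8')
result = subprocess.run([sys.executable, '-c', child], env=env,
                        stdout=subprocess.PIPE, check=True)

# result.stdout holds the raw bytes the child wrote, so decode them as UTF-8
output = result.stdout.decode('utf-8').strip()
print(output == 'Русский')
```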

If you want to print Unicode characters that can't be represented in the console encoding (the OEM code page), you could install the win-unicode-console Python package:

Z:\> py -m pip install win_unicode_console
Z:\> py -m run your_script.py
jfs

Update: See J.F. Sebastian's answer for a better explanation and a better solution.

# -*- coding: utf-8 -*- sets the source file's encoding, not the output encoding.
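
A minimal Python 3 sketch of what the declaration actually controls: how the parser decodes the bytes of the source. Here the source is deliberately saved as cp1251 (an arbitrary Cyrillic code page chosen for illustration) and compiled from bytes; nothing about how `print` later encodes output is affected.

```python
# Source bytes encoded as cp1251, with a matching coding declaration on line 1
source = "# -*- coding: cp1251 -*-\ntext = 'Русский'\n".encode('cp1251')

namespace = {}
# compile() accepts bytes and uses the coding declaration to decode them
exec(compile(source, '<demo>', 'exec'), namespace)

# The literal was decoded correctly, even though the bytes are not UTF-8
print(namespace['text'] == 'Русский')  # True
```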

You have to encode the string, just before printing it, with exactly the same encoding that your terminal uses. In your case, I'm guessing that your code page is Cyrillic (cp866). Therefore:

print u'Русский'.encode("cp866")
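
A small sketch (assuming Python 3, where the `u''` prefix is accepted but optional) comparing the cp866 bytes the console expects with the UTF-8 bytes the source file contains. cp866 uses one byte per Cyrillic letter and UTF-8 uses two, which is why the console shows 14 garbled characters instead of 7 letters.

```python
text = u'Русский'  # u'' is required on Python 2, harmless on Python 3.3+

cp866_bytes = text.encode('cp866')  # one byte per Cyrillic letter
utf8_bytes = text.encode('utf-8')   # two bytes per Cyrillic letter

print(len(cp866_bytes), len(utf8_bytes))  # 7 14
```

As the comments below point out, hardcoding `'cp866'` ties the script to one console configuration; printing Unicode and letting Python pick the encoding is more robust.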
Eser Aygün
  • the code fails with *"UnicodeDecodeError: 'ascii' codec can't decode byte"* -- you forgot `u''` prefix to create a Unicode string. You should not hardcode the character encoding of your environment in your script. The environment may change. [Print Unicode instead](http://stackoverflow.com/a/29352343/4279) – jfs Mar 30 '15 at 17:23
  • Hm. I just tested it, and it turns out that you are right. It's been quite a while since I wrote this answer, so maybe something has changed? In any case, I'm updating the answer to redirect readers to yours. – Eser Aygün Dec 20 '15 at 11:12

In case anyone else lands on this page while searching: the easiest approach is to set the Windows terminal code page:

CHCP 65001

or, for PowerShell, start it with:

powershell.exe -NoExit /c "chcp.com 65001"

(from "Is there a Windows command shell that will display Unicode characters?")

lxx
  • In general, [`65001 != utf-8`](http://stackoverflow.com/questions/5419/python-unicode-and-the-windows-console#comment10878950_2013263). Though it might work in some cases. – jfs Mar 30 '15 at 17:24
  • In general, [changing `chcp` encoding in PowerShell is neither necessary nor sufficient.](http://stackoverflow.com/a/33959798/4279) – jfs Dec 20 '15 at 11:35