I used to think I had this whole encoding business pretty much figured out. Apparently I was wrong, because I can't explain what's happening here.

I was trying to use the tabulate module to print a nicely formatted table using

from tabulate import tabulate
s = tabulate([[1,2],[3,4]], ["x","y"], tablefmt="fancy_grid")
print(s)

in IPython 3.5.0's interactive console under Windows 10. I expected the result to be

╒═════╤═════╕
│   x │   y │
╞═════╪═════╡
│   1 │   2 │
├─────┼─────┤
│   3 │   4 │
╘═════╧═════╛

but instead, I got a

UnicodeEncodeError: 'charmap' codec can't encode character '\u2552' in position 0: character maps to <undefined>

Puzzled, I tried to find out where the problem was and looked at the repr of the string:

In [15]: s
Out[15]: '╒═════╤═════╕\n│   x │   y │\n╞═════╪═════╡\n│   1 │   2 │\n├─────┼─────┤\n│   3 │   4 │\n╘═════╧═════╛'

Hmm, all the characters can be displayed by the terminal (even the first one that triggered the error).

Just checking some details:

In [16]: sys.stdout.encoding
Out[16]: 'cp850'

In [17]: s.encode("cp850")
[...]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2552' in position 0: character maps to <undefined>

So which encoding is the terminal using? Python says that it's cp850, and it tells me that cp850 doesn't have that character (which is true: it's one of the cp437 characters that had to make room for accented letters in cp850), but I can see it in the terminal window!
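The claim about the two code pages can be checked without involving the console at all. The following is a minimal sketch; the helper name `encodable` is made up for illustration, and `Í` (`'\u00cd'`) is simply picked as an example of a character that exists in cp850 but not in cp437. The results are printed with `ascii()` so that the check itself cannot raise a `UnicodeEncodeError` on a cp850 console.

```python
def encodable(ch, codec):
    """Return True if `ch` can be encoded with `codec` (hypothetical helper)."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# '\u2552' is the box-drawing character from the traceback;
# '\u00cd' is an accented letter chosen only for comparison.
for ch in ("\u2552", "\u00cd"):
    support = {codec: encodable(ch, codec) for codec in ("cp437", "cp850")}
    # ascii() keeps the output printable regardless of the console code page.
    print(ascii(ch), support)
```

If the box-drawing character is reported as encodable only in cp437, then Python's error is correct for a cp850 stream, and the open question is only how the character reaches the screen anyway.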

To complicate things further, when using the native Python console instead of IPython, the error seems more understandable:

>>> s
'\u2552═══\u2564═══\u2555\n│ 1 │ 2 │\n├───┼───┤\n│ 3 │ 4 │\n\u2558═══\u2567═══\u255b'
>>> sys.stdout.encoding
'cp850'
>>> print(s)
Traceback (most recent call last):
[...]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2552' in position 0: character maps to <undefined>

So at least Python is consistent, but what's happening with IPython?

Tim Pietzcker
  • If you're seeing the cp437 characters, but Python is saying cp850, then Python is the one that's inconsistent. Find out what the console is actually set to (see for example [What encoding/code page is cmd.exe using](http://stackoverflow.com/questions/1259084/what-encoding-code-page-is-cmd-exe-using)). – Thomas Dickey Nov 27 '15 at 15:49
  • @ThomasDickey: I'm seeing *both* - a string like `'╒Í'` that contains characters unique to both charsets (`╒` only in `cp437` and `Í` only in `cp850`) is displayed correctly... – Tim Pietzcker Nov 27 '15 at 18:26
  • I can repeat this result in IPython. The `repr` in IPython should be the same as the `repr` running straight Python, but it's not. – Mark Tolonen Nov 28 '15 at 01:03
  • The Windows console is actually capable of displaying a wider range of characters using the Unicode APIs, but `print()` and `sys.stdout` still use the bytes API, which can only deal with characters in the active code page. The [win_unicode_console](https://pypi.python.org/pypi/win_unicode_console) package tries to get round this. I guess IPython is finding a different default encoding to use when you display `s` - try importing `IPython.utils.encoding.DEFAULT_ENCODING` to see what it has found. – Thomas K Nov 28 '15 at 14:33

1 Answer

IPython uses the OEM code page in interactive mode, like any other Python console program:

In [1]: '\u2552'
ERROR - failed to write data to stream: <_io.TextIOWrapper name='<stdout>' mode=
'w' encoding='cp850'>
Out[1]:

In [2]: !chcp
Active code page: 850

The result changes if pyreadline is installed (it enables colors in the IPython console among other things):

In [1]: '\u2552'
Out[1]: '╒'

In [2]: import sys

In [3]: sys.stdout.encoding
Out[3]: 'cp850'

In [4]: !chcp
Active code page: 850

Once pyreadline is installed, IPython's sys.displayhook writes the result to readline's console object, which uses the WriteConsoleW() Windows Unicode API. That API can print Unicode characters even if they cannot be encoded in the current code page (to see them, you might need to configure a TrueType font such as Lucida Console in the Windows console).
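Since `print()` still goes through `sys.stdout` and its code page, one possible way to avoid the traceback (not a way to get the glyphs on screen) is to escape whatever the stream cannot encode before printing. This is only a sketch of a workaround, not something IPython or pyreadline does; the helper name `to_console_safe` is hypothetical, and it relies on the standard `backslashreplace` error handler.

```python
import sys

def to_console_safe(text, encoding=None):
    """Replace characters that `encoding` cannot represent with \\uXXXX escapes.

    Hypothetical helper: defaults to the encoding reported by sys.stdout,
    falling back to ASCII if the stream does not report one.
    """
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    # Round-trip through bytes so every remaining character is encodable.
    return text.encode(encoding, errors="backslashreplace").decode(encoding)

# Printing the converted text cannot raise UnicodeEncodeError on this stream.
print(to_console_safe("\u2552\u2550\u2555"))
```

On a cp850 stream the unencodable corners come out as escape sequences while the characters cp850 does have are printed unchanged, so the table loses its shape but the program keeps running.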

jfs