
If a Unicode character (code point) that the Windows cmd terminal does not support, e.g. EN DASH "–", is printed with Python 3 in a Windows cmd terminal using:

print('\u2013')

Then an exception is raised:

UnicodeEncodeError: 'charmap' codec can't encode character '\u2013' in position 0: character maps to <undefined>

Is there a way to make print convert unsupported characters to, e.g., "?", or otherwise handle the print so that execution can continue?

EquipDev
  • Use [win-unicode-console](https://github.com/Drekin/win-unicode-console) to access the full range of the console font. A font such as Consolas or Courier New supports most characters in Western alphabets and typographic symbols. – Eryk Sun Mar 08 '16 at 09:57
  • [this answer supports all Unicode characters](http://stackoverflow.com/a/32176732/4279) – jfs Mar 08 '16 at 18:56

2 Answers


Update

There is a better way... see below.


There must be a better way, but this is all I can think of at the moment:

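import sys
print('\u2013'.encode(sys.stdout.encoding, errors='replace').decode(sys.stdout.encoding))

This uses encode() to encode the string to the console's encoding (sys.stdout.encoding), replacing characters that are not valid in that encoding with "?". Note that encode() with no encoding argument defaults to UTF-8 in Python 3, which can represent every character and so would replace nothing. The result is a bytes object, which is then decoded back to a str, preserving the replaced characters.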

Here is an example using a code point that is not valid in GBK encoding:

>>> s = 'abc\u3020def'
>>> print(s)
abc〠def
>>> s.encode(encoding='gbk')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'gbk' codec can't encode character '\u3020' in position 3: illegal multibyte sequence

>>> s.encode(encoding='gbk', errors='replace')
b'abc?def'
>>> s.encode(encoding='gbk', errors='replace').decode()
'abc?def'

>>> print(s.encode(encoding='gbk', errors='replace').decode())
abc?def
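
The round trip shown above can be wrapped in a small helper. The sketch below is only an illustration: the name `to_encodable` is hypothetical, and ASCII and cp437 stand in for whatever encoding the console actually uses. It also shows that the error handler decides what replaces an unsupported character: `replace` always produces "?", while the standard `backslashreplace` and `xmlcharrefreplace` handlers keep the code point visible.

```python
def to_encodable(text, encoding, errors="replace"):
    """Round-trip text through `encoding`, substituting what it cannot represent.

    `to_encodable` is a hypothetical helper name, not part of any library.
    """
    return text.encode(encoding, errors=errors).decode(encoding)


sample = "Hello\u2013world!"  # contains EN DASH, which ASCII and cp437 lack

replaced = to_encodable(sample, "ascii")                       # '?' substitution
escaped = to_encodable(sample, "ascii", "backslashreplace")    # Python-style escape
xml_ref = to_encodable(sample, "ascii", "xmlcharrefreplace")   # numeric XML reference
oem = to_encodable(sample, "cp437")                            # an OEM code page

print(replaced)
print(escaped)
print(xml_ref)
print(oem)
```

In a real script the encoding argument would more likely be `sys.stdout.encoding`, so that the substitution matches what the terminal can display.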

Update

So there is a better way, as mentioned by @eryksun in the comments. Once it is set up, there is no need to change any other code to get unsupported characters replaced. The code below demonstrates the before and after behaviour (I have set my preferred encoding to GBK):

>>> import os, sys
>>> print('\u3030')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'gbk' codec can't encode character '\u3030' in position 0: illegal multibyte sequence

>>> old_stdout = sys.stdout
>>> fd = os.dup(sys.stdout.fileno())
>>> sys.stdout = open(fd, mode='w', errors='replace')
>>> old_stdout.close()

>>> print('\u3030')
?
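
On Python 3.7 and later, a text stream's error handler can also be changed in place with `io.TextIOWrapper.reconfigure()`, which keeps the stream's existing encoding instead of reopening the file descriptor. The sketch below is an assumption-laden illustration, not part of the original answer: `make_lenient` is a hypothetical helper name, and an in-memory cp437 stream stands in for the console so the effect can be seen anywhere.

```python
import io


def make_lenient(stream, errors="replace"):
    """Switch a text stream to a forgiving error handler if it supports it.

    Hypothetical helper; returns False for streams without reconfigure().
    """
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        return False
    reconfigure(errors=errors)  # encoding is left unchanged
    return True


# Stand-in for a console that uses an OEM code page (assumed: cp437).
raw = io.BytesIO()
demo = io.TextIOWrapper(raw, encoding="cp437", errors="strict", newline="\n")

changed = make_lenient(demo)
print("Hello\u2013world!", file=demo)  # would raise with errors="strict"
demo.flush()
demo_output = raw.getvalue()
```

In a script, calling `make_lenient(sys.stdout)` once at startup would have a similar effect on every later print.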
mhawke
  • Neat. Do you know if there is a way to specify which character to use as the replacement (other than `?`)? – Nick Mar 08 '16 at 10:03
  • Or rebind `sys.stdout` to a new `io.TextIOWrapper` that uses the `replace` error handler, or set the environment variable `PYTHONIOENCODING=:replace`. – Eryk Sun Mar 08 '16 at 10:04
  • @eryksun: Thanks for that. I have added reopening of `sys.stdout` to the answer. – mhawke Mar 08 '16 at 11:14
  • The method with redirection of `sys.stdout` returns "û" when printing '\u2013'. – EquipDev Apr 19 '16 at 12:40

@eryksun's comment mentions setting the Windows environment variable:

PYTHONIOENCODING=:replace

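Note the ":" before "replace". The variable has the form encoding:errorhandler, and leaving the encoding part empty keeps the default encoding while changing only the error handler. This looks like a usable answer that does not require any changes in Python scripts using print.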

With the variable set, print('\u2013') results in:

?

and print('Hello\u2013world!') results in:

Hello?world!
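
The effect of the variable can be reproduced from Python itself by starting a child interpreter with a modified environment. This is only a sketch under one loud assumption: it sets `PYTHONIOENCODING=ascii:replace` rather than `:replace`, forcing ASCII so that the replacement is visible even where the default encoding could represent the en dash.

```python
import os
import subprocess
import sys

# Code run by the child interpreter; the raw string keeps \u2013 as an
# escape for the child to interpret.
child_code = r"print('Hello\u2013world!')"

# Assumption for the demo: force ASCII so the en dash is never encodable.
env = dict(os.environ, PYTHONIOENCODING="ascii:replace")

result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=env,
    stdout=subprocess.PIPE,
    check=True,
)

# strip() removes the trailing newline, whichever form the platform uses.
child_output = result.stdout.decode("ascii").strip()
print(child_output)
```

Without the variable, the same child would either print the en dash or fail with UnicodeEncodeError, depending on the encoding its stdout ends up with.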

EquipDev
  • If the purpose is to *display* characters that the OEM code page does not support in the Windows console, then you could use the `win-unicode-console` package (it doesn't require changing your Python script either). – jfs Mar 08 '16 at 18:59