How to print unsupported unicode characters on Windows cmd as e.g. "?" instead of raising exception?

Question

If a Unicode character (code point) that the Windows cmd console encoding does not support, e.g. EN DASH "–", is printed with Python 3 in a Windows cmd terminal using:

print('\u2013')

Then an exception is raised:

UnicodeEncodeError: 'charmap' codec can't encode character '\u2013' in position 0: character maps to <undefined>

Is there a way to make print convert unsupported characters to e.g. "?", or otherwise handle the print so that execution can continue?

Use [win-unicode-console](https://github.com/Drekin/win-unicode-console) to access the full range of the console font. A font such as Consolas or Courier New supports most characters in Western alphabets and typographic symbols. — Eryk Sun, Mar 08 '16 at 09:57
[this answer supports all Unicode characters](http://stackoverflow.com/a/32176732/4279) — jfs, Mar 08 '16 at 18:56

mhawke · Accepted Answer · 2016-03-08T11:12:22.110

Update

There is a better way... see below.

There must be a better way, but this is all I can think of at the moment:

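import sys
print('\u2013'.encode(sys.stdout.encoding, errors='replace').decode(sys.stdout.encoding))

This uses encode() to encode the string to the console's encoding (sys.stdout.encoding), replacing characters that are not valid in that encoding with ?. The encoding has to be passed explicitly: in Python 3, encode() with no encoding argument uses UTF-8, which can represent every code point and so replaces nothing. The result is a bytes object, which is then decoded back to str, preserving the replaced characters.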

Here is an example using a code point that is not valid in GBK encoding:

>>> s = 'abc\u3020def'
>>> print(s)
abc〠def
>>> s.encode(encoding='gbk')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'gbk' codec can't encode character '\u3020' in position 3: illegal multibyte sequence

>>> s.encode(encoding='gbk', errors='replace')
b'abc?def'
>>> s.encode(encoding='gbk', errors='replace').decode()
'abc?def'

>>> print(s.encode(encoding='gbk', errors='replace').decode())
abc?def
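The round trip above can be wrapped in a small helper. This is only a sketch: the name `replace_unencodable` is made up for illustration, and cp437 is assumed as a stand-in for a typical Western Windows console code page. It also shows that the standard error handlers `backslashreplace` and `xmlcharrefreplace` can be used when something other than "?" is wanted.

```python
def replace_unencodable(text, encoding, errors="replace"):
    """Round-trip text through `encoding`, substituting what it cannot encode.

    Hypothetical helper name; `errors` is any error handler accepted by str.encode().
    """
    return text.encode(encoding, errors=errors).decode(encoding)


sample = "Hello\u2013world!"  # EN DASH is not part of cp437

# "replace" substitutes a question mark for each unencodable character.
print(replace_unencodable(sample, "cp437"))                       # Hello?world!

# "backslashreplace" keeps the code point visible as an escape sequence.
print(replace_unencodable(sample, "cp437", "backslashreplace"))   # Hello\u2013world!

# "xmlcharrefreplace" emits a numeric character reference instead.
print(replace_unencodable(sample, "cp437", "xmlcharrefreplace"))  # Hello&#8211;world!
```

In a real script the encoding would normally be `sys.stdout.encoding` rather than a hard-coded code page.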

Update

So there is a better way as mentioned by @eryksun in comments. Once set up there is no need to change any code to effect unsupported character replacement. The code below demonstrates before and after behaviour (I have set my preferred encoding to GBK):

>>> import os, sys
>>> print('\u3030')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'gbk' codec can't encode character '\u3030' in position 0: illegal multibyte sequence

>>> old_stdout = sys.stdout
>>> fd = os.dup(sys.stdout.fileno())
>>> sys.stdout = open(fd, mode='w', errors='replace')
>>> old_stdout.close()

>>> print('\u3030')
?
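A variation on the same idea, following the `io.TextIOWrapper` suggestion from the comments, might look like the sketch below. The function names are hypothetical, cp437 is again assumed as an example console code page, and the demo writes to an in-memory buffer so that the result does not depend on the terminal it runs in.

```python
import io
import sys


def make_replacing_stream(binary_stream, encoding):
    """Wrap a binary stream in a text layer that writes '?' for unencodable characters."""
    return io.TextIOWrapper(binary_stream, encoding=encoding,
                            errors="replace", line_buffering=True)


def install_on_stdout():
    """Switch the real sys.stdout to errors='replace' (not called in this demo)."""
    if hasattr(sys.stdout, "reconfigure"):
        # Python 3.7+: change the error handler of the existing stream in place.
        sys.stdout.reconfigure(errors="replace")
    else:
        # Older versions: rebind sys.stdout to a new wrapper around the same buffer.
        sys.stdout = make_replacing_stream(sys.stdout.buffer, sys.stdout.encoding)


# Demo against an in-memory buffer standing in for a cp437 console.
buffer = io.BytesIO()
stream = make_replacing_stream(buffer, "cp437")
print("Hello\u2013world!", file=stream)
stream.flush()
captured = buffer.getvalue()
print(captured)  # b'Hello?world!' followed by the platform's line ending
```

Calling `install_on_stdout()` once at program start would then let every later `print` call go through unchanged.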

Neat. Do you know if there is a way to specify which character to use as the replacement (other than `?`)? — Nick, Mar 08 '16 at 10:03
Or rebind `sys.stdout` to a new `io.TextIOWrapper` that uses the `replace` error handler, or set the environment variable `PYTHONIOENCODING=:replace`. — Eryk Sun, Mar 08 '16 at 10:04
@eryksun: Thanks for that. I have added reopening of `sys.stdout` to the answer. — mhawke, Mar 08 '16 at 11:14
The method with redirection of `sys.stdout` returns "û" when printing '\u2013'. — EquipDev, Apr 19 '16 at 12:40

score 1 · Answer 2 · answered Mar 08 '16 at 10:24

A comment by @eryksun mentions setting the Windows environment variable:

PYTHONIOENCODING=:replace

Note the ":" before "replace". This looks like a usable answer that does not require any changes in Python scripts using print.

The print('\u2013') results in:

?

and print('Hello\u2013world!') results in:

Hello?world!
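The effect of the variable can be checked from Python by starting a child interpreter with it set. This is a sketch under one assumption that differs from the text above: it uses `ascii:replace` rather than `:replace`, so that the encoding is pinned and the result is the same on every machine. With `:replace` the encoding stays at the console default, so only characters missing from that particular code page are replaced. The helper name `run_child` is hypothetical.

```python
import os
import subprocess
import sys


def run_child(code, io_encoding):
    """Run `code` in a child Python with PYTHONIOENCODING set; return its stdout bytes."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run([sys.executable, "-c", code], env=env,
                            stdout=subprocess.PIPE, check=True)
    return result.stdout.strip()


# Raw string: the child interpreter, not this one, interprets the \u2013 escape.
child_code = r"print('Hello\u2013world!')"

replaced = run_child(child_code, "ascii:replace")
escaped = run_child(child_code, "ascii:backslashreplace")

print(replaced)  # b'Hello?world!'
print(escaped)   # b'Hello\\u2013world!'
```

The part after the colon is an ordinary codec error handler name, which is why `backslashreplace` works here as well.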

EquipDev

If the purpose is to *display* characters that the OEM code page does not support in the Windows console, then you could use the `win-unicode-console` package (it doesn't require changing your Python script either). – jfs Mar 08 '16 at 18:59
