Update
There is a better way... see below.
There must be a better way, but this is all I can think of at the moment:
import sys
print('\u2013'.encode(sys.stdout.encoding, errors='replace').decode(sys.stdout.encoding))
This uses encode() to encode the string to the console's encoding, replacing any character that is not valid in that encoding with ?. (Note that with no argument, encode() defaults to UTF-8, not your console encoding, so the target encoding should be passed explicitly.) The result is a bytes object, which decode() then converts back to a str, preserving the replacement characters.
Here is an example using a code point that is not valid in GBK encoding:
>>> s = 'abc\u3020def'
>>> print(s)
abc〠def
>>> s.encode(encoding='gbk')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'gbk' codec can't encode character '\u3020' in position 3: illegal multibyte sequence
>>> s.encode(encoding='gbk', errors='replace')
b'abc?def'
>>> s.encode(encoding='gbk', errors='replace').decode()
'abc?def'
>>> print(s.encode(encoding='gbk', errors='replace').decode())
abc?def
Update
So there is a better way, as mentioned by @eryksun in the comments. Once it is set up, no code changes are needed to get unsupported characters replaced. The code below demonstrates the behaviour before and after (I have set my preferred encoding to GBK):
>>> import os, sys
>>> print('\u3030')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'gbk' codec can't encode character '\u3030' in position 0: illegal multibyte sequence
>>> old_stdout = sys.stdout
>>> fd = os.dup(sys.stdout.fileno())
>>> sys.stdout = open(fd, mode='w', errors='replace')
>>> old_stdout.close()
>>> print('\u3030')
?
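Since Python 3.7, the dup/reopen dance is no longer needed: io.TextIOWrapper.reconfigure() can change the error handler of sys.stdout in place. A sketch, demonstrated on an in-memory GBK stream so the effect is visible regardless of your console's encoding:

```python
import io

# On Python 3.7+, this one line gives the same behaviour as the
# dup/reopen sequence above:
#     sys.stdout.reconfigure(errors='replace')

# Demonstration on an in-memory GBK-encoded stream:
stream = io.TextIOWrapper(io.BytesIO(), encoding='gbk')
stream.reconfigure(errors='replace')
stream.write('\u3030')   # U+3030 is not representable in GBK
stream.flush()
print(stream.buffer.getvalue())  # b'?'
```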