
I am getting an error saying that Python can't encode the character \u2002 when trying to print a block of text:

UnicodeEncodeError: 'charmap' codec can't encode character '\u2002' in position 355: character maps to <undefined>

What I don't understand is that, as far as I can tell, this is a valid Unicode character (the EN SPACE character), so I'm not sure why it isn't printing.

For reference, the content was read in using file_content = open(file_name, encoding="utf8")

Saeid
kyrenia

2 Answers


Works for me (on a Linux terminal):

>>> print("\u2002")

The output looks empty because EN SPACE is an invisible whitespace character.

If you are on Windows, however, you are likely using a 125x code page (such as cp1250) in your terminal, and that encoding cannot represent the character:

>>> "\u2002".encode("cp1250")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.4/encodings/cp1250.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character '\u2002' in position 0: character maps to <undefined>
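
To check what your own terminal does, a minimal sketch along these lines may help. The helper name `can_encode` is made up for illustration; it relies only on `sys.stdout.encoding` and `str.encode`, and it assumes a fallback of UTF-8 when stdout reports no encoding.

```python
import sys


def can_encode(text, encoding):
    """Return True if every character in text can be encoded with encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Encoding that print() uses for stdout; it can be missing or None when
# stdout is not a regular text stream, so fall back to UTF-8 (an assumption).
console_encoding = getattr(sys.stdout, "encoding", None) or "utf-8"

print("stdout encoding:", console_encoding)
print("EN SPACE printable here:", can_encode("\u2002", console_encoding))
```

If the second line reports False, printing text that contains that character raises the UnicodeEncodeError shown above.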
John La Rooy
  • Thanks. Maybe this is a follow-on question, so I apologize, but what is the easiest way of avoiding this? I just switched from Python 2, where I used to reload sys and set the default encoding to utf8 (I had just assumed Unicode and UTF-8 were the same). – kyrenia Nov 24 '15 at 22:59
  • @kyrenia, it's just a deficiency of the default Windows terminal. You can try changing the default code page: http://superuser.com/questions/269818/change-default-code-page-of-windows-console-to-utf-8 – John La Rooy Nov 24 '15 at 23:02
  • @JohnLaRooy the Windows terminal (command window) doesn't use the normal code page; it uses a legacy code page such as [cp437](https://en.wikipedia.org/wiki/Code_page_437). You can see which encoding Python is using with `sys.stdout.encoding`. – Mark Ransom Nov 24 '15 at 23:14
  • @kyrenia, the reload sys trick was never a good idea. See [Why sys.setdefaultencoding() will break code](https://anonbadger.wordpress.com/2015/06/16/why-sys-setdefaultencoding-will-break-code/). The easiest way to avoid the error is to use an IDE that supports UTF-8 output instead of the Windows console, or to try the [win-unicode-console](https://github.com/Drekin/win-unicode-console) package. – Mark Tolonen Nov 25 '15 at 02:18

There is no problem using that character in Unicode (as a Unicode string in Python). But when you write it out ("print it"), it has to be converted to bytes using some encoding. Some encodings don't support some characters, and the encoding you are using to print does not support that particular character.
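
As a rough illustration of that encoding step, the sketch below tries the same character against a few encodings. The function name `encode_report` and the particular list of encodings are just illustrative choices.

```python
EN_SPACE = "\u2002"


def encode_report(char, encodings):
    """Map each encoding name to the encoded bytes, or None if it cannot encode char."""
    report = {}
    for name in encodings:
        try:
            report[name] = char.encode(name)
        except UnicodeEncodeError:
            report[name] = None
    return report


report = encode_report(EN_SPACE, ["utf-8", "utf-16-le", "cp437", "cp850", "cp1252"])
for name, encoded in report.items():
    print(name, "->", encoded if encoded is not None else "not encodable")
```

The UTF encodings can represent every Unicode character, so they succeed; the legacy code pages only cover a limited set of characters each, and EN SPACE is not among them.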

Probably you are using the Windows console, which typically uses a code page like 850 or 437 that doesn't include this character.

There are ways to change the Windows console code page (chcp), or you could try IDLE or some other IDE.
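
If changing the console is not an option, one possible workaround is to degrade gracefully inside the script instead. This is only a sketch: `console_safe` is a hypothetical helper, and `sys.stdout.reconfigure` is available only on Python 3.7 and later.

```python
import sys


def console_safe(text, encoding):
    """Replace characters that encoding cannot represent with '?'."""
    return text.encode(encoding, errors="replace").decode(encoding)


# Option 1: sanitise the text for whatever encoding stdout reports
# (falling back to UTF-8 if it reports none, which is an assumption).
encoding = getattr(sys.stdout, "encoding", None) or "utf-8"
print(console_safe("before\u2002after", encoding))

# Option 2 (Python 3.7+): let stdout itself replace unencodable characters,
# so plain print() calls no longer raise UnicodeEncodeError.
if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(errors="replace")
```

Either way the unsupported character shows up as a question mark rather than stopping the program, so this trades fidelity for robustness.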

strubbly