How to print ê and other special characters available in ascii in Python for windows

Question

I would like to print an ê in Python for windows. When I am at the DOS prompt I can type alt+136 to get an ê, however when I try to do this in python for DOS (code page cp437 or after chcp 1252 to cp1252) I can't type alt+136 to get the ê character. Why is this?

print(chr(136)) correctly prints ê under code page cp437, but how can I open a unicode file with these characters:

Sokalâ€™, Lâ€™vivsâ€™ka Oblastâ€
BucureÅŸti, Romania
à¸‡'âŒ£'

and get it to print those characters instead of the below gobbledygook:

>>> import codecs
>>> f = codecs.open("unicode.txt", "r", "utf-8")
>>> f.read()
u"Sokal\xe2\u20ac\u2122, L\xe2\u20ac\u2122vivs\xe2\u20ac\u2122ka Oblast\xe2\u20ac\nBucure\xc5\u0178ti, Romania\n\xe0\xb8\u2021'\
xe2\u0152\xa3'\nThis text should be in \xe2\u20ac\u0153quotes\xe2\u20ac\\x9d.\nBroken text&hellip; it&#x2019;s ?ubberi?c!"

or even worse:

>>> f = codecs.open("unicode.txt", "r", "utf-8")
>>> print(f.read())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 6-7: character maps to <undefined>

The following

import codecs
f = codecs.open("unicode.txt", "r", "utf-8")
s = f.read()
print(s.encode('utf8'))

prints

Sokal├óΓé¼Γäó, L├óΓé¼Γäóvivs├óΓé¼Γäóka Oblast├óΓé¼
Bucure├à┼╕ti, Romania
├á┬╕ΓÇí'├ó┼Æ┬ú'
This text should be in ├óΓé¼┼ôquotes├óΓé¼\x9d.
Broken text&hellip; it&#x2019;s ?ubberi?c!

instead of

Sokalâ€™, Lâ€™vivsâ€™ka Oblastâ€
BucureÅŸti, Romania
à¸‡'âŒ£'

I'm using:

Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on win32

Is there some way of replacing the ê, etc. in the unicode string to rather be the printable ascii version of ê aka chr(136)?

Note that my question is about how to create a new non-Unicode, extended-ASCII string from the original UTF-8 text: one that replaces each non-printable character with the equivalent character in the ascii code page if one is available, or with a ? or something similar if no equivalent is available.

Your character is not ascii... You may read about that here : http://effbot.org/pyfaq/what-does-unicodeerror-ascii-decoding-encoding-error-ordinal-not-in-range-128-mean.htm — Richard, Sep 22 '15 at 23:54
That's exactly the question. How can I change the unicode characters to characters that are printable in ascii? So replace \xe2 with â in some way. — Superdooperhero, Sep 23 '15 at 00:03
unicode(s, "ascii") gives TypeError: decoding Unicode is not supported, so Richard's comment doesn't help — Superdooperhero, Sep 23 '15 at 00:33
possible duplicate of [Python, Unicode, and the Windows console](http://stackoverflow.com/questions/5419/python-unicode-and-the-windows-console) — roeland, Sep 23 '15 at 01:36
The last paragraph (recent edit) has many misconceptions. There are two string types in Python 2: `str` and `unicode`. What is your input? (If you don't know, run `type(your_input_string)`.) If the input is a file, what is its character encoding? (If you think it is not necessary to know the character encoding, read [The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)](http://www.joelonsoftware.com/articles/Unicode.html).) What is the desired result? (Another file, or do you just want to display the text in the console?) — jfs, Sep 23 '15 at 22:52

score 1 · Accepted Answer · edited May 23 '17 at 12:29

I see multiple questions; you've stumbled upon several common Unicode issues:

how to type ê? -- Alt+136 should work for cp437. Try Alt+234 for cp1252 (not tested):

>>> u'ê'.encode('cp437')
b'\x88'
>>> int('88', 16)
136
>>> u'ê'.encode('cp1252')
b'\xea'
>>> int('ea', 16)
234

how to print Unicode to the Windows console in Python? How to fix the UnicodeEncodeError: 'charmap' ... exception? -- follow the link
why do you get mojibake if you read text from a file? -- don't print bytes from the file; convert to Unicode first: io.open('unicode.txt', encoding=encoding).read()
why does the Python console display u'\u20ac' instead of €? And in reverse, how to display the ê Unicode character using only ascii printable characters, e.g., u'\xea'? -- The Python REPL uses the (customizable) sys.displayhook() function to display the result of a Python expression. It calls repr(), e.g.:
```
>>> print u'ê'
ê
>>> print repr(u'ê')
u'\xea'
>>> u'ê'
u'\xea'
```
u'\xea' is a text representation of the corresponding Unicode string. You can use it as a Unicode string literal, to create the string in Python source code.
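
To illustrate the mojibake item above, here is a minimal sketch written to run on both Python 2.7 and Python 3. The temporary file and its one-word content are stand-ins invented for the illustration (not the asker's actual unicode.txt), and `code_points` is a hypothetical helper. It suggests why UTF-8 bytes sent unchanged to a cp437 console look like garbage, and how decoding on read avoids that:

```python
from __future__ import print_function
import io
import os
import tempfile


def code_points(text):
    """Hypothetical helper: render a Unicode string as ASCII-only code points."""
    return " ".join("U+%04X" % ord(ch) for ch in text)


# Stand-in UTF-8 file: "L'viv" spelled with U+2019 (right single quotation mark).
path = os.path.join(tempfile.mkdtemp(), "unicode.txt")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"L\u2019viv")

# Read the raw bytes and interpret them the way a cp437 console would.
with io.open(path, "rb") as f:
    raw = f.read()
mojibake = raw.decode("cp437")

# Decode as UTF-8 while reading: the program now holds the intended text.
with io.open(path, encoding="utf-8") as f:
    text = f.read()

print(code_points(mojibake))  # U+004C U+0393 U+00C7 U+00D6 U+0076 U+0069 U+0076
print(code_points(text))      # U+004C U+2019 U+0076 U+0069 U+0076
```

The three bytes of U+2019 (E2 80 99) become three unrelated cp437 characters, which is the same effect as the `Γé¼Γäó` runs in the question's output.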

It might not be necessary in your case, but in general, to input/display arbitrary Unicode characters in the Windows console, you could install the win-unicode-console package.
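
If a lossy result is acceptable, as the question suggests, one possible fallback is to encode with the `"replace"` error handler, which substitutes `?` for every character the target code page lacks. This is only a sketch: `console_safe` is a hypothetical helper name, the sample string is made up, and falling back to `"ascii"` when the stream reports no encoding is an assumption of this example.

```python
from __future__ import print_function
import sys


def console_safe(text, encoding=None):
    """Return text with characters missing from `encoding` replaced by '?'."""
    if encoding is None:
        # Assumption: use the console's reported encoding, else plain ASCII.
        encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(encoding, "replace").decode(encoding)


text = u"Bucure\u0219ti \xea"  # s with comma below (U+0219), e with circumflex (U+00EA)

# cp437 has the e with circumflex (byte 0x88) but not U+0219, which becomes '?'.
raw_cp437 = text.encode("cp437", "replace")
assert raw_cp437 == b"Bucure?ti \x88"
assert console_safe(text, "cp437") == u"Bucure?ti \xea"

# Printing the sanitized text cannot raise UnicodeEncodeError for this console.
print(console_safe(text))
```

Because the round trip goes through the console's own encoding, the string that reaches `print` contains only characters that encoding can represent.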

Unrelated: print(chr(136)) is incorrect. It will produce the wrong output if the environment uses a character encoding that is incompatible with yours, e.g.:

>>> print chr(136)
�

Print Unicode instead:

>>> print unichr(234)
ê

The reason is that chr() returns a bytestring on Python 2. The same byte may represent different characters in different character encodings, which is why you should always use Unicode when you work with text.
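
The ambiguity can be checked directly. The following sketch (runnable on Python 2.7 and Python 3; the choice of the three code pages is just for illustration) decodes the single byte 136 under different encodings and prints the resulting code points:

```python
from __future__ import print_function

raw = b"\x88"  # the one byte that chr(136) returns on Python 2

decoded = {}
for encoding in ("cp437", "cp1252", "cp866"):
    decoded[encoding] = raw.decode(encoding)
    print(encoding, "U+%04X" % ord(decoded[encoding]))
# cp437 U+00EA   (e with circumflex)
# cp1252 U+02C6  (modifier letter circumflex accent)
# cp866 U+0418   (Cyrillic capital letter I)

# In the other direction, one Unicode character maps to different bytes.
assert u"\xea".encode("cp437") == b"\x88"
assert u"\xea".encode("cp1252") == b"\xea"
```

Only the Unicode code point U+00EA identifies ê unambiguously; the byte value depends on which code page is in effect.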

score -1 · Answer 2 · answered Sep 22 '15 at 23:54

You decoded it from utf8 when you read it, so you need to encode it when you write it (back to utf8, or to some other codec):

import codecs
f = codecs.open("unicode.txt", "r", "utf-8")
s = f.read()
print(s.encode('utf8'))

answered Sep 22 '15 at 23:54

Chad S.
