I would like to print an ê in Python on Windows. At the DOS prompt I can type Alt+136 to get an ê, but inside the Python interpreter running in that same console (code page cp437, or cp1252 after running chcp 1252) typing Alt+136 does not produce the ê character. Why is this?
print(chr(136))
correctly prints ê under code page cp437, but how can I open a unicode file with these characters:
Sokal’, L’vivs’ka Oblastâ€
BucureÅŸti, Romania
ง'⌣'
and get it to print those characters instead of the below gobbledygook:
>>> import codecs
>>> f = codecs.open("unicode.txt", "r", "utf-8")
>>> f.read()
u"Sokal\xe2\u20ac\u2122, L\xe2\u20ac\u2122vivs\xe2\u20ac\u2122ka Oblast\xe2\u20ac\nBucure\xc5\u0178ti, Romania\n\xe0\xb8\u2021'\
xe2\u0152\xa3'\nThis text should be in \xe2\u20ac\u0153quotes\xe2\u20ac\\x9d.\nBroken text… it’s ?ubberi?c!"
or even worse:
>>> f = codecs.open("unicode.txt", "r", "utf-8")
>>> print(f.read())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 6-7: character maps to <undefined>
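The positions in that error can be reproduced without involving the console at all. This is a minimal sketch, assuming the decoded text starts with the eight characters shown in the repr output earlier; the names `text` and `failed_at` are only illustrative, and the snippet is written to run on both Python 2.7 and Python 3.

```python
# Sketch only: reproduce the encode failure without printing to the console.
# `text` holds the first eight characters of the decoded file (an assumption
# taken from the repr output shown earlier in the question).
text = u"Sokal\xe2\u20ac\u2122"

try:
    text.encode("cp437")
    failed_at = None
except UnicodeEncodeError as exc:
    # exc.end is exclusive, so the last failing index is exc.end - 1
    failed_at = (exc.start, exc.end - 1)

print(failed_at)
```

Characters 6 and 7 are U+20AC and U+2122, neither of which has a slot in cp437, which would explain the "position 6-7" in the traceback. The \xe2 at index 5 does exist in cp437, so it is not part of the reported range.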
The following
import codecs
f = codecs.open("unicode.txt", "r", "utf-8")
s = f.read()
print(s.encode('utf8'))
prints
Sokal’, L’vivs’ka Oblastâ€
BucureÅŸti, Romania
ง'⌣'
This text should be in “quotesâ€\x9d.
Broken text… it’s ?ubberi?c!
instead of
Sokal’, L’vivs’ka Oblastâ€
BucureÅŸti, Romania
ง'⌣'
I'm using:
Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on win32
Is there some way of replacing the ê, etc. in the Unicode string with the printable code-page version of that character, e.g. chr(136) for ê under cp437?
Note that my question is about how to create a new non-Unicode, extended-ASCII string from the original UTF-8 text: characters that have an equivalent in the console code page should be converted to it, and characters that have no equivalent should be replaced with a ? or something similar.
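To make the kind of conversion being asked about concrete, here is a rough sketch, not a confirmed solution. It assumes the target is a byte string in the console code page and relies on the standard "replace" error handler of the encode method; the helper name `to_code_page` and the sample string are hypothetical. It is written to run on both Python 2.7 and Python 3.

```python
def to_code_page(text, code_page="cp437"):
    """Encode unicode text into the given code page.

    Characters with no equivalent in the code page come out as '?'
    because of the "replace" error handler.
    """
    return text.encode(code_page, "replace")


# Hypothetical sample: e-circumflex, a space, and the euro sign.
sample = u"\xea \u20ac"

as_cp437 = to_code_page(sample)             # euro sign has no cp437 slot
as_cp1252 = to_code_page(sample, "cp1252")  # both characters exist in cp1252

print(repr(as_cp437))
print(repr(as_cp1252))
```

Under cp437 the ê ends up as byte 136 and the euro sign becomes a question mark, while under cp1252 both characters have their own byte values, so the result depends on which code page the console is actually using.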