
I'm processing the following XML file:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<tag>…</tag>

I parse it the way the Python documentation describes:

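# cElementTree is deprecated since Python 3.3 and removed in 3.9;
# plain ElementTree uses the C accelerator automatically
import xml.etree.ElementTree as ET

tree = ET.parse('file.xml')
print(tree.getroot().text)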

Unfortunately, I get the following error:

Traceback (most recent call last):
  File "main.py", line 48, in <module>
    print(tree.getroot().text)
  File "C:\Python33\lib\encodings\cp852.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2026' in position 0: character maps to <undefined>

What am I doing wrong?

Dejwi
Your console codec cannot handle the horizontal ellipsis character. ElementTree is doing its job just fine and has decoded the XML content to a Unicode value. `print()`, however, needs to *encode* the character again to match your console encoding, and your Windows codepage cannot represent this specific character. – Martijn Pieters Jan 14 '14 at 10:47
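To make the comment's point concrete, here is a minimal sketch that reproduces the failure without depending on the real console. It parses an in-memory copy of the XML (assuming the `…` in the file is a single U+2026 character, as the traceback suggests) and then encodes the result with `cp852`, the codec named in the traceback. The names `xml_bytes`, `text`, `cp852_ok` and `utf8_bytes` are illustrative only.

```python
import xml.etree.ElementTree as ET

# In-memory stand-in for file.xml; assumes <tag> holds a single U+2026 character
xml_bytes = '<?xml version="1.0" encoding="UTF-8"?><tag>\u2026</tag>'.encode('utf-8')

# Step 1: parsing decodes the UTF-8 bytes into a Python str, and succeeds
text = ET.fromstring(xml_bytes).text
print(ascii(text))  # ascii() escapes non-ASCII, so this print is safe anywhere

# Step 2: print() on a cp852 console performs this encoding, which fails
try:
    text.encode('cp852')
    cp852_ok = True
except UnicodeEncodeError as exc:
    cp852_ok = False
    print('cp852 cannot encode it:', exc.reason)

# UTF-8 can represent every Unicode character, so this encoding works
utf8_bytes = text.encode('utf-8')
print(utf8_bytes)  # a bytes repr is pure ASCII, so this print is safe too
```

The parse step never fails; only the encode step, which `print()` performs implicitly, depends on the target codec.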

1 Answer


Don't print the value. Processing it (which is more likely what you are about to do) will work just fine.
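A minimal sketch of that point, assuming a made-up sample string (`Loading…`) rather than the question's actual file: ordinary `str` operations on the parsed text never involve the console codec, so none of them can raise `UnicodeEncodeError`.

```python
import xml.etree.ElementTree as ET

# Assumed sample document; the text ends in a single U+2026 (horizontal ellipsis)
root = ET.fromstring('<tag>Loading\u2026</tag>')
text = root.text

# Pure in-memory processing: no encoding happens, so nothing here can fail
length = len(text)                              # counts code points; the ellipsis is one
has_ellipsis = '\u2026' in text                 # membership test on the decoded str
ascii_version = text.replace('\u2026', '...')   # substitute an ASCII equivalent

# ascii_version contains only ASCII characters, so printing it is safe on any console
print(length, has_ellipsis, ascii_version)
```

Only the final `print()` encodes anything, and it is safe here because `replace()` has already removed the only non-ASCII character.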

If you really want to print it, first convert the Unicode string into something your output medium can handle (e.g. a UTF-8 encoded string). If it contains characters your output encoding cannot represent, you can use the following to convert at least the rest:

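import sys

value = tree.getroot().text  # the element text from the question
byteString = value.encode(sys.stdout.encoding, 'ignore')  # drop unencodable characters
originalWithoutTrouble = byteString.decode(sys.stdout.encoding)  # back to str
print(originalWithoutTrouble)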

Of course, some characters may then be missing (in this case the ellipsis, as Martijn pointed out).
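If losing characters silently is a concern, one possible variation (not part of the original answer) is to swap the `'ignore'` error handler for `'replace'` or `'backslashreplace'`, both standard Python codec error handlers: the unencodable character then shows up as `?` or as a `\u2026` escape instead of vanishing. This is a sketch with a hypothetical helper `console_safe()`, and it hard-codes `cp852` to mimic the console from the question.

```python
import sys

def console_safe(text, encoding=None, errors='backslashreplace'):
    """Hypothetical helper: round-trip text through the output encoding.

    Characters the encoding cannot represent are handled according to `errors`:
    'ignore' drops them, 'replace' substitutes '?', 'backslashreplace' writes an escape.
    """
    encoding = encoding or sys.stdout.encoding or 'utf-8'
    return text.encode(encoding, errors).decode(encoding)

value = '\u2026'  # the ellipsis from the question
for handler in ('ignore', 'replace', 'backslashreplace'):
    # ascii() keeps this demo printable whatever the real console codec is
    print(handler, ascii(console_safe(value, 'cp852', handler)))
```

On Python 3.7 and later, `sys.stdout.reconfigure(errors='backslashreplace')` applies the same policy to every `print()` call without needing a helper.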

Alfe