I'm parsing an XML file that is too big to load into memory completely, so I am using xml.etree.ElementTree.iterparse to parse it.
The problem I'm having is that sometimes, when I retrieve an element from the iterator, I find that some information that is present in my XML file has been omitted by ElementTree. Is this expected behaviour?
An example:
...
<car>
<engine>
<part name="pump"/>
<part name="ECU"/>
</engine>
</car>
...
Suppose I'm parsing the XML snippet above with an xml.etree.ElementTree.iterparse iterator. At some point, the iterator gives me an element elem, which points to the XML car element.
Then I call xml.etree.ElementTree.dump(elem) to see how well elem captures the actual XML data, and I get:
<car>
<engine>
<part name="pump"/>
<part/>
</engine>
</car>
Now, notice how the name of the second part element was not captured. Why does this happen and how can I work around it?
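
For reference, a minimal sketch of the kind of parsing loop described above might look like the following. The in-memory SAMPLE data, the wrapping cars root element, the helper name part_names, and the events=("end",) argument are all assumptions made for illustration; they are not taken from the actual code or file in question.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical in-memory stand-in for the large file on disk.
SAMPLE = b"""<cars>
<car>
<engine>
<part name="pump"/>
<part name="ECU"/>
</engine>
</car>
</cars>"""


def part_names(source):
    """Collect the name attribute of every part under each car element."""
    names = []
    # Assumption: only "end" events are requested, so each element is
    # yielded after its closing tag has been parsed.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "car":
            names.extend(part.get("name") for part in elem.iter("part"))
            elem.clear()  # drop the processed subtree to limit memory use
    return names


print(part_names(io.BytesIO(SAMPLE)))
```

In this sketch the car element is only inspected on its "end" event, and the subtree is cleared only after it has been read.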