Questions tagged [iterparse]

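iterparse is an incremental XML parsing function (provided by Python's xml.etree.ElementTree and lxml.etree) that reports events, such as the start and end of each element, while the tree is being built.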

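Use this tag for questions about XML parsing code that uses iterparse. Like a regular parse, iterparse still builds a tree while reading the XML, but you can safely rearrange or remove parts of that tree during parsing, which makes it practical for files too large to hold in memory.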
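
A minimal sketch of that pattern with the standard library's xml.etree.ElementTree.iterparse. The in-memory sample and the item/id names are hypothetical stand-ins for a large file; lxml.etree.iterparse accepts the same basic (source, events=...) call.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical small document standing in for a large XML file.
SAMPLE = b"<root><item id='1'>a</item><item id='2'>b</item></root>"


def collect_items(source):
    """Return (id, text) pairs for every <item>, clearing each one after use."""
    results = []
    # An "end" event fires once an element, including its text and children, is complete.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "item":
            results.append((elem.get("id"), elem.text))
            # Drop the processed content so the growing tree stays small.
            elem.clear()
    return results


print(collect_items(io.BytesIO(SAMPLE)))
```

elem.clear() leaves an empty item element attached to the root; for very large files the root's children are usually removed as well.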

72 questions
1 vote, 1 answer

Is there a way to skip nodes/elements with iterparse lxml?

Is there a way using lxml iterparse to skip an element without checking the tag? Take this XML for example: text1 text2 text3 text4
Dan
1 vote, 1 answer

Python lxml iterparse sort by attribute large xml file

I have a large XML file in which I'm trying to order the icons for each programme. I want to order the icons in descending order of their width attribute. I've managed to delete certain icons that are not needed, but I'm unsure how I can order the…
Jamie B
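
A sketch of one way to reorder the icons once each programme has been fully parsed. It assumes each programme holds its icon elements as direct children with integer width attributes (a missing width is treated as 0, also an assumption). The sorted icons are re-inserted as one group where the first icon was. It uses the standard library, but the Element calls involved (remove, insert, findall, get) also exist on lxml elements.

```python
import io
import xml.etree.ElementTree as ET


def sort_icons_by_width(programme):
    """Reorder the <icon> children of one element by descending width."""
    children = list(programme)
    positions = [i for i, child in enumerate(children) if child.tag == "icon"]
    if not positions:
        return
    icons = [children[i] for i in positions]
    for icon in icons:
        programme.remove(icon)
    # Assumption: widths are integers; a missing width sorts as 0.
    icons.sort(key=lambda icon: int(icon.get("width", "0")), reverse=True)
    # Re-insert the sorted icons as one group where the first icon used to be.
    first = positions[0]
    for offset, icon in enumerate(icons):
        programme.insert(first + offset, icon)


def process(source):
    """Sort the icons of every <programme> as soon as it has been fully parsed."""
    widths = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "programme":
            sort_icons_by_width(elem)
            widths.append([icon.get("width") for icon in elem.findall("icon")])
    return widths


SAMPLE = (b'<tv><programme><title>News</title>'
          b'<icon width="10"/><icon width="30"/><icon width="20"/>'
          b'</programme></tv>')
print(process(io.BytesIO(SAMPLE)))
```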
1 vote, 2 answers

Incrementally parsing a large Wikipedia dump XML file using Python

The goal is to read all … stuff from a Wikipedia dump (a 70 GB file). The file cannot be loaded into memory, so I tried to parse it incrementally and extract some values from it. However, the script I just wrote does not print anything and…
Captain Nemo
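
The question's script is truncated, so its actual bug is not visible here. One frequent reason a loop over a MediaWiki dump prints nothing is that every tag carries the export namespace (the parser reports "{namespace-uri}page" rather than "page"), so comparisons against bare tag names never match. A sketch that compares local names instead; the namespace URI in the sample is illustrative.

```python
import io
import xml.etree.ElementTree as ET

# Illustrative sample; real dumps declare a versioned MediaWiki export namespace.
SAMPLE = (b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
          b'<page><title>Alpha</title></page>'
          b'<page><title>Beta</title></page>'
          b'</mediawiki>')


def local_name(tag):
    """Strip a '{namespace}' prefix: '{uri}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def page_titles(source):
    titles = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) == "page":
            title = next((child.text for child in elem
                          if local_name(child.tag) == "title"), None)
            titles.append(title)
            elem.clear()  # discard the finished page to bound memory
    return titles


print(page_titles(io.BytesIO(SAMPLE)))
```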
1 vote, 2 answers

iterparse elements getting cleared before I can capture the data

I'm trying to use Python to parse a large XML file (27 GB) using cElementTree and iterparse. I'm able to extract all the tags, but for some reason none of the element text is being retrieved (it's always showing 'None'). I've checked the documentation…
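
A sketch of one common cause and fix, assuming records whose fields are child elements (the record tag name is hypothetical). Child elements reach their "end" event before their parent, so if every element is cleared at its own "end" event, each child's text is already gone when the enclosing record is handled. Clearing only whole records avoids that. (In Python 3, xml.etree.ElementTree uses the C accelerator automatically, so cElementTree is not needed.)

```python
import io
import xml.etree.ElementTree as ET

SAMPLE = (b"<data>"
          b"<record><name>a</name><size>1</size></record>"
          b"<record><name>b</name><size>2</size></record>"
          b"</data>")


def read_records(source, record_tag="record"):
    rows = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Children end before their parent; clearing them at their own
        # "end" event would erase their text before the record is read.
        if elem.tag == record_tag:
            rows.append({child.tag: child.text for child in elem})
            elem.clear()  # safe: the whole record has been read
    return rows


print(read_records(io.BytesIO(SAMPLE)))
```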
1 vote, 1 answer

Python lxml iterparse() is skipping the first event

I am using iterparse() from Python's lxml to parse a large XML file and extract the relevant data. This works perfectly, except for the first time an event occurs: the data for the first node is not captured. The same thing happens when I…
kratzlos
1 vote, 1 answer

How to write with iterparse?

I am trying to loop through an XML document, find some tags, combine them into a single new tag, and then write the result back to the XML document using the ElementTree module in Python. I have the code to the point where I believe it would work, but when I get to the…
Sam L
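
A sketch of combining tags during iterparse and then writing the result, assuming hypothetical person elements whose first and last children should become a single name element. The root is taken from the first "start" event so the whole modified tree can be written once parsing finishes. This keeps the entire tree in memory, so it suits files that fit in RAM.

```python
import io
import xml.etree.ElementTree as ET

SAMPLE = b"<people><person><first>Ada</first><last>Lovelace</last></person></people>"


def merge_names(source, out):
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # the first "start" event delivers the root element
    for event, elem in context:
        if event == "end" and elem.tag == "person":
            first, last = elem.find("first"), elem.find("last")
            if first is not None and last is not None:
                name = ET.SubElement(elem, "name")
                name.text = f"{first.text} {last.text}"
                elem.remove(first)
                elem.remove(last)
    # Serialize the modified tree after parsing has finished.
    ET.ElementTree(root).write(out, encoding="utf-8")


out = io.BytesIO()
merge_names(io.BytesIO(SAMPLE), out)
print(out.getvalue().decode("utf-8"))
```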
1 vote, 2 answers

Python tree.iterparse export source XML of selected element including all descendants

Python 3.4, parsing multi-gigabyte XML Wikipedia dump files using etree.iterparse. I want to test the value of the currently matched element and, depending on that value, export the source XML of the whole object and…
mwra
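
A sketch of testing a matched element and, depending on the result, exporting its XML including all descendants with ET.tostring. The page/ns tags and the rule "keep pages whose ns is 0" are hypothetical, and real dumps would also need the namespace handling they use. ET.tostring re-serializes the parsed element, so the output is equivalent XML rather than a byte-for-byte copy of the source.

```python
import io
import xml.etree.ElementTree as ET

SAMPLE = (b"<pages>"
          b"<page><ns>0</ns><title>Kept</title></page>"
          b"<page><ns>1</ns><title>Dropped</title></page>"
          b"</pages>")


def export_matching(source, wanted_ns="0"):
    exported = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "page":
            # Child values are available because the page's "end" event has fired.
            if elem.findtext("ns") == wanted_ns:
                exported.append(ET.tostring(elem, encoding="unicode"))
            elem.clear()
    return exported


result = export_matching(io.BytesIO(SAMPLE))
print(result)
```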
1 vote, 0 answers

How to write ElementTree generated by iterparse into an xml file

Please note: I am a novice Python user. I am working with an XML file of more than 1 GB, using Python 2.7. Initially, I was using 'iter' to parse the XML. It worked fine with small files, but with a file this big I was getting a memory error. Then, I read…
rapport89
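
Rather than building the whole tree and writing it at the end, one way to avoid the memory error is to write each finished element as soon as it is parsed and then drop it. A sketch in Python 3 syntax (not the 2.7 used in the question), assuming the elements of interest use a hypothetical record tag, are direct children of the root, and are written under a hypothetical records wrapper.

```python
import io
import xml.etree.ElementTree as ET

SAMPLE = b"<db><record id='1'/><other/><record id='2'/></db>"


def stream_records(source, out, record_tag="record"):
    out.write(b"<records>")
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # root element from the first "start" event
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            out.write(ET.tostring(elem))
            # Records are direct children of root, so this drops everything processed so far.
            root.clear()
    out.write(b"</records>")


out = io.BytesIO()
stream_records(io.BytesIO(SAMPLE), out)
print(out.getvalue())
```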
1 vote, 2 answers

Converting GraphML file to another

I have a simple GraphML file, and I would like to remove the node tag from it and save the result in another GraphML file. The GraphML file is 3 GB; a sample is given below. Input file:
arjun045
1 vote, 1 answer

Modify large xml file using lxml

Language: Python 2.7.6. File size: 1.5 GB. XML format: 876543 ABC .... 876567 DEF .... …
Yogesh Yadav
1 vote, 2 answers

lxml.etree iterparse() and parsing element completely

I have an XML file with nodes that look like this: 41.3681107 3.9598. I am using lxml.etree.iterparse() to iteratively parse…
Andreas
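
A sketch of the difference between "start" and "end" events, placing the coordinates quoted in the question inside a hypothetical point/x/y layout (the question's actual markup is not shown). At a "start" event only the tag and attributes are guaranteed; text and children may or may not have been parsed yet, so the values are read at "end", when the whole subtree exists.

```python
import io
import xml.etree.ElementTree as ET

SAMPLE = b"<points><point id='p1'><x>41.3681107</x><y>3.9598</y></point></points>"


def read_points(source):
    points = []
    current_id = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if elem.tag != "point":
            continue
        if event == "start":
            # Attributes are guaranteed here; <x> and <y> may not exist yet.
            current_id = elem.get("id")
        else:
            # At "end" the whole <point> subtree has been built.
            points.append((current_id,
                           float(elem.findtext("x")),
                           float(elem.findtext("y"))))
    return points


print(read_points(io.BytesIO(SAMPLE)))
```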
1 vote, 0 answers

How to skip malformed packets when using lxml's iterparse?

I have some very large XML files (>50 GB) converted from Wireshark. When using iterparse to extract information from these files, I found that some malformed packets cause iterparse to report an error, which says: for event, elem in context: …
cskathy
1 vote, 1 answer

XML parser using iterparse 'loses' children

I would appreciate your help with the following: I need to read a large XML file and convert it to CSV. I have two functions that are supposed to do the same thing, except that one (function1) uses iterparse (because I need to process files of about 2 GB) and the other…
1 vote, 2 answers

Why is elementtree.ElementTree.iterparse using so much memory?

I am using elementtree.ElementTree.iterparse to parse a large (371 MB) XML file. My code is basically this: outf = open('out.txt', 'w') context = iterparse('copyright.xml') context = iter(context) dummy, root = context.next() for event, elem in…
russell
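
In the code quoted above, iterparse is created with the default events, which report "end" only, so context.next() returns the first completed element (typically a leaf deep in the document), not the root. If the truncated loop then calls root.clear(), it clears that leaf and the real tree keeps growing. A sketch of the usual fix in Python 3 syntax: request "start" events so the first event really is the root, then clear the root after each processed record. The record tag is hypothetical, and records are assumed to be direct children of the root.

```python
import io
import xml.etree.ElementTree as ET


def count_records(source, record_tag="record"):
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # with "start" requested, this really is the root
    count = 0
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            count += 1  # process the finished record here
            root.clear()  # drop processed records so the tree stays small
    return count, len(root)


print(count_records(io.BytesIO(b"<db><record/><record/><record/></db>")))
```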
0 votes, 1 answer

Why does ElementTree.iterparse sometimes retrieve XML elements incompletely?

I'm parsing an XML file that is too big to load into memory completely, so I am using xml.etree.ElementTree.iterparse to parse it. The problem I'm having is that sometimes, when I retrieve an element from the iterator, I find that some…
Severo Raz