
I need to parse a ginormous data source (14.9M lines of XML, 1.7GB).

I am having problems working with XMLReader to do this. I haven't needed anything beyond SimpleXML before, but given that I really can't load this whopper into memory, I will need to do this via a stream.

I have written this code:

<?php

$xml = new XMLReader();
$xml->open('public.xml');

// Stream through the document node by node; nothing is kept in memory.
while ($xml->read()) {
    echo '.';   // print a progress dot for every node read
}

$xml->close();
?>

But I am having issues with execution. Namely, I get "Fatal error: Maximum execution time of 30 seconds exceeded..."

When I set set_time_limit(600), the browser just crashes.

Is it crashing because it can't handle the number of "." characters created?

What do you recommend here? Ultimately, I need to get this XML file into a relational database. I am testing feasibility before I get into the details of the schema.

MG55114
  • You can set the time limit to 0, and the memory limit to be very high, but I would probably go with my own parser on something like this if the format is predictable. – datasage Feb 11 '13 at 22:45
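A rough sketch of what that comment suggests, applied to the loop from the question ('public.xml' as in the question; the 512M value is illustrative, and counting nodes simply stands in for real processing):

<?php

set_time_limit(0);                 // remove the 30-second execution cap
ini_set('memory_limit', '512M');   // illustrative value; XMLReader itself streams

$xml = new XMLReader();
$xml->open('public.xml');

$nodes = 0;
while ($xml->read()) {
    $nodes++;                      // count nodes instead of printing millions of dots
}
$xml->close();

echo "Read $nodes nodes\n";
?>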

2 Answers


Is it crashing because it can't handle the number of "." characters created?

To test this, simply try it without the echo '.'; line.

Since you need a lot of RAM for this, increase the maximum amount of memory a script may use. If that is still not enough, split the XML file into smaller parts and process them sequentially.
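As an alternative to physically splitting the file, one common XMLReader pattern is to expand a single record element at a time and insert it into the database as you go; a minimal sketch, where the <record> element name, the PDO DSN, and the table columns are all hypothetical placeholders:

<?php

set_time_limit(0);

$reader = new XMLReader();
$reader->open('public.xml');

// Scan forward to the first <record> element.
while ($reader->read() && $reader->name !== 'record') {
    // keep reading
}

$doc  = new DOMDocument();
$pdo  = new PDO('mysql:host=localhost;dbname=import', 'user', 'pass');
$stmt = $pdo->prepare('INSERT INTO records (id, name) VALUES (?, ?)');

while ($reader->name === 'record') {
    // Expand only the current record (a few KB) into a SimpleXML object.
    $record = simplexml_import_dom($doc->importNode($reader->expand(), true));

    $stmt->execute([(string) $record->id, (string) $record->name]);

    // Jump straight to the next <record> sibling without re-reading its children.
    $reader->next('record');
}

$reader->close();
?>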

Tom
  • Excellent answer Tom. I'd like to add another alternative, which is to split the XML processing up into several AJAX calls to the same script, passing some POST or GET data indicating which row or element offset it is currently at and how much data to process per call, so the reader can "skip" ahead. Just add a recursive function call in the `success` callback and there you go. – ShadowScripter Feb 25 '13 at 18:24
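A minimal sketch of that chunked approach, assuming a hypothetical <record> element and a script called as import.php?offset=0&limit=1000; the AJAX success callback would call it again with offset + limit until processed comes back as 0:

<?php

set_time_limit(0);

$offset = isset($_GET['offset']) ? (int) $_GET['offset'] : 0;
$limit  = isset($_GET['limit'])  ? (int) $_GET['limit']  : 1000;

$reader = new XMLReader();
$reader->open('public.xml');

// Scan forward to the first <record> element.
while ($reader->read() && $reader->name !== 'record') {
    // keep reading
}

// Skip the records handled by earlier calls (each call still scans past them).
for ($skipped = 0; $reader->name === 'record' && $skipped < $offset; $skipped++) {
    $reader->next('record');
}

// Process this chunk.
$processed = 0;
while ($reader->name === 'record' && $processed < $limit) {
    // ... expand() the record and insert it into the database here ...
    $processed++;
    $reader->next('record');
}
$reader->close();

// Report how far we got so the client can recurse with offset + limit.
header('Content-Type: application/json');
echo json_encode(['offset' => $offset, 'processed' => $processed]);
?>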

You should also increase the memory limit (memory_limit) for PHP.
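For completeness, the memory limit can be set in several places; a quick sketch with illustrative values (the script name import.php is hypothetical):

<?php
// Raise the limit from within the script itself...
ini_set('memory_limit', '1024M');

// ...or globally in php.ini:
//     memory_limit = 1024M

// ...or per run on the command line, together with an unlimited execution time:
//     php -d memory_limit=1024M -d max_execution_time=0 import.php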

powtac