I'm trying to parse a moderately large XML file (6 MB) in PHP using SimpleXML. The script takes each record from the XML file, checks whether it has already been imported, and, if it hasn't, updates/inserts that record into my own db.

The problem is that I keep getting a fatal error about exceeding the allowed memory:

    Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 256 bytes) in /.../system/database/drivers/mysql/mysql_result.php on line 162

I avoided that error by raising the memory limit with the following line (following a tip I found):

    ini_set('memory_limit', '-1');

However, I then run up against the maximum execution time of 60 seconds, and for whatever reason my server (XAMPP on Mac OS X) won't let me increase it: the script simply won't run if I include a line like:

    set_time_limit(240);

This all seems very inefficient, though. Shouldn't I be able to break the file up somehow and process it sequentially? In the controller below I have a counter variable ($cycle) to keep track of which record I'm on, but I can't figure out how to use it so that the script doesn't have to process the whole XML file.

The controller (I'm using CodeIgniter) has this basic structure:

    $f = base_url().'data/data.xml';
    // file_get_contents() and SimpleXMLElement load the entire document into memory
    if ($data = file_get_contents($f))
    {
        $cycle = 0;
        $xml = new SimpleXMLElement($data);
        foreach ($xml->person as $p)
        {
            // this makes a single call to the db for a single field, based on the id of the record in the XML file
            if ($this->_notImported('source', $p['id']))
            {
                // various processing here, mainly breaking up the data for inserting into four different tables
            }
            $cycle++;
        }
    }

Any thoughts?
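The counter idea from the question could be combined with XMLReader so that each run handles only one slice of the file. A minimal sketch, assuming the `<person>` elements are direct children of the root (as `$xml->person` implies); `process_batch()` and the sample file are hypothetical, and the `_notImported()` check and inserts are left as a placeholder comment:

```php
<?php
// Hypothetical helper: streams the file with XMLReader and handles at most
// $limit <person> records, starting at record number $offset.
// Only the current node is held in memory, never the whole document.
function process_batch(string $path, int $offset, int $limit): array
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Cannot open $path");
    }

    // Advance to the first <person> start tag.
    while ($reader->read()
        && !($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'person')) {
    }

    $cycle = 0;       // index of the current <person> record, as in the question
    $handled = [];    // ids processed in this batch
    while ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'person') {
        if ($cycle >= $offset + $limit) {
            break;    // batch is full; the next run would start at $offset + $limit
        }
        if ($cycle >= $offset) {
            // Placeholder: the real code would call _notImported() and insert here.
            $handled[] = $reader->getAttribute('id');
        }
        $cycle++;
        // Skip this record's children and move to the next <person> sibling.
        if (!$reader->next('person')) {
            break;
        }
    }
    $reader->close();
    return $handled;
}

// Demo with a small made-up file standing in for data/data.xml.
$sample = tempnam(sys_get_temp_dir(), 'people');
file_put_contents($sample, '<people><person id="1"/><person id="2"><job title="x"/></person><person id="3"/><person id="4"/></people>');
print_r(process_batch($sample, 1, 2)); // zero-based records 1 and 2, i.e. ids "2" and "3"
```

Records before `$offset` are still stepped over on each run, but `next()` never loads them as a whole, so memory use stays roughly flat; how the offset is passed between runs (query string, session, a db row) is left open here.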

Edit:

To shed further light on what I'm doing: I'm grabbing most of the attributes of each element and subelement and inserting them into my db. For example, my old code has something like this:

    $insert = array(
        'indiv_name'      => $p['fullname'],
        'indiv_first'     => $p['firstname'],
        'indiv_last'      => $p['lastname'],
        'indiv_middle'    => $p['middlename'],
        'indiv_other'     => $p['namemod'],
        'indiv_full_name' => $full_name,
        'indiv_title'     => $p['title'],
        'indiv_dob'       => $p['birthday'],
        'indiv_gender'    => $p['gender'],
        'indiv_religion'  => $p['religion'],
        'indiv_url'       => $url
    );

Given the suggestions to use XMLReader (see below), how can I parse the attributes of both the main element and its subelements?

tchaymore

3 Answers


Use XMLReader.

Say your document is like this:

    <test>
       <hello>world</hello>
       <foo>bar</foo>
    </test>

With XMLReader:

    $xml = new XMLReader;
    $xml->open('doc.xml');

    $xml->read(); // step onto the root <test> element so it isn't printed
    while ($xml->read()) {
        if ($xml->nodeType == XMLReader::ELEMENT) {
            print $xml->name.': ';
        } else if ($xml->nodeType == XMLReader::TEXT) {
            print $xml->value.PHP_EOL;
        }
    }
    $xml->close();

This outputs:

    hello: world
    foo: bar

The nice thing is that you can also use `expand()` to fetch the current node as a DOMNode object.
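A minimal sketch of what that might look like: the reader streams the file, and `expand()` turns only the current `<person>` into a DOM subtree whose `attributes` and `childNodes` can be read with ordinary DOM calls. The `role` subelement, the helper names and the sample data are all made up for illustration:

```php
<?php
// Collects every attribute of a DOM element into a name => value array.
function attributes_of(DOMElement $el): array
{
    $attrs = [];
    foreach ($el->attributes as $attr) {   // DOMNamedNodeMap of DOMAttr
        $attrs[$attr->nodeName] = $attr->nodeValue;
    }
    return $attrs;
}

function read_people(string $path): array
{
    $reader = new XMLReader();
    $reader->open($path);
    $doc = new DOMDocument();   // expand() copies the nodes into this document
    $people = [];

    while ($reader->read()) {
        if ($reader->nodeType !== XMLReader::ELEMENT || $reader->localName !== 'person') {
            continue;
        }
        $person = $reader->expand($doc);   // DOMElement for this <person> only
        $row = ['attrs' => attributes_of($person), 'children' => []];
        foreach ($person->childNodes as $child) {
            if ($child instanceof DOMElement) {   // skip text/whitespace nodes
                $row['children'][] = ['name' => $child->nodeName, 'attrs' => attributes_of($child)];
            }
        }
        $people[] = $row;
    }
    $reader->close();
    return $people;
}

$sample = tempnam(sys_get_temp_dir(), 'people');
file_put_contents($sample, '<people><person id="7" fullname="Jane Doe" gender="F">'
    . '<role title="Senator" startdate="2001-01-03"/><role title="Mayor" startdate="1995-01-01"/>'
    . '</person></people>');
$people = read_people($sample);
echo $people[0]['attrs']['fullname'], PHP_EOL;                 // Jane Doe
echo $people[0]['children'][1]['attrs']['startdate'], PHP_EOL;  // 1995-01-01
```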

netcoder
  • Thanks, that answer is really helpful. But how do I access the attributes of subelements? Each element has a variable number of subelements, and I need to grab the attributes of each one. – tchaymore Nov 08 '10 at 22:36
  • There are numerous ways you can do this, the easiest being `getAttribute('attr_name')`. You can also use `moveToNextAttribute` or `DOMNode::$attributes` after you `expand`. However, I really think the first option is the way to go. ;) – netcoder Nov 08 '10 at 23:47
  • Thanks! When I use getAttribute, it returns every instance of that attribute at once. For instance, some of the elements have 10 subelements, each with a "startdate" attribute, and getAttribute returns all ten dates at once. But I only need to access them one at a time. How can I process them one at a time? – tchaymore Nov 09 '10 at 00:16
  • I'm not sure what to tell you. `getAttribute` returns only one attribute at a time. It might be a problem with a loop or something. Might I suggest you create a new question with your updated code? :) – netcoder Nov 09 '10 at 00:32
  • Done (http://stackoverflow.com/questions/4129565/how-to-use-xmlreader-to-parse-multiple-identically-named-attributes-of-xml-eleme). Thanks again for all the help. – tchaymore Nov 09 '10 at 01:04
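Along the lines of the suggestions in these comments, a minimal sketch that reads each `<person>`'s own attributes with `moveToNextAttribute()` and then visits its direct subelements one at a time, calling `getAttribute('startdate')` on each, so every call yields a single value. The element names (`person`, `term`), the helper names and the sample data are assumptions:

```php
<?php
// Reads all attributes of the element the reader is on, then moves the
// cursor back to that element.
function reader_attributes(XMLReader $reader): array
{
    $attrs = [];
    while ($reader->moveToNextAttribute()) {
        $attrs[$reader->name] = $reader->value;
    }
    $reader->moveToElement();
    return $attrs;
}

function people_with_startdates(string $path): array
{
    $reader = new XMLReader();
    $reader->open($path);
    $out = [];

    while ($reader->read()) {
        if ($reader->nodeType !== XMLReader::ELEMENT || $reader->localName !== 'person') {
            continue;
        }
        $depth = $reader->depth;
        $isEmpty = $reader->isEmptyElement;   // <person .../> has no closing tag
        $person = ['attrs' => reader_attributes($reader), 'startdates' => []];

        // Walk forward until this <person>'s closing tag, handling one
        // direct child element per iteration.
        while (!$isEmpty && $reader->read()
            && !($reader->nodeType === XMLReader::END_ELEMENT && $reader->depth === $depth)) {
            if ($reader->nodeType === XMLReader::ELEMENT && $reader->depth === $depth + 1) {
                // getAttribute() returns this one subelement's value (or null).
                $person['startdates'][] = $reader->getAttribute('startdate');
            }
        }
        $out[] = $person;
    }
    $reader->close();
    return $out;
}

$sample = tempnam(sys_get_temp_dir(), 'people');
file_put_contents($sample, '<people><person id="1" fullname="Ann Lee"><term startdate="2001"/>'
    . '<term startdate="2007"/></person><person id="2" fullname="Bo Park"/></people>');
print_r(people_with_startdates($sample));
```

The `attrs` array is roughly what the question's `$insert` array is built from, and each `startdates` entry can be handled separately inside the inner loop instead of being collected.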

It sounds like the problem is that you are reading the whole XML file into memory before trying to manipulate it. Use XMLReader to walk your way through the file as a stream instead of loading everything into memory at once.
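One way to follow this advice while keeping per-record SimpleXML code like the question's `$p['fullname']` is to let XMLReader walk the stream and hand only one `<person>` at a time to SimpleXML via `readOuterXml()`. A minimal sketch, assuming `<person>` elements are siblings directly under the root; `each_person()` and the sample data are hypothetical:

```php
<?php
// Yields one SimpleXMLElement per <person>, so only a single record is
// held in memory at a time instead of the whole document.
function each_person(string $path): Generator
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Cannot open $path");
    }

    // Move to the first <person> start tag.
    while ($reader->read()
        && !($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'person')) {
    }

    while ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'person') {
        // Serialize just this record and parse it with SimpleXML.
        yield simplexml_load_string($reader->readOuterXml());
        // Skip past this record's subtree to the next <person> sibling.
        if (!$reader->next('person')) {
            break;
        }
    }
    $reader->close();
}

$sample = tempnam(sys_get_temp_dir(), 'people');
file_put_contents($sample, '<people><person id="1" fullname="Ann Lee"><role title="Mayor"/></person>'
    . '<person id="2" fullname="Bo Park"/></people>');
foreach (each_person($sample) as $p) {
    // Same style of access as the question's code.
    echo $p['id'], ': ', $p['fullname'], PHP_EOL;
}
```

This targets the memory problem; whether a run fits in the 60-second limit would still depend on how fast the per-record database work is.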

Justin Rassier

How about using JSON instead of XML? The data would be much smaller in JSON format, and I would imagine you won't run into the same memory issues because of that.

meder omuraliev