
I am attempting to get a table from a specific URL by its ID. My method is to fetch the raw HTML from the URL, convert it into a DOM that PHP can read, and then find the table with an XPath query.

With the code below, $elements is always empty (its length is 0).

<?php
    $c = curl_init('http://www.urlhere.com/');
    curl_setopt($c, CURLOPT_RETURNTRANSFER, true);

    $html = curl_exec($c);

    if (curl_error($c))
        die(curl_error($c));

    curl_close($c);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $xpath = new DOMXpath($dom);

    $elements = $xpath->query("*/table[@id=anyid]");

    if (!is_null($elements)) {
        foreach ($elements as $element) {
            echo "<br/>[". $element->nodeName. "]";

            $nodes = $element->childNodes;
            foreach ($nodes as $node) {
                echo $node->nodeValue. "\n";
            }
        }
    }
?>

How can I render this table successfully on my page?


EDIT:

A snippet of the HTML I am trying to get, taken directly from the $html variable:

<div></div><table class=sortable id=anyid></table>
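A minimal sketch that feeds the snippet above straight into `DOMDocument` (no cURL involved) and counts the matches for three XPath expressions. `loadHTML()` wraps a fragment like this in implied `html` and `body` elements, so `*/table` (a `table` that is a direct child of the root element) finds nothing, and an unquoted `anyid` is read as a path to a child element rather than as a string. The `$expressions` and `$counts` arrays are purely illustrative.

```php
<?php
// Parse the snippet from the question directly, so cURL is out of the picture.
$html = '<div></div><table class=sortable id=anyid></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep any parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Illustrative expressions: the original, the original with quotes, and a fix.
$expressions = [
    '*/table[@id=anyid]',    // table directly under <html>; anyid is a child path
    "*/table[@id='anyid']",  // quoted, but still only looks directly under <html>
    "//table[@id='anyid']",  // any table in the document whose id is "anyid"
];

$counts = [];
foreach ($expressions as $expr) {
    $counts[$expr] = $xpath->query($expr)->length;
    echo $expr, ' => ', $counts[$expr], "\n";
}
```

Only the last expression should match, which is consistent with quoting alone not being enough.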

  • Add quotes, `@id='anyid'`, and make sure the table is present in the initial HTML; if it is created through JavaScript, you can't get it this way. – Kevin Nov 16 '15 at 04:26
  • @Ghost - I have made sure that it is loaded initially, since it exists within the `$html` variable. I tried adding the quotes, but that did not work either. – Fizzix Nov 16 '15 at 04:30
  • Not much to go on; if `$html` is indeed an HTML markup string, it should [work](http://codepad.viper-7.com/oV2zmS). – Kevin Nov 16 '15 at 04:32
  • @Ghost - Yes, you're right, your example works perfectly. I'm not sure what's going wrong with mine, then. Could it have something to do with curl executing asynchronously while the rest of the code runs, so that `$html` is still empty when `$xpath->query()` is hit? – Fizzix Nov 16 '15 at 04:41
  • Are you sure that `` is inside `$html`? Have you tried examining `$html` first, or querying other elements (debugging, etc.)? No, that simple curl operation doesn't execute asynchronously. – Kevin Nov 16 '15 at 04:52

1 Answer

Following on from the comments, you can first suppress the warnings that `loadHTML()` raises on malformed markup:

libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

This issue is discussed thoroughly here.
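If you would rather not leave internal error handling switched on for the rest of the script, a small wrapper can restore the previous setting afterwards, since `libxml_use_internal_errors()` returns the value that was in effect before the call. This is a sketch only: the helper name `loadHtmlQuietly()` is hypothetical, and it returns the collected `LibXMLError` objects so they can be inspected instead of silently discarded.

```php
<?php
/**
 * Hypothetical helper: parse an HTML string without emitting libxml warnings,
 * then restore the caller's previous error-handling setting.
 *
 * @return array{0: DOMDocument, 1: LibXMLError[]}
 */
function loadHtmlQuietly(string $html): array
{
    $previous = libxml_use_internal_errors(true); // buffer errors instead of warning
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $errors = libxml_get_errors();   // whatever the parser complained about
    libxml_clear_errors();           // empty the buffer for later callers
    libxml_use_internal_errors($previous);
    return [$dom, $errors];
}

[$dom, $errors] = loadHtmlQuietly('<div></div><table class=sortable id=anyid></table>');
foreach ($errors as $error) {
    echo 'line ', $error->line, ': ', trim($error->message), "\n";
}
echo $dom->getElementsByTagName('table')->length, "\n";
```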

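Then apply it in your code. Note that the XPath expression changes as well: `//table[@id='anyid']` searches the whole document and compares `@id` with a quoted string. The original `*/table[@id=anyid]` only looks for a `table` directly under the root `html` element (while `loadHTML()` typically places it inside `body`), and the unquoted `anyid` is treated as a path to a child element named `anyid` rather than as a string: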

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXpath($dom);

$elements = $xpath->query("//table[@id='anyid']");

if ($elements !== false) { // query() returns false, not null, on an invalid expression
  foreach ($elements as $element) {
    echo "<br/>[". $element->nodeName. "]";

    $nodes = $element->childNodes;
    foreach ($nodes as $node) {
      echo $node->nodeValue. "\n";
    }
  }
}

Sample Output
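If the goal is the cell contents rather than the raw child nodes, a relative query with the table as the context node can walk each `tr` and its `th`/`td` cells. This is a sketch assuming the table has ordinary rows; the two-row sample markup is made up for illustration, and `.//tr` is used so rows are found whether or not the page includes a `tbody`.

```php
<?php
// Made-up sample markup standing in for the page fetched with cURL.
$html = '<div></div><table class=sortable id=anyid>'
      . '<tr><th>Name</th><th>Score</th></tr>'
      . '<tr><td>Alpha</td><td>10</td></tr>'
      . '</table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$rows = [];

$tables = $xpath->query("//table[@id='anyid']");
if ($tables !== false && $tables->length > 0) {
    $table = $tables->item(0);
    // Paths starting with "." are evaluated against $table, not the whole document.
    foreach ($xpath->query('.//tr', $table) as $tr) {
        $cells = [];
        foreach ($xpath->query('./th|./td', $tr) as $cell) {
            $cells[] = trim($cell->textContent);
        }
        $rows[] = $cells;
    }
}

foreach ($rows as $cells) {
    echo implode(' | ', $cells), "\n";
}
```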

  • Looks good, thanks Ghost. However, the table is rendered as plain text rather than as an HTML table. Do you happen to know how to render it as HTML, including all the `td`s and `tr`s? – Fizzix Nov 16 '15 at 05:08
  • Managed to solve it with `$htmlString = $dom->saveHTML($elements->item(0));` – Fizzix Nov 16 '15 at 05:24
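Building on that last comment, `DOMDocument::saveHTML()` accepts an optional node and returns only that node's markup (rows and cells included), which can then be echoed into the page. A sketch that also handles the no-match case; the sample markup is invented for illustration, and the exact whitespace libxml puts between tags may vary.

```php
<?php
// Invented sample standing in for the fetched page.
$html = '<div></div><table class=sortable id=anyid><tr><td>1</td></tr></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$elements = $xpath->query("//table[@id='anyid']");

$htmlString = '';
if ($elements !== false && $elements->length > 0) {
    // Passing a node serializes only that subtree, tags and all.
    $htmlString = $dom->saveHTML($elements->item(0));
}

echo $htmlString === '' ? 'Table not found' : $htmlString;
```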