Parsing HTML several tables DOM

Question

While preparing to do the following I found a lot of information that was not clear, so I thought I'd ask here to see if someone could clear some things up for me.

What exactly is the @ symbol doing in the following?

 $domOb = new DOMDocument();
 $html  = @$domOb->loadHTMLFile('http:...');

This did remove an error and actually parsed the data, but is this a good-practice solution? I have also used this without the @ symbol and got the expected results.

Given that I have several tables, what is the best/simplest way to get all the <td> elements from, let's say, table 3? I was going to list all the <td> elements and then simply start and end at the indexes that correspond to the needed data.

If I'm looking to parse HTML via PHP, I like the idea of using DOM, so when getting the file, which should I use: loadHTMLFile() or loadHTML()? Can I still use XPath? If the HTML is very busy or badly marked up, does this matter?

What's good practice for looping through the data?

    $items = $domOb->getElementsByTagName('td');

    $k    = 0;
    $num  = $items->length;
    while ($k < $num)
    {
        // The property name was cut off in the post; nodeValue is assumed here.
        echo $item_web = $items->item($k)->nodeValue, '<br>';
        $k++;
    }

I found this, which is good: How do you parse and process HTML/XML in PHP? But it's 2 years old, so I thought I'd pose a few questions.

Just a small clip of the 3rd table... At first glance I noticed a space in the 3rd tag; does this affect the results?

 <td>Parcel ID: <a href=... style=text-decoration:underline;><b>666666</b></a></td>
 <td>Name: Mr. help</td></tr><tr>
 <td >Parcel Address: 666 help RD&nbsp;</td>
 <td>Name2: Ms. help F</td></tr><tr><td>City: Helpover 66666</td>
 <td>Address: 6666 6TH AVE NE UNIT 333</td>

If you're familiar with jQuery, you'll probably love [phpQuery](https://code.google.com/p/phpquery/). I know that when I have to select deeply nested nodes and want developed child / parent / sibling relationships, I don't want to have to create a framework for it. — Ohgodwhy, Jun 25 '13 at 02:24
Stack Overflow works best when you ask one question at a time, so I only answered the one about the error suppression operator. On HTML table parsing, we already have some DOM-related resources, but I think no answer so far offers a DOM-based table model; I'd say that requires a separate question. For the rest: XPath is already explained, and a question being 2 years old is no reason to ask it again. Just saying. — hakre, Jun 25 '13 at 03:07

score 0 · Accepted Answer · answered Jun 25 '13 at 03:00

what exactly is the @ symbol doing to the following

It's supposed to suppress errors, but this is not the right way to do it with DOMDocument and related extensions. The correct way is to call libxml_use_internal_errors(true); before loading the malformed HTML.

can I still use Xpath?
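
A minimal sketch of that approach, assuming the page has already been fetched into a string (the `$html` value below is made-up sample markup, not the asker's real page). It switches libxml to internal error collection, loads the markup, then reads and clears the collected errors instead of hiding them with @:

```php
<?php
// Hypothetical markup standing in for the remote page; the space in
// "<td >" and the missing closing tags are deliberate.
$html = '<table><tr><td >Parcel Address: 666 help RD&nbsp;</td><td>Name: Mr. help';

// Collect libxml errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$domOb  = new DOMDocument();
$loaded = $domOb->loadHTML($html);

// Read whatever the parser complained about, then clear the buffer
// and restore the previous setting.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

foreach ($errors as $error) {
    // Each entry is a LibXMLError with level, code, line and message.
    echo trim($error->message), ' (line ', $error->line, ")\n";
}

$tdCount = $domOb->getElementsByTagName('td')->length;
echo $tdCount, " cells parsed\n";
```

Whether any errors are reported for a given page depends on the markup and the libxml version, so the loop may print nothing.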

Yes:

    $xpath = new DOMXPath($domOb);
    $tds = $xpath->query('//td');
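
For the "all the <td> from table 3" part of the question, one possible sketch is to let XPath select the third table directly rather than counting cells by hand. The three-table markup below is invented for illustration, and the expression assumes "table 3" means the third `<table>` in document order:

```php
<?php
// Hypothetical three-table page; only the third table's cells are wanted.
$html = '<table><tr><td>a</td></tr></table>'
      . '<table><tr><td>b</td></tr></table>'
      . '<table><tr><td>Parcel ID: 666666</td><td>Name: Mr. help</td></tr>'
      . '<tr><td >Parcel Address: 666 help RD</td><td>Name2: Ms. help F</td></tr></table>';

libxml_use_internal_errors(true);
$domOb = new DOMDocument();
$domOb->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($domOb);

// (//table)[3] picks the third table in document order; //td under it
// returns every cell of that table, including cells of nested tables.
$cells = [];
foreach ($xpath->query('(//table)[3]//td') as $td) {
    $cells[] = trim($td->textContent);
}

print_r($cells);
```

If the page nests tables inside each other, the nested ones also count toward the position, so the index may need adjusting for the real document.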

I noticed a space at the 3rd tag does this affect the results?

Entities are converted when you access the textContent property from your TD nodes.
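
As a small illustration (the markup is a shortened copy of the row from the question), the space in `<td >` is ordinary whitespace inside the tag and does not change how the element is found. The `&nbsp;` does show up in `textContent`, as a non-breaking space character that a plain `trim()` leaves in place:

```php
<?php
$html = '<table><tr><td >Parcel Address: 666 help RD&nbsp;</td></tr></table>';

libxml_use_internal_errors(true);
$domOb = new DOMDocument();
$domOb->loadHTML($html);
libxml_clear_errors();

// The space in "<td >" is whitespace inside the tag, so the element
// is still found as a td.
$td = $domOb->getElementsByTagName('td')->item(0);

// &nbsp; arrives as U+00A0, which DOM exposes as the UTF-8 bytes C2 A0.
$text = $td->textContent;
$endsWithNbsp = substr($text, -2) === "\xC2\xA0";

// trim() does not strip U+00A0 by default, so replace it explicitly.
$clean = trim(str_replace("\xC2\xA0", ' ', $text));

var_dump($endsWithNbsp);
echo $clean, "\n";
```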

score 0 · Answer 2 · edited May 23 '17 at 12:30

This [@ error control operator] did remove an error and actually parse the data but is this a good practice solution. I have used this without the @ symbol and got expected results.

It does not remove the error, it ignores it, i.e. the error-reporting level is set to 0 for that expression, and if display_errors is switched on, the error will not be displayed. But it is still there and will still be passed to an error handler if one is registered.

As you can imagine, this is not good practice. Avoid it, and if you see code using it, rest assured that it is of lower quality. See as well:

Suppress error with @ operator in PHP
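
The point that a suppressed error still reaches a registered error handler can be checked with a short sketch like the following. The `trigger_error()` call is only a stand-in for the warning that `loadHTMLFile()` would raise, and the message text is made up:

```php
<?php
$seen = [];

// A custom handler is invoked even for expressions prefixed with @.
set_error_handler(function ($errno, $errstr) use (&$seen) {
    $seen[] = $errstr;
    return true; // mark the error as handled
});

// A user-level warning stands in for the loadHTMLFile() warning.
@trigger_error('simulated parser warning', E_USER_WARNING);

restore_error_handler();

echo count($seen), " error(s) reached the handler\n";
```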
