133

I'm using PHP DOM and I'm trying to get an element within a DOM node that has a given class name. What's the best way to get that sub-element?

Update: I ended up using Mechanize for PHP which was much easier to work with.

bgcode
  • 2
    Related: [PHP dom to get tag class with multiple css class name](http://stackoverflow.com/questions/4835300/php-dom-to-get-tag-class-with-multiple-css-class-name) – hakre Jun 18 '11 at 09:33

7 Answers

164

Update: Xpath version of *[@class~='my-class'] css selector

So after my comment below in response to hakre's comment, I got curious and looked into the code behind Zend_Dom_Query. It looks like the above selector is compiled to the following xpath (untested):

[contains(concat(' ', normalize-space(@class), ' '), ' my-class ')]

So the PHP would be:

$dom = new DomDocument();
$dom->load($filePath);
$finder = new DomXPath($dom);
$classname="my-class";
$nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");

Basically, all we do here is normalize the class attribute and pad it with a space on each side, so that every class name in the list, even a single one, is bounded by spaces. We then pad the class we are searching for with a space on each side as well. This way we match only whole instances of my-class and not longer names that merely contain it.
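To make the difference concrete, here is a minimal sketch that runs the space-padded expression and the plain `contains(@class, ...)` expression against the same markup. The sample HTML and the variable names `$padded` and `$naive` are assumptions made up for illustration; `loadHTML` is used instead of `load` only so that the example does not need a file on disk.

```php
<?php
// Hypothetical sample markup, used only for illustration.
$html = '<div>'
      . '<p class="my-class">one</p>'
      . '<p class="intro  my-class">two</p>'
      . '<p class="my-class2">three</p>'
      . '</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DOMXPath($dom);

$classname = 'my-class';

// Whole-token match: both the attribute and the needle are padded with spaces.
$padded = "//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]";

// Substring match: also hits class names that merely contain the needle.
$naive = "//*[contains(@class, '$classname')]";

$paddedCount = $finder->query($padded)->length;
$naiveCount  = $finder->query($naive)->length;

echo "padded: $paddedCount, naive: $naiveCount\n";
```

With this markup the padded expression should skip the `my-class2` paragraph, while the plain `contains` picks it up as well.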


Use an xpath selector?

$dom = new DomDocument();
$dom->load($filePath);
$finder = new DomXPath($dom);
$classname="my-class";
$nodes = $finder->query("//*[contains(@class, '$classname')]");

If it is only ever one type of element you can replace the * with the particular tagname.
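Since the question asks for an element inside a particular DOM node, it may help to note that `DOMXPath::query()` accepts a context node as its second argument. The following is a sketch under assumptions: the markup, the `content` and `footer` ids, and the `note` class are invented for illustration.

```php
<?php
// Hypothetical sample markup, used only for illustration.
$html = '<div id="content"><p class="note">inside</p></div>'
      . '<div id="footer"><p class="note">outside</p></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DOMXPath($dom);

// Pick the parent node to search under.
$parent = $finder->query("//div[@id='content']")->item(0);

$classname = 'note';

// The leading "." makes the path relative to the context node
// passed as the second argument, and "p" replaces "*" as the tag name.
$matches = $finder->query(
    ".//p[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]",
    $parent
);

// item(0) returns the first match, or null when nothing matched.
$first = $matches->item(0);
$text  = $first !== null ? $first->textContent : null;

echo $text, "\n";
```

Without the leading `.` the expression would start from the document root again and also return the paragraph in the footer.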

If you need to do a lot of this with very complex selectors, I would recommend Zend_Dom_Query, which supports CSS selector syntax (a la jQuery):

$finder = new Zend_Dom_Query($html);
$classname = 'my-class';
$nodes = $finder->query("*[class~=\"$classname\"]");
prodigitalson
  • finds the class `my-class2` as well, but pretty sweet. Any way to only pick the first of all elements? – hakre Jun 16 '11 at 02:13
  • I don't think you can without XPath 2... However, the example for Zend_Dom_Query does exactly that. If you don't want to use that component in your project, then you might want to see how they translate that CSS selector to XPath. Maybe DomXPath supports XPath 2.0; I'm not sure about that. – prodigitalson Jun 16 '11 at 02:31
  • @prodigitalson: Thanks much for your answer. I went and taught myself XPath and had a question: why do you use contains, rather than simply doing [@class="$classname"]? – bgcode Jun 16 '11 at 23:59
  • 1
    Because `class` can hold more than one class name. – prodigitalson Jun 17 '11 at 01:43
  • 2
    @prodigitalson: This is incorrect as it does not reflect the spaces, try `//*[contains(concat(' ', normalize-space(@class), ' '), ' classname ')]` (Very informative: [CSS Selectors And XPath Expressions](http://www.a-basketful-of-papayas.net/2010/04/css-selectors-and-xpath-expressions.html)). – hakre Jun 18 '11 at 09:30
  • @hakre: Am I mistaken, or is the only difference the leading space? Technically that wouldn't matter: anything ` classname ` matches will also be matched by `classname `. Good link. Wish I had found that instead of reading the code in `Zend_Dom_Query`... would have been faster, haha. – prodigitalson Jun 18 '11 at 16:20
  • so..contains would still be the way to go? – bgcode Jun 18 '11 at 18:54
  • 1
    @babonk: yes, you need to use `contains` in combination with `concat`... we are just discussing the particulars of padding the spaces on both sides of the class you're searching for or only padding one side. Either should work though. – prodigitalson Jun 18 '11 at 20:20
  • @babonk: also make sure you take a look at the [link hakre posted in his comment to my answer](http://www.a-basketful-of-papayas.net/2010/04/css-selectors-and-xpath-expressions.html). There is a wealth of good info there dealing with xpath in comparison to css selectors. – prodigitalson Jun 18 '11 at 23:20
22

If you wish to get the inner HTML of the elements with a given class without using Zend, you could use this:

$dom = new DomDocument();
$dom->load($filePath);
$classname = 'main-article';
$finder = new DomXPath($dom);
$nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");
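$tmp_dom = new DOMDocument();
foreach ($nodes as $node) {
    // Copy each matching node (with its children) into the temporary document.
    $tmp_dom->appendChild($tmp_dom->importNode($node, true));
}
// Plain assignment: $innerHTML was never initialized, so ".=" would raise a warning.
$innerHTML = trim($tmp_dom->saveHTML());
echo $innerHTML;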
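Note that importing the matched node itself means the output includes that element's own tag. If only the markup of its children is wanted, one possible approach is to serialize each child with `saveHTML($node)`. This is a sketch, not part of the original answer: the helper name `innerHtml` and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: concatenate the serialized children of a node.
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument serializes just that node.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Hypothetical sample markup, used only for illustration.
$dom = new DOMDocument();
$dom->loadHTML('<div class="main-article"><p>Hello <b>world</b></p></div>');
$finder = new DOMXPath($dom);

$classname = 'main-article';
$node = $finder->query(
    "//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]"
)->item(0);

$inner = innerHtml($node);
echo $inner, "\n";
```

The wrapping `div` with the `main-article` class should not appear in `$inner`, only the paragraph it contains.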
Tschallacka
14

I think the accepted way is better, but I guess this might work as well

function getElementByClass(&$parentNode, $tagName, $className, $offset = 0) {
    $response = false;

    $childNodeList = $parentNode->getElementsByTagName($tagName);
    $tagCount = 0;
    for ($i = 0; $i < $childNodeList->length; $i++) {
        $temp = $childNodeList->item($i);
        if (stripos($temp->getAttribute('class'), $className) !== false) {
            if ($tagCount == $offset) {
                $response = $temp;
                break;
            }

            $tagCount++;
        }

    }

    return $response;
}
dav
  • 2
    Where is the example for this? It would've been nice. – robue-a7119895 Feb 06 '15 at 17:49
  • That's great. I got the element with the class. Now I want to edit content of the element, like append child to the element containing the class. How to append the child and recreate whole HTML? Please help. This is what I have done. `$classResult = getElementByClass($dom, 'div', 'm-signature-pad'); $classResult->nodeValue = ''; $enode = $dom->createElement('img'); $enode->setAttribute('src', $signatureImage); $classResult->appendChild($enode);` – Keyur Nov 17 '16 at 13:47
  • 1
    For DOM modification in PHP I think it's better to use phpQuery: https://github.com/punkave/phpQuery – dav Nov 18 '16 at 07:19
10

There is also another approach without the use of DomXPath or Zend_Dom_Query.

Based on dav's original function, I wrote the following function that returns all the descendants of the parent node whose tag and class match the parameters.

function getElementsByClass(&$parentNode, $tagName, $className) {
    $nodes=array();

    $childNodeList = $parentNode->getElementsByTagName($tagName);
    for ($i = 0; $i < $childNodeList->length; $i++) {
        $temp = $childNodeList->item($i);
        if (stripos($temp->getAttribute('class'), $className) !== false) {
            $nodes[]=$temp;
        }
    }

    return $nodes;
}

Suppose you have a variable $html containing the following HTML:

<html>
 <body>
  <div id="content_node">
    <p class="a">I am in the content node.</p>
    <p class="a">I am in the content node.</p>
    <p class="a">I am in the content node.</p>    
  </div>
  <div id="footer_node">
    <p class="a">I am in the footer node.</p>
  </div>
 </body>
</html>

Use of getElementsByClass is as simple as:

$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTML($html);
$content_node=$dom->getElementById("content_node");

$p_a_class_nodes = getElementsByClass($content_node, 'p', 'a'); // will contain the three <p class="a"> nodes under "content_node".
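One caveat: `stripos` performs a case-insensitive substring test, so searching for `a` would also match an element whose class is `main`. A possible variant that compares whole class names is sketched below. The function name `getElementsByClassToken` and the sample markup are hypothetical and not part of the original answer.

```php
<?php
// Hypothetical variant of getElementsByClass() that compares whole class names.
// $parentNode is expected to be a DOMDocument or DOMElement.
function getElementsByClassToken($parentNode, string $tagName, string $className): array
{
    $nodes = [];
    foreach ($parentNode->getElementsByTagName($tagName) as $element) {
        // Split the attribute on runs of whitespace to get individual class names.
        $classes = preg_split(
            '/\s+/',
            trim($element->getAttribute('class')),
            -1,
            PREG_SPLIT_NO_EMPTY
        );
        // Strict, case-sensitive comparison against each class name.
        if (in_array($className, $classes, true)) {
            $nodes[] = $element;
        }
    }
    return $nodes;
}

// Hypothetical sample markup, used only for illustration.
$html = '<div id="content_node">'
      . '<p class="a">x</p><p class="main">y</p><p class="b a">z</p>'
      . '</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$content = (new DOMXPath($dom))->query("//div[@id='content_node']")->item(0);

$found = getElementsByClassToken($content, 'p', 'a');
echo count($found), "\n";
```

Here the paragraph with class `main` should be skipped, while `a` and `b a` are both returned.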
oabarca
8

DOMDocument is slow to type and phpQuery has bad memory leak issues. I ended up using:

https://github.com/wasinger/htmlpagedom

To select a class:

include 'includes/simple_html_dom.php';

$doc = str_get_html($html);
$href = $doc->find('.lastPage')[0]->href;

I hope this helps someone else as well

iautomation
  • So simple, so beautiful! Usability at its very finest, compared to PHP's native DOM handling! Please upvote, this is the most useful answer. – Sliq Apr 30 '21 at 12:22
1

I prefer using Symfony for this. Their libraries are pretty nice.

Use the DomCrawler component.

Example:

use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\HttpClient\HttpClient;

$browser = new HttpBrowser(HttpClient::create());
$crawler = $browser->request('GET', 'https://example.com');
$class = $crawler->filter('.class')->first();
Unicco
0

PHP's native DOM handling is so absurdly bad, do yourself a favour and use this or any other modern HTML parsing package which can handle this within a few lines:

Install paquettg/php-html-parser with

composer require paquettg/php-html-parser

Then create a .php file in the same folder with this content

<?php

// load dependencies via Composer
require __DIR__ . '/vendor/autoload.php';

use PHPHtmlParser\Dom;

$dom = new Dom;
$dom->loadFromUrl("https://example.com");
$links = $dom->find('.classname a');

foreach ($links as $link) {
    echo $link->getAttribute('href');
}

P.S. You'll find information on how to install Composer on Composer's homepage.

Sliq