I need your help...

I have a function that parses an HTML fragment with DOMDocument and rewrites the URL of every image in it. The function itself was working properly. Here's my code:

//Update image src with new src
function upd_img_src_in_html($html_src='', $new_src='')
{
    if($html_src == '' || $new_src == ''):
        return '';
    endif;

    $xml = new DOMDocument();
    $xml->loadHTML($html_src, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    $imgNodes = $xml->getElementsByTagName('img');
    for ($i = $imgNodes->length - 1; $i >= 0; $i--) {
        $imgNode = $imgNodes->item($i);
        $image_file_names = pathinfo($imgNode->getAttribute('src'), PATHINFO_BASENAME);

        if(!empty($image_file_names)):
            $imgNode->setAttribute('src', $new_src.$image_file_names);
            $imgNode->setAttribute('style', 'max-width:90%; margin-left:auto; margin-right:auto;');
        endif;
    }
    return html_entity_decode($xml->saveHTML());
}

However, a lot of problems appeared once I started using this function.

No 1: result_box already defined in Entity line 1

No. 2: unexpected line tag..

I have no control over the input passed in as $html_src, so I cannot clean it up beforehand. I have tried to deal with problem 1 but without success. For example, I used libxml_use_internal_errors() but still got the error.

I have not been able to overcome the second problem at all. Is there an easier way to change only the image src without using DOMDocument()?

Answers from experts are really needed here. Please give me some advice on how to deal with these problems.

Thank you..

Nere
  • The HTML may be invalid, which would prevent the DOM parsing from working. Can you use regular expressions instead of DOM parsing? – WEBjuju Dec 04 '16 at 16:14
  • I tried that before. Is it practical for large DOM content? Sometimes the regular expression cannot match at all because there are so many tags. – Nere Dec 04 '16 at 16:15
  • Most of the time I would say no, but if you control the HTML file where the images are, or you don't expect it to change, then what you are trying to do here is [feasible in this context](http://stackoverflow.com/questions/4231382/regular-expression-pattern-not-matching-anywhere-in-string/4234491#4234491) – WEBjuju Dec 04 '16 at 16:18
  • Try using `libxml_use_internal_errors()`. The messages look more like parser warnings than actual fatal errors. And these are exceptions; you can catch them. – ThW Dec 04 '16 at 18:10
  • Can you post sample values of `$html_src` and `$new_src`? – web-nomad Dec 04 '16 at 19:03
  • @ThW I'm glad to hear that. I will update my answer soon. – Nere Dec 04 '16 at 23:55
  • @web-nomad I'll update my answer. – Nere Dec 04 '16 at 23:55
  • @Imran Looking at the errors you have listed, it seems that the HTML contains more than one element with the same `ID`. You can try using [SimpleHTMLDom](http://simplehtmldom.sourceforge.net/). – web-nomad Dec 05 '16 at 01:40

1 Answer

One way to deal with messy HTML and DOMDocument is to run the input through the PHP tidy extension first, which repairs most of the markup errors before DOMDocument ever sees them.
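
A minimal sketch of that idea follows. It is not code from the question: the function name `upd_img_src_tidy` and the tidy options chosen are assumptions made for illustration, and the tidy step is skipped when the extension is not installed. Tidy may not clear every warning (duplicate `id` attributes such as `result_box` are assumed to survive the repair), so the sketch also switches libxml to internal error handling around `loadHTML()` so that such warnings are collected rather than printed. It further assumes the input is ASCII or entity-encoded, since `loadHTML()` does not treat its input as UTF-8 by default.

```php
<?php
// Hypothetical helper (not from the original post): repair the markup with
// tidy when the extension is available, then rewrite every <img> src.
function upd_img_src_tidy(string $html_src, string $new_src): string
{
    if ($html_src === '' || $new_src === '') {
        return '';
    }

    // Step 1: let tidy repair the fragment. Fall back to the raw input if
    // ext/tidy is missing or the repair returns nothing usable.
    if (function_exists('tidy_repair_string')) {
        $repaired = tidy_repair_string(
            $html_src,
            ['show-body-only' => true, 'output-html' => true, 'wrap' => 0],
            'utf8'
        );
        if (is_string($repaired) && $repaired !== '') {
            $html_src = $repaired;
        }
    }

    // Step 2: collect libxml warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // Wrap the fragment in one root element so that LIBXML_HTML_NOIMPLIED
    // does not rearrange sibling nodes.
    $doc->loadHTML(
        '<div>' . $html_src . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Step 3: keep only the file name and prepend the new base URL.
    foreach ($doc->getElementsByTagName('img') as $img) {
        $name = basename($img->getAttribute('src'));
        if ($name !== '') {
            $img->setAttribute('src', $new_src . $name);
        }
    }

    // Step 4: serialize the children of the wrapper <div>, not the wrapper.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Called as `upd_img_src_tidy($html_src, 'http://cdn.example/')` (a placeholder URL), this returns the fragment with each image pointing at the new base. Restoring the previous libxml error mode afterwards keeps the change from leaking into other code that relies on the default behaviour.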

cweiske