
I am asking this question because I am new to DOMDocument and could not find the answer anywhere else.

I am writing a custom WordPress theme for a CMS site. One requirement is to retrieve webpage contents from other sites, select parts of that content to display, and then refer to the original site for complete content.

I therefore use WordPress's wp_remote_get function to fetch the webpage. I then need to parse it to reach specific elements, and I am using DOMDocument for that, as follows:

$dom = new DOMDocument();
$dom->loadHTML($html); // $html is the page retrieved earlier using wp_remote_get
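
To make the "go to specific webpage elements" step concrete, here is a minimal sketch that uses DOMXPath on the loaded document. The sample markup, the `content` id, and the XPath expression are assumptions for illustration only; they stand in for the real page body and the elements actually needed.

```php
<?php
// Hypothetical stand-in for the page body fetched earlier with wp_remote_get.
$html = '<html><head><title>Sample</title></head><body>'
      . '<div id="content"><h1>Headline</h1>'
      . '<p>First paragraph.</p><p>Second paragraph.</p></div>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// Query the parsed tree for the paragraphs inside the (assumed) content div.
$xpath = new DOMXPath($dom);
$paragraphs = [];
foreach ($xpath->query('//div[@id="content"]/p') as $node) {
    $paragraphs[] = trim($node->textContent);
}

print_r($paragraphs);
```

The same pattern should apply to any selector; only the XPath expression changes.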

Now, the problem is loadHTML() is causing these warnings:

PHP Warning:  DOMDocument::loadHTML(): DOCTYPE improperly terminated in Entity, line: 2
PHP Warning:  DOMDocument::loadHTML(): htmlParseStartTag: misplaced <html> tag in Entity, line: 3
PHP Warning:  DOMDocument::loadHTML(): htmlParseStartTag: misplaced <head> tag in Entity, line: 4
PHP Warning:  DOMDocument::loadHTML(): htmlParseStartTag: misplaced <body> tag in Entity, line: 105
PHP Warning:  DOMDocument::loadHTML(): ID 1 already defined in Entity, line: 551

And there are many more warnings.

Obviously there is something wrong with the markup of this webpage. However, most of the pages we need to retrieve generate similar warnings.

My questions are:

  1. Should I be worried about those warnings?
  2. Is there a way of telling DOMDocument to ignore those imperfections?
  3. What should I do to make things work properly?
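
For context on question 2, one commonly suggested approach is libxml's internal error handling: libxml_use_internal_errors(true) makes the parser collect problems instead of emitting PHP warnings, and libxml_get_errors() returns them for inspection. The following is only a sketch under assumptions: the malformed sample string is made up, and whether the recovered tree is good enough depends on the actual page.

```php
<?php
// Deliberately malformed sample markup (duplicate id, stray closing tag);
// a hypothetical stand-in for a real fetched page.
$badHtml = '<html><body><div id="a">One</div><div id="a">Two</div></span></body></html>';

// Collect parser problems internally instead of raising PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$loaded = $doc->loadHTML($badHtml);

// Copy out whatever the parser reported, then reset libxml's error buffer.
$messages = [];
foreach (libxml_get_errors() as $error) {
    $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
}
libxml_clear_errors();

// Restore the previous setting so other code is unaffected.
libxml_use_internal_errors($previous);

// The parser recovers where it can, so the tree is still usable.
$divs = $doc->getElementsByTagName('div');
```

Keeping the collected messages, rather than discarding them, leaves the option of logging them while debugging a page that parses unexpectedly.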

Thanks for your input

Greeso