I am asking this question because I am new to DOMDocument and could not find the answer anywhere else.
I am writing a custom WordPress theme for a CMS site. One requirement is to retrieve webpage contents from other sites, select parts of that content to display, and then refer to the original site for complete content.
Therefore I am using the WordPress wp_remote_get function to fetch the webpage. I then have to parse it to reach specific page elements, and I am using DOMDocument to parse the page as follows:
$dom = new DOMDocument();
$dom->loadHTML($html); // Where $html is the page retrieved earlier using wp_remote_get
Now, the problem is that loadHTML() is causing these warnings:
PHP Warning: DOMDocument::loadHTML(): DOCTYPE improperly terminated in Entity, line: 2
PHP Warning: DOMDocument::loadHTML(): htmlParseStartTag: misplaced <html> tag in Entity, line: 3
PHP Warning: DOMDocument::loadHTML(): htmlParseStartTag: misplaced <head> tag in Entity, line: 4
PHP Warning: DOMDocument::loadHTML(): htmlParseStartTag: misplaced <body> tag in Entity, line: 105
PHP Warning: DOMDocument::loadHTML(): ID 1 already defined in Entity, line: 551
And there are many more warnings.
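The warnings do not seem to depend on WordPress. Below is a minimal sketch that should reproduce this kind of warning in plain PHP. The $html string is a made-up stand-in for the fetched page (with a deliberately duplicated id), and the temporary error handler is only there to capture the warning text; neither is part of my theme. Which messages appear may depend on the installed libxml2 version.

```php
<?php
// Hypothetical stand-in for the page body fetched with wp_remote_get().
// The duplicated id="1" is deliberate, to provoke a parser warning.
$html = '<!DOCTYPE html><html><head><title>t</title></head><body>'
      . '<div id="1">first</div><div id="1">second</div></body></html>';

// Collect warnings raised during parsing instead of letting them be logged.
$warnings = [];
set_error_handler(function (int $errno, string $errstr) use (&$warnings): bool {
    $warnings[] = $errstr;
    return true; // mark the warning as handled
});

$dom = new DOMDocument();
$dom->loadHTML($html);

restore_error_handler();

// Print whatever the parser reported, then the number of <div> elements found.
foreach ($warnings as $warning) {
    echo $warning, PHP_EOL;
}
echo $dom->getElementsByTagName('div')->length, PHP_EOL;
```

Even with the duplicated id, both div elements are still present in the parsed tree, so the document does get loaded.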
Obviously there is something wrong with this page's markup; however, most of the pages we need to retrieve generate these warnings.
My questions are:
- Should I be worried about those warnings?
- Is there a way of telling DOMDocument to ignore those imperfections?
- What should I do to make things work properly?
Thanks for your input