PHP to extract data from a website

Question

I want to get all <p> elements from the first joke, so I wrote this script:

<?php
$url = "http://sms.hindijokes.co";
$html = file_get_contents($url);
$doc = new DOMDocument;
$doc->strictErrorChecking = false;
$doc->recover = true;
@$doc->loadHTML("<html><body>" . $html . "</body></html>");
$xpath = new DOMXPath($doc);
$query1 = "//h2[@class='entry-title']/a";
$query2 = "//div[@class='entry-content']/p";
$entries1 = $xpath->query($query1);
$entries2 = $xpath->query($query2);
$var1 = $entries1->item(0)->textContent;
$var2 = $entries2->item(0)->textContent;
echo "$var1"; 
echo "<br>";
$f = 5;
for($i = 0; $i < $f; $i++){
echo $entries2->item($i)->textContent."\n";
}
?>

This time I knew that there are five <p> elements in the first joke, but I want the script to be automated. Sometimes there will be more or fewer than five <p> elements, so the hard-coded count would cause a mess.

Please refer to http://stackoverflow.com/questions/6366351/getting-dom-elements-by-classname – Jaydeep Pandya, Jan 12 '17 at 12:08

score 1 · Answer 1 · answered Jan 12 '17 at 11:54

DOMXPath::query returns a DOMNodeList object. Use its DOMNodeList::length property instead of a hard-coded count:

$f = $entries2->length;
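
As a minimal sketch of how that fits into the loop from the question, the snippet below runs against a small inline HTML sample rather than the live site. The sample markup is an assumption that only mimics the entry-content structure used in the question's query.

```php
<?php
// Hypothetical sample markup standing in for the real page.
$html = '<div class="entry-content"><p>First line</p><p>Second line</p><p>Third line</p></div>';

libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$entries2 = $xpath->query("//div[@class='entry-content']/p");

// The loop bound comes from the node list, not from a hard-coded 5.
$lines = [];
for ($i = 0; $i < $entries2->length; $i++) {
    $lines[] = $entries2->item($i)->textContent;
}

// DOMNodeList is also traversable, so foreach needs no counter at all.
$sameLines = [];
foreach ($entries2 as $p) {
    $sameLines[] = $p->textContent;
}

echo implode("\n", $lines), "\n";
```

Either loop adapts to however many <p> elements the query returns.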

German Lashevich


How can I include <br>? Can you help me? There are some <br> tags in the p tags. – Alive ColdJuan Jan 13 '17 at 07:01

shudder · Accepted Answer · 2017-01-12T19:29:27.477

You need only the first div's p elements, so your query would be:

$entries2 = $xpath->query("(//div[@class='entry-content'])[1]/p");
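
The position of the parentheses matters in XPath. Here is a small sketch comparing the two forms on an inline sample; the "post" wrapper divs are an assumption made for illustration, not the real page's markup.

```php
<?php
// Hypothetical markup: two jokes, each inside its own wrapper div.
$html = '<div class="post"><div class="entry-content"><p>A1</p><p>A2</p></div></div>'
      . '<div class="post"><div class="entry-content"><p>B1</p></div></div>';

libxml_use_internal_errors(true); // keep parser warnings quiet
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Parentheses first: collect every matching div in the document,
// then keep only the first one.
$firstOnly = $xpath->query("(//div[@class='entry-content'])[1]/p");

// No parentheses: [1] means "first matching div within its own parent",
// which is true for the div in each wrapper.
$perParent = $xpath->query("//div[@class='entry-content'][1]/p");

echo $firstOnly->length, " vs ", $perParent->length, "\n";
```

With this sample the parenthesised query returns only the first joke's paragraphs, while the unparenthesised one also picks up the second joke.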

Now you can iterate over all p elements with a foreach() loop (extracting their HTML contents):

$innerHtml = '';
foreach ($entries2 as $entry) {
    $children = $entry->childNodes;
    foreach ($children as $child) {
        $innerHtml .= $child->ownerDocument->saveXML($child);
    }
}
$innerHtml = str_replace(["\r\n", "\r", "\n", "\t"], '', $innerHtml);
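
A self-contained variant of the same idea, as a sketch only: it wraps the inner loop in a helper (innerHtml is a hypothetical name) and swaps saveXML() for DOMDocument::saveHTML(), which also accepts a node argument and serializes using HTML rules. The sample markup is assumed, not taken from the real page.

```php
<?php
// Hypothetical helper: return the markup between a node's opening and closing tags.
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument serializes just that node.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Hypothetical sample: the first paragraph contains a <br>.
$source = '<div class="entry-content"><p>Line one<br>Line two</p><p>Punchline</p></div>';

libxml_use_internal_errors(true); // keep parser warnings quiet
$doc = new DOMDocument();
$doc->loadHTML($source);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Collect the inner HTML of each paragraph of the first joke.
$parts = [];
foreach ($xpath->query("(//div[@class='entry-content'])[1]/p") as $p) {
    $parts[] = innerHtml($p);
}

echo implode("\n", $parts), "\n";
```

Because each child node is serialized rather than read through textContent, inline tags such as <br> survive in the output.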

edited Jan 12 '17 at 19:29

answered Jan 12 '17 at 12:23

shudder


How can I get the HTML content too? The p elements are like this (not using less-than/greater-than signs) – Alive ColdJuan Jan 12 '17 at 19:01
How do I get the HTML content of the p element included? There are <br> tags which need to be included... – Alive ColdJuan Jan 12 '17 at 19:02
@AliveColdJuan Check whether my edited answer extracts the inner HTML content. It's based on my old code, and right now I'm not sure it works as I expect. – shudder Jan 12 '17 at 19:34

score 0 · Answer 3 · edited May 23 '17 at 12:13

Try it this way: the loop keeps returning items until it reaches null. But some jokes have multiple p tags, so it is better to find them by your own custom class/id.

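$i = 0;
// item() returns null past the end of the list, so test the node itself
// rather than reading textContent from a missing item.
while (($entry = $entries2->item($i)) !== null) {
    echo "<br>";
    echo $i . " " . $entry->textContent;
    $i++;
}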
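
One way to keep each joke's paragraphs together, sketched under assumptions: query the container divs first, then run a relative query inside each one by passing it as the context node (the second argument of DOMXPath::query). The sample markup and the "post" wrapper class are hypothetical.

```php
<?php
// Hypothetical markup: two jokes with different numbers of paragraphs.
$html = '<div class="post"><h2 class="entry-title"><a href="#">Joke A</a></h2>'
      . '<div class="entry-content"><p>A1</p><p>A2</p></div></div>'
      . '<div class="post"><h2 class="entry-title"><a href="#">Joke B</a></h2>'
      . '<div class="entry-content"><p>B1</p></div></div>';

libxml_use_internal_errors(true); // keep parser warnings quiet
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$jokes = [];
foreach ($xpath->query("//div[@class='entry-content']") as $content) {
    $lines = [];
    // "./p" is evaluated relative to $content, so only this joke's
    // direct p children are returned.
    foreach ($xpath->query('./p', $content) as $p) {
        $lines[] = trim($p->textContent);
    }
    $jokes[] = $lines;
}

print_r($jokes);
```

Each entry of $jokes then holds exactly the paragraphs of one joke, whatever their count.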

answered Jan 12 '17 at 12:17

Jaydeep Pandya

