
The following cURL call succeeds every time if, and only if, $data is printed after the call. In that case curl_getinfo() returns:

[content_type] => text/html; charset=UTF-8

If $data is not printed, the call sometimes returns the same result as above, and sometimes $data contains only "Loading...", which means the page has not finished loading yet. In that case curl_getinfo() returns:

[content_type] => text/html

Furthermore, when using print_r($data), I can see the output of print_r(curl_getinfo($ch)) on my website being updated several times while the cURL call is in progress. What... The.... F?

(The curl_setopt list has grown as I've tried to find a solution, LOL.) Oh, and even if I print $data after it has been returned to the caller and stored in another variable, curl succeeds every time.

Is this normal behaviour? I don't want to print_r($data)!

Is it possible that the URL I'm retrieving contains JavaScript that gets run when I "print" it on my website? And why does it work occasionally without the print_r($data)? Ref: is-there-a-way-to-let-curl-wait-until-the-pages-dynamic-updates-are-done
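One way to test the JavaScript hypothesis would be to print the response escaped, so the browser shows the markup as text instead of executing it. A minimal sketch, assuming a helper named show_raw_response (hypothetical, not part of my function):

```php
<?php
// Hypothetical helper: renders fetched markup as inert text, so any <script>
// it contains is displayed by the browser rather than executed.
function show_raw_response(string $data): string {
    // ENT_SUBSTITUTE avoids an empty result if the body is not valid UTF-8.
    return '<pre>' . htmlspecialchars($data, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8') . '</pre>';
}

// Sample input standing in for the real $data returned by curl_exec().
echo show_raw_response('<script>document.title = "changed";</script>Loading...');
```

If the escaped output still says "Loading..." on the runs that look successful when printed normally, the difference would come from scripts running in the browser, not from curl.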

edit: Until further notice, I've put the curl call in a while loop that checks whether the downloaded size is above a certain threshold. I've capped the loop at 10 iterations, and so far that has been enough, i.e. it manages to download the content of interest. The extra time is barely noticeable.

function curl_get_contents($url) {
    global $dbg;

    $ch = curl_init();
    $timeout = 30;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_NOSIGNAL, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    //curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.52 Safari/537.17');
    curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_FRESH_CONNECT, true);
    $data = curl_exec($ch);
    if ($dbg) {
        print_r(curl_getinfo($ch)); // This one gets refreshed if print_r($data) used below
        if(curl_errno($ch)){
            echo 'Curl error: ' . curl_error($ch);
        } else {
            echo "ALL GOOD <br>";
        }
    }
    curl_close($ch);
    //echo $data;     // If I do this...
    //print_r($data); // ... or this, curl succeeds 100% of the time.
    return $data;
}
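
The retry workaround mentioned in the edit above can be sketched roughly as follows. This is a sketch under assumptions, not the exact code I'm running: the name fetch_until_complete, the injected $fetch callable, the 1000-byte threshold and the pause length are all hypothetical.

```php
<?php
// Hypothetical wrapper: calls $fetch($url) until the body is at least
// $minBytes long or $maxAttempts is used up; returns the last body either way.
function fetch_until_complete(
    callable $fetch,
    string $url,
    int $minBytes = 1000,
    int $maxAttempts = 10,
    int $pauseMicros = 200000
): string {
    $data = '';
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        // curl_exec() returns false on failure; the cast turns that into ''.
        $data = (string) $fetch($url);
        if (strlen($data) >= $minBytes) {
            break; // large enough to be the real page, stop retrying
        }
        if ($attempt < $maxAttempts) {
            usleep($pauseMicros); // brief pause before the next attempt
        }
    }
    return $data;
}

// Intended use with the function above (not executed here):
// $html = fetch_until_complete('curl_get_contents', $url);
```

Passing the fetcher as a callable keeps the loop independent of curl, so it can be tried with a fake fetcher that returns "Loading..." a couple of times first.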
niCk cAMel
  • 1
    And what is your problem? – Lars Stegelitz Sep 25 '18 at 20:15
  • @LarsStegelitz added clarifying questions – niCk cAMel Sep 25 '18 at 20:19
  • 1
    No, that is not normal behaviour... how can curl return while the URL has not finished loading? It's not asynchronous and I never heard of such behaviour before... – Lars Stegelitz Sep 25 '18 at 20:27
  • 3
    PHP cannot time travel. The success of an earlier operation cannot be dependent on what you do later. – Barmar Sep 25 '18 at 20:28
  • 2
    I suspect the result you're seeing has to do with the HTML that's being retrieved from the remote site. Maybe it contains JavaScript that's being executed when you return it to the browser. Use View Source to see the raw results. – Barmar Sep 25 '18 at 20:33
  • @Barmar, well I get that. Since I'm not all *high* on `curl` I figured it might have something to do with cache and/or similarities with enabledelayedexpansion in batch. "...time travel" - LMAO – niCk cAMel Sep 25 '18 at 20:33
  • can you replicate this behaviour in a fiddle? – Kaii Sep 25 '18 at 20:42
  • Most likely the 30 second timeout isn't enough. – Forbs Sep 25 '18 at 20:53
  • @Forbs it is. Never has the page taken more than 3 seconds to load.. with or without the `print_r` – niCk cAMel Sep 25 '18 at 20:55

0 Answers