
The issue:

I'm working with PHP, cURL and a public API to fetch json strings. These json strings are formatted like this (simplified, average size is around 50-60 kB):

{
   "data": {},
   "previous": "url",
   "next": "url"
}

What I'm trying to do is fetch all the json strings, starting from the first one, by checking for the "next" attribute. So I have a while loop, and as long as there's a "next" attribute, I fetch the next URL.

The problem is that sometimes, seemingly at random, the loop stops before the end, and after many tests I still cannot figure out why.

I say randomly because sometimes the loop goes through to the end and no problem occurs. Sometimes it crashes after N loops.

And so far I couldn't extract any information to help me debug it.

I'm using PHP 7.3.0 and launching my code from the CLI.

What I tried so far:

Check the headers:

No headers are returned. Nothing at all.

Use curl_errno() and curl_error():

I tried the following code right after executing the request (curl_exec($ch)) but it never triggers.

if(curl_errno($ch)) {
   echo 'curl error ' . curl_error($ch) . PHP_EOL;
   echo 'response received from curl error :' . PHP_EOL;
   var_dump($response); // the json string I should get from the server.
}

Check if the response came back null:

if(is_null($response))

or if my json string has an error:

if(json_last_error() !== JSON_ERROR_NONE)

Though I think it's useless because it will never be valid if the cURL response is null or empty. When this code triggers, the json error code is 3 (JSON_ERROR_CTRL_CHAR).

The problematic code:

function apiCall($url) {
   ...
   $ch = curl_init();
   curl_setopt($ch, CURLOPT_URL, $url);
   curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
   curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
   $response = curl_exec($ch);
   return $response; // the caller assigns this to $jsonString
}
$inc = 0;
$url = 'https://api.example.com/' . $id;
$jsonString = apiCall($url);

if(!is_null($jsonString)) {
    file_put_contents('pathToDirectory/' . $id + $inc, $jsonString);
    $nextUrl = getNextUrl($jsonString);

    while ($nextUrl) {
        $jsonString = apiCall($url . '?page=' . $nextUrl);

        if(!is_null($jsonString)) {
            $inc++;
            file_put_contents('pathToDirectory/' . $id + $inc, $jsonString);
            $nextUrl = getNextUrl($jsonString);
        }
    }
}

What I'm expecting my code to do:

Not stop randomly, or at least give me a clear error code.

yivi
  • Maybe capturing a problematic execution in wireshark would help? You'd need to use a non-https API though... – NieDzejkob Jan 14 '19 at 20:26
  • I just tried doing an api call over http but it doesn't let me. No chance of seeing anything with Wireshark then? (I'm not an expert at this). – KeksimusTotalus Jan 14 '19 at 20:34
  • What is `$id`? What characters are in this string? Please confirm that you mean to add and not concatenate `$id` and `$inc`. Please turn on displaying of errors at the top of your script: https://stackoverflow.com/a/21429652/2943403 – mickmackusa Jan 14 '19 at 23:23
  • When you say "random", do you mean the same successive set of url calls will be successful on one attempt and fail on the next attempt? Or do you mean that some attempts succeed and different ones fail? Is this just a matter of the api provider serving up improperly encoded data? If it is their error, bring it to their attention and tell them the urls that bonk. – mickmackusa Jan 14 '19 at 23:39
  • @mickmackusa Yes, `$id` is an integer and I'm just incrementing it to have different filenames each time. And yes, random here really means random. I tested the URLs manually when my code fails; so far it's always been a different one, and they all work fine when I do the request myself. – KeksimusTotalus Jan 15 '19 at 19:52

1 Answer

The problem is that your API could be returning an empty response, malformed JSON, or even a status code other than 200, and your loop would stop immediately.

Since you do not control the API responses, you know it can fail randomly, and you do not have access to the API server logs (because you don't, do you?), you need to build some resilience into your consumer.

Something very simple (you'd need to tune it up) could be:

function apiCall( $url, $attempts = 3 ) {
    // ..., including setting "$headers"
    $ch = curl_init();
    curl_setopt( $ch, CURLOPT_URL, $url );
    curl_setopt( $ch, CURLOPT_HTTPHEADER, $headers );
    curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );

    for ( $i = 0; $i < $attempts; $i ++ ) {
        $response  = curl_exec( $ch );
        $curl_info = curl_getinfo( $ch );

        if ( curl_errno( $ch ) ) {
            // log your error & try again
            continue;
        }

        // I'm only accepting 200 as a status code. Check with your API if there could be other possible "good" responses
        if ( $curl_info['http_code'] != 200 ) {
            // log your error & try again
            continue;
        }

        // everything seems fine, but the response is empty? not good.
        if ( empty( $response ) ) {
            // log your error & try again
            continue;
        }

        return $response;
    }

    return null;
}
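The "log your error" comments above could be filled in with something like the following. This is only a minimal sketch: `describeFailure()` is a hypothetical helper name, not part of cURL, and it assumes you pass it the values of `curl_errno()`, `curl_error()`, `curl_getinfo()` and `curl_exec()` from the current attempt. Note that cURL reports a status of 0 when no HTTP response code was received at all, so that case gets its own message.

```php
<?php
// Hypothetical helper (not a cURL function): returns null when the attempt
// looks fine, or a short reason string that can be written to a log.
function describeFailure(int $errno, string $error, array $info, $response): ?string
{
    // A non-zero errno means cURL itself reported a transport-level problem.
    if ($errno !== 0) {
        return "curl error $errno: $error";
    }

    // http_code is 0 when no HTTP response code was received.
    $status = $info['http_code'] ?? 0;
    if ($status === 0) {
        return 'no HTTP response received (status 0)';
    }

    // Assumption carried over from the function above: only 200 is "good".
    if ($status !== 200) {
        return "unexpected HTTP status $status";
    }

    // curl_exec() returns false on failure; an empty body is also suspicious.
    if ($response === false || $response === null || $response === '') {
        return 'empty response body';
    }

    return null;
}
```

Inside the `for` loop you would call it as `describeFailure(curl_errno($ch), curl_error($ch), $curl_info, $response)`, log the returned string when it is not null, and `continue` to the next attempt.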

This would allow you to do something like (pulled from your code):

do {
    $jsonString = apiCall($url . '?page=' . $nextUrl);
    $nextUrl    = false;

    if(!is_null($jsonString)) {
        $inc++;
        file_put_contents('pathToDirectory/' . $id + $inc, $jsonString);
        $nextUrl = getNextUrl($jsonString);
    }
}
while ($nextUrl);

I'm not checking for the case where the response is non-empty, there was no connection error, the status is 200, and yet the body is invalid JSON.

You may want to check for these things as well, depending on how brittle the API you are consuming is.
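If you do want to cover the invalid JSON case, a check along these lines could be added before returning the response. It is a rough sketch under assumptions: `isUsableJson()` is a hypothetical name, and it assumes a valid page always decodes to a JSON object (as in your sample with "data", "previous" and "next"), so bare scalars are rejected too.

```php
<?php
// Hypothetical helper: true only when $body is a non-empty string that
// decodes without error to a JSON object or array.
function isUsableJson($body): bool
{
    // curl_exec() can return false, so guard against non-strings first.
    if (!is_string($body) || $body === '') {
        return false;
    }

    // Decode objects as associative arrays so is_array() covers both cases.
    $decoded = json_decode($body, true);

    // json_last_error() reflects the json_decode() call just above.
    return json_last_error() === JSON_ERROR_NONE && is_array($decoded);
}
```

A body that triggers JSON_ERROR_CTRL_CHAR, like the one you described, fails this check, so the attempt can be retried the same way as an empty response.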

yivi
  • Thank you for your answer (and the code improvement with the do {} while). However, is the `if(empty($response))` test checking for an empty body in this case? Or both the header and body? Because when the error happens it's not just the body that is empty, it's everything (seemingly at least, so checking for the status code wouldn't work either, right?). Also I tried `if(curl_errno($ch))` but it never triggers. And no sadly I don't have access to anything on the server side. – KeksimusTotalus Jan 15 '19 at 19:49
  • If "everything" is empty, then `$response` will be empty. As long as `$response` is empty you know you need to retry, since things didn't go as planned. Checking for status codes is just something else (`$response` might be non-empty, but the status code could be `404` or `500`, for example). You could be getting empty responses or non 200 status codes, and yet the connection wouldn't be considered "failed", and `curl_errno()` would have nothing to report. – yivi Jan 15 '19 at 19:56
  • @Keksimus You can try this function and see if it helps you resolve the problem. The complete solution will require some tinkering on your part, of course, but this is the basic concept to get you going and add some error handling to your code. – yivi Jan 15 '19 at 20:21
  • Yes I'll accept your answer for sure, took a bit of time as I was trying it in code. By `$curl_info['http_code'] != 200` did you mean `curl_getinfo(CURL_INFO_HTTPCODE)` ? Also to make sure I understand, I'm just repeating the request until it works without "fixing" anything right? I guess it would be a bad idea to do it in a while loop instead of a for loop then? – KeksimusTotalus Jan 15 '19 at 20:49
  • By the way, when the code fails, it fails in the http_code != 200 block. I still need to figure out how to extract more info, as all I'm getting from curl_getinfo() is a code 0. – KeksimusTotalus Jan 15 '19 at 20:52
  • I meant [curl_getinfo](http://php.net/manual/en/function.curl-getinfo.php). This returns an array with information about the last transfer. And yes, the idea is that since the API is unreliable and out of your control, in case of an undocumented error you just try again to see if it goes well. Obviously, 3 errors (or as many "attempts" as you decide to make) will also fail permanently. Since you say that sometimes it completes the process, and doesn't always fail at the same point, I'm just _guessing_ it's a random reliability issue. But of course, without access to the API, it's part conjecture. – yivi Jan 15 '19 at 20:54
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/186737/discussion-between-keksimustotalus-and-yivi). – KeksimusTotalus Jan 15 '19 at 21:00
  • @yivi - thank you for your answer. It's exactly what I needed to work with the API I'm dealing with, which is rather 'flooky' on certain calls. Helped immensely! – Scott Fleming Apr 23 '21 at 12:09