
I have an endpoint in my Node.js API that returns a JSON array of results provided by the google-search-scraper library.

// scraper is the google-search-scraper module; sem is a semaphore (e.g. from
// the `semaphore` package) left over from one of my synchronization attempts.
app.get('/google_image_search', (req, res) => {
    var options = {
        query: 'grenouille',
        age: 'y', // past year ([hdwmy]\d? as in google URL)
        limit: 10,
        params: {} // params will be copied as-is in the search URL query string
    };

    var results = [];
    scraper.search(options, function(err, url, meta) {
        sem.take(function() { 
            if(err) throw err;

            var result = {
                title: meta.title,
                meta: meta.meta,
                description: meta.desc
            }
            results.push(result);
            sem.leave();
        });
    })

    console.log(results);

    res.json({
        results
    });
})

I need the console.log(results) and the res.json({ results }) to happen after the scraper.search function is done. It is currently always returning an empty array.

The callback passed to scraper.search() is called once for every result, so if there are 10 results it runs 10 times; that's why I'm waiting until the array is full before sending the response.

I have tried using semaphores and mutex locks in different places, but no luck. Any advice is appreciated.


This was solved by checking my results array against a LIMIT variable, as outlined in the accepted answer below.

Thanks to everyone for the input.


Waggoner_Keith
  • Possible duplicate of [How do I return the response from an asynchronous call?](https://stackoverflow.com/questions/14220321/how-do-i-return-the-response-from-an-asynchronous-call) – tkausl Feb 10 '19 at 17:02
  • Not a duplicate, but a related question; it doesn't explain how the problem should be solved for this specific lib. – Estus Flask Feb 10 '19 at 17:05

3 Answers


> I need the console.log(results) and the res.json({ results }) to happen after the scraper.search function is done.

Put it in the innermost callback for scraper.search().

scraper.search(options, function(err, url, meta) {
    if (err) throw err;

    var result = {
        title: meta.title,
        meta: meta.meta,
        description: meta.desc
    };
    results.push(result);
    console.log(result);
    res.json({ results });
});

That will call console.log() and res.json() every time the callback runs. If you only want to do it after 10 results, or on some other condition, add code to check the condition and run console.log() and/or res.json() at the right time.
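
For example, a minimal sketch of that check, assuming the callback fires exactly once per result and you expect options.limit of them:

var LIMIT = 10; // same value as options.limit
var results = [];

scraper.search(options, function(err, url, meta) {
    if (err) throw err;

    results.push({
        title: meta.title,
        meta: meta.meta,
        description: meta.desc
    });

    // Only log and respond once the last expected result has arrived;
    // earlier invocations just accumulate results into the array.
    if (results.length === LIMIT) {
        console.log(results);
        res.json({ results });
    }
});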

You can also look at things like async/await (sketched at the end of this answer), but given the code you have posted, the above is likely the most incremental solution.

The problem with the current placement of console.log() and res.json() is that it treats an asynchronous, callback-based function as if it were synchronous.
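
If you do want the async/await version, here is a rough sketch. searchToArray is a hypothetical helper of mine, not part of google-search-scraper, and it assumes the callback fires exactly once per result up to options.limit:

// Hypothetical wrapper: collect the callback results into a Promise that
// resolves once `limit` of them have arrived.
function searchToArray(options) {
    return new Promise(function(resolve, reject) {
        var results = [];
        scraper.search(options, function(err, url, meta) {
            if (err) return reject(err);
            results.push({ title: meta.title, meta: meta.meta, description: meta.desc });
            if (results.length === options.limit) resolve(results);
        });
    });
}

app.get('/google_image_search', async (req, res) => {
    try {
        var results = await searchToArray({ query: 'grenouille', limit: 10, params: {} });
        res.json({ results });
    } catch (err) {
        res.status(500).json({ error: String(err) });
    }
});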

Trott
  • Ok correct me if I'm wrong, but that function that is passed to the search function is called for every result. So if there are 10 results would we be sending back 10 responses? That's why I was trying to wait until the array was full? – Waggoner_Keith Feb 10 '19 at 17:07
  • Ah! Yes, in that case, the code in the callback will need to check the number of results, or whatever it is you wish to use to indicate that the results are done. I'll edit my answer. – Trott Feb 10 '19 at 17:28

Trott's answer was on the right track, but how about keeping a variable that you increment on every callback, and running your completion code when it's equal to 10 (or 9, depending on how you implement it)? You could also just count the elements in the array.

app.get('/google_image_search', (req, res) => {
    var options = {
        query: 'grenouille',
        age: 'y', // past year ([hdwmy]\d? as in google URL)
        limit: 10,
        params: {} // params will be copied as-is in the search URL query string
    };

    var results = [];
    scraper.search(options, function(err, url, meta) {
        sem.take(function() { 
            if(err) throw err;

            var result = {
                title: meta.title,
                meta: meta.meta,
                description: meta.desc
            };
            results.push(result);

            // Check inside the semaphore callback, after the push, so the
            // count is up to date when the response is sent.
            if (results.length === 10) {
                console.log(results);
                res.json({ results });
            }
            sem.leave();
        });
    })
})
Matt F.

Putting res.json outside the callback will result in a race condition similar to this problem. A shortcoming of the google-search-scraper library is that it wasn't designed to collect results; the callback fires once per result.

Something like this should fix it:

var LIMIT = 10;
var options = { limit: LIMIT, ... };

var results = [];
var errs = [];
var resultsCount = 0;

function resultsHandler() {
    if (errs.length) {
        // handle error
    } else {
        res.json({ results });
    }
}

scraper.search(options, function resultHandler(err, url, meta) {
    if (err) {
        errs.push(err);
    } else {
        var result = {
            title: meta.title,
            meta: meta.meta,
            description: meta.desc
        };

        results.push(result);
    }

    resultsCount++;

    // Respond only after every expected callback has fired.
    if (resultsCount === LIMIT)
        resultsHandler();
});

Note that this won't work if search can skip the callback under some conditions, e.g. if there are fewer results than the limit.
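
If that's a concern with this library, one defensive option is a deadline fallback that responds with whatever has been collected so far. Here is a sketch that replaces resultsHandler above; the done guard and the arbitrary 10-second deadline are my additions, not anything the library provides:

var done = false;
// Fall back to a partial response if fewer than LIMIT callbacks ever fire.
var timer = setTimeout(resultsHandler, 10000);

function resultsHandler() {
    if (done) return; // guard against responding twice
    done = true;
    clearTimeout(timer);

    if (errs.length) {
        // handle error
    } else {
        res.json({ results });
    }
}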

Estus Flask