2

I'm trying to update a tool that was created a while ago which uses nodejs (I am not a JS developer, so I'm trying to piece the code together) and am getting stuck at the last hurdle.

The new functionality will take in a swagger .json definition, compare the endpoints against the matching API Gateway on the AWS Service, using the 'aws-sdk' SDK for JS and then updates the Gateway accordingly.

The code runs fine on a small definition file (about 15 endpoints) but as soon as I give it a bigger one, I start getting tons of TooManyRequestsException errors.

I understand that this is due to my calls to the API Gateway service being too quick and a delay / pause is needed. This is where I am stuck

I have tried adding;

  • a delay() to each promise being returned
  • running a setTimeout() in each promise
  • adding a delay to the Promise.all and Promise.mapSeries

Currently my code loops through each endpoint within the definition and then adds the response of each promise to a promise array:

promises.push(getMethodResponse(resourceMethod, value, apiName, resourcePath)); 

Once the loop is finished I run this:

        return Promise.all(promises)
        .catch((err) => {
            winston.error(err);
        })

I have tried the same with a mapSeries (no luck).

It looks like the functions within the (getMethodResponse promise) are run immediately and hence, no matter what type of delay I add they all still just execute. My suspicious is that the I need to make (getMethodResponse) return a function and then use mapSeries but I cant get this to work either.

Code I tried: Wrapped the getMethodResponse in this:

return function(value){}

Then added this after the loop (and within the loop - no difference):

 Promise.mapSeries(function (promises) {
 return 'a'();
 }).then(function (results) {
 console.log('result', results);
 });

Also tried many other suggestions:

Here

Here

Any suggestions please?

EDIT

As request, some additional code to try pin-point the issue.

The code currently working with a small set of endpoints (within the Swagger file):

module.exports = (apiName, externalUrl) => {

return getSwaggerFromHttp(externalUrl)
    .then((swagger) => {
        let paths = swagger.paths;
        let resourcePath = '';
        let resourceMethod = '';
        let promises = [];

        _.each(paths, function (value, key) {
            resourcePath = key;
            _.each(value, function (value, key) {
                resourceMethod = key;
                let statusList = [];
                _.each(value.responses, function (value, key) {
                    if (key >= 200 && key <= 204) {
                        statusList.push(key)
                    }
                });
                _.each(statusList, function (value, key) { //Only for 200-201 range  

                    //Working with small set 
                    promises.push(getMethodResponse(resourceMethod, value, apiName, resourcePath))
                });             
            });
        });

        //Working with small set
        return Promise.all(promises)
        .catch((err) => {
            winston.error(err);
        })
    })
    .catch((err) => {
        winston.error(err);
    });

};

I have since tried adding this in place of the return Promise.all():

            Promise.map(promises, function() {
            // Promise.map awaits for returned promises as well.
            console.log('X');
        },{concurrency: 5})
        .then(function() {
            return console.log("y");
        });

Results of this spits out something like this (it's the same for each endpoint, there are many):

Error: TooManyRequestsException: Too Many Requests X Error: TooManyRequestsException: Too Many Requests X Error: TooManyRequestsException: Too Many Requests

The AWS SDK is being called 3 times within each promise, the functions of which are (get initiated from the getMethodResponse() function):

apigateway.getRestApisAsync()
return apigateway.getResourcesAsync(resourceParams)
apigateway.getMethodAsync(params, function (err, data) {}

The typical AWS SDK documentation state that this is typical behaviour for when too many consecutive calls are made (too fast). I've had a similar issue in the past which was resolved by simply adding a .delay(500) into the code being called;

Something like:

    return apigateway.updateModelAsync(updateModelParams)
    .tap(() => logger.verbose(`Updated model ${updatedModel.name}`))
    .tap(() => bar.tick())
    .delay(500)

EDIT #2

I thought in the name of thorough-ness, to include my entire .js file.

'use strict';

const AWS = require('aws-sdk');
let apigateway, lambda;
const Promise = require('bluebird');
const R = require('ramda');
const logger = require('../logger');
const config = require('../config/default');
const helpers = require('../library/helpers');
const winston = require('winston');
const request = require('request');
const _ = require('lodash');
const region = 'ap-southeast-2';
const methodLib = require('../aws/methods');

const emitter = require('../library/emitter');
emitter.on('updateRegion', (region) => {
    region = region;
    AWS.config.update({ region: region });
    apigateway = new AWS.APIGateway({ apiVersion: '2015-07-09' });
    Promise.promisifyAll(apigateway);
});

function getSwaggerFromHttp(externalUrl) {
    return new Promise((resolve, reject) => {
        request.get({
            url: externalUrl,
            header: {
                "content-type": "application/json"
            }
        }, (err, res, body) => {
            if (err) {
                winston.error(err);
                reject(err);
            }

            let result = JSON.parse(body);
            resolve(result);
        })
    });
}

/*
    Deletes a method response
*/
function deleteMethodResponse(httpMethod, resourceId, restApiId, statusCode, resourcePath) {

    let methodResponseParams = {
        httpMethod: httpMethod,
        resourceId: resourceId,
        restApiId: restApiId,
        statusCode: statusCode
    };

    return apigateway.deleteMethodResponseAsync(methodResponseParams)
        .delay(1200)
        .tap(() => logger.verbose(`Method response ${statusCode} deleted for path: ${resourcePath}`))
        .error((e) => {
            return console.log(`Error deleting Method Response ${httpMethod} not found on resource path: ${resourcePath} (resourceId: ${resourceId})`); // an error occurred
            logger.error('Error: ' + e.stack)
        });
}

/*
    Deletes an integration response
*/
function deleteIntegrationResponse(httpMethod, resourceId, restApiId, statusCode, resourcePath) {

    let methodResponseParams = {
        httpMethod: httpMethod,
        resourceId: resourceId,
        restApiId: restApiId,
        statusCode: statusCode
    };

    return apigateway.deleteIntegrationResponseAsync(methodResponseParams)
        .delay(1200)
        .tap(() => logger.verbose(`Integration response ${statusCode} deleted for path ${resourcePath}`))
        .error((e) => {
            return console.log(`Error deleting Integration Response ${httpMethod} not found on resource path: ${resourcePath} (resourceId: ${resourceId})`); // an error occurred
            logger.error('Error: ' + e.stack)
        });
}

/*
    Get Resource
*/
function getMethodResponse(httpMethod, statusCode, apiName, resourcePath) {

    let params = {
        httpMethod: httpMethod.toUpperCase(),
        resourceId: '',
        restApiId: ''
    }

    return getResourceDetails(apiName, resourcePath)
        .error((e) => {
            logger.unimportant('Error: ' + e.stack)
        }) 
        .then((result) => {
            //Only run the comparrison of models if the resourceId (from the url passed in) is found within the AWS Gateway
            if (result) {
                params.resourceId = result.resourceId
                params.restApiId = result.apiId

                var awsMethodResponses = [];
                try {
                    apigateway.getMethodAsync(params, function (err, data) {
                        if (err) {
                            if (err.statusCode == 404) {
                                return console.log(`Method ${params.httpMethod} not found on resource path: ${resourcePath} (resourceId: ${params.resourceId})`); // an error occurred
                            }
                            console.log(err, err.stack); // an error occurred
                        }
                        else {
                            if (data) {
                                _.each(data.methodResponses, function (value, key) {
                                    if (key >= 200 && key <= 204) {
                                        awsMethodResponses.push(key)
                                    }
                                });
                                awsMethodResponses = _.pull(awsMethodResponses, statusCode); //List of items not found within the Gateway - to be removed.
                                _.each(awsMethodResponses, function (value, key) {
                                    if (data.methodResponses[value].responseModels) {
                                        var existingModel = data.methodResponses[value].responseModels['application/json']; //Check if there is currently a model attached to the resource / method about to be deleted
                                        methodLib.updateResponseAssociation(params.httpMethod, params.resourceId, params.restApiId, statusCode, existingModel); //Associate this model to the same resource / method, under the new response status
                                    }
                                    deleteMethodResponse(params.httpMethod, params.resourceId, params.restApiId, value, resourcePath)
                                        .delay(1200)
                                        .done();
                                    deleteIntegrationResponse(params.httpMethod, params.resourceId, params.restApiId, value, resourcePath)
                                        .delay(1200)
                                        .done();
                                })
                            }
                        }
                    })
                        .catch(err => {
                            console.log(`Error: ${err}`);
                        });
                }
                catch (e) {
                    console.log(`getMethodAsync failed, Error: ${e}`);
                }
            }
        })
};

function getResourceDetails(apiName, resourcePath) {

    let resourceExpr = new RegExp(resourcePath + '$', 'i');

    let result = {
        apiId: '',
        resourceId: '',
        path: ''
    }

    return helpers.apiByName(apiName, AWS.config.region)
        .delay(1200)
        .then(apiId => {
            result.apiId = apiId;

            let resourceParams = {
                restApiId: apiId,
                limit: config.awsGetResourceLimit,
            };

            return apigateway.getResourcesAsync(resourceParams)

        })
        .then(R.prop('items'))
        .filter(R.pipe(R.prop('path'), R.test(resourceExpr)))
        .tap(helpers.handleNotFound('resource'))
        .then(R.head)
        .then([R.prop('path'), R.prop('id')])
        .then(returnedObj => {
            if (returnedObj.id) {
                result.path = returnedObj.path;
                result.resourceId = returnedObj.id;
                logger.unimportant(`ApiId: ${result.apiId} | ResourceId: ${result.resourceId} | Path: ${result.path}`);
                return result;
            }
        })
        .catch(err => {
            console.log(`Error: ${err} on API: ${apiName} Resource: ${resourcePath}`);
        });
};

function delay(t) {
    return new Promise(function(resolve) { 
        setTimeout(resolve, t)
    });
 }

module.exports = (apiName, externalUrl) => {

    return getSwaggerFromHttp(externalUrl)
        .then((swagger) => {
            let paths = swagger.paths;
            let resourcePath = '';
            let resourceMethod = '';
            let promises = [];

            _.each(paths, function (value, key) {
                resourcePath = key;
                _.each(value, function (value, key) {
                    resourceMethod = key;
                    let statusList = [];
                    _.each(value.responses, function (value, key) {
                        if (key >= 200 && key <= 204) {
                            statusList.push(key)
                        }
                    });
                    _.each(statusList, function (value, key) { //Only for 200-201 range  

                        promises.push(getMethodResponse(resourceMethod, value, apiName, resourcePath))

                    });             
                });
            });

            //Working with small set
            return Promise.all(promises)
            .catch((err) => {
                winston.error(err);
            })
        })
        .catch((err) => {
            winston.error(err);
        });
};
Hexie
  • 3,222
  • 4
  • 25
  • 48
  • Use the `Promise.map()` option in Bluebird and use the `concurrency` option to determine how many requests are in flight at the same time and make sure you are not launching a request until the callback to `Promise.map()` is called. If `TooManyRequestsException` is too many requests within a time period as opposed to just too many simultaneous requests, then you will need to insert a delay. – jfriend00 Nov 20 '17 at 01:40
  • See: [How to insert a delay in a promise chain](https://stackoverflow.com/questions/39538473/using-settimeout-on-promise-chain/39538518#39538518) for how to insert a delay. – jfriend00 Nov 20 '17 at 01:43
  • @jfriend00 thanks for the help, seems that i'm getting closer - any chance we can please have a `conversation` to finalize it. It'll be easier for me to show you? – Hexie Nov 20 '17 at 02:34
  • If you show the actual code for what you've tried most recently (by adding it to your question) and describe what the remaining problem is, I can help you fix it. Besides the state of your current code, the main issue is what exactly triggers the TooManyRequestsException error. Simultaneous requests? Requests/sec? Something else. Without knowing what the limit actually is, you're just guessing and testing which isn't great. The stackoverflow way is to solve the problem here, not in an offline conversation. – jfriend00 Nov 20 '17 at 03:15
  • @jfriend00 fair point, I've updated the question with important sections and then entire js file for reference. This code is still very rough, I would like to get it working before doing a refactor, thanks! – Hexie Nov 20 '17 at 04:32
  • I've also noticed since, that it seems this line `promises.push(getMethodResponse(resourceMethod, value, apiName, resourcePath))` is actually executing promises and not simply adding them to the array. Seems like the last Promise.all doesn't actually do much. – Hexie Nov 20 '17 at 05:05

1 Answers1

3

You apparently have a misunderstanding about what Promise.all() and Promise.map() do.

All Promise.all() does is keep track of a whole array of promises to tell you when the async operations they represent are all done (or one returns an error). When you pass it an array of promises (as you are doing), ALL those async operations have already been started in parallel. So, if you're trying to limit how many async operations are in flight at the same time, it's already too late at that point. So, Promise.all() by itself won't help you control how many are running at once in any way.

I've also noticed since, that it seems this line promises.push(getMethodResponse(resourceMethod, value, apiName, resourcePath)) is actually executing promises and not simply adding them to the array. Seems like the last Promise.all() doesn't actually do much.

Yep, when you execute promises.push(getMethodResponse()), you are calling getMethodResponse() immediately right then. That starts the async operation immediately. That function then returns a promise and Promise.all() will monitor that promise (along with all the other ones you put in the array) to tell you when they are all done. That's all Promise.all() does. It monitors operations you've already started. To keep the max number of requests in flight at the same time below some threshold, you have to NOT START the async operations all at once like you are doing. Promise.all() does not do that for you.


For Bluebird's Promise.map() to help you at all, you have to pass it an array of DATA, not promises. When you pass it an array of promises that represent async operations that you've already started, it can do no more than Promise.all() can do. But, if you pass it an array of data and a callback function that can then initiate an async operation for each element of data in the array, THEN it can help you when you use the concurrency option.

Your code is pretty complex so I will illustrate with a simple web scraper that wants to read a large list of URLs, but for memory considerations, only process 20 at a time.

const rp = require('request-promise');
let urls = [...];    // large array of URLs to process

Promise.map(urls, function(url) {
    return rp(url).then(function(data) {
        // process scraped data here
        return someValue;
    });
}, {concurrency: 20}).then(function(results) {
   // process array of results here
}).catch(function(err) {
    // error here
});

In this example, hopefully you can see that an array of data items are being passed into Promise.map() (not an array of promises). This, then allows Promise.map() to manage how/when the array is processed and, in this case, it will use the concurrency: 20 setting to make sure that no more than 20 requests are in flight at the same time.


Your effort to use Promise.map() was passing an array of promises, which does not help you since the promises represent async operations that have already been started:

Promise.map(promises, function() {
    ...
});

Then, in addition, you really need to figure out what exactly causes the TooManyRequestsException error by either reading documentation on the target API that exhibits this or by doing a whole bunch of testing because there can be a variety of things that might cause this and without knowing exactly what you need to control, it just takes a lot of wild guesses to try to figure out what might work. The most common things that an API might detect are:

  1. Simultaneous requests from the same account or source.
  2. Requests per unit of time from the same account or source (such as request per second).

The concurrency operation in Promise.map() will easily help you with the first option, but will not necessarily help you with the second option as you can limit to a low number of simultaneous requests and still exceed a requests per second limit. The second needs some actual time control. Inserting delay() statements will sometimes work, but even that is not a very direct method of managing it and will either lead to inconsistent control (something that works sometimes, but not other times) or sub-optimal control (limiting yourself to something far below what you can actually use).

To manage to a request per second limit, you need some actual time control with a rate limiting library or actual rate limiting logic in your own code.

Here's an example of a scheme for limiting the number of requests per second you are making: How to Manage Requests to Stay Below Rate Limiting.

jfriend00
  • 580,699
  • 78
  • 809
  • 825
  • Thanks for the response, the problem I have with this is how to return "data" from my function(s) and not promises. I have tried wrapping them in a `return function (value) {}` which didn't really help - the alternative option i'm thinking of (have not yet tried) is to remove the `Async` from the method name being used? Your example does make sense but as you mentioned, it's a pretty simple example that is not working with methods but string URL's. – Hexie Nov 20 '17 at 21:48
  • @Hexie - My example shows you how to get all the results - right where it says "process array of results here". These are async operations so you cannot return the results directly from your function. You will have to use `.then()` to get the results just like my example shows. – jfriend00 Nov 20 '17 at 22:05
  • I was referring to changing the functions i currently use (getMethodResponse) to not return the promises they do and rather "data" as you said. I'm attempting a few changes and assume i could change your rp(url) to use simple Resolve? `Promise.map(promises, promise => {return Promise.resolve(promise).then(function (data)` – Hexie Nov 20 '17 at 22:16
  • @Hexie - If they are asynchronous operations, they have to return a promise. You can't return asynchronous data from something that gets its data ascynchronously. My `rp(url)` is in place of some asynchronous operation you already have that returns a promise. You plug in your asynchronous function there that returns a promise. It may be that you need to break your question into a lot smaller pieces and ask about one piece of your problem. As it is, you have a big complicated code block that is hard for us to understand what it all does - so hard for us to help rewrite all that. – jfriend00 Nov 20 '17 at 22:21
  • So it turns out that the reason this was failing was due to my promise chain being broken within the methods. Once they were all chained correctly, it started working as expected. Thanks again for the help! – Hexie Nov 22 '17 at 00:54
  • @Hexie - What do you want to do with this question? It's hanging out here and not marked as answered yet. Do you want to just click the checkmark on this answer to indicate to the community that your question is now answered or do you want me to add something else to my answer before doing that. – jfriend00 Dec 01 '17 at 01:25
  • Thanks for the follow up - your question did help me resolve the issue (that's why I upvoted it), it however, was not the answer - but is worth leaving here as it could help someone else with a decent troubleshooting mechanism. – Hexie Dec 04 '17 at 21:35