OP says:
> they can sometimes be large up-to 20mb
Since the volume of data you serve can be pretty large, I think it is feasible for you to do this in 2 requests instead of one, decoupling content generation from content serving. The reason to do this is to minimize the time and resources the server spends fetching data from S3 and relaying it to the client.
AWS supports pre-signed URLs, which can be made valid for only a short amount of time; we can use them here so clients can fetch directly from S3 without opening up security issues.
Currently, your architecture looks something like the diagram below: the client initiates a request, you check whether the requested data exists on S3, and fetch and serve it if it does; otherwise you generate the content, save it to S3, and serve it:
```
                 if exists on S3
client --------> server --------------------> fetch from S3 and serve
                    |
                    | else
                    |------> generate content -------> save to S3 and serve
```
In terms of network resources, you always consume 2X the bandwidth and time here. If the data exists, you pull it from S3 to the server and then serve it to the customer (so it is 2X). If the data doesn't exist, you send the generated content both to the customer and to S3 (so again it is 2X).
Instead, you can try the two approaches below. Both assume that you have some base template and that the remaining data can be fetched via AJAX calls, and both bring down that 2X factor in the overall architecture.
**Approach 1: serve the content from S3 only.** This calls for changes to the way your product is designed, and hence may not be that easy to integrate.
Basically, for every incoming request, return the S3 URL if the data already exists; otherwise create a task for it in SQS, generate the data, and push it to S3. Based on your usage patterns for different artists, you should have an estimate of how long it takes to pull the data together on average, so return a URL that will be valid within the estimated time to completion, `T`, of the task.

The client waits for time `T`, then makes the request to the URL returned earlier, retrying up to, say, 3 times on failure. In fact, the data already existing on S3 can be thought of as the base case where `T = 0`.
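The client-side wait-and-poll loop might be sketched like this, using the standard library's `urllib` as a stand-in for whatever HTTP client you actually use (the retry count and backoff are illustrative):

```python
import time
import urllib.error
import urllib.request

def fetch_with_retries(url, wait_seconds, max_attempts=3, backoff=2.0):
    """Wait the estimated time T, then poll the pre-signed URL."""
    time.sleep(wait_seconds)  # T = 0 when the data was already on S3
    for attempt in range(1, max_attempts + 1):
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read()
        except urllib.error.URLError:
            if attempt == max_attempts:
                raise
            time.sleep(backoff * attempt)  # simple linear backoff between tries
```

The client would call `fetch_with_retries(presigned_url, wait_seconds=T)` with the `T` returned by the server.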
In this case, the client makes 2-4 network requests, but only the first of them hits your server. You transmit the data to S3 once, only when it doesn't already exist, and the client always pulls it from S3.
```
                 if exists on S3, return URL
client --------> server --------------------------------> S3
                    |
                    | else SQS task
                    |---------------> generate content -------> save to S3
                                      return pre-computed URL

       wait for time `T`
client -------------------------> S3
```
**Approach 2: check whether the data already exists, and make the second network call accordingly.**
This is similar to what you currently do when serving data from the server in the case where it doesn't already exist. Again we make 2 requests here; however, this time the server serves the data synchronously when it doesn't exist.
So, on the first hit, we check whether the content has ever been generated before; if so, we get back a URL, otherwise an error message. When successful, the next hit goes to S3.
If the data doesn't exist on S3, the client makes a fresh request (to a different POST URL); on receiving it, the server computes the data and serves it, while adding an asynchronous task to push it to S3.
```
                 if exists on S3, return URL
client --------> server --------------------------------> S3

client --------> server ---------> generate content -------> serve it
                                        |
                                        |---> add SQS task to push to S3
```