
I want to create a few HTTP endpoints where mobile clients, servers & IoT devices will be posting data. I may need to preprocess the events & act upon them. Eventually I want to access all the raw data & run queries using Domo, Chartio or Tableau.

I need to understand the differences & advantages of the following architectures:

  1. AWS API Gateway + Lambda + Redshift: I can create an HTTP endpoint & a Lambda function that parses the data, does any computation & stores the result in Redshift (roughly sketched at the end of this question)
  2. Kinesis Firehose + Redshift (how do I stream the data over HTTP here?)
  3. S3 + Kinesis + Redshift (I can use an HTTP endpoint that writes data to S3)
  4. S3 + Kinesis Firehose + Redshift
  5. S3 + Lambda + Redshift

I feel like 3, 4 & 5 introduce redundancy because of S3. Will executing Lambda functions have a significant cost overhead compared to using Kinesis?
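
Roughly, this is what I have in mind for option 1: an API Gateway-backed Lambda that parses the POSTed JSON and writes it into Redshift. The table, columns & connection details below are placeholders, and psycopg2 would need to be bundled with the deployment package.

```python
# Sketch of option 1: API Gateway -> Lambda -> Redshift.
# Table name, columns and credentials are placeholders.
import json
import os

import psycopg2  # must be packaged with the Lambda deployment artifact


def handler(event, context):
    # With the API Gateway proxy integration, the POST body arrives as a string
    payload = json.loads(event["body"])

    conn = psycopg2.connect(
        host=os.environ["REDSHIFT_HOST"],
        port=5439,
        dbname=os.environ["REDSHIFT_DB"],
        user=os.environ["REDSHIFT_USER"],
        password=os.environ["REDSHIFT_PASSWORD"],
    )
    try:
        with conn.cursor() as cur:
            # Row-by-row INSERTs are fine at low volume; at scale Redshift
            # prefers batched COPY loads (e.g. staged through S3).
            cur.execute(
                "INSERT INTO events (device_id, event_type, raw_json) "
                "VALUES (%s, %s, %s)",
                (payload.get("device_id"), payload.get("type"), json.dumps(payload)),
            )
        conn.commit()
    finally:
        conn.close()

    return {"statusCode": 200, "body": json.dumps({"ok": True})}
```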

PanosJee

2 Answers


Way too broad a question, but off the top of my head I would say #1 is the best of the choices you provided.

Personally I would go with DynamoDB instead for receiving the data from Lambda - then I would either query it directly from there, or use it as a source for Redshift if your usage patterns require it.
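
Something along these lines, as a minimal sketch only - the table name, key schema and the API Gateway proxy event shape are assumptions you would adapt to your own access patterns:

```python
# Sketch: receive an event in Lambda and write it to DynamoDB.
# Table "raw_events" with partition key "device_id" and sort key
# "received_at" is assumed, not prescribed.
import json
import time

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("raw_events")


def handler(event, context):
    payload = json.loads(event["body"])  # API Gateway proxy integration

    table.put_item(
        Item={
            "device_id": payload["device_id"],       # partition key
            "received_at": int(time.time() * 1000),  # sort key, epoch millis
            "payload": json.dumps(payload),          # raw event kept for later export
        }
    )
    return {"statusCode": 200, "body": json.dumps({"ok": True})}
```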

E.J. Brennan

How about the approach below?

  1. If the requirement is not real-time, then writing directly to S3 will work and scales essentially without limit.
  2. Configure an event notification on the S3 bucket that triggers a Lambda function to run your processing logic (see the sketch after this list).
  3. Push the results to Kinesis or Redshift for further processing. Redshift can also query data directly from S3 in open data formats.
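
A rough sketch of steps 2 and 3: a Lambda fired by an S3 ObjectCreated notification reads the uploaded file and forwards its records to a Kinesis stream. The stream name and the one-JSON-document-per-line file layout are assumptions.

```python
# Sketch: S3 ObjectCreated event -> Lambda -> Kinesis.
# Assumes each uploaded file contains one JSON document per line and
# that a stream named "raw-events" already exists.
import json
from urllib.parse import unquote_plus

import boto3

s3 = boto3.client("s3")
kinesis = boto3.client("kinesis")

STREAM_NAME = "raw-events"


def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 notifications are URL-encoded
        key = unquote_plus(record["s3"]["object"]["key"])

        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

        for line in body.splitlines():
            if not line.strip():
                continue
            doc = json.loads(line)
            kinesis.put_record(
                StreamName=STREAM_NAME,
                Data=json.dumps(doc).encode("utf-8"),
                PartitionKey=str(doc.get("device_id", "unknown")),
            )
```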
dpatil