7

I am working on an application that will read and analyze the logs of payment transactions. Based on my requirements I will use Kinesis Analytics, which can take its input from either Kinesis Data Streams or Kinesis Data Firehose. But I am having trouble deciding which input method I should use for my system. My requirements are:

  1. It can tolerate latency, but it must not lose data.
  2. It must record all errors in DynamoDB or S3 buckets.

Which input stream is suitable for my use case?

John Rotenstein
Karandeep Singh

4 Answers

7

Data Streams vs Firehose

  1. Streams: Kinesis Data Streams is highly customizable and best suited to developers building custom applications or streaming data for specialized needs.
    • You write custom producer and consumer code
    • Real time (about 200 ms latency for classic consumers, about 70 ms for enhanced fan-out)
    • You must manage scaling yourself (shard splitting/merging)
    • Data is stored for 1 to 7 days, with replay capability and support for multiple consumers
    • Can be used with Lambda to insert data into Elasticsearch in real time
  2. Firehose: Kinesis Data Firehose loads streaming data directly into AWS products for processing.
    • Fully managed; delivers to S3, Splunk, Redshift, Elasticsearch
    • Serverless data transformations with Lambda
    • Near real time (lowest buffer time is 1 minute)
    • Automated scaling
    • No data storage
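To make the "you must manage scaling" point concrete, here is a rough sizing sketch. It assumes the provisioned-mode per-shard limits from the AWS documentation (1 MB/s or 1,000 records/s for writes, 2 MB/s for shared reads); the function name and the workload numbers are made up for illustration and are not part of any AWS API.

```python
import math

# Per-shard limits for provisioned-mode Kinesis Data Streams
# (assumed from the AWS docs; check the current quotas for your account).
WRITE_MB_PER_SEC = 1.0
WRITE_RECORDS_PER_SEC = 1000
READ_MB_PER_SEC = 2.0


def estimate_shards(avg_record_kb, records_per_sec, consumers=1):
    """Return the minimum shard count for a hypothetical workload."""
    write_mb = avg_record_kb * records_per_sec / 1024
    # Classic (shared-throughput) consumers share the 2 MB/s read limit.
    read_mb = write_mb * consumers
    return max(
        1,
        math.ceil(write_mb / WRITE_MB_PER_SEC),
        math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC),
        math.ceil(read_mb / READ_MB_PER_SEC),
    )


if __name__ == "__main__":
    # Hypothetical load: 2 KB payment log records at 3,000 records/s.
    print(estimate_shards(avg_record_kb=2, records_per_sec=3000))
```

With Firehose there is no equivalent calculation to do, which is much of what "automated scaling" buys you.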
4

There are some key differences between Kinesis Stream (KS) and Firehose (FH):

  • KS is real time, while FH is near-real time.
  • KS requires manual scaling and provisioning of its capacity (shards), while FH is basically serverless.
  • KS records are immutable (they persist in the stream for its retention period, 24h by default), while records in FH are gone from FH the moment they are delivered to the destination.

From what you wrote, I think FH should be considered first: you are not concerned about the non-real-time nature of FH, it is much easier to set up and manage, and you can specify S3 as a backup for failed or all messages:

Kinesis Data Firehose uses Amazon S3 to backup all or failed only data that it attempts to deliver to your chosen destination.

The S3 backup ensures you are not losing records if delivery or Lambda processing fails. So, in my view, Firehose addresses your two points well.

Marcin
3

Kinesis Data Streams allows consumers to READ streaming data, and it gives you plenty of options to do so. It is best suited to use cases that require custom processing, a choice of stream processing frameworks, and sub-second processing latency. Data is reliably stored in streams for up to 7 days and distributed across 3 Availability Zones.

Kinesis Firehose is used to LOAD streaming data to a target destination (S3, Elasticsearch, Splunk, etc). You can also transform streaming data (by using Lambda) before loading it to destination. Data from failed attempts will be saved to S3.
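As an illustration of the Lambda transformation step, here is a minimal sketch of a Firehose transformation handler. The event and response shapes (`records`, `recordId`, `data`, `result`) follow the Firehose data transformation contract; the `transaction_id` check is a hypothetical validation rule invented for this example. Records returned as `ProcessingFailed` are the ones Firehose treats as failed and writes to the S3 error output.

```python
import base64
import json


def lambda_handler(event, context):
    """Validate each incoming payment log record for Firehose."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            # Hypothetical rule: every payment log must carry a transaction id.
            if "transaction_id" not in payload:
                raise ValueError("missing transaction_id")
            # Newline-delimit the JSON so downstream readers can split records.
            line = json.dumps(payload) + "\n"
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(line.encode("utf-8")).decode("utf-8"),
            })
        except (ValueError, TypeError):
            # Hand the original data back unchanged and mark it as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Every `recordId` received must appear in the response, which is why the failure branch still returns the record instead of skipping it.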

So if your goal is only to load data into the Kinesis Data Analytics service with minimal or no pre-processing, then try Kinesis Firehose first.

Please note that you would also need to consider aspects such as cost, development effort, scaling options, and data volume when choosing the right service.

Please take a look at the following AWS Solutions Implementations for reference: https://aws.amazon.com/solutions/implementations/real-time-web-analytics-with-kinesis/ and https://aws.amazon.com/solutions/implementations/real-time-iot-device-monitoring-with-kinesis/

shuraosipov
0

You can use Firehose to feed into Analytics, but the question is how Firehose gets its data. You can write your own code to feed it, or use Kinesis Data Streams as the source. Firehose is mainly a delivery system for streaming data, which can be written to various destinations such as S3, Redshift, or others, with the optional capability to transform the data on the way.
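If you go the "write your own code" route, the producer has to deal with partial failures itself, since `PutRecordBatch` can accept some records and reject others in the same call. Below is a minimal sketch of one way to handle that. The function name, retry counts, and delays are my own choices; `client` is assumed to be a boto3 Firehose client (for example `boto3.client("firehose")`), and the response fields used (`FailedPutCount`, `RequestResponses`, `ErrorCode`) are the ones documented for `put_record_batch`.

```python
import time


def put_batch_with_retry(client, stream_name, payloads,
                         max_attempts=5, base_delay=0.2):
    """Send payloads (bytes) to Firehose, retrying only rejected records.

    Returns the payloads that were still rejected after max_attempts.
    The caller is expected to keep each batch within the API limits
    (500 records per call at the time of writing).
    """
    pending = [{"Data": p} for p in payloads]
    for attempt in range(max_attempts):
        response = client.put_record_batch(
            DeliveryStreamName=stream_name, Records=pending
        )
        if response["FailedPutCount"] == 0:
            return []
        # RequestResponses is index-aligned with the records that were sent;
        # rejected entries carry an ErrorCode.
        pending = [
            rec for rec, res in zip(pending, response["RequestResponses"])
            if "ErrorCode" in res
        ]
        if attempt < max_attempts - 1:
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    return [rec["Data"] for rec in pending]
```

Anything this function returns should be persisted somewhere (a local file, S3, a queue) rather than dropped, given the "must not lose data" requirement.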

Check this link https://www.slideshare.net/AmazonWebServices/abd217from-batch-to-streaming?from_action=save and see how your use case can benefit from the information.

More info: https://docs.aws.amazon.com/kinesisanalytics/latest/dev/how-it-works.html https://docs.aws.amazon.com/firehose/latest/dev/what-is-this-service.html