178

I have a use case where a stream of data will be arriving and I cannot consume it at the same pace, so I need a buffer. This can be solved using an SNS-SQS queue. I have learned that Kinesis serves the same purpose, so what is the difference? Why should I prefer (or not prefer) Kinesis?

Apoorv

11 Answers

85

Keep in mind this answer was correct for Jun 2015

After studying the issue for a while, having the same question in mind, I found that SQS (with SNS) is preferred for most use cases unless the order of the messages is important to you (SQS doesn't guarantee FIFO on messages).

There are 2 main advantages for Kinesis:

  1. you can read the same message from several applications
  2. you can re-read messages in case you need to.

Both advantages can be achieved by using SNS as a fan-out to SQS. That means the producer sends only one message to SNS, and SNS then fans the message out to multiple SQS queues, one for each consumer application. This way you can have as many consumers as you want without thinking about sharding capacity.

Moreover, we added one more SQS queue, subscribed to the SNS topic, that holds messages for 14 days. Normally no one reads from this queue, but if a bug makes us want to rewind the data, we can easily read all the messages from it and re-send them to SNS. Kinesis, by contrast, only provides 7 days of retention.
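
A rough in-memory sketch of the fan-out plus archive queue described above. It is plain Python with no AWS calls; the `Topic` class, the queue names, and the `replay` helper are made up for illustration and only model the idea that every subscribed queue gets its own copy of each message.

```python
from collections import deque


class Topic:
    """Toy stand-in for an SNS topic: copies each message to every subscriber."""

    def __init__(self):
        self.queues = []

    def subscribe(self, queue):
        self.queues.append(queue)

    def publish(self, message):
        for queue in self.queues:
            queue.append(message)


def replay(archive_queue, target_topic):
    """Re-send every archived message; the archive receives its own copies again."""
    for _ in range(len(archive_queue)):
        target_topic.publish(archive_queue.popleft())


topic = Topic()
consumer_a = deque()  # one queue per consumer application
consumer_b = deque()
archive = deque()     # nobody reads this one in normal operation
for q in (consumer_a, consumer_b, archive):
    topic.subscribe(q)

for event in ["e1", "e2", "e3"]:
    topic.publish(event)

# Consumer A drains its own queue; consumer B is unaffected.
processed_a = [consumer_a.popleft() for _ in range(len(consumer_a))]

# A bug is found in consumer A: rewind by re-sending the archive to the topic.
replay(archive, topic)
```

Note that in this sketch a replay reaches every subscriber, so consumer B sees the messages a second time. A real setup would need some way to target only the consumer that has to reprocess.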

In conclusion, SNS+SQSs is much easier and provides most capabilities. IMO you need a really strong case to choose Kinesis over it.

Marcel Gosselin
Roee Gavirel
  • 2
    FYI: You can have Kinesis retain for up to 7 days. – Didier A. Mar 22 '16 at 22:25
  • @DidierA., yeah, they increased the max retention policy to 7 days. I'll update the answer. Thanks. – Roee Gavirel Mar 23 '16 at 08:35
  • 30
    Recently, AWS has announced SQS FIFO [http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/FIFO-queues.html] which can serve the time-ordering of messages. – VijeshJain Dec 28 '16 at 21:54
  • 5
    super minor comment - probably wouldn't use the word `split` in `SNS split the message to multiple SQSs` since it doesn't break down messages into pieces but copies it to multiple destinations. – Mobigital May 03 '17 at 01:11
  • 3
    Kinesis is unsuitable for fan-out (pub-sub) use-cases due to limitations around the number of readers per shard/second. Whilst not relevant to the original enquiry, anyone relying on Kinesis scaling to n-readers should take this fact into consideration. https://forums.aws.amazon.com/message.jspa?messageID=760351 – codeasone Jun 07 '17 at 14:07
  • 1
    By piping SNS into SQS back into another SQS for backup... you implemented your own version of Kinesis/Kafka. What's the pain point of Kinesis that you're trying so hard to avoid? I'm new to Kinesis, curious about what I'm going to hit. – alexP_Keaton Sep 26 '17 at 23:18
  • 1
    @alexP_Keaton - I'll start by saying that this answer is from 2.5 years ago and I guess many things have changed in Kinesis since then. At that time, the pain in Kinesis was that it stored data for up to 48 hours while SQS gave you 14 days. In Kinesis you have overhead on the client and the use of DynamoDB to track the position of your reads. And most importantly, in Kinesis you had to provision shards in advance, which limits your read speed, while in SQS you can go up and down in speed without giving it a second thought. – Roee Gavirel Sep 27 '17 at 04:26
  • 2
    Kinesis's order guarantee is per-shard, not per-stream. Once you have more than one shard, the stream as a whole has no ordering guarantee. For an SQS queue, when the throughput is relatively low, it is almost FIFO. Only when your throughput goes higher is the order followed less closely. This is regarding classic SQS queues, not FIFO queues. – yoroto Nov 13 '18 at 22:30
61

On the surface they are vaguely similar, but your use case will determine which tool is appropriate. IMO, if you can get by with SQS then you should - if it will do what you want, it will be simpler and cheaper, but here is a better explanation from the AWS FAQ which gives examples of appropriate use-cases for both tools to help you decide:

FAQ's

Bazzinga...
E.J. Brennan
  • 7
    FYI http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-subscribe-queue-sns-topic.html SQS FIFO does not work with SNS – Brent Oct 26 '17 at 05:49
54

Kinesis supports multiple consumers, which means the same data records can be processed at the same time, or at different times within 24 hours, by different consumers. Similar behavior can be achieved in SQS by writing into multiple queues and having the consumers read from those queues. However, writing again into multiple queues adds sub-second (a few milliseconds) latency to the system.

Second, Kinesis provides a routing capability: the partition key selectively routes data records to different shards, each of which can be processed by particular EC2 instances, which enables micro-batch calculations (counting and aggregation).
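
To illustrate the routing, here is a minimal sketch of how a partition key can be mapped to a shard. Kinesis hashes the partition key with MD5 into a 128-bit integer and picks the shard whose hash-key range contains it. The even split of the hash space across shards, the function name `shard_for`, and the sample records are assumptions for the example, not the service's actual shard layout.

```python
import hashlib
from collections import Counter, defaultdict

HASH_SPACE = 2 ** 128  # MD5 digests are 128-bit integers


def shard_for(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard whose hash-key range contains the key.

    Assumes the hash space is split evenly across `shard_count` shards.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    return hash_key * shard_count // HASH_SPACE


# Records with the same key always land on the same shard, so a single
# processor per shard can keep a running total per key.
records = [("customer-1", 5), ("customer-2", 3), ("customer-1", 7)]
totals_by_shard = defaultdict(Counter)
for key, amount in records:
    totals_by_shard[shard_for(key, 4)][key] += amount
```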

Working with any AWS service is easy, but SQS is the easiest. With Kinesis, you need to provision enough shards ahead of time, and you also have to manage increasing the number of shards dynamically to handle load spikes and decreasing it again to save cost. That is a pain in Kinesis; none of it is required with SQS, which scales transparently without any provisioning.

Rohit Banga
kartik
  • 11
    Regarding your explanation on SQS: an easy way to send the same message to multiple SQS queues is to put an SNS topic in front of them. – Roee Gavirel Jun 16 '15 at 10:15
  • 8
    app --> sns topic ---> sqs1, sqs2, sqs3... ? – kartik Jun 16 '15 at 10:45
  • 4
    Yes, I was referring to exactly this approach. – Roee Gavirel Jun 17 '15 at 08:29
  • @RoeeGavirel what about the request/second limitations for sns api? – Barbaros Alp Oct 27 '16 at 08:02
  • @BarbarosAlp - I'm only aware of the SMS (mobile text messages) limitation, which is off topic here. This is the official documentation: http://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html#limits_sns – Roee Gavirel Oct 30 '16 at 12:32
  • Yes, multiple SQS queues can be used, however that basically multiplies the price. If you check the pricing structure for Kinesis Streams and SQS - you can tell that SQS is probably more expensive with large amounts of messages. If you involve additional queues that will blow up your expenses. – Moszi Feb 08 '17 at 08:52
  • The problem I am finding with using sns along with SQS is order is not guaranteed. You can only use Standard SQS and not FIFO – m0g Jun 07 '18 at 14:29
  • Q: can anyone please explain why I should use the combination of SNS topic --> SQS Queue instead of pushing data directly to SQS? – Narendra Maru Jul 08 '19 at 07:17
47

Semantics of these technologies are different because they were designed to support different scenarios:

  • SNS/SQS: the items in the stream are not related to each other
  • Kinesis: the items in the stream are related to each other

Let's understand the difference by example.

  1. Suppose we have a stream of orders, for each order we need to reserve some stock and schedule a delivery. Once this is complete, we can safely remove the item from the stream and start processing the next order. We are fully done with the previous order before we start the next one.
  2. Again, we have the same stream of orders, but now our goal is to group orders by destinations. Once we have, say, 10 orders to the same place, we want to deliver them together (delivery optimization). Now the story is different: when we get a new item from the stream, we cannot finish processing it; rather we "wait" for more items to come in order to meet our goal. Moreover, if the processor process crashes, we must "restore" the state (so no order will be lost).

Once processing of one item cannot be separated from processing another one, we must have Kinesis semantics in order to handle all the cases safely.
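
A small sketch of case (2), to show why the processor becomes stateful. The `DeliveryBatcher` class, the JSON snapshot, and the batch size are hypothetical. With a replayable stream the pending state could instead be rebuilt by re-reading from the last checkpoint; the snapshot here just stands in for "restoring" state after a crash.

```python
import json
from collections import defaultdict

BATCH_SIZE = 10  # "say, 10 orders to the same place"


class DeliveryBatcher:
    """Groups orders by destination and releases a batch once it is full."""

    def __init__(self, snapshot=None):
        self.pending = defaultdict(list)
        if snapshot:
            for destination, orders in json.loads(snapshot).items():
                self.pending[destination] = orders

    def add(self, order_id, destination):
        """Return a full batch for the destination, or None while still waiting."""
        self.pending[destination].append(order_id)
        if len(self.pending[destination]) >= BATCH_SIZE:
            return self.pending.pop(destination)
        return None

    def snapshot(self):
        """Serialize the in-flight state so a restarted processor can resume."""
        return json.dumps(self.pending)


batcher = DeliveryBatcher()
for order_id in range(9):
    batcher.add(order_id, "Berlin")   # nine orders: still waiting

saved = batcher.snapshot()            # the processor crashes after this point
restored = DeliveryBatcher(saved)     # restart with the saved state
batch = restored.add(9, "Berlin")     # the tenth order completes the batch
```

Without the restored state, the nine earlier orders would be lost even though each of them had already been "read" from the stream.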

Konstantin Triger
  • With SQS FIFO queue, we would have the messages ordered as they are sent. Does that make SQS similar to Kinesis on this aspect? – Andy Dufresne Sep 17 '18 at 13:21
  • 1
    @AndyDufresne: this covers well the scenario where order is important. In the case (1) above, you may want to handle orders "in order". So if you run out of stock, later orders are rejected or delayed. FIFO semantics does not solve the core relativity (grouping) problem. – Konstantin Triger Sep 18 '18 at 13:46
35

The biggest advantage for me is the fact that Kinesis is a replayable queue, and SQS is not. So you can have multiple consumers of the same messages of Kinesis (or the same consumer at different times) where with SQS, once a message has been ack'd, it's gone from that queue. SQS is better for worker queues because of that.
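
A toy model of that difference, in plain Python. `ReplayableStream` and `WorkQueue` are invented names and neither class talks to AWS; they only contrast "read from a position, nothing is removed" with "ack removes the message".

```python
from collections import deque


class ReplayableStream:
    """Append-only log; each consumer keeps its own read position."""

    def __init__(self):
        self.records = []

    def put(self, record):
        self.records.append(record)

    def read(self, position):
        """Return all records from `position` onward; nothing is removed."""
        return self.records[position:]


class WorkQueue:
    """Delete-on-ack queue: an acknowledged message is gone for everyone."""

    def __init__(self):
        self.messages = deque()

    def send(self, message):
        self.messages.append(message)

    def receive_and_ack(self):
        return self.messages.popleft() if self.messages else None


stream, queue = ReplayableStream(), WorkQueue()
for item in ["a", "b", "c"]:
    stream.put(item)
    queue.send(item)

dashboard = stream.read(0)          # first consumer
audit = stream.read(0)              # second consumer, or the same one later
worker_1 = queue.receive_and_ack()  # each message goes to exactly one worker
worker_2 = queue.receive_and_ack()
```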

Matthew Curry
35

Excerpt from AWS Documentation:

We recommend Amazon Kinesis Streams for use cases with requirements that are similar to the following:

  • Routing related records to the same record processor (as in streaming MapReduce). For example, counting and aggregation are simpler when all records for a given key are routed to the same record processor.

  • Ordering of records. For example, you want to transfer log data from the application host to the processing/archival host while maintaining the order of log statements.

  • Ability for multiple applications to consume the same stream concurrently. For example, you have one application that updates a real-time dashboard and another that archives data to Amazon Redshift. You want both applications to consume data from the same stream concurrently and independently.

  • Ability to consume records in the same order a few hours later. For example, you have a billing application and an audit application that runs a few hours behind the billing application. Because Amazon Kinesis Streams stores data for up to 7 days, you can run the audit application up to 7 days behind the billing application.

We recommend Amazon SQS for use cases with requirements that are similar to the following:

  • Messaging semantics (such as message-level ack/fail) and visibility timeout. For example, you have a queue of work items and want to track the successful completion of each item independently. Amazon SQS tracks the ack/fail, so the application does not have to maintain a persistent checkpoint/cursor. Amazon SQS will delete acked messages and redeliver failed messages after a configured visibility timeout.

  • Individual message delay. For example, you have a job queue and need to schedule individual jobs with a delay. With Amazon SQS, you can configure individual messages to have a delay of up to 15 minutes.

  • Dynamically increasing concurrency/throughput at read time. For example, you have a work queue and want to add more readers until the backlog is cleared. With Amazon Kinesis Streams, you can scale up to a sufficient number of shards (note, however, that you'll need to provision enough shards ahead of time).

  • Leveraging Amazon SQS’s ability to scale transparently. For example, you buffer requests and the load changes as a result of occasional load spikes or the natural growth of your business. Because each buffered request can be processed independently, Amazon SQS can scale transparently to handle the load without any provisioning instructions from you.
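
The ack/fail and visibility-timeout behaviour from the first SQS bullet can be sketched with a small in-memory model. This is not the SQS API: `VisibilityQueue`, its methods, and the explicit `now` argument (used instead of a real clock so the example is deterministic) are all assumptions for illustration.

```python
class VisibilityQueue:
    """Toy queue with receive / delete and a visibility timeout in seconds."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self.messages = {}         # message id -> body
        self.invisible_until = {}  # message id -> time it becomes visible again
        self.next_id = 0

    def send(self, body):
        self.messages[self.next_id] = body
        self.invisible_until[self.next_id] = 0.0
        self.next_id += 1

    def receive(self, now):
        """Return (id, body) of the first visible message and hide it, or None."""
        for message_id, body in self.messages.items():
            if self.invisible_until[message_id] <= now:
                self.invisible_until[message_id] = now + self.visibility_timeout
                return message_id, body
        return None

    def delete(self, message_id):
        """Acknowledge successful processing; the message is never redelivered."""
        self.messages.pop(message_id, None)
        self.invisible_until.pop(message_id, None)


queue = VisibilityQueue(visibility_timeout=30)
queue.send("resize image 1")

first = queue.receive(now=0)        # a worker picks the job up, then crashes
hidden = queue.receive(now=10)      # still inside the timeout: nothing visible
retry = queue.receive(now=31)       # timeout elapsed: the job is redelivered
queue.delete(retry[0])              # the second worker succeeds and acks
after_ack = queue.receive(now=100)  # acked messages do not come back
```

The consumer never stores a checkpoint or cursor; the queue tracks which messages are still outstanding.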

Pang
cloudtechnician
17

Another thing: Kinesis can trigger a Lambda, while SQS cannot. So with SQS you either have to provide an EC2 instance to process SQS messages (and deal with it if it fails), or you have to have a scheduled Lambda (which doesn't scale up or down - you get just one per minute).

Edit: This answer is no longer correct. SQS can directly trigger Lambda as of June 2018

https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html

shonky linux user
DenNukem
  • 1
    -1 Disagree. While Kinesis can trigger Lambda, this poses no advantage over a scheduled SQS Lambda. The latter will scale seamlessly (i.e. if it takes longer than a minute, a second Lambda will get spun up). Price is per compute time so there is no appreciable difference there either. And if you need more than 5 concurrent Lambdas then just add multiple triggers scheduled a few seconds apart (using cron). This is not a reason to use Kinesis over SNS/SQS. – Steven de Salas Dec 13 '16 at 19:05
  • 5
    I'm not sure I agree with the disagreement ;] - you can schedule one Lambda per minute, which would limit you to batch processing the messages that arrived in that interval. Kinesis would allow you to read the messages immediately. Or is there something I misunderstood? – Moszi Feb 08 '17 at 08:41
  • There is a huge difference between a couple of cloudwatch triggers and hundreds when needing to invoke the pulling lambda against SQS. – Coding Pig Dec 21 '17 at 00:46
  • 14
    Lambda now supports SQS as a trigger! – sixty4bit Jun 30 '18 at 18:24
13

The pricing models are different, so depending on your use case one or the other may be cheaper. Using the simplest case (not including SNS):

  • SQS charges per message (each 64 KB counts as one request).
  • Kinesis charges per shard per hour (1 shard can handle up to 1000 messages or 1 MB/second) and also for the amount of data you put in (every 25 KB).

Plugging in the current prices and not taking into account the free tier, if you send 1 GB of messages per day at the maximum message size, Kinesis will cost much more than SQS ($10.82/month for Kinesis vs. $0.20/month for SQS). But if you send 1 TB per day, Kinesis is somewhat cheaper ($158/month vs. $201/month for SQS).

Details: SQS charges $0.40 per million requests (64 KB each), so $0.00655 per GB. At 1 GB per day, this is just under $0.20 per month; at 1 TB per day, it comes to a little over $201 per month.

Kinesis charges $0.014 per million requests (25 KB each), so $0.00059 per GB. At 1 GB per day, this is less than $0.02 per month; at 1 TB per day, it is about $18 per month. However, Kinesis also charges $0.015 per shard-hour. You need at least 1 shard per 1 MB per second. At 1 GB per day, 1 shard will be plenty, so that will add another $0.36 per day, for a total cost of $10.82 per month. At 1 TB per day, you will need at least 13 shards, which adds another $4.68 per day, for a total cost of $158 per month.
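
The arithmetic above can be reproduced with a short script. The prices are the ones quoted in this answer (not necessarily current AWS prices); the 30-day month, the 1 GB = 1024 × 1024 KB convention, and the function names are assumptions chosen to match the totals given.

```python
import math

GB_IN_KB = 1024 * 1024
DAYS_PER_MONTH = 30  # assumption that matches the monthly totals in the answer

SQS_PRICE_PER_MILLION = 0.40     # USD per million requests (64 KB each)
KINESIS_PUT_PER_MILLION = 0.014  # USD per million PUT units (25 KB each)
KINESIS_SHARD_HOUR = 0.015       # USD per shard-hour
SHARD_MB_PER_SECOND = 1          # ingest capacity of one shard


def sqs_monthly(gb_per_day):
    """Monthly SQS cost when every request carries a full 64 KB."""
    requests_per_gb = GB_IN_KB / 64
    requests = gb_per_day * DAYS_PER_MONTH * requests_per_gb
    return requests * SQS_PRICE_PER_MILLION / 1e6


def kinesis_monthly(gb_per_day):
    """Monthly Kinesis cost: PUT payload units plus provisioned shard-hours."""
    units_per_gb = GB_IN_KB / 25
    units = gb_per_day * DAYS_PER_MONTH * units_per_gb
    put_cost = units * KINESIS_PUT_PER_MILLION / 1e6

    mb_per_second = gb_per_day * 1024 / 86400
    shards = max(1, math.ceil(mb_per_second / SHARD_MB_PER_SECOND))
    shard_cost = shards * KINESIS_SHARD_HOUR * 24 * DAYS_PER_MONTH
    return put_cost + shard_cost


for gb_per_day in (1, 1024):  # 1 GB/day and 1 TB/day
    print(gb_per_day, round(sqs_monthly(gb_per_day), 2),
          round(kinesis_monthly(gb_per_day), 2))
```

The shard-hour charge is a fixed floor, which is why Kinesis loses at 1 GB per day and wins at 1 TB per day in this model. The SQS figure counts a single request per 64 KB and ignores the separate receive and delete calls.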

John Velonis
  • I don't completely follow why the exponential increase in size, here, matters. Can you dig in a bit more? It sounds like you got some insight that I'd like to have. *Edit* Actually, looking at Eugene Feingold's answer, there looks to be a pretty solid debate on this (?). – Thomas Sep 22 '17 at 14:41
  • Sorry, I made some mistakes in my calculations (fixed now, I hope). – John Velonis Sep 25 '17 at 17:19
  • right, but what if your average SQS message size is small, say 1kb or less? – mcmillab Jun 28 '19 at 02:54
  • 1
    @mcmillab SQS will charge the same whether your message is 1 KB or 64 KB -- see [Amazon's SQS pricing page](https://aws.amazon.com/sqs/pricing/). So if your messages are only 1 KB, SQS will cost 64x as much as the figures I gave above if you are sending the same total amount of data. However, a single request can contain up to 10 messages, so if you are able to batch messages together, it might only be 6x as much (depending how full your batches are). – John Velonis Jul 03 '19 at 21:34
  • 1
    @JohnVelonis Above calculations for SQS pricing are missing a key piece. Extra care is needed to understand how SQS requests are charged. 1 request = 1 API Action. In order to process a single "message", it's necessary to perform at least 3 API actions: 1 send + 1 read + 1 delete. Other SQS features such as changing visibility will incur more API actions. This unexpected multiplier is quite nasty and typically results in SQS being 2-10x more expensive than Kinesis Streams for large data sets (say processing 100 million messages per month). – Vlad Poskatcheev Oct 29 '19 at 23:00
10

Kinesis solves the problem of the map part in a typical map-reduce scenario for streaming data, while SQS does not guarantee that. If you have streaming data that needs to be aggregated on a key, Kinesis makes sure that all the data for that key goes to a specific shard, and that shard can be consumed on a single host, which makes aggregation on a key easier than with SQS.

bhanu tadepalli
5

I'll add one more thing nobody else has mentioned -- SQS is several orders of magnitude more expensive.

Eugene Feingold
  • 3
    Are you sure? From my calculation Kinesis is much more expensive, but I've never been talented using the Amazon Simple Price Calculator. – Didier A. Mar 22 '16 at 22:27
  • Looking at the current pricing examples on aws: Kinesis with 267M messages is around $60, while putting that amount of messages through SQS would result in $107. Obviously I just did a really quick comparison, and this highly differs with different use cases, but this answer definitely should deserve some credit. – Moszi Feb 08 '17 at 08:48
  • 1
    Assume you are doing a fan out to say 2 consumers and 100 million messages a day. SNS cost is $50/day. SQS cost is $40/day/consumer or $80/day total. Kinesis is $1.4/day for PUTs and $0.36/shard. Even with 100 shards (100 MB/s in, 200 MB/s out) it's just $3.60/day + $1.40/day. So Kinesis at $4/day vs. SNS/SQS at $130/day. – Carlos Rendon Feb 27 '17 at 04:44
  • @Moszi How did you arrive at this calculation? A standard SQS queue with 15,000,000 messages per month and 10GB data transfer in and data transfer out costs only a measly 5.60USD/mo, while a 256KB payload (SQS max) and ~15,000,000 PUT units/mo (130 shards) in Kinesis costs 1,638.43USD/mo. – simoncpu Jul 18 '17 at 12:46
  • 4
    I'd be interested to know why there is such a disparity in costs in this thread. – Thomas Sep 22 '17 at 14:43
  • Absolutely true! SQS is only cheaper for small records and a low input rate (think 10 records/second at 10 KB each). As the size of the records increases, SQS gets pretty expensive because records are broken down into 64 KB requests and it is the number of requests that is billed. Kinesis limits the size of a record to 1 MB. On the other hand, when the rate of records increases, the cost of SQS grows faster than that of Kinesis. – human Jul 24 '18 at 05:48
  • One thing that definitely needs to be considered is how spiky your workload is. For SQS you pay for all messages in a month regardless of distribution whereas for Kinesis you need to dimension for peak load. You _can_ scale Kinesis up and down, but there's nothing automatic about it and if you hit the maximum throughput before you've scaled up it refuses your requests, requiring back-off or buffering (possibly via SQS). – Raniz Oct 17 '18 at 12:01
  • 1
    One gotcha with SQS pricing is that the stated rate per 1 million requests isn't for actual messages. It's for API actions, where 1 request = 1 API action. In order to process a single "message", it's necessary to perform at least 3 API actions: 1 send + 1 receive + 1 delete. Other SQS features such as changing visibility will incur more API actions. This unexpected multiplier is quite nasty and typically results in SQS being 2-10x more expensive than Kinesis Streams for large data sets (say processing 100 million messages per month). – Vlad Poskatcheev Oct 29 '19 at 23:11
4

Kinesis Use Cases

  • Log and Event Data Collection
  • Real-time Analytics
  • Mobile Data Capture
  • “Internet of Things” Data Feed

SQS Use Cases

  • Application integration
  • Decoupling microservices
  • Allocate tasks to multiple worker nodes
  • Decouple live user requests from intensive background work
  • Batch messages for future processing