12

I've seen the page on Amazon and understand that 1 RCU covers an item of up to 4KB.

If I have a table with 50 items, I've read that a scan will read all 50 items and use 50 RCU. But let's say I did a query instead, and my table is 10 by 5: will it still use 50 RCU?

Mike Dinescu
zuba
  • 1
    A query will only consume capacity for the items that are returned (assuming there is no filter, since filtering happens after the read, and the total size is less than 1 MB) – Can Sahin May 04 '18 at 16:28

2 Answers

35

Scanning a table that contains 50 items will consume 50 RCU only if the combined size of the 50 items equals 200KB (for a strongly consistent read, or 400KB for an eventually consistent read). Most items are not that big, so 50 items typically require only about 10KB to store, meaning a full scan of a table of 50 items would cost only about 3 RCU with strong consistency, or roughly half that with eventual consistency.

The consumed Read Capacity Units (RCU) depends on multiple factors:

If an item is read using a GetItem operation, then the consumed capacity is billed in increments of 4KB, based on the size of the item (i.e. a 200B item and a 3KB item would each consume 1 RCU, while a 5KB item would consume 2 RCU).

If you read multiple items using a Query or Scan operation, then the capacity consumed depends on the cumulative size of the items being accessed (you get billed even for items filtered out of a query or scan when using filters). So, if your query or scan accesses 10 items that are approximately 200 bytes each, it will consume only 1 RCU. If you read 10 items but each item is about 5KB in size, then the total consumed capacity will be 13 RCU (50KB / 4KB = 12.5, which rounds up to 13).

What's more, if you perform an eventually consistent read, each capacity unit covers twice as much data. So reading the ten 5KB items would cost only about half as much: 6.5 RCU instead of 13.
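The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the rounding rules described in this answer, not an official calculator: the function names `get_item_rcu` and `query_rcu` are made up for the example, sizes are given in bytes, and empty results are not modeled. It assumes eventually consistent reads are charged at half the strongly consistent rate, so it can return fractional values such as 6.5.

```python
import math

BLOCK_BYTES = 4 * 1024  # one RCU covers up to 4KB for a strongly consistent read


def _read_units(total_bytes, consistent):
    # Round the total size up to the next 4KB block, then halve the
    # charge for eventually consistent reads (assumption: half rate).
    units = math.ceil(total_bytes / BLOCK_BYTES)
    return units if consistent else units / 2


def get_item_rcu(item_bytes, consistent=True):
    # GetItem: each item is rounded up on its own.
    return _read_units(item_bytes, consistent)


def query_rcu(item_sizes_bytes, consistent=True):
    # Query/Scan: sizes of all accessed items are summed first,
    # and only the total is rounded up.
    return _read_units(sum(item_sizes_bytes), consistent)


print(get_item_rcu(200))                            # 1
print(get_item_rcu(5 * 1024))                       # 2
print(query_rcu([200] * 10))                        # 1
print(query_rcu([5 * 1024] * 10))                   # 13
print(query_rcu([5 * 1024] * 10, consistent=False)) # 6.5
```

Summing before rounding is why ten 200-byte items cost 1 RCU in a query, whereas ten separate GetItem calls would cost 1 RCU each.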

You can read more about throughput capacity here.

A couple of things to note:

  • a single item may be as large as 400KB, so reading an item could consume as much as 100 RCU.
  • when calculating item size, attribute names count towards the item size as well, not just their values!
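To illustrate the point about attribute names, here is a rough sketch that estimates the size of an item whose attributes are all strings. It assumes the size of a string attribute is the UTF-8 byte length of its name plus the UTF-8 byte length of its value; other attribute types (numbers, binary, lists, maps) follow different rules and are not handled. The helper name `estimate_string_item_bytes` is invented for this example.

```python
def estimate_string_item_bytes(item):
    """Rough size estimate for an item containing only string attributes."""
    total = 0
    for name, value in item.items():
        # Both the attribute name and its value count towards item size.
        total += len(name.encode("utf-8")) + len(value.encode("utf-8"))
    return total


short_names = {"id": "abc"}
long_names = {"customer_full_name": "Al"}

print(estimate_string_item_bytes(short_names))  # 5  (2 for the name + 3 for the value)
print(estimate_string_item_bytes(long_names))   # 20 (18 for the name + 2 for the value)
```

In the second item the name accounts for most of the size, which is why short attribute names can matter on tables with many small items.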
Mike Dinescu
  • 1
    Useful summary. However, it's unclear to me what "accessed" means. If I query based on the Hash Key, would my query access only items with that key? How about the sort key? – nagy.zsolt.hun Jan 13 '19 at 00:04
  • Correct. A query will only access items of a particular hash key – Mike Dinescu Jan 13 '19 at 00:06
  • Thanks. If I also set constraint on the sort key, would all items of the HashKey be accessed, or only the ones matching the constraint on the sort key as well? – nagy.zsolt.hun Jan 13 '19 at 00:09
  • Not sure what you mean. A query **requires** a hash key. It is that hash key that gets accessed in that query. – Mike Dinescu Jan 13 '19 at 01:49
  • 2
    I'm asking about composite keys (consisting of a hash key + a sort key): multiple Items may have the same hash key. When running a query where I specify the hash key + a constraint on the sort key (e.g. a BETWEEN condition), which items get accessed? All items with the same Hash Key, or only the ones matching the constraint on the sort key? – nagy.zsolt.hun Jan 13 '19 at 09:55
  • You can verify this by asking for the consumed capacity to be returned in the query response, but only the items matched by the key constraint should be counted towards the consumed capacity – Mike Dinescu Jan 13 '19 at 17:19
  • @MikeDinescu if you performed 4 rapid queries in succession (as is often the case with geoqueries), are those 4 queries guaranteed to be calculated individually? Or might they be calculated twice, for example, if each query made it to DynamoDB within half a second? In other words, if the first and second query hit the API within 1 second, would the RCU calculation be on their combined item size and treated as one API call? – acidgate Jan 31 '19 at 17:17
  • This would be better asked as a separate question, but the TL;DR is that each query is a separate request, so capacity utilization is billed per request – Mike Dinescu Jan 31 '19 at 17:30
  • @MikeDinescu Good idea https://stackoverflow.com/questions/54468374/calculating-dynamodb-rcu-pricing-per-day-not-per-second – acidgate Jan 31 '19 at 20:13
  • *Most items are not that big, so a 50 items typically only require about 10KB to store meaning a full scan for a table of 50 items, with eventual consistency, would only cost about 3 RCU.* **Is this really correct?** According to AWS' docs, *"One read request unit represents one strongly consistent read request, or two eventually consistent read requests, **for an item** up to 4 KB in size."* Nowhere in the docs does it say read capacity is cumulative... – user1322092 Feb 13 '19 at 02:53
  • Capacity consumed is for each operation (request), based on the amount of data accessed, not per item. Make sense?! – Mike Dinescu Feb 13 '19 at 04:23
  • @nagy.zsolt.hun To answer your question, all the items with the same hash key are accessed and then the filter is applied on top of them. Capacity consumption is also for all items accessed, not just the ones returned. – Vinay Nov 12 '19 at 13:11
  • This needs to be so much more clear in the documentation... Maybe the *pricing page* – danthegoodman Dec 06 '20 at 01:04
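As suggested in the comments, the consumed capacity can be checked directly by asking for it in the query response. The following is a minimal sketch assuming a boto3 `Table` resource is passed in; the helper name `query_with_capacity`, the table name and the key names in the usage comment are placeholders, and only the first page of results (up to 1 MB) is read.

```python
def query_with_capacity(table, pk_name, pk_value, consistent=False):
    """Query one partition key and return (items, consumed capacity units)."""
    response = table.query(
        KeyConditionExpression="#pk = :pk",
        ExpressionAttributeNames={"#pk": pk_name},
        ExpressionAttributeValues={":pk": pk_value},
        ConsistentRead=consistent,
        # Ask DynamoDB to report the capacity used by this request.
        ReturnConsumedCapacity="TOTAL",
    )
    consumed = response.get("ConsumedCapacity", {}).get("CapacityUnits")
    return response.get("Items", []), consumed


# Example usage (requires AWS credentials and an existing table):
# import boto3
# table = boto3.resource("dynamodb").Table("my-table")  # hypothetical table name
# items, rcu = query_with_capacity(table, "user_id", "u-123")
# print(len(items), rcu)
```

Running the same query with and without a sort key condition, and with and without a filter expression, is a simple way to see which items count towards the charge on your own data.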
2

Query—Reads multiple items that have the same partition key value. All items returned are treated as a single read operation, where DynamoDB computes the total size of all items and then rounds up to the next 4 KB boundary. For example, suppose your query returns 10 items whose combined size is 40.8 KB. DynamoDB rounds the item size for the operation to 44 KB. If a query returns 1500 items of 64 bytes each, the cumulative size is 96 KB.

Ref: https://docs.amazonaws.cn/en_us/amazondynamodb/latest/developerguide/ProvisionedThroughput.html

rajd