
Our dataset has rows that are highly compressible with respect to adjacent rows. As I understand it, Bigtable supports automatic compression (via SSTable block compression). It would make a huge difference to us if Spanner supports, or will support, similar compression at the database level; we project a 3-5x difference in our cost structure. While we could try to do this at the application layer, it isn't much fun at all...
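For concreteness, here is a minimal sketch of what that application-layer approach might look like, assuming a hypothetical row shape and plain zlib -- nothing Spanner-specific, just packing 100 adjacent rows into one BYTES value and showing the read-path restriction that comes with it:

    import json
    import zlib

    # Hypothetical rows: adjacent rows share most of their content, which is
    # exactly what makes block-level (rather than per-row) compression pay off.
    rows = [{"sensor_id": "s1", "ts": 1495600000 + i, "payload": "x" * 200}
            for i in range(100)]

    # Pack 100 adjacent rows into one blob destined for a single BYTES column.
    raw = json.dumps(rows).encode("utf-8")
    block = zlib.compress(raw, level=6)
    print(f"raw: {len(raw)} bytes, compressed: {len(block)} bytes "
          f"({len(raw) / len(block):.1f}x)")

    # The catch: reading any one row now means fetching and decompressing the
    # whole block and filtering client-side -- no per-row access via SQL.
    decoded = json.loads(zlib.decompress(block).decode("utf-8"))
    assert decoded[0]["sensor_id"] == "s1"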

Maxim

1 Answer


Cloud Spanner charges for logical bytes -- the size of the data you send us. The sizes for types are listed here: https://cloud.google.com/spanner/docs/data-types
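To make the billing model concrete, here is a rough sizing helper for a hypothetical row. The per-type byte counts (INT64 = 8, FLOAT64 = 8, BOOL = 1, DATE = 4, TIMESTAMP = 12, STRING/BYTES billed by encoded length) are assumptions taken from the linked data-types page -- verify them against the docs rather than this sketch:

    # Approximate logical-byte sizing for a hypothetical row; the constants
    # are assumed from the data-types page, not an official API.
    TYPE_SIZES = {"INT64": 8, "FLOAT64": 8, "BOOL": 1, "DATE": 4, "TIMESTAMP": 12}

    def logical_bytes(schema, row):
        total = 0
        for col, typ in schema.items():
            if typ == "STRING":
                total += len(row[col].encode("utf-8"))  # billed by encoded length
            elif typ == "BYTES":
                total += len(row[col])
            else:
                total += TYPE_SIZES[typ]  # fixed-size scalar types
        return total

    schema = {"id": "INT64", "created": "TIMESTAMP", "name": "STRING"}
    row = {"id": 42, "created": "2017-05-24T18:24:00Z", "name": "spannerfan123"}
    print(logical_bytes(schema, row))  # 8 + 12 + 13 = 33 logical bytes

The relevance to the question: since billing is on logical bytes, shrinking the bytes you send (e.g., a client-compressed BYTES column) presumably reduces the bill, while any server-side compression would not.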

Albert Cui
  • Hello -- yes, we realize this, but is database-level compression on the roadmap at all? Even if it's a year out, that would help us decide whether to spend $XXX,XXX on Spanner. Client-side compression doesn't work well for us because our individual rows aren't compressible, but on, say, a 100-adjacent-rows basis they are... (and, of course, client-side compression severely restricts the operations we can perform via Spanner). – spannerfan123 May 24 '17 at 18:24
  • We don't talk about roadmaps on StackOverflow. Cost of storage is likely the smallest part of the bill. If you are looking at a large scale Spanner deployment, you can reach me on my @google.com email using the first 3 letters of my first name and last name (6 letters total). – Dan McGrath May 25 '17 at 15:31
  • We have been experimenting with Spanner since its release and exploring various cost tradeoffs. As you point out, the direct storage price is only one factor; there is also the minimum number of nodes required to "manage" this storage (1 node per 2 TB), even for a relatively cold dataset that still needs to live in the same transaction domain. If the required transaction QPS is low relative to the corpus size, the pricing is highly sensitive to the implicit cost of storage. For example, 100 TB might cost ~$750k/year; if our dataset is compressible by 5x, it'd be only ~$150k/year (rough arithmetic in the sketch after these comments). – spannerfan123 May 25 '17 at 19:34
  • That assumes compression would preserve the same fixed ratio of nodes to stored TB, which isn't necessarily true. – Dan McGrath May 26 '17 at 21:20
  • In that case, we will reach out to you (or Google generally) to discuss the roadmap, pricing options, and flexibility; the cost for relatively cold and/or compressible datasets seems high, and IMHO it might not have to be this way from a fundamentals-oriented view (i.e., the underlying cost to Google). We would love to use Spanner, but we are pretty value-oriented... – spannerfan123 May 27 '17 at 01:17
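For reference, the comment's figures reproduce with simple arithmetic under assumed May-2017 list prices (~$0.90 per node-hour, ~$0.30 per GB-month, 1 node per 2 TB minimum -- verify against current pricing). Note this sketch bakes in exactly the assumption questioned above: that both billed storage and the node minimum scale with the compressed size.

    # Rough annual-cost model under assumed 2017 list prices (not official).
    NODE_HOUR = 0.90   # $ per node per hour (assumed)
    GB_MONTH = 0.30    # $ per GB per month (assumed)
    TB_PER_NODE = 2    # minimum-node rule: 1 node per 2 TB stored

    def annual_cost(stored_tb):
        nodes = max(1, -(-stored_tb // TB_PER_NODE))       # ceiling division
        node_cost = nodes * NODE_HOUR * 24 * 365           # node-hours per year
        storage_cost = stored_tb * 1000 * GB_MONTH * 12    # 1 TB ~ 1000 GB
        return node_cost + storage_cost

    print(f"100 TB uncompressed: ${annual_cost(100):,.0f}/year")  # ~$754,200
    print(f" 20 TB (5x smaller): ${annual_cost(20):,.0f}/year")   # ~$150,840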