
I have a client using SQL Server for everything, including powering marketing dashboards with analytics queries. They are excited to start pushing data into Redshift as a data warehouse for the speed improvements alone (e.g. queries that take hours now take minutes or seconds), but they want to compare cost.

I'm a bit naive about how to do this, since these are very different tools: it's like being nervous about the cost of a flight compared to riding my bicycle around town.

What's the best framework and pricing data to share to do a realistic comparison?

ScottieB
  • This question is too big for here. Also, you have nowhere near enough data points for anyone to make a guess. I suggest you hire a Redshift professional to help you with this. – Jon Scott Feb 20 '19 at 17:55
  • I guess you'll need to include the cost of migration in the total cost, not just the operational cost. In fact, if you can't optimise your existing reporting solution, it's unlikely you'll have the skills to migrate it to Redshift. – Nick.McDermaid Feb 20 '19 at 22:43
  • Woah, hard disagree about "skills to migrate". Redshift is intended to be easy to migrate to (most function names and arguments are compatible) and Redshift will require much less tuning and admin effort than SQL Server (source: over a decade of building SQL Server data warehouses and 5+ years of doing the same with Redshift). – Joe Harris Feb 21 '19 at 19:45

2 Answers


You need to run a proof of concept (POC) to fully capture the price/performance dynamics.

As a very rough guide, you could use the formula below to find the size of the Redshift cluster to start with (you can resize to add more nodes as needed).

First, you need to extract the data from SQL Server and gzip it. 1TB in SQL Server will typically be 100-200GB gzipped, sometimes less.
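
A minimal Python sketch of that extract-and-compress step (the connection string and table name are placeholders for the client's environment; bcp or SSIS would do the job just as well):

import csv
import gzip

import pyodbc  # any SQL Server driver works here

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sqlserver-host;DATABASE=marketing;UID=etl;PWD=..."  # placeholders
)
cur = conn.cursor()
cur.execute("SELECT * FROM dbo.fact_campaign_clicks")  # hypothetical table

# Stream the result set straight into a gzipped CSV, batch by batch,
# so the full table never has to fit in memory.
with gzip.open("fact_campaign_clicks.csv.gz", "wt", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow([col[0] for col in cur.description])  # header row
    while True:
        rows = cur.fetchmany(10_000)
        if not rows:
            break
        writer.writerows(rows)

The gzipped files then go to S3 and are loaded into Redshift with the COPY command.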

  size of gzipped data in GB  | 500GB
* 1.5 (size once loaded in Rs)| 750GB
------------------------------|---------- 
= initial Redshift data size  | 750GB
/ 256GB (per DC2.large node)  | 2.93 nodes
/ 0.70  (target 70% disk use) | 4.18 nodes
------------------------------|---------- 
= initial Redshift node count | 4 nodes
* 25¢/hr (US-East-1 price)    | $1/hr
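
In code, the same back-of-the-envelope arithmetic looks like this (a minimal Python sketch; the 1.5x load factor, 256GB per node, 70% disk target and 25¢/hr price are taken straight from the table above and should be re-checked against current AWS pricing):

def estimate_cluster(gzipped_gb, gb_per_node=256, target_disk_use=0.70,
                     load_factor=1.5, price_per_node_hour=0.25):
    """Back-of-the-envelope Redshift sizing, mirroring the table above."""
    loaded_gb = gzipped_gb * load_factor                   # size once loaded into Redshift
    raw_nodes = loaded_gb / gb_per_node / target_disk_use
    nodes = round(raw_nodes)                               # table rounds 4.18 -> 4; round up for headroom
    return nodes, nodes * price_per_node_hour              # (node count, $ per hour)

print(estimate_cluster(500))  # -> (4, 1.0): 4 nodes at roughly $1/hr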
Joe Harris
  • I just realized 6 years later, you're still answering my naive questions on SQL Server -> Redshift migrations! Much obliged!! – ScottieB Feb 21 '19 at 16:57

There are two factors that determine the desired size of an Amazon Redshift cluster:

  • Data storage volume
  • Compute

You will obviously need enough nodes to store the data, so the real question is how many (and what size) nodes you need to obtain a desired level of performance.

If "queries that take hours now take minutes or seconds", then cost probably isn't going to be a huge determinant in the choice of system, but it is worth a comparison.

You should probably start with the smallest cluster that can hold your data volume; if the performance is satisfactory, that gives you a quick price comparison. You might also consider adding nodes to get even better performance, which can be worth the extra expense if it means the business gets information faster.
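
For the comparison itself, a rough sketch of the monthly arithmetic (on-demand pricing only; the 25¢/hr dc2.large-style rate and 730 hours per month are assumptions to verify, and reserved instances come in substantially cheaper):

HOURS_PER_MONTH = 730  # average hours in a month; an always-on cluster bills for all of them

def redshift_monthly_cost(nodes, price_per_node_hour=0.25):
    """On-demand monthly cost for a cluster, using an assumed per-node rate."""
    return nodes * price_per_node_hour * HOURS_PER_MONTH

smallest_cluster = redshift_monthly_cost(4)  # ~$730/month on-demand
faster_cluster = redshift_monthly_cost(8)    # ~$1,460/month if you add nodes for speed

# Compare these against what the client currently spends on SQL Server licensing,
# hardware and DBA time for the same reporting workload.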

Also, spend some time ensuring that your data is optimally stored in Amazon Redshift. This requires an understanding of how to use DISTKEY and SORTKEY to let queries run super-fast. Doing this well can reduce/eliminate the need for additional nodes.
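
A hypothetical example of what that tuning looks like in practice (the table and column names are invented; Redshift speaks the Postgres wire protocol, so psycopg2 or any Postgres-compatible client works):

import psycopg2  # Redshift is reachable with standard Postgres drivers

ddl = """
CREATE TABLE fact_page_views (
    view_date    date    NOT NULL,
    customer_id  bigint  NOT NULL,
    page_views   integer
)
DISTKEY (customer_id)          -- co-locate rows that are joined on customer_id
COMPOUND SORTKEY (view_date);  -- lets date-filtered dashboard queries skip blocks
"""

conn = psycopg2.connect(
    host="my-cluster.xxxxxxxx.us-east-1.redshift.amazonaws.com",  # placeholder endpoint
    port=5439, dbname="analytics", user="admin", password="...",
)
with conn, conn.cursor() as cur:
    cur.execute(ddl)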

John Rotenstein