First of all, I want to say that I checked similar posts on the internet, and I saw similar questions on Stack Overflow, such as:

- Best data store for billions of rows
- How to store 7.3 billion rows of market data (optimized to be read)?

But I still want to ask my question as a double check.
So... I am at the start of writing my [BIG PROJECT], and right now I'm writing all the documentation, etc.
While going through the use cases, I see that in one of the main use cases of the application I will need to handle...
[!!!ATTENTION!!!] About BILLIONS of requests per DAY!
Yep. Billions per day!
I cannot say what these requests are, but I can say:
1) The data inside each request has a pretty good structure.
2) I will need to work with this data a lot. I mean many, many queries against this data.
Today I did a quick test to estimate sizes in MS SQL Server 2017 (14.0.100):
50M of these records = 10 GB
===> 1B ==> 200 GB
So 200 GB is the DAILY size!!!
200 GB * 30 = 6 TB per month
6 TB * 12 ===> 72 TB per year
And the queries (stored procedures) were not very fast.
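To make the estimate above easy to re-run with different inputs, here is a back-of-the-envelope sketch of the same arithmetic. It assumes storage scales linearly with row count (it ignores index overhead, compression, and log space, so treat it as a rough lower bound):

```python
# Rough storage estimate, assuming size scales linearly with row count.
SAMPLE_ROWS = 50_000_000   # rows in the test load
SAMPLE_SIZE_GB = 10        # measured size of the test load

bytes_per_row = SAMPLE_SIZE_GB * 1024**3 / SAMPLE_ROWS   # ~215 bytes/row
daily_gb = 1_000_000_000 * SAMPLE_SIZE_GB / SAMPLE_ROWS  # 1B rows/day
monthly_tb = daily_gb * 30 / 1000
yearly_tb = monthly_tb * 12

print(f"{bytes_per_row:.0f} bytes/row, {daily_gb:.0f} GB/day, "
      f"{monthly_tb:.0f} TB/month, {yearly_tb:.0f} TB/year")
# -> 215 bytes/row, 200 GB/day, 6 TB/month, 72 TB/year
```

That confirms the 200 GB/day, 6 TB/month, 72 TB/year figures, and shows each row is only ~200 bytes, which is small enough that a columnar or wide-column store with compression could shrink this a lot.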
Because I'm only at the documentation / technical design step, I want to take the time and find the best way to handle this data.
I'm looking 1-3-5 years ahead...
(I don't want to have to change the storage and migrate the data after 2 years.)
The second question is about architecture...
This big data flow is very similar to Google Analytics, but I have to send the ID of the request back in the response.
I'm generally a .NET developer and will build this project on .NET Core with a microservices architecture.
And now I see big potential in .NET Core under Linux, nginx, etc...
So my question is: what are the best practices / architecture templates for writing such a microservice? How does Google Analytics handle millions and billions of requests per day?
I checked which DB Google Analytics uses: it's Bigtable.
The best alternative I found is HBase.
Is HBase my hero here??
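If I do go the HBase/Bigtable route, from what I've read the main design decision is the row key, because sequential keys (like a plain timestamp) hotspot a single region under heavy writes. This is a minimal sketch of the common "salted timestamp" key pattern; the function name, the 16-bucket salt, and the key layout are my own illustrative choices, not an HBase API:

```python
import hashlib
import struct

def make_row_key(request_id: str, ts_millis: int, buckets: int = 16) -> bytes:
    """Build a salted time-series row key (a common wide-column pattern).

    A small hash-derived prefix spreads writes across `buckets` key
    ranges to avoid region hotspotting; the big-endian timestamp keeps
    rows inside each bucket sorted chronologically for range scans.
    """
    salt = int(hashlib.md5(request_id.encode()).hexdigest(), 16) % buckets
    # ">Q" packs the timestamp big-endian, so byte order == time order.
    return bytes([salt]) + struct.pack(">Q", ts_millis) + request_id.encode()
```

The trade-off is that a time-range scan then has to issue one scan per bucket and merge the results, which is the usual price of avoiding a write hotspot.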
And one more question:
What is the better choice:
- Use a managed cloud database solution (like AWS EMR / DynamoDB / etc.)
- Launch an EC2 instance and run my own database on that instance
Thank you guys for the help, and sorry for my English grammar.