First of all, I want to say that I checked similar posts on the internet, and I saw similar questions on Stack Overflow, such as:

- Best data store for billions of rows
- How to store 7.3 billion rows of market data (optimized to be read)?

But I still want to ask my question as a double check.
So... I am at the start of writing my [BIG PROJECT], and right now I'm writing all the documentation, etc.
While going through the use cases, I see that in one of the main use cases of the application I will need to handle...
[!!!ATTENTION!!!] About BILLIONS of requests per DAY!
Yep. Billions per day!
I cannot say what these requests are, but I can say:
1) The data inside each request has a pretty good structure.
2) I will need to work with this data a lot. I mean many, many queries against this data.
Today I did a quick test to estimate sizes in MS SQL Server 2017 (14.0.100):
50M of these records = 10 GB
===> 1B ==> 200 GB
So 200 GB is the DAILY size!!!
200 GB * 30 = 6 TB per month
6 TB * 12 ===> 72 TB per year
And the queries (stored procedures) were not very fast.
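To make the estimate above easy to re-run with different inputs, here is a back-of-the-envelope sketch of the same arithmetic. It assumes storage scales linearly with row count (it ignores index overhead, compression, and log space, so treat it as a rough lower bound):

```python
# Rough storage estimate, assuming size scales linearly with row count.
SAMPLE_ROWS = 50_000_000   # rows in the test load
SAMPLE_SIZE_GB = 10        # measured size of the test load

bytes_per_row = SAMPLE_SIZE_GB * 1024**3 / SAMPLE_ROWS   # ~215 bytes/row
daily_gb = 1_000_000_000 * SAMPLE_SIZE_GB / SAMPLE_ROWS  # 1B rows/day
monthly_tb = daily_gb * 30 / 1000
yearly_tb = monthly_tb * 12

print(f"{bytes_per_row:.0f} bytes/row, {daily_gb:.0f} GB/day, "
      f"{monthly_tb:.0f} TB/month, {yearly_tb:.0f} TB/year")
# -> 215 bytes/row, 200 GB/day, 6 TB/month, 72 TB/year
```

That confirms the 200 GB/day, 6 TB/month, 72 TB/year figures, and shows each row is only ~200 bytes, which is small enough that a columnar or wide-column store with compression could shrink this a lot.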
Because I'm only at the documentation / technical design step, I want to take the time and find the best way to handle this data.
I'm looking 1-3-5 years ahead...
(I don't want to have to change the storage and migrate the data after 2 years.)
The second question is about architecture...
This big data flow is very similar to Google Analytics, but I have to send the ID of the request back in the response.
I'm generally a .NET developer and will build this project on .NET Core with a microservices architecture.
And now I see big potential in .NET Core under Linux, nginx, etc...
So my question is: what are the best practices / architecture templates for writing such a microservice? How does Google Analytics handle millions and billions of requests per day?
I checked which DB Google Analytics uses: it's Bigtable.
The best alternative I found is HBase.
Is HBase my hero here??
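If I do go the HBase/Bigtable route, from what I've read the main design decision is the row key, because sequential keys (like a plain timestamp) hotspot a single region under heavy writes. This is a minimal sketch of the common "salted timestamp" key pattern; the function name, the 16-bucket salt, and the key layout are my own illustrative choices, not an HBase API:

```python
import hashlib
import struct

def make_row_key(request_id: str, ts_millis: int, buckets: int = 16) -> bytes:
    """Build a salted time-series row key (a common wide-column pattern).

    A small hash-derived prefix spreads writes across `buckets` key
    ranges to avoid region hotspotting; the big-endian timestamp keeps
    rows inside each bucket sorted chronologically for range scans.
    """
    salt = int(hashlib.md5(request_id.encode()).hexdigest(), 16) % buckets
    # ">Q" packs the timestamp big-endian, so byte order == time order.
    return bytes([salt]) + struct.pack(">Q", ts_millis) + request_id.encode()
```

The trade-off is that a time-range scan then has to issue one scan per bucket and merge the results, which is the usual price of avoiding a write hotspot.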
And one more question:
What is the better choice:
- Use a managed cloud database solution (like AWS EMR / DynamoDB / etc.)
- Launch an EC2 instance and run my own database on that instance
Thank you guys for the help, and sorry for my English grammar.