I'm currently in a situation that scares me a bit. I have about 20k rows in the DB, which isn't even 1% of the data I'll have within the next 3 months. Each row represents an object (let's call it Object1) with some data. I also have stats for each Object1, let's call them Object1Stats, which live in MongoDB. There is an Object1Stats record for each day, so to get, for example, the total stats, I have to sum every Object1Stats for that Object1.
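For concreteness, I think of each daily Object1Stats document as something like this (field names here are invented for illustration, not my real schema):

```python
# Hypothetical shape of one daily Object1Stats document in MongoDB
sample_doc = {
    "object_id": 42,        # id of the Object1 row in Postgres
    "day": "2013-06-01",    # one document per Object1 per day
    "value": 1337,          # the stat that has to be summed across days
}
```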
The problem is that I often need this data precalculated, for example to display it to the user with the ability to sort the Object1 collection by stats. I could load the stats and sort in application code, but with, say, 5 million Object1 rows that would be far too expensive.
So I came up with the idea of precalculating the stats every hour (Object1Stats is updated twice an hour) for each Object1. But the time this process would take frightens me: take each Object1, send a query to MongoDB to sum its Object1Stats, then run an SQL UPDATE for that Object1. Repeat this AT LEAST 3 million times.
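Roughly, what I have in mind for the hourly job looks like this (just a sketch with pymongo and psycopg2; names like object1, object1_stats, object_id, value, and stats_total are placeholders for illustration):

```python
import psycopg2
from pymongo import MongoClient

stats = MongoClient()["mydb"]["object1_stats"]  # placeholder db/collection names
pg = psycopg2.connect("dbname=mydb")            # placeholder DSN

with pg, pg.cursor() as cur:
    cur.execute("SELECT id FROM object1")
    for (object_id,) in cur.fetchall():
        # one MongoDB round trip per Object1 to sum its daily stats...
        res = list(stats.aggregate([
            {"$match": {"object_id": object_id}},
            {"$group": {"_id": None, "total": {"$sum": "$value"}}},
        ]))
        total = res[0]["total"] if res else 0
        # ...and one single-row UPDATE per Object1 in Postgres
        cur.execute("UPDATE object1 SET stats_total = %s WHERE id = %s",
                    (total, object_id))
```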
I have 2 bottlenecks here: the sum calculation (MapReduce) in MongoDB and the SQL UPDATE queries in Postgres. I can't really speed up the MapReduce side right now (I assume it will perform acceptably), but I'm thinking about the SQL updates.
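One direction I'm considering for the update side: compute the sums for all objects in a single MongoDB aggregation grouped by object id, then apply them to Postgres in bulk with one UPDATE ... FROM (VALUES ...) per chunk instead of millions of single-row statements. A sketch, using the same placeholder names as above and assuming the grouped result can be streamed as a cursor:

```python
from psycopg2.extras import execute_values

# one aggregation over the whole collection instead of one query per object
totals = stats.aggregate([
    {"$group": {"_id": "$object_id", "total": {"$sum": "$value"}}}
])

with pg, pg.cursor() as cur:
    # execute_values pages the VALUES list, so each round trip
    # updates thousands of rows instead of one
    execute_values(cur,
        """UPDATE object1 AS o
           SET stats_total = v.total
           FROM (VALUES %s) AS v(id, total)
           WHERE o.id = v.id""",
        [(doc["_id"], doc["total"]) for doc in totals],
        page_size=10000)
```

Would batching like this be a sane way to attack the UPDATE bottleneck, or is the whole approach wrong?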
Any thoughts or suggestions? I'm open to anything, even a different DB or a completely different approach.
Also, I can't just append new stats data for an object, because the last day's stats change often, and stats from previous days can change as well, so the totals have to be recomputed rather than incremented.