The Setup:
Imagine a Twitter-like service where a user submits a post, which is then read by many (hundreds, thousands, or more) users.

My question is about the best way to architect the cache and database for fast access and many reads while still keeping historical data, so that users can see older posts if they want. The assumption is that 90% of users are only interested in new posts and that old posts are accessed only occasionally. The other assumption is that we want to optimize for that 90%, and it's OK if the older 10% takes a little longer to retrieve.

With this in mind, my research strongly points toward using a cache for the 90% and also storing the posts in a longer-term persistent system. My idea so far is to use Redis for the cache. The advantages are that Redis is very fast and has built-in pub/sub, which would be perfect for publishing posts to many people. I was then considering MongoDB as a more permanent data store for the same posts, which would be read from there once they expire from Redis.
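A minimal sketch of the read path this implies: check the cache first and fall back to the long-term store on a miss. To keep it runnable without any servers, `TTLCache` is a hypothetical in-memory stand-in for Redis with key expiry, and a plain dict stands in for the MongoDB collection; neither is a real client API.

```python
import time
from typing import Optional


class TTLCache:
    """Hypothetical in-memory stand-in for Redis keys that expire."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._items: dict[str, tuple[float, str]] = {}

    def set(self, key: str, value: str) -> None:
        # Store the value together with its expiry deadline.
        self._items[key] = (time.monotonic() + self.ttl, value)

    def get(self, key: str) -> Optional[str]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._items[key]  # lazily drop the expired entry
            return None
        return value


def read_post(post_id: str, cache: TTLCache, archive: dict[str, str]) -> Optional[str]:
    """Serve recent posts from the cache; older ones from the archive."""
    body = cache.get(post_id)
    if body is not None:
        return body
    # Cache miss: the post has expired or was never cached.
    return archive.get(post_id)
```

This version does not put old posts back into the cache after a miss, which matches the assumption that they are read only occasionally.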

Questions:
1. Does this architecture hold water? Is there a better way to do this?
2. Regarding the mechanism for storing posts in both Redis and MongoDB, I was thinking of having the app do two writes: first, write to Redis, so the post is immediately available to subscribers; second, after successfully storing to Redis, write to MongoDB immediately. Is this the best way to do it? Should I instead have Redis push the expired posts to MongoDB itself? I thought about this, but I couldn't find much information on pushing to MongoDB from Redis directly.
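The two-write idea from question 2 could look roughly like this. It is only a sketch: `publish_post`, `write_cache`, and `write_archive` are hypothetical names, and the two callables stand in for whatever Redis and MongoDB client calls the app would really make.

```python
from typing import Callable

Writer = Callable[[str, str], None]


def publish_post(post_id: str, body: str,
                 write_cache: Writer, write_archive: Writer) -> bool:
    """Write to the cache first, then to the long-term store.

    Returns False when the post reached the cache but not the archive,
    so the caller knows it still has to retry the durable write.
    """
    write_cache(post_id, body)  # post becomes visible to readers here
    try:
        write_archive(post_id, body)
    except Exception:
        # The post is cached but not yet durable.
        return False
    return True
```

The sketch makes the weak spot of this ordering visible: if the second write fails, the post exists only in the cache until someone retries.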

Didier Spezia
Ryan Ogle
  • Redis won't push to MongoDb. You have to do it yourself. Or just write to both places at the same time (as you suggested). – Sergio Tulentsev Jun 27 '12 at 08:32
  • I'd always push to the more robust store first (MongoDB in this case), or as Sergio suggested, async at the same time. Never the other way around. – Geert-Jan Jun 28 '12 at 08:35
  • My question is: would you store only the IDs of posts in the cache, or the whole lists of post objects? – user636525 Jan 08 '13 at 21:02

1 Answer

It is actually sensible to associate Redis and MongoDB: they are good team players. You will find more information here:

MongoDB with redis

One critical point is the level of resiliency you need. Both Redis and MongoDB can be configured to achieve an acceptable level of resiliency, and these considerations should be discussed at design time. It may also constrain the deployment options: if you want master/slave replication for both Redis and MongoDB, you need at least 4 boxes (Redis and MongoDB should not be deployed on the same machine).

Now, it may be a bit simpler to keep Redis for queuing, pub/sub, etc., and store the user data in MongoDB only. The rationale is that you do not have to design similar data access paths (the difficult part of this job) for two stores featuring different paradigms. Also, MongoDB has built-in horizontal scalability (replica sets, auto-sharding, etc.), while Redis has only do-it-yourself scalability.

Regarding the second question, writing to both stores would be the easiest way to do it. There is no built-in feature to replicate Redis activity to MongoDB. Designing a daemon listening to a Redis queue (where activity would be posted) and writing to MongoDB is not that hard though.
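A rough sketch of such a daemon, under loud assumptions: `queue.Queue` stands in for the Redis queue (a blocking `get` plays the role of a blocking pop such as BRPOP), a plain dict stands in for the MongoDB collection, and `archive_worker`, `run_demo`, and `STOP` are hypothetical names. A real daemon would use the actual client libraries and handle errors and retries.

```python
import queue
import threading

STOP = object()  # sentinel that tells the worker to exit


def archive_worker(activity: queue.Queue, archive: dict) -> None:
    """Drain the activity queue and copy each post into the archive."""
    while True:
        item = activity.get()  # blocks until an item is available
        if item is STOP:
            break
        post_id, body = item
        archive[post_id] = body  # stand-in for the MongoDB write


def run_demo() -> dict:
    """Start the worker, post two items, then shut it down cleanly."""
    activity: queue.Queue = queue.Queue()
    archive: dict = {}
    worker = threading.Thread(target=archive_worker, args=(activity, archive))
    worker.start()
    activity.put(("1", "first post"))
    activity.put(("2", "second post"))
    activity.put(STOP)
    worker.join()  # wait until everything queued has been archived
    return archive
```

With this shape the app only writes to the queue, and the write to the long-term store happens asynchronously, at the cost of a short window where a post is not yet archived.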

Didier Spezia
  • I'm curious, any links/background on why Redis and Mongo shouldn't be deployed on the same machine? – Geert-Jan Jun 27 '12 at 23:17
  • It is due to the fact that MongoDB maps the data files in memory. So it uses the virtual memory mechanism to access the data, whose structure is designed to favor locality (btrees are used for indexes, for instance). With MongoDB, when the data do not fit in memory, the machine will swap, and it is designed for this. – Didier Spezia Jun 28 '12 at 06:43
  • On the contrary, Redis is a pure main-memory data store, based on memory-oriented data structures (hash tables, lists, skip lists, etc.) which do not enforce any kind of locality. Because it is single-threaded, performance is dramatically impacted when Redis memory is swapped out. – Didier Spezia Jun 28 '12 at 06:44
  • So if you put MongoDB and Redis on the same box and MongoDB data do not fit in memory, MongoDB will "steal" memory from Redis via the OS paging mechanism. The consequence is a major performance drop for Redis. – Didier Spezia Jun 28 '12 at 06:47
  • Thanks, good to know. On boxes where both Mongo and Redis data fit in RAM completely, I take it this isn't a problem? – Geert-Jan Jun 28 '12 at 08:32
  • Correct. If everything fits in memory, there is no issue. – Didier Spezia Jun 28 '12 at 08:50
  • Can't we limit mongo with cgroups so that redis has at least max-memory-limit available at all times? – farnoy Jan 19 '13 at 16:17
  • I have never tried using cgroups with mongo, but it should work. Please note Redis requires more memory than max-memory-limit (communication buffers, etc.). You will probably have to measure to size the cgroups config. – Didier Spezia Jan 19 '13 at 16:42
  • So in the end, does it make more sense to write to Redis and MongoDB at the same time? Also, to do a read, should Redis be queried first, and if the post does not exist there, query Mongo? – Lion789 Sep 05 '13 at 04:53