25

We are using Socket.IO on a large chat application.

At some point we want to dispatch a "presence" (user availability) update to all other users:

io.in('room1').emit('availability:update', { userId: 'xxx', isAvailable: false });

room1 may contain many users (500 max). We observe a significant rise in our Node.js load when many availability updates are triggered.

The idea was to use something similar to the Redis store for Socket.IO: have web browser clients connect to different Node.js servers.

When we want to emit to a room, we dispatch the "emit to room1" payload to all other Node.js processes using Redis Pub/Sub, ZeroMQ, or even RabbitMQ for persistence. Each process then calls its own io.in('room1').emit to target its subset of connected users.

One of the concerns with this setup is that the inter-process communication may become quite busy, and I was wondering whether it may become a problem in the future.

Here is the architecture I have in mind.

(architecture diagram)

coulix
  • What happens when, in a single-process scheme, you try to run 500 users in one room? Can't you use a process per room, perhaps each with its own connection? Consider Camfrog, a non-JS chat with webcam and sound: they structure their service with a central process and one process per room, each with its own port binding – Luis Masuelli May 22 '14 at 17:43
  • We only have one big room, no multi rooms. – coulix May 22 '14 at 19:21
    I'm new to this: can't you have a "global" queue for global messages and let them "peek" the same message? – Luis Masuelli May 22 '14 at 19:24
  • How is this different from redis store? – Farid Nouri Neshat May 23 '14 at 02:08
  • Why not apply a common load-balancing pattern? Each user gets a token and sends that token with each request; your load balancer handles the distribution – Johan May 23 '14 at 11:05
  • You can try SocketCluster, it has a similar interface to Socket.io and runs as multiple parallel workers which share load efficiently (based on event name hashes and message-queues): https://github.com/topcloud/socketcluster – Jon Jun 20 '14 at 01:41
  • +1 for the nice graphic – Zarathustra Jun 26 '14 at 05:28
  • This is an excellent question. There is very little information out there on large scale deployments of Socket.IO – Gaurav Ramanan Dec 29 '14 at 09:33

4 Answers

2

Could you batch changes and only distribute them every 5 seconds or so? In other words, on each node server, simply take a 'snapshot' every X seconds of the current state of all users (e.g. 'connected', 'idle', etc.) and then send that to the other relevant servers in your cluster.

Each server then does the same: every 5 seconds or so, it sends one batch object (an array of only the changes in user state) to all connected clients.

Right now, I'm rather surprised you are attempting to send information about each user as a separate packet. Batching would solve your problem quite well, and it also makes better use of the standard packet sizes normally transmitted through routers and switches.
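The batching idea above can be sketched like this. Names such as `createPresenceBatcher` are illustrative, not from any library; the assumption is that coalescing repeated updates for the same user and flushing on a timer replaces one emit per state change:

```javascript
// Collect per-user presence changes, coalesce duplicates, and flush them
// as a single array instead of emitting once per change.
function createPresenceBatcher(flushFn) {
  const pending = new Map(); // userId -> latest state; older updates are overwritten

  return {
    update(userId, state) {
      pending.set(userId, state); // only the most recent state survives
    },
    flush() {
      if (pending.size === 0) return; // nothing changed since last flush
      const batch = [...pending.entries()].map(([userId, state]) => ({ userId, state }));
      pending.clear();
      flushFn(batch); // e.g. io.in('room1').emit('presence:batch', batch)
    }
  };
}

// Usage: flush on a timer rather than emitting per update.
const sent = [];
const batcher = createPresenceBatcher((batch) => sent.push(batch));
batcher.update('u1', 'idle');
batcher.update('u2', 'connected');
batcher.update('u1', 'connected'); // coalesced with the earlier 'u1' update
batcher.flush();
// In a real server: setInterval(() => batcher.flush(), 5000);
```

With 500 users in a room, a flush every 5 seconds sends at most one message per client per interval, regardless of how many individual state changes occurred in between.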

JoshJ
2

You are looking for this library: https://github.com/automattic/socket.io-redis

Which can be used with this emitter: https://github.com/Automattic/socket.io-emitter
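A minimal wiring sketch, based on the documented usage of the two libraries linked above (it assumes a Redis server running on localhost:6379, so it is configuration rather than a runnable sample):

```javascript
// On each Socket.IO server process: plug in the Redis adapter so that
// io.in('room1').emit(...) is transparently fanned out to every process.
var io = require('socket.io')(3000);
var redis = require('socket.io-redis');
io.adapter(redis({ host: 'localhost', port: 6379 }));

// From any process (even one without Socket.IO clients), the emitter
// publishes through the same Redis channels:
var emitter = require('socket.io-emitter')({ host: '127.0.0.1', port: 6379 });
emitter.to('room1').emit('availability:update', { userId: 'xxx', isAvailable: false });
```

This is essentially the architecture described in the question, with the Redis Pub/Sub plumbing handled by the adapter instead of hand-rolled.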

Mustafa Dokumacı
1

Regarding the available-users feature, I think there are two alternatives: you can create a "users queue" that holds the public data of connected users, or you can use the exchange-binding information to show which users are connected. If you use a users queue, it would be shared per room, and you would update it when a user leaves by popping their state message from the queue (although you would then have to reorganize the remaining queue messages).

Nevertheless, I think RabbitMQ is designed for asynchronous communication, and it is not a great fit for keeping a presence register. It is better suited to applications where you don't know when the user will receive the message or their real availability ("fire and forget" architectures). ZeroMQ requires more work from scratch, but you could implement something more specific to your situation, with better performance.

The publish/subscribe example on the RabbitMQ site could be a good starting point for a design like yours, where a message is sent to several users at the same time. In summary, I would create two queues per user (one for receiving and one for sending messages) and use a dedicated exchange for each chat room, tracking which users are in each room via the exchange-binding information. You always have two queues per user, and you create exchanges to bind them to one or more chat rooms.

I hope this answer is useful; sorry for my bad English.

Chema
1

This is the common approach for sharing data across several Socket.IO processes. You have done well so far with a single process and a single thread. I would venture that you could pick any of the mentioned technologies for communicating shared data without hitting performance issues.

If all you need is IPC, you could perhaps have a look at Faye. If, however, you need to have some data persisted, you could start a Redis cluster with as many Redis masters as you have CPUs, though this will add minor networking noise for Pub/Sub.

Filip Dupanović