32

Many Linux/Unix programming books and tutorials describe the "Thundering Herd Problem", which happens when multiple threads or forked processes are blocked in a select() call waiting for a listening socket to become readable. When a connection comes in, all of the threads and processes are woken up, but only one "wins" with a successful call to accept(). In the meantime, a lot of CPU time is wasted waking up all the others for no reason.

I noticed a project which provides a "fix" for this problem in the Linux kernel, but it is a very old patch.

I think there are two variants: one where each fork does select() and then accept(), and one where each fork just does accept().

Do modern Unix/Linux kernels still have the Thundering Herd Problem in both of these cases, or only in the "select() then accept()" version?

jdkoftinoff
  • 1
    never heard of this, but a lot of stuff the "linux kernel does badly", definitely isn't true any longer! – Matt Joiner Feb 06 '10 at 16:35
  • 1
    That's a very interesting question. It would be really hard to handle well in the `select` then `accept` case because you can't guarantee that the process that is doing `select` will eventually `accept`. I think someone should run a test to find out, and maybe test `epoll` too. – Omnifarious Feb 06 '10 at 17:44

4 Answers

12

For years, most Unix/Linux kernels have serialized wakeups for accept(2). In other words, only one thread is woken up if more than one is blocked in accept(2) on a single open file descriptor.

OTOH, many (if not all) kernels still have the thundering herd problem in the select-accept pattern as you describe.

I have written a simple script ( https://gist.github.com/kazuho/10436253 ) to verify the existence of the problem, and found that it exists on Linux 2.6.32 and Darwin 12.5.0 (OS X 10.8.5).
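For readers who want to try the select-accept pattern without the Perl script, here is a minimal Python sketch. It is not the gist above: it uses threads instead of forks, and the name `run_select_accept` and its parameters are made up for illustration. Each worker blocks in select() on the same non-blocking listening socket, one client connects, and the function counts how many workers select() reported the socket readable to versus how many actually got the connection.

```python
import select
import socket
import threading
import time


def run_select_accept(n_workers=4, timeout=1.0):
    """Return (woken, accepted) counts for a single incoming connection."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the kernel pick a free port
    listener.listen(16)
    listener.setblocking(False)       # losers get EAGAIN instead of blocking

    lock = threading.Lock()
    counts = {"woken": 0, "accepted": 0}

    def worker():
        readable, _, _ = select.select([listener], [], [], timeout)
        if not readable:
            return                    # timed out: select() never reported readability
        with lock:
            counts["woken"] += 1
        try:
            conn, _ = listener.accept()
        except BlockingIOError:
            return                    # woken, but another worker won the race
        with lock:
            counts["accepted"] += 1
        conn.close()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    time.sleep(0.2)                   # crude: give workers time to block in select()

    client = socket.create_connection(listener.getsockname())
    for t in threads:
        t.join()
    client.close()
    listener.close()
    return counts["woken"], counts["accepted"]


if __name__ == "__main__":
    woken, accepted = run_select_accept()
    print(f"woken by select(): {woken}, successful accept(): {accepted}")
```

Only one worker can ever accept the single connection. How many are woken depends on the kernel and on timing, so a `woken` count above 1 is the herd showing up, and the exact number will vary between runs.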

President James K. Polk
kazuho
  • Thank you for the succinct test! For reference, I just tried running it in select-accept mode on Linux 3.2.0-58. Running `perl thundering-herd.pl select-accept` printed `connected! at thundering-herd.pl line 49.` followed by `accept failed:Resource temporarily unavailable at thundering-herd.pl line 52.` And in 'accept' mode, there is no thunder. – jdkoftinoff Apr 11 '14 at 03:22
  • This tells me that the thundering herd problem does still exist, which means that one should either have all threads call accept() without select(), or have one thread do select() and then accept() and pass the fd to a worker thread for processing. – jdkoftinoff Apr 11 '14 at 03:24
  • This problem also exists on my ubuntu 8.04 (linux 2.6.24) – ASBai Jul 11 '14 at 15:47
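The second option from the comments above (one thread does select() and accept(), then hands the connection to a worker) could look roughly like the following Python sketch. The function names `serve` and `demo`, the pool size, and the uppercase-echo "work" are all hypothetical and only there to make the example runnable.

```python
import queue
import select
import socket
import threading


def serve(listener, n_workers, n_connections):
    """Accept n_connections in one thread and hand each to a worker pool."""
    pending = queue.Queue()

    def worker():
        while True:
            conn = pending.get()
            if conn is None:          # sentinel: shut this worker down
                return
            with conn:
                data = conn.recv(1024)
                conn.sendall(data.upper())   # hypothetical "work": uppercase echo

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in workers:
        t.start()

    for _ in range(n_connections):
        select.select([listener], [], [])    # only this thread waits on the listener
        conn, _ = listener.accept()
        pending.put(conn)                    # workers never touch the listener

    for _ in workers:
        pending.put(None)
    for t in workers:
        t.join()


def demo(n_clients=3):
    """Start the server, send one short message per client, collect replies."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the kernel pick a free port
    listener.listen(16)
    server = threading.Thread(target=serve, args=(listener, 2, n_clients))
    server.start()
    replies = []
    for i in range(n_clients):
        with socket.create_connection(listener.getsockname()) as c:
            c.sendall(f"hello {i}".encode())
            replies.append(c.recv(1024).decode())
    server.join()
    listener.close()
    return replies


if __name__ == "__main__":
    print(demo())
```

Because only one thread ever sleeps on the listening socket, there is nothing for the kernel to over-wake. The cost is one extra hand-off per connection through the queue.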
11

This is a very old problem, and for the most part it no longer exists. Over the past few years the Linux kernel has seen a number of changes to the way it handles and routes packets up the network stack, including many optimizations to ensure both low latency and fairness (i.e., to minimize starvation).

That said, the select system call has a number of scalability issues that stem from its API. When you have a large number of file descriptors, the cost of a select call is very high, primarily because the FD sets passed to and from the system call have to be built, checked, and maintained on every call.

Nowadays, the preferred way to do asynchronous I/O on Linux is with epoll. The API is far simpler and scales very nicely across various types of load (many connections, lots of throughput, etc.).

T.J. Crowder
0xfe
  • I'm thinking of the situation where there is only one socket per fork and the socket is blocking. So epoll() vs poll() vs select() doesn't matter here. Does linux have special task waking behaviour for listening sockets now? – jdkoftinoff Feb 07 '10 at 15:12
  • Yes, IIRC, the implementation now wakes up only one thread (as opposed to waking up all threads and having them compete.) How it picks the thread is complicated, but the simple answer is: it's priority based, so it relies on the scheduler to maintain fairness. Anyhow, that specific herding problem does not exist anymore so the situation isn't as dire upon overload. All this said, using select instead of epoll does incur a higher CPU cost (both in the kernel and in user-space), which is, IMO, negligible for your use-case. – 0xfe Feb 07 '10 at 16:47
2

I recently saw a test of a scenario in which multiple threads polled a listening Unix-domain socket and then accepted the connection. All of the threads blocked in the poll() system call woke up.

This was a custom build of the Linux kernel rather than a distro build, so perhaps there is a kernel configuration option that changes this behaviour, but I don't know what that would be.

We did not try epoll.

codeshot
2

See the link below, which discusses separate epoll flags intended to avoid this problem.

http://lwn.net/Articles/632590/
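As a rough illustration of what using such a flag can look like, here is a Python sketch built around `EPOLLEXCLUSIVE`, which is an assumption on my part about which flag is meant: it was merged in Linux 4.5 and is exposed as `select.EPOLLEXCLUSIVE` in Python 3.6+. Each worker gets its own epoll instance watching the shared listening socket. The function name `run_epoll_accept` is made up, and the code falls back to plain `EPOLLIN` if the flag is unavailable or rejected.

```python
import select
import socket
import threading
import time

# Assumption: fall back to 0 (no extra flag) where EPOLLEXCLUSIVE is not exposed.
EXCLUSIVE = getattr(select, "EPOLLEXCLUSIVE", 0)


def run_epoll_accept(n_workers=4, timeout=1.0):
    """Return (woken, accepted) counts for a single incoming connection."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the kernel pick a free port
    listener.listen(16)
    listener.setblocking(False)       # losers get EAGAIN instead of blocking

    lock = threading.Lock()
    counts = {"woken": 0, "accepted": 0}

    def worker():
        ep = select.epoll()           # one epoll instance per worker
        try:
            try:
                ep.register(listener.fileno(), select.EPOLLIN | EXCLUSIVE)
            except OSError:
                # Kernel rejected the flag: register without it.
                ep.register(listener.fileno(), select.EPOLLIN)
            events = ep.poll(timeout)
        finally:
            ep.close()
        if not events:
            return                    # timed out: this worker was not woken
        with lock:
            counts["woken"] += 1
        try:
            conn, _ = listener.accept()
        except BlockingIOError:
            return                    # woken, but another worker won the race
        with lock:
            counts["accepted"] += 1
        conn.close()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    time.sleep(0.2)                   # crude: give workers time to block in poll()

    client = socket.create_connection(listener.getsockname())
    for t in threads:
        t.join()
    client.close()
    listener.close()
    return counts["woken"], counts["accepted"]


if __name__ == "__main__":
    woken, accepted = run_epoll_accept()
    print(f"woken by epoll: {woken}, successful accept(): {accepted}")
```

With the flag in effect, the kernel wakes one or more of the waiting epoll instances rather than all of them, so `woken` should usually be lower than without it. It is not guaranteed to be exactly 1.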

pastum