I recently stumbled on an interesting TCP performance issue while running some tests that compared network performance against loopback performance. In my case the network performance exceeded the loopback performance (1 Gb network, same subnet). In the case I am dealing with, latencies are crucial, so TCP_NODELAY is enabled. The best theory we have come up with is that TCP congestion control is holding up packets. We did some packet analysis and can definitely see that packets are being held, but the reason is not obvious. Now the questions...
1) In what cases, and why, would communicating over loopback be slower than over the network?
2) When sending as fast as possible, why does toggling TCP_NODELAY have so much more of an impact on maximum throughput over loopback than over the network?
3) How can we detect and analyze TCP congestion control as a potential explanation for the poor performance?
4) Does anyone have any other theories as to the reason for this phenomenon? If so, is there any way to prove the theory?
Here is some sample data generated by a simple point-to-point C++ app:
| Transport | Message Size (bytes) | TCP NoDelay | Send Buffer (bytes) | Sender Host | Receiver Host | Throughput (bytes/sec) | Message Rate (msgs/sec) |
|---|---|---|---|---|---|---|---|
| TCP | 128 | On | 16777216 | HostA | HostB | 118085994 | 922546 |
| TCP | 128 | Off | 16777216 | HostA | HostB | 118072006 | 922437 |
| TCP | 128 | On | 4096 | HostA | HostB | 11097417 | 86698 |
| TCP | 128 | Off | 4096 | HostA | HostB | 62441935 | 487827 |
| TCP | 128 | On | 16777216 | HostA | HostA | 20606417 | 160987 |
| TCP | 128 | Off | 16777216 | HostA | HostA | 239580949 | 1871726 |
| TCP | 128 | On | 4096 | HostA | HostA | 18053364 | 141041 |
| TCP | 128 | Off | 4096 | HostA | HostA | 214148304 | 1673033 |
| UnixStream | 128 | - | 16777216 | HostA | HostA | 89215454 | 696995 |
| UnixDatagram | 128 | - | 16777216 | HostA | HostA | 41275468 | 322464 |
| NamedPipe | 128 | - | - | HostA | HostA | 73488749 | 574130 |
Here are a few more pieces of useful information:
- I only see this issue with small messages
- HostA and HostB both have the same hardware kit (Xeon X5550 @ 2.67 GHz, 32 cores total, 128 GB memory, 1 Gb NICs)
- OS is RHEL 5.4 (kernel 2.6.18-164.2.1.el5)
Thank you