I wrote a user-mode client-server C application based on Berkeley sockets; the two sides communicate over a private network.
The situation is definitely strange. Occasionally the connection becomes very slow under circumstances I cannot pin down. Normally the TCP data exchange carries about 10-25 KB of payload per segment, but sometimes it drops to about 200-500 bytes per segment.
After some troubleshooting, I found that the problem does not reproduce with other network services, so it looks like my service is to blame. But I can't figure out what's wrong. It worked well on the 3.10 Linux kernel, but shows this strange behavior on 4.4. Could some internal kernel change have caused this problem?
I tried playing with the following Linux sysctl settings:
net.ipv4.tcp_congestion_control
net.ipv4.tcp_sack
net.ipv4.route.flush
but that did not help.
The problem seems to appear on the listening socket's side. In tcpdump, the TCP window size looks fine during the handshake, but after the first incoming packet the window advertised by the listener shrinks.
UPD:
Here is my server-side code snippet:
    serv_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (serv_fd == -1) {
        perror("socket");
        return;
    }

    server.sin_family = AF_INET;
    server.sin_port = htons(LISTEN_PORT);
    server.sin_addr.s_addr = htonl(INADDR_ANY);

    #ifdef SET_BUF
    if (setsockopt(serv_fd, SOL_SOCKET, SO_RCVBUF, &buflen, sizeof(int)) == -1) {
        perror("setsockopt");
        return;
    }
    if (setsockopt(serv_fd, SOL_SOCKET, SO_SNDBUF, &buflen, sizeof(int)) == -1) {
        perror("setsockopt");
        return;
    }
    #endif // SET_BUF

    if (bind(serv_fd, (struct sockaddr *) &server, sizeof(server)) == -1) {
        perror("bind");
        return;
    }
    if (listen(serv_fd, 3)) {
        perror("listen");
        return;
    }
    printf("Server is listening on %u\n", LISTEN_PORT);
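To rule out a surprise in the buffer sizes when SET_BUF is defined, the values the kernel actually applied can be read back with getsockopt. A minimal sketch follows; the helper name get_buf_sizes is made up for illustration and is not part of my service. On Linux, socket(7) documents that the kernel doubles the value passed to SO_RCVBUF/SO_SNDBUF, so the number read back is normally not the one requested.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: read back the receive and send buffer sizes
 * that the kernel currently has in effect for fd.
 * Returns 0 on success, -1 on failure (errno is set by getsockopt). */
int get_buf_sizes(int fd, int *rcv, int *snd)
{
    socklen_t len = sizeof(*rcv);

    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, rcv, &len) == -1)
        return -1;

    len = sizeof(*snd);
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, snd, &len) == -1)
        return -1;

    return 0;
}
```

Calling get_buf_sizes(serv_fd, &rcv, &snd) right after the setsockopt calls and printing both values should show whether the requested buflen was clamped or adjusted.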
Could someone shed some light on my problem? I would be very grateful!
Can it be related to some recent Linux kernel modifications? Do I need to tune some Linux kernel settings or check some user-mode settings (e.g. socket options)?
P.S. The problem is intermittent.
UPD:
tcpdump's output:
IP 10.0.0.34.31334 > 10.0.0.99.12345: Flags [S], seq 426261790, win 43690, options [mss 65495,sackOK,TS val 799180610 ecr 0,nop,wscale 7], length 0
IP 10.0.0.99.12345 > 10.0.0.34.31334: Flags [S.], seq 803872704, ack 426261791, win 65483, options [mss 65495,sackOK,TS val 799180567 ecr 799180610,nop,wscale 0], length 0
IP 10.0.0.34.31334 > 10.0.0.99.12345: Flags [.], ack 1, win 342, options [nop,nop,TS val 799180610 ecr 799180567], length 0
IP 10.0.0.34.31334 > 10.0.0.99.12345: Flags [P.], seq 1:1301, ack 1, win 342, options [nop,nop,TS val 799180610 ecr 799180567], length 1300
IP 10.0.0.34.31334 > 10.0.0.99.12345: Flags [P.], seq 1301:1804, ack 1, win 342, options [nop,nop,TS val 799181412 ecr 799180610], length 503
IP 10.0.0.99.12345 > 10.0.0.34.31334: Flags [.], ack 1804, win 512, options [nop,nop,TS val 799181412 ecr 799181412], length 0
10.0.0.34.31334 is the client, 10.0.0.99.12345 is the server. Note the unexpected win 512 in the last line.
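A note on reading the win values: tcpdump prints the raw 16-bit window field, and on non-SYN segments the real window is that field shifted left by the wscale the advertising side sent in its own SYN or SYN-ACK (RFC 7323). The sketch below only does that arithmetic for the capture above; the function name effective_window is made up for illustration.

```c
#include <stdint.h>

/* Effective TCP window in bytes: the raw 16-bit window field shifted
 * left by the window scale the advertising host sent in its SYN.
 * RFC 7323 limits the shift count to 14, so larger values are capped. */
uint32_t effective_window(uint16_t raw_win, unsigned wscale)
{
    if (wscale > 14)
        wscale = 14;
    return (uint32_t)raw_win << wscale;
}
```

With the numbers from the capture, the client's win 342 with wscale 7 is effective_window(342, 7) = 43776 bytes, while the server's win 512 with wscale 0 is effective_window(512, 0) = 512 bytes.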
UPD2: I saw several messages about SYN cookies in dmesg, such as:
possible SYN flooding on port 12345. Sending cookies.
But they do not coincide in time with the slow transmissions.