
In Kubernetes, services talk to each other via a service IP. With iptables (or something similar), each TCP connection is transparently routed to one of the pods backing the called service. If the calling service does not close the TCP connection (e.g. because it uses TCP keepalive or a connection pool), it will stay connected to one pod and never use the other pods that may be spawned.

What is the correct way to handle such a situation?


My own unsatisfying ideas:

Closing the connection after each API call

Am I making every call slower just so requests get distributed to different pods? That doesn't feel right.

Minimum number of connections

I could force the caller to open multiple connections (assuming it would then distribute the requests across them), but how many should it open? The caller has no idea (and probably shouldn't need one) how many pods there are.

Disable bursting

I could limit the resources of the called service so it slows down under multiple requests and the caller opens more connections (hopefully to other pods). Again, I don't like the idea of arbitrarily slowing down requests, and this would only work for CPU-bound services.


2 Answers


The keep-alive behavior can be tuned by options specified in the Keep-Alive general header:

E.g:

Connection: Keep-Alive
Keep-Alive: max=10, timeout=60

Thus, you could re-open a TCP connection after a specific timeout, or after a maximum number of HTTP transactions, rather than after each API request.

Keep in mind that timeout and max are not guaranteed.

EDIT:

Note that if you use a k8s Service you can choose between two load-balancing modes:

  • iptables proxy mode (By default, kube-proxy in iptables mode chooses a backend at random.)

  • IPVS proxy mode where you have different load balancing options:

IPVS provides more options for balancing traffic to backend Pods; these are:

  • rr: round-robin
  • lc: least connection (smallest number of open connections)
  • dh: destination hashing
  • sh: source hashing
  • sed: shortest expected delay
  • nq: never queue
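Switching kube-proxy to IPVS mode is done through its configuration; a minimal sketch (the `lc` scheduler is our illustrative choice, and fits the long-lived-connection problem since it favors pods with the fewest open connections):

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  scheduler: "lc"  # least connection
```

IPVS kernel modules must be available on the nodes for this mode to work.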

check this link

  • I think this would not help in my situation. There would still be one pod that receives all the requests while the others are idle. Or am I missing something? – deflomu Jul 23 '19 at 08:47
  • After the TCP connection expires (because of the timeout or max connections) the client will open a new TCP connection that this time will probably be distributed to another pod (with equal probability between pods). Assuming you are using a k8s service. – melix Jul 23 '19 at 08:57
  • I updated my response to better answer your question – melix Jul 23 '19 at 09:22
  • Hmm, interesting. But if the service does not close a connection and if one connection is enough to handle all requests (but only if bursting is allowed) would that help me? – deflomu Jul 23 '19 at 13:43
  • If you initialise the connection with that header the connection is going to be closed after the timeout or after the max number of http request. – melix Jul 24 '19 at 09:18
  • I tried it and it helps but it is not a perfect solution. Now every 10 requests or every 60 seconds the load shifts from one pod to another while the rest are still doing nothing. I also found https://kubernetes.io/blog/2018/11/07/grpc-load-balancing-on-kubernetes-without-tears/ which suggests this is a problem that cannot be solved in the application itself – deflomu Aug 02 '19 at 13:24

One mechanism to do this might be to load balance in a layer underneath the TCP connection termination. For example, you could split your service in two: a microservice (let's call it frontend-svc) that does connection handling and maybe some authnz, and a separate service that does your business logic/processing.

clients <---persistent connection---> frontend-svc <----GRPC----> backend-svc

frontend-svc can maintain the persistent client connections and make calls to your backend in a more granular fashion, making use of GRPC for example, and really load balance among the workers in the layer below. This means the pods that are part of frontend-svc aren't doing much work and are completely stateless (and therefore have less need for load balancing), which means you can also control them with an HPA, provided you have some draining logic to ensure that you don't terminate existing connections.


This is a common approach that is used by SSL proxies etc to deal with connection termination separately from LB.

  • This article I found suggests that GRPC will have the same problem https://kubernetes.io/blog/2018/11/07/grpc-load-balancing-on-kubernetes-without-tears/ or am I missing something? – deflomu Aug 16 '19 at 06:59