Pods can't ping each other in a kubernetes cluster spawned over nodes from two different subnets

Question

I am trying to bring up an on-prem k8 cluster using kubespray with 3 master and 5 worker nodes. The node IPs are from 2 different subnets.

Ansible inventory:

hosts:
  saba-k8-vm-m1:
    ansible_host: 192.168.100.1
    ip: 192.168.100.1
    access_ip: 192.168.100.1
  saba-k8-vm-m2:
    ansible_host: 192.168.100.2
    ip: 192.168.100.2
    access_ip: 192.168.100.2
  saba-k8-vm-m3:
    ansible_host: 192.168.200.1
    ip: 192.168.200.1
    access_ip: 192.168.200.1
  saba-k8-vm-w1:
    ansible_host: 192.168.100.3
    ip: 192.168.100.3
    access_ip: 192.168.100.3
  saba-k8-vm-w2:
    ansible_host: 192.168.100.4
    ip: 192.168.100.4
    access_ip: 192.168.100.4
  saba-k8-vm-w3:
    ansible_host: 192.168.100.5
    ip: 192.168.100.5
    access_ip: 192.168.100.5
  saba-k8-vm-w4:
    ansible_host: 192.168.200.2
    ip: 192.168.200.2
    access_ip: 192.168.200.2
  saba-k8-vm-w5:
    ansible_host: 192.168.200.3
    ip: 192.168.200.3
    access_ip: 192.168.200.3

children:
  kube-master:
    hosts:
      saba-k8-vm-m1:
      saba-k8-vm-m2:
      saba-k8-vm-m3:
  kube-node:
    hosts:
      saba-k8-vm-w1:
      saba-k8-vm-w2:
      saba-k8-vm-w3:
      saba-k8-vm-w4:
      saba-k8-vm-w5:

Next, I deployed dnsutils with kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml. The pod runs on worker w1. From there it can look up a service name (I have created elasticsearch pods on w2):

root@saba-k8-vm-m1:/opt/bitnami# kubectl get svc -n kube-system
    NAME                        TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
    coredns                     ClusterIP   10.233.0.3      <none>        53/UDP,53/TCP,9153/TCP   6d3h
        
root@saba-k8-vm-m1:/opt/bitnami# kubectl exec -it dnsutils sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl kubectl exec [POD] -- [COMMAND] instead.
/ #

/ # nslookup elasticsearch-elasticsearch-data.lilac-efk.svc.cluster.local. 10.233.0.3
Server:         10.233.0.3
Address:        10.233.0.3#53

Name:   elasticsearch-elasticsearch-data.lilac-efk.svc.cluster.local
Address: 10.233.49.187

Next, I ran the same dnsutils pod on w5 (the .200 subnet). There, nslookup fails:

root@saba-k8-vm-m1:/opt/bitnami# kubectl exec -it dnsutils sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl kubectl exec [POD] -- [COMMAND] instead.
/ #
/ # ^C
/ # nslookup elasticsearch-elasticsearch-data.lilac-efk.svc.cluster.local 10.233.0.3
;; connection timed out; no servers could be reached
    
/ # exit
command terminated with exit code 1

Logs from nodelocaldns running on w5:

 [ERROR] plugin/errors: 2 elasticsearch-elasticsearch-data.lilac-efk.lilac-efk.svc.cluster.local. AAAA: dial tcp 10.233.0.3:53: i/o timeout
 [ERROR] plugin/errors: 2 elasticsearch-elasticsearch-data.lilac-efk.lilac-efk.svc.cluster.local. A: dial tcp 10.233.0.3:53: i/o timeout

From the dnsutils container, I'm not able to reach the coredns pod IPs on the other subnet through the overlay network; only the coredns pod on m3, which is in the same subnet, responds. The cluster uses Calico.

 root@saba-k8-vm-m1:/opt/bitnami# kubectl get pods -n kube-system -o wide | grep coredns
    pod/coredns-dff8fc7d-98mbw                        1/1     Running   3          6d2h    10.233.127.4    saba-k8-vm-m2   <none>           <none>
    pod/coredns-dff8fc7d-cwbhd                        1/1     Running   7          6d2h    10.233.74.7     saba-k8-vm-m1   <none>           <none>
    pod/coredns-dff8fc7d-h4xdd                        1/1     Running   0          2m19s   10.233.82.6     saba-k8-vm-m3   <none>           <none>
        
 root@saba-k8-vm-m1:/opt/bitnami# kubectl exec -it dnsutils sh
 kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl kubectl exec [POD] -- [COMMAND] instead.
 / # ping 10.233.82.6
 PING 10.233.82.6 (10.233.82.6): 56 data bytes
 64 bytes from 10.233.82.6: seq=0 ttl=62 time=0.939 ms
 64 bytes from 10.233.82.6: seq=1 ttl=62 time=0.693 ms
 ^C
 --- 10.233.82.6 ping statistics ---
 2 packets transmitted, 2 packets received, 0% packet loss
 round-trip min/avg/max = 0.693/0.816/0.939 ms
 / # ping 10.233.74.7
 PING 10.233.74.7 (10.233.74.7): 56 data bytes
 ^C
 --- 10.233.74.7 ping statistics ---
 4 packets transmitted, 0 packets received, 100% packet loss
 / # ping 10.233.127.4
 PING 10.233.127.4 (10.233.127.4): 56 data bytes
 ^C
 --- 10.233.127.4 ping statistics ---
 2 packets transmitted, 0 packets received, 100% packet loss

kube_service_addresses: 10.233.0.0/18
kube_pods_subnet: 10.233.64.0/18

Because of this behaviour, fluentd, which runs as a DaemonSet on all 5 workers, is in CrashLoopBackOff since it is unable to resolve the elasticsearch svc name.

What am I missing? Any help is appreciated.

Are Calico IPIP links established on each server? `sudo calicoctl.sh node status`. Do you see all those IPs using this command on each server? `ip r | grep tunl` . Are you able to ping those IPs from any server to any server? Do you see any issues in Calico pods log? — laimison, Apr 08 '21 at 19:19

score 1 · Accepted Answer · answered Apr 09 '21 at 09:49

Thanks to @laimison for giving me those pointers.

Posting all my observations, so it can be useful to somebody.

On M1,

root@saba-k8-vm-m1:~# ip r | grep tunl
10.233.72.0/24 via 192.168.100.5 dev tunl0 proto bird onlink
10.233.102.0/24 via 192.168.100.4 dev tunl0 proto bird onlink
10.233.110.0/24 via 192.168.100.3 dev tunl0 proto bird onlink
10.233.127.0/24 via 192.168.100.2 dev tunl0 proto bird onlink

root@saba-k8-vm-m1:~# sudo calicoctl.sh node status
Calico process is running.
IPv4 BGP status
+---------------+-------------------+-------+------------+-------------+
| PEER ADDRESS  |     PEER TYPE     | STATE |   SINCE    |    INFO     |
+---------------+-------------------+-------+------------+-------------+
| 192.168.100.2 | node-to-node mesh | up    | 2021-04-06 | Established |
| 192.168.200.1 | node-to-node mesh | start | 2021-04-06 | Passive     |
| 192.168.100.3 | node-to-node mesh | up    | 2021-04-06 | Established |
| 192.168.100.4 | node-to-node mesh | up    | 2021-04-06 | Established |
| 192.168.100.5 | node-to-node mesh | up    | 2021-04-06 | Established |
| 192.168.200.2 | node-to-node mesh | start | 2021-04-06 | Passive     |
| 192.168.200.3 | node-to-node mesh | start | 2021-04-06 | Passive     |
+---------------+-------------------+-------+------------+-------------+
IPv6 BGP status
No IPv6 peers found.

On M3,

lilac@saba-k8-vm-m3:~$ ip r | grep tunl
10.233.85.0/24 via 192.168.200.3 dev tunl0 proto bird onlink
10.233.98.0/24 via 192.168.200.2 dev tunl0 proto bird onlink

lilac@saba-k8-vm-m3:~$ sudo calicoctl.sh node status
Calico process is running.
IPv4 BGP status
+---------------+-------------------+-------+------------+--------------------------------+
| PEER ADDRESS  |     PEER TYPE     | STATE |   SINCE    |              INFO              |
+---------------+-------------------+-------+------------+--------------------------------+
| 192.168.100.1 | node-to-node mesh | start | 2021-04-06 | Active Socket: Connection      |
|               |                   |       |            | reset by peer                  |
| 192.168.100.2 | node-to-node mesh | start | 2021-04-06 | Active Socket: Connection      |
|               |                   |       |            | closed                         |
| 192.168.100.3 | node-to-node mesh | start | 2021-04-06 | Active Socket: Connection      |
|               |                   |       |            | closed                         |
| 192.168.100.4 | node-to-node mesh | start | 2021-04-06 | Active Socket: Connection      |
|               |                   |       |            | closed                         |
| 192.168.100.5 | node-to-node mesh | start | 2021-04-06 | Active Socket: Connection      |
|               |                   |       |            | closed                         |
| 192.168.200.2 | node-to-node mesh | up    | 2021-04-06 | Established                    |
| 192.168.200.3 | node-to-node mesh | up    | 2021-04-06 | Established                    |
+---------------+-------------------+-------+------------+--------------------------------+
IPv6 BGP status
No IPv6 peers found.

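On M1, all three .200 peers (192.168.200.1, 192.168.200.2 and 192.168.200.3) are in the Passive state. On M3, every .100 peer shows "Active Socket: Connection reset by peer" or "Active Socket: Connection closed". This suggested that M3 was trying to establish BGP sessions with the .100 nodes, but the sessions were not being accepted.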
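With eight nodes, scanning these tables by eye gets tedious. The following is a minimal sketch of a filter that lists only the peers whose INFO column is not "Established". The function name unestablished_peers is made up for illustration, and the parsing assumes the table layout shown in the output above.

```shell
# Hypothetical helper: reads `calicoctl node status` output on stdin and
# prints "<peer address> <state>" for every peer that is not Established.
# Assumes the pipe-delimited table layout shown above.
unestablished_peers() {
  awk -F'|' '
    # Peer rows carry an IPv4 address in the second field; border rows and
    # wrapped INFO continuation rows do not, so they are skipped.
    $2 ~ /^ *[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+ *$/ {
      peer = $2; state = $4; info = $6
      gsub(/^ +| +$/, "", peer)
      gsub(/^ +| +$/, "", state)
      gsub(/^ +| +$/, "", info)
      if (info != "Established") print peer, state
    }
  '
}
```

It could be used on each node as: sudo calicoctl.sh node status | unestablished_peers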

I was able to telnet to 192.168.100.x on port 179 (the BGP port) from M3, so the port itself was reachable.

Checking the calico pod log and the node dump produced by running /usr/local/bin/calicoctl.sh node diags on M1, I could see:

bird: BGP: Unexpected connect from unknown address 10.0.x.x (port 53107)

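10.0.x.x was the management IP of the server on which the .200 VMs were hosted. That server was doing source NAT, so BIRD on the .100 nodes saw connections coming from an address that was not one of its known peers and rejected them.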
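To spot this kind of mismatch quickly, the rejected source addresses can be pulled out of the BIRD log and compared with the node IPs in the inventory. This is a small sketch; the function name unknown_bgp_sources is hypothetical, and the pattern assumes the exact message format quoted above.

```shell
# Hypothetical helper: reads BIRD / calico-node log lines on stdin and prints
# each distinct source address that BIRD rejected as an unknown peer.
# Assumes the "Unexpected connect from unknown address" message shown above.
unknown_bgp_sources() {
  sed -n 's/.*BGP: Unexpected connect from unknown address \([^ ]*\) .*/\1/p' | sort -u
}
```

For example, the calico-node pod log on M1 could be piped into it (kubectl logs -n kube-system <calico-node-pod> | unknown_bgp_sources, where the pod name is a placeholder). Any address printed that is not a node IP from the inventory points to NAT somewhere on the path.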

I added this rule, which applies SNAT only to traffic whose destination is outside 192.168.0.0/16:

-A POSTROUTING ! -d 192.168.0.0/16 -j SNAT --to-source 10.0.x.x

That solved the issue.

root@saba-k8-vm-m1:/tmp/calico050718821/diagnostics/logs# /usr/local/bin/calicoctl.sh node status
Calico process is running.

IPv4 BGP status
+---------------+-------------------+-------+----------+-------------+
| PEER ADDRESS  |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+---------------+-------------------+-------+----------+-------------+
| 192.168.100.2 | node-to-node mesh | up    | 08:08:38 | Established |
| 192.168.200.1 | node-to-node mesh | up    | 08:09:15 | Established |
| 192.168.100.3 | node-to-node mesh | up    | 08:09:24 | Established |
| 192.168.100.4 | node-to-node mesh | up    | 08:09:02 | Established |
| 192.168.100.5 | node-to-node mesh | up    | 08:09:47 | Established |
| 192.168.200.2 | node-to-node mesh | up    | 08:08:55 | Established |
| 192.168.200.3 | node-to-node mesh | up    | 08:09:37 | Established |
+---------------+-------------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.

Other things that I tried:

I updated ipipMode across all the nodes. This doesn't solve the issue, but it helps improve performance.

sudo /usr/local/bin/calicoctl.sh patch ippool default-pool -p '{"spec":{"ipipMode": "CrossSubnet"}}'
Successfully patched 1 'IPPool' resource

I referred to "calico/node is not ready: BIRD is not ready: BGP not established" and set interface=ens3, although this is the only interface on my VMs. Again, this doesn't solve the issue, but it will help when there are multiple interfaces on a Calico node.
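The answer does not say where interface=ens3 was set. One common place, shown here only as an assumption, is the IP_AUTODETECTION_METHOD environment variable on the calico-node DaemonSet. The helper name pin_calico_interface is made up, and the kube-system namespace and DaemonSet name are assumptions about this cluster's layout.

```shell
# Hypothetical helper: pin Calico's IPv4 address autodetection to one interface.
# Assumes the DaemonSet is named calico-node and lives in kube-system;
# adjust both if the cluster is laid out differently.
pin_calico_interface() {
  kubectl -n kube-system set env daemonset/calico-node \
    "IP_AUTODETECTION_METHOD=interface=$1"
}
```

Calling pin_calico_interface ens3 would correspond to the interface=ens3 setting mentioned above. A change made this way may be overwritten the next time the cluster is reconfigured by its deployment tooling.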
