
I am new to K8s, so I am having trouble getting to the bottom of this issue. Last week I installed a cluster with 1 master and 2 nodes on CentOS with kubeadm (the commands I used are listed after the node output below):

kubectl get nodes

NAME             STATUS   ROLES                  AGE    VERSION
ardl-k8latam01   Ready    control-plane,master   7d2h   v1.20.0
ardl-k8latam02   Ready    <none>                 7d2h   v1.20.0
ardl-k8latam03   Ready    <none>                 7d2h   v1.20.0
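
For reference, these are roughly the commands I used to bootstrap the cluster, reconstructed from memory, so the pod network CIDR and the Calico manifest URL may not be exactly what I ran:

# on the master
kubeadm init --pod-network-cidr=192.168.0.0/16
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml

# on each worker, the join command printed by kubeadm init (token and hash omitted)
kubeadm join <master-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>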

At first it was working fine, but it started failing after I began working with Helm (I don't know if that's related). Now I cannot run any deployment, and I have a lot of pods stuck in "Terminating" status that never finish. As an example, here is the cluster state after running kubectl apply -f https://k8s.io/examples/controllers/nginx-deployment.yaml:

[root@ardl-k8latam01 ~]# kubectl get all --all-namespaces
NAMESPACE     NAME                                          READY   STATUS        RESTARTS   AGE
default       pod/nginx-deployment-66b6c48dd5-2xt7b         1/1     Terminating   0          19h
default       pod/nginx-deployment-66b6c48dd5-5cttk         1/1     Terminating   0          19h
default       pod/nginx-deployment-66b6c48dd5-8bz2f         0/1     Pending       0          18h
default       pod/nginx-deployment-66b6c48dd5-dksqx         1/1     Terminating   0          19h
default       pod/nginx-deployment-66b6c48dd5-fj9kl         0/1     Pending       0          18h
default       pod/nginx-deployment-66b6c48dd5-j4hqv         0/1     Pending       0          18h
kube-system   pod/calico-kube-controllers-bcc6f659f-bgmkb   1/1     Running       0          18h
kube-system   pod/calico-kube-controllers-bcc6f659f-pksws   1/1     Terminating   0          7d21h
kube-system   pod/calico-node-fns6d                         0/1     Running       2          7d21h
kube-system   pod/calico-node-t854c                         1/1     Running       0          7d21h
kube-system   pod/calico-node-vbsdr                         1/1     Running       0          7d21h
kube-system   pod/coredns-74ff55c5b-gw8j2                   1/1     Running       1          18h
kube-system   pod/coredns-74ff55c5b-xhvqb                   1/1     Terminating   0          7d21h
kube-system   pod/coredns-74ff55c5b-xr9mb                   1/1     Terminating   0          7d21h
kube-system   pod/coredns-74ff55c5b-zhhkx                   1/1     Running       1          18h
kube-system   pod/etcd-ardl-k8latam01                       1/1     Running       2          7d21h
kube-system   pod/kube-apiserver-ardl-k8latam01             1/1     Running       4          7d21h
kube-system   pod/kube-controller-manager-ardl-k8latam01    1/1     Running       2          7d21h
kube-system   pod/kube-proxy-2lmpb                          1/1     Running       0          7d21h
kube-system   pod/kube-proxy-fchv8                          1/1     Running       2          7d21h
kube-system   pod/kube-proxy-xks7h                          1/1     Running       0          7d21h
kube-system   pod/kube-scheduler-ardl-k8latam01             1/1     Running       2          7d21h
kube-system   pod/metrics-server-68b849498d-6q74v           1/1     Terminating   0          7d20h
kube-system   pod/metrics-server-68b849498d-7lpz8           0/1     Pending       0          18h

NAMESPACE     NAME                     TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
default       service/dashboardlb      ClusterIP   10.100.82.105   <none>        8001/TCP                 7d20h
default       service/kubernetes       ClusterIP   10.96.0.1       <none>        443/TCP                  7d21h
kube-system   service/kube-dns         ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP,9153/TCP   7d21h
kube-system   service/metrics-server   ClusterIP   10.101.85.63    <none>        443/TCP                  7d20h

NAMESPACE     NAME                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
kube-system   daemonset.apps/calico-node   3         3         0       3            0           beta.kubernetes.io/os=linux   7d21h
kube-system   daemonset.apps/kube-proxy    3         3         1       3            1           kubernetes.io/os=linux        7d21h

NAMESPACE     NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
default       deployment.apps/nginx-deployment          0/3     3            0           18h
kube-system   deployment.apps/calico-kube-controllers   1/1     1            1           7d21h
kube-system   deployment.apps/coredns                   2/2     2            2           7d21h
kube-system   deployment.apps/metrics-server            0/1     1            0           7d20h

NAMESPACE     NAME                                                DESIRED   CURRENT   READY   AGE
default       replicaset.apps/nginx-deployment-66b6c48dd5         3         3         0       18h
kube-system   replicaset.apps/calico-kube-controllers-bcc6f659f   1         1         1       7d21h
kube-system   replicaset.apps/coredns-74ff55c5b                   2         2         2       7d21h
kube-system   replicaset.apps/metrics-server-68b849498d           1         1         0       7d20h

In the output of kubectl cluster-info dump I get:

==== START logs for container second-node of pod default/second-app-deployment-7f794d896f-q6zn5 ====
Request log error: the server rejected our request for an unknown reason (get pods second-app-deployment-7f794d896f-q6zn5)
==== END logs for container second-node of pod default/second-app-deployment-7f794d896f-q6zn5 ====

And here is kubectl describe for one of the affected pods:

[root@ardl-k8latam01 testwordpress]# kubectl describe pod nginx-deployment-66b6c48dd5-5cttk
Name:           nginx-deployment-66b6c48dd5-5cttk
Namespace:      default
Priority:       0
Node:           ardl-k8latam02/10.48.41.12
Start Time:     Fri, 18 Dec 2020 17:06:57 -0300
Labels:         app=nginx
                pod-template-hash=66b6c48dd5
Annotations:    <none>
Status:         Pending
IP:
IPs:            <none>
Controlled By:  ReplicaSet/nginx-deployment-66b6c48dd5
Containers:
  nginx:
    Container ID:
    Image:          nginx:1.14.2
    Image ID:
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-9rnk6 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-9rnk6:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-9rnk6
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                   From               Message
  ----     ------                  ----                  ----               -------
  Warning  FailedCreatePodSandBox  22m                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "044a2201b141e6679570d0f0ec3b1967b2a5bf0b230fa5058ed2bc6711eba55e" network for pod "nginx-deployment-66b6c48dd5-5cttk": networkPlugin cni failed to set up pod "nginx-deployment-66b6c48dd5-5cttk_default" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: connect: no route to host, failed to clean up sandbox container "044a2201b141e6679570d0f0ec3b1967b2a5bf0b230fa5058ed2bc6711eba55e" network for pod "nginx-deployment-66b6c48dd5-5cttk": networkPlugin cni failed to teardown pod "nginx-deployment-66b6c48dd5-5cttk_default" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: connect: no route to host]
  Normal   Scheduled               21m                   default-scheduler  Successfully assigned default/nginx-deployment-66b6c48dd5-5cttk to ardl-k8latam02
  Normal   SandboxChanged          2m27s (x93 over 22m)  kubelet            Pod sandbox changed, it will be killed and re-created.

I also tried rebooting the nodes and the master, but nothing changed. When I try to describe one of the "Terminating" pods, it tells me that the pod does not exist.
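
I know I could probably force-delete the stuck pods with something like the following (using one of the stuck pod names from the output above), but I would rather understand the root cause first:

kubectl delete pod nginx-deployment-66b6c48dd5-2xt7b --grace-period=0 --force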

Is my problem related to Calico? How can I dig deeper into the "Request log error: the server rejected our request for an unknown reason" message?
How should I continue the investigation?
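
In case it is useful, this is roughly what I was planning to check next (I am not sure these are the right places to look; the pod and node names are taken from my cluster above):

kubectl get pods -A -o wide                          # see which nodes the failing pods are scheduled on
kubectl logs -n kube-system calico-node-fns6d -c calico-node   # the calico-node pod that is stuck at 0/1 Ready
journalctl -u kubelet --since "1 hour ago"           # kubelet logs, run on the affected worker (ardl-k8latam02)
systemctl status firewalld                           # check whether firewalld on CentOS is blocking traffic
curl -k https://10.96.0.1:443/version                # run from ardl-k8latam02 to reproduce the "no route to host" error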

  • describe of the pod is giving some clue ```networkPlugin cni failed to set up pod ...https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: connect: no route to host]``` so there is some problem with the CNI pods in your cluster; please paste the output of ```kubectl get pods -n calico-system -o wide``` – confused genius Dec 19 '20 at 15:33
  • Try the same thing with k3d and see what happens. If it works in k3d then you have isolated the problem to your custom cluster configuration. Btw: I teach k8s, and I recommend not setting up custom clusters with kubeadm -- yes you can learn a lot by doing so, but most people will end up using "someone else's cluster" and the information will be 99% academic and never needed, or if the information is ever needed it'll be woefully out of date. Unless you're planning to find work releasing k8s distros, learning how to use k8s is a more effective use of your time. – Software Engineer Dec 19 '20 at 17:05
  • This is a generic message which can refer to multiple root causes. Try to check the kubelet logs. Btw, aren't your failing pods related to a certain node? Try `kubectl get po -A -o wide`. If yes, concentrate your research on this node, esp. the kubelet logs. – Olesya Bolobova Dec 20 '20 at 22:54
  • Can you walk us thru the steps you performed when bootstrapping your cluster and installing CNI? Which OS your are running? Is that the first installation? – thomas Dec 21 '20 at 08:39
