This article covers troubleshooting control plane failures in Kubernetes, focusing on diagnosing and fixing issues that prevent application deployments and the cluster itself from functioning normally.
In this lesson, we troubleshoot a control plane failure that leaves an application deployment malfunctioning. We will methodically investigate the issue and apply the necessary corrections to restore cluster functionality.
Before diving into troubleshooting, ensure that you have configured an alias for kubectl and enabled autocompletion to speed up command entry. Execute the following commands:
source <(kubectl completion bash)                           # Enable autocompletion in the current shell
echo "source <(kubectl completion bash)" >> ~/.bashrc       # Persist autocompletion in your bash shell
alias k=kubectl
complete -F __start_kubectl k
Now you can use commands like kubectl get ... or simply k ... with autocompletion to improve your workflow.
The error reported for the kube-scheduler pod indicates an incorrect command, "kube-schedulerrrr", which contains extra characters. Because the kube-scheduler is a static pod defined in /etc/kubernetes/manifests/kube-scheduler.yaml, edit that file to remove the extra characters. After saving the corrected file, check the pod’s status again:
root@controlplane:~# vi /etc/kubernetes/manifests/kube-scheduler.yaml
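The first entry in the container's command list must be the correct binary name. A rough sketch of the relevant portion after the fix (the surrounding flags are illustrative and depend on your kubeadm version):

spec:
  containers:
  - command:
    - kube-scheduler          # corrected from kube-schedulerrrr
    # ... remaining flags unchanged ...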
Then verify:
root@controlplane:~# kubectl get pods -n kube-system
NAME                          READY   STATUS                       RESTARTS   AGE
...
kube-scheduler-controlplane   0/1     CreateContainerConfigError   0          8s
Watch until the pod reaches the ready state:
root@controlplane:~# kubectl get pods -n kube-system --watch
NAME                          READY   STATUS    RESTARTS   AGE
...
kube-scheduler-controlplane   0/1     Running   0          72s
Finally, review the logs to confirm the scheduler has started successfully.
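One way to do this, assuming the pod name matches the kube-scheduler-controlplane entry shown in the listing above:

root@controlplane:~# kubectl logs kube-scheduler-controlplane -n kube-system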
With the scheduler restored, the application deployment still fails to scale, which points to the kube-controller-manager. Describe the controller manager pod to capture error details:
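A describe along these lines surfaces the container state and recent events (the pod name is assumed to follow the same static pod naming convention as above):

root@controlplane:~# kubectl describe pod kube-controller-manager-controlplane -n kube-system

Then check its logs: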
root@controlplane:~# kubectl logs kube-controller-manager-controlplane -n kube-system
Flag --port has been deprecated, see --secure-port instead.
I0422 22:24:31.928604       1 serving.go:331] Generated self-signed cert in-memory
stat /etc/kubernetes/controller-manager-XXXX.conf: no such file or directory
The log message indicates that the controller manager is referencing a non-existent kubeconfig file (/etc/kubernetes/controller-manager-XXXX.conf) instead of the correct /etc/kubernetes/controller-manager.conf. Edit the manifest file located at /etc/kubernetes/manifests/kube-controller-manager.yaml to remove the erroneous characters. A corrected snippet should appear as follows:
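(Only the relevant portion of the container command is shown below; adjacent flags in a kubeadm-generated manifest may differ.)

spec:
  containers:
  - command:
    - kube-controller-manager
    # ... other flags unchanged ...
    - --kubeconfig=/etc/kubernetes/controller-manager.conf   # previously pointed to the non-existent controller-manager-XXXX.conf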
After saving the corrected manifest, monitor the controller manager pod:
root@controlplane:~# kubectl get pods -n kube-system --watch
Once the controller manager pod is running and ready, the scaling issue should resolve since the controller manager updates the ReplicaSets. Confirm this with:
root@controlplane:~# kubectl get deploy
NAME   READY   UP-TO-DATE   AVAILABLE   AGE
app    3/3     3            3           16m
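As an extra check, you can also list the application pods (their names will carry the generated ReplicaSet and pod suffixes):

root@controlplane:~# kubectl get pods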
All pods should now be running as expected.

This concludes the troubleshooting lesson for control plane failures. We covered environment setup, diagnosing scheduling issues, and correcting static pod manifest errors for both the kube-scheduler and the kube-controller-manager. For more information, refer to the Kubernetes Documentation.

Happy troubleshooting!