Pod FailedScheduling

How to debug and resolve FailedScheduling events for Kubernetes Pods

FailedScheduling is a Kubernetes event indicating that the scheduler was unable to place a pod on any available node.

This can happen for several reasons, including insufficient resources, unsatisfied node selectors or affinity rules, taints without matching tolerations, or other constraints that prevent the scheduler from finding a suitable node for the pod.

Possible causes

Some common reasons for FailedScheduling events include:

Insufficient resources

The pod may require more resources (CPU, memory, or storage) than any available node can provide.
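
For example, a pod that requests more CPU than any node can allocate will stay Pending. A minimal sketch (the pod name, image, and request sizes are hypothetical):

apiVersion: v1
kind: Pod
metadata:
  name: cpu-hungry            # hypothetical name
spec:
  containers:
    - name: app
      image: nginx:1.25
      resources:
        requests:
          cpu: "16"           # more than the allocatable CPU of any node in a small cluster
          memory: 4Gi

If no node has 16 allocatable CPUs free, the scheduler reports Insufficient cpu for this pod.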

Node selectors or affinity

If the pod has node selectors or affinity rules that don’t match any of the available nodes, the pod will not be scheduled.
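
For example, a pod with the nodeSelector below (the disktype=ssd label is hypothetical) can only land on nodes that carry that exact label:

spec:
  nodeSelector:
    disktype: ssd

You can check which labels your nodes actually carry, and add a missing one, with:

❯ kubectl get nodes --show-labels
❯ kubectl label nodes <node-name> disktype=ssd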

Taints and tolerations

Nodes can have taints that prevent pods from being scheduled on them unless the pod has a matching toleration. If the pod doesn’t have the required toleration for a taint, it won’t be scheduled on the tainted nodes.
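
To check a node's taints and, if appropriate, give the pod a matching toleration, you can do something like the following; the dedicated=gpu:NoSchedule taint is hypothetical:

❯ kubectl describe node <node-name> | grep Taints

tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"

Note that a toleration only permits scheduling onto the tainted node; it doesn't force the pod there.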

Pod anti-affinity

If the pod has anti-affinity rules that prevent it from being scheduled alongside other pods, and all the nodes have those pods running, the scheduler won’t be able to find a suitable node for the new pod.
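
As a sketch (assuming the pods carry an app=web label), the required anti-affinity rule below forbids two such pods on the same node, so once every node runs one, the next replica stays Pending:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: web
        topologyKey: kubernetes.io/hostname

Switching to preferredDuringSchedulingIgnoredDuringExecution turns this into a soft preference the scheduler may violate rather than leave the pod unscheduled.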

Diagnosis

To diagnose and resolve FailedScheduling issues, you can use kubectl describe pod <pod-name> to view the pod’s events and details. This command will show the reason for the FailedScheduling event and provide more information to help you understand and resolve the issue.

❯ kubectl describe pod metrics-server-5b8644d458-ml9fv 
...
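
You can also list scheduling failures directly from the event stream, either for the whole namespace or for a single pod:

❯ kubectl get events --field-selector reason=FailedScheduling
❯ kubectl get events --field-selector involvedObject.name=<pod-name>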

Remediation

Insufficient resources (CPU/Memory)

When a pod cannot be scheduled due to insufficient resources, you may encounter an event similar to the following:

Warning  FailedScheduling  16m  default-scheduler  0/2 nodes are available: 2 Insufficient cpu.
preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod.

This event indicates that none of the available nodes can satisfy the pod's resource requests (here CPU; the equivalent message appears for memory). To resolve this issue, you can take the following steps:

  1. Review the pod’s resource requests: Check whether the pod’s CPU and memory requests are accurate and necessary. If they are set higher than the pod’s actual usage, lower them (the examples after this list show how to inspect usage and adjust requests).
  2. Evaluate other pods' resource requests: Analyze the resource requests of other pods running in the cluster. Some may reserve far more CPU or memory than they actually use, leaving the scheduler no room for the new pod; adjust their requests if necessary to free up capacity (see the node-allocation example after this list).
  3. Provision additional capacity: If the cluster’s resources are genuinely insufficient, consider adding more nodes or upgrading existing nodes to increase the available CPU and memory. This will allow the scheduler to find a suitable node for the new pod.
  4. Enable cluster autoscaling: If your cluster supports autoscaling, configure the cluster autoscaler to automatically add or remove nodes based on demand, so the cluster keeps enough capacity for new pods (see the last example after this list).
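
To see how much of each node’s capacity is already reserved by requests, and how that compares to live usage (kubectl top requires metrics-server to be installed):

❯ kubectl describe node <node-name> | grep -A 8 "Allocated resources"
❯ kubectl top pods --all-namespaces --sort-by=cpu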
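
If a workload’s requests are higher than its real usage, you can lower them in place; the deployment name and values here are hypothetical, and the change triggers a rolling restart:

❯ kubectl set resources deployment my-app --requests=cpu=100m,memory=128Mi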
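
How autoscaling is enabled depends on your platform (for example, node pools on GKE or managed node groups on EKS). If the cluster-autoscaler is running, it usually publishes a status ConfigMap you can inspect:

❯ kubectl -n kube-system get configmap cluster-autoscaler-status -o yaml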