How to debug and resolve an Evicted status for Kubernetes Pods
An evicted pod is a pod that has been terminated by the Kubernetes node as a result of resource pressure. Eviction serves as a mechanism to safeguard the node from exhausting critical resources, ensuring the overall stability of the cluster.
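When a pod is evicted, it is not restarted in place. If the pod is managed by a controller such as a Deployment, a replacement pod is created and Kubernetes attempts to schedule it, possibly onto another node. However, if all nodes lack sufficient resources or fail to meet scheduling constraints, the replacement may stay Pending until resources become available, while the original pod remains listed with the Evicted status.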
Pods are evicted when the kubelet on a node terminates one or more of its running pods because the node is short of a resource such as memory, disk space, or process IDs. The kubelet records these evictions as events in the Kubernetes API, which can be viewed using:
kubectl describe pod <pod-name>
Evictions can also be detected through monitoring and alerting tools, such as Prometheus alerts.
A pod may be evicted if the node it is running on runs out of memory. This can happen when a node's memory is overcommitted, such as when pods are configured with low memory requests and high memory limits. The scheduler places pods based on their requests, so pods that later use far more than they requested can cause memory pressure to build up on a node.
As a result, the kubelet identifies that the node’s memory usage has exceeded a specific threshold, initiating pod eviction. In this case, the kubelet prioritizes evicting pods that utilize more memory than initially requested.
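The ordering described above can be sketched in a few lines. This is a simplified model, not the kubelet's actual code: it assumes the ranking given in the Kubernetes documentation (pods exceeding their request first, then lower priority, then larger excess over the request). The PodMemory class, the eviction_order function, and all sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PodMemory:
    name: str
    priority: int        # value of the pod's PriorityClass (0 if none)
    request_bytes: int   # sum of the containers' memory requests
    usage_bytes: int     # current memory usage of the pod


def eviction_order(pods):
    """Return pods sorted from most to least likely to be evicted (simplified)."""
    def key(pod):
        over_request = pod.usage_bytes > pod.request_bytes
        excess = pod.usage_bytes - pod.request_bytes
        # Pods over their request sort first, then lower priority,
        # then the largest usage above the request.
        return (not over_request, pod.priority, -excess)
    return sorted(pods, key=key)


MI = 1024 * 1024
pods = [
    PodMemory("within-request", priority=0, request_bytes=512 * MI, usage_bytes=300 * MI),
    PodMemory("bursting", priority=0, request_bytes=64 * MI, usage_bytes=900 * MI),
    PodMemory("critical-bursting", priority=1000, request_bytes=64 * MI, usage_bytes=200 * MI),
]
print([p.name for p in eviction_order(pods)])
# ['bursting', 'critical-bursting', 'within-request']
```

In this made-up example, the pod with a 64Mi request and 900Mi of usage is the first eviction candidate, which is the low-request, high-limit pattern described above.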
In addition to memory pressure, other factors that can cause node pressure include:
- Insufficient disk space
- A high number of PIDs on the host
- A lack of free inodes
For more information, read the Kubernetes documentation on node-pressure eviction.
Pods with lower priority may be evicted in favor of higher-priority pods when resources become scarce.
To diagnose an evicted pod, follow the steps below:
Identify the evicted pod: Use the kubectl get pods command to list all pods in your namespace. Look for pods with a status of Evicted.
kubectl get pods
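Evicted pods are in the Failed phase. To list them across all namespaces, run the following command. Note that it also returns pods that failed for other reasons: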
kubectl get pods --all-namespaces --field-selector 'status.phase=Failed'
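Because the Failed phase is broader than eviction, it can help to filter on the pod's status reason as well. The sketch below assumes you have saved the output of kubectl get pods --all-namespaces -o json and loaded it into a dictionary. The evicted_pods function name and the sample data are hypothetical.

```python
import json


def evicted_pods(pod_list):
    """Return (namespace, name, message) for each pod whose status reason is Evicted."""
    result = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        if status.get("phase") == "Failed" and status.get("reason") == "Evicted":
            meta = pod.get("metadata", {})
            result.append((meta.get("namespace"), meta.get("name"), status.get("message", "")))
    return result


# Sample shaped like `kubectl get pods --all-namespaces -o json` output (values are made up).
sample = json.loads("""
{
  "items": [
    {"metadata": {"namespace": "default", "name": "web-1"},
     "status": {"phase": "Failed", "reason": "Evicted",
                "message": "The node was low on resource: memory."}},
    {"metadata": {"namespace": "default", "name": "job-7"},
     "status": {"phase": "Failed"}},
    {"metadata": {"namespace": "kube-system", "name": "dns-0"},
     "status": {"phase": "Running"}}
  ]
}
""")

for namespace, name, message in evicted_pods(sample):
    print(f"{namespace}/{name}: {message}")
```

With real data, replace the sample with json.load() on the saved file.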
Inspect the pod’s events and status
Use the kubectl describe pod <pod-name> command to view details about the evicted pod. Look for the Reason and Message fields under the Status section. These fields provide information about the cause of the eviction.
kubectl describe pod <pod-name>
Review the pod’s resource requests and limits
Inspect the pod’s YAML definition or the output of the kubectl describe pod command to review the pod’s resource requests and limits. Ensure that the requested resources are within the available capacity of the cluster.
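To compare requests and limits numerically, the quantities in the pod spec have to be converted to a common unit. The following is a minimal sketch that handles only the most common memory suffixes, not the full Kubernetes quantity syntax. The function names and the sample spec are hypothetical. A large gap between the two totals is the low-request, high-limit pattern that tends to precede memory pressure.

```python
_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,   # binary suffixes
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,       # decimal suffixes
}


def memory_to_bytes(quantity):
    """Convert a memory quantity such as '256Mi' or '1G' to bytes (simplified)."""
    # Check two-character suffixes before one-character ones.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * _SUFFIXES[suffix])
    return int(quantity)  # a plain number is already in bytes


def pod_memory_totals(pod_spec):
    """Sum memory requests and limits over a pod's containers.

    Containers that omit a value contribute 0 to the corresponding total.
    """
    requests = limits = 0
    for container in pod_spec.get("containers", []):
        resources = container.get("resources", {})
        requests += memory_to_bytes(resources.get("requests", {}).get("memory", "0"))
        limits += memory_to_bytes(resources.get("limits", {}).get("memory", "0"))
    return requests, limits


# Sample shaped like the `spec` of a pod (values are made up).
spec = {"containers": [
    {"name": "app", "resources": {"requests": {"memory": "128Mi"}, "limits": {"memory": "1Gi"}}},
    {"name": "sidecar", "resources": {"requests": {"memory": "64Mi"}, "limits": {"memory": "64Mi"}}},
]}

requests, limits = pod_memory_totals(spec)
print(f"requests={requests // 1024**2}Mi limits={limits // 1024**2}Mi")
# requests=192Mi limits=1088Mi
```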
Analyze the node’s resources
Use the kubectl describe node <node-name> command to check the resources of the node where the evicted pod was running. Look for the Allocatable and Capacity fields to see the node’s available resources, and the Non-terminated Pods section to view the resources consumed by other pods on the node.
kubectl describe node <node-name>
You can also check the node events using the following command:
kubectl get events -A --field-selector involvedObject.kind=Node
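Node pressure is also reflected in the node's conditions, which appear in the Conditions section of kubectl describe node and in status.conditions of the node object. As a small sketch, assuming the node has been fetched with kubectl get node <node-name> -o json and loaded into a dictionary, the active pressure conditions can be extracted like this. The active_pressure function name and the sample data are hypothetical.

```python
PRESSURE_CONDITIONS = ("MemoryPressure", "DiskPressure", "PIDPressure")


def active_pressure(node):
    """Return the pressure condition types whose status is "True" on a node object."""
    conditions = node.get("status", {}).get("conditions", [])
    return [
        c["type"]
        for c in conditions
        if c.get("type") in PRESSURE_CONDITIONS and c.get("status") == "True"
    ]


# Sample shaped like `kubectl get node <node-name> -o json` output (values are made up).
node = {"status": {"conditions": [
    {"type": "MemoryPressure", "status": "True"},
    {"type": "DiskPressure", "status": "False"},
    {"type": "PIDPressure", "status": "False"},
    {"type": "Ready", "status": "True"},
]}}

print(active_pressure(node))
# ['MemoryPressure']
```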
Check cluster-wide resource usage
Use monitoring tools such as Prometheus or Grafana, or cloud-provider-specific monitoring tools, to analyze resource usage across the entire cluster. Identify any resource bottlenecks or imbalances that might be contributing to pod evictions.
Examine the logs
Use the kubectl logs <pod-name> command to check the logs of the evicted pod, if they are still available. The logs may provide insight into the application’s behavior and resource consumption patterns.
kubectl logs <pod-name>
Check the pod’s priority
Review the pod’s priority class in its YAML definition or using the kubectl describe pod command. Lower-priority pods are more likely to be evicted when resources become scarce.
After gathering this information, use it to pinpoint the cause of the eviction and make any necessary adjustments, such as modifying resource requests and limits, adding more nodes, or addressing application-level issues.
- Check the node’s disk usage and clean up any unused resources, such as orphaned volumes or unused images.
- Review the CPU/memory requests and limits for the affected pod, and adjust them as necessary.
- Analyze other pods running on the same node to see if any are consuming excessive CPU or memory.
- Add more nodes to the cluster or increase the CPU or memory of existing nodes to provide additional capacity.
- Clean up any unnecessary files or resources on the node.
Note that nodes within a cluster have thresholds beyond which pods will be evicted. The kubelet's default hard eviction thresholds include:
memory.available<100Mi: node memory available is less than 100Mi
nodefs.available<10%: node has less than 10% disk space available. This could happen due to excessive logging or writing persistent data to the root file system of a machine, instead of a Persistent Volume.
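As an illustration of how these two thresholds are evaluated, the sketch below checks a node's available memory and disk space against the defaults listed above. It is a simplified model rather than the kubelet's implementation. The breached_thresholds function name and the sample numbers are hypothetical, and a real kubelet may be configured with different thresholds.

```python
MI = 1024 * 1024
GI = 1024 * MI


def breached_thresholds(memory_available, nodefs_available, nodefs_capacity):
    """Return the eviction signals that are below the default hard thresholds.

    All arguments are in bytes.
    """
    breached = []
    if memory_available < 100 * MI:                 # memory.available<100Mi
        breached.append("memory.available")
    if nodefs_available < 0.10 * nodefs_capacity:   # nodefs.available<10%
        breached.append("nodefs.available")
    return breached


# 80Mi of free memory and 30Gi free on a 100Gi root filesystem (made-up values).
print(breached_thresholds(80 * MI, 30 * GI, 100 * GI))
# ['memory.available']
```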
- Review the pod priority settings for the affected pod and other pods in the cluster.
- Adjust the pod priorities as needed to ensure that critical workloads have higher priority.
- Add more nodes or increase the capacity of existing nodes to accommodate all workloads.