Pod in CrashLoopBackOff
How to debug and resolve a pod that's in a CrashLoopBackOff state
Situation
A pod is in a CrashLoopBackOff state. This is typically detected through kubectl get pod or, for example, through Kubernetes alerts.
A crash loop means Kubernetes tried to start a pod, but it has crashed too often. After each restart, Kubernetes increases the delay before attempting another start.
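For example, kubectl get pod output might look like this (the pod name, restart count, and age are illustrative):

```shell
$ kubectl get pod
NAME                     READY   STATUS             RESTARTS       AGE
my-app-5d9c4b7f6-x2k8p   0/1     CrashLoopBackOff   7 (110s ago)   16m
```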
Possible causes
Some common causes are:
- Upstream dependencies such as databases are not available (e.g. they do not accept new connections)
- A configuration mistake in your application
- Out of memory (OOMKilled)
A good example is a newly configured network policy that blocks DNS queries to CoreDNS.
Diagnosis
- Examine the logs to find the cause. This can be done either through kubectl logs or using Grafana Explore.
- Use kubectl describe pod to find the cause.
- Make sure network policies are correct in case your application requires connectivity to external resources.
- Check resource usage (most often memory), for example with kubectl top pod (see the example below).
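A minimal sketch of checking resource usage, assuming metrics-server is installed in the cluster; the pod name and namespace are placeholders:

```shell
# Show CPU and memory usage per pod (requires metrics-server)
kubectl top pod -n my-namespace

# Check whether the last restart was caused by the OOM killer
kubectl describe pod my-app-5d9c4b7f6-x2k8p -n my-namespace | grep -A 5 'Last State'
```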
View logging
Crash-looping pods are often not in a running state. To see the application output from the crash, use the --previous flag:
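For example (the pod name and namespace are placeholders):

```shell
kubectl logs my-app-5d9c4b7f6-x2k8p -n my-namespace --previous --tail=100
```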
You may need to adjust the --tail flag to get more or fewer log lines.
Get events
Events are a helpful indicator to figure out whether resource usage or failing health checks are causing the crash.
You can either use a describe:
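For example (the pod name and namespace are placeholders):

```shell
kubectl describe pod my-app-5d9c4b7f6-x2k8p -n my-namespace
```

The Events section at the bottom of the output shows recent restarts and failed probes.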
Or use get events --field-selector:
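For example, to list only the events for a specific pod (the pod name and namespace are placeholders):

```shell
kubectl get events -n my-namespace \
  --field-selector involvedObject.name=my-app-5d9c4b7f6-x2k8p \
  --sort-by=.lastTimestamp
```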
Remediation
DNS issues
If the logs indicate an issue with resolving a hostname (e.g. a database connection URL), check the following:
- When using network policies in the pod's namespace, make sure a network policy is in place to allow connectivity to CoreDNS (see the sketch after this list).
- Check for a mistake in the hostname. Note that some clusters do not use .cluster.local. Make sure the service exists.
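A minimal sketch of a network policy that allows DNS egress to CoreDNS. The namespace is a placeholder, and the CoreDNS labels (kube-system, k8s-app: kube-dns) are common defaults that may differ per cluster:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: my-namespace        # placeholder: namespace of the crash-looping pod
spec:
  podSelector: {}                # applies to all pods in this namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```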
Failing livenessProbe
If a livenessProbe fails, Kubernetes will restart the container.
If this happens during start-up, your initialDelaySeconds is probably set too low. We also recommend configuring a startupProbe, as sketched below.
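A minimal sketch of a startupProbe combined with a livenessProbe; the container name, image, endpoint, port, and timings are placeholders and should match your application:

```yaml
containers:
  - name: my-app                 # placeholder container name
    image: my-app:latest         # placeholder image
    startupProbe:
      httpGet:
        path: /healthz
        port: 8080
      failureThreshold: 30       # allow up to 30 * 10s = 5 minutes to start
      periodSeconds: 10
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
      failureThreshold: 3        # restart after 3 consecutive failures
```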
A failing livenessProbe is an indicator that your application or its runtime is not functioning properly.
Often a restart of the container solves the problem. However, if this happens too often, Kubernetes will put the pod into CrashLoopBackOff.
Common causes are:
- Resource exhaustion
  - CPU limits set too low
  - Memory saturation
- Downstream dependencies unavailable
Please note that if your livenessProbe checks the availability of downstream dependencies, you may cause a cascading failure within your environment.