Pod in CrashLoopBackOff.
How to debug and resolve a pod that's in a CrashLoopBackOff state
Situation
A pod is in a CrashLoopBackOff state. This is typically detected through kubectl get pod or through Kubernetes alerts.
A crash loop means Kubernetes tried to start a pod, but it has crashed too often. After each restart, Kubernetes increases the delay before attempting another start.
Possible causes
Some common causes are:
- Upstream dependencies such as databases are not available (e.g. they do not accept new connections)
- A configuration mistake in your application
- Out of memory (OOM killed)
A good example is a newly configured network policy that blocks DNS queries to CoreDNS.
Diagnosis
- Examine logs to find causes, either through kubectl logs or using Grafana Explore.
- Use kubectl describe pod to find causes.
- Make sure network policies are correct if your application requires connectivity to external resources.
- Check resource usage (most often memory); see the command below to check for OOM kills.
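To check whether the previous crash was an OOM kill, you can inspect the container's last terminated state. A minimal sketch using the placeholder pod name from this page and assuming a single-container pod:
kubectl get pod mycrashlooppod -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
If this prints OOMKilled, the container exceeded its memory limit.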
View logs
Crash looping pods are often not in a running state. To see the application output from the crash, use the --previous flag:
kubectl logs mycrashlooppod --previous --tail=100
You may need to adjust the --tail flag to get more or fewer log lines.
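If the pod runs more than one container (for example a sidecar), add the -c flag to select the crashing container; the container name here is a placeholder:
kubectl logs mycrashlooppod -c my-container --previous --tail=100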
Get events
Events are a helpful indicator to figure out whether resource usage or failing health checks are causing the crash.
You can either use kubectl describe:
kubectl describe pod mycrashlooppod
Or use kubectl get events with a field selector:
$ kubectl get event --field-selector involvedObject.name=nginx-9d97dbffb-rvgt2
LAST SEEN TYPE REASON OBJECT MESSAGE
52s Normal Scheduled pod/nginx-9d97dbffb-rvgt2 Successfully assigned default/nginx-9d97dbffb-rvgt2 to docker-desktop
52s Normal Pulled pod/nginx-9d97dbffb-rvgt2 Container image "nginx:1.19.8" already present on machine
52s Normal Created pod/nginx-9d97dbffb-rvgt2 Created container nginx
52s Normal Started pod/nginx-9d97dbffb-rvgt2 Started container nginx
Remediation
DNS issues
If the logs indicate an issue with resolving a hostname (e.g. a database connection URL), check the following:
- When using network policies in the pod’s namespace, make sure a network policy is in place to allow connectivity to CoreDNS (see the example policy after this list)
- Check for a mistake in the hostname. Note that some clusters do not use .cluster.local. Make sure the service exists.
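To illustrate the first point, below is a minimal sketch of a network policy that allows egress DNS traffic to CoreDNS. It assumes CoreDNS runs in kube-system with the common label k8s-app: kube-dns and that the namespace carries the standard kubernetes.io/metadata.name label; the policy name and namespace are placeholders. Network policies are additive, so this only adds an allowance on top of your existing egress rules.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: my-namespace        # placeholder: namespace of the crash looping pod
spec:
  podSelector: {}                # applies to all pods in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53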
Failing livenessProbe
If a livenessProbe fails, Kubernetes restarts the container.
If this happens during start-up, your initialDelaySeconds is probably set too low. We recommend also configuring a startupProbe.
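A minimal sketch of a livenessProbe combined with a startupProbe, as a fragment of a pod spec. The container name, image, endpoint, port and timings are assumptions and need to be tuned to your application's actual start-up behaviour:
containers:
  - name: my-app                 # placeholder name and image
    image: my-app:1.0.0
    ports:
      - containerPort: 8080
    # While the startupProbe has not yet succeeded, the livenessProbe is disabled,
    # giving a slow-starting application up to 30 x 5s = 150s to come up.
    startupProbe:
      httpGet:
        path: /healthz           # assumed health endpoint
        port: 8080
      failureThreshold: 30
      periodSeconds: 5
    # After start-up, a failing livenessProbe restarts the container.
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
      failureThreshold: 3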
A failing livenessProbe is an indicator that your application or its runtime is not functioning properly.
Often a restart of the container solves the problem. However, if this occurs too often, Kubernetes puts the pod into CrashLoopBackOff.
Common causes are:
- Resource exhaustion
- CPU limits set too low
- Memory saturation
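If resource exhaustion is the cause, review the container's requests and limits. Below is a sketch with placeholder values, as a fragment of a container spec; the right numbers depend on the usage you observe (for example in Grafana):
resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    # Raise the memory limit if the container is OOM killed.
    memory: 512Mi
    # A CPU limit that is too low causes throttling and can make probes time out.
    cpu: "1"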