Debug - pod in CrashLoopBackOff
How to debug and resolve a pod that's in a CrashLoopBackOff state
A pod is in a CrashLoopBackOff state. This is detected either through kubectl get pod or, for example, through Kubernetes alerts.
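To find affected pods across the whole cluster, something along these lines may help. It is a minimal sketch: the function name list_crashloop_pods is made up for this example, and it assumes the default column layout of kubectl get pods (NAMESPACE NAME READY STATUS RESTARTS AGE).

```shell
# List pods in any namespace whose STATUS column mentions CrashLoopBackOff.
# A regex match is used so that init-container states such as
# "Init:CrashLoopBackOff" are caught as well.
list_crashloop_pods() {
  kubectl get pods --all-namespaces --no-headers |
    awk '$4 ~ /CrashLoopBackOff/ { print $1 "/" $2 " restarts=" $5 }'
}
```

Calling list_crashloop_pods prints one namespace/pod pair per line together with its restart count.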
A crash loop means Kubernetes tried to start a pod, but it has crashed too often. After each restart, Kubernetes will increase the delay before attempting another start.
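As a rough illustration of the increasing delay, the sketch below prints the wait before each restart. It assumes the kubelet defaults described in the Kubernetes documentation (start at 10 seconds, double after each crash, cap at five minutes); the function name backoff_delays is made up for this example, and your cluster may behave differently.

```shell
# Print the approximate delay (in seconds) before each of the first N restarts.
# Assumed defaults: initial delay 10s, doubled after every crash, capped at 300s.
backoff_delays() {
  n="${1:-6}"
  delay=10
  i=1
  while [ "$i" -le "$n" ]; do
    echo "$delay"
    delay=$((delay * 2))
    if [ "$delay" -gt 300 ]; then
      delay=300
    fi
    i=$((i + 1))
  done
}
```

Under these assumptions, a pod that has crashed six times waits about five minutes before the next attempt, which is why a fixed pod can still take a while to come back.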
Some common causes include:
- upstream dependencies such as databases are not available (e.g. do not accept new connections)
- configuration mistake for your application
- Out of memory (OOM killed)
A good example is a newly configured network policy that prevents DNS queries to CoreDNS, so the application can no longer reach its dependencies by hostname.
- Examine logs to find causes. This can be done either through kubectl logs or using Grafana Explore.
- Run kubectl describe pod to find causes.
- Make sure network policies are correct in case your application requires connectivity to external resources
- Resource usage (most often memory)
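For the resource usage check, the last termination state of each container is a quick signal. The sketch below reads it with a jsonpath query; the function name last_termination and the default namespace are assumptions for illustration.

```shell
# Print "<container> <reason> <exit code>" (tab separated) for the previous run
# of each container in the pod.
# Usage: last_termination mycrashlooppod [namespace]
last_termination() {
  pod="$1"
  ns="${2:-default}"
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
}
```

A reason of OOMKilled (usually with exit code 137) points at the memory limit, while Error with an application-specific exit code suggests looking at the logs first.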
Crash looping pods are often not in a running state. To see the application output from the crash, use the --previous flag:
kubectl logs mycrashlooppod --previous --tail=100
You may need to adjust the --tail flag to get more or fewer log lines.
Events are a helpful indicator to figure out whether resource usage or failing health checks are causing the crash.
You can either use describe:
kubectl describe pod mycrashlooppod
or get the events with --field-selector:
$ kubectl get event --field-selector involvedObject.name=nginx-9d97dbffb-rvgt2
LAST SEEN   TYPE     REASON      OBJECT                      MESSAGE
52s         Normal   Scheduled   pod/nginx-9d97dbffb-rvgt2   Successfully assigned default/nginx-9d97dbffb-rvgt2 to docker-desktop
52s         Normal   Pulled      pod/nginx-9d97dbffb-rvgt2   Container image "nginx:1.19.8" already present on machine
52s         Normal   Created     pod/nginx-9d97dbffb-rvgt2   Created container nginx
52s         Normal   Started     pod/nginx-9d97dbffb-rvgt2   Started container nginx
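On a busy pod the Normal events can drown out the interesting ones. A possible way to narrow the list down to warnings, sorted by time, is sketched here; the function name pod_warning_events and the default namespace are assumptions for this example.

```shell
# Show only Warning events for one pod, oldest first.
# Usage: pod_warning_events mycrashlooppod [namespace]
pod_warning_events() {
  pod="$1"
  ns="${2:-default}"
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.name=${pod},type=Warning" \
    --sort-by=.lastTimestamp
}
```

Reasons such as Unhealthy (a failing probe) or BackOff (the restart delay) are the ones to look for in the output.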
If the logs indicate an issue with resolving a hostname (e.g. a database connection URL), check the following:
- When using network policies in the pod’s namespace, make sure a network policy is in place to allow connectivity to CoreDNS
- Check for a mistake in the hostname. Note that some clusters do not use .cluster.local. Make sure the service exists.
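Both checks can be sketched in shell. The first function tries a lookup from a throwaway pod, the second prints a network policy that allows DNS egress. Everything named here is an assumption for illustration: the function names, the busybox image, the policy name allow-dns-egress, and the idea that CoreDNS runs in kube-system and that namespaces carry the kubernetes.io/metadata.name label (set automatically on recent Kubernetes versions).

```shell
# Resolve a hostname from a temporary pod in the given namespace.
# Usage: check_dns myservice.mynamespace.svc.cluster.local [namespace]
check_dns() {
  host="$1"
  ns="${2:-default}"
  kubectl run dns-check --namespace "$ns" --rm -i --restart=Never \
    --image=busybox:1.36 -- nslookup "$host"
}

# Print a NetworkPolicy manifest that lets every pod in the namespace
# send DNS queries (UDP and TCP port 53) to pods in kube-system.
# Usage: dns_egress_policy mynamespace | kubectl apply -f -
dns_egress_policy() {
  ns="${1:-default}"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: $ns
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
EOF
}
```

If check_dns succeeds from the same namespace, DNS itself is probably fine and the hostname or the service is the more likely culprit.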
If a livenessProbe is failing, Kubernetes will restart the container.
If this happens during start-up, your initialDelaySeconds is probably not configured correctly. We’d also recommend configuring a startupProbe.
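To see what is currently configured, the probe settings can be read straight from the pod spec. This is a minimal sketch; the function name show_probes and the default namespace are assumptions for this example.

```shell
# Print the liveness and startup probe configuration of every container.
# An empty value after "startup=" means no startupProbe is configured.
# Usage: show_probes mycrashlooppod [namespace]
show_probes() {
  pod="$1"
  ns="${2:-default}"
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\tliveness="}{.livenessProbe}{"\tstartup="}{.startupProbe}{"\n"}{end}'
}
```

Liveness checks are held off until a startupProbe has succeeded, so a slow-starting application is usually better served by a startupProbe than by a very large initialDelaySeconds.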
A failing livenessProbe is an indicator that your application or its runtime is not functioning properly.
Often when this is the cause, a restart of the container solves the problem. However, if this occurs too often, Kubernetes will restart the container with an increasing delay and the pod ends up in a CrashLoopBackOff state.
Common causes are:
- Resource exhaustion
- Too low CPU limits
- Memory saturation
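To compare actual usage with the configured limits, a sketch like the following may be useful. It assumes metrics-server (or another metrics API provider) is installed for kubectl top; the function name show_usage_and_limits and the default namespace are made up for this example.

```shell
# Show current usage next to the configured requests and limits.
# Usage: show_usage_and_limits mycrashlooppod [namespace]
show_usage_and_limits() {
  pod="$1"
  ns="${2:-default}"
  # Current CPU and memory per container (needs a metrics API in the cluster).
  kubectl top pod "$pod" -n "$ns" --containers
  # Configured requests and limits per container.
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources}{"\n"}{end}'
}
```

Note that kubectl top may return nothing while the container is waiting in back-off, so it can be necessary to run this shortly after a restart.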