Pod in CrashLoopBackOff

How to debug and resolve a pod that's in a CrashLoopBackOff state

Situation

A pod is in a CrashLoopBackOff state. This is typically detected through kubectl get pods or through Kubernetes alerts.

A crash loop means Kubernetes has tried to start the pod, but it keeps crashing. After each restart, Kubernetes increases the delay (back-off) before attempting the next start.

Possible causes

Some common causes are:

  • upstream dependencies such as databases are not available (e.g. they do not accept new connections)
  • a configuration mistake in your application
  • out of memory (OOM killed)

A good example is a newly configured network policy that blocks DNS queries to CoreDNS.

Diagnosis

  • Examine the logs to find the cause, either through kubectl logs or Grafana Explore.
  • Use kubectl describe pod to inspect events and container state.
  • Make sure network policies are correct if your application requires connectivity to external resources.
  • Check resource usage (most often memory); see the command after this list.
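To check whether the last restart was caused by the OOM killer, inspect the container's last termination state (the pod name is a placeholder):

kubectl get pod mycrashlooppod -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'

This prints OOMKilled when the container was killed for exceeding its memory limit. If a metrics-server is installed in the cluster, kubectl top pod mycrashlooppod shows current usage.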

View logging

Crash-looping pods are often not in a Running state. To see the application output from the crash, use the --previous flag:

kubectl logs mycrashlooppod --previous --tail=100

You may need to adjust the --tail flag to get more or fewer log lines.
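If the pod runs multiple containers, select the crashing one with -c (the container name below is a placeholder):

kubectl logs mycrashlooppod -c mycontainer --previous --tail=100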

Get events

Events are a helpful indicator to figure out whether resource usage or failing health checks are causing the crash.

You can either use kubectl describe:

kubectl describe pod mycrashlooppod

Or use kubectl get events with --field-selector:

$ kubectl get event --field-selector involvedObject.name=nginx-9d97dbffb-rvgt2
LAST SEEN   TYPE     REASON      OBJECT                      MESSAGE
52s         Normal   Scheduled   pod/nginx-9d97dbffb-rvgt2   Successfully assigned default/nginx-9d97dbffb-rvgt2 to docker-desktop
52s         Normal   Pulled      pod/nginx-9d97dbffb-rvgt2   Container image "nginx:1.19.8" already present on machine
52s         Normal   Created     pod/nginx-9d97dbffb-rvgt2   Created container nginx
52s         Normal   Started     pod/nginx-9d97dbffb-rvgt2   Started container nginx

Remediation

DNS issues

If the logs indicate an issue with resolving a hostname (e.g. a database connection URL), check the following:

  • When using network policies in the pod’s namespace, make sure a policy is in place that allows connectivity to CoreDNS; a sketch follows this list.
  • Check for a mistake in the hostname. Note that some clusters do not use .cluster.local. Make sure the service exists.
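As a sketch (label selectors vary per cluster; the kube-system and k8s-app: kube-dns labels below are common defaults, not guaranteed), an egress policy that allows DNS traffic to CoreDNS could look like this:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: my-namespace          # namespace of the crashing pod (placeholder)
spec:
  podSelector: {}                  # apply to all pods in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53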

Failing livenessProbe

If a livenessProbe fails, Kubernetes restarts the container.

If this happens during start-up, your initialDelaySeconds is likely configured too low. We’d also recommend configuring a startupProbe.
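A minimal sketch of a container spec combining both probes (the /healthz path, port and timings are assumptions; adapt them to your application):

containers:
  - name: myapp
    image: myapp:1.0.0
    startupProbe:
      httpGet:
        path: /healthz             # assumed health endpoint
        port: 8080
      periodSeconds: 10
      failureThreshold: 30         # allows up to 30 * 10s = 300s for start-up
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 10
      failureThreshold: 3

The livenessProbe only takes effect once the startupProbe has succeeded, so a slow-starting application is not killed prematurely.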

A failing livenessProbe is an indicator that your application or its runtime is not functioning properly. Often a restart of the container solves the problem. However, if this occurs too often, Kubernetes will put the pod in CrashLoopBackOff.

Common causes are:

  • Resource exhaustion (see the sketch after this list)
    • Too low CPU limits
    • Memory saturation
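For resource exhaustion, review the container’s requests and limits. A minimal sketch (the values are illustrative, not recommendations):

resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi

Memory usage above the limit results in an OOM kill; CPU usage above the limit is throttled, which can cause liveness probes to time out.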