Debug - kubectl top ServiceUnavailable

How to debug and resolve a ServiceUnavailable error when using kubectl top

Situation

When running a kubectl top command, you receive a ServiceUnavailable error message from the Kubernetes API server.

❯ kubectl top node
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

Possible causes

Common causes include:

  • metrics-server is unavailable (e.g. CrashLoopBackOff, or misconfigured network policies)
  • misconfigured APIService resource
  • the Kubernetes API Server has no access to the metrics-server
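A quick way to narrow down the causes listed above is to ask the API server whether it considers the metrics API available. The following is a minimal sketch, assuming the APIService is registered under the name v1beta1.metrics.k8s.io, which is the upstream metrics-server default and may differ on your platform. The function name metrics_api_status is a hypothetical helper, not part of kubectl.

```shell
# Sketch: report whether the metrics APIService is marked Available.
# Assumption: the APIService is named v1beta1.metrics.k8s.io (upstream default).
metrics_api_status() {
  status=$(kubectl get apiservice v1beta1.metrics.k8s.io \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}')
  if [ "$status" = "True" ]; then
    echo "available"
  else
    echo "unavailable"
  fi
}
```

If the function prints unavailable, kubectl describe apiservice v1beta1.metrics.k8s.io shows the reason and message of the Available condition, which usually points towards one of the causes above.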

Diagnosis

Make sure the metrics-server pod is running. This can be done by running kubectl get pod -n kube-system. Note that on AME Kubernetes the metrics-server is always installed in the kube-system namespace. On other distributions or platforms the namespace may differ.

❯ kubectl get pod -n kube-system
...
metrics-server-5b8644d458-ml9fv                  1/1     Running   0          4m48s
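On a busy namespace it can help to list only the metrics-server pods. The sketch below assumes the pods carry the label k8s-app=metrics-server, as in the upstream metrics-server manifests; your distribution may use a different label. The function name metrics_server_pods is a hypothetical helper, and the optional first argument overrides the namespace.

```shell
# Sketch: list only the metrics-server pods, including the node each runs on.
# Assumptions: pods are labelled k8s-app=metrics-server (upstream manifest default)
# and live in kube-system unless another namespace is passed as the first argument.
metrics_server_pods() {
  kubectl get pod -n "${1:-kube-system}" -l k8s-app=metrics-server -o wide
}
```

The -o wide output adds a NODE column, which is useful when checking whether the pod sits on a node that has become unavailable.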

If you find that the metrics-server pod is not running, investigate the cause. On AME Kubernetes, a common reason for the metrics-server pod being restarted or briefly unavailable is a cluster upgrade. In this case the situation should resolve itself automatically.

If the metrics-server pod is running, examine its logs to find the cause. This can be done either through kubectl logs or using Grafana Explore. Retrieving the logs may fail if the Kubernetes node on which the metrics-server pod is running has become unavailable. AME automatically replaces unhealthy nodes. If you are self-hosting or using another provider, you may need to take action to make the node available again. Deleting the metrics-server pod can also be a quick temporary fix to get kubectl top working again.
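The log inspection and the pod deletion described above could be sketched as follows. Both helpers are hypothetical names, and both assume the k8s-app=metrics-server label and the kube-system namespace; adjust these to match your installation.

```shell
# Sketch: show the most recent log lines of the metrics-server pods.
# Assumption: pods are labelled k8s-app=metrics-server; namespace defaults to kube-system.
metrics_server_logs() {
  kubectl logs -n "${1:-kube-system}" -l k8s-app=metrics-server --tail=50
}

# Sketch: delete the metrics-server pods so their controller recreates them.
# This is a temporary workaround, not a fix for the underlying cause.
restart_metrics_server() {
  kubectl delete pod -n "${1:-kube-system}" -l k8s-app=metrics-server
}
```

For a pod that keeps restarting, adding --previous to kubectl logs for that specific pod shows the output of the last terminated container, which often contains the actual error.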

Remediation

CrashLoopBackOff

  • When using network policies in the pod’s namespace, make sure a network policy is in place to allow connectivity to the kubelet ports for each node, as well as the Kubernetes API Server.
  • Check for a mistake in the flags provided to the metrics-server. If you do not manage your own metrics-server deployment, this should be handled by your cluster provider.
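Both checks above can be started from the command line. The sketch below assumes the metrics-server runs as a Deployment named metrics-server whose flags are set in the first container's args field, as in the upstream manifests; other installations may name the Deployment differently or use command instead of args. The function names are hypothetical helpers.

```shell
# Sketch: print the flags passed to the metrics-server container.
# Assumptions: Deployment is named metrics-server and the flags are in
# the first container's args; namespace defaults to kube-system.
metrics_server_args() {
  kubectl get deployment metrics-server -n "${1:-kube-system}" \
    -o jsonpath='{.spec.template.spec.containers[0].args}'
}

# Sketch: list the network policies in the metrics-server namespace.
list_network_policies() {
  kubectl get networkpolicy -n "${1:-kube-system}"
}
```

If list_network_policies returns any policies, kubectl describe networkpolicy on each one shows which ingress and egress rules apply to the metrics-server pods.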