kubectl top ServiceUnavailable

How to debug and resolve a ServiceUnavailable error when using kubectl top

Situation

When running kubectl top, you may encounter the ServiceUnavailable error, which indicates that the metrics server is unable to provide resource usage metrics.

❯ kubectl top node
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

What is the metrics-server?

The Metrics Server is a Kubernetes component that collects resource usage metrics for containers and pods running in a Kubernetes cluster. It provides a simple way to query resource usage metrics such as CPU and memory utilization, and it can be used by tools such as kubectl top to display resource usage statistics.

❯ kubectl top node
NAME                                         CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
ip-172-16-4-35.eu-west-1.compute.internal    1242m        31%    6373Mi          43%
ip-172-16-4-42.eu-west-1.compute.internal    534m         13%    5823Mi          39%

The Metrics Server is a cluster-level component, meaning that it collects metrics across all namespaces and pods in the cluster.

The Metrics Server runs as a Deployment in the kube-system namespace and requires access to the Kubernetes API server and the kubelets to collect metrics. It usually comes pre-installed by your managed Kubernetes service provider, or you may install it manually.

Possible causes

Some potential causes of the “ServiceUnavailable” error when running kubectl top include:

  • Pod not yet ready: The metrics-server has just started and is still collecting metrics. This process usually takes a minute and should resolve itself automatically.
  • CrashLoopBackOff: The metrics-server pod is unavailable, for example because it is in CrashLoopBackOff or because of misconfigured network policies. See also our pod crashloopbackoff runbook.
  • Misconfiguration: The APIService resource, which exposes the metrics API through the Kubernetes API server, is misconfigured.
  • Kubernetes API server: The Kubernetes API server cannot reach the metrics-server because of network connectivity problems or invalid service account permissions.
  • Network access: The metrics-server is running but cannot access the Kubernetes API resources it needs to collect metrics.
  • Unable to schedule the pod: There are not enough compute resources or nodes available in the cluster to schedule the metrics-server pod. To resolve this, see pod scheduling failed.

Diagnosis

Make sure the pod is running. This can be done by running kubectl get pod -n kube-system. Note that depending on your installation method of metrics-server, it may also be located in a different namespace, such as metrics-server.

❯ kubectl get pods -n kube-system | grep metrics-server
metrics-server-5b8644d458-ml9fv                  1/1     Running   0          4m48s
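
The check above can be automated with a small helper. The following is a minimal sketch, not part of the original runbook: the function name check_metrics_server_pods is hypothetical, and it assumes the default column layout of kubectl get pods (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Reads `kubectl get pods` output on stdin and reports every metrics-server pod.
# A pod counts as healthy only if READY is x/x and STATUS is Running.
# Exit code: 0 = all healthy, 1 = at least one unhealthy, 2 = no pod found.
check_metrics_server_pods() {
  awk '
    $1 ~ /^metrics-server/ {
      found = 1
      split($2, ready, "/")
      if (ready[1] != ready[2] || $3 != "Running") {
        printf "NOT READY: %s (ready=%s, status=%s, restarts=%s)\n", $1, $2, $3, $4
        bad = 1
      } else {
        printf "OK: %s\n", $1
      }
    }
    END {
      if (!found) {
        print "NOT FOUND: no metrics-server pod in this namespace"
        exit 2
      }
      exit bad
    }
  '
}

# Usage (the namespace is an assumption; adjust it to your installation):
#   kubectl get pods -n kube-system --no-headers | check_metrics_server_pods
```

An exit code of 2 suggests looking in another namespace, such as metrics-server, before concluding that the component is missing.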

If you find that the metrics server pod is not running, investigate the cause. A common reason for the metrics-server pod being restarted or briefly unavailable is a cluster upgrade on AME Kubernetes. In this case the situation should resolve itself automatically.

If the metrics-server pod is running, examine logs to find causes. This can be done through kubectl logs.

kubectl logs -n kube-system <metrics-server-pod-name>

Look for any error messages that might indicate why the metrics server is unable to provide metrics. You may experience an error when doing this if the Kubernetes node on which this metrics-server pod is running has become unavailable.
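
To narrow long logs down to the lines that matter, a filter like the one below may help. This is a sketch under assumptions: the function name filter_error_lines is hypothetical, and the keyword list is a heuristic rather than an official metrics-server log format.

```shell
# Keeps only log lines that look like errors; prints nothing if none match.
# The keyword pattern is a heuristic and may need tuning for your cluster.
filter_error_lines() {
  grep -iE 'error|fail|unable|denied|timeout|x509' || true
}

# Usage (namespace and deployment name are assumptions):
#   kubectl logs -n kube-system deployment/metrics-server --tail=200 | filter_error_lines
# For a pod that keeps restarting, the previous container's logs are often more useful:
#   kubectl logs -n kube-system <metrics-server-pod-name> --previous | filter_error_lines
```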

Check the connectivity between the metrics server and the Kubernetes API server:

kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes

If this command returns an error, the Kubernetes API server may be unable to reach the metrics server because of a network connectivity issue, or the APIService resource may be misconfigured.
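
Because the metrics API is exposed through an APIService resource, its Available condition is worth checking as well. The sketch below assumes the APIService is named v1beta1.metrics.k8s.io, which matches the API path used above; the function name apiservice_available is hypothetical.

```shell
# Reports whether the metrics APIService is marked Available.
# On failure, also prints the condition message, which usually explains why.
apiservice_available() {
  status=$(kubectl get apiservice v1beta1.metrics.k8s.io \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}')
  if [ "$status" = "True" ]; then
    echo "APIService v1beta1.metrics.k8s.io is available"
  else
    echo "APIService v1beta1.metrics.k8s.io is NOT available (status: ${status:-unknown})"
    kubectl get apiservice v1beta1.metrics.k8s.io \
      -o jsonpath='{.status.conditions[?(@.type=="Available")].message}'
    echo
    return 1
  fi
}

# Usage:
#   apiservice_available
```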

Remediation

CrashLoopBackOff

  • When using network policies in the pod’s namespace, make sure a network policy is in place to allow connectivity to the kubelet ports for each node, as well as the Kubernetes API Server.
  • Check for a mistake in the flags provided to the metrics-server. If you do not manage your own metrics-server deployment, this should be done by your cluster provider.
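
To review the flags currently passed to the metrics-server, the container arguments can be read from the Deployment. This is a sketch under assumptions: the Deployment is assumed to be named metrics-server, the flags are assumed to be set in the first container's args field rather than in command, and the function name show_metrics_server_flags is hypothetical.

```shell
# Prints the metrics-server container arguments, one flag per line.
# The first argument is the namespace (defaults to kube-system).
show_metrics_server_flags() {
  namespace="${1:-kube-system}"
  kubectl get deployment metrics-server -n "$namespace" \
    -o jsonpath='{range .spec.template.spec.containers[0].args[*]}{@}{"\n"}{end}'
}

# Usage:
#   show_metrics_server_flags
#   show_metrics_server_flags metrics-server
```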

See also our pod crashloopbackoff runbook

Once the issue has been resolved, retry the kubectl top command to ensure that resource usage metrics are now available.
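
Since the metrics-server may need about a minute to collect metrics after a restart, the retry can be wrapped in a small loop. The sketch below is illustrative only: the function name retry_kubectl_top and the default of 6 attempts with 10 seconds between them are assumptions, not values from this runbook.

```shell
# Retries `kubectl top node` until it succeeds or the attempts run out.
# Arguments: number of attempts (default 6), delay in seconds (default 10).
retry_kubectl_top() {
  attempts="${1:-6}"
  delay="${2:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if kubectl top node; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      echo "attempt $i/$attempts failed; retrying in ${delay}s" >&2
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "kubectl top node still failing after $attempts attempts" >&2
  return 1
}

# Usage:
#   retry_kubectl_top        # 6 attempts, 10 seconds apart
#   retry_kubectl_top 12 5   # 12 attempts, 5 seconds apart
```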