DNS issues

Debugging and Fixing DNS Resolution Issues in Kubernetes

In a Kubernetes cluster, DNS underpins service discovery: Pods reach each other through Service names instead of IP addresses.

DNS issues can be tricky to diagnose and solve. The main symptom is that Pods cannot reach each other using Kubernetes Service names.

Possible Causes

CoreDNS is not running or is improperly configured.
The kube-dns/coredns ConfigMap is incorrectly set.
The Pod's resolv.conf file is misconfigured.
Network policies are restricting communication.
Node local DNS settings are incorrect.
External factors such as CNI plugins, cloud-provider-specific settings, upstream resolvers, etc.

Diagnosis

Validate a DNS issue

First, confirm that the issue is DNS-related. Try to access the service using its IP address. If that works but the service name does not, it's likely a DNS issue.

kubectl exec [pod-name] -- curl [service-ip]

Note that not all container images include curl; you may need wget or another tool instead. If no such tool is available, use kubectl debug to attach an ephemeral container with the debug tooling pre-installed to the running Pod. The positional argument is the Pod name, and --target names the container in that Pod whose process namespace the debug container should share.

kubectl debug -it [pod-name] --image=busybox:1.28 --target=[container-name]
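
Once a container with basic tooling is available, a direct lookup from inside the Pod gives a clearer signal than curl. The following is a minimal sketch, not a definitive check: dns_lookup_in_pod is a hypothetical helper name, and it assumes the container provides nslookup (busybox does) and that nslookup exits non-zero when the lookup fails.

```shell
# Hypothetical helper: resolve a name from inside a Pod and report the result.
# Assumes the container image provides nslookup and that it exits non-zero on failure.
dns_lookup_in_pod() {
    pod="$1"
    name="${2:-kubernetes.default}"   # the API server Service, present in every cluster
    if kubectl exec "$pod" -- nslookup "$name" >/dev/null 2>&1; then
        echo "OK: $name resolves from $pod"
    else
        echo "FAIL: $name does not resolve from $pod"
        return 1
    fi
}
```

Call it as dns_lookup_in_pod [pod-name] [service-name]; with no second argument it looks up kubernetes.default.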

CoreDNS

CoreDNS became generally available in Kubernetes 1.11 and has been the default cluster DNS server since version 1.13, replacing kube-dns. It's responsible for DNS service discovery in a Kubernetes cluster, allowing Pods to communicate using Service names instead of IP addresses.

The CoreDNS behavior can be customized via a ConfigMap, allowing you to specify custom DNS settings for your Kubernetes cluster. This ConfigMap is usually named coredns and located in the kube-system namespace.

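Confirm that CoreDNS is running in the kube-system namespace (kubectl -n kube-system get pods -l k8s-app=kube-dns). It is normally deployed as a Deployment, and its Pods keep the k8s-app=kube-dns label for compatibility with kube-dns. If the Pods are not Running and Ready, check their events and logs.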

kubectl -n kube-system describe pod [coredns-pod-name]
kubectl -n kube-system logs [coredns-pod-name]
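
The readiness check above can also be scripted. This is a sketch under stated assumptions: coredns_ready is a hypothetical helper, and it assumes the Deployment is named coredns (the kubeadm default); managed distributions may use a different name.

```shell
# Hypothetical helper: compare desired and ready replicas of the CoreDNS Deployment.
# Assumes the Deployment is named "coredns" in kube-system (the kubeadm default).
coredns_ready() {
    desired=$(kubectl -n kube-system get deployment coredns -o jsonpath='{.spec.replicas}')
    ready=$(kubectl -n kube-system get deployment coredns -o jsonpath='{.status.readyReplicas}')
    ready="${ready:-0}"       # the field is omitted when no replica is ready
    desired="${desired:-0}"   # empty when the Deployment cannot be read
    echo "coredns: $ready/$desired replicas ready"
    # Succeed only if at least one replica is wanted and all of them are ready.
    [ "$desired" -gt 0 ] && [ "$ready" -eq "$desired" ]
}
```

The function prints a one-line summary and returns non-zero when replicas are missing, so it can gate later steps in a script.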

DNS ConfigMap

If the CoreDNS Pods are not running properly, or are running but fail to resolve names, the cause may be the CoreDNS ConfigMap. Check it with kubectl -n kube-system get configmap coredns -o yaml. The cluster domain (cluster.local by default) and the upstream nameservers (the forward directive) should be correctly configured.

This is an example of the Corefile for configuring CoreDNS:

.:53 {
    errors
    health {
        lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 30
    }
    prometheus :9153
    forward . /etc/resolv.conf {
        max_concurrent 1000
    }
    cache 30
    loop
    reload
    loadbalance
}
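
To see at a glance where CoreDNS sends queries it cannot answer itself, the forward directive can be pulled out of the live ConfigMap. A minimal sketch, assuming the ConfigMap is named coredns and keeps the configuration under the Corefile key; coredns_upstreams is a hypothetical helper name.

```shell
# Hypothetical helper: print the upstream resolvers CoreDNS forwards to.
# Assumes the ConfigMap "coredns" stores the configuration under the "Corefile" key.
coredns_upstreams() {
    kubectl -n kube-system get configmap coredns -o jsonpath='{.data.Corefile}' |
        # On each "forward <zone> <upstream>..." line, print every upstream,
        # skipping the opening brace of an options block.
        awk '$1 == "forward" { for (i = 3; i <= NF; i++) if ($i != "{") print $i }'
}
```

When the upstream is /etc/resolv.conf, as in the example Corefile, CoreDNS forwards to the resolvers listed in the resolv.conf it sees inside its own Pod, which is normally inherited from the node.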

Pod resolv.conf

Check the Pod's resolv.conf file (kubectl exec -ti [pod-name] -- cat /etc/resolv.conf). The nameserver should be set to the ClusterIP of the kube-dns service, or if using node-local DNS, to the node-local DNS IP.
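
The comparison described above can be sketched as a small script. It assumes the cluster DNS Service is named kube-dns in kube-system (the usual name even when CoreDNS is the server); pod_uses_cluster_dns is a hypothetical helper name. On clusters using NodeLocal DNSCache a mismatch can be expected, because Pods may point at the node-local address instead.

```shell
# Hypothetical helper: check that a Pod's first nameserver is the cluster DNS Service IP.
# Assumes the DNS Service is named "kube-dns" in the kube-system namespace.
pod_uses_cluster_dns() {
    pod="$1"
    dns_ip=$(kubectl -n kube-system get service kube-dns -o jsonpath='{.spec.clusterIP}')
    # Take the first "nameserver" entry from the Pod's resolv.conf.
    pod_ns=$(kubectl exec "$pod" -- cat /etc/resolv.conf |
        awk '$1 == "nameserver" && !seen++ { print $2 }')
    echo "pod nameserver: $pod_ns, kube-dns ClusterIP: $dns_ip"
    [ -n "$dns_ip" ] && [ "$pod_ns" = "$dns_ip" ]
}
```

The function returns non-zero when the two addresses differ or the Service cannot be read.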

Network Policies

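Network policies can restrict communication. Make sure no NetworkPolicy blocks DNS queries, either as egress from the namespace of the affected Pods or as ingress to the CoreDNS Pods in kube-system.

kubectl get networkpolicy --all-namespaces -o yaml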

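If needed, apply a NetworkPolicy that allows DNS queries. The example below allows egress on port 53 (UDP and TCP) from every Pod in mynamespace to the kube-system namespace. It relies on the kubernetes.io/metadata.name label, which Kubernetes has set automatically on every namespace since version 1.21: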

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-egress-dns
  namespace: mynamespace
spec:
  podSelector:
    matchLabels: {}
  policyTypes:
    - Egress
  egress:
    - ports:
        # Allow DNS
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
      to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
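
When a namespace holds several policies, only those that restrict egress can block the outgoing DNS queries of its Pods. The sketch below lists them; egress_policies is a hypothetical helper name, and it assumes the policyTypes field is populated on the stored objects.

```shell
# Hypothetical helper: list NetworkPolicies in a namespace that restrict egress.
# Assumes .spec.policyTypes is populated on each stored policy.
egress_policies() {
    namespace="$1"
    kubectl -n "$namespace" get networkpolicy \
        -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.policyTypes}{"\n"}{end}' |
        # Each line is "<name><TAB><policyTypes>"; keep names whose types mention Egress.
        awk -F '\t' '$2 ~ /Egress/ { print $1 }'
}
```

For example, egress_policies mynamespace prints one policy name per line; each listed policy needs a rule that permits port 53, or a companion policy like the one above.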

Node Local DNS

NodeLocal DNSCache is a DNS caching agent that runs on every node as a DaemonSet. Its primary purpose is to improve DNS lookup performance and reliability, particularly in larger or high-load clusters.

The agent stores DNS query results locally on the node. When a Pod performs a DNS lookup, it contacts the local caching agent first, which returns a cached response if one is available and otherwise queries the cluster DNS service on the Pod's behalf. Cache hits avoid a trip across the network, which reduces DNS lookup latency and DNS traffic.

The NodeLocal DNSCache also helps bypass issues related to conntrack entries for DNS queries. In some cases, these conntrack entries can be a limiting factor for DNS performance, or they can cause DNS lookup timeouts.

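If your cluster uses NodeLocal DNSCache, make sure its Pods are running correctly on every node and check their logs. In the upstream manifest these Pods run in kube-system and carry the label k8s-app=node-local-dns (kubectl -n kube-system get pods -l k8s-app=node-local-dns -o wide).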
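
On clusters with many nodes, it helps to list only the nodes where the cache is unhealthy. A minimal sketch, assuming the Pods carry the label k8s-app=node-local-dns as in the upstream manifest; nodelocaldns_unhealthy_nodes is a hypothetical helper name.

```shell
# Hypothetical helper: print nodes whose NodeLocal DNSCache Pod is not in the Running phase.
# Assumes the Pods are labeled k8s-app=node-local-dns in the kube-system namespace.
nodelocaldns_unhealthy_nodes() {
    kubectl -n kube-system get pods -l k8s-app=node-local-dns \
        -o jsonpath='{range .items[*]}{.spec.nodeName}{"\t"}{.status.phase}{"\n"}{end}' |
        # Each line is "<node><TAB><phase>"; report everything that is not Running.
        awk -F '\t' 'NF == 2 && $2 != "Running" { print $1 " (" $2 ")" }'
}
```

Empty output means every listed Pod reports Running. A node with no cache Pod at all does not appear here, so compare the Pod count with the node count as well.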

External Factors

If the checks above pass, consider external factors: the CNI plugin (Pod-to-Pod and Pod-to-Service connectivity), cloud-provider-specific settings, upstream DNS servers, and firewalls that block port 53.
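
One way to tell an upstream problem from a cluster DNS problem is to compare an in-cluster lookup with an external one from the same Pod. The sketch below is a rough heuristic, not a definitive diagnosis: dns_scope is a hypothetical helper name, example.com is an assumed external test name, and the container is assumed to provide nslookup.

```shell
# Hypothetical helper: compare an in-cluster and an external lookup from one Pod.
# Assumes nslookup exists in the container and exits non-zero on failure.
dns_scope() {
    pod="$1"
    external_name="${2:-example.com}"   # assumed external test name
    if kubectl exec "$pod" -- nslookup kubernetes.default >/dev/null 2>&1; then
        internal=ok; else internal=fail; fi
    if kubectl exec "$pod" -- nslookup "$external_name" >/dev/null 2>&1; then
        external=ok; else external=fail; fi
    case "$internal/$external" in
        ok/ok)     echo "both lookups succeed" ;;
        ok/fail)   echo "cluster names resolve, external names fail: check upstream resolvers and firewalls" ;;
        fail/ok)   echo "external names resolve, cluster names fail: check the Pod's search domains and the kubernetes block in the Corefile" ;;
        fail/fail) echo "nothing resolves: check CoreDNS, NetworkPolicies, and the CNI plugin" ;;
    esac
}
```

Call it as dns_scope [pod-name]; the printed hint points at the section of this guide to revisit first.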