In the first part of this post, we looked at auto scaling using the metrics server. In part two, we will look at using custom metrics, specifically those from linkerd and ingress-nginx, to perform auto scaling based on latency and requests per second.

Read auto scaling in Kubernetes (Part 1)

Auto scaling with linkerd

Linkerd is a lightweight service mesh and CNCF project. It gives you instant, platform-wide metrics for things such as success rates, latency, and many other traffic-related signals, without having to change your code.

For the first part of our auto scaling, we will look at using the metrics obtained from the linkerd proxy to scale our deployment up or down. For this to work, you need to have both linkerd and the viz extension installed.

linkerd install | kubectl apply -f -
linkerd viz install | kubectl apply -f -

See the linkerd getting started documentation for detailed instructions on how to install linkerd.
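
To verify that both the control plane and the viz extension came up correctly, you can run the built-in health checks of the linkerd CLI:

linkerd check
linkerd viz check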

Setting up the adapter

The Prometheus Adapter is a project that implements the Kubernetes custom metrics API and exposes metrics from a Prometheus instance.

In order to have this work with linkerd, create a values file (we will call it prometheus-adapter.yaml) with the following contents:

prometheus:
  url: http://prometheus.linkerd-viz.svc
  port: 9090
  path: ""
 
rules:
  custom:
    - seriesQuery: 'response_latency_ms_bucket{namespace!="",pod!=""}'
      resources:
        template: "<<.Resource>>"
      name:
        matches: "^(.*)_bucket$"
        as: "${1}_99th"
      metricsQuery: 'histogram_quantile(0.99, sum(irate(<<.Series>>{<<.LabelMatchers>>, direction="inbound", deployment!="", namespace!=""}[5m])) by (le, <<.GroupBy>>))'

This tells the Prometheus Adapter where to find Prometheus, and configures a custom metric we can use in our Horizontal Pod Autoscaler resource.

Install the adapter using Helm:

$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
$ helm install prometheus-adapter -f prometheus-adapter.yaml prometheus-community/prometheus-adapter
NAME: prometheus-adapter
LAST DEPLOYED: Sat Oct  9 16:37:20 2021
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
prometheus-adapter has been deployed.
In a few minutes you should be able to list metrics using the following command(s):
 
  kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1

After installation, you can validate if the Prometheus Adapter is running correctly by querying the APIServices:

$ kubectl get apiservice v1beta1.custom.metrics.k8s.io
NAME                            SERVICE                      AVAILABLE                  AGE
v1beta1.custom.metrics.k8s.io   default/prometheus-adapter   False (MissingEndpoints)   24s

During start-up, the status will remain at MissingEndpoints. Once the pod is up and running, AVAILABLE should jump to True:

$ kubectl get apiservice v1beta1.custom.metrics.k8s.io
NAME                            SERVICE                      AVAILABLE   AGE
v1beta1.custom.metrics.k8s.io   default/prometheus-adapter   True        43s
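
Should the status stay at MissingEndpoints, the adapter logs usually tell you why (for instance, when it cannot reach Prometheus). Assuming the default release name used above, you can inspect them with:

$ kubectl logs deployment/prometheus-adapter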

You should now be able to validate whether your custom rules have been installed properly by using:

$ kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1
{"kind":"APIResourceList","apiVersion":"v1","groupVersion":"custom.metrics.k8s.io/v1beta1","resources":[....]}

Example

We can now make use of the custom metrics in our HPA resources. Here is our example deployment, sampleapp, running in a namespace called scalingtest:

apiVersion: v1
kind: Namespace
metadata:
  name: scalingtest
  annotations:
    linkerd.io/inject: enabled
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sampleapp
  namespace: scalingtest
spec:
  minReadySeconds: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: sampleapp
  template:
    metadata:
      labels:
        app: sampleapp
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: sampleapp
          image: nginx
          resources:
            requests:
              cpu: 100m
            limits:
              memory: "128Mi"
              cpu: "600m"
          ports:
            - containerPort: 80
              name: http
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 10"]
          startupProbe:
            httpGet:
              path: /
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
          readinessProbe:
            httpGet:
              path: /
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: sampleapp
  namespace: scalingtest
spec:
  selector:
    app: sampleapp
  ports:
    - port: 80
      targetPort: 80

We apply example.yaml to the cluster:

$ kubectl apply -f example.yaml
namespace/scalingtest created
deployment.apps/sampleapp created
service/sampleapp created
 
$ kubectl get pod -n scalingtest
NAME                         READY   STATUS        RESTARTS   AGE
sampleapp-7db7fdcd9d-95czg   2/2     Running       0          4s

Next, we deploy a load generator. For this, we make use of the slow_cooker project from Buoyant, the people behind linkerd.

kubectl run load-generator -n scalingtest --image=buoyantio/slow_cooker -- -qps 100 -concurrency 10 http://sampleapp

This will generate 100 rps of traffic against the deployed nginx. You can follow the logs of the load-generator pod to view various metrics, such as latency.
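
To see the latency the linkerd proxy is reporting (the same data the adapter will expose), you can follow the load-generator logs or use the viz CLI, for example:

kubectl logs -f load-generator -n scalingtest
linkerd viz stat deploy -n scalingtest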

If you wait a minute or so and run kubectl top pod -n scalingtest, you will notice that the CPU usage of the nginx pod has risen.

$ kubectl top pod -n scalingtest
NAME                        CPU(cores)   MEMORY(bytes)
load-generator              128m         5Mi
sampleapp-7db7fdcd9d-95czg  79m          2Mi

We will now configure an HPA policy:

kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta2
metadata:
  name: sampleapp
  namespace: scalingtest
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sampleapp
  minReplicas: 1
  maxReplicas: 20
  metrics:
    # Scale based on request latency, linkerd-proxy metric
    - type: Object
      object:
        metric:
          name: response_latency_ms_99th
        describedObject:
          apiVersion: apps/v1
          kind: Deployment
          name: sampleapp
        target:
          type: AverageValue
          averageValue: 1000000m # 1s
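
Assuming the manifest above is saved as hpa.yaml (a file name of our choosing), apply it and watch the autoscaler act on the latency metric:

$ kubectl apply -f hpa.yaml
$ kubectl get hpa sampleapp -n scalingtest -w

The TARGETS column shows the observed 99th percentile latency against the 1s target; once the observed value exceeds the target, REPLICAS will start to climb.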

Auto scaling using ingress-nginx

Instead of linkerd, you can use any other source of custom metrics for auto scaling. A good example is the latency metrics exposed by ingress-nginx.

The set-up is nearly identical: you also need to install the Prometheus Adapter and configure it to connect to a Prometheus instance.
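
One prerequisite: Prometheus has to actually scrape the ingress-nginx controller. If you installed the controller with the official Helm chart, a sketch of the relevant values (assuming a Prometheus Operator based monitoring stack) looks like this:

controller:
  metrics:
    enabled: true           # expose the controller's metrics endpoint
    serviceMonitor:
      enabled: true         # let the Prometheus Operator scrape it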

Setting up the adapter

The following rules can be used to configure auto scaling based on requests per second, or the 99th percentile latency of an ingress resource:

prometheus:
  url: http://prometheus.monitoring.svc
  port: 9090
  path: ""
 
rules:
  custom:
    - seriesQuery: '{__name__=~"^nginx_ingress_.*",namespace!=""}'
      seriesFilters: []
      resources:
        template: <<.Resource>>
        overrides:
          exported_namespace:
            resource: "namespace"
      name:
        matches: ""
        as: ""
      metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)
 
    - seriesQuery: '{__name__=~"^nginx_ingress_controller_requests.*",namespace!=""}'
      seriesFilters: []
      resources:
        template: <<.Resource>>
        overrides:
          exported_namespace:
            resource: "namespace"
      name:
        matches: ""
        as: "nginx_ingress_controller_requests_rate"
      metricsQuery: round(sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>), 0.001)
 
    - seriesQuery: '{__name__=~"^nginx_ingress_controller_request_duration_seconds_bucket",namespace!=""}'
      seriesFilters: []
      resources:
        template: <<.Resource>>
        overrides:
          exported_namespace:
            resource: "namespace"
      name:
        matches: "^(.*)_bucket$"
        as: "${1}_99th"
      metricsQuery: histogram_quantile(0.99, round(sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m])) by (le, <<.GroupBy>>), 0.001))

Install this just like in the previous example, using Helm:

$ helm install prometheus-adapter -f prometheus-adapter.yaml prometheus-community/prometheus-adapter
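
The HPA below references an Ingress object, so the sample app also needs to be exposed through ingress-nginx. A minimal sketch of such an Ingress (assuming the controller uses the nginx ingress class, with a hypothetical hostname) could look like:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: sampleapp
  namespace: scalingtest
spec:
  ingressClassName: nginx
  rules:
    - host: sampleapp.example.com   # hypothetical hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: sampleapp
                port:
                  number: 80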

We can now use these metrics in our HPA:

kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta2
metadata:
  name: sampleapp
  namespace: scalingtest
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sampleapp
  minReplicas: 1
  maxReplicas: 20
  metrics:
    - type: Object
      object:
        metric:
          name: nginx_ingress_controller_request_duration_seconds_99th
        describedObject:
          apiVersion: networking.k8s.io/v1
          kind: Ingress
          name: sampleapp
        target:
          type: AverageValue
          averageValue: 10m # 10ms
 
  # configure scale up/down behaviour
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 120
      policies:
        - type: Percent
          value: 25
          periodSeconds: 10
        - type: Pods
          value: 1
          periodSeconds: 5
      selectPolicy: Max
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 25
          periodSeconds: 30
        - type: Pods
          value: 1
          periodSeconds: 5
      selectPolicy: Max
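
After applying this HPA, you can verify that the object metric resolves and that the autoscaler is reading it. A sketch, assuming the adapter exposes the Ingress under the networking.k8s.io group (the exact resource segment can differ per adapter version):

$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/scalingtest/ingresses.networking.k8s.io/sampleapp/nginx_ingress_controller_request_duration_seconds_99th"
$ kubectl describe hpa sampleapp -n scalingtest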

Conclusion

In this second part, we created a deployment that auto scales based on custom metrics. We did this using both linkerd and ingress-nginx metrics, exposed through the Prometheus Adapter.

Using custom metrics, you can set up your auto scaling based on leading indicators such as requests per second, so you can start spinning up capacity before your peak load arrives.