Event Loop Utilization with HPA

Horizontal Pod Autoscaling in Kubernetes using Event Loop Utilization of Node.js applications

At NearForm, we build and operate Node.js applications at scale, and we often scale them up and down with the usual metrics of CPU and memory. However, over the past few years Node.js has added many capabilities, such as a parallel garbage collector, worker threads and a parallel optimizing compiler, making the CPU usage a poor predictor of an application's actual CPU needs. The usual outcome is the overprovisioning of the number of Node.js instances of an application. This can be misleading to teams, prompting them to focus on optimising their Node.js processes instead of their I/O interaction.

In order to solve this problem, Trevor Norris added the measurement of Event Loop Utilization to Node.js. Check out his write-up at https://nodesource.com/blog/event-loop-utilization-nodejs . Here's an excerpt: "CPU is no longer enough of a measurement to scale applications. Other factors such as garbage collection, crypto, and other tasks placed in libuv's thread pool can increase the CPU usage in a way that is not indicative of the application's overall health. Even applications that don't use Worker threads are susceptible to this issue."

In this article, we will cover how to use Event Loop Utilization to scale your Node.js pods in Kubernetes, maximising resource usage. You can find all the source code accompanying this article at github.com/nearform/fastify-elu-scaler . This blog post was inspired by Simone Busoli's post on integrating backpressure into the infrastructure .

Requirements

Before we start, we need to prepare our environment. We need a running Kubernetes cluster as well as the Prometheus Operator. Keda CRDs will then complement our monitoring to define a Horizontal Pod Autoscaler (HPA) for Kubernetes more precisely and flexibly. One of the benefits of Keda is that it can consume metrics from many different sources and lets us define HPAs through CRDs. Additionally, Keda is now a CNCF sandbox project and is widely supported by its community.

Kubernetes Cluster via kinD

We use kinD to demonstrate the use of Keda and a custom metric to autoscale pods. But it’s completely up to you how and where you run your Kubernetes cluster, as long as you have kubectl installed and KUBECONFIG points to your config file. So let’s create a kinD cluster first.

Plain Text
$ kind create cluster --name elu --kubeconfig ./elu.kubeconfig
$ export KUBECONFIG=$(pwd)/elu.kubeconfig

Prometheus and Grafana

By design, Keda is not able to scrape metric endpoints of containers. But we can use the Prometheus API as a source, like the Prometheus adapter does. Since we also want to visualise what is going on in our cluster, we install the whole Prometheus/Grafana stack at once and use its CRDs.

Plain Text
$ git clone https://github.com/prometheus-operator/kube-prometheus.git
$ kubectl apply -f kube-prometheus/manifests/setup
$ kubectl apply -f kube-prometheus/manifests

Installing Keda CRDs

Keda itself contains two CRDs and the Operator. Furthermore, Keda provides the metrics adapter, which acts as a Kubernetes metrics server to provide selected series to the horizontal pod autoscaler. Each HPA can be defined via the ScaledObject CRD and will be provisioned automatically by the Operator.

Plain Text
$ helm repo add kedacore https://kedacore.github.io/charts
$ helm repo update
$ helm upgrade -i -n keda --create-namespace keda kedacore/keda

Node.js and Event Loop Utilization

Our example Node.js application also provides a Dockerfile we can use to build a container image and deploy it into our new Kubernetes cluster. This application already provides a metrics endpoint exporting ELU. Let’s have a quick look at the important parts of the ELU plugin.

First, we have to declare our custom metric using prom-client. We use a Summary here, which reports a set of quantiles by default, with ageBuckets buckets in a sliding window that spans maxAgeSeconds seconds before values are reset. Our label names are taken from the fields of the eventLoopUtilization() result from the perf_hooks module.

JavaScript
const metric = new prometheus.Summary({
  name: 'event_loop_utilization',
  help: 'ratio of time the event loop is not idling in the event provider to the total time the event loop is running',
  maxAgeSeconds: 60,
  ageBuckets: 5,
  labelNames: ['idle', 'active', 'utilization'],
})
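
To make maxAgeSeconds and ageBuckets more concrete, here is a simplified model of a sliding window built from rotating buckets. It is only a sketch of the general technique and not prom-client's actual implementation (which tracks quantile estimates rather than raw arrays). The class name RotatingWindow and its methods are hypothetical, and timestamps are passed in explicitly so the behaviour is easy to follow.

```javascript
// Simplified model of a sliding window made of rotating buckets.
// Assumption: illustrative only, not the code prom-client runs.
class RotatingWindow {
  constructor({ maxAgeSeconds, ageBuckets }, now = Date.now()) {
    this.buckets = Array.from({ length: ageBuckets }, () => [])
    this.current = 0
    this.rotateEveryMs = (maxAgeSeconds * 1000) / ageBuckets
    this.lastRotate = now
  }

  // Reset one bucket for every rotation period that has elapsed.
  rotate(now) {
    while (now - this.lastRotate >= this.rotateEveryMs) {
      this.buckets[this.current] = []
      this.current = (this.current + 1) % this.buckets.length
      this.lastRotate += this.rotateEveryMs
    }
  }

  // Every observation is recorded in all buckets.
  observe(value, now = Date.now()) {
    this.rotate(now)
    for (const bucket of this.buckets) bucket.push(value)
  }

  // Reads come from the current bucket, which was reset longest ago
  // and therefore holds at most maxAgeSeconds worth of observations.
  values(now = Date.now()) {
    this.rotate(now)
    return this.buckets[this.current].slice()
  }
}

const windowSketch = new RotatingWindow({ maxAgeSeconds: 60, ageBuckets: 5 }, 0)
windowSketch.observe(0.1, 0)
windowSketch.observe(0.2, 12000)
console.log(windowSketch.values(12000)) // both observations are still visible
console.log(windowSketch.values(60000)) // the observation from t=0 has aged out
```

With the values used in the article (60 seconds, 5 buckets), one bucket is reset every 12 seconds in this model, so old ELU samples fall out of the reported quantiles gradually instead of all at once.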

To get default metrics as well, we simply add two lines of code:

JavaScript
const collectDefaultMetrics = prometheus.collectDefaultMetrics;
collectDefaultMetrics();

Next, we declare a variable, initialise it with our first measured ELU and overwrite it at every measuring interval (100ms) with the next measuring point. The observed value is the diff of the two measuring points:

JavaScript
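const { eventLoopUtilization } = require('perf_hooks').performance

// First measuring point; each tick observes the delta since the previous one.
let elu1 = eventLoopUtilization()

const interval = setInterval(() => {
  const elu2 = eventLoopUtilization()
  metric.observe(eventLoopUtilization(elu2, elu1).utilization)
  elu1 = elu2
}, 100)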

fastify.addHook('onClose', () => {
  clearInterval(interval)
})
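
To see what each tick of that interval measures, here is a minimal standalone sketch that uses only Node's built-in perf_hooks module. The helpers eluDelta and busyWait are hypothetical names and are not part of the example application. eluDelta is intended to mirror the subtraction that the two-argument form of eventLoopUtilization() performs, except that it returns 0 when no time has elapsed. The sketch assumes a Node.js version that ships performance.eventLoopUtilization().

```javascript
const { performance } = require('perf_hooks')

// Difference between two cumulative ELU readings ({ idle, active } in ms).
// Hypothetical helper: utilization is the active share of the elapsed time.
function eluDelta(later, earlier) {
  const idle = later.idle - earlier.idle
  const active = later.active - earlier.active
  const total = idle + active
  return { idle, active, utilization: total === 0 ? 0 : active / total }
}

// Block the event loop for roughly `ms` milliseconds (illustration only).
function busyWait(ms) {
  const end = performance.now() + ms
  while (performance.now() < end) {
    // spin
  }
}

// Readings may be all zero before the event loop has started, so we
// sample from inside a callback rather than at the top level.
setImmediate(() => {
  const before = performance.eventLoopUtilization()
  busyWait(50)
  const after = performance.eventLoopUtilization()
  const delta = eluDelta(after, before)
  console.log(`utilization while busy: ${delta.utilization.toFixed(2)}`)
})
```

Because the loop is blocked for the whole measurement, almost all of the elapsed time counts as active. In a mostly idle server the same measurement would stay close to zero, which is what makes the value useful as a scaling signal.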

Finally, we need to define our metrics endpoint:

JavaScript
fastify.get('/metrics', async (request, reply) => {
  const metricsData = await prometheus.register.metrics()
  return metricsData
})

Let’s build that container image now and load it into our kinD cluster. The second step is required only if you use kinD without an external Docker registry.

Plain Text
$ docker build -t elu:latest .
$ kind load docker-image --name elu elu:latest

We already provided Kubernetes manifest files for a deployment and a service too.

Plain Text
$ kubectl create namespace elu
$ kubectl apply -f manifests/deployment.yaml
$ kubectl apply -f manifests/service.yaml

To inform Prometheus about the new metrics endpoint and how it can find and read it, we simply need to apply a ServiceMonitor CRD and give Prometheus the namespace permissions.

Plain Text
$ kubectl apply -f manifests/prometheus-namespaceRole.yaml
$ kubectl apply -f manifests/prometheus-namespaceRoleBinding.yaml
$ kubectl apply -f manifests/serviceMonitor.yaml

Autoscaling in Kubernetes

There are different types of scaling in Kubernetes. We will focus here only on the common horizontal pod autoscaling, which scales the number of running pod instances of a deployment up and down. Instead of defining HPA Kubernetes objects manually, we will use Keda, which offers many benefits over HPA alone. One is the flexibility in how scaling behaviour is defined. Another is its set of built-in scalers, which make it easy to connect event sources.

Plain Text
$ cat <<EOF | kubectl apply -f -
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: elu
  namespace: elu
spec:
  maxReplicaCount: 12
  scaleTargetRef:
    name: elu
    apiVersion: apps/v1
    kind: Deployment
  pollingInterval: 5
  cooldownPeriod: 10
  minReplicaCount: 1
  triggers:
    - type: prometheus
      metadata:
        metricName: event_loop_utilization
        threshold: '20'
        serverAddress: "http://prometheus-k8s.monitoring:9090"
        query: 100*avg(event_loop_utilization{service="elu"})
EOF

The trigger defines our event source: where Keda can find it, how the metric is requested, and the threshold at which scaling is triggered. In this case, we can use the common Prometheus query language. In addition, we define the target to scale, including the minimum and maximum number of instances and some timing values to control how often and when to scale in or out. For more attributes, you can find the official documentation of scalers at https://keda.sh/docs/2.4/scalers/prometheus/ .
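
As a rough mental model of how the threshold translates into a replica count: Keda hands the query result to the HPA as an external metric with an average-value target, and the HPA then aims for about ceil(metric / threshold) replicas within the configured bounds. The sketch below assumes that behaviour and deliberately ignores the HPA's tolerance band and stabilization windows. The function name desiredReplicas is hypothetical.

```javascript
// Rough model: the smallest replica count for which
// metricValue / replicas stays at or below the threshold.
// Assumption: tolerance and stabilization windows are ignored.
function desiredReplicas(metricValue, { threshold, minReplicaCount, maxReplicaCount }) {
  const wanted = Math.ceil(metricValue / threshold)
  return Math.min(maxReplicaCount, Math.max(minReplicaCount, wanted))
}

// Values taken from the ScaledObject in this article.
const scaler = { threshold: 20, minReplicaCount: 1, maxReplicaCount: 12 }

console.log(desiredReplicas(15, scaler))  // 1
console.log(desiredReplicas(45, scaler))  // 3
console.log(desiredReplicas(500, scaler)) // 12 (capped at maxReplicaCount)
```

In this model, a query result just above 20 is already enough to ask for a second replica.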

Multiple scaling triggers

It is possible to define multiple triggers at once in a ScaledObject. The first one that reaches its threshold will fire the scaling event. In addition to ELU, we add a CPU utilisation threshold of 80% to upscale our pods. It's as simple as that.

Plain Text
triggers:
  - type: prometheus
    metadata:
      metricName: event_loop_utilization
      threshold: '20'
      serverAddress: "http://prometheus-k8s.monitoring:9090"
      query: 100*avg(event_loop_utilization{service="elu"})
  - type: cpu
    metadata:
      type: Utilization
      value: "80"

Verification and wrapping up

We are almost finished. Let’s verify if our scaler is working as expected. We can open a proxy port and visit our Grafana instance:

Plain Text
$ export POD=$(kubectl -n monitoring get pod -l app.kubernetes.io/component=grafana --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}')
$ kubectl -n monitoring port-forward $POD 3000:3000

A direct link to our Grafana visualization of the new pod metric for ELU shows a regular load of around 20, which is exactly our threshold and results in a scale-up to two instances.

https://localhost:3000/explore?orgId=1&left=%5B%22now-15m%22,%22now%22,%22prometheus%22,%7B%22exemplar%22:true,%22expr%22:%22100*avg(event_loop_utilization%7Bservice%3D%5C%22elu%5C%22%7D)%22,%22interval%22:%225s%22,%22instant%22:false,%22range%22:true%7D%5D

Plain Text
$ kubectl -n elu get pods
NAME                   READY   STATUS    RESTARTS   AGE
elu-74b6c5dc5c-qzj9f   1/1     Running   0          35m
elu-74b6c5dc5c-qzj9f   1/1     Running   0          36m

The reason why it is not downscaling back to one instance is the HPA's downscale stabilization window, which is 5 minutes by default (the cooldownPeriod in our ScaledObject only applies when Keda scales to zero). Let’s give it a boost to trigger our scaler. We can use Apache's ab tool, provided as a Docker image, to run a benchmark that generates traffic.

Plain Text
$ kubectl run -it --rm --image=piegsaj/ab ab -n elu -- -c 1 -n 10000 http://elu.elu:3000/
If you don't see a command prompt, try pressing enter.
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests

If we go back to our Grafana graph and also check the number of pods, we can see that HPA was scaling up our example elu pod to three instances and back to two after a while.

Plain Text
$ kubectl -n elu get pods
NAME                   READY   STATUS    RESTARTS   AGE
ab                     1/1     Running   0          80s
elu-74b6c5dc5c-qc45t   1/1     Running   0          6m25s
elu-74b6c5dc5c-qzj9f   1/1     Running   0          42m
elu-74b6c5dc5c-vj5c7   1/1     Running   0          24s

We can show the events in bottom-up order by querying the HPA object:

Plain Text
$ kubectl -n elu describe hpa
...
Events:
  Type    Reason             Age   From                       Message
  ----    ------             ----  ----                       -------
  Normal  SuccessfulRescale  20m   horizontal-pod-autoscaler  New size: 2; reason: external metric event_loop_utilization(&LabelSelector{MatchLabels:map[string]string{deploymentName: elu,},MatchExpressions:[]LabelSelectorRequirement{},}) above target
  Normal  SuccessfulRescale  14m   horizontal-pod-autoscaler  New size: 3; reason: external metric event_loop_utilization(&LabelSelector{MatchLabels:map[string]string{deploymentName: elu,},MatchExpressions:[]LabelSelectorRequirement{},}) above target
  Normal  SuccessfulRescale  35s   horizontal-pod-autoscaler  New size: 2; reason: All metrics below target
