Horizontal Pod Autoscaling in Kubernetes using Event Loop Utilization of Node.js applications

At NearForm, we build and operate Node.js applications at scale, and we often scale them up and down using the usual metrics: CPU and memory. However, over the past few years Node.js has gained many capabilities, such as a parallel garbage collector, worker threads and a parallel optimizing compiler, that make CPU usage a poor predictor of an application’s actual needs. The usual outcome is overprovisioning, with more Node.js instances of an application running than it really needs. This can also mislead teams into focusing on optimising their Node.js processes instead of their I/O interactions.

To address this problem, Trevor Norris added the measurement of Event Loop Utilization (ELU) to Node.js. Check out his write-up at https://nodesource.com/blog/event-loop-utilization-nodejs. Here’s an excerpt:
“CPU is no longer enough of a measurement to scale applications. Other factors such as garbage collection, crypto, and other tasks placed in libuv’s thread pool can increase the CPU usage in a way that is not indicative of the application’s overall health. Even applications that don’t use Worker threads are susceptible to this issue.”

In this article, we cover how to use Event Loop Utilization to scale your Node.js pods in Kubernetes and make better use of your resources. You can find all the source code accompanying this article at github.com/nearform/fastify-elu-scaler.
This blog post was inspired by Simone Busoli’s post on integrating backpressure into the infrastructure.

Requirements

Before we start, we need to prepare our environment. This means a running Kubernetes cluster and the Prometheus Operator. Keda’s CRDs then build on this monitoring setup and let us define a Horizontal Pod Autoscaler (HPA) for Kubernetes more precisely and flexibly. One of Keda’s benefits is its flexibility in the metric sources it can use. Another is that HPAs are defined through CRDs. Additionally, Keda is now a CNCF sandbox project and is widely supported by its community.

Kubernetes Cluster via kinD

We use kinD to demonstrate Keda and a custom metric for autoscaling pods. However, it is entirely up to you how and where you run your Kubernetes cluster, as long as kubectl is installed and KUBECONFIG points to your config file. Let’s start by creating a kinD cluster.

Prometheus and Grafana

By design, Keda does not scrape the metrics endpoints of containers itself. Instead, like the Prometheus adapter, it can use the Prometheus API as a metrics source. We also want visualisation and a clear view of what is going on in our cluster. So we install the whole Prometheus/Grafana stack at once and use its CRDs.

Installing Keda CRDs

Keda itself consists of its CRDs and the Keda Operator. It also provides a metrics adapter, which acts as a Kubernetes metrics server and exposes selected metric series to the Horizontal Pod Autoscaler. Each HPA is defined through a ScaledObject custom resource, and the Operator provisions it automatically.

Node.js and Event Loop Utilization

Our example Node.js application also comes with a Dockerfile, which we can use to build a container image and deploy it into our new Kubernetes cluster. The application already exposes a metrics endpoint that exports ELU. Let’s take a quick look at the important parts of the ELU plugin.

First, we declare our custom metric using prom-client. We use a Summary here, which reports all of its quantiles by default and calculates them over a sliding window. ageBuckets sets the number of buckets in that window, and maxAgeSeconds sets how old a bucket can get before it is reset. Our label names are defined by the result of eventLoopUtilization() from Node’s built-in perf_hooks module.
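
The following is a minimal, dependency-free sketch of that declaration. It reads the fields returned by performance.eventLoopUtilization() and builds the options object that would be passed to prom-client’s Summary constructor. The metric name matches the Grafana query used later in this article. The help text and the maxAgeSeconds and ageBuckets values are illustrative assumptions. Deriving labelNames from the result’s keys is one reading of the paragraph above, not necessarily the plugin’s exact code.

```javascript
'use strict';
const { performance } = require('perf_hooks');

// eventLoopUtilization() returns an object with three fields:
//   idle        - cumulative ms the event loop spent idle
//   active      - cumulative ms the event loop spent active
//   utilization - active / (idle + active), a ratio between 0 and 1
const sample = performance.eventLoopUtilization();

// Assumption: the label names are the keys of that result.
const labelNames = Object.keys(sample);

// Options for a prom-client Summary. maxAgeSeconds and ageBuckets enable the
// sliding window; the values 60 and 5 are illustrative, not the article's.
const eluSummaryOptions = {
  name: 'event_loop_utilization',
  help: 'Event Loop Utilization of the Node.js process', // hypothetical text
  labelNames,
  maxAgeSeconds: 60, // a bucket older than this is reset
  ageBuckets: 5, // number of buckets in the sliding window
};

// With prom-client installed, the metric would be created like this:
//   const client = require('prom-client');
//   const eluSummary = new client.Summary(eluSummaryOptions);
```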

To get default metrics as well, we simply add two lines of code:

Next, we declare a variable and initialise it with our first ELU measurement. At every sampling interval (100ms), we overwrite it with the next measurement. The observed value is the difference between these two measurements:
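
This sampling loop can be sketched using only Node’s built-in perf_hooks module. The eluDelta function spells out the difference between two cumulative snapshots. Node’s own performance.eventLoopUtilization(current, last) returns the same ratio. startEluSampler and its onSample callback are hypothetical names standing in for the plugin code, which would observe each delta on the Summary.

```javascript
'use strict';
const { performance } = require('perf_hooks');

// Difference between two cumulative ELU snapshots, i.e. the utilization of
// the event loop during the interval between them (0 = idle, 1 = fully busy).
function eluDelta(current, last) {
  const idle = current.idle - last.idle;
  const active = current.active - last.active;
  const total = idle + active;
  return total > 0 ? active / total : 0; // avoid 0 / 0 when no time elapsed
}

// Hypothetical helper: samples ELU every intervalMs and passes each delta to
// onSample (in the plugin, this is where the Summary would observe it).
function startEluSampler(onSample, intervalMs = 100) {
  let last = performance.eventLoopUtilization(); // first measurement
  const timer = setInterval(() => {
    const current = performance.eventLoopUtilization();
    onSample(eluDelta(current, last));
    last = current; // overwrite with the latest measurement
  }, intervalMs);
  timer.unref(); // sampling alone should not keep the process alive
  return () => clearInterval(timer);
}

// Usage: log each sample, then stop after one second.
//   const stop = startEluSampler((u) => console.log(u.toFixed(3)));
//   setTimeout(stop, 1000);
```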

Finally, we need to define our metrics endpoint:
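
For illustration, here is a dependency-free sketch of such an endpoint. It uses Node’s built-in http module instead of Fastify and prom-client; with prom-client, the registry renders this text for you. The sketch writes the Prometheus text exposition format by hand. As a simplification, it exposes only the latest ELU sample as a gauge rather than a Summary. createMetricsServer and getValue are hypothetical names.

```javascript
'use strict';
const http = require('http');

// Render one value in the Prometheus text exposition format (version 0.0.4).
function renderMetrics(utilization) {
  return [
    '# HELP event_loop_utilization Event Loop Utilization of the Node.js process',
    '# TYPE event_loop_utilization gauge',
    `event_loop_utilization ${utilization}`,
    '', // the format expects a trailing newline
  ].join('\n');
}

// Hypothetical server: GET /metrics returns the latest value from getValue().
function createMetricsServer(getValue) {
  return http.createServer((req, res) => {
    if (req.method === 'GET' && req.url === '/metrics') {
      res.writeHead(200, { 'Content-Type': 'text/plain; version=0.0.4' });
      res.end(renderMetrics(getValue()));
    } else {
      res.writeHead(404);
      res.end();
    }
  });
}
```

Calling createMetricsServer(() => latestElu).listen(8080) would serve the metric at /metrics. Here latestElu is a hypothetical variable holding the last sample, and the port is arbitrary.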

Let’s now build the container image and load it into our kinD cluster. The second step is required only if you use kinD without an external Docker registry.

The repository also provides Kubernetes manifest files for a Deployment and a Service.

To tell Prometheus about the new metrics endpoint and how to find and scrape it, we only need to do two things. First, we apply a ServiceMonitor resource, which is a Prometheus Operator CRD. Second, we grant Prometheus permissions in the namespace.

Autoscaling in Kubernetes

Kubernetes supports several kinds of scaling. Here we focus only on the most common one, horizontal pod autoscaling, which increases or decreases the number of running pods of a Deployment. Instead of defining HPA objects manually, we use Keda, which offers several benefits over a plain HPA. One is flexibility in how scaling is defined and used. Another is its many built-in scalers for connecting event sources, which make it easy to use.

The trigger defines our event source: where Keda can find it, how the metric is queried, and the threshold at which scaling is triggered. In this case, the query is a regular Prometheus Query Language (PromQL) expression. On the other side, we define the target to scale. This includes the minimum and maximum number of instances and some timing values that control how often and when to scale in or out. For all available attributes, see the official documentation of the Prometheus scaler at https://keda.sh/docs/2.4/scalers/prometheus/.
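
To see how a threshold turns into a replica count, here is a simplified sketch of the scaling rule documented for the Kubernetes HPA: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). It includes the default 10% tolerance and the min/max bounds. The sketch treats the trigger’s threshold as the desired metric value. How Keda maps the threshold onto the HPA target (for example, per-pod average versus total value) depends on the trigger’s metric type. Stabilization windows and scaling policies are left out, and the function name is hypothetical.

```javascript
'use strict';

// Simplified HPA rule for a single metric:
//   desiredReplicas = ceil(currentReplicas * currentValue / targetValue)
// Stabilization windows, scaling policies and unready pods are ignored.
const TOLERANCE = 0.1; // Kubernetes default HPA tolerance

function desiredReplicas({ currentReplicas, currentValue, targetValue, minReplicas, maxReplicas }) {
  const ratio = currentValue / targetValue;
  // Within the tolerance band, the HPA keeps the current replica count.
  const proposed = Math.abs(1 - ratio) <= TOLERANCE
    ? currentReplicas
    : Math.ceil(currentReplicas * ratio);
  // The minimum and maximum instance counts bound the result.
  return Math.min(maxReplicas, Math.max(minReplicas, proposed));
}
```

For example, take one replica with a current value of 40 against a threshold of 20. The ratio is 2, so the rule proposes two replicas. A value of 21 falls within the 10% tolerance and changes nothing.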

Multiple scaling triggers

A ScaledObject can define multiple triggers at once. The HPA evaluates each of them and scales to the highest replica count any of them proposes. So whichever metric crosses its threshold first fires the scaling event. In addition to ELU, we add a CPU utilisation trigger with a threshold of 80% to scale out our pods. It’s as simple as that.
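
The per-trigger evaluation can be sketched as follows: compute one proposal per trigger with the basic HPA rule, then keep the largest. The function name and input numbers are hypothetical. The ELU threshold of 20 matches the value mentioned in the verification section. Tolerance and stabilization are omitted for brevity.

```javascript
'use strict';

// Sketch: with several triggers, the HPA computes a proposal per metric and
// uses the largest one, bounded by the minimum and maximum replica counts.
function replicasForTriggers(currentReplicas, triggers, minReplicas, maxReplicas) {
  const proposals = triggers.map(
    ({ current, target }) => Math.ceil(currentReplicas * (current / target))
  );
  const desired = Math.max(...proposals);
  return Math.min(maxReplicas, Math.max(minReplicas, desired));
}

// Hypothetical inputs: ELU at 30 against a threshold of 20 proposes 3 replicas;
// CPU at 50% against the 80% threshold proposes 2. The HPA picks 3.
const replicas = replicasForTriggers(2, [
  { name: 'elu', current: 30, target: 20 },
  { name: 'cpu', current: 50, target: 80 },
], 1, 5);
```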

Verification and wrapping up

We are almost finished. Let’s verify that our scaler works as expected. We can forward a local port to our Grafana instance and open it in the browser:

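The following direct link opens our Grafana visualisation of the new ELU pod metric. It shows a steady load of around 20, which is exactly our threshold, and this results in a scale-out to two instances. Because the query multiplies the average ELU by 100, this value is a percentage: on average, the event loops are busy about 20% of the time.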

http://localhost:3000/explore?orgId=1&left=%5B%22now-15m%22,%22now%22,%22prometheus%22,%7B%22exemplar%22:true,%22expr%22:%22100*avg(event_loop_utilization%7Bservice%3D%5C%22elu%5C%22%7D)%22,%22interval%22:%225s%22,%22instant%22:false,%22range%22:true%7D%5D

It does not scale back down to one instance because of the HPA’s scale-down stabilization window, which is 5 minutes by default. Keda’s own cooldownPeriod only applies when scaling down to zero.
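
In Kubernetes, this delay comes from the HPA’s scale-down stabilization. When the metrics suggest fewer replicas, the HPA keeps the highest recommendation it computed within the window (300 seconds by default). The sketch below models only that behaviour. The function and field names are hypothetical, and scale-up behaviour and scaling policies are omitted.

```javascript
'use strict';

// Sketch of scale-down stabilization: the HPA applies the highest replica
// recommendation seen within the last windowMs, so a brief dip in load
// does not remove pods immediately.
const DEFAULT_WINDOW_MS = 5 * 60 * 1000; // 300 seconds

function stabilizedReplicas(history, current, now, windowMs = DEFAULT_WINDOW_MS) {
  // history: earlier recommendations as { at: timestamp in ms, replicas }
  const recent = history
    .filter((r) => now - r.at <= windowMs)
    .map((r) => r.replicas);
  return Math.max(current, ...recent);
}

// A recommendation of 2 replicas at t = 0 keeps the deployment at 2 while
// newer recommendations say 1, until it falls out of the 300 s window.
const history = [{ at: 0, replicas: 2 }];
const atOneMinute = stabilizedReplicas(history, 1, 60000);
const afterWindow = stabilizedReplicas(history, 1, 301000);
```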

Let’s give it a boost to trigger our scaler. We can use Apache’s ab tool, packaged as a Docker image, to run a benchmark that generates traffic.

If we go back to our Grafana graph and also check the number of pods, we can see that the HPA scaled our example elu pods up to three instances and, after a while, back down to two.

We can show the events in bottom-up order by querying the HPA object:
