Horizontal Pod Autoscaling in Kubernetes using Event Loop Utilization of Node.js applications
At NearForm, we build and operate Node.js applications at scale, and we often scale them up and down with the usual metrics of CPU and memory. However, over the past few years Node.js has added many capabilities, such as a parallel garbage collector, worker threads and a parallel optimizing compiler, making CPU usage a poor predictor of an application’s actual CPU needs. The usual outcome is overprovisioning: more Node.js instances are run than the application needs. This can also mislead teams, prompting them to focus on optimising their Node.js processes instead of their I/O interaction.
In order to solve this problem, Trevor Norris added the measurement of Event Loop Utilization to Node.js. Check out his write-up at https://nodesource.com/blog/event-loop-utilization-nodejs. Here’s an excerpt:
“CPU is no longer enough of a measurement to scale applications. Other factors such as garbage collection, crypto, and other tasks placed in libuv’s thread pool can increase the CPU usage in a way that is not indicative of the application’s overall health. Even applications that don’t use Worker threads are susceptible to this issue.”
In this article, we will cover how to use Event Loop Utilization to scale your Node.js pods in Kubernetes, maximising resource usage. You can find all the source code accompanying this article at github.com/nearform/fastify-elu-scaler.
This blog post was inspired by Simone Busoli’s post on integrating backpressure into the infrastructure.
Before we start, we need to prepare our environment. We need a running Kubernetes cluster as well as the Prometheus Operator. The Keda CRDs then complement our monitoring stack, letting us define a Horizontal Pod Autoscaler (HPA) for Kubernetes more precisely and flexibly. One of Keda’s benefits is its flexibility in the metric sources it supports and its use of CRDs to define HPAs. Additionally, Keda is now a CNCF sandbox project and is widely supported by its community.
Kubernetes Cluster via kinD
We use kinD to demonstrate the use of Keda and a custom metric to autoscale pods. But it’s completely up to you how and where you run your Kubernetes cluster, as long as you have kubectl installed and KUBECONFIG points to your config file. So let’s create a kinD cluster first.
Prometheus and Grafana
By design, Keda is not able to scrape the metrics endpoints of containers. But we can use the Prometheus API as a source, like the Prometheus adapter does. Because we also want to visualise what is going on in our cluster, we can install the whole Prometheus/Grafana stack at once and use its CRDs.
Installing Keda CRDs
Keda itself consists of two CRDs and the Operator. Furthermore, Keda provides a metrics adapter, which acts as a Kubernetes metrics server and exposes selected series to the Horizontal Pod Autoscaler. Each HPA is defined via the ScaledObject CRD and provisioned automatically by the Operator.
Node.js and Event Loop Utilization
Our example Node.js application also provides a Dockerfile we can use to build a container image and deploy it into our new Kubernetes cluster. This application already provides a metrics endpoint exporting ELU. Let’s have a quick look at the important parts of the ELU plugin.
First, we have to declare our custom metric using prom-client. We use a Summary here, with all default quantiles, ageBuckets as the number of buckets in our sliding window, and maxAgeSeconds as the time before a bucket is reset. Our label names are defined by the eventLoopUtilization() result from the perf_hooks module.
To get default metrics as well, we simply add two lines of code:
Next, we declare a variable, initialise it with our first ELU measurement and overwrite it at every measuring interval (100 ms) with the next one. The observed value is the difference between the two measurements:
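The sampling loop described here can be sketched with nothing but Node's built-in perf_hooks module. The helper name createEluSampler and its observe callback are hypothetical; in the application, the callback would be the observe() method of the Summary.

```javascript
const { performance } = require('node:perf_hooks')

// Returns a function that, on each call, reports the ELU since the previous call.
// `observe` is a hypothetical callback standing in for the Summary's observe().
function createEluSampler (observe) {
  let previous = performance.eventLoopUtilization()
  return function sample () {
    const current = performance.eventLoopUtilization()
    // Passing two readings returns the delta between them (newer reading first).
    const delta = performance.eventLoopUtilization(current, previous)
    previous = current
    observe(delta.utilization)
    return delta
  }
}

const sample = createEluSampler((value) => { /* e.g. metric.observe(value) */ })

// Sample every 100 ms; unref() keeps this timer from holding the process open.
const timer = setInterval(sample, 100)
timer.unref()
```

Because each observation covers only the last interval, the Summary sees short-term utilisation rather than an average since process start.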
Finally, we need to define our metrics endpoint:
Let’s build that container image now and provide it to our kinD cluster. The second step is required only if you use kinD without an external Docker registry.
We already provided Kubernetes manifest files for a deployment and a service too.
To tell Prometheus about the new metrics endpoint and how to find and read it, we simply need to apply a ServiceMonitor CRD and give Prometheus the namespace permissions.
Autoscaling in Kubernetes
There are different types of scaling in Kubernetes. We will focus here only on the common horizontal pod autoscaling, which scales the number of running pod instances of a deployment up and down. Instead of defining HPA Kubernetes objects manually, we will use Keda, which offers many benefits over HPA alone. One is the flexibility in how scaling behaviour is defined. Its many built-in scalers for connecting event sources also make it easy to use.
The trigger defines our event source: where Keda can find it, how the metric is requested, and the threshold at which scaling is triggered. In this case, we can use a regular Prometheus query. In addition, we define the target to scale, including the minimum and maximum number of instances and some timing values to control how often and when to scale in and out. For more attributes, you can find the official documentation of scalers at https://keda.sh/docs/2.4/scalers/prometheus/.
Multiple scaling triggers
It is possible to define multiple triggers at once in a single ScaledObject. The first one that reaches its threshold will fire the scaling event. In addition to ELU, we add a CPU utilisation threshold of 80% to scale up our pod. It’s as simple as that.
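To make the interplay of several triggers more concrete, the following sketch models how the Horizontal Pod Autoscaler derives a replica count, based on the formula in the Kubernetes documentation: desired = ceil(current × currentValue / targetValue), with the largest result across all metrics being used. The function names and sample numbers are illustrative, and the model ignores stabilisation windows, cool-down periods and pod readiness.

```javascript
// Desired replicas for a single metric, following the documented HPA formula.
// `tolerance` models the default 10% band in which no scaling happens.
function desiredReplicas (currentReplicas, currentValue, targetValue, tolerance = 0.1) {
  const ratio = currentValue / targetValue
  if (Math.abs(ratio - 1) <= tolerance) return currentReplicas
  return Math.ceil(currentReplicas * ratio)
}

// With several triggers, the largest proposal is used, clamped to min/max.
function desiredReplicasForTriggers (currentReplicas, triggers, { min, max }) {
  const proposals = triggers.map((t) => desiredReplicas(currentReplicas, t.current, t.target))
  return Math.min(max, Math.max(min, ...proposals))
}

// Hypothetical readings: ELU at 30 against a threshold of 20, CPU at 40% against 80%.
const replicas = desiredReplicasForTriggers(
  2,
  [{ current: 30, target: 20 }, { current: 40, target: 80 }],
  { min: 1, max: 10 }
)
console.log(replicas)
```

In this example the ELU trigger alone asks for more replicas, so it decides the outcome even though CPU utilisation is well below its threshold.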
Verification and wrapping up
We are almost finished. Let’s verify if our scaler is working as expected. We can open a proxy port and visit our Grafana instance:
Our Grafana visualisation of the new pod metric for ELU shows a regular load of around 20, which is exactly our threshold and results in scaling up to two instances.
The reason why it is not downscaling back to one instance is the cool-down period, which is 5 minutes by default.
Let’s give it a boost to trigger our scaler. We can use Apache’s ab tool, provided as a Docker image, to run a benchmark that generates traffic.
If we go back to our Grafana graph and also check the number of pods, we can see that the HPA scaled our example ELU pod up to three instances and back down to two after a while.
We can show the events in bottom-up order by querying the HPA object: