Optimise Node.js performance by avoiding broken promises
Understanding the Node.js event loop is crucial for optimising Node.js performance.
Several years ago, a customer called me in to help resolve performance issues in their Node.js application. Their server was suffering from massive event loop blocking: it was handling a mere 5 requests per second and, in one extreme case, showed an event loop delay of over one minute!
After reviewing some of the preliminary background details, the first question I asked them was simple: "Are you using Promises?" When they all looked at me and said "yes", my immediate response, without even looking at their code first, was, "Then you're likely using them wrong". It was a bold statement. I spent the next three days onsite with them, going through their code in detail, helping to find the specific issues and proving that initial bold assertion correct: their codebase was full of misuses of promises and async/await.
That engagement, and many more similar ones, prompted me to develop the "Broken Promises" workshop that Matteo Collina and I present to customers and periodically at conferences. Here, I want to pull back the curtain on that workshop just a bit to help folks better understand one of the most critical aspects of Node.js performance: understanding how Node.js schedules asynchronous execution.
But first, a puzzle
Whenever we present the Broken Promises workshop, I like to start with a bit of a brain teaser to get things going. The puzzle is a specially (and a little sadistically) designed piece of code that is meant to highlight how difficult it often is to reason about the order in which asynchronous code executes in JavaScript and Node.js.
Here's the challenge: The example prints a message to the console. It does so using all the various ways in which the execution of code can be scheduled in Node.js. Without running the code, can you tell me what message it prints to the console?
No cheating! And if you figure it out, keep the answer to yourself so that it's not spoiled for others who are trying to figure it out. (And if you have a difficult time with it, don't feel bad — I've shown this to seasoned Node.js core contributors who had difficulty working through it!)
I encourage you to really take the time to dig through this example. Through the rest of this blog post — particularly the part where we break down how the Node.js event loop works — we will give you the clues you need to figure out what message it generates.
Reasoning about order in the Node.js event loop
In nearly every case where we have worked with customers who are struggling with the performance of their Node.js applications, the issues come down to developers either not understanding, or not paying attention to, the order in which code will be executed and what effect that may have on everything else the application may be doing.
Specifically: if you cannot reason about the order in which your code will execute, you will be unable to optimise its performance.
Let's start with an example. Save the following script in a file called first.js:
As with the earlier puzzle, develop a hypothesis on the order the console.log statements will be printed before you run this code. Then run the code using the command node first.js and see if your hypothesis was correct.
Did the statements execute in the order you expected? What do you notice about the scheduling priority of each of the scheduling mechanisms? What surprised you most?
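The original first.js listing is not reproduced in this copy of the article. As a stand-in, here is a minimal sketch of a script in the same spirit, assuming one callback per scheduling mechanism. The `scheduleAll` helper and the labels are illustrative and are not the original code.

```javascript
// first.js (illustrative stand-in, not the original listing)
// Schedules one console.log per mechanism and records the order they run in.
function scheduleAll() {
  return new Promise((resolve) => {
    const order = [];
    const log = (label) => {
      order.push(label);
      console.log(label);
      if (order.length === 6) resolve(order); // all six callbacks have run
    };

    setTimeout(() => log("timeout"), 0);
    setImmediate(() => log("immediate"));
    process.nextTick(() => log("nextTick"));
    Promise.resolve().then(() => log("then"));
    queueMicrotask(() => log("microtask"));
    log("sync");
  });
}

scheduleAll();
```

When this sketch runs as the main CommonJS script, the synchronous line prints first and the nextTick and promise callbacks print before either the timeout or the immediate. The relative order of the timeout and the immediate is not guaranteed when they are scheduled from the main module, so it may differ between runs.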
Now, let's change the example up just a little. Save the following in a separate file named second.js:
The order in which the various console.log elements are scheduled is identical to the first example. However, the difference is that we are scheduling those from within a callback that is invoked after asynchronously reading a file. In your best guess, will the order of the console.log statements be the same when this code is executed? If not, why not? What role does the Node.js event loop play in the timing of these various operations?
Before we start to break it down, let's look at a third example. In this case, simply copy the first example from first.js to first.mjs; that is, create an identical file that differs only in the file extension. The *.mjs file extension identifies the file as an ES module (ESM), rather than the more traditional CommonJS module format. When you run this example using the command node first.mjs, will the ordering of the console.log statements remain the same as with node first.js? If not, why not?
Here is the output of each of the three examples shown side-by-side:
Remember, all three examples schedule the console.log statements in the same order, and the code in first.js and first.mjs is line-for-line identical. Why, then, would each example produce such different results? And what does all of this have to do with promises anyway? The answer to these questions and more lies in understanding the fundamental operation of the event loop.
The Node.js event loop: How it works and why it matters
One of the most important pieces of Node.js documentation that exists isn't even a part of the official Node.js documentation. It is a guide that breaks down the basic operation of the Node.js event loop, published as a separate document on the Node.js website: https://nodejs.org/en/docs/guides/event-loop-timers-and-nexttick/.
In this guide, you will find precise explanations of the event loop phases, the operation of process.nextTick(), a description of how timers work, a description of setImmediate(), an explanation of how asynchronous polling works and more. We consider it to be required reading for all developers who are building applications on top of Node.js. I don't want to duplicate everything that guide says here, but I do want to touch on a couple of fundamentals. Specifically, let's explore this diagram from the guide:
This diagram illustrates the phases of the Node.js event loop.
The event loop itself is really nothing more than a simple do/while loop. Within each iteration of the loop, a number of queues are checked to see if there is any work to do. Each of these queues represents one of the "phases" outlined in the diagram. At the end of each loop iteration, an exit condition is checked to see if another iteration of the loop is needed. If that check determines that there's nothing more for the event loop to work on, the do/while loop exits and the Node.js process terminates.
There are queues for timers and pending callbacks, as well as an idle queue, a prepare queue, a polling phase (where we check to see if there are any pending notifications from the operating system), a check queue and a close callbacks queue. Each of these queues is essentially just a list of function references waiting to be executed. At each phase, the relevant queue is drained by executing its functions one after the other.
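To make the do/while description concrete, here is a toy model written in plain JavaScript. It is not how libuv is implemented; it only mirrors the "one queue per phase, drained in order, with an exit check at the end" description above. The `createLoop` and `enqueue` names and the phase labels are assumptions made for illustration.

```javascript
// Toy model only: one array of callbacks per phase, visited in a fixed order.
const PHASES = ["timers", "pending", "idle", "prepare", "poll", "check", "close"];

function createLoop() {
  const queues = new Map(PHASES.map((phase) => [phase, []]));
  const trace = [];

  return {
    trace,
    // Queue `fn` to run the next time `phase` is reached.
    enqueue(phase, fn) {
      queues.get(phase).push(fn);
    },
    run() {
      let iterations = 0;
      do {
        iterations += 1;
        for (const phase of PHASES) {
          // Take everything queued for this phase so far and run it in order.
          const pending = queues.get(phase).splice(0);
          for (const fn of pending) {
            trace.push(`${iterations}:${phase}`);
            fn();
          }
        }
        // Exit condition: iterate again only while some queue still has work.
      } while (PHASES.some((phase) => queues.get(phase).length > 0));
      return iterations;
    },
  };
}

const loop = createLoop();
loop.enqueue("check", () => loop.enqueue("timers", () => {}));
loop.enqueue("timers", () => {});
loop.enqueue("poll", () => loop.enqueue("check", () => {}));
const iterations = loop.run();
console.log(iterations, loop.trace);
```

In this model, a callback queued for a phase that has already passed has to wait for the next iteration, which is the same reason the ordering of timers and immediates depends on where in the loop they were scheduled.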
So, for instance, whenever you use setTimeout() or setInterval() to schedule a timer in Node.js, a callback in the event loop's timers queue is scheduled to process those timers. Whenever you asynchronously read a file from the underlying operating system, a callback is scheduled during the polling phase of the event loop. Whenever you use setImmediate(), a callback in the check queue is scheduled.
So here's a key question: Where do promises fit in with all of this? The event loop guide was written several years ago and does not include any information about promises and async/await, so some people have trouble understanding how and when promises get executed.
The answer lies in one of the most important, yet least understood, characteristics of the Node.js event loop.
The Node.js event loop is implemented in C by the libuv dependency library. At each phase the callbacks that are triggered are C/C++ functions (what we'll call the "native layer"). When that native layer function is invoked, it may or may not cause JavaScript to be executed. What's important to know, however, is that while that native layer function is executing (for however long it takes to execute) the event loop is stopped. The event loop will not continue until after that native layer function returns. Specifically, this means that while the callback is executing, the event loop cannot do the other things it is meant to do, like trigger timers, poll for operating system events, accept new HTTP requests, and so forth.
If the native layer function does execute JavaScript, it will call into V8 to invoke a JavaScript function. That JavaScript function may end up doing things like creating and resolving promises or calling process.nextTick() to schedule tasks. What is unique about promises and "nextTicks" is that those are stored in two separate queues that are processed independently of the event loop. Those queues are special in that they are drained every time control returns back to a native layer function that has used V8 to invoke JavaScript. This can happen many times per event loop turn and can happen many times during any phase of the event loop.
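A small experiment can make this draining behaviour visible. The sketch below, with a hypothetical `drainDemo` helper, queues two immediates for the same check phase and has the first one schedule a nextTick and a promise callback. It assumes a reasonably recent Node.js release (11 or later), where these queues are drained between individual immediates.

```javascript
function drainDemo() {
  return new Promise((resolve) => {
    const order = [];

    setImmediate(() => {
      order.push("immediate 1");
      // Both of these are queued while JavaScript is running inside
      // the first immediate callback.
      process.nextTick(() => order.push("nextTick"));
      Promise.resolve().then(() => order.push("then"));
    });

    setImmediate(() => {
      order.push("immediate 2");
      resolve(order);
    });
  });
}

drainDemo().then((order) => console.log(order.join(" -> ")));
```

Under that assumption, the nextTick and then callbacks run between the two immediates rather than after both of them, even though all of this happens within a single phase of a single event loop turn.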
The diagram above illustrates what is perhaps the single most important concept that you will ever need to know about the performance of Node.js applications, so study it well.
Look back at the examples given previously — both the puzzle and the three example orders. Can you identify what JavaScript in each of those examples is blocking the Node.js event loop? Think about it carefully because it's actually a bit of a trick question!
The answer? All of it!
The basic rule of thumb is this: Whenever JavaScript is executing in Node.js, the event loop is blocked. The longer your JavaScript takes to run, the longer your event loop will be blocked from doing anything else. The longer your event loop is blocked, the worse your Node.js application performance will be. It's really that simple.
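The rule of thumb can be checked with a few lines of code. The following sketch, using a hypothetical `measureTimerDelay` helper, asks for a timer callback as soon as possible and then keeps JavaScript busy synchronously. The 200 ms figure is an arbitrary choice for illustration.

```javascript
// Schedule a 0 ms timer, then block synchronously for `blockMs` milliseconds.
// Resolves with how long the timer callback actually had to wait.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();

    // Ask for the callback "as soon as possible".
    setTimeout(() => resolve(Date.now() - scheduledAt), 0);

    // Synchronous JavaScript: the event loop cannot reach its timers
    // phase until this loop has finished and control has been returned.
    const until = scheduledAt + blockMs;
    while (Date.now() < until) {
      // busy wait
    }
  });
}

measureTimerDelay(200).then((delay) => {
  console.log(`0 ms timer fired after ${delay} ms`);
});
```

The timer cannot fire until the busy loop is done, so the measured delay is at least as long as the blocking code, no matter how small the requested timeout was.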
So what does this have to do with "broken promises"?
In our experience, the overwhelming majority of cases we see with our customers are applications that allocate thousands upon thousands of synchronously-resolved promises in tight synchronous loops or hot code paths that are repeatedly executed. In one extreme example, for instance, I worked with one customer who created over 30,000 synchronously-resolved promises in a single for-loop that ended up blocking the Node.js event loop for over a minute! The worst part was that only a very small part of that code actually scheduled asynchronous work, meaning that most of the promises created were wasted allocations.
Think about the diagram above and what this code was doing: some bit of JavaScript was being executed, creating thousands of promises in a blocking for loop, most of which were resolved synchronously. That means the thousands of then or catch functions were being put immediately into the microtask queue, which is drained as soon as the for loop exits and control returns to the native layer function. Those thousands of then handlers would each schedule additional then handlers, which would also be put into the microtask queue and drained, and so on. Because most of those were resolved synchronously, all of this would simply cause the event loop to be blocked, waiting for the native layer function to finally return control so that it could move on to the next thing.
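The pattern is easy to reproduce in miniature. The sketch below is not the customer's code; `lookup`, `sumWithPromises`, `sumWithoutPromises` and `starvationDemo` are hypothetical names, and the cached value stands in for any result that is already available synchronously.

```javascript
// Hypothetical lookup that is async "just in case" but never does real I/O.
const cache = new Map([["answer", 42]]);
async function lookup(key) {
  return cache.get(key);
}

// Every iteration allocates promises and only ever yields to the microtask queue.
async function sumWithPromises(count) {
  let total = 0;
  for (let i = 0; i < count; i++) {
    total += await lookup("answer");
  }
  return total;
}

// Same result with no promise allocations at all.
function sumWithoutPromises(count) {
  let total = 0;
  for (let i = 0; i < count; i++) {
    total += cache.get("answer");
  }
  return total;
}

// A 0 ms timer is scheduled first, yet it cannot fire until the whole
// chain of synchronously resolved promises has been drained.
function starvationDemo(count) {
  return new Promise((resolve) => {
    const order = [];
    setTimeout(() => {
      order.push("timer");
      resolve(order);
    }, 0);
    sumWithPromises(count).then(() => order.push("loop done"));
  });
}

starvationDemo(30000).then((order) => console.log(order.join(" -> ")));
```

Although the loop is written with await, none of its iterations gives the event loop a chance to turn, so the timer only runs after the loop has completed. The synchronous version computes the same total without creating any promises.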
Let's return to the three examples running side by side that schedule code in the same order but print different results:
Using what we just explained about the event loop phases and the nextTick and microtasks queues, can you reason about why these examples have such different results?
In the first example, the Node.js native layer invokes the JavaScript in first.js at startup, then starts the event loop only if there is work for the event loop to do. When that initial bit of JavaScript has finished running and control returns back to the native layer, Node.js drains the nextTick and microtask queues, in that order. That is why, in the first column, we see the three nextTick statements followed by the two then statements and microtask statements. After those print, the event loop starts to turn and we move through each phase of the event loop where timers and immediates are invoked.
In the second example, the only thing that the initial bit of JavaScript does is schedule an asynchronous read of the file. None of the other callbacks are scheduled until after the event loop has started. The callback that does the scheduling is invoked during the event loop polling phase. Here we see that as soon as that JavaScript callback is complete, the nextTick and microtask queues are drained, exactly as in the first example. However, notice that, unlike the first, the immediates are executed before the timers. That is because we are still in the middle of the event loop turn. Functions scheduled using setImmediate() are executed after the polling phase and before the start of the next event loop iteration. Timers are always executed at the start of the next event loop iteration.
In the third example, although the code is identical to the first, we see that the order of the statements has diverged even further. That is because the file is processed as an ES module, which means the JavaScript is being executed within the context of a promise after the event loop has already started. Any synchronously pending microtasks end up being drained before the nextTicks in that case, and because the event loop has already started and the JavaScript ends up being run during the polling phase of the event loop, we again see the immediates being printed before the timeouts.
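The microtask-before-nextTick effect can be approximated from a CommonJS file by scheduling from inside a promise callback. This is only an approximation of the ES module case under the explanation above, not a reproduction of it, and `insidePromiseDemo` is a hypothetical helper.

```javascript
function insidePromiseDemo() {
  return new Promise((resolve) => {
    const order = [];

    Promise.resolve().then(() => {
      // This callback runs while the microtask queue is being drained.
      process.nextTick(() => order.push("nextTick"));
      queueMicrotask(() => order.push("microtask"));

      // Report once both of the callbacks above have had a chance to run.
      setImmediate(() => resolve(order));
    });
  });
}

insidePromiseDemo().then((order) => console.log(order.join(" -> ")));
```

Because the scheduling happens while microtasks are already being processed, the newly queued microtask runs before the nextTick callback, which is the reverse of what happens when the same two calls are made from ordinary top-level CommonJS code.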
The timing differences here matter a great deal when we need our Node.js applications to perform well under load. How much they matter also depends on what kind of application you are building. If your Node.js application is intended to be used by just one person locally on their own desktop, reading a file and crunching through some data, then event loop blocks really are not that big of a deal. In fact, in some cases, it is better to block the event loop. However, if your Node.js application is driving an HTTP server that needs to serve thousands of requests, it is critical that you allow your event loop to turn. You need to carefully reason about and design your code so that JavaScript functions run as quickly and as efficiently as possible, and the event loop can continue to turn and move on to processing the next request.
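One common way to let the event loop turn during a long computation is to split the work into slices and hand control back between them. The sketch below shows that general technique under assumed names (`processInChunks`, `handle`, `chunkSize`). It is not taken from the workshop, and the chunk size of 1,000 is arbitrary.

```javascript
// Process `items` in slices, returning control to the event loop between slices.
function processInChunks(items, handle, chunkSize = 1000) {
  return new Promise((resolve, reject) => {
    let index = 0;

    function runChunk() {
      try {
        const end = Math.min(index + chunkSize, items.length);
        for (; index < end; index++) {
          handle(items[index]);
        }
      } catch (err) {
        reject(err);
        return;
      }

      if (index < items.length) {
        // Continue on a later pass through the check phase, so timers
        // and I/O callbacks get a chance to run in between.
        setImmediate(runChunk);
      } else {
        resolve();
      }
    }

    runChunk();
  });
}

// Usage: sum 5,000 numbers in five slices of 1,000.
let sum = 0;
const numbers = Array.from({ length: 5000 }, (_, i) => i);
processInChunks(numbers, (n) => { sum += n; }).then(() => {
  console.log(`sum = ${sum}`);
});
```

The total amount of JavaScript executed is the same, but no single callback holds the event loop for the whole job, so other requests can be served between slices.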
Using promises correctly
In our Broken Promises workshop, we break down all of the various ways we've seen promises abused in real world applications and show how to correct those issues. Some of the areas covered include:
- The dangers of using promises in APIs that do not expect them
- The dangers of creating and resolving promises in loops (and how to do so correctly)
- The correct way to mix events and promises
- The correct way to mix traditional callbacks and promises
- The correct way to cache promises
- Understanding how promises branch and fork and how to handle those correctly
- The dangers of using Promise.race() and Promise.all() and how to handle those correctly
- How to cancel promises correctly using the standard AbortSignal and AbortController APIs
- How to handle errors and promise rejections correctly
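As a small taste of one item on that list, here is a minimal sketch of cancelling a pending promise with AbortController and AbortSignal. The workshop material itself is not shown here. `slowOperation` and `runWithCancel` are hypothetical names, and the promise-based setTimeout from node:timers/promises stands in for any API that accepts a signal option.

```javascript
const { setTimeout: sleep } = require("node:timers/promises");

// Hypothetical slow operation that honours an AbortSignal.
async function slowOperation(signal) {
  await sleep(5000, undefined, { signal });
  return "finished";
}

async function runWithCancel() {
  const controller = new AbortController();
  const pending = slowOperation(controller.signal);

  // Cancel straight away: the pending sleep rejects with an AbortError
  // instead of keeping its timer alive for five seconds.
  controller.abort();

  try {
    return await pending;
  } catch (err) {
    if (err.name === "AbortError") return "cancelled";
    throw err; // anything else is a genuine failure
  }
}

runWithCancel().then((result) => console.log(result));
```

The design point is that cancellation is requested through the signal and observed as a rejection, so the caller decides explicitly how an aborted operation differs from a failed one.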
We frequently present an abridged three-hour version of the workshop at events and conferences. For the full picture, we offer companies a more expansive three-day workshop that covers not only promises but also our entire methodology for diagnosing and fixing Node.js performance issues.
If you feel your team would benefit from having a deeper understanding of promises and Node.js performance in general, please reach out for details on pricing and availability of the full Broken Promises and Node.js Performance Workshop.