Workers in Node.js: How to write a sudoku-solving server


Node.js Core Maintainer
Node.js, Open Source | 12th November 2019

One awesome thing about Node.js 12, which became the main Long Term Support release line in October 2019, is stable support for Worker threads.

Workers are originally a feature of the Web, where they have enabled developers to run background tasks without blocking the browser’s rendering thread for a long time. Similarly, in Node.js, running synchronous JavaScript code for a long time is considered bad practice, as it keeps the event loop from handling I/O, such as other HTTP requests that are waiting.

Traditionally, the solution to the issue of CPU-intensive tasks in Node.js has been spawning multiple child processes that handle requests in parallel, managed through process managers like pm2 or the built-in cluster module of Node.js. Workers do not replace this model, but they do provide an alternative for use cases in which easier communication between the different tasks is desirable. For example, they provide ways to share or transfer typed arrays when that is helpful.

Those who have used Web Workers will find a familiar API in Node.js. One difference is that Node.js does not have an equivalent of EventTarget, and instead Workers use the more customary EventEmitter API – for example, messages from inside the Worker are received through `worker.on('message', callback);` rather than `worker.onmessage = callback;`.
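
As a rough illustration of the EventEmitter-style API, the sketch below starts a Worker from an inline script and listens for its reply with `worker.once('message', ...)`. The inline script, the `eval: true` option and the upper-casing logic are choices made for this example only and are not part of the sudoku server.

```javascript
'use strict';
const { Worker } = require('worker_threads');

// Inline worker script for illustration only: echoes messages in upper case.
const workerSource = `
  const { parentPort } = require('worker_threads');
  parentPort.on('message', (text) => {
    parentPort.postMessage(text.toUpperCase());
  });
`;

// Resolves with the first message the worker sends back.
const reply = new Promise((resolve, reject) => {
  const worker = new Worker(workerSource, { eval: true });
  worker.once('message', (message) => {
    resolve(message);
    worker.terminate();  // Stop the worker so the process can exit
  });
  worker.once('error', reject);
  worker.postMessage('hello from the main thread');
});

reply.then((message) => console.log(message));
```

Because `Worker` is an EventEmitter, the usual `.on()`, `.once()` and `.off()` methods are available for the `'message'`, `'error'` and `'exit'` events.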

So let’s actually put together a concrete example: Let’s say we want to provide an API that allows developers to send unsolved sudokus to it and get back solved ones. Solving a sudoku doesn’t take a ton of time – something on the order of 40 ms on my laptop – but it’s enough to say that maybe we don’t want everything else in the process to wait for it to finish.

The full code used in this blog post is also available as a GitHub repository at https://github.com/addaleax/workers-sudoku.

Basic Communication

Assume we’ve already written a sudoku solver that takes an Array of 81 fields, with values 1–9 inside the array for fields with a fixed value and 0 where we don’t know the number in the field, and we want that solver to run on its own thread. Here’s what that could look like:

'use strict';
const { parentPort } = require('worker_threads');
// See the GitHub repo for the full sudoku solving code
const { solveSudoku } = require('./solve-sudoku.js');

// parentPort is the Worker's way of communicating with the parent, similar to
// self.onmessage and self.postMessage() in Web Workers.
parentPort.on('message', (sudokuData) => {
  const solution = solveSudoku(sudokuData);
  parentPort.postMessage(solution);
});
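
The worker above relies on the `solveSudoku()` function from the repository. For readers who want something to experiment with right away, here is a minimal backtracking sketch that could stand in for it. This is not the repository's implementation: the helper names `isAllowed` and `solveInPlace`, and the choice to return `null` for an unsolvable board, are assumptions made for this example. It also assumes that the given numbers do not contradict each other.

```javascript
'use strict';

// Hypothetical stand-in for solve-sudoku.js, using plain backtracking.
// Returns true if `value` can be placed at `index` without a conflict.
function isAllowed(board, index, value) {
  const row = Math.floor(index / 9);
  const col = index % 9;
  const boxRow = row - (row % 3);
  const boxCol = col - (col % 3);
  for (let i = 0; i < 9; i++) {
    if (board[row * 9 + i] === value) return false;  // Same row
    if (board[i * 9 + col] === value) return false;  // Same column
    const r = boxRow + Math.floor(i / 3);
    const c = boxCol + (i % 3);
    if (board[r * 9 + c] === value) return false;    // Same 3x3 box
  }
  return true;
}

// Fills the first empty field (0) with each candidate and recurses.
function solveInPlace(board) {
  const index = board.indexOf(0);
  if (index === -1) return true;  // No empty fields left: solved
  for (let value = 1; value <= 9; value++) {
    if (!isAllowed(board, index, value)) continue;
    board[index] = value;
    if (solveInPlace(board)) return true;
    board[index] = 0;  // Undo and try the next candidate
  }
  return false;
}

// Takes 81 fields (0 = unknown) and returns a solved copy, or null.
function solveSudoku(sudokuData) {
  const board = Uint8Array.from(sudokuData);
  return solveInPlace(board) ? board : null;
}

module.exports = { solveSudoku };
```

Saved as solve-sudoku.js next to the worker, this would satisfy the `require('./solve-sudoku.js')` call above. A puzzle with several valid solutions simply yields the first one the search finds.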

In browsers, Workers run in very different environments with access to far fewer APIs than the main thread, but in Node.js, Workers behave much closer to standard Node.js scripts; for example, `require()` works the same way it always does in Node.js, and all built-in modules are available. Node.js also doesn’t add special methods to the global object for Workers, so for communication with the main thread, for example, the `parentPort` object needs to be loaded from the worker_threads module. It is an instance of `MessagePort`, which in Node.js has the `.on('message')` and `.postMessage()` APIs that can be used as shown above.
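
To see the `MessagePort` API in isolation, the following sketch creates a connected pair of ports with `MessageChannel` and passes a message from one to the other, all within a single thread. `parentPort` behaves like one end of such a pair, with the Worker object acting as the other end. The message contents are made up for this example.

```javascript
'use strict';
const { MessageChannel, MessagePort } = require('worker_threads');

// A MessageChannel is a pair of connected MessagePort objects.
const { port1, port2 } = new MessageChannel();
console.log(port1 instanceof MessagePort);  // true

// Resolves with the first message that arrives on port2.
const received = new Promise((resolve) => {
  port2.once('message', (message) => {
    resolve(message);
    port1.close();  // Closing one end shuts down the whole channel
  });
});

port1.postMessage({ cells: 81 });
received.then((message) => console.log(message.cells));
```

Messages are delivered asynchronously, which is why the result is collected in a Promise here.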

Now, as a second step, we need to talk to that Worker thread from our main application somehow. Assuming that we saved the code above as worker.js, here’s what that could look like:

'use strict';
const { Worker } = require('worker_threads');

// Example Sudoku based on the one on Wikipedia’s 'Sudoku' page:
const sudoku = new Uint8Array([
  5, 3, 0, 0, 7, 0, 0, 0, 0,
  6, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 9, 8, 0, 0, 0, 0, 6, 0,
  8, 0, 0, 0, 6, 0, 0, 0, 3,
  4, 0, 0, 8, 0, 3, 0, 0, 1,
  7, 0, 0, 0, 2, 0, 0, 0, 6,
  0, 6, 0, 0, 0, 0, 2, 8, 0,
  0, 0, 0, 4, 1, 9, 0, 0, 5,
  0, 0, 0, 0, 8, 0, 0, 7, 9,
]);

const worker = new Worker('./worker.js');
worker.postMessage(sudoku);
worker.once('message', (solution) => {
  console.log(solution);
  // Let the Node.js main thread exit, even though the Worker
  // is still running:
  worker.unref();
});
 

This prints 

$ node local-test.js 
Uint8Array [
  5, 3, 2, 1, 7, 6, 9, 4, 8, 6, 7, 4,
  3, 9, 8, 5, 1, 2, 1, 9, 8, 2, 4, 5,
  3, 6, 7, 8, 5, 9, 7, 6, 1, 4, 2, 3,
  4, 2, 6, 8, 5, 3, 7, 9, 1, 7, 1, 3,
  9, 2, 4, 8, 5, 6, 9, 6, 1, 5, 3, 7,
  2, 8, 4, 2, 8, 7, 4, 1, 9, 6, 3, 5,
  3, 4, 5, 6, 8, 2, 1, 7, 9
]

to the console, so it actually works. Yay! Again, the `.on('message')` and `.postMessage()` APIs are also available on the Worker object itself and represent the other end of the communication channel to which `parentPort` belongs.

We’re using a `Uint8Array` here even though an `Array` would also work, because it’s a bit more efficient to pass typed arrays between threads than regular arrays; more on that later.

Worker Pooling

This works, but it turns out that there are two sides to Node’s Workers being as powerful as they are: because they are basically full-featured Node.js instances, starting one takes a few milliseconds each time. In practice, it’s therefore better to keep a few Worker instances readily available in a so-called Worker pool to answer requests. There are a number of npm packages that implement worker pools, and while it is recommended to use them in practice, for this example we’ll implement a worker pool ourselves. (This also means we’ll skip implementing advanced features such as proper tracking of asynchronous operations; npm packages for worker thread pools should implement those.)
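
To get a feel for the startup cost on your own machine, a rough measurement along the following lines may help. The helper name `measureWorkerStartup` is made up for this sketch. It times how long an empty inline worker takes to emit its `'online'` event, so the number it prints depends heavily on hardware and Node.js version.

```javascript
'use strict';
const { Worker } = require('worker_threads');
const { performance } = require('perf_hooks');

// Measures the time from creating a worker until it starts executing code.
function measureWorkerStartup() {
  return new Promise((resolve, reject) => {
    const start = performance.now();
    // The worker script is just a comment, so the worker exits right away.
    const worker = new Worker('// intentionally empty', { eval: true });
    worker.once('online', () => resolve(performance.now() - start));
    worker.once('error', reject);
  });
}

measureWorkerStartup().then((ms) => {
  console.log(`Worker startup took about ${ms.toFixed(1)} ms`);
});
```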

So let’s put together a real-world HTTP server that accepts HTTP requests with a JSON payload containing an unsolved sudoku and returns the solved one in the response. We’ll use a Worker pool with a fixed size, meaning that when we need to run a task, we take a Worker from the pool when one is available and otherwise wait until one becomes available.

'use strict';
const http = require('http');
const { Worker } = require('worker_threads');

const workerPool = [  // Start a pool of four workers
  new Worker('./worker.js'),
  new Worker('./worker.js'),
  new Worker('./worker.js'),
  new Worker('./worker.js'),
];
const waiting = [];

http.createServer((req, res) => {
  let body = '';
  req.setEncoding('utf8');  // Receive strings rather than binary data
  req.on('data', chunk => body += chunk);
  req.on('end', () => {
    let dataAsUint8Array;
    try {
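      const parsed = JSON.parse(body);
      // Require exactly 81 = 9*9 fields so that we are not DoS'ed through
      // overly long input data (or through a single large number, which
      // would make `new Uint8Array()` allocate that many bytes).
      if (!Array.isArray(parsed) || parsed.length !== 81)
        throw new TypeError('Expected an array of 81 numbers');
      dataAsUint8Array = new Uint8Array(parsed);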
    } catch (err) {
      res.writeHead(400);
      res.end(`Failed to parse body: ${err}`);
      return;
    }

    res.writeHead(200, {
      'Content-Type': 'application/json'
    });
    if (workerPool.length > 0) {
      handleRequest(res, dataAsUint8Array, workerPool.shift());
    } else {
      waiting.push((worker) => handleRequest(res, dataAsUint8Array, worker));
    }
  });
}).listen(3000);

function handleRequest(res, sudokuData, worker) {
  worker.postMessage(sudokuData);
  worker.once('message', (solutionData) => {
    res.end(JSON.stringify([...solutionData]));

    // Put the Worker back in the queue.
    if (waiting.length > 0)
      waiting.shift()(worker);
    else
      workerPool.push(worker);
  });
}

Running `node server.js` spins up an HTTP server on port 3000:

$ curl -d '[5,3,0,0,7,0,0,0,0,6,0,0,0,0,0,0,0,0,0,9,8,0,0,0,0,6,0,8,0,0,0,6,0,0,0,3,4,0,0,8,0,3,0,0,1,7,0,0,0,2,0,0,0,6,0,6,0,0,0,0,2,8,0,0,0,0,4,1,9,0,0,5,0,0,0,0,8,0,0,7,9]' http://localhost:3000/

[5,3,2,1,7,6,9,4,8,6,7,4,3,9,8,5,1,2,1,9,8,2,4,5,3,6,7,8,5,9,7,6,1,4,2,3,4,2,6,8,5,3,7,9,1,7,1,3,9,2,4,8,5,6,9,6,1,5,3,7,2,8,4,2,8,7,4,1,9,6,3,5,3,4,5,6,8,2,1,7,9]

So we can actually send in an unsolved sudoku and get the right solution back!
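
The pooling logic in the server can also be wrapped in a Promise-returning helper, which makes it easier to reuse outside of an HTTP handler. The following is a minimal sketch under a few assumptions: the inline worker (which doubles each byte) stands in for worker.js, the names `runTask`, `idleWorkers` and `allWorkers` are made up for this example, and `'error'` events from crashed workers are deliberately not handled.

```javascript
'use strict';
const { Worker } = require('worker_threads');

// Hypothetical stand-in for worker.js: doubles every byte it receives.
const workerSource = `
  const { parentPort } = require('worker_threads');
  parentPort.on('message', (data) => {
    parentPort.postMessage(data.map((value) => value * 2));
  });
`;

const POOL_SIZE = 2;
const allWorkers = [];
for (let i = 0; i < POOL_SIZE; i++)
  allWorkers.push(new Worker(workerSource, { eval: true }));
const idleWorkers = [...allWorkers];
const waiting = [];

// Resolves with the worker's reply once a worker has processed `data`.
function runTask(data) {
  return new Promise((resolve) => {
    const start = (worker) => {
      worker.once('message', (result) => {
        resolve(result);
        // Hand the worker to the next queued task, or mark it idle again.
        if (waiting.length > 0) waiting.shift()(worker);
        else idleWorkers.push(worker);
      });
      worker.postMessage(data);
    };
    if (idleWorkers.length > 0) start(idleWorkers.shift());
    else waiting.push(start);
  });
}

// Usage: four tasks share two workers, then the pool is shut down.
const results = Promise.all(
  [1, 2, 3, 4].map((n) => runTask(new Uint8Array([n, n])))
).then((replies) => {
  allWorkers.forEach((worker) => worker.terminate());
  return replies.map((reply) => Array.from(reply));
});

results.then((replies) => console.log(replies));
```

A production pool would additionally need to replace workers that exit or throw, which is one of the reasons to prefer an existing npm package.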

Transferring data

To optimize this application further, we can replace

worker.postMessage(sudokuData);

with 

worker.postMessage(sudokuData, [sudokuData.buffer]);

That is, we can leverage the fact that we’ve stored the `sudokuData` as a typed array, and instead of copying its contents we can transfer it, by adding the underlying `ArrayBuffer` to the “transfer list”, a list of objects that are moved to the receiving end of the communication channel rather than copied. Currently, only `ArrayBuffer` and `MessagePort` are supported as transferables. Once a transferable has been posted using `.postMessage()`, it is no longer usable in the sending thread – for example, `sudokuData` will appear as an empty array after the call.
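
The effect of the transfer list can be observed without a Worker by sending data through a `MessageChannel` within one thread. This sketch uses `receiveMessageOnPort()` from worker_threads to read the message synchronously; the fill value 7 is arbitrary and only serves to make the data recognizable.

```javascript
'use strict';
const { MessageChannel, receiveMessageOnPort } = require('worker_threads');

const { port1, port2 } = new MessageChannel();
const sudokuData = new Uint8Array(81).fill(7);

// Listing the underlying ArrayBuffer in the transfer list moves it
// to the receiving side instead of copying it.
port1.postMessage(sudokuData, [sudokuData.buffer]);

// The sender's view is now detached and appears empty.
console.log(sudokuData.length);  // 0

// receiveMessageOnPort() synchronously takes the pending message off the port.
const { message } = receiveMessageOnPort(port2);
console.log(message.length);  // 81
console.log(message[0]);      // 7

port1.close();
```

In the worker, the same idea would apply to the reply, for example `parentPort.postMessage(solution, [solution.buffer])`, assuming the solver returns a typed array that owns its buffer.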

By applying this optimization in the server and the worker code, we can almost entirely get rid of data copying! While this optimization may not save a lot of time when sending 81 bytes back and forth for a sudoku puzzle, for larger objects it becomes quite noticeable.

Another trick that achieves a similar thing is to actually share memory rather than moving it between threads; for that, we would use code similar to the following:

const sudokuDataShared =
  new Uint8Array(new SharedArrayBuffer(sudokuData.length));

// Copy into the new Uint8Array
sudokuDataShared.set(sudokuData); 

worker.postMessage(sudokuDataShared);

This way, both threads would be able to access the same typed array at the same time – this is a very powerful feature, although it can be tricky to get concurrent data access right.
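
As a small demonstration of shared memory, the sketch below lets a worker write into a typed array backed by a `SharedArrayBuffer`, and the main thread then reads that write through its own view. The inline worker and the value 42 are made up for this example. Waiting for the worker's `'done'` message before reading sidesteps the concurrency issues mentioned above; code that reads and writes at the same time would need something like `Atomics`.

```javascript
'use strict';
const { Worker } = require('worker_threads');

// Hypothetical worker for this sketch: writes into the array it receives.
const workerSource = `
  const { parentPort } = require('worker_threads');
  parentPort.on('message', (shared) => {
    shared[0] = 42;                  // Visible to the main thread as well
    parentPort.postMessage('done');  // Only a signal, no data is sent back
  });
`;

const sudokuDataShared = new Uint8Array(new SharedArrayBuffer(81));

// Resolves with the first field as seen by the main thread.
const firstCell = new Promise((resolve, reject) => {
  const worker = new Worker(workerSource, { eval: true });
  worker.once('message', () => {
    // Read the worker's write through the main thread's own view.
    resolve(sudokuDataShared[0]);
    worker.terminate();
  });
  worker.once('error', reject);
  worker.postMessage(sudokuDataShared);
});

firstCell.then((value) => console.log(value));  // 42
```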

 

Anna is a seasoned software developer and maintainer of Node.js Core, known for major Node.js features such as Workers, Brotli support and strong involvement in the HTTP/2 effort among many others. She is also part of the NearForm Research Team. Apart from over a decade and a half of software development experience, she has a strong mathematical background and is passionate about sharing what she learns in her work with the wider community through speaking at conferences and meetups. 

 
