
Welcome to pino@7.0.0 - and the era of worker_thread transport

Pino's latest feature, pino.transport, makes it possible to write transports using worker_threads.

We have just released pino@7.0.0 with a brand new feature, pino.transport(). The pinnacle of years of innovation in Node.js and JavaScript, this new capability is made possible by worker_threads, WeakRef, and FinalizationRegistry. Before we get into this new pino.transport(), let's look at the history of Pino, the fastest JSON logger for Node.js.

The history of Pino

Back in 2016, when David Mark Clements and I announced Pino to the world at Node Summit, we made a bold claim: your application should just log to stdout or to a file. The fundamental reason for this claim is to keep the main event loop of the application free from sending data out to third-party services.

A Node.js server can be viewed as a black box that generates logs from the network traffic it receives and sends them to a downstream receiver. This receiver must be able to handle as many logs as the application produces in a given time unit; otherwise, it will start accumulating logs in memory, leading to additional load on the event loop and memory growth that resembles a leak.

In 2016, other loggers ran their transports within the main Node.js process, ballooning memory usage and slowing everything down. We recommended instead running transports as separate Unix processes that would receive the logs for further processing. However, this made logging more complex, requiring more work to wire up custom logging scenarios.

Do we still need separate processes? Node.js now has worker_threads, and we can use them! The idea for this work came to my friend Mark Marron a few years ago, when he implemented https://github.com/mrkmarron/logpp . Unfortunately, its reliance on an N-API/C++ module to interact with low-level threading primitives made that module difficult to adopt in the wild. The approach was published in SIGPLAN Notices: https://dl.acm.org/doi/abs/10.1145/3393673.3276952 .

pino.transport() is here

Today we are launching pino@7 with the new pino.transport(), which allows us to write transports using worker_threads. The syntax for a transport is simple:

JavaScript
import fs from 'fs'
import { once } from 'events'

// A transport module default-exports an async function that receives
// the options passed to pino.transport() and returns a writable stream.
export default async (options) => {
  const stream = fs.createWriteStream(options.destination)
  // waiting for the 'open' event is not strictly necessary
  await once(stream, 'open')
  return stream
}

How would you use this?

JavaScript
import pino from 'pino'
const transport = pino.transport({
  target: '/absolute/path/to/my-transport.mjs'
})
pino(transport)

It is possible to log to multiple transports at the same time:

JavaScript
import pino from 'pino'
const transport = pino.transport({
  targets: [
    { target: '/absolute/path/to/my-transport.mjs', level: 'error' },
    { target: 'some-file-transport', options: { destination: '/dev/null' } }
  ]
})
pino(transport)

We can even create a pipeline, similar to a Unix pipe:

JavaScript
import pino from 'pino'
const transport = pino.transport({
  pipeline: [
    { target: '/absolute/path/to/my-transform.mjs' },
    { target: 'pino-pretty' } // pino-pretty is now a transport
  ]
})
pino(transport)
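
Each stage of a pipeline except the last is a transform: a transport that receives parsed log objects, modifies them, and pushes them downstream. As a reference point, here is a minimal sketch of what my-transform.mjs could look like, built with pino-abstract-transport, the helper module for writing transports; the service property it adds is purely illustrative:

JavaScript
import build from 'pino-abstract-transport'
import { Transform, pipeline } from 'stream'

export default async function (options) {
  return build(function (source) {
    // source emits one parsed log object per log line
    const transform = new Transform({
      objectMode: true,
      autoDestroy: true,
      transform (chunk, enc, cb) {
        chunk.service = 'my-app' // illustrative extra property
        // re-serialize so the next stage (e.g. pino-pretty) receives NDJSON
        cb(null, JSON.stringify(chunk) + '\n')
      }
    })
    pipeline(source, transform, () => {})
    return transform
  }, { enablePipelining: true })
}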

Finally, it’s possible to create the transport inline when creating the pino instance:

JavaScript
import pino from 'pino'
const logger = pino({
  transport: {
    pipeline: [
      { target: '/absolute/path/to/my-transform.mjs' },
      { target: 'pino-pretty' } // pino-pretty is now a transport
    ]
  }
})
logger.info('hello pino@7!')

I talked about the design and evolution of pino.transport() at JSNation last June in "Multithreaded Logging with Pino", and I will be presenting an updated version of that talk at NodeConf Remote 2021.

Solving the exit problem

The most significant issue with asynchronous logging is handling the “exit” case. How do we make sure that everything is fully flushed before exiting the process? There are a few key events from Node.js:

  • process.on('beforeExit') is emitted when the event loop has nothing else to run; this is the “normal” shutdown.
  • process.on('exit') is emitted when process.exit() is called; at that point the event loop is stopped, and only synchronous actions can be completed before the process exits.

In the past, we recommended using pino.final() to install those handlers and control process shutdown; pino.final() wraps around pino.destination() and makes sure flushSync() is called, ensuring that all logs are flushed out. Unfortunately, we could not adopt this solution for the new transport system because we wanted to support fully asynchronous logging. Workers provide a good solution to this problem: during process.on('exit') we can synchronously wait for the worker_thread to complete its execution and for all data to be flushed correctly.
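
For context, here is a minimal sketch of that pre-pino@7 pattern, flushing a pino.destination() synchronously as the process winds down (the file path and setup are illustrative):

JavaScript
import pino from 'pino'

// asynchronous destination: log lines are buffered in memory
const dest = pino.destination({ dest: '/tmp/app.log', sync: false })
const logger = pino(dest)

// flush the buffer before the process goes away
process.on('beforeExit', () => dest.flushSync())
process.on('exit', () => dest.flushSync())

logger.info('this line survives an abrupt process.exit()')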

On a final note, installing those handlers adds global state that could leak memory significantly if multiple pino instances are created. To solve this problem, we created https://github.com/mcollina/on-exit-leak-free , which uses WeakRef and FinalizationRegistry.
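
The core idea, sketched below under simplifying assumptions (this is not the library's actual API, and the real implementation handles more edge cases): the exit handler reaches each destination only through a WeakRef, so the handler itself never keeps the destination alive, and a FinalizationRegistry drops the bookkeeping once the destination is garbage collected.

JavaScript
// a sketch of the idea behind on-exit-leak-free, not its actual API
const refs = new Set()

process.on('exit', () => {
  for (const ref of refs) {
    // deref() returns undefined once the destination has been collected
    ref.deref()?.flushSync()
  }
})

// when a destination is garbage collected, forget its WeakRef
const registry = new FinalizationRegistry((ref) => refs.delete(ref))

// hypothetical helper: dest is e.g. a pino.destination() stream
function registerForExit (dest) {
  const ref = new WeakRef(dest)
  refs.add(ref)
  registry.register(dest, ref)
}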

Benchmarks

In order to measure the impact of the new transport system, we used pino@7 inside a Fastify application, measuring the 99th percentile of request latency and the median requests per second. The synchronous option is the default; note that asynchronous logging has a few caveats, as described in Pino's documentation on asynchronous logging.

We measured the following scenarios (a sketch of the configurations follows the list):

  • sync to file uses pino.destination() with the sync: true option to write to a file.
  • async to file uses pino.destination() with the sync: false option to write to a file.
  • transport to file uses pino.transport() to write to a file.
  • sync to elastic uses pino.destination() with the sync: true option; the process is connected with a Unix pipe to pino-elasticsearch.
  • async to elastic uses pino.destination() with the sync: false option; the process is connected with a Unix pipe to pino-elasticsearch.
  • transport to elastic uses pino.transport() with pino-elasticsearch.
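
Roughly, this is how the scenarios are wired up; the file paths are illustrative, pino/file is the file transport bundled with pino@7, and the pino-elasticsearch options shown are assumptions based on its README:

JavaScript
import pino from 'pino'

// "sync to file": every line is written synchronously on the main thread
const syncToFile = pino(pino.destination({ dest: '/tmp/app.log', sync: true }))

// "async to file": lines are buffered and flushed asynchronously, still on the main thread
const asyncToFile = pino(pino.destination({ dest: '/tmp/app.log', sync: false }))

// "transport to file": the actual writing happens in a worker thread
const transportToFile = pino(pino.transport({
  target: 'pino/file',
  options: { destination: '/tmp/app.log' }
}))

// "transport to elastic": pino-elasticsearch running as a worker transport
const transportToElastic = pino(pino.transport({
  target: 'pino-elasticsearch',
  options: { node: 'http://localhost:9200' } // assumed option name
}))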

The first observation is that asynchronous logging should be preferred when sending data to Elasticsearch. While not plotted, asynchronous logging also had significantly higher memory usage, as more data was buffered.

Overall, the overhead of shipping data to Elasticsearch is quite minimal using either an external process (with async logging) or the new transport system. The new system offers a significantly improved developer experience, and we recommend that most users switch to it.

Thanks

This release would not have been possible without the support from NearForm, which sponsored my time in developing the feature and the work of Manuel Spigolon in updating all the known transports. Warm thanks are also due to James Sumners, who tirelessly reviewed all the changes, and to David Mark Clements, who developed the initial implementation of the thread-based transport.
