10 tips for coding with Node.js #4: reproduce core callback signatures

By: David Mark Clements

Welcome to part four of the Ten Tips decalogy. Here are the ten tips:

  1. Develop debugging techniques
  2. How to avail and beware of the ecosystem
  3. How to know when (not) to throw
  4. Reproduce core callback signatures
  5. Use streams
  6. Break out blockers
  7. Deprioritize synchronous code optimizations
  8. Use and create small single-purpose modules
  9. Prepare for scale with microservices
  10. Expect to fail, recover quickly

In this post, we’ll be investigating the various ways of managing control flow with callbacks. The Node.js-style callback (a.k.a the error-first, a.k.a the errback) is the main recommended approach. While higher-level abstractions are certainly worth considering, it’s important to have a firm grasp of the trade-offs.

Emulating native APIs

When it comes to JavaScript and the native environments that it exists in, reproducing patterns used by native APIs is usually not encouraged, especially not patterns in the browser.

For instance, the onEvent style from DOM Level 0 (e.g. onClick, onMouseOver, etc.) isn’t a great API to imitate. Event-methods can easily be overwritten or gratuitously modified by third parties, and registering multiple events requires a custom user-land implementation of an event stack. A better approach to event management is an event emitter.

Even when new APIs are added to the browser, that shouldn’t be taken as a signal to adopt their patterns. For instance, the comparatively recent WebSockets API replicated this approach (onopen, onmessage).

An example of a Node.js API that we shouldn’t reproduce is the simulated virtual function approach implemented by Node core streams, where the stream creator must sub-class and then supply _read, _write or _transform methods. This leads to class noise and requires extensive explanation via documentation. A better approach is the revealing constructor pattern.

Continuation passing style

However, there is one pattern in Node.js core that absolutely should be reproduced, which is the way core APIs use callbacks.

The humble callback is an implementation of continuation passing style programming (CPS). A continuation is essentially an operational building block: a flow-control primitive.

There are two ways to get data out of a function (without mutating external state, that is): returning a value or passing a value through a continuation (through a callback).

//returning a value
function returnSquare(n) {
  return n * n;
}

//passing a value through a continuation
function cpsSquare(n, cb) {
  cb(n * n);
}

Unlike returning from a function, callbacks allow us to control when the data is released from the function:

function asyncSquare(n, cb) {
  setTimeout(cb, 1000, n * n);
}

Callback arity

Since callbacks are functions, we can pass multiple values in as arguments:

//not a great design..., but:
function squareAndCube(n, cb) {
  var sq = n * n;
  cb(sq, sq * n);
}

squareAndCube(10, console.log.bind(console, '10² = %d, 10³ = %d'));

Using multiple parameters for return values is typically a poor design, for the same reason that functions with lots of parameters are a bad idea: it demands that humans associate values to indices instead of namespaces.

Emulating named parameters with an object is a better way to communicate multiple values (for both returns and callbacks):

//better design, but what about errors?
function squareAndCube(n, cb) {
  var sq = n * n;
  cb({square: sq, cube: sq * n});
}

Errors and values

Whilst using multiple callback arguments for multiple values is considered a poor design choice, using two arguments to communicate a single value and error state turns out to be a powerful abstraction.

By definition a continuation allows us to pass state on. Passing errors through callbacks delegates handling to a consumer. This is perfect for scenarios where the severity of an error is determined by its surrounding context – which is almost all operational errors, any errors that involve user input, and some forms of developer errors.

function errbackSquareAndCube(n, cb) {
  if (typeof n !== 'number' || Number.isNaN(n)) {
    return cb(Error('n must be a number!'));
  }
  var sq = n * n;
  cb(null, {square: sq, cube: sq * n});
}

errbackSquareAndCube(userInputNum, 
  function processResults(err, results) { 
    if (err) { return displayUserError(err); }
    displayAnswers(results);
  });

Using continuations to send both error state and return value also has significant asynchronous advantages, because it’s impossible to catch a throw from outside an asynchronous operation. See How to know when (not) to throw for an in-depth explanation.
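A quick sketch makes the problem concrete: by the time the asynchronous operation fails, the try/catch frame that wrapped the call has long since been popped off the stack (asyncOp is a hypothetical operation):

```javascript
var caught = false;

// hypothetical async operation that fails on a later tick
function asyncOp(cb) {
  setTimeout(function () { cb(Error('boom')); }, 10);
}

try {
  asyncOp(function (err) {
    // the continuation is the only reliable place to handle err
    if (err) { console.error('handled in callback:', err.message); }
  });
} catch (err) {
  caught = true; // never reached: asyncOp returned before the timer fired
}
```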

Error first

We could put the error last, but there are a couple of advantages to putting it first. Primarily, it’s about inducing positive developer habits, one of the hardest yet most practical and cost-effective of all design goals.

Placing the error parameter between the developer and the result is a constant reminder to the developer to handle and propagate errors. If the error parameter came last, it could easily be ignored.

It also removes any need to define a value if there is an error, which can act as a kind of fail-safe if the error isn’t handled. When an expected value is undefined, it usually isn’t long before the process throws (upon attempting to call undefined, or to look up a property on undefined) or generates unexpected output due to a NaN. At least this protects us from more nefarious bugs like memory leaks or security issues (although a NaN could feasibly create a security hole…).

Core patterns

The error-first callback, sometimes called the errback, was chosen by core Node.js API developers early on, and Node.js was the first project to use the pattern in a significant way. Many core asynchronous operations use the errback signature:

var fs = require('fs');
fs.readFile('./meta.yaml', function outputFile(err, buffer) {
  if (err) { return console.error('oh noes'); }
  console.log(buffer.toString());
});

Synchronous callbacks

Whilst some of our examples are in fact synchronous operations, the core API only ever uses callbacks for asynchronous operations. However, it may be worth considering using continuations for all forms. This has two advantages.

First, it allows a function to seamlessly evolve from synchronous to asynchronous without refactoring. Secondly, it evades inherent problems with throw and try/catch (see How to know when (not) to throw). The downside is the additional boilerplate for synchronous functions, but whilst inconvenient, this may be a worthy trade-off if such a discipline can be enforced across a team.
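For instance (lookupUser is a hypothetical function), a continuation-based signature lets the body go asynchronous later without touching any call sites:

```javascript
// version 1: synchronous body, but already using a continuation
function lookupUser(id, cb) {
  var users = {1: 'alice', 2: 'bob'}; // in-memory lookup table
  if (!(id in users)) { return cb(Error('no such user')); }
  cb(null, users[id]);
}

// version 2: the body becomes asynchronous (simulated with a timer),
// yet every existing caller keeps working unchanged
function lookupUserAsync(id, cb) {
  setTimeout(function () {
    var users = {1: 'alice', 2: 'bob'};
    if (!(id in users)) { return cb(Error('no such user')); }
    cb(null, users[id]);
  }, 10);
}

// identical call-site for both versions:
// lookupUser(1, function (err, name) { /* ... */ });
```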

Hell

Continuations are meant to be strung together. Replicating the form of core callback signatures essentially implements a consistent control-flow protocol for an application, allowing for composable, encapsulated asynchronous (and synchronous) logic.

However it can lead to code readability and quality issues when used naively:

function findPetsForHuman(id, cb) {

  getPerson({query: id}, function (err, person) {
    if (err) { 
      cb(err); 
    } else {

      findPets({
        species: person.preferences.species,
        breeds: person.preferences.breeds
      }, function (err, pets) {
        if (err) { 
          cb(err); 
        } else {
          filterPets({
            criteria: person.profile, 
            pets: pets
          }, function (err, matches) {
            if (err) { 
              cb(err); 
            } else {
              if (matches.length > 10) {
                filterPets({
                  criteria: person.preferences.niceToHave,
                  pets: matches,
                  max: 10
                }, function (err, matches) {
                  checkAvailability(matches, function (err, availablePets) {
                    if (err) { 
                      cb(err); 
                    } else {
                      cb(null, availablePets);
                    }
                  });
                });
              } else {
                checkAvailability(matches, function (err, availablePets) {
                  if (err) { 
                    cb(err); 
                  } else {
                    cb(null, availablePets);
                  }

                });
              }
            }
          });
        }

      });
  }

  });

}

The above example is comparatively mild compared to some occurrences in the wild.

As requirements become more complex, heavy use of callbacks leads to rightward syntax creep, otherwise known as the pyramid of doom or callback hell. However, callbacks per se are not the source of this problem. It’s fundamentally a code organization issue which is easily fixed by… organizing the code:

function findPetsForHuman(id, cb) {
  var person;

  getPerson({query: id}, function petMatch(err, result) {
    if (err) { return cb(err); }
    person = result;

    findPets({
      species: person.preferences.species,
      breeds: person.preferences.breeds
    }, refine);
  });

  function refine(err, pets) {
    if (err) { return cb(err); }

    filterPets({
      criteria: person.profile, 
      pets: pets
    }, respond);
  }

  function respond(err, matches) {
    if (err) { return cb(err); }
    if (matches.length > 10) {
      return filterPets({
        criteria: person.preferences.niceToHave,
        pets: matches,
        max: 10
      }, function culledHandler(err, matches) {
        if (err) { return cb(err); }
        checkAvailability(matches, cb);
      });
    }

    checkAvailability(matches, cb);
  }

}

We were able to quickly tidy the code up by breaking some of the callbacks out into function statements. Function statements are hoisted, which allows us to lay out operational logic from top to bottom.

Nesting is also reduced by not using else branches; instead we create logical branches by simply returning early from the function (and it doesn’t matter what we return, because the return values are never used).

We can also pass cb directly to checkAvailability, because cb is an errback and checkAvailability expects an errback. This is the principal benefit of establishing a consistent callback contract.

An advantage of breaking out functions is that it forces us to name them, which allows for easier debugging. Having a stack filled with anonymous functions makes life difficult, so it’s best practice to name all functions. This is why we also named the function expressions, not just those elevated into statements.
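As a small sketch (parseAnswer is a hypothetical name), naming a function expression puts that name into any stack trace it appears in:

```javascript
// the expression is assigned to parse but named parseAnswer:
// V8 stack traces will show "at parseAnswer" instead of "<anonymous>"
var parse = function parseAnswer(input) {
  if (input !== 42) { throw Error('wrong answer'); }
  return input;
};

var trace = '';
try {
  parse(41);
} catch (err) {
  trace = err.stack; // contains a frame naming parseAnswer
}
```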

See Develop debugging techniques for more about anonymous functions.

Control flow patterns

The basic asynchronous unit (the callback) can be wrapped in higher level control flow patterns to increase code organization and associate semantic meaning with asynchronous logic. One library that has been particularly successful in this area is async.

Our earlier example keeps querying for data based on refined criteria (for the purpose of explanation, the example is not optimal, IRL we would probably want to use SQL or MapReduce on the DB side).

The async.waterfall is built for this particular case, essentially allowing us to break up our logic into asynchronous steps:

function findPetsForHuman(id, cb) {
  var person;

  async.waterfall([
    function findPetsForHumanStep1(next) { 
      getPerson({query: id}, next);
    },
    function findPetsForHumanStep2(result, next) {
      person = result;
      findPets({
        species: person.preferences.species,
        breeds: person.preferences.breeds
      }, next);
    },
    function findPetsForHumanStep3(pets, next) {
      filterPets({
        criteria: person.profile, 
        pets: pets
      }, next);
    },
    function findPetsForHumanStep4(matches, next) {
      if (matches.length <= 10) { return next(null, matches); }
      filterPets({
        criteria: person.preferences.niceToHave,
        pets: matches,
        max: 10
      }, next);
    },
    checkAvailability //<-- step 5
  ], cb);

}

Notice that we’re still using the same errback idea, but we don’t have to handle an error parameter in every step; only the final callback (the cb we pass as the second argument to async.waterfall) actually receives the error parameter.

The async library is for heavy lifting, and that comes at a price (abstraction overhead, additional state). For small single-purpose modules it tends not to be necessary unless there’s a lot of asynchronous activity. For application-level code it can be very useful, both client and server side.

Alternative abstractions

There are other common forms of continuation passing style, all of which, at an atomic level, use callbacks. Some well known ones are:

  • promises
  • event emitters
  • streams
  • generators

Promises allow us to treat logic as an object: we can pass around a value we don’t have yet. Since promises are part of the ECMAScript 2015 standard and are implemented in recent versions of V8, we’ll be seeing a lot more of them.

Event emitters are part of the Node.js core. Unlike an errback or a promise, event emitters tend to be used for communicating multiple values according to a namespace. This means they don’t use errbacks; instead, errors are communicated by calling a function associated with an “error” namespace:

ee.on('error', function (err) { /* deal with it */ });

We’ll be talking about streams in the next 10 tips article; streams are built on event emitters, so they handle errors in the same way.

Generators are part of ECMAScript 2015. They allow the control flow of a function to be managed externally by calling next on an iterator object; the yield keyword is used inside the generator function to determine step points. For instance:

function * g() {  //<-- notice the asterisk
  yield 1;
  yield 2;
  yield 3;
}

var i = g();
console.log(i.next()); // {value: 1, done: false}
console.log(i.next()); // {value: 2, done: false}
setTimeout(function () { 
  console.log(i.next()); // {value: 3, done: false}
  console.log(i.next()); // {value: undefined, done: true}
}, 100);

This isn’t that exciting until we consider that since next can be called at any point, it can be called within a callback. Therefore, it’s possible to build a light abstraction around generators to provide asynchronous flow control in a synchronous style… and that’s what co does.

For this example, imagine that all the asynchronous calls return promises:

function findPetsForHuman(id, cb) {
  co(function* () {
    var person = yield getPerson({query: id});
    var pets = yield findPets({
      species: person.preferences.species,
      breeds: person.preferences.breeds
    });
    var matches = yield filterPets({
      criteria: person.profile, 
      pets: pets
    });
    if (matches.length > 10) {
      matches = yield filterPets({
        criteria: person.preferences.niceToHave,
        pets: matches,
        max: 10
      });
    }

    return yield checkAvailability(matches);  

  })
  .then(function (matches) {
    cb(null, matches);
  })
  .catch(cb);
}

Generators work in Chrome and Firefox, can be enabled in Node using the --harmony flag, and are enabled by default in io.js.

Generators with co are a really nice way to organize asynchronous logic and control flow, but there is overhead. Both promises and generators spend comparatively more time on CPU. This may not matter, since the bottleneck will usually be the asynchronous operation itself, but it does mean higher resource usage.

Combined approach

There may be a temptation to simultaneously return one value from a function and pass another through a callback. Whilst it’s a novel idea, this is worse than passing multiple value arguments to a callback because it requires developers to retrieve values from two sources.
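To make the objection concrete, here’s a sketch of the anti-pattern (hybridSquare is a hypothetical function, shown only to be avoided):

```javascript
// anti-pattern: one result through the continuation,
// another through the return value
function hybridSquare(n, cb) {
  cb(n * n);        // the square arrives via the callback...
  return n * n * n; // ...while the cube arrives via return
}

var square;
var cube = hybridSquare(2, function (sq) { square = sq; });
// the consumer must now stitch results together from two places
```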

Dual APIs

One exception to avoiding the combined approach is to support both callbacks and promises. The absence of a callback could be used to signal a promise request instead:

function doAsyncThing(withVal, cb) {
  if (typeof cb === 'function') {
    return asyncOp(withVal, cb);
  }
  return new Promise(function (resolve, reject) {
    asyncOp(withVal, function (err, result) {
      if (err) { return reject(err); }
      resolve(result);
    });
  });
}
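A self-contained sketch of consuming such a dual API (doubleUp stands in for any function following this convention):

```javascript
// doubleUp is a hypothetical dual API: errback when a callback is
// supplied, promise otherwise
function doubleUp(n, cb) {
  if (typeof cb === 'function') {
    return process.nextTick(function () { cb(null, n * 2); });
  }
  return new Promise(function (resolve) {
    process.nextTick(function () { resolve(n * 2); });
  });
}

// callback style:
doubleUp(21, function (err, result) { /* result is 42 */ });

// promise style:
doubleUp(21).then(function (result) { /* result is 42 */ });
```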

Conclusion

Ultimately, understanding the errback and using it as a simple unit of asynchrony is an effective way to write JavaScript.

It’s a core language construct, and the convention is well known. Using errbacks makes it easy for other developers to interact with your APIs.

Using well-known higher level abstractions is fine, but remember that there is a cost to doing so. There should be a strong reason in the larger context for using a control flow library, or generators, or event-emitters (and often there is).

That’s all for now. I look forward to seeing you again in part 5: use streams.


David Mark Clements is a JavaScript and Node.js specialist based in Northern Ireland. He is nearForm's lead trainer and training content curator. From a very early age he was fascinated with programming. He first learned BASIC on one of the many Ataris he had accumulated by the age of nine. David learned JavaScript at age twelve, moving into Linux administration and PHP as a teenager. Node has become a prominent member of his toolkit due to its versatility, vast ecosystem, and the cognitive ease that comes with full-stack JavaScript. David is author of Node Cookbook (Packt), now in its second edition.