When transforming complex data containing nested object hierarchies in native JavaScript, I often have to write complex nested looping structures. These mentally demanding implementations soak up precious development resources that should be dedicated to implementing business logic. Business logic provides real value for customers and end-users, whereas implementing and maintaining complicated mapping logic simply slows development velocity.

JSONata is a query and transformation language for JSON data. Despite its name, the reference implementation works with plain JavaScript objects rather than strictly JSON. I have started using JSONata in an increasing number of my projects, and my colleagues at NearForm agree that it provides an elegant solution to the data mapping headache. Its concise and powerful syntax can both query values from data and transform data into other formats (with filtering, mapping, and aggregations). As a result, I write and maintain fewer lines of transformation code, which frees me and my team to solve real business problems.

Of course, there is a price: That elegant syntax and power are expensive in terms of performance. As my NearForm colleagues started finding more use cases for JSONata, I decided to benchmark the performance of various transformations to compare JSONata with native JavaScript implementations.

Is the performance cost of using JSONata worth the added benefit in terms of implementation? I’ll try to answer that question in this article.

TL;DR: Typically yes, but it depends on whether you have specific performance constraints.

What is JSONata and why do I love it?

When mapping complex data hierarchies, I like to compare JSONata to JSONPath (similar to XPath). APIs rarely agree on a common format, so I find myself having to constantly parse and transform data received from an API into something useful. This may be an internal model that will be saved in the backend or served to the client. At NearForm, my colleagues have even seen backend implementations where developers were allowed to structure their API schemas in the way best suited for the consuming caller, with the constraint that they provide a JSONata expression to map the schema to the internal data model.

Another common use case for JSONata is backend integrations — getting data from the API of one system, transforming it, and writing the data to an API of an independent system. This is a use case where JSONata really shines. The concise syntax makes it a breeze to reformat the data.

Whether transforming data between models or aggregating statistics from a dataset, JSONata facilitates the task with an elegant, concise, and powerful syntax.

No data, no problem

Similar to JSONPath or the commonly used lodash get function, queries over undefined values simply return nothing. This is also true for nested fields: if you query into a deeply nested field and any field along the path is undefined, no error is raised. The risk of this convenience is that a mistake in the object path also goes unreported; the application keeps running and simply receives no data. This can make debugging complex queries slightly challenging, but it makes working with optional fields far simpler.

The following example relates to an application that needs to create a list of all cities associated with users’ home addresses and ignore users without a home address.

users.addresses[type='home'].city
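For comparison, a rough native JavaScript equivalent of this query might look like the sketch below. The `data` object and the `homeCities` name are invented for illustration; optional chaining and nullish coalescing stand in for JSONata's silent handling of missing fields.

```javascript
// Hypothetical sample data: the second user has no addresses at all,
// and the third has only a work address.
const data = {
  users: [
    { lastName: 'Smith', addresses: [{ type: 'home', city: 'Annecy' }] },
    { lastName: 'Jones' },
    { lastName: 'Brown', addresses: [{ type: 'work', city: 'Lyon' }] }
  ]
};

// Rough equivalent of: users.addresses[type='home'].city
function homeCities(input) {
  return (input?.users ?? [])
    .flatMap((user) => user?.addresses ?? [])      // missing addresses contribute nothing
    .filter((address) => address?.type === 'home') // the predicate
    .map((address) => address.city)
    .filter((city) => city !== undefined);         // drop home addresses without a city
}

console.log(homeCities(data)); // [ 'Annecy' ]
console.log(homeCities({}));   // []
```

Unlike this sketch, which always returns an array, JSONata collapses a single match to the bare value and returns nothing when there is no match.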

Filter data with predicates

The above example also demonstrates JSONata’s predicate syntax, which allows you to filter arrays of objects in a fluent syntax. It is super powerful and makes narrowing the input data very simple. Sometimes you want to filter data by a deeply nested field, possibly traversing arrays, and possibly based on a related value. The predicate syntax in JSONata has you covered for all of these use cases.

Mapping along the way

Mapping is what a transformation language is all about. In JSONata, the dot operator is not just for referencing properties of an object. Instead, it implements a map operation. This lets you iterate through collections of objects and map them into new objects in a fluent syntax.

An example would be an ecommerce application that needs to calculate the subtotal of each item in the user’s shopping cart:

orders.item.(price * quantity)
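To make the implicit map visible, here is a sketch of roughly what that expression does, written in plain JavaScript. The `cart` object is made-up sample data whose field names mirror the expression.

```javascript
// Hypothetical cart: two orders, each with a list of items.
const cart = {
  orders: [
    { item: [{ price: 10, quantity: 2 }, { price: 5, quantity: 1 }] },
    { item: [{ price: 3, quantity: 4 }] }
  ]
};

// Rough equivalent of: orders.item.(price * quantity)
function subtotals(input) {
  return (input?.orders ?? [])
    .flatMap((order) => order?.item ?? [])      // first dot: orders -> items, flattened
    .map((item) => item.price * item.quantity); // second dot: items -> subtotals
}

console.log(subtotals(cart)); // [ 20, 5, 12 ]
```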

Calculate dynamic values using aggregations

Aggregations are super powerful when making data transformations. JSONata includes a standard set of aggregation functions such as $sum, $min, $max, and $average.

With a banking app, for example, instead of having to map over all of the transactions and then reduce them to calculate a sum, you can just call the sum aggregation while you are transforming the data.

$sum(accounts.transactions.amount)
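The map-then-reduce version that this one-liner replaces could be sketched as follows. The `bank` data and the `totalAmount` name are assumptions made for the example.

```javascript
// Hypothetical data: the last account has no transactions yet.
const bank = {
  accounts: [
    { transactions: [{ amount: 100 }, { amount: -25 }] },
    { transactions: [{ amount: 40 }] },
    {}
  ]
};

// Rough equivalent of: $sum(accounts.transactions.amount)
function totalAmount(input) {
  return (input?.accounts ?? [])
    .flatMap((account) => account?.transactions ?? [])
    .reduce((sum, transaction) => sum + transaction.amount, 0);
}

console.log(totalAmount(bank)); // 115
```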

DRY variables

Software engineers are taught to embrace the DRY principle. When transforming data, variables are sometimes required to eliminate potentially long, repeated expressions. If you need to parse several fields from a deeply nested object, JSONata lets you store that object in a variable, so that you can parse the fields from the variable instead of repeating the path to the deeply nested object. However, because JSONata expressions with variables can be harder to read due to JSONata’s unique syntax, it is best to reserve variables in your toolkit for when you really need them.

Imagine that, when mapping over users, my application needs to use the location to set multiple fields:

users.(
  $location := addresses[type='home'].city;
  {
    "location": $location,
    "id": $location & '-' & lastName
  }
)
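In plain JavaScript the same idea is just a local constant. The sketch below uses invented sample data and, as a simplifying assumption, takes only the first home address of each user.

```javascript
// Hypothetical users, each with one home address.
const data = {
  users: [
    {
      lastName: 'Smith',
      addresses: [{ type: 'home', city: 'Annecy' }, { type: 'work', city: 'Lyon' }]
    },
    { lastName: 'Jones', addresses: [{ type: 'home', city: 'Paris' }] }
  ]
};

function userLocations(input) {
  return (input?.users ?? []).map((user) => {
    // Resolve the nested value once and reuse it, like $location above.
    const location = user?.addresses?.find((a) => a?.type === 'home')?.city;
    return {
      location,
      id: `${location ?? ''}-${user.lastName}`
    };
  });
}

console.log(userLocations(data));
// [ { location: 'Annecy', id: 'Annecy-Smith' },
//   { location: 'Paris', id: 'Paris-Jones' } ]
```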

Context binding to JOIN related data

This is an advanced feature of JSONata that can really save time. You may be working on an integration project where you need to call multiple APIs and reconstruct related data in code. This is where the context binding operator in JSONata really shines. Functionally, it is similar to a SQL JOIN. I can JOIN related data into new objects. Doing this in pure JavaScript typically requires nested loops to iterate through an initial collection and then find the related data in a second collection.

In this example, our ecommerce backend has an API for orders that returns item ids and a separate API for item details. We can JOIN the two responses using the JSONata context binding operator.

orders.items@$i.(inventory.items.details)@$d[$i.id=$d.id].{
  "quantity": $i.quantity,
  "description": $d.description
}
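For contrast, the nested-loop style that this expression replaces might look like the following sketch. The shape of the `orders` data (a list of items with an `id` and a `quantity`) is an assumption, and all values are invented.

```javascript
// Hypothetical combined input from the two APIs.
const input = {
  orders: {
    items: [{ id: 'a1', quantity: 2 }, { id: 'b2', quantity: 1 }]
  },
  inventory: {
    items: {
      details: [
        { id: 'a1', description: 'Keyboard' },
        { id: 'b2', description: 'Mouse' },
        { id: 'c3', description: 'Monitor' }
      ]
    }
  }
};

function joinOrderItems(data) {
  const details = data?.inventory?.items?.details ?? [];
  // Outer loop over ordered items, inner loop over item details.
  return (data?.orders?.items ?? []).flatMap((item) =>
    details
      .filter((detail) => detail.id === item.id)
      .map((detail) => ({
        quantity: item.quantity,
        description: detail.description
      }))
  );
}

console.log(joinOrderItems(input));
// [ { quantity: 2, description: 'Keyboard' },
//   { quantity: 1, description: 'Mouse' } ]
```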

The end of the JSONata honeymoon

I love JSONata. It cuts out huge amounts of work for my colleagues and me when we’re transforming data. With less code and less mental exertion required, it promotes a rapid development velocity and easier maintenance. However, do all of these benefits come at a cost? And, if yes, can we measure that cost?

The first big concern is performance. If the performance cost of JSONata compares unfavourably with native JavaScript, it may not be suitable for large-scale and heavy-load Node.js back-ends. When services need to serve millions of requests per hour, every CPU cycle becomes precious. It is important to note that this is not the case for every backend service.

The other challenge is that JSONata is a Domain Specific Language and requires some ramp-up to onboard developers. The syntax is very compact and presents a learning curve that must be recognized when onboarding new team members. Some may argue that the compact syntax is less readable, but its compactness facilitates writing concise implementations. This is particularly true when compared with long functions and files full of nested loops with map and reduce.

The final point regarding the syntax is that it can be a challenge to debug. The fact that there are no errors for null or undefined values means that evaluating an expression may return nothing — no errors, just nothing. Some specific skill sets are required to be effective when debugging JSONata expressions. I like to mitigate these “surprises” by always validating incoming data before it is transformed and always unit testing JSONata expressions.
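As one possible shape for that validation step, the sketch below checks that a handful of required paths resolve before the data is handed to a transformation. The `missingPaths` helper and the sample payload are hypothetical; a real project would more likely use a schema validator.

```javascript
// Hypothetical guard: returns the dot-separated paths that do not resolve
// to a defined, non-null value in the input.
function missingPaths(input, requiredPaths) {
  return requiredPaths.filter((path) => {
    const value = path
      .split('.')
      .reduce((current, key) => current?.[key], input);
    return value === undefined || value === null;
  });
}

const payload = { users: [{ lastName: 'Smith' }], meta: { page: 1 } };

const missing = missingPaths(payload, ['users', 'meta.page', 'meta.total']);
console.log(missing); // [ 'meta.total' ]

if (missing.length > 0) {
  // Fail loudly here instead of letting the expression quietly return nothing.
  console.warn(`Input is missing required fields: ${missing.join(', ')}`);
}
```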

A need for speed

I decided to test my hypothesis that JSONata just cannot be as performant as native JavaScript. I created some performance benchmarks using benchmark.js. All of the code used to run the benchmarks can be found on GitHub here.

First I needed some data. I headed over to the Nobel Prize Developer Zone and decided to use their version 2 laureates and nobelPrizes API endpoints.

The benchmarks called the APIs to get all of the Nobel laureates and Nobel prizes from the year 2000 up to the year 2005. Several transformations of varying complexity were then performed over the data. I have attempted to write equivalent transformations using plain JavaScript and JSONata expressions. The benchmarks measure how many operations per second can be performed for each transformation, comparing the JavaScript and JSONata implementations.
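To illustrate what "operations per second" means here, the following is a deliberately simplified timing loop. It is not benchmark.js and not the code from the benchmark suite; the `opsPerSecond` helper is a hypothetical stand-in that skips the statistical sampling a real benchmark library performs.

```javascript
// Simplified stand-in for a benchmark library: run fn repeatedly for a fixed
// time window and report how many calls completed per second.
function opsPerSecond(fn, durationMs = 200) {
  // Short warm-up so the runtime has a chance to optimize fn first.
  for (let i = 0; i < 1000; i++) fn();

  let operations = 0;
  const start = process.hrtime.bigint();
  const deadline = start + BigInt(durationMs) * 1000000n; // ms -> ns
  let now = start;
  while (now < deadline) {
    fn();
    operations++;
    now = process.hrtime.bigint();
  }
  const elapsedSeconds = Number(now - start) / 1e9;
  return operations / elapsedSeconds;
}

// Usage with a tiny, made-up input resembling the laureates response.
const sample = { laureates: [{ knownName: { en: 'A. Michael Spence' } }, {}] };
const rate = opsPerSecond(() =>
  sample.laureates.map((l) => l?.knownName?.en).filter(Boolean)
);
console.log(`${Math.round(rate)} ops/sec`);
```

Reading the clock on every iteration adds overhead of its own, so the absolute numbers from a loop like this are only useful for comparing two implementations measured the same way.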

The tests were run on my System76 laptop, equipped with a Core i5 processor and 16 GB of RAM, running Pop!_OS 20.04. I ran the tests using Node.js 14.14.0, which allowed me to use the optional chaining operator in the JavaScript implementations of the transformations. This is an important point: on an older version of Node.js, the equivalent implementations would have been much more verbose to ensure that properties were defined.

The data

Below is a sample document for a Nobel laureate. I have removed some of the unused fields for readability. The full document can be fetched from the API here.

{
  id: '745',
  knownName: { en: 'A. Michael Spence', se: 'A. Michael Spence' },
  givenName: { en: 'A. Michael', se: 'A. Michael' },
  familyName: { en: 'Spence', se: 'Spence' },
  fullName: { en: 'A. Michael Spence', se: 'A. Michael Spence' },
  gender: 'male',
  birth: {
    date: '1943-00-00',
    place: {
      city: { en: 'Montclair, NJ', no: 'Montclair, NJ', se: 'Montclair, NJ' },
      country: { en: 'USA', no: 'USA', se: 'USA' },
      cityNow: { en: 'Montclair, NJ', no: 'Montclair, NJ', se: 'Montclair, NJ' },
      countryNow: { en: 'USA', no: 'USA', se: 'USA' },
      continent: { en: 'North America' },
      locationString: {
        en: 'Montclair, NJ, USA',
        no: 'Montclair, NJ, USA',
        se: 'Montclair, NJ, USA'
      }
    }
  },
  links: {
    // ...
  },
  nobelPrizes: [
  // ...
  ]
}

Below is a sample Nobel prize document. As in the example above, I have removed some of the fields for readability. You can access the complete document from the API here.

{
  awardYear: '2000',
  category: { en: 'Chemistry', no: 'Kjemi', se: 'Kemi' },
  categoryFullName: {
    en: 'The Nobel Prize in Chemistry',
    no: 'Nobelprisen i kjemi',
    se: 'Nobelpriset i kemi'
  },
  dateAwarded: '2000-10-10',
  prizeAmount: 9000000,
  prizeAmountAdjusted: 11453996,
  links: {
    // ...
  },
  laureates: [
    {
      id: '729',
      knownName: { en: 'Alan Heeger' },
      portion: '1/3',
      sortOrder: '1',
      motivation: {
        en: 'for the discovery and development of conductive polymers',
        se: 'för upptäckten och utvecklandet av ledande polymerer'
      },
      links: {
        // ...
      }
    },
  // ...
  ]
}

An observant reader may notice that some of the transformations I have tested are already provided by the API. This is true, and these transformations are contrived. However, they use real data fetched from multiple API endpoints, making them relevant for a real-world web server.

The benchmarks

Transformation 1: Simple mapping

The first benchmark collects all of the English names of the laureate documents.

JavaScript

function jsSimple(input) {
  return input.laureates
    .map((laureate) => laureate?.knownName?.en)
    .filter((name) => name);
}

JSONata

const jsonataSimple = jsonata('laureates.knownName.en');

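Transformation 2: Complex mapping

The second benchmark maps each laureate to a new object containing a name (falling back to the organization name when there is no known name), a gender, and the full English names of the laureate’s prizes.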

JavaScript

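function jsComplex(input) {
  return input?.laureates?.map((l) => ({
    // Organizations have no knownName, so fall back to orgName
    name: l?.knownName?.en || l?.orgName?.en,
    gender: l?.gender,
    // Guard every step so a missing field yields undefined instead of throwing
    prizes: l?.nobelPrizes?.map((p) => p?.categoryFullName?.en)
  }));
}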

JSONata

const jsonataComplex = jsonata(`
  laureates.{
    "name": knownName.en ? knownName.en : orgName.en,
    "gender": gender,
    "prizes": nobelPrizes.categoryFullName.en[]
  }
`);

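Transformation 3: Joining related data

The third benchmark joins the two API responses, producing one entry per prize and laureate with the laureate’s name and gender and the prize’s full English name.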

JavaScript

function jsJoin({laureates, prizes}) {
  // Map over each prize (flatMap automatically removes the resulting nesting)
  return prizes.nobelPrizes.flatMap((prize) =>
    // Filter all laureates who have an id associated with the prize.
    // This is complex because each prize can have multiple laureates.
    laureates.laureates
      .filter((laureate) =>
        prize.laureates
          .map((prizeLaureate) => prizeLaureate.id)
          .includes(laureate.id)
      )
      // Map each laureate and prize to the new data structure
      .map((laureate) => ({
        name: laureate?.knownName?.en,
        gender: laureate?.gender,
        prize: prize?.categoryFullName?.en
      }))
  );
}

JSONata

const jsonataJoin = jsonata(`
  (prizes.nobelPrizes)@$p.(laureates.laureates)@$l[$l.id in $p.laureates.id].{
    "name": $l.knownName.en,
    "gender": $l.gender,
    "prize": $p.categoryFullName.en
  }
`);
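The JavaScript join above scans the full laureate list for every prize. As a sketch of an alternative that is not part of the benchmark suite, the laureates could be indexed by id once and then looked up per prize. The `jsJoinIndexed` name is hypothetical, the sample data is a heavily trimmed illustration, and within a prize the results follow the order of the prize's own laureate list rather than the order of the laureates response.

```javascript
// Hypothetical variant of jsJoin that replaces the inner scan with a Map lookup.
function jsJoinIndexed({ laureates, prizes }) {
  // Build the id -> laureate index once.
  const laureatesById = new Map(
    laureates.laureates.map((laureate) => [laureate.id, laureate])
  );

  return prizes.nobelPrizes.flatMap((prize) =>
    (prize.laureates ?? [])
      .map((prizeLaureate) => laureatesById.get(prizeLaureate.id))
      // Skip prize laureates that are absent from the laureates response
      .filter((laureate) => laureate !== undefined)
      .map((laureate) => ({
        name: laureate?.knownName?.en,
        gender: laureate?.gender,
        prize: prize?.categoryFullName?.en
      }))
  );
}

// Trimmed-down illustrative input in the shape of the two API responses.
const sample = {
  laureates: {
    laureates: [
      { id: '729', knownName: { en: 'Alan Heeger' }, gender: 'male' },
      { id: '745', knownName: { en: 'A. Michael Spence' }, gender: 'male' }
    ]
  },
  prizes: {
    nobelPrizes: [
      {
        categoryFullName: { en: 'The Nobel Prize in Chemistry' },
        laureates: [{ id: '729' }]
      }
    ]
  }
};

console.log(jsJoinIndexed(sample));
// [ { name: 'Alan Heeger', gender: 'male', prize: 'The Nobel Prize in Chemistry' } ]
```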

Benchmark results

Below are the results from the benchmarks:

[Figure: benchmark results in operations per second, JSONata versus native JavaScript]

The results of the benchmarks speak for themselves. The native JavaScript implementation is always faster, sometimes by factors in the hundreds. The actual benchmark suite contains a few more benchmarks than I have shown here, but they reinforce the same conclusion.

An important disclaimer: My JavaScript implementations are written in pure JavaScript with little emphasis on code-reuse. This allowed for an implementation that can be heavily optimized by the JavaScript runtime. Most projects will look for a library or create their own generic solution to allow for code reuse between transformations. The result is that my benchmarks represent a worst-case scenario. In real-world systems, the differences in performance probably will be less dramatic. In these systems, the added complexity of these pure JavaScript implementations will have a bigger impact on the development and maintenance velocity.

Conclusion

As with many coding problems, the best solution cannot always be measured with a benchmark. Instead, the art of being a software engineer involves understanding the requirements and choosing the solution that best meets the business and technology constraints. Performance is just one of the pillars involved in constructing an application. Development velocity and maintenance are equally important (or even more so) when determining the best solution.

JSONata is still fast, especially for an application that makes many calls to APIs or works heavily with a database; these operations typically consume orders of magnitude more time than transformations. An important principle in fixing performance issues in applications is to avoid chasing micro-optimizations. Instead, performance enhancements should be chosen based on their impact and the complexity of implementing them.

For the moment, I cannot recommend JSONata for high-load applications. In services that are required to serve millions of requests per hour, every CPU cycle becomes precious.

For projects that integrate systems, however, I cannot recommend JSONata enough. It has become a vital tool in my toolkit. Squeezing a few milliseconds of performance out of these services serves no purpose — especially when you consider the gains in development and maintenance velocity that are possible when working with JSONata.

I hope to see some future performance improvements in JSONata. There are some open issues related to the project performance. As the performance of JSONata increases, I am confident that the balance of performance, development velocity and maintenance will shift, and JSONata will soon become a valuable solution in many of my projects, including high-load applications.

Cody Zuschlag is a Senior Software Engineer at NearForm and a part-time instructor at the Université Savoie Mont Blanc in Annecy, France. He has extensive experience working with Node.js, cloud-first applications, and managed databases. His passion is creating the best developer experience and sharing technical knowledge to enable developers to create the best solutions. You can connect with Cody on LinkedIn.

Published by Cody Zuschlag
19th November 2020