Every day, individuals around the world send personal and sensitive information to an increasing number of remote services, and those services receive an increasing volume of traffic.
Each operation, even the smallest one, counts towards the performance and availability of a service. High performance and availability must be maintained without compromising the security of the system.
Keep reading to discover how we improved the performance of JSON Web Tokens (JWT), one of the most common authentication systems, in Node.js. We created a new plugin, fast-jwt, to demonstrate and measure the Node.js performance improvements. Using flamegraphs, we compared fast-jwt on a sample Fastify server with the existing jsonwebtoken implementation. This blog post also outlines the architecture of fast-jwt, which includes caching and asynchronous support.
What is a JWT?
A JWT is a compact and URL-safe token that contains a payload, consisting of one or more claims. Each JWT is cryptographically signed so that the receiving party can validate the integrity and validity of the claims.
Here’s an example of a JWT:
Each token consists of the following three dot-separated sections:
- Header: contains information about the token. For example, the algorithm used in the signature or the format of the payload.
- Payload: contains one or more claims, which store the information in the token. There is a reserved set of claims (for example, sub, aud and iss) but the standard allows user-defined claims.
- Signature: the cryptographic signature of the header and the payload, created using a well-known private key or secret and the algorithm defined in the header.
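To make the three sections concrete, the following minimal sketch splits a token on its dots and decodes the header and the payload. The token used here is a widely circulated HS256 sample, chosen for illustration only, and decodeSections is a hypothetical helper name. The sketch only decodes; it does not check the signature.

```javascript
// Illustrative token only: a commonly used HS256 sample.
const token =
  'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.' +
  'eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.' +
  'SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c'

// Split a JWT into its three dot-separated sections and decode the first two.
// The signature is left as an opaque base64url string.
function decodeSections (jwt) {
  const [header, payload, signature] = jwt.split('.')
  const parse = (part) => JSON.parse(Buffer.from(part, 'base64url').toString('utf8'))
  return { header: parse(header), payload: parse(payload), signature }
}

const sections = decodeSections(token)
console.log(sections.header) // algorithm and token type
console.log(sections.payload) // the claims, including sub and iat
```

Because the header and payload are only base64url-encoded JSON, anyone holding the token can read them; the signature is what protects their integrity.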
Internet technologies are increasing their use of JWT for the following reasons:
- It is an open standard, described in IETF RFC 7519.
- It’s easy to implement, with many existing libraries in multiple languages.
- As the only data transferred is a URL-safe string, it is compatible with most network protocols.
The most popular npm package for signing, decoding and verifying JWTs is jsonwebtoken. It is popular because it is very easy to use and is RFC compliant.
Here’s sample code that signs a payload, prints the token and then verifies the token.
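As a dependency-free illustration of that sign, print and verify flow, here is a minimal sketch built directly on Node’s crypto module rather than on jsonwebtoken. The helper names signHS256 and verifyHS256 are hypothetical, only HS256 is handled, and no claim other than iat is processed.

```javascript
const crypto = require('node:crypto')

// Serialise a value as base64url-encoded JSON.
const encode = (value) => Buffer.from(JSON.stringify(value)).toString('base64url')

// Sign a payload with HS256, adding an iat claim (seconds since the Unix epoch).
function signHS256 (payload, secret) {
  const header = encode({ alg: 'HS256', typ: 'JWT' })
  const body = encode({ ...payload, iat: Math.floor(Date.now() / 1000) })
  const signature = crypto.createHmac('sha256', secret).update(`${header}.${body}`).digest('base64url')
  return `${header}.${body}.${signature}`
}

// Recompute the signature, compare it in constant time and return the payload.
function verifyHS256 (token, secret) {
  const [header, body, signature] = token.split('.')
  const expected = crypto.createHmac('sha256', secret).update(`${header}.${body}`).digest()
  const received = Buffer.from(signature || '', 'base64url')
  if (received.length !== expected.length || !crypto.timingSafeEqual(received, expected)) {
    throw new Error('invalid signature')
  }
  return JSON.parse(Buffer.from(body, 'base64url').toString('utf8'))
}

const token = signHS256({ a: 1, b: 2 }, 'secret')
console.log(token)
console.log(verifyHS256(token, 'secret')) // the original payload plus the iat claim
```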
If you execute the code above, you will see that the payload contains an additional claim, iat, that wasn’t part of the original payload. This iat claim, which stands for ‘issued at’, is the token creation date in Unix time (the number of seconds elapsed since midnight on Jan 1, 1970) and is one example of the standard claims defined in the RFC.
Using JWT on the web is easy. The following code uses the fastify-jwt plugin (which uses jsonwebtoken under the hood) and the Fastify web server.
It registers two routes, one for signing and one for retrieving the authenticated user information.
Most calls received by a server include client authentication to ensure service security. JWT verification therefore adds to the cost of each server action, and it can impact performance if it is not properly implemented.
Performance of jsonwebtoken
The jsonwebtoken package is the de facto standard in the Node.js ecosystem, so here’s the golden question: how does it perform? To find out, we stress-tested the sample server above using Clinic and autocannon.
We used the following two commands to generate flamegraphs for both routes:
You can view the resulting flamegraphs by selecting the following links:
In both cases, the hottest frames are in the jsonwebtoken package and its direct dependency, jwa (via jws). Let’s see how we can improve their performance.
Node’s crypto module performs the JWT signing and verifying operations. This is currently the fastest implementation available. So, we analysed the jsonwebtoken, jws and jwa source code to see what we could do to improve performance and how.
While the jsonwebtoken implementation is robust and effective, we found a fundamental problem: all operations are orchestrated by the jsonwebtoken package using the jws and jwa packages. Layering is not a problem in itself, but here each of the three packages is developed as a standalone package, so input must be validated at each layer. This causes many operations (such as input type and format validations) to be repeated. In some cases, such as the stream interface of jws, which is not used by jsonwebtoken, there is also unwanted and unused overhead.
Finally, another problem with jsonwebtoken resides in the public API it exposes. Each time a signing, decoding or verification is performed, the same set of options is provided and validated. Even though each validation has minimal impact on a single request, they add up and result in slower operations (especially in a single-threaded environment like Node.js).
The split implementation also poses a problem when sending pull requests (PR) to change the code; PRs should be sent to each of the three packages, and applied and released together to ensure the changes work correctly. This is generally difficult but is even more difficult in this case as jsonwebtoken, jws and jwa do not have a common maintainer. Therefore, using a PR to improve the existing packages was not viable. Our solution was to write a new package, fast-jwt.
The purpose of fast-jwt is to improve jsonwebtoken performance while keeping the same features and a similar API. To do this, we established the following architecture principles:
- Minimise the number of external dependencies: except for the cache layer and a couple of small cryptographic utilities, fast-jwt has no external dependencies. This ensures the code is easily maintained and data flow can be followed.
- Use the factory pattern and validate options once, ahead of time: fast-jwt uses the factory pattern to create the signer, decoder and verifier functions. This ensures that all options (with the exception of the key, which might be fetched at execution time, depending on the options passed) are validated only once and only during the startup phase.
- Small public API: the public fast-jwt API consists of three factory functions (one for each operation) with a specific set of options.
With these principles, fast-jwt minimises the non-crypto overhead:
- Options are validated only once and during factory creation, removing unnecessary operations.
- As the data flow is easily followed, fast-jwt does not validate data twice.
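The factory principle can be illustrated with a small sketch. This is not fast-jwt’s implementation: createToySigner is a hypothetical factory that supports HS256 only, and the validations counter exists purely to show that option checks run once, at creation time, however many tokens are signed afterwards.

```javascript
const crypto = require('node:crypto')

let validations = 0 // counts option validations, for illustration only

// Hypothetical factory: options are checked here, once, and never per call.
function createToySigner (options) {
  validations++
  const { key, algorithm = 'HS256' } = options || {}
  if (typeof key !== 'string' && !Buffer.isBuffer(key)) {
    throw new TypeError('key must be a string or a buffer')
  }
  if (algorithm !== 'HS256') {
    throw new TypeError('this sketch only supports HS256')
  }

  // The header never changes, so it is also encoded once at creation time.
  const header = Buffer.from(JSON.stringify({ alg: algorithm, typ: 'JWT' })).toString('base64url')

  // The returned function does only the per-token work.
  return function sign (payload) {
    const body = Buffer.from(JSON.stringify(payload)).toString('base64url')
    const signature = crypto.createHmac('sha256', key).update(`${header}.${body}`).digest('base64url')
    return `${header}.${body}.${signature}`
  }
}

const sign = createToySigner({ key: 'secret' })
for (let i = 0; i < 1000; i++) sign({ user: i })
console.log(validations) // 1
```

Invalid options fail at startup, when the factory is called, rather than on the first request that happens to need them.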
Here’s the corresponding fast-jwt version of the sign-verify code:
And here’s the corresponding version of the Fastify-based server:
As the principle of fast-jwt is to provide and support the same features as jsonwebtoken, all operations needed to support callbacks. Also, we wanted to provide more, so we added support for promises (hence async functions too).
Originally, we chose to structure each factory function as shown in the following pseudocode:
The two inner functions share most of the code, except for one operation: resolving the key. In the async case, the key is an async function that must be called, rather than a string or a buffer. This resulted in code duplication and therefore was not optimal.
We, therefore, used a different approach, as follows:
And here’s the definition of
As you can see, these functions enable support for both callbacks and promises, as either input or output functions. And here’s the definition of
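Independently of fast-jwt’s own definitions, the general technique can be sketched as follows. All three names (callbackOrPromise, resolveKey and useKey) are hypothetical and the code is an illustration, not the library’s source: one helper turns an optional user callback into a callback plus an optional promise, and another resolves a key that may be a plain value, a callback-style function or an async function.

```javascript
// Hypothetical helper: returns a Node-style callback and, when the caller
// supplied none, a promise that the callback settles.
function callbackOrPromise (userCallback) {
  if (typeof userCallback === 'function') return [userCallback, undefined]
  let resolve, reject
  const promise = new Promise((res, rej) => {
    resolve = res
    reject = rej
  })
  return [(err, value) => (err ? reject(err) : resolve(value)), promise]
}

// Hypothetical key resolver: accepts a plain key, a callback-style fetcher or
// an async function, and always reports through a single callback.
function resolveKey (key, header, done) {
  if (typeof key !== 'function') return done(null, key)
  const result = key(header, done)
  if (result && typeof result.then === 'function') {
    result.then((value) => done(null, value), done)
  }
}

// A single code path then serves every calling style.
function useKey (key, header, userCallback) {
  const [callback, promise] = callbackOrPromise(userCallback)
  resolveKey(key, header, (err, resolved) => {
    if (err) return callback(err)
    callback(null, `resolved:${resolved}`)
  })
  return promise // undefined when a callback was supplied
}

// Callback output with a plain key, then promise output with an async key.
useKey('static-secret', {}, (err, value) => console.log(err, value))
useKey(async () => 'fetched-secret', {}).then(console.log)
```

The body of the operation is written once against callbacks, and the promise is only a thin wrapper created when the caller does not pass a callback.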
While developing fast-jwt, we introduced caching in verify operations to improve performance further.
Most of the time, servers tend to process the same tokens. When verifying tokens, servers perform the same operations on the same data all the time (as typically the same user uses the same token in multiple requests that are close together in time).
fast-jwt uses mnemonist to add a Least Recently Used (LRU) cache to all factories. Verified tokens are always cached. If verification fails, the error is also cached and the operation is not retried. Caching considers the time-sensitive claims of the token (iat, nbf and exp) and makes sure the verification is retried after a token becomes valid or after a token expires.
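The caching behaviour described above can be sketched in a few lines. This is an illustration under stated assumptions, not fast-jwt’s code: TinyLRU is a toy stand-in for the mnemonist cache, withCache is a hypothetical wrapper, entries are indexed by the raw token for brevity, and the retryAt property on errors is an invented convention for signalling when a failed token (for example one whose nbf is in the future) should be verified again.

```javascript
// Minimal LRU built on Map insertion order (a stand-in for a real LRU library).
class TinyLRU {
  constructor (capacity) {
    this.capacity = capacity
    this.map = new Map()
  }

  get (key) {
    if (!this.map.has(key)) return undefined
    const value = this.map.get(key)
    this.map.delete(key)
    this.map.set(key, value) // re-insert to mark as most recently used
    return value
  }

  set (key, value) {
    this.map.delete(key)
    this.map.set(key, value)
    // Evict the least recently used entry (the first key in insertion order).
    if (this.map.size > this.capacity) this.map.delete(this.map.keys().next().value)
  }
}

// Wrap a verifier so that results and errors are cached until the token's
// validity can change (nbf reached or exp passed).
function withCache (verify, { capacity = 1000, now = () => Math.floor(Date.now() / 1000) } = {}) {
  const cache = new TinyLRU(capacity)
  return function cachedVerify (token) {
    const hit = cache.get(token)
    if (hit && now() < hit.validUntil) {
      if (hit.error) throw hit.error
      return hit.payload
    }
    try {
      const payload = verify(token)
      cache.set(token, { payload, validUntil: payload.exp ?? Infinity })
      return payload
    } catch (error) {
      // Hypothetical convention: retryAt marks when a retry could succeed.
      cache.set(token, { error, validUntil: error.retryAt ?? Infinity })
      throw error
    }
  }
}

// Usage with a stub verifier that counts how often real work is done.
let calls = 0
const slowVerify = (token) => {
  calls++
  if (token === 'bad') throw new Error('invalid')
  return { sub: token, exp: 2000 }
}
let clock = 1000
const verify = withCache(slowVerify, { now: () => clock })
verify('abc')
verify('abc')
console.log(calls) // 1
```

Once the clock reaches the cached exp value, the entry is no longer trusted and the underlying verifier runs again, which is where an expired token would be rejected.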
The idea of caching authorization information to improve performance is already supported by cloud services like AWS API Gateway, but it was not yet available directly at the application level.
To guarantee that the cache data is not compromised in case of unauthorized memory access, tokens are indexed in the cache using a hashing algorithm that is chosen depending on the token’s signing algorithm. For instance, tokens signed with HS384 are hashed using the SHA384 algorithm, while tokens signed with EdDSA on the Ed448 curve are hashed using the SHAKE256 algorithm.
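A hedged sketch of how such a cache key could be derived with Node’s crypto module follows. The function names are hypothetical. The HS384 to SHA384 and Ed448 to SHAKE256 pairings come from the text above; matching the other algorithms to SHA-2 by bit size, using SHA512 for other EdDSA curves, and the SHAKE256 output length of 114 bytes are assumptions made for the example.

```javascript
const crypto = require('node:crypto')

// Create the hash used for cache keys from the token's signing algorithm.
// Only HS384 -> sha384 and EdDSA/Ed448 -> shake256 are taken from the text;
// every other branch is an assumption of this sketch.
function createKeyHash (alg, curve) {
  if (alg === 'EdDSA') {
    return curve === 'Ed448'
      ? crypto.createHash('shake256', { outputLength: 114 })
      : crypto.createHash('sha512')
  }
  const bits = alg.slice(-3) // e.g. 'HS384' -> '384'
  if (!['256', '384', '512'].includes(bits)) {
    throw new Error(`unsupported algorithm: ${alg}`)
  }
  return crypto.createHash(`sha${bits}`)
}

// Hash the token so that the cache never holds the raw token as a key.
function cacheKey (token, alg, curve) {
  return createKeyHash(alg, curve).update(token).digest('hex')
}

console.log(cacheKey('header.payload.signature', 'HS384').length) // 96 hex characters
```

With keys derived this way, a memory dump of the cache exposes digests rather than usable bearer tokens.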
Performance improvements vary depending on the algorithm used. See the section Performance Comparison Between jsonwebtoken and fast-jwt for more details.
Worker threads evaluation
After our initial fast-jwt implementation, we reviewed the only piece of the entire flow that we couldn’t improve at all: cryptographic operations. As stated earlier, no matter which package you use, at some point you have to use Node’s crypto module to either create or verify the JWT signature. This is the bottleneck of any existing implementation and, unfortunately, there is currently no faster implementation.
Crypto operations are CPU intensive, which blocks Node’s event loop. As Node.js is single-threaded, this means the entire server is blocked.
When evaluating solutions to this problem, we tried using one of the latest additions to Node (since version 10.5.0): worker threads. This module enables parallel thread execution in Node that can share memory (via SharedArrayBuffer) and communicate via events.
Implementing worker threads in fast-jwt wasn’t difficult, but unexpectedly, performance reduced by 75% rather than improving. When we first used event-oriented communication, we thought that postMessage was responsible, as it must serialise and clone data by specification. We, therefore, tried to use postMessage for thread signalling and SharedArrayBuffers for transferring data. You can inspect this code here. This didn’t work out for the following reasons:
- Each system can only process a number of operations equal to the number of logical CPUs. When all logical CPUs are busy, putting jobs on a queue is counterproductive and therefore the best approach is to directly process jobs on the main thread.
- When the queues are not full, the inter-thread communication (which in our case means postMessage and data copying via SharedArrayBuffer) is slower than just executing the job in the main thread.
Performance comparison between jsonwebtoken and fast-jwt
Here are the flame graphs for a sample Fastify server using both implementations:
The flame graphs show that fast-jwt spends most of its time in crypto operations. This is especially visible in signing, where crypto operations are the active operations almost all the time.
Comparing fast-jwt and jsonwebtoken confirmed our initial hypotheses were correct – even though core crypto operations could not be improved, there was considerable room for performance improvement. And this was without considering caching, which is unavailable in jsonwebtoken.
Let’s start with the simplest operation, decoding. Unlike signing and verifying, the algorithm is not a factor as crypto operations are not performed.
[Table: decoding, operations per second, jsonwebtoken vs fast-jwt. fast-jwt improvement: +50%]
The following results are for signing and verifying operations using the HS256 and RS256 algorithms, which are the most commonly used.
[Table: signing and verifying with HS256 and RS256, operations per second, jsonwebtoken vs fast-jwt. fast-jwt improvements: +46%, +31%, +67%, +88%]
As the results show, fast-jwt is faster than jsonwebtoken, especially when we used the HS256 algorithm, which is the most commonly used algorithm when using JWTs. We achieved the performance gain, as explained in the sections above, by removing redundant operations such as options validation and also adopting the factory pattern.
The algorithm RS256 is more CPU intensive and it occupies the majority of the verification time. In this case, fast-jwt improvements are limited.
[Table: verifying with cache, operations per second. fast-jwt "Verify with cache" improvements: +141% and +993%]
Caching dramatically improves performance. In particular, we found RS256 was one order of magnitude faster. The complexity of the cryptographic verification algorithm is replaced by a much faster token hashing algorithm and O(1) cache access.
This is the end of our little journey in the world of optimisation and open source.
First, we demonstrated how you can use flame graphs to troubleshoot performance issues and identify bottlenecks.
Then, we showed how you can improve software by removing all unnecessary complications and abstractions, even if you can’t improve its core operation. We achieved all this without compromising features.
We also introduced caching of verified tokens without sacrificing security. The performance improvements are astonishing and resulted in operation speeds from 3 to 10 times faster.
In our experience, if too many open source projects are used to perform a single task, it limits the code’s usability, maintainability and performance. If there is a performance issue, it might be spread between different projects, and therefore fixing via contributions might not be feasible. The only solution, unfortunately, is to write a different implementation from scratch to remove the issue.
Finally, we sincerely thank Filip Skokan from Auth0 for all his feedback on the original implementation. This helped us create a more efficient and secure module.
fast-jwt is an experimental library to check if we could improve the performance of JWT verification. As with any experimental feature, we are eager to receive feedback. We do not plan to move fastify-auth0-verify to this module yet as jsonwebtoken is more stable and secure.
If your current performance bottleneck is JWT verification, we’d love to hear from you on how we can work together to validate