Wormholes in JavaScript

Wormholes in JavaScript

Computers are interesting machines. From a theoretical point of view, we tend to think of them as automated mathematicians, or put another way, just really good at adding, multiplying, and working with numbers in general.

The automated mathematician, however, is a deceptive abstraction. It turns out the computer is much faster at certain kinds of math versus others. If you are writing JavaScript (or any other language) and care about performance for algorithms, it’s important to understand how computers work underneath.

If we know what the computer is really fast at, we can take shortcuts, or wormholes, to make our programs run much faster than we’d expect them to.

Modulus wormhole

What exactly do we mean by this? Let’s look at an example. Let’s say we want to implement a circular list. A circular list is a list that has a fixed size, where inserts larger than the list size loop around the list. Circular lists are useful for a bunch of things such as collecting stats at regular time intervals, buffering data and more, but let’s just look at an implementation of this:

const list = new Array(15000)
function set (i, item) {
  // The % operator is called modulo, and returns the remainder when dividing
  // the number on the left of it with the number on the right.
  // Using it here makes i loop around the list size.
  list[i % list.length] = item
}

How fast is this? Let’s run a simple benchmark

console.time()
for (var i = 0; i < 1e9; i++) {
  set(i, i)
}
console.timeEnd()

On my computer it takes ~4s to run that benchmark with 1 billion inserts. Not too bad.

However, let’s apply a computing wormhole and change the size of the array to a magic number:

// Change the size of the list from 15000 to 16384
const list = new Array(16384)

function set (i, item) {
  // The % operator is called modulo, and returns the remainder when dividing
  // the number on the left of it with the number on the right.
  // Using it here makes i loop around the list size.
  list[i % list.length] = item
}

Try running the benchmark again. On my computer it now only takes ~1.5s. A more than 2x improvement by simply tweaking the size. To understand why is that, we need to understand that, under the hood, the computer works with numbers in base 2 or binary. This important to know if we are doing a modulo operation (the % op). It’s much easier to do that against a number that fits 2 ^ n and 16384 is 2 ^ 14. In fact it is as easy as looking at a number in binary and simply taking the last n bits.

For example what is 353500 % 16384? 353500 in binary is 1010110010011011100. Since 16384 == 2 ^ 14 we just need to take the last 14 bits of that which is 10101(10010011011100) or 9436.

We can use this knowledge to apply another wormhole. On a computer it is easy and fast to take the last n bits. In fact, to do so, we just need to do a bitwise-and (the & op) with the number (2 ^ n) - 1.

const list = new Array(16384)

function set (i, item) {
  // Use the & op which is very fast instead of % since list.length is 2 ^ n
  list[i & (list.length - 1)] = i
}

Running the benchmark now takes it down to ~1s on on my machine or a 4x improvement since the first iteration. All by understanding how the computer works.

A smart compiler or VM will be able to optimise this by converting the modulo op to the bitwise-and behind the scenes. In fact the latest V8 JavaScript VM (not yet released in Node.js) does exactly that

Number wormholes

Another useful wormhole is understanding how reading and writing numbers work. Remember how we used to have 32 bit computers and now we have 64 bit? And before 32 bit we had 16 bit. What exactly does this mean? Normally we think of this in terms of how much RAM you can have in your computer. 2 ^ 32 == 4294967296 or 4GB, meaning we can only address 4GB of RAM on a 32 bit computer. When writing JavaScript program we probably don’t need to worry too much about this, as we seldom use that much memory.

However there is an important thing to understand about 32 vs 64 bit computers. Since the CPU has 64-bit registers on a 64 bit computer, doing operations with a 64 bit number is twice as fast compared to a 32 bit computer, where you only have 32-bit registers.

How can we use all this bit talk as a wormhole? Let’s say we have to write a simple program that copies one Uint8Array to another one. If you are not familiar with Uint8Arrays they are similar to Buffers in Node.js, or just “raw” memory.

function copy (input, output) {
  // for simplicity let’s assume input.length <= output.length
  for (var i = 0; i < input.length; i++) {
    // copy each 8 bit number (byte)
    output[i] = input[i]
  }
}

Again, let’s test how fast this is with a simple benchmark

// let’s allocate two 1MB Uint8Arrays for our benchmark
const input = new Uint8Array(1024 * 1024)
const output = new Uint8Array(1024 * 1024)

console.time()
for (var i = 0; i < 1e4; i++) {
  copy(input, output)
}
console.timeEnd()

On my computer it takes ~7.5s to finish. How can we use a wormhole to speed this up? Using the Uint8Array we are only copying 8 bits at the time, but since we have 64 bit computers we could be copying up to 64 bits at the time. We can do this in JavaScript by converting our Uint8Array to a Float64Array before copying, which costs us nothing to do.

function copy (input, output) {
  // for simplicity let’s assume input.length <= output.length

  // view the input and output as 64 bit numbers instead.
  // we are actually using floating point numbers here since each of those
  // take 64 bits to represent.
  // when BigInts are optimised in JavaScript, we could use a BigInt64Array.

  const input64 = new Float64Array(input.buffer, input.byteOffset, input.length / 8)
  const output64 = new Float64Array(output.buffer, output.byteOffset, output.length / 8)

  for (var i = 0; i < input64.length; i++) {
    // copy each 64 bit number
    output64[i] = input64[i]
  }
}

Running the benchmark again it now only takes ~1s to run or almost a 8x speedup.

For copying, the optimal solution is to use the array.set(otherArray)  method on the Uint8Array which does a copy for us in native code which is even faster because of some other wormholes. For reference that takes about ~0.2s with our benchmark on my computer or a 5x speedup from the solution above.

The JavaScript galaxy is full of wormholes

Using the wormholes above we can make a ton of real world algorithms a lot faster. In fact many more wormholes exist. Personally my latest favorite is the Math.clz32 method that returns the number of leading 0 bits in a 32 bit number. We can use this one for a lot of interesting algorithms. I recently used it to speed up a bitfield implementation by 4x while reducing it’s memory footprint by 4x as well, which allowed me to sort numbers much faster in some specific cases.

Understanding how the computer works at its core makes us able to write really fast programs when we need to. This matters, even when we write a high level language like JavaScript.

Image: unsplash-logoTony Webster

Top