Bangle.js: a smartwatch with JavaScript and TensorFlow

You’ve probably heard! We hacked a smartwatch to run JavaScript and also put TensorFlow on it, allowing it to do hand-gesture recognition using AI.

My name is Andreas. I developed the model that does the hand-gesture recognition, and I would like to take you through my journey on this project from start to finish.

If you haven’t read about NodeWatch or Bangle.js yet, check out our previous posts:

Image: the Bangle.js smartwatch displaying “powered by tensorflow”
Working at NearForm is great, and sometimes I get an email like this that makes it awesome.

Hi Andreas,

We need your TensorFlow knowledge to guide Gordon Williams on this year’s “badge” for NodeConfEU. It’s top-secret but he has just got TensorFlow Lite running in Espruino on an NRF52832 so he has basically created TensorFlow Lite JS!

We want to do something with the badge sensor data, Web Bluetooth, and TensorFlow.

Thanks!
Conor

When I received this message, there were about five weeks left until NodeConfEU 2019, where the watch was to be presented.

A model like this shouldn’t be too difficult to make; I estimated that it would take about one week. However, I also knew that there were tons of unknown problems waiting: “TensorFlow Lite for Microcontrollers” was very experimental, there was no data yet, the watch hardware would change halfway through, and, oh, I didn’t even have the watch myself!

I had also never worked on embedded devices before; I usually work on $10M supercomputers.

Problem description: take the accelerometer data and process it with TensorFlow to classify it as a specific hand gesture.

ML modelling methodology

In my first machine learning course 6 years ago, I was taught there were always 3 steps in developing a model: Exploration & preparation, modelling, and then evaluation.

The process has changed a bit as machine learning has gone through a paradigm shift and has become much more advanced. In particular, the preparation part is now a much smaller part of the process and exploration is less relevant although often still worthwhile. However, I think the overall methodology still holds.

So let’s go through each of the steps.

Exploration

The acceleration data is measured in three directions (x, y, z). Acceleration itself can be hard to make sense of, but in theory, and with some assumptions, it can be turned into velocity or relative position through discrete (trapezoidal) integration.
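As a rough sketch of that integration, the following applies the trapezoidal rule twice to evenly spaced acceleration samples. The 10 Hz sample rate, the constant acceleration, and the zero initial velocity and position are assumptions for illustration only; with real, noisy sensor data, errors accumulate quickly when integrating twice.

```python
def cumulative_trapezoid(samples, dt):
    """Integrate evenly spaced samples with the trapezoidal rule.

    Returns a list of the same length as `samples`, starting at 0.0
    (i.e. assuming the initial velocity/position is zero).
    """
    out = [0.0]
    for prev, cur in zip(samples, samples[1:]):
        out.append(out[-1] + 0.5 * (prev + cur) * dt)
    return out


# Hypothetical example: a constant 1 m/s^2 acceleration along one axis,
# sampled at an assumed 10 Hz for 1 second.
dt = 0.1
accel = [1.0] * 11
velocity = cumulative_trapezoid(accel, dt)     # integrate once
position = cumulative_trapezoid(velocity, dt)  # integrate twice
print(round(velocity[-1], 6), round(position[-1], 6))  # 1.0 0.5
```

For constant acceleration this reproduces v = a·t and x = a·t²/2 exactly, because the trapezoidal rule is exact for linear functions.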

Unfortunately, the acceleration measurements are relative to the watch. If the watch is rotated but moved in the same direction, the acceleration is measured along a different axis. This means that without knowing the orientation of the watch, it is not possible to know its position.
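To illustrate why orientation matters, here is a minimal sketch that expresses the same physical acceleration in the frame of a watch that has been rotated about its z axis. The rotation helper is hypothetical, and gravity is ignored for simplicity.

```python
import math


def rotate_z(vec, angle_deg):
    """Express a world-frame vector in a sensor frame rotated by angle_deg about z."""
    a = math.radians(angle_deg)
    x, y, z = vec
    # Rotating the sensor by +a is equivalent to rotating the vector by -a.
    return (
        x * math.cos(a) + y * math.sin(a),
        -x * math.sin(a) + y * math.cos(a),
        z,
    )


world_accel = (1.0, 0.0, 0.0)  # the same physical movement in both cases
print([round(c, 6) for c in rotate_z(world_accel, 0)])   # [1.0, 0.0, 0.0]
print([round(c, 6) for c in rotate_z(world_accel, 90)])  # [0.0, -1.0, 0.0]
```

The movement is identical, yet it shows up on a different axis, and without knowing the rotation there is no way to map the reading back to a direction in the world.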

For this reason, most phones and other biometric devices have a gyroscope, which can be used to track orientation. However, let’s be honest: this is an inexpensive watch, and it doesn’t include a gyroscope.

Even so, not all is lost. It does mean, however, that we have to rely purely on machine learning modelling to do the heavy lifting.

Acceleration data: Shows the acceleration data in the x,y,z directions for each hand-gesture. The velocity is computed by integrating once, and the position is computed by integrating twice.

Modelling

While quite a few different types of models could likely solve this problem, the framework we have available is TensorFlow. As such, it makes sense to use a neural network to solve the task.

The first idea was to use a convolution layer followed by an LSTM layer. I won’t go into the mathematical details here, but the general idea is that the convolution layer can learn operations such as double trapezoidal integration, which provides the position, while the LSTM layer can perform signal processing and time-wise aggregation of the data.
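To make the claim about convolutions concrete: a convolution with the hand-picked weights dt × [0.5, 1, 1, 1, 0.5] computes the trapezoidal integral over a window of 5 samples, the same window length as the (5, 1) kernels in the model below. This is a sketch of what such a layer could represent, not weights taken from the trained model, and a single layer only integrates over its local window.

```python
def conv1d_valid(signal, kernel):
    """Cross-correlation with 'valid' padding, as computed by a conv layer."""
    k = len(kernel)
    return [
        sum(w * s for w, s in zip(kernel, signal[i:i + k]))
        for i in range(len(signal) - k + 1)
    ]


def trapezoid(samples, dt):
    """Trapezoidal integral of evenly spaced samples."""
    return sum(0.5 * (a + b) * dt for a, b in zip(samples, samples[1:]))


dt = 0.1
# Hand-picked weights: trapezoidal integration over a window of 5 samples.
kernel = [0.5 * dt, dt, dt, dt, 0.5 * dt]
accel = [0.0, 0.2, 0.5, 0.9, 1.0, 0.7, 0.3, 0.0]

windowed = conv1d_valid(accel, kernel)
reference = [trapezoid(accel[i:i + 5], dt) for i in range(len(accel) - 4)]
print(all(abs(a - b) < 1e-12 for a, b in zip(windowed, reference)))  # True
```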

However, it turned out that “TensorFlow Lite for Microcontrollers” at this point only supported convolution-type layers, so the LSTM layer was replaced with another convolution layer (a dilated convolution) to perform the signal processing, followed by a global max-pool layer for the time-wise aggregation.

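from tensorflow import keras

# `dataset` is assumed to be loaded already (e.g. in the repo's notebook):
# dataset.*.x has shape (n, 50, 1, 3) and dataset.*.y holds integer class labels.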
 
# defining the model
model = keras.Sequential()
model.add(keras.Input(shape=(50, 1, 3), name='acceleration'))
# First convolution layer
model.add(keras.layers.Conv2D(14, (5, 1),
                              padding='valid', activation='relu'))
model.add(keras.layers.Dropout(0.2))
# Second convolution layer (dilated) 
model.add(keras.layers.Conv2D(10, (5, 1),
                              padding='same', activation='relu',
                              dilation_rate=2))
model.add(keras.layers.Dropout(0.1))
# Global Max-Pool layer
model.add(keras.layers.MaxPool2D((46, 1)))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(len(dataset.classnames), use_bias=False))
 
# fitting the model
model.compile(optimizer=keras.optimizers.Adam(),
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=[keras.metrics.SparseCategoricalAccuracy()])
 
model.fit(dataset.train.x, dataset.train.y,
          batch_size=64, epochs=500,
          validation_data=(dataset.validation.x, dataset.validation.y))
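The layer sizes in the model fit together through simple shape arithmetic. The sketch below assumes stride-1 convolutions with the standard Keras meaning of 'valid' and 'same' padding, and tracks only the time axis; the helper name is hypothetical.

```python
def conv_output_length(length, kernel, dilation=1, padding="valid"):
    """Output length along the time axis for a stride-1 convolution."""
    if padding == "same":
        return length  # 'same' zero-pads so the length is preserved
    effective_kernel = dilation * (kernel - 1) + 1  # span covered by the kernel
    return length - effective_kernel + 1


t = 50                                                    # input time steps
t = conv_output_length(t, 5)                              # Conv2D (5, 1), 'valid'
print(t)                                                  # 46
t = conv_output_length(t, 5, dilation=2, padding="same")  # dilated Conv2D, 'same'
print(t)                                                  # 46

# Receptive field in input samples: the first layer sees 5 samples; each
# dilated output combines 5 first-layer outputs spaced 2 apart.
receptive_field = 5 + (5 - 1) * 2
print(receptive_field)                                    # 13
```

Because 46 time steps remain after the second convolution, MaxPool2D((46, 1)) pools over the whole time axis and acts as a global max-pool. Dilation widens the receptive field from 9 to 13 samples without adding weights.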

Evaluation

Correct evaluation is probably the most important part of machine learning. Done incorrectly, it will lead to false beliefs about the accuracy of the model. Unfortunately, it is also the most overlooked part.

The key is to set aside a portion of the data that neither you nor the model has ever seen, and to use it only once you have great confidence in the model. This simulates reality, where new, never-before-seen data is coming in all the time.

The moment you use this held-out data, you are likely to make modelling choices based on it. At that point the data has implicitly been used to build the model, and it no longer represents reality, where the incoming data is always new.
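A minimal sketch of such a hold-out split, assuming samples can be shuffled independently. The 70/15/15 proportions and the fixed seed are illustrative choices, not the ones used for this model.

```python
import random


def split_indices(n, validation_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle indices 0..n-1 and split them into train/validation/test.

    The test indices are meant to be set aside and used only once,
    at the very end, when the model is final.
    """
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = int(n * test_frac)
    n_val = int(n * validation_frac)
    test = idx[:n_test]
    validation = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, validation, test


train, validation, test = split_indices(1000)
print(len(train), len(validation), len(test))  # 700 150 150
```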

              precision    recall  f1-score   support

  swiperight     1.0000    0.9574    0.9783        47
   swipeleft     1.0000    1.0000    1.0000        66
        upup     0.9545    1.0000    0.9767        42
      waggle     1.0000    1.0000    1.0000        39
       clap2     0.9444    0.9714    0.9577        35
      random     1.0000    0.9167    0.9565        12

    accuracy                         0.9834       241
   macro avg     0.9832    0.9743    0.9782       241
weighted avg     0.9840    0.9834    0.9834       241

Model accuracy: Measures the classification accuracy using precision, recall, and F1. Look at the F1-score for the most encompassing score.
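Each row of the report is derived from three counts per class. The sketch below recomputes the swiperight row: a recall of 0.9574 with a support of 47 corresponds to 45 correctly classified samples and 2 misses, and a precision of 1.0 means there were no false positives. The F1-score is the harmonic mean of precision and recall; the macro average is the unweighted mean over classes, while the weighted average weights each class by its support.

```python
def precision_recall_f1(tp, fp, fn):
    """Per-class precision, recall and F1 from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# swiperight: 45 true positives, 0 false positives, 2 false negatives.
p, r, f1 = precision_recall_f1(tp=45, fp=0, fn=2)
print(round(p, 4), round(r, 4), round(f1, 4))  # 1.0 0.9574 0.9783
```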

Far more can be done regarding evaluation. Ideally, the model would be evaluated on data from different subjects than those used to train it, to ensure that it generalizes to new people.

Due to secrecy and time constraints, we did not have the opportunity to do that and instead chose to individualize the model for each person. This also fit well with the workshop, which focused on allowing participants to train the model with their own gestures.
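For reference, a subject-wise evaluation means that every recording from a held-out person goes into the test set, so the model is never tested on someone it was trained on. This is a minimal sketch with hypothetical subject IDs and a hypothetical helper; it was not part of this project.

```python
def split_by_subject(subjects, held_out):
    """Return (train_idx, test_idx) so that no subject appears in both.

    `subjects[i]` is the (hypothetical) ID of the person who recorded sample i.
    """
    held_out = set(held_out)
    train = [i for i, s in enumerate(subjects) if s not in held_out]
    test = [i for i, s in enumerate(subjects) if s in held_out]
    return train, test


subjects = ["anna", "anna", "bo", "bo", "carl", "carl", "carl"]
train, test = split_by_subject(subjects, held_out=["carl"])
print(train, test)  # [0, 1, 2, 3] [4, 5, 6]
```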

TensorFlow on embedded devices

The model has to run on a tiny System-on-Chip (SoC). With everything else also running on the hardware, there is about 10 KB of RAM left for the model. The hardware also has only a single floating-point unit (FPU), meaning that while the SoC can compute with float32 numbers, it won’t be fast.
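A back-of-the-envelope estimate shows why 10 KB is tight. The sketch counts the parameters of the model defined earlier (assuming 6 classes, as in the classification report) and the two largest intermediate tensors, which must both exist while the dilated convolution runs. It ignores interpreter overhead, scratch buffers, the input tensor, and the int32 biases that TensorFlow Lite uses in int8 models, so real usage is higher.

```python
def conv_params(kernel_h, kernel_w, in_ch, out_ch, bias=True):
    """Weights plus (optional) biases of a Conv2D layer."""
    return kernel_h * kernel_w * in_ch * out_ch + (out_ch if bias else 0)


n_classes = 6  # from the classification report
params = (
    conv_params(5, 1, 3, 14)      # first Conv2D: 210 weights + 14 biases
    + conv_params(5, 1, 14, 10)   # dilated Conv2D: 700 weights + 10 biases
    + 10 * n_classes              # Dense without bias
)
# Activations alive while the second conv runs: its input and its output.
activations = 46 * 14 + 46 * 10

for name, nbytes in [("float32", 4), ("int8", 1)]:
    print(name, params * nbytes, activations * nbytes)
# float32 3976 4416
# int8 994 1104
```

In float32 that is roughly 8.4 KB before any overhead, against about 2.1 KB with int8 numbers.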

Fortunately, TensorFlow Lite supports a lossy compression process called quantization. The quantization method used in the watch is an int8 quantization, meaning that all float32 weights and intermediate results are replaced by int8 numbers.
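In TensorFlow Lite’s int8 scheme, each tensor stores a scale and a zero point, and a real value is represented as scale × (q − zero_point) for an int8 value q. The sketch below picks those parameters from a min/max range, which is roughly what calibration with a representative dataset provides for activations. It is a simplified illustration with hypothetical helper names; TensorFlow Lite additionally uses symmetric, per-channel scales for weights.

```python
def choose_params(x_min, x_max, qmin=-128, qmax=127):
    """Scale and zero point mapping [x_min, x_max] onto the int8 range."""
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)  # range must include 0
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = round(qmin - x_min / scale)
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """float -> int8, rounding to the nearest step and clamping to the range."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """int8 -> approximate float."""
    return scale * (q - zero_point)


scale, zp = choose_params(-1.0, 3.0)
for x in [-1.0, 0.0, 0.5, 3.0]:
    q = quantize(x, scale, zp)
    print(x, q, round(dequantize(q, scale, zp), 4))
```

Every value within the range survives the round trip with an error of at most half a step (scale / 2), which is where the small accuracy loss comes from; values outside the range are clamped.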

This conversion can be done with the following Python code.

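    import numpy as np
    import tensorflow as tf

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    # Calibration data: a generator yielding one shuffled validation observation
    # at a time, each with a batch dimension of 1, i.e. shape (1, 50, 1, 3).
    converter.representative_dataset = lambda: (
        [obs] for obs in dataset.validation.x[
            np.random.permutation(dataset.validation.x.shape[0]),
            np.newaxis,
            ...]
    )
    with open('fmodel.tflite', 'wb') as fp:
        fp.write(converter.convert())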

I’m sparing you all the painful details here. The initial model took only a few days to make; however, getting it to run on the watch was a huge technical challenge, full of segfaults and cryptic errors.

“TensorFlow Lite for Microcontrollers” is still extremely experimental, and I ended up having to write several C++ patches for it.

This lossy compression does come at a cost. However, recomputing the accuracy on the quantized model shows that the cost is fairly small.

 
              precision    recall  f1-score   support

  swiperight     0.9020    0.9787    0.9388        47
   swipeleft     0.9844    0.9545    0.9692        66
        upup     0.9767    1.0000    0.9882        42
      waggle     0.9286    1.0000    0.9630        39
       clap2     1.0000    0.8857    0.9394        35
      random     1.0000    0.8333    0.9091        12

    accuracy                         0.9585       241
   macro avg     0.9653    0.9421    0.9513       241
weighted avg     0.9610    0.9585    0.9583       241

Quantized model accuracy: Measures the classification accuracy of the quantized model using precision, recall, and F1. Look at the F1-score for the most encompassing score. Results show that accuracy drops by about 2.5 percentage points (from 98.3% to 95.9%) due to quantization.

Another key challenge involved the smartwatch’s limited 64 KB of available memory.

TensorFlow requires the model to be loaded into a single contiguous block of memory. However, the way Espruino works on the watch hardware leads to frequent memory fragmentation, which prevents a large enough contiguous block from being dedicated to TensorFlow.
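The problem is not the total amount of free memory but the size of the largest contiguous free region. A minimal sketch with a hypothetical heap layout and a hypothetical model size:

```python
def free_stats(blocks):
    """blocks: list of (size, is_free) in address order.

    Returns (total free bytes, largest contiguous free run).
    Adjacent free blocks are merged into one run.
    """
    total = largest = run = 0
    for size, is_free in blocks:
        if is_free:
            total += size
            run += size
            largest = max(largest, run)
        else:
            run = 0
    return total, largest


# Hypothetical heap: plenty of free memory overall, but split up by live objects.
heap = [(3000, True), (200, False), (2500, True), (100, False), (3000, True)]
total, largest = free_stats(heap)
model_size = 5000  # hypothetical size of the model's contiguous buffer
print(total, largest, largest >= model_size)  # 8500 3000 False
```

Here 8,500 bytes are free in total, but no single free region can hold a 5,000-byte buffer.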

There is also not enough free memory to keep the model loaded all the time. Together, these limitations allowed only one or two gesture recognitions before Espruino began reporting memory errors.

We worked around the issue by defragmenting memory and then loading the model immediately before each gesture-recognition event, instead of keeping it loaded.

The end result is that recognition takes slightly longer to complete but performs much more reliably.
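The workaround can be modelled as a toy example: compact the heap right before recognition, then allocate the model buffer in the resulting single free region. This sketch uses first-fit allocation, hypothetical helper names and a hypothetical heap; it is not Espruino’s actual allocator, and a real runtime must also update every reference to the blocks it moves.

```python
def defragment(blocks):
    """Move every live block to the start of the heap (address order kept)
    and merge all free space into a single block at the end."""
    live = [(size, False) for size, is_free in blocks if not is_free]
    free = sum(size for size, is_free in blocks if is_free)
    return live + ([(free, True)] if free else [])


def allocate(blocks, size):
    """First-fit allocation; returns the new heap, or None if nothing fits."""
    for i, (block_size, is_free) in enumerate(blocks):
        if is_free and block_size >= size:
            rest = [(block_size - size, True)] if block_size > size else []
            return blocks[:i] + [(size, False)] + rest + blocks[i + 1:]
    return None


heap = [(3000, True), (200, False), (2500, True), (100, False), (3000, True)]
print(allocate(heap, 5000))              # None: no single free block is big enough
print(allocate(defragment(heap), 5000))  # [(200, False), (100, False), (5000, False), (3500, True)]
```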

Trying it yourself

If you’d like to see it all in action, head over to the GitHub repo and use the Jupyter notebook either locally or directly in Google Colab.

You can take the final output and save it directly on a Bangle.js using the Espruino Web IDE.

Adding your own gesture

To add your own gestures, the simplest thing to do is to fork the repo. Then run this code on the Bangle.js to record your gesture data. Take the console output from that and save it in the /data/extra-v2 directory of the repo. Edit the Jupyter Notebook to refer to the new gesture. Then run the Notebook to train the model and output the JS code with the model.

You can then create your own variation of the Gesture Sample App to use the updated model and new gesture name.

If you then want to use the new gesture to control a PC or phone, you can easily modify the HID Keyboard Control sample app or HID Music Control sample app. As you can see, it’s trivially easy to use the TF gestures in your JS code:

// drawApp, next and prev are assumed to be defined elsewhere in the app.
Bangle.on('aiGesture', (v) => {
  switch (v) {
    case 'swipeleft':
      E.showMessage('next');
      setTimeout(drawApp, 1000);
      next(() => {});
      break;
    case 'swiperight':
      E.showMessage('prev');
      setTimeout(drawApp, 1000);
      prev(() => {});
      break;
  }
});

Conclusion

The hardest part, for sure, was getting things to work on TensorFlow Lite for Microcontrollers. The modelling itself was actually easier than I expected.

The good news is that we are working closely with the TensorFlow team on solving the issues we faced, so that deploying TensorFlow to IoT devices becomes easier.

Andreas is based in Denmark and specialises in Natural Language Processing and Deep Learning. He is one of the few outside of Google and OpenAI who have published in the Distill research journal, and he currently has work in peer review for ICLR and NeurIPS, the two most important AI conferences. Besides his academic achievements, he contributes regularly to Node.js, TensorFlow, and TensorFlow.js, has implemented several search engines for industry, and is the mind behind the machine learning algorithms used in Clinic.js.

You can connect with Andreas on LinkedIn and on GitHub.

Published by Andreas Madsen