So I came up with the idea for this wearable
keyboard. I'd put on a data glove, draw some letters in midair, and it would type them
out over Bluetooth into my wearable computer. I built the hardware, I wrote a training app,
collected a whole bunch of training data, and I TensorFlowed the hell out of it.
54,000 weights, 600 neurons, 15,000 samples, and 500 epochs later, it worked pretty well!
...on my 60-pound liquid-cooled gaming beast. Here's the problem: the model is 75 megabytes of
data, but the glove only has one megabyte of RAM. The TensorFlow library is 400 megabytes, but the
glove program memory is only 2 megabytes. I had to cram this big neural network into this little
wearable device, and I am going to show you how. Sometimes the cloud is too far away. The
coder is willing, but the processor is weak. It is times like this when you must
do big brain think on little brain. [Bonk!] We gotta bring our machine
learning... to the edge. [Ominous rumbling] I can feel the web developers'
veins bulge as they hammer away on their 60% ortholinear mechanical keyboards, keycaps clacking
against the sustainable bamboo support plate, as they froth at the mouth, spitting flecks of
cold brew against the glossy screens of their 2019 MacBook Pros, screaming, "Use the cloud! USE THE CLOUD!"
[Rumbling intensifies] But that zealotry conceals fear, a primal fear that pricks away at
your sanity in the dead of night... that the philosophy you built your whole life
around may not be as absolute as you think... that sometimes you can solve a problem...
without... using... the Internet. [KABLOOIE!] Californians may not know this, but there are
times when an Internet connection is unreliable or unavailable. The device might not have
the processing power or battery to handle Wi-Fi. A cell phone to connect with
through Bluetooth might not be available. I mean, even if a solid Internet connection is
there, that 50 milliseconds of round-trip latency could be a game breaker for some user
experiences. The fact of the matter is that some user experiences are
best done completely offline. [SLUUUUUURP!] 'Machine learning at the edge' is pretentious
startup-ese for doing AI inference on an embedded device. An embedded device in this context is
something that isn't a cell phone, a computer, or a server. These devices are often considered to
be too weak to run sophisticated machine-learning algorithms, but bridging that gap is a critical
part of advancing technology. Relying on managed cloud services for your toaster's business logic
is like carpet-bombing a mosquito. It makes a lot more sense to process the model so
you can run it locally on the device using the hardware available. Running that sophisticated
model right there on the hardware gets you better scalability, increased efficiency,
and just a superior, more responsive product. Machine learning at the edge works because
training a neural network is a pain in the ass, but running a neural network is not. In order
to train this neural network I had to collect 15,000 pieces of data and then run them all
through the network 500 ee-pocks... eh-picks... 500 times in order to calculate 54,000
weights. But in order to RUN the network, all I need to do is collect one sample, run it
through one time, and then scoop up the output. In a way, machine learning at the
edge is actually a bit of a misnomer, because the learning takes place ahead of
time and all that's being done at the edge is running the model. Once my glorious handmade
60-pound pixel cruncher is finished chooching, what remains is the machine learning model -
the thing that actually makes the decisions. The model was trained on 75 megabytes of data,
but the model itself is only, like, 600 kilobytes. So here's the plan: we're gonna start
with a neural network that we create or, y'know, rip off someone's Jupyter notebook,
as well as a big ol' pile of training data. We then run that through TensorFlow Regular,
right on the biggest, most majestic computer we can find. We take that finished model and then
we run it through the TensorFlow Lite converter. This, like, sort of minifies and optimizes the
model, which reduces its size and also lets it run more efficiently, but here's the cool part:
we're not going to run it with TensorFlow Lite. Instead, we're going to take that model
and integrate it into an Arduino sketch and run it using TensorFlow Lite for
Microcontrollers. This is the smallest and most efficient TensorFlow available -
it's only, like, 16 kilobytes and it lets us run the neural network on the very limited
processor and fit it into very limited memory. It introduces a lot of gotchas and restrictions,
but if we play by the rules, we'll be able to jam the network and its full functionality into the
glove and we'll have a neural network that we can wave around. As of August 2020, this is
bleeding-edge [REDACTED] and it's got serious limitations. Many activation functions
are broken or just straight missing; dense layers and convolutional layers work great,
but recurrent networks don't. Finally, in order to fit the model on the device and to process it in a
reasonable amount of time, it has to be quantized, which means converting those 32-bit floats that
you usually use as weights down to 8-bit integers. Dropping 24 bits of precision
seems like it'll take your precisely-trained neural network and kick it in the crotch, but
in reality, it's not that big of a deal. Your network should already ignore small variations in
input values, and even if it doesn't, you're not really getting 32 noise-free bits of every
sample. If you are, then you're probably working at a research lab or something, in which
case stop watching YouTube and get back to work, you slacker! You have a virus to sequence! NOPE! Otherwise, the sky's the limit. As
long as your microcontroller has enough memory to store the model and enough oomph
to do a few hundred thousand multiplications in a reasonable amount of time, the model will run
as well on small brain as it does on big brain. [Marching band music]
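Here's a quick sanity check in plain Python of both claims at once: that inference is nothing but multiply-accumulates, and that squashing weights to 8 bits barely moves the output. The layer sizes are made up (not the glove's actual network), and this is the simplest symmetric flavor of quantization; the real TensorFlow Lite scheme is affine, with a zero-point per tensor.

```python
import random

random.seed(1)

# A made-up dense layer: 8 inputs -> 4 neurons (toy sizes, not the glove's network).
IN, OUT = 8, 4
weights = [[random.uniform(-1, 1) for _ in range(IN)] for _ in range(OUT)]
x = [random.uniform(-1, 1) for _ in range(IN)]

# Quantize: map the weight range linearly onto the int8 range [-127, 127].
scale = max(abs(w) for row in weights for w in row) / 127.0
q_weights = [[round(w / scale) for w in row] for row in weights]

# Inference is just one multiply-accumulate per weight, both ways.
float_out = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
quant_out = [sum(qw * xi for qw, xi in zip(row, x)) * scale for row in q_weights]

macs = IN * OUT  # one multiply-accumulate per weight
worst = max(abs(f - q) for f, q in zip(float_out, quant_out))
print(f"{macs} multiply-accumulates, worst error from quantizing: {worst:.4f}")
```

Scale this up to the glove's 54,000 weights and an inference is still only 54,000-ish integer multiplies, which a modern microcontroller chews through without breaking a sweat.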
Let's choose our weapon. Any microcontroller with, like, a
megabyte each of flash memory and RAM should be enough to make this work. nRF52
boards, ESP8266 boards, Cortex-M processors, they all work great. Your regular-ass Arduinos
and PICs are probably not powerful enough. [Smashing sounds]
Modern, faster boards like the Arduino Nano 33 BLE are A-okay. Single-board computers like the Raspberry
Pi are powerful enough to use the full-strength TensorFlow Lite, and honestly, with something like
this, you could even just run full-on TensorFlow and make your life way easier. You will have
the overhead of an operating system, so your project might actually be better if you [shattering sounds]
switch to a leaner, more real-time microcontroller. This Teensy 4.0 is perfect, and even this Teensy 3.6 would do great. I
used the Teensy 4.0 in the glove project. Now that you've selected your microcontroller,
it's time to stuff a neural network in it! First, we load our Keras model, then
we instance a TFLiteConverter, which can perform optimizations such as making
it run faster or making the code smaller. Or we can do neither, because frankly
[laughing] this doesn't really work very well! This is a neat part of the process that's pretty
easy to [REDACTED] up. We need to provide a representative data sheet that contains
at least one instance of every label... [British robot voice] You mean a
representative data SET, not a data SHEET, you dumb bastard. Disliked, deleted
my comment, and unsubscribed. It also needs a distribution of input
values that's similar to real-life data. It's really easy to do using the library
scikit-learn - we feed our training data into it and we have it pull out a stratified sample, which
is a representative sample. The optimizer takes this and generates lookup tables and
scales input and output to make the most of those eight bits of resolution and to make
the most of our available memory. Finally, we run the optimizer, which does the thing
and gives us a TensorFlow Lite-ready model. WOO-HOO! Time to get it into the microcontroller. Setting
up a makefile to take this model and link it into our firmware is just a hell beyond hell,
so we're gonna hack it. We're gonna use the hexdump library, which dumps a binary to hex,
and then I'm just gonna format it into a C++ object declaration. We're gonna save this as a
header file and now we can just drag it and drop it into our firmware sketch.
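If you'd rather skip the hexdump-and-reformat step, something like this dependency-free sketch does the same job. The function name, the chunk width, and the `g_model` variable name are my placeholders, not anything from the video or from TensorFlow:

```python
def model_to_header(model_bytes: bytes, var_name: str = "g_model") -> str:
    """Format a binary blob as a C array declaration you can #include in a sketch."""
    # alignas(8) keeps the flatbuffer aligned in flash, which TFLite Micro likes.
    lines = [f"alignas(8) const unsigned char {var_name}[] = {{"]
    for i in range(0, len(model_bytes), 12):
        chunk = model_bytes[i:i + 12]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    lines.append("};")
    lines.append(f"const unsigned int {var_name}_len = {len(model_bytes)};")
    return "\n".join(lines)

# Fake 'model' bytes just to show the output shape; really you'd read your .tflite file.
header = model_to_header(bytes(range(20)))
print(header)
```

In the real pipeline you'd feed it `open("model.tflite", "rb").read()`, save the result as a header file, and drop that into the sketch exactly as described above.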
What? It works! It's time to leave this big-boy IDE for grown-ups
and go into the IDE of chaos and deviltry... let's switch to Arduino. In Arduino, we want to install
the TensorFlow Lite for Microcontrollers library for Arduino. We don't want the pre-compiled one; if
you're using anything other than an Arduino-brand Arduino, you're gonna have a bad time.
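One convention worth nailing down before reading any of the sketch: TensorFlow Lite models take flattened input, so the glove's 50 (x, y) gesture points go in as a single 100-value array. A tiny Python sketch of that shape convention, with a synthetic curve standing in for a recorded gesture:

```python
# Synthetic stand-in for one recorded gesture: 50 (x, y) points along a curve.
gesture = [(i / 49, (i / 49) ** 2) for i in range(50)]

# Interleave the pairs: x0, y0, x1, y1, ... -- one flat 100-value array
# instead of 50 coordinate pairs, matching the model's input tensor.
flat = [coord for point in gesture for coord in point]

print(len(flat), flat[:4])
```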
Let's check out the code. My code is just copied-and-pasted from Google's examples,
and those examples are written in ABSOLUTELY RIGOROUS compliance with Google's
C++ style guide. It's pretty neat! Anyways, let's dive in. First step is to import
the model from that header file we just generated, then instancing an OpResolver. We can kinda
optimize RAM usage by only including functions we actually use, but I'm really lazy, so we're
doing all of 'em. Then, we need an interpreter, and we need some memory for it to work in. We need
to set up an arena, which is a pretty death-metal term for what's basically a pre-allocated
scratch buffer. There's no, like, hard-and-fast rule for how much memory to allocate, so I'm
gonna start with two kilobytes. We'll dial it up if we're getting buffer overflows, and dial
it down if we need the memory back for everything else. We get a pointer to our input tensor and a pointer to our output
tensor, and we're all revved up and ready to go. This is where the rubber meets the road. Every
loop, the code checks if I've finished doing a gesture, and if I have, it processes it into
a standard format. Pre-processing is really important in TensorFlow Lite. Anything you can
do to reduce the necessary complexity of the model will let you make the most of the limited
resources; we are still constrained on how much model we can actually fit in this thing. All
input to TensorFlow Lite models is flattened; in other words, instead of putting in 50 XY
coordinates, we run them in X, Y, X, Y, X, Y. All that's left is to perform our inference and
let the eldritch gods of cyberspace figure out what I just wrote in midair. And that's it! We just
performed sophisticated gesture recognition on a device that can't even buffer a five-megapixel
image. Compiling this monstrosity takes a while, it takes up, like, 20% of program memory and,
like, 80% of the RAM, and it takes, like, a good five minutes to crunch the first time, but
it's worth it. We can run an inference on this 600-megahertz processor in like 10 milliseconds,
which is AWESOME. I mean, that gives us another 10 milliseconds to faff about and still keep the
thing crisp and responsive. This handwriting-recognizing glove totally works, and if you're
interested in the hardware, or you just want to see it in action, I did a whole video on it and
you can check it out right here. Call to action. But what else can you do with this? Try using some
convolutional neural networks to analyze video in real time! Try some real-time image recognition
or audio recognition right there on the device itself! Capture false outcomes and add them to
your training set later! You can even add more memory, like physically add in more flash chips,
to store and work on bigger models. What's cool about TFLite for Microcontrollers is that it
runs the model from program memory... I think? It does.
Don't quote me on this. I will.
Be sure to check back on TensorFlow Lite and TensorFlow
Lite for Microcontrollers often, because they are under very frothy active development, and
new features could be added and new regressions could be introduced at any time. So the next
time you want to learn you some machines, do the big-brain play and put it in a small
package. Just don't cut yourself on that edge. Thanks so much for watching, and
double thanks for watching the whole thing! If you want to look
through my terrible source code, it's all on GitHub. Ravioli ravioli, links
in the descriptioli. I make videos about electronics and the crazy stuff you can do
with them, and if that revs your engine, feel free to give me a subscription and get
notified when the next video is up. Or you could roast my programmer's tan. I've been underground
in New York City lockdown for, like, five months. I know I'm translucent. Anyways, thanks a lot
for watching, and I'll see you in the future. [Narrator] It wasn't so long ago that communication was
a simple act, but the range of the human voice is limited. So, man's ingenuity found ways
to bridge distance. He invented writing... ...and typographical errors. a representative data sheet
that representative data sheet
representative datasheet You mother [REDACTED]er
Great work.