Small Brain, Big Think: AI on the Edge

Captions
So I came up with the idea for this wearable keyboard. I'll put on a data glove, draw some letters in midair, and it would type them out over Bluetooth into my wearable computer. I built the hardware, I wrote a training app, collected a whole bunch of training data, and I TensorFlowed the hell out of it. 54,000 weights, 600 neurons, 15,000 samples, and 500 epochs later, and it worked pretty well! ...on my 60-pound liquid-cooled gaming beast.
Here's the problem: the model is 75 megabytes of data, but the glove only has one megabyte of RAM. The TensorFlow library is 400 megabytes, but the glove program memory is only 2 megabytes. I had to cram this big neural network into this little wearable device, and I am going to show you how.
Sometimes the cloud is too far away. The coder is willing, but the processor is weak. It is times like this when you must do big brain think on little brain. [Bonk!] We gotta bring our machine learning... to the edge.
[Ominous rumbling] I can feel the web developers' veins bulge as they hammer away on their 60% ortholinear mechanical keyboards, keycaps clacking against the sustainable bamboo support plate, as they froth at the mouth, spitting flecks of cold brew against the glossy screens of their 2019 MacBook Pros, screaming, "Use the cloud! USE THE CLOUD!"
[Rumbling intensifies] But that zealotry conceals fear, a primal fear that pricks away at your sanity in the dead of night... that the philosophy you built your whole life around may not be as absolute as you think... that sometimes you can solve a problem... without... using... the Internet. [KABLOOIE!]
Californians may not know this, but there are times when an Internet connection is unreliable or unavailable. The device might not have the processing power or battery to handle Wi-Fi. A cell phone to connect with through Bluetooth might not be available.
I mean, even if a solid Internet connection is there, that 50 milliseconds of round-trip latency could be a game breaker for some user experiences. The fact of the matter is that some user experiences are best done completely offline. [SLUUUUUURP!]
'Machine learning at the edge' is pretentious startup-ese for doing AI inference on an embedded device. An embedded device in this context is something that isn't a cell phone, a computer, or a server. These devices are often considered to be too weak to run sophisticated machine-learning algorithms, but bridging that gap is a critical part of advancing technology. Relying on managed cloud services for your toaster's business logic is like, it's like carpet bombing a mosquito. It makes a lot more sense to process the model so you can run it locally on the device using the hardware available. By taking that sophisticated model and running it locally on the device, you get better scalability, increased efficiency, and just a superior, more responsive product.
Machine learning at the edge works because training a neural network is a pain in the ass, but running a neural network is not. In order to train this neural network I had to collect 15,000 pieces of data and then run them all through the network 500 ee-pocks... eh-picks... 500 times in order to calculate 54,000 weights. But in order to RUN the network, all I need to do is collect one sample, run it through one time, and then scoop up the output.
In a way, machine learning at the edge is actually a bit of a misnomer, because the learning takes place ahead of time and all that's being done at the edge is running the model. Once my glorious handmade 60-pound pixel cruncher is finished chooching, what remains is the machine learning model - the thing that actually makes the decisions. The model trained with 75 megabytes of data, but the model itself is only like 600 kilobytes.
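To make the "one sample, one pass" point concrete, here is a minimal pure-Python sketch of inference through a small dense network. It is not the glove's actual model: the layer sizes, the hand-picked weights in `LAYERS`, and the helper names `dense`, `softmax`, and `infer` are all assumptions made up for illustration. The only work at run time is a fixed number of multiply-adds per layer, with no training loop in sight.

```python
import math

def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: out[j] = biases[j] + sum_i inputs[i] * weights[i][j]."""
    outputs = []
    for j in range(len(biases)):
        total = biases[j]
        for i, x in enumerate(inputs):
            total += x * weights[i][j]
        outputs.append(total)
    if activation == "relu":
        outputs = [max(0.0, v) for v in outputs]
    return outputs

def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def infer(sample, layers):
    """Run one sample through every layer once; return (label index, probabilities)."""
    activations = sample
    for weights, biases, activation in layers:
        activations = dense(activations, weights, biases, activation)
    probs = softmax(activations)
    return probs.index(max(probs)), probs

# Hypothetical, hand-picked weights: 2 inputs -> 2 hidden units -> 2 output labels.
LAYERS = [
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0], "relu"),
    ([[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0], None),
]

label, probs = infer([0.9, 0.1], LAYERS)
print("predicted label:", label)
```

Training would wrap this forward pass in thousands of repetitions plus gradient updates, which is the part that stays on the big machine.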
So here's the plan: we're gonna start with a neural network that we create or, y'know, rip off someone's Jupyter notebook, as well as a big ol' pile of training data. We then run that through TensorFlow Regular, right on the biggest, most majestic computer we can find. We take that finished model and then we run it through the TensorFlow Lite processor. This, like, sort of minifies and optimizes the model, which reduces its size and also lets it run more efficiently, but here's the cool part: we're not going to run it with TensorFlow Lite. Instead, we're going to take that model and integrate it into an Arduino sketch and run it using TensorFlow Lite for Microcontrollers. This is the smallest and most efficient TensorFlow available - it's only like 16 kilobytes and it lets us run the neural network on the very limited processor and fit it into very limited memory.
It introduces a lot of gotchas and restrictions, but if we play by the rules, we'll be able to jam the network and its full functionality into the glove and we'll have a neural network that we can wave around. As of August 2020, this is bleeding-edge [REDACTED] and it's got serious limitations. Many activation functions are broken or just straight missing; dense layers and convolutional layers work great, but recurrent networks don't. Finally, in order to fit the model on the device and to process it in a reasonable amount of time, it has to be quantized, which means converting those 32-bit floats that you usually use as weights down to 8-bit bytes.
Dropping everything by 24 orders of magnitude seems like it'll take your precisely-trained neural network and kick it in the crotch, but in reality, it's not that big of a deal. Your network should already ignore small variance in input values, and even if it doesn't, you're not really getting 32 noise-free bits of every sample.
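As a rough illustration of what quantizing 32-bit floats down to 8-bit values involves, the sketch below applies a simple affine (scale plus zero point) mapping in plain Python. This is an assumption about the general technique, not a reproduction of what the TensorFlow Lite converter does internally; the function names and the sample `weights` are hypothetical. Going from 32 bits to 8 bits drops 24 bits per weight, so each weight takes a quarter of the memory.

```python
def quantization_params(values, qmin=-128, qmax=127):
    """Pick a scale and zero point so the range of `values` maps onto [qmin, qmax]."""
    lo = min(min(values), 0.0)  # keep 0.0 exactly representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for all-zero input
    zero_point = int(round(qmin - lo / scale))
    return scale, max(qmin, min(qmax, zero_point))

def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to integers in [qmin, qmax]."""
    return [max(qmin, min(qmax, int(round(v / scale)) + zero_point)) for v in values]

def dequantize(quantized, scale, zero_point):
    """Recover approximate floats from the 8-bit integers."""
    return [(q - zero_point) * scale for q in quantized]

weights = [-1.0, -0.25, 0.0, 0.5, 1.0]  # hypothetical float weights
scale, zero_point = quantization_params(weights)
q_weights = quantize(weights, scale, zero_point)
restored = dequantize(q_weights, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print("int8 weights:", q_weights, "worst-case error:", max_error)

# Storage for the 54,000 weights quoted in the video.
FLOAT32_BYTES, INT8_BYTES = 4, 1
NUM_WEIGHTS = 54_000
print(NUM_WEIGHTS * FLOAT32_BYTES, "bytes as float32 ->", NUM_WEIGHTS * INT8_BYTES, "bytes as int8")
```

The round-trip error stays within one quantization step, which is the sense in which a network that already tolerates small input variance barely notices.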
If you are, then you're probably working at a research lab or something, in which case stop watching YouTube and get back to work, you slacker! You have a virus to sequence!
NOPE! Otherwise, the sky's the limit. As long as your microcontroller has enough memory to store the model and enough oomph to do a few hundred thousand multiplications in a reasonable amount of time, the model will run as well on small brain as it does on big brain.
[Marching band music] Let's choose our weapon. Any microcontroller with, like, a megabyte each of flash memory and RAM should be enough to make this work. NRF52 boards, ESP8266 boards, Cortex processors, they all work great. Your regular-ass Arduinos and PICs are probably not powerful enough. [Smashing sounds]
Modern, faster boards like the Arduino Nano 33 are A-okay. Single-board computers like the Raspberry Pi are powerful enough to use the full-strength TensorFlow Lite, and honestly, with something like this, you could even just run full-on TensorFlow and make your life way easier. You will have the overhead of an operating system, so your project might actually be better if you [shattering sounds] switch to a faster and more real-time microcontroller. This Teensy 4.0 is perfect, and even this Teensy 3.6 would do great. I used the Teensy 4.0 in the glove project.
Now that you've selected your microcontroller, it's time to stuff a neural network in it! First, we load our Keras model, then we instance a TFLiteConverter, which will perform optimization such as making it run faster, making the code smaller, etc. Or we can do neither, because frankly [laughing] this doesn't really work very well!
This is a neat part of the process that's pretty easy to [REDACTED] up. We need to provide a representative data sheet that contains at least one instance of every label...
[British robot voice] You mean a representative data SET, not a data SHEET, you dumb bastard.
Disliked, deleted my comment, and unsubscribed.
It also needs a distribution of input values that's similar to real life data. It's really easy to do using this library SKLearn - we feed our model into it and we have it pull out a stratified sample, which is a representative sample. The optimizer takes this and generates lookup tables and scales input and output to make the most of those eight bits of resolution and to make the most of our available memory. Finally, we run the optimizer, which does the thing and gives us a TensorFlow Lite-ready model. WOO-HOO!
Time to get it into the microcontroller. Setting up a makefile to take this model and link it into our firmware is just a hell beyond hell, so we're gonna hack it. We're gonna use the hexdump library, that dumps a binary to hex, and then I'm just gonna format it into a C++ object declaration. We're gonna save this as a header file and now we can just drag it and drop it into our firmware sketch. What? It works!
It's time to leave this big-boy IDE for grown-ups and go into the IDE of chaos and deviltry... let's switch to Arduino. In Arduino, we want to install the TensorFlow Lite for Microcontrollers for Arduino. We don't want the pre-compiled one; if you're using anything other than an Arduino-brand Arduino, you're gonna have a bad time.
Let's check out the code. My code is just copied-and-pasted from Google's examples, and those examples are written in ABSOLUTELY RIGOROUS compliance with Google's C++ style guide. It's pretty neat!
Anyways, let's dive in. First step is to import the model from that header file we just generated, then instancing an OpcodeResolver. We can kinda optimize RAM usage by only including functions we actually use, but I'm really lazy, so we're doing all of 'em. Then, we need an interpreter, and we need some memory for it to work in.
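The hex-dump hack mentioned in this passage can be sketched in a few lines of standard-library Python. This is not the script from the video (which uses a hexdump library); the function name `bytes_to_c_header`, the array name `model_tflite`, the `alignas(16)` qualifier, and the stand-in bytes are all assumptions for illustration. The idea is simply to turn the converted model's raw bytes into a C++ array declaration that can be saved as a header and dropped into the sketch folder.

```python
def bytes_to_c_header(data, array_name="model_tflite", per_line=12):
    """Format raw bytes as a C++ array declaration suitable for a header file."""
    lines = []
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(lines)
    return (
        "// Auto-generated: model bytes as a C++ array\n"
        # Alignment qualifier is an assumption; adjust it for your target.
        f"alignas(16) const unsigned char {array_name}[] = {{\n"
        f"{body}\n"
        "};\n"
        f"const unsigned int {array_name}_len = {len(data)};\n"
    )

# Stand-in bytes; in practice these would be read from the converted model file,
# e.g. open("model.tflite", "rb").read() (hypothetical filename).
fake_model = bytes(range(20))
header_text = bytes_to_c_header(fake_model)
print(header_text)
```

Writing `header_text` to a `.h` file next to the sketch gives the firmware a plain byte array to point the interpreter at, with no makefile changes needed.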
We need to set up an arena, which is a pretty death-metal term for what's basically a pre-allocated scratch file. There's no, like, hard-and-fast rule to how much memory to allocate, so I'm gonna start with two kilobytes. We'll dial it up if we're getting buffer overflows, we'll dial it down if we need more memory. We get a pointer to our input tensor and a pointer to our output tensors, and we're all revved up and ready to go.
This is where the rubber meets the road. Every loop, the code checks if I've finished doing a gesture, and if I have, it processes it into a standard format. Pre-processing is really important in TensorFlow Lite. Anything you can do to reduce the necessary complexity of the model will let you make the most of the limited resources; we are still constrained on how much model we can actually fit in this thing. All input to TensorFlow Lite models is flattened; in other words, instead of putting in 50 XY coordinates, we run them in X, Y, X, Y, X, Y.
All that's left is to perform our inference and let the eldritch gods of cyberspace figure out what I just wrote in midair. And that's it! We just performed sophisticated gesture recognition on a device that can't even buffer a five-megapixel image. Compiling this monstrosity takes a while, it takes up, like, 20% of program memory and, like, 80% of the RAM, and it takes, like, a good five minutes to crunch the first time, but it's worth it. We can run an inference on this 600-megahertz processor in like 10 milliseconds, which is AWESOME. I mean, that gives us another 10 milliseconds to faff about and still keep the thing crisp and responsive. This handwriting recognizing glove totally works, and if you're interested in the hardware, or you just want to see it in action, I did a whole video on it and you can check it out right here. Call to action.
But what else can you do with this?
Try using some convolutional neural networks to analyze video in real time! Try some real-time image recognition or audio recognition right there on the device itself! Capture false outcomes and add them to your training set later! You can even add more memory, like physically add in more flash chips, to store and work on bigger models. What's cool about TFLite for Microcontrollers is that it runs the model from program memory... I think? It does. Don't quote me on this. I will.
Be sure to check back on TensorFlow Lite and TensorFlow Lite for Microcontrollers often, because they are under very frothy active development, and new features could be added and new regressions could be introduced at any time. So the next time you want to learn you some machines, do the big-brain play and put it in a small package. Just don't cut yourself on that edge.
Thanks so much for watching, and double thanks for watching the whole thing! If you want to look through my terrible source code, it's all on GitHub. Ravioli ravioli, links in the descriptioli. I make videos about electronics and the crazy stuff you can do with them, and if that revs your engine, feel free to give me a subscription and get notified when the next video is up. Or you could roast my programmer's tan. I've been underground in New York City lockdown for, like, five months. I know I'm translucent. Anyways, thanks a lot for watching, and I'll see you in the future.
[Narrator] It wasn't so long ago that communication was a simple act, but the range of the human voice is limited. So, man's ingenuity found ways to bridge distance. He invented writing... ...and typographical errors.
a representative data sheet that... representative data sheet... representative datasheet... You mother [REDACTED]er
Info
Channel: Zack Freedman
Views: 402,650
Keywords: ai, machine learning, neural networks, deep learning, teensy, arduino
Id: iTj0lcVSIVU
Length: 12min 40sec (760 seconds)
Published: Mon Sep 07 2020