TinyML: Getting Started with STM32 X-CUBE-AI | Digi-Key Electronics

Captions
Back in 2019, STMicroelectronics released their STM32Cube.AI suite of tools, designed to help people get started creating tiny machine learning (TinyML) applications on their microcontrollers. Specifically, I want to show you how to get started using the X-CUBE-AI tool. X-CUBE-AI lets us take a neural network model that's been trained in something like Keras, TensorFlow, or ONNX and deploy it on an STM32 microcontroller. Since microcontrollers are usually quite limited in resources, X-CUBE-AI walks us through the process of converting the model and ensuring that it can fit on our chosen system. From there, we can use functions in X-CUBE-AI to run inference with our model. Inference is the process of running new, unseen data through a machine learning model. For example, if we had a model that was trained to recognize pictures of cats, we might feed it a new photo, and it would tell us whether or not it thinks there's a cat in that photo.

Please note that the X-CUBE-AI tool and library are proprietary, which, from what I can tell, is how ST keeps it unique to just the STM32 line of microcontrollers.

For this video, I'm going to show you how to use the X-CUBE-AI tool with a pre-trained neural network. We will use the TensorFlow Lite model we made back in the Intro to TinyML episode 1; please check that video out if you need a refresher. In it, I show you how to train a three-layer neural network using Keras and TensorFlow in Google Colab. The neural network just predicts the output of the sine function. This is an awful way to create a sine wave on a microcontroller, but it's a good demonstration for our purposes. I have to give Pete Warden and the TensorFlow team all the credit for coming up with this example.

Once we have the model trained, we'll want to test inference by giving it some numbers. I'll be looking at an input of 2, which gives us an output of 0.9059. This should be close to sin(2), which is about 0.9093. I'll save the Keras model as a .h5 file and then use the TFLiteConverter.from_keras_model function to convert it to a TensorFlow Lite file. I've had some issues loading Keras model files into X-CUBE-AI, but TensorFlow Lite files seem to work just fine. Note that we optimized the model for size, but we're still doing everything with floating-point values. There is a way to quantize the model's inputs, outputs, weights, and bias terms to 8 bits to save even more space, but I'm not going to get into that here.

I wrote a quick function to convert the .tflite file into a C byte array. This is used for TensorFlow Lite for Microcontrollers, but I'm showing it here so that we can do an apples-to-apples comparison between X-CUBE-AI and TensorFlow Lite using the exact same model file. We only need the .tflite file, but I'll download all three versions of the model anyway. You can use Netron to look at the Keras and TensorFlow model files to help you verify the input and output formats, along with how the layers are connected.

Now that we have our model file, we can import it into the X-CUBE-AI tool and use it to start running inference on our microcontroller. I'll be using an STM32L432KC Nucleo board for this demo, as it's small but packed with an Arm Cortex-M4 processor. I recommend checking out my other STM32 videos if you'd like to see how to get started with these microcontrollers.

In STM32CubeIDE, go to Help > Manage Embedded Software Packages, go to the STMicroelectronics tab, and click the drop-down arrow next to X-CUBE-AI. Check the newest Artificial Intelligence package (which is 5.1 for me) and click Install Now. Accept the license agreement and let the installer download. When it's done, close out of that window. Click File > New STM32 Project. If you just enabled the X-CUBE-AI package, the IDE will download and install the rest of the add-on. When that's done, you should be presented with the target selection window. You should see an Artificial Intelligence
filter option added to the left pane. If you enable it, it should filter out all of the microcontrollers that don't support the X-CUBE-AI library; this can help if you are trying to figure out which processor to use. Because I'm using a Nucleo dev board, I'm going to navigate to the Board Selector tab and search for my L432KC. Select your board and give the project a name. I like to put the name of the board or processor I'm using at the front of the project name, since the IDE creates projects unique to that processor. Click Finish, and the IDE should present you with the CubeMX interface.

First, I'm going to enable Timer 16 and set the prescaler so that it ticks once every microsecond; this will help us measure how long inference takes with X-CUBE-AI. Don't forget to set the counter period, which I'll set to the maximum possible 16-bit value.

Click on Additional Software and select the Artificial Intelligence package. Click on the drop-down arrow and enable the Core component for X-CUBE-AI. Note that you can also have CubeMX generate a pre-made performance or validation program, as well as an example application template. I'm not a fan of the application template, as it forces you to modify a file outside of main.c, making it more difficult to pass data around in this simple demo, so I'll leave it unselected. Click OK, and you should see an Additional Software category appear in the left pane. Click on X-CUBE-AI under that category, and you should see the configuration pane appear next to it.

Click Add Network and select TFLite as the network type. Click Browse and open the .tflite file you downloaded from Colab. I'll rename the model sine_model. There are a few things you can do from here. The first is Analyze, which will give you some information about the model's relative complexity, how much RAM it needs, and the number of multiply-and-accumulate (MACC) operations it requires. If you click on the gear icon, you can see that you have the option to use external flash and RAM chips; this can be very helpful if you're running a large model but run out of space on your particular microcontroller. You can also click Show Graph for a visual representation of your model, similar to what Netron provides. Validate on Desktop runs a few tests with the model on your computer to make sure it works. If you upload the pre-made validation program I mentioned earlier, you can run the Validate on Target test to make sure the model runs correctly on the microcontroller. Head to the Main tab to get some information about your neural network. Note that it's possible to load more than one model at a time, assuming you have enough flash and RAM for it. You should also be aware that when you use the X-CUBE-AI library, it will automatically enable the cyclic redundancy check (CRC) peripheral.

I'm going to head to the Clock Configuration tab. I'll change the input clock source to the high-speed internal (HSI) oscillator and set the main CPU clock to the maximum of 80 MHz. When I press the Enter key, CubeMX will perform some magic calculations to adjust the prescalers so that I can get an 80 MHz main clock. Click File > Save to generate code.

You should see that an X-CUBE-AI directory has been added to your project. In App, take a look at sine_model.c; this houses the functions that we need to initialize our model and run inference. sine_model_data.c is our neural network: it's just a big raw byte array that the X-CUBE-AI library knows how to interpret.

Let's open main.c. I'm not going to go into great detail with the code, but I'll make sure there's a link to it in the description if you want to dig into it. First, let's include the C standard input/output library so we can use sprintf. Next, we'll include the AI defines and platform headers, which contain some functions that we'll need. Then we include the model files that I showed you earlier; note that the names of these files will change depending on how you named your model. In
main(), I'll declare some variables that will help us out, such as a buffer for our serial output strings, error codes, and a timestamp. We'll need to create a buffer in memory that stores intermediate calculations for the neural network; notice that I'm using predefined constants here for the array sizes, which were generated by the X-CUBE-AI tool. Next, I'll create the input and output tensor arrays. For this particular model, we only need one element in each of these arrays, but your model might need more elements, so pay attention to that. We'll also create a pointer for our model and a couple of structs that point to our input and output data, along with some metadata. Then we'll create a parameter struct that points to our model and working memory.

We need to configure the wrapper structs here. It seems that you can run inference multiple times with different data in one go, but we just want to do it once per run, so we'll leave batches at 1. We also want the data pointers to point to our input and output buffers.

After our auto-generated initialization code, we'll start the timer and print something over the serial port to make sure it's working. Then we'll create an instance of the neural network using the ai_sine_model_create function. Note that, once again, this function name will change if you change the model label in the X-CUBE-AI configuration phase. If we get an error from that, we'll simply print a message and stall the processor. We then call the init function, which uses our model's label again. At this point, we're ready to start making inferences.

In the main while loop, fill the input buffer with our input data. I'll put it inside a for loop so you can see how you might fill up a larger array, but this neural network has only one number as an input, so we just assign the floating-point number 2.0 to our first array element for testing. I'll get the current timestamp from the timer. Then we perform inference with our model using the ai_sine_model_run function and give it the input and output buffers along with our model. When it's done, we convert the lone output element to a floating-point value and store it in a variable. We print that variable, along with the time it took to perform inference, to the serial terminal. I'll add a 500 ms delay here so that we actually get a chance to read the serial output.

You might notice that sprintf does not want to print floats, so we need to go into Project Properties > C/C++ Build > Settings > Tool Settings > MCU GCC Linker > Miscellaneous and, under Other flags, add the -u _printf_float flag. Repeat this for the Release configuration.

I've had some issues with the CubeMX tool getting caught in a loop during the build stage, so I recommend closing the .ioc file here. Save changes and generate code if asked; so long as you didn't write any code outside of the user code comment blocks, you should be okay. Build the project.

When it's done, this is a good time to look at the resource requirements for this project. Take a look at the output of the GNU size tool. From what I've gathered, you add the text and data fields together to get the flash usage, which is about 33,000 bytes for this program, then add data and bss together to get the estimated RAM usage, which is about 5,000 bytes.

Let's run the program in debug mode and open our serial terminal. We should see the output of the neural network, which is about 0.9059, and it should match what we saw in Google Colab at the beginning of this video. Also notice that it takes about 78 microseconds to run inference with this model.

Just to check, let's set our build to Release and rebuild the project. We'll create a new run configuration, assign it the Release .elf file we just built, set the build configuration to Release, and run it. I sometimes run into issues switching from debug mode to run mode, so you might need to click Run > Run to try uploading again. Let's take a look at the resource requirements again, as they're a little different for the Release configuration.
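As an aside, the flash and RAM arithmetic from the GNU size output can be sketched in a few lines of Python. This is only an illustration of the rule of thumb described above; the field values are hypothetical numbers chosen to sum to the rough debug-build figures, not exact values from the video:

```python
# Estimate flash and RAM usage from the GNU size tool's output fields.
# text = code, data = initialized variables, bss = zero-initialized variables.

def flash_and_ram(text: int, data: int, bss: int) -> tuple[int, int]:
    """Flash holds code plus initializers; RAM holds all variables."""
    flash = text + data  # .data lives in flash and is copied to RAM at startup
    ram = data + bss
    return flash, ram

# Hypothetical field values roughly matching the debug build discussed above
flash, ram = flash_and_ram(text=32500, data=500, bss=4500)
print(f"flash ~ {flash} bytes, RAM ~ {ram} bytes")
```

Counting .data toward both totals is the key detail: initializer values occupy flash permanently and RAM at runtime.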
While the RAM usage is about the same, it only needs about 28,000 bytes of flash this time. Once the program has been uploaded, open a serial terminal, and you should see the same output from the neural network. Interestingly enough, we seem to have shaved off about one microsecond by switching to a Release build.

So how does this compare to TensorFlow Lite for Microcontrollers? In the previous video, I ran inference with the same neural network model on the same microcontroller, but using TensorFlow Lite. With the Release configuration, I found that it took up about 50,000 bytes of flash and about 4,700 bytes of RAM. It also took about 104 microseconds to run inference.

In both cases, we used the same machine learning model: a fairly simple neural network with three dense layers and 321 parameters, with no quantization or compression applied. TensorFlow Lite needed about 50,000 bytes of flash and about 4,700 bytes of RAM, and it took about 104 microseconds to run inference. X-CUBE-AI needed only about 28,000 bytes of flash but 4,900 bytes of RAM, and it used the processor for about 77 microseconds during inference. By using X-CUBE-AI, we saved over 40 percent of the flash space and saw a tiny increase in RAM, which could possibly be attributed to some of our serial debugging strings. It appears that X-CUBE-AI also ran inference around 26 percent faster.

However, the big roadblock for some folks is that while TensorFlow Lite is open source, X-CUBE-AI is a proprietary library that only runs on STM32 processors. If you're developing a machine learning application for an STM32, you'll want to seriously consider using the X-CUBE-AI tool for the performance boost that you get, but keep in mind that you run the risk of ST potentially dropping support for it in the future; this is always a possibility when dealing with closed-source tools and libraries.

I hope this has helped you get started creating your own machine learning projects for microcontrollers. With that, please subscribe if you'd like to see more videos like this, and happy hacking!
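The "quick function" mentioned in the transcript for converting the .tflite file into a C byte array is not shown on screen. A minimal Python sketch of the same idea (the function name, array name, and formatting are my own choices, similar to what `xxd -i` produces) might look like this:

```python
# Convert a TFLite flatbuffer into C source declaring a byte array,
# similar to the output of `xxd -i`. Names here are hypothetical.

def tflite_to_c_array(tflite_bytes: bytes, var_name: str = "sine_model") -> str:
    """Return C source declaring the model as a const unsigned char array."""
    lines = []
    # Emit 12 bytes per line as 0xNN literals
    for i in range(0, len(tflite_bytes), 12):
        chunk = tflite_bytes[i:i + 12]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(lines)
    return (
        f"const unsigned char {var_name}[] = {{\n{body}\n}};\n"
        f"const unsigned int {var_name}_len = {len(tflite_bytes)};\n"
    )

# Usage (assuming a sine_model.tflite file exists in the working directory):
# with open("sine_model.tflite", "rb") as f:
#     print(tflite_to_c_array(f.read()))
```

The resulting array is what TensorFlow Lite for Microcontrollers compiles into the firmware, which is what makes the apples-to-apples comparison with X-CUBE-AI possible.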
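The comparison figures quoted above can be reproduced with a few lines of arithmetic, using the rounded numbers from the transcript, along with a sanity check that the network's output for an input of 2.0 is close to the true sine value:

```python
import math

# Rounded figures quoted in the video
tfl_flash, tfl_time = 50_000, 104e-6      # TensorFlow Lite for Microcontrollers
xcube_flash, xcube_time = 28_000, 77e-6   # X-CUBE-AI

flash_savings = (tfl_flash - xcube_flash) / tfl_flash
speedup = (tfl_time - xcube_time) / tfl_time
print(f"flash saved: {flash_savings:.0%}")     # 44%
print(f"inference time saved: {speedup:.0%}")  # 26%

# The model's reported output for input 2.0 was ~0.9059, which should
# be close to the true value of sin(2):
print(f"sin(2) = {math.sin(2.0):.4f}")  # 0.9093
```

The roughly 44 percent flash reduction is where the "saved over 40 percent" claim comes from, and the 26 percent figure matches the inference-time difference.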
Info
Channel: Digi-Key
Views: 13,294
Keywords: DigiKey, machine learning, ai, edge ai, neural network, TinyML, IoT, STM32, STM32CubeIDE, X-CUBE-AI, TensorFlow
Id: crJcDqIUbP4
Length: 15min 19sec (919 seconds)
Published: Mon Jul 20 2020