Writing fast and efficient MicroPython

Captions
[Host] Hello all, and welcome back to the IoT miniconf. I'm sure our next presenter needs no introduction, but I'll do one anyway: this is Damien George, creator of MicroPython, here to talk about how to write fast and efficient MicroPython. Thank you very much, Damien. Please welcome him to the stage.

[Damien George] Okay, thank you. Thanks everyone for coming to my talk; it's great to be here this year. So yeah, I'm Damien George. About five years ago I started writing MicroPython. I think it was the 29th of April 2013 that I wrote the first line of code, and I actually wrote the date down so I would remember; I like to date things. So more than five years ago I started writing it, and it's come a long way. Probably a lot of you have heard me talk before, but if not, I'll give a brief overview of what MicroPython is for those who are new to it.

It's a complete reimplementation of Python, written from the ground up in C. The idea when I was writing it was to not look at CPython and how it was written, but to look at the language specification itself and implement that, so that I could come up with new ideas about how to solve the same problem, implementing Python, in a different way. MicroPython is written so that it doesn't use many resources: it doesn't use much RAM, and it tries to be efficient in the way it does things, so it can run on really tiny systems, like ones with 16 kilobytes of memory. The key thing is that when you log in to your microcontroller you get a Python prompt, and you can blink LEDs with one line of Python code. It's an immediate response: there's no compilation and flashing of firmware like you're familiar with from normal microcontroller programming. So it's quite fun and immediate, and easy to debug things, but you can also write quite complicated applications. There's lots of hardware that you can try it out on, and you can also try it out online with the
simulator, and download the code and have fun with it.

My talk today is going to be a little bit technical and a little bit fun, with a few demos. I have a background in theoretical physics, and in physics there are the concepts of time, space and energy. If you extend those to computing: time means you want to make things go fast; space is storage, like hard drive storage and memory, so you want to reduce that; and energy use, well, you all know about energy, how expensive it is, and how you want to reduce it. If we have a million or a billion IoT devices and each uses a nanoamp, that's really good, but if each uses an amp, that's a lot of power. So any optimization you can do is good for the environment, that's what I'd like to say.

The aim of this talk is to give an overview of some technical details of MicroPython and how they relate to efficiency, to give some insight into that, and to give some tips and tricks on how to make things faster when you're writing code. Some of these tricks will also be useful for normal Python, and even for programming in general. And then there are a few examples to motivate this whole thing, and I'll start with two of them.

The first one is very simple: the hello world of microcontrollers is to blink an LED on and off. In the top little black box there is the demo code, and this will actually work. It creates an LED, the blue LED, and then does a simple loop n times (you have to feed in the parameter n), just turning the LED on and off. So it's quite straightforward what it's doing. If you run this on a pyboard, you can blink at about 50 kilohertz: the loop runs 50,000 times per second, so the LED goes on 50,000 times and off 50,000 times. But we want to make that go faster, and so by the end
of this talk I'll show you how to make that go 500 times faster.

The second example is to read a file. It's pretty silly: it's just reading data from a file, a thousand bytes at a time, n times, and then seeking back to the start of the file, just to time it. As it is, that reads at about one and a half megabytes per second, and we can get that up to about 20 megabytes per second with a very simple optimization. So keep that in mind: the motivation of this talk is to make these things run faster. I'll go into some technical details, and we'll come back to speeding those up at the end.

This is a diagram of how MicroPython works internally. Don't try to absorb the whole thing; just understand that there's a compiler, so on the left is the compilation phase. It takes your script, which can come from one of those red boxes: you've either typed it in at the prompt or you have a script in a file. The lexer turns it into tokens, which get parsed into a parse tree and then compiled. The compiler can emit bytecode, which is run on a virtual machine, or it can even emit machine instructions. From JavaScript, perhaps you've heard about just-in-time compilation making JavaScript really fast; they do that by generating machine code. MicroPython can do that as well, although it's ahead-of-time compilation, and I'll touch on that a little bit at the end. And then in the top right, that orange box represents all the runtime support functionality, like checking whether a string ends with something, splitting a string up, or looking something up in a dictionary. All those gory details are written in C. It's pretty boring, but it's the guts of everything.

So as I said, we have a compiler, and it just reads the input stream character by character. That means it doesn't actually load everything into memory: if there's a file on the disk, it just reads
it in one byte at a time, so it doesn't waste memory loading it all at once. But once you've run the compiler, it can leave memory fragmented, because there are some structures it has to allocate in memory, so that's one thing to keep in mind. Also, any variable names that you use are left in RAM as well, so the more variable names you have and the longer they are, the more memory you'll be using at the end. One way to optimize that is to use shorter variable names and reuse them. When we generate bytecode we obviously also use RAM to hold that bytecode; you can't really get around that. As I said, we can generate machine code, which makes things go faster, and I'll show you how to do that. Inline assembler is also an option if you really want to go very fast, and I'll show you how that works with the LED example.

These are all of the bytecodes that MicroPython has when it compiles a script. You don't have to understand all of them, but this is a list of them all, so you can see that there's a finite number and they're not that complicated. A lot of them are to do with loading objects, like False, None and True at the top left there. So when you have False in your code, that corresponds to one bytecode: one byte to load False, and then you do something with it in an expression. Down the bottom in the middle are call function and call method, so there's a bytecode to actually do that, and it's split into one that calls simple functions and one that calls more complicated functions with variable positional or keyword arguments, like when you use the * or ** operators. There's a bunch of other things there, but all these things together can implement all of Python.

To show you a more concrete example, here's a script on the left and its bytecode on the right, and you can see exactly what it's doing. It's loading print, then loading a string, "sleep". Now, LOAD_FAST 0 means that it's loading the variable d, because that's
the zeroth local variable. Then it calls the function, so that calls print, the first thing it loaded. It doesn't care what the result is, it's not using it, so it pops it off the stack. Then it does the next line: it gets time, loads the sleep method, loads d again, calls the sleep function, and discards the value with POP_TOP. The last two lines are returning None from this function, so you can see that it takes two bytes to return None. This whole function is 22 bytes in memory.

This shows the differences in speed that you can get from certain things. Loading a global like print is slow, because it has to go into the global dictionary and search for print. But if you look at the third line, LOAD_FAST 0, this is just a one-byte instruction, and it knows that d is the first local variable on the stack, so it can get it straight away; there's no searching through a dictionary. So loading from local variables and storing to local variables is fast, but global variables are slow. That's probably the most important thing when you want to optimize your code: use local variables and not global variables.

The other thing is that it's slow to load methods and functions. Down the middle there is LOAD_METHOD sleep: you've got the time module and you're looking for sleep, and that also has to be a dictionary search. It's got to go through the dictionary of the time module and look for the sleep function, and even though it's using hashing and it's order one, it's still relatively slow. These things can be optimized by preloading any methods that you want to use over and over again. So this is a good example of some real-life things that can help you make your code go faster.

The next thing: this is a long slide, but don't get overwhelmed. It's just about memory allocation, which is really important in
microcontrollers, because allocating memory is slow: you've got to go and find some memory, and if you can't find it you've got to do a garbage collection, which can take five milliseconds or so. That's a long time when you want to flash an LED millions of times per second. So the best thing to do is to avoid allocating memory when you can, and to do that you've got to know which things don't allocate memory.

A little expression like one plus two times x, and so on: expressions don't need to allocate memory, because they use the stack. All of the basic statements in Python, as MicroPython implements them, like if, while and for statements and try/except/finally, don't need to allocate any memory on the heap, so they're all relatively fast. You can do a lot of stuff without allocating memory there. This is about heap memory; they still use the stack, but the stack has been preallocated for the function. As I said before, with local variables you know where each one lives, so it's easy to store and load them. Small integer arithmetic, if you're using less than 31 bits, doesn't need to allocate on the heap. And for things like lists, dicts and bytearrays, in-place operations like subscripting a dictionary, subscripting a list, or even sorting a list in place don't need to allocate heap memory, so that's also good to know. You can do quite a lot if you've created things like lists and dictionaries in advance. If you call functions and methods and you don't use the * or ** operators, that also doesn't allocate memory. As soon as you go into the land of variable keyword and variable positional arguments, you'll start to see slower code execution. A lot of built-ins don't need to use heap memory either.

Things that you can probably imagine do use heap memory: when you import a module, there is a whole lot of compilation and
importing and loading of dictionaries that goes on. When you define a function or a class, again, you've got to create the actual function or class, so that takes memory. One thing that trips me up, even though I know about it, is that assigning to a global variable for the first time will allocate memory, because when you write to a global variable you've got to store it into the global dictionary, and if the global dictionary is too small it has to be resized, and you've got to find a slot for it, and so on. So global variables are things you should try not to use if you're concerned about efficiency. And creating data structures also allocates, obviously: allocating a list is going to take memory.

The next few slides give some tips, based on the knowledge I've just gone through, on how to make things run a little bit faster. As I said, use functions; don't use the global scope. Put stuff in a function if you can, because then it will use local variables. So taking something that's in the global scope, at the module level, and putting it into a function will immediately make it faster. Preload or cache methods and functions that you know you want to use a lot. And, going against maybe some good programming principles, prefer long expressions rather than breaking them up, because a long expression doesn't have to store to and load from intermediate variables; it just evaluates the expression on the stack. So if you can write something in one line, write it in one line, because it will be faster. And don't write your own startswith function: if you want to know whether a string starts with something, use startswith, because startswith is written in C and it's going to be faster than if you try to write it in Python.

MicroPython also includes a way to make constants. If you use this construct here, from micropython import const, and then X = const(1),
wherever the compiler sees X again (capital X, for this example), it will replace it with 1. It's kind of like C #defines, so it actually makes proper constants. Also, it is okay to write things like 1 + 2 in your code; you don't have to optimize that to 3 yourself, the compiler will optimize that for you. So don't be afraid to use constants in ways like that.

To reduce RAM usage, basically try to use the constructs that don't use heap memory. Use shorter variable names and reuse them. Again, this is against some programming principles, but it's okay to use x, y and i if you've just got coordinates or indexes. And if you're reusing a name that already exists somewhere else, you don't have to allocate more memory to store that variable name. So use short names; you're programming a microcontroller, so instead of "this variable is the x-coordinate of my window", just call it x.

Temporary buffers: if you really don't want to allocate any memory on the fly, sometimes it's good to preallocate something like a bytearray, of size one even, and then use it again and again. Say you want to construct a command that's one byte long: you can do that by reusing a buffer. Then there are the "into" methods. I'll go through those with the file example, but there are some in-place reading methods like readinto, where you read into a preallocated array and you don't have to create one on the fly. Don't use *args and **kwargs if you're concerned about speed and memory usage.

One way to improve compilation time is to precompile your scripts to .mpy files. They're like .pyc files but for MicroPython, so they contain precompiled bytecode. When you load or import such a script, you don't need to do any compilation, so you save a lot of memory and a lot of time. All it's doing is loading the code into memory; it doesn't have to go through the
whole process of compiling, so that saves you memory. Precompiling scripts is a big way to improve your speed as well as your memory usage. And finally, you can actually take a Python script and freeze it into the firmware. You've got to do this at compile time, when you're building the firmware in C, but this way it uses almost zero memory, because the bytecode is stored in flash storage, so the bytecode itself doesn't use any RAM. That's quite an advanced topic, but it can be very useful if you've got big scripts.

Okay, so I'll try to show you how to make this script faster now, using this knowledge. The first thing we want to do, like I said, is to put it into a function. I'm just going to switch over here. So, LED 1: this is the simple loop. All the other code down here at the bottom is just timing the loop and printing the result; the actual code we've got is just this simple loop to turn the LED on and off. If I run this code on the pyboard that I have connected here, LED 1, it will tell me (I'll scroll the screen up) that it runs at about, okay, 57 kilohertz.

Now let's have a look at LED 2. What I've done here is simply put this in a function: I've taken the same code and put it in a function, and changed nothing else. There's some other timing-related code down here, but the main code is just this simple blink function. So if we run number two, sorry, okay, 66 kilohertz. So from 55 to 66 we've gained a little bit, but sometimes that's all you need, because you just needed it to run a little bit faster. Gaining 10% in speed just by putting something into a function is actually quite big. That's because it didn't need to use global variables.

Okay, now the third case. It's still in a function, and we're going to preload the methods that we want
to use. So on is led.on: what it's doing here is taking that LED object, which was a global variable, preloading the method on, and storing that into a local variable. As I said, local variables are really quick, so now down here, when we call on, it's loading a local variable, which is one byte and very quick, and then calling it straight away with no arguments. So it's super quick and there's no dictionary lookup: no globals, no dictionary lookups for methods. So you can expect that this might actually run quite fast. Oh, and I've also optimized the range. That's another little trick: it's a bit faster if you do the range this way, but you can ask me about that later if you want. So, oops, I gave away too much there. LED 3, all right, so that went to 182 kilohertz, which is three times faster. You can see there's a big difference from preloading your methods and using local variables instead of globals. The difference between 60 kilohertz and 180 is really global variables and looking things up in dictionaries.

I think LED 3 is probably where you might stop in most cases. It's still understandable, and if you really want to optimize stuff, this is how you go. But if you want to go further, you can use the standard technique of loop unrolling. So we'll unroll the loop eight times, but everything else is the same: I've preloaded my methods, it's in a function, and n is divided by eight here, because obviously I unrolled it eight times. So let's see how much faster that goes. What is that, ten percent again? Yeah, it's not too bad, but what you can see here is that you're really hitting the limit of the virtual machine interpreting bytecodes; here we're just getting rid of the overhead of the loop.

So let's see how much further we can go. LED 5, sorry. Okay, so what we're using here, as I mentioned before, is machine code
and not bytecode anymore. When this function executes, well, first of all it's compiled into machine code. So now, instead of running on a virtual machine, actual instructions are executed on the hardware, which is a little ARM chip, each time. To do an on, instead of executing the bytecodes to load the on method and call it, it's executing actual machine instructions directly, which makes things a little bit faster. So we'll see how much faster. LED 5: this is the exact same code as LED 4, but it's in machine code instead of using the virtual machine. So that's about 220 percent faster. It's pretty easy: it's a one-line change to put this little decorator at the top of your function to make it emit machine code. Not all things are supported, but a lot of things are, and I'm improving that.

I think there are two more to go, hang on. So, LED 6. This one is really a bit crazy. It uses another mode called viper mode, which is a bit undocumented and not finished, but it still does work. It allows you to basically do things directly in machine code, but written in Python. It's really a weird hybrid of Python and C: it's like Python syntax with C semantics. What I've got here is a pointer that points to the register of the peripheral, and we're on GPIOB for this LED. (You could actually see the LED flashing 50,000 times a second if you wanted to, down here, but it's a bit boring.) Each line here is turning the LED on or off by writing to a register, so each is going to turn into basically one machine instruction, as opposed to loading a method and executing it like before. I don't want to go into the details of this; you can ask me about it later. I just wanted to show you what is possible. So if we execute this one, it's, yeah, I can't
even do that division in my head. It's quite a lot faster; it's more than 10 times faster, even. It's 60 times faster, or, well, almost 50 times faster. I'll just show you that code again. You've got to know what you're doing, and you can't read files this way; you can only do things that require register access or really low-level stuff. But yeah, I'm just writing to the register here, and that's it.

Then the final example, LED 7. This is written in inline assembler. So: a logical shift right by three, that's divide by eight like before. Load the value of the peripheral register into r1. The one shifted left by four you saw before; that's because we're on GPIOB pin 4. loop is a label, and then these store-halfword instructions are actually turning it on and off: on, off, on, off. So the loop is unrolled eight times here. At the bottom, subtract one from our counter and branch back to loop if it's not zero; if it is zero, we've finished our loop, and that's the end of the function. So here I've written inline assembler in Python syntax. And this is the punchline of the entire talk: 27 megahertz, right? That's about 500 times faster than the initial code. And a 27-megahertz blinking LED is almost at the limit of the actual microcontroller itself. The microcontroller runs at 168 megahertz here, so you're doing less than 10 instructions per blink. So yeah, that's that.

I've only got a minute left, but this next example is pretty quick. This is just reading a thousand bytes over and over again, and we're reading at about one and a half megabytes per second. So, the file example. Okay, I've got everything here in one file. The naive implementation is this file read here, which is just what you saw on the slide. In the optimized one I've unrolled the
loop twice and preloaded the methods, so those are just some of the techniques from before. Then the final optimization, the third one, is actually simpler, but what I've used here is the readinto method. Basically, this is not allocating memory each time. The other examples were allocating a thousand bytes every time they read, but in this example, the one that's going to be the fastest, I've preallocated the array on that line, and then in the actual loop, which is just this bit, I've used the same array and just read a thousand bytes into it. So this doesn't allocate any additional memory while it's running. If we run that, it'll run all three of them and show you the comparison. The simple example: one and a half megabytes per second. The slightly optimized one is almost no faster at all; that's because you're really limited by the memory allocation here. But if we don't do memory allocation in the loop, we get up to 20 megabytes a second, which is about 12 or 13 times faster. That's really as far as you can go; it's almost the limit of actually reading the flash memory itself. The key here is showing you that RAM allocation is quite slow.

All right, I'm really at the end of my talk now, so just to mention the other optimizations. Energy use: if your code runs faster, it can sleep for longer, and sleeping has really low power consumption, so you use less energy. That's a pretty simple rule of thumb. But then you've got to think about other optimizations, like how long it takes you to code it, how long it takes you to debug it, and how long it takes you to maintain it. So you've got to make sure you only optimize things that really need to be optimized. I think that's the key thing you can say about any programming language: optimization should come at the end, after you've worked out what the bottlenecks are. The basic rule of thumb
I use is that when you write something directly in Python, it's about a hundred times slower than writing the same thing in C. But you can usually do a lot better than that, because you don't read a file one byte at a time in Python: you call file.read and read a thousand at a time, and that's all done in C under the hood. So reading a file is not a hundred times slower in Python than in C. So try to use runtime methods, like reading files, endswith or startswith on strings, or regular expressions. And in MicroPython the key thing is to use local variables and not global variables; that will get you a long way towards making things faster.

So thanks for listening. There are some links here, and I welcome any questions if I've got any time. I've got just about one minute left, so there might be a chance for one or two quick questions, if anyone's got some.

[Audience member] Just a question on when you preallocated the function: it looked like it was local?

[Damien George] You mean like in this one?

[Audience member] It was where you had the function preallocated.

[Damien George] Okay, I see. So, this one. led itself is a global variable in this function, because it's defined in the global scope. So when this function is compiled, it has to look led up in the dictionary of global variables, and it may not exist, so it may have to raise an error, and so on. So it's got to go and search for it by name. But what I've done here is I've loaded that name once, and then I've also loaded its method, so I've done another dictionary lookup to load the method from within the object, and that returns a bound method. You know, in Python, if I have a string and take its startswith, that's a bound method of an instance of an object, and then I can call it. But I haven't called it with any arguments here, so this is actually preloading a method and storing it into the
local variable, and so on is a local variable. So when I do this on() here, it's very quick, and it's much quicker than doing led.on(), even though it looks very similar. So yeah, thank you.

[Host] I think unfortunately that's all we've got time for for now, but I'm sure Damien will answer questions in the short ten-minute changeover break, or you can find him out and around PyCon. Thank you very much, Damien.

[Applause]
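
As a rough illustration of two tips from the talk (keep hot loops inside a function with preloaded bound methods, and read into a preallocated buffer with readinto), here is a minimal sketch. It is not the code shown on the slides: FakeLED is a hypothetical stand-in for the board's LED object so that the sketch can run on a desktop Python, and all function names are made up for illustration.

```python
import io


class FakeLED:
    """Hypothetical stand-in for a board LED object (assumption, not a real API)."""

    def __init__(self):
        self.lit = False
        self.toggles = 0

    def on(self):
        self.lit = True
        self.toggles += 1

    def off(self):
        self.lit = False
        self.toggles += 1


led = FakeLED()  # a global, like the LED object in the talk's first example


def blink_simple(n):
    # Every iteration looks up the global name `led`, then its method.
    for _ in range(n):
        led.on()
        led.off()


def blink_preload(n):
    # Look the bound methods up once and keep them in local variables,
    # so the loop body only touches locals.
    on = led.on
    off = led.off
    for _ in range(n):
        on()
        off()


def read_simple(f, n, size=1000):
    # f.read() creates a new bytes object on every call.
    total = 0
    for _ in range(n):
        total += len(f.read(size))
        f.seek(0)
    return total


def read_into(f, n, size=1000):
    # One buffer is allocated up front and reused by readinto().
    buf = bytearray(size)
    readinto = f.readinto
    seek = f.seek
    total = 0
    for _ in range(n):
        total += readinto(buf)
        seek(0)
    return total
```

On real hardware one would presumably swap FakeLED() for the board's own LED object and open an actual file in binary mode; the speed figures quoted in the talk come from the pyboard demos, not from this sketch.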
Info
Channel: PyCon AU
Views: 43,726
Keywords: pyconau, pyconau_2018, Python, PyCon, PyConAU, DamienGeorge
Id: hHec4qL00x0
Length: 31min 42sec (1902 seconds)
Published: Fri Aug 24 2018