Multi-Thread Coding on the Raspberry Pi Pico in MicroPython - Threads, Locks and Problems!

Captions
The Raspberry Pi Pico is a powerful microcontroller, but most of the time we only use half of its processing power. Let me show you how to use the second core.

Hi, and welcome to Bytes N Bits. I've been using the Raspberry Pi Pico in a few of my projects, and it's very much taken over from the Arduino for me. The Pico is a powerful microcontroller board: not only does it have 264 kilobytes of RAM and 2 megabytes of flash ROM, it also has two ARM Cortex-M0+ processor cores running at up to 133 MHz, and you can actually overclock those to over 200 MHz. That makes it orders of magnitude more powerful than a standard Arduino. Add to that the ability to use Python for development, and the Pico becomes a fantastic platform for learning to build your own devices.

In this video it's the dual-core processor I want to focus on. In our normal projects we end up running everything on a single core, leaving half of the Pico's processing power unused. Whilst not every bit of code can make use of two cores, many programs can see real benefits from parallel processing, and with MicroPython, running two threads is actually very easy, as long as you plan out what you're going to do. So let me show you how it's done.

The basic idea of using two cores is quite simple. If we can divide our code into two separate parts, we can run each part on its own core, and the two halves can communicate with each other and synchronise their actions. MicroPython provides a special _thread package to handle the division and running of our code on these separate cores.

One thing to note, though: at the time of recording (April 2022) the _thread package was still in its early development stages and is marked as experimental. It does work, but there are a few issues, and we'll hit some of those and have to work around them in this tutorial. The big issue I found was that processing data on the second core seemed to generate a lot of temporary information that got left in the heap RAM space. The garbage collection system wasn't able to clear this quickly enough, and that led to system crashes. Explicitly running the garbage collection process as part of the code loop was the only workaround I could find, so please do make sure you go through to the SPI LCD example later in this video to see exactly what happened and how I managed to get around it.

If you go to the MicroPython documentation, you'll find that it points you to the main CPython version of the package. Most of the features have been implemented, and they give you low-level control of the threads running on the two processor cores.

To use the _thread package we first need to import it into our Python file. Don't forget that all the code I'm using in this tutorial will be available in a GitHub repository that I'll upload for you, so do check out the links in the description below, and also have a look at the main project page on my Bytes N Bits website (again, links in the description). And while you're here, please do subscribe to the channel to make sure you get access to all of my upcoming videos on programming, making and retro gaming.

Back in the code, we need to create a function that will actually run as the core 1 thread, and then use the start_new_thread method from the _thread package to create our new thread on the second core, core number 1. This method returns a reference to the new thread, which we can use to control and monitor it. The parameters to start_new_thread are, first, a reference to the function containing the code we want to run as our new thread on the second core; this needs to be a normal Python function, or it could be a method on an object. We can then supply some positional parameters, which need to be passed as a tuple - here I'm passing in two values for the first two positional parameters in my method - and then, as the third argument in the call, a dictionary of keyword parameters. This ends up providing us with two threads, one running on each of our cores: the function we pass will run on core 1, and any code in the normal flow of our Python file continues running on core 0.

The easiest way to see this working is to create a very simple threaded example. We're going to create two threads. Our first thread, running on core 0, prints even numbers out to the REPL console, one every second: it prints a number, moves to the next one, waits for one second, and continues round in an infinite loop. Our second thread, running on core 1, does the same thing with odd numbers, printing one every two seconds. So we have one function that will be one of our threads and a second function that will be our second thread, and at the bottom of the file is where we actually kick off the threads: we use the _thread.start_new_thread method call to set the second function running on core 1, and again we get back a reference to that thread so we can do things with it later if we need to. Looking at the parameters of the start_new_thread call, first of all we give it a reference to the function that will form the code for our thread.
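A minimal sketch of that example might look like this (the function and variable names are my own, the loops are bounded, and the delays are shortened so the sketch terminates; in the video both loops run forever with one- and two-second delays):

```python
import _thread
import time

def core1_thread():
    # On the Pico this runs on the second core: odd numbers, one per interval.
    n = 1
    while n < 10:              # bounded here so the sketch terminates
        print(n)
        n += 2
        time.sleep(0.2)        # the video uses 2 seconds here

# Kick off the new thread on core 1 - note the empty tuple of positional args.
_thread.start_new_thread(core1_thread, ())

# The normal flow of the file carries on as the core 0 thread: even numbers.
n = 0
while n < 10:
    print(n)
    n += 2
    time.sleep(0.1)            # the video uses 1 second here
```

Run on the Pico, the two streams of numbers come out intertwined on the REPL console.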
We then need to provide the parameter information. This particular function doesn't require any parameters, but we do still need to include the second argument in the method call, so we use an empty tuple. That is important: you do need to provide a tuple, even an empty one, for this process to work. At this point we kick off the new thread on core 1. Further down, still in the main thread of our Python file, we start the first function running on our default core, which of course is core 0. So by the time we get to the end of the file we actually have two threads running, one on each core.

Let's load that onto the Pi Pico and see what happens. If we start the code running, in the REPL console we see our even and odd numbers coming out, the evens every second and the odds every two seconds, with the two streams of numbers intertwined in the console output.

As you've seen, getting two threads to work in parallel is really easy. Where things start to get a bit more complicated is when the two threads need to talk to each other and share resources. Talking to each other is fairly straightforward: both threads share the same global namespace in your code, so if you define a variable outside of any function or class it will be available to both threads. In this example we define a run_core1 variable outside of any functions or classes; that makes it a global variable, available to any part of our code.

If we look at our two threads, the core 0 thread really controls the whole system. You can see it imports the run_core1 global variable (making a reference out to the global version), and again we have an infinite loop. It prints out five even numbers with a one-second sleep between each, then sets the global variable to True, which sends a signal to core 1 to start. It then sits waiting for core 1 to signal back that it has finished, which core 1 does by setting run_core1 back to False, and then of course we go around again. So we print five even numbers, one second apart, ask core 1 to do its work, wait for core 1 to finish, and loop round again.

The core 1 thread likewise brings in the global variable and has its own infinite loop. It starts the loop by waiting for the True signal to come through from core 0. Once that happens, it prints out three odd numbers, half a second apart, sends the signal back to core 0 that it has finished, and loops back around to wait for its next run signal. So both cores have access to this global variable, and each one sends a message to the other simply by setting it to True or False. At the bottom we have the normal setup again: start the second thread on core 1, then rely on the default core to run the core 0 thread function.

Let's upload this and see what happens. Starting it running, we see the waiting signals, then core 0 printing its even numbers, then core 1 printing its three odd numbers: both cores send signals to each other, wait for the signals coming back, and then run their particular processes.

Now, we could of course wrap this flag up in a class, so that we could easily share the data across code files.
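The flag handshake just described, before wrapping it in a class, might be sketched like this (run_core1 and done are my renderings of the names; I've bounded both loops to two rounds and added the done flag so the sketch can finish, where the video's loops run forever):

```python
import _thread
import time

run_core1 = False          # the shared global flag
done = False               # my addition, so this bounded sketch can finish

def core1_thread():
    global run_core1, done
    for _ in range(2):              # the video loops forever here
        while not run_core1:        # wait for the go signal from core 0
            time.sleep(0.01)
        for n in (1, 3, 5):         # three odd numbers
            print(n)
        run_core1 = False           # signal back to core 0: finished
    done = True

def core0_thread():
    global run_core1
    for _ in range(2):
        for n in (0, 2, 4, 6, 8):   # five even numbers
            print(n)
        run_core1 = True            # ask core 1 to do its work
        while run_core1:            # wait for core 1 to finish
            time.sleep(0.01)

_thread.start_new_thread(core1_thread, ())
core0_thread()
```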
As usual, global variables are never a great idea, especially if you're using multiple files, as they tend to clutter up your code and can be very hard to maintain. In this example we create a Flag class to replace our global variable. Inside that class we create a class-level attribute, that is, a variable defined outside of any methods in the class. That makes it part of the class structure itself, and these attributes are shared across all instances of the class, so in effect we're creating a globally accessible variable: whenever we access our Flag class, this common variable is available to whatever part of the code we're in. We then create a number of class-level methods which access that class variable, so we never actually need to instantiate an instance of the Flag class; all we have to do is access the class definition itself, and the class variables and methods within it. We have one method for setting the run flag, one for clearing it back to False, and one for reading the value of the run_core1 class attribute. So again, we're using the class, but at the class level, which mimics our global variable while wrapping it up inside the nice package of the Flag class.

Inside our cores, all we do is adjust the code so that it uses this Flag class instead of the global variable. In the core 0 thread, instead of setting the global variable we now set the run flag inside the Flag class; you can see we call this at the class level (or, if you're familiar with other languages, as a static method call). Similarly, further down, we read that flag back through the second class method. Core 1 does exactly the same thing, reading the Flag class member value and then setting or clearing it.
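Wrapped up as a class, it might look something like this (the method names are my guesses at the ones used in the video, and the demo underneath is bounded rather than using infinite loops):

```python
import _thread
import time

class Flag:
    # Class-level attribute: defined outside any method, so it belongs to
    # the class itself and is shared by every piece of code that uses Flag.
    run_core1 = False

    @classmethod
    def set_run(cls):
        cls.run_core1 = True

    @classmethod
    def clear_run(cls):
        cls.run_core1 = False

    @classmethod
    def is_set(cls):
        return cls.run_core1

def core1_thread():
    while not Flag.is_set():   # wait for the go signal from core 0
        time.sleep(0.01)
    for n in (1, 3, 5):        # three odd numbers
        print(n)
    Flag.clear_run()           # signal back that we've finished

_thread.start_new_thread(core1_thread, ())

for n in (0, 2, 4, 6, 8):      # five even numbers on core 0
    print(n)
Flag.set_run()                 # ask core 1 to do its work
while Flag.is_set():           # wait for core 1 to finish
    time.sleep(0.01)
```

Because the attribute lives on the class, any file that imports Flag sees the same value, with no global statements needed.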
Initially we clear the flag and start off our threads, so this code does exactly the same as the previous version, but hopefully you can see that we've now neatly packaged our inter-process (or inter-thread) communication inside a tidy little class. We could extract this class out to a separate code file, and we could extract each of our thread definitions into separate code files too, which would make the packaging of our code much more understandable. We have very simple code here at the moment, but once the code starts to spread out and become more complex, with each thread's code file pulling in from different files, having the inter-process communication nicely encapsulated inside this class really helps. So this code does exactly the same as before, but it is now a better formatted and better structured piece of software.

That's a very simple sharing mechanism using a flag-based system, but sometimes we need to be very careful about who can access some data or resource, and when - for example, the SPI interface we'll be using later. If both threads try to use or update the same resource at the same time, we will either get corrupted data or could potentially crash part of our code. There are a number of ways to achieve this control. As we've seen, one is the simple flag mechanism above, and that will work for basic, well-defined situations. But when you need to be a bit more flexible and have more control, we need to use something called a lock.

A lock in our _thread package (sometimes called a semaphore if you're into concurrent programming) allows us to control access to really anything: we create a lock object, and only the owner of the lock can use the resource. For example, say two threads need to use a single resource, and if they access it at the same time the data will get corrupted. We create a lock, and each thread can only access the resource while it holds that lock. Initially the lock is open, so the core 0 thread is able to acquire it and start interacting with the resource. The core 1 thread is now blocked from using the resource; it can request access, but it will have to wait for core 0 to release the lock. Once core 0 finishes, it releases the lock and puts it back into the unlocked state, and now the core 1 thread is able to acquire it and start its own processing. That's the principle of the lock; let's see a coded example.

Here we have two threads which continuously write their messages out to the serial port, and that of course comes up on our console. Core 0 prints a letter, takes a little delay, then prints the next letter, all in uppercase; core 1 does exactly the same in lowercase. Of course, the console can only output one stream of characters, so if we run these without any locking, without any control over who's writing when, we should get a mixed output. And if we run it, we see exactly that: a complete mix of capitals and lowercase, with both threads simply outputting their characters to the console as soon as they want to.

Now let's rewrite our code using a lock. Down in the main part of our Python file we create a global variable - that's just to make things a bit easier; we could of course use objects and so on to hide this away - a lock variable, which is the instance returned by the _thread.allocate_lock() method call. That returns one of these lock objects, which will initially be unlocked.
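Here's roughly what that looks like in code (the loops are bounded to three messages each so the sketch ends, and the done flag is my addition; on the Pico both loops would run forever):

```python
import _thread
import time

lock = _thread.allocate_lock()   # returns a lock object, initially unlocked
done = False                     # my addition, so we can tell when core 1 ends

def core0_thread():
    for _ in range(3):
        lock.acquire()           # wait here until we own the lock
        for c in "ABCDE":        # core 0 writes uppercase
            print(c, end="")
        print()
        lock.release()           # let the other thread in
        time.sleep(0.01)

def core1_thread():
    global done
    for _ in range(3):
        lock.acquire()
        for c in "abcde":        # core 1 writes lowercase
            print(c, end="")
        print()
        lock.release()
        time.sleep(0.01)
    done = True

_thread.start_new_thread(core1_thread, ())
core0_thread()
while not done:                  # wait for core 1 to finish its three rounds
    time.sleep(0.01)
```

With the lock in place, each line of output comes out whole instead of the two streams mixing mid-line.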
The important thing to remember about locks is that they don't actually lock any particular resource. They're simply a tool you can use in your code to control access to resources: you have to write your code so that it obeys the lock principle and only accesses the resource when it has acquired control of the lock. That, as I say, is all down to you.

If we look at our threads, the core 0 thread is exactly the same as before, simply writing out its stream of characters, but now it doesn't start its output until it has got hold of the lock via lock.acquire(). If we don't supply any parameters, that method call does the standard wait for acquisition: if the lock is unlocked, we get hold of it, the call comes back, and our code continues. If the lock is currently owned by somebody else, lock.acquire() goes into a wait state and our thread pauses at that line of code until it manages to get hold of the lock. So in effect the thread waits to see if it can get the lock; once it has it, it has sole use of our console and should be able to output its message all in one go. The important thing, of course, is that once you've finished your task you need to release the lock, and that lets other threads get hold of it and continue their own processing. Thread 1 is doing exactly the same thing: it imports the global lock variable and starts its loop by trying to acquire it.

When our code first starts, one of the threads will acquire the lock; the other will try to acquire it and go into a wait state. Once the thread that got hold of the lock completes its process and releases it, the thread that was waiting immediately gets hold of it and continues. You can see that each thread gets the lock, prints its message, releases the lock and lets the other one in, and in this way we now have full control of our stream output to the console; each thread should complete its message in one go without any interruptions.

Let's try that on our Pico. Starting the code, we can see that core 0 gets the lock, then core 1, then core 0, and so on, but now with that nice, steady, complete message per thread run. They take it nicely in turns, because as one thread gets hold of the lock the other is in the waiting state, and the rule is that as soon as the lock is released, whoever is currently waiting will get hold of it, and so on, passing control backwards and forwards.

The basic lock.acquire() call we've just used means that one of your threads will simply go into a waiting state until it gets hold of the lock. Now, this might not be what you want. For example, you might be collecting some data and want to use the serial channel to send it to some data collection system. If you can't use the channel right now, you still need to carry on gathering that data, and you can simply try to regain the lock at a later time. So in this example we're going to set the core 0 thread to continuously poll the lock to see if it's available. If it's not available, instead of just waiting, it continues processing by counting how many times it has polled the lock before it finally gets ownership. Here we're using the acquire method with its two positional parameters. The first is a boolean wait flag: setting it to 0 means the thread will not wait for the lock to become available - it checks whether it's available or not, then just carries on - while setting it to 1, of course, means that it will wait.
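A sketch of that non-blocking form (the hold_lock_briefly helper and the counts here are purely for illustration):

```python
import _thread
import time

lock = _thread.allocate_lock()
poll_count = 0

def hold_lock_briefly():
    # Stand-in for the other thread: grab the lock, hold it, release it.
    lock.acquire()
    time.sleep(0.2)
    lock.release()

_thread.start_new_thread(hold_lock_briefly, ())
time.sleep(0.05)                  # give the other thread time to take the lock

# acquire(0): don't wait - returns True or False immediately.
while not lock.acquire(0):
    poll_count += 1               # do some other useful work, then poll again
    time.sleep(0.01)

print("acquired after", poll_count, "polls")
lock.release()

# acquire(1, timeout): wait, but give up (returning False) after the timeout,
# here half a second.
got_it = lock.acquire(1, 0.5)
```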
The second parameter is a timeout value: if you have told the thread to wait for the lock to become available, this sets the amount of time, in seconds, that it will wait before it gives up and returns False, saying "I did not manage to get the lock".

If we run this version, we see a change. Previously, when both cores used the waiting method call, they took nice turns: one would get a go, then the other, as the lock was released. Now, when core 0 tries to get control, it checks whether the lock is available, but instead of waiting it goes off and does some other processing before coming back to poll the lock again later. Core 1, on the other hand, uses the waiting method, so as soon as it tries to acquire the lock it goes into the waiting state, primed to take control the moment the lock becomes available. As soon as core 1 releases its lock at the end of its loop, it immediately goes back and tries to reacquire it, so you can hopefully see that at times core 1 is able to complete its loop, release the lock and get back to its reacquire code while core 0 is still off updating its poll counter. This means core 1 will tend to get more than its fair share of the console. It's important to bear this in mind when you're running multiple threads: you need to make sure that one thread isn't going to be greedy when it comes to taking control of resources.

So that's the basics of concurrent coding using the _thread package in MicroPython. The best thing to do now is to see it in a real-world example. This code is a development of my SPI LCD panel display driver. I'll be covering the multi-core version in more detail in a separate tutorial, but I want to use it as an example here to show how software can be broken up, and the issues you might come across when using multiple cores.
Core 0, which is the default core in our MicroPython code, updates the models in our code and then starts the rendering process, which draws objects into the frame buffer memory space. Once that process finishes, we need the SPI handling process to start drawing that frame out to the LCD panel. This frame buffer memory object is the actual resource we need to control access to: only one process can use it at any one time, otherwise we're going to end up with half-drawn frames being sent out to the LCD panel. Core 1 runs the SPI handler code, which initially sits waiting for access to the frame buffer lock. Once it has that access, it assumes a frame has been prepared and the memory buffer needs to be sent to the LCD panel. So once it acquires the lock controlling the buffer, it sends the frame buffer to the LCD panel, and when it's finished it releases that lock to allow the render process access again.

In our core 0 thread, the update-model code can actually run in parallel with the SPI handler: only the code that renders objects into the frame buffer memory needs to be controlled, so that we don't clash with the other thread. This means our code is able to use the slow SPI transfer time to complete the majority of our model processing, which of course increases our frame rate by keeping the SPI channel working as hard as possible. Back in the core 0 thread, our code has to pause at the start of the rendering process to wait for the frame lock to be reacquired; then we can draw the objects into the buffer, and when that finishes we release the lock, which lets the SPI handler on core 1 have control of the memory, and we go around the rendering loop again.

If we run this code on the Pi Pico, we get an odd output: the boxes seem to run, but then they freeze. This did take me a while to work out, but it seems to be a sort of memory leak in the threading system. As the new thread processes, it must be writing temporary data (local variables and so on) into the system heap RAM, and this doesn't seem to get cleaned up fast enough by the garbage collection process. To get it to work, I needed to add an explicit garbage collection call at the end of the SPI handler loop: we import the garbage collection library, gc, at the top of our code, and then run a garbage collection at the end of the SPI loop. Again, _thread is an experimental package, as it states in the documentation, so I guess this is one of the areas where they're still ironing out a few bugs.

With the garbage collection in place, everything does seem to run fine. But if we examine the code in a bit more detail, we'll see there are some situations where we could either miss or duplicate frames, because there are two code races in our code. When our render code finishes, it releases the lock that controls access to the buffer memory. It then loops back into the update-model code, and once that's finished it tries again to acquire the lock for the next frame render. This code assumes that the SPI handler thread will have taken control of the lock before the main thread finishes the update process. If it hasn't, the main thread will simply be able to reacquire the lock and process the next frame before the current one has been sent out to the SPI handler, and this of course causes a frame to be skipped. This situation is actually overcome by the way our code is organised: the SPI handler releases its hold on the frame lock and then immediately loops around and tries to reacquire it using the waiting technique. So basically, as soon as the render thread releases the lock, the SPI thread is sitting there waiting to take control of it, and as we've seen before, a waiting thread will get access immediately.
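With the hardware-specific parts stripped out, the shape of the two loops and the garbage-collection workaround looks something like this (send_frame_to_lcd and framebuffer are stand-ins for the real SPI driver code, and I've bounded both loops to three frames so the sketch terminates):

```python
import _thread
import gc
import time

frame_lock = _thread.allocate_lock()
framebuffer = bytearray(240 * 240 * 2)   # stand-in for the real frame buffer
frames_sent = 0

def send_frame_to_lcd(buf):
    time.sleep(0.01)                     # stand-in for the slow SPI transfer

def spi_handler(frames):
    # Core 1: wait for a rendered frame, stream it out, then clean up.
    global frames_sent
    for _ in range(frames):
        frame_lock.acquire()             # wait for a frame to be ready
        send_frame_to_lcd(framebuffer)   # send the buffer out to the panel
        frames_sent += 1
        frame_lock.release()             # hand the buffer back to the renderer
        gc.collect()                     # the workaround: explicit collection
                                         # so the heap doesn't fill and crash

_thread.start_new_thread(spi_handler, (3,))

for _ in range(3):                       # the core 0 loop
    # ...update models here; this part can overlap with the SPI transfer...
    frame_lock.acquire()                 # wait for the buffer to come free
    # ...render objects into framebuffer here...
    frame_lock.release()
    time.sleep(0.02)
```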
However, this does show that there is a second race condition in these loops. When the SPI thread acquires the lock and starts sending data to the LCD panel, the main thread at that point starts to process the model updates. If the model update process takes a long time, the SPI thread will finish, release its lock on the frame buffer, and go back to try to reacquire it. Usually, and this is what we're relying on here, the render thread, where we actually draw the objects into our frame buffer, should be sitting waiting, ready to take control when the SPI thread finishes. But if the model updates haven't finished, the lock will still be available to the SPI thread: the render process will not yet have got round to trying to reacquire it, and therefore won't be waiting for it. The SPI thread will then simply reacquire the lock, and of course that means it will just resend the last frame it sent to the LCD, giving us a duplicated frame. That creates a delay, because we're now having to resend the same frame a second time, and that has to complete before the next new frame can be sent out.

The issue here, of course, is that neither thread knows what the other thread is doing. There are a number of ways of getting around this. We could use some flags to show the different thread states, so that each can see what the other one is doing. Or, the method I'm going to implement now: we can create more control in the way we use our threads, and rethink how our parallel processing is organised to remove the need for those flags. As we'll see, it will also remove our need for the explicit garbage collection. So far all of our threads have run as infinite loops, always active and always running in parallel, but this doesn't have to be the case: we can actually start, stop and restart threads whenever we want.
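That restart-based pattern might be sketched like this (the names are mine, the loops are bounded, and the comments mark where the real model-update, render and SPI-transfer code would go; on the Pico each start_new_thread call runs the handler afresh on core 1):

```python
import _thread
import time

rendering = False        # global flag: True while a frame is being sent out
frames_sent = 0

def spi_handler():
    # Runs once per frame, then exits. On the Pico, exiting lets the thread's
    # heap data be cleaned up, ready for the next start_new_thread call.
    global rendering, frames_sent
    # ...send the frame buffer out over SPI here...
    time.sleep(0.01)
    frames_sent += 1
    rendering = False    # tell core 0 the panel has been refreshed

for _ in range(3):       # the main (core 0) loop, bounded for this sketch
    # ...update models and render the frame into the buffer here...
    rendering = True
    _thread.start_new_thread(spi_handler, ())  # fresh thread for this frame
    # ...further model updates can run here in parallel with the transfer...
    while rendering:     # wait until the refresh has finished
        time.sleep(0.005)
```

Because the handler is only ever started once the frame buffer is ready, no lock object is needed at all in this version.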
A thread will exit if you perform a return from the function that controls the thread's code, and you can then restart it using the normal start_new_thread method call. So in this version of our code we initially have only a single thread, running on our main core. When the core 0 thread finishes drawing a frame into the buffer memory, it starts the SPI handler code running on the second core. It also sets a global flag variable so that it knows it has asked for that frame to be rendered out to the LCD panel. The SPI render thread can start sending the frame data to the LCD panel immediately, because it is only ever started once the frame buffer is free for use; this also removes the need for the lock object. Once the SPI handler has finished the LCD transfer, it resets the rendering flag, which lets the main thread know that the LCD panel has been refreshed, and then the SPI handler thread simply exits by reaching the end of its code (or we could issue a return at that point). The thing here is that when our thread exits, it does seem to automatically clear up any heap data it left behind, effectively cleaning itself out of the Pico's memory, ready for its next run request.

This gives us our finished solution, and as you can see, we get a nice working display on our LCD panel.

So that's multi-threading on the Raspberry Pi Pico, and hopefully I've given you enough information to use this very powerful technique in your own projects. As we've seen, starting processes on the second core is not really that difficult in MicroPython, and with a little bit of thought we can make sure that our processes play nicely together and can communicate and share data and resources. If you've enjoyed this tutorial and found it useful, please do make sure you like the video and subscribe to my channel for more making, coding and gaming episodes. Have fun coding your Raspberry Pi Pico, and I'll see you again soon. Bye
for now! For more games programming, electronics projects and retro gaming, please make sure you like this video, subscribe to my YouTube channel and visit my website.
Info
Channel: Bytes N Bits
Views: 36,987
Keywords: raspberry pi pico, micropython, threading
Id: 1q0EaTkztIs
Length: 37min 17sec (2237 seconds)
Published: Tue Apr 19 2022