Run BOTH Cores with Threading On The PICO

Captions
Good morning, and welcome to the lab. Today we're going to take a look at threading, utilizing both cores on the Raspberry Pi Pico. I'm going to premise this with: this information is based on the best I can find right now. Threading on the Raspberry Pi Pico is in the experimental stage, so you may get different results than I'm getting, but frankly I've had really good luck with it thus far. I've been chatting with other people and other users, checking the forums and so forth, and there are a number of people having difficulty with it, so I'll try to add some pointers along the way to help keep things reined in and let us use this feature to our advantage in its current form.

Here's what we're going to look at in this video. We'll talk about the two cores on the RP2040, the chip in the Raspberry Pi Pico. For the most part, all we ever work with is just one core, so we're utilizing half the capability of this device. Utilizing both cores has the potential for a dramatic improvement in processing speed, and I'm talking on the order of twice the performance, though it only goes that high in very rare cases. We'll cover which library to use, and then we'll make a function run in a thread, which is essentially forcing a section of code to run on the other core.

But that creates a problem: the two cores, the main one we're always working in and core number one, as we'll generically call it, may try to access data at the same time, and that can create all kinds of problems. So we're going to talk about how we protect data, and other resources as well, from simultaneous access. That's going to be very important to using threading, not only on the Pico but on other microcontrollers and in other programming languages too. It's done with a semaphore, which is a fancy way of describing a locking mechanism: in one area of code we can lock some data so nobody else can access it until we're ready, and then we can release that lock so somebody else can take it and lock up that data or resource in turn. So we'll be going through the semaphore, and we'll show you how to lock data or a section of code so nobody else can access it, and then, after we're done, how we must unlock it so the rest of the program can use it again.

Now we'll take a quick look at the data sheet for the RP2040 to get an idea of the architecture of the device and understand a little more about the cores before we get into an example. Here we're looking at one of the block diagrams describing the RP2040. I won't go into a lot of detail, but I just want you to see two things: Proc 0 and Proc 1. Think of those as core zero and core one; almost always we're running in core zero. This is the block diagram for what we'll call the brains of the RP2040: it has access to the interrupts and all the GPIO, and it's connected directly into the bus fabric, which then connects all the peripherals and, most importantly, the ROM and the RAM. That's where you have to understand the importance of locking and unlocking data. Core number zero could be accessing memory address number 10 at the same time core number one is accessing memory address number 10, and they would have a collision, a fight over who gets the data, or worse, one of them gets the data at the wrong time and the other process ends up working on invalid data. That's where this gets really important, and I've got an excellent example to demonstrate it. The big takeaway from this block diagram is that it shows the two cores, core zero and core one, and how they access all the resources of the entire Raspberry Pi Pico, or rather the RP2040.
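Before the first demonstration, it's worth seeing the small set of _thread calls that everything in this video builds on. This is only a minimal sketch of MicroPython's _thread module as it exists on the Pico; the worker function and its print text are made up for illustration.

import _thread

def worker():
    # Made-up placeholder; whatever runs in here executes on core one.
    print("hello from core one")

# Start worker() on the second core; the main program keeps running on core zero.
_thread.start_new_thread(worker, ())

# A lock (the "semaphore") for protecting data that both cores touch.
lock = _thread.allocate_lock()
lock.acquire()    # take the lock; this waits if the other core already holds it
# ... read or write the shared data here ...
lock.release()    # hand it back so the other core can take its turn
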
As a big warning: with MicroPython, threading is experimental on the Raspberry Pi Pico as of today's date, 12/26/2022, so use it with expectations of trouble. You'll have to test, test, test, and test again to make sure your code is in fact running reliably.

So now we'll dive into the code for our first example, our first demonstration. We're going to run two mathematically intensive operations, each wrapped in a function, and each of these functions takes about nine seconds to run. Let's see how we would do that without a second core to help share the workload. Up above, in the comments, I explain what this demonstration does: essentially it runs two identical functions, each one called in turn, so they run sequentially, one after the other, and it takes a total of about 19 seconds to process all of this data.

I do have some things set up here already for what we're going to need to import; you'll see I'm importing _thread even though we're not really going to use it in this particular example. We've got a couple of LEDs on the breadboard so we can monitor which core is running, and we have some global variables that serve no purpose other than to communicate run time from each core back to the main loop, so we can demonstrate what's going on.

I've got two functions that are essentially identical: one is called core_one and the other core_zero. This will of course be modified in our next example program, where one of them runs in a thread. Neither does a whole lot other than some really intensive math. Each has access to its global variable and records the start time in microseconds; that's just for our timekeeping, so we can see how long things run and make some comparisons. It sets a variable x to zero, then goes into a while loop that runs one million times, incrementing x by one until it counts up to a million. Every time through the loop it toggles its LED with the toggle command: on one pass through the loop, off the next. After it has done that a million times, it records the ending time of the whole looping process and calculates the duration as the end time minus the start time in microseconds, storing that in the global duration variable for core one. core_zero does exactly the same thing, but stores its run time in a variable named for core zero instead of core one.

Down in the main code I also record the duration of the entire run, not just each function independently. We record the start time in microseconds, execute core_one (the upper function), then execute core_zero (the lower one), and record the ending time of that whole sequence of events. Then I turn off the LEDs, so the program always starts with them off the next time we run it, and finally I calculate the duration of the main code, including both core operations, and print out all the details I can gather about the run.
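Pieced together from that walkthrough, the first program looks roughly like this. It's a sketch rather than the author's exact listing: the GPIO numbers for the two LEDs, the variable names, and which LED each function drives are my assumptions.

import time
import _thread                     # imported, though this version never starts a thread
from machine import Pin

green_led = Pin(14, Pin.OUT)       # assumed GPIO numbers for the breadboard LEDs
red_led = Pin(15, Pin.OUT)

core_one_dur = 0                   # globals used only to report each run time
core_zero_dur = 0

def core_one():
    global core_one_dur
    start = time.ticks_us()
    x = 0
    while x < 1000000:             # the "mathematically intensive" busy loop
        x += 1
        green_led.toggle()         # on one pass, off the next
    core_one_dur = time.ticks_diff(time.ticks_us(), start)

def core_zero():
    global core_zero_dur
    start = time.ticks_us()
    x = 0
    while x < 1000000:
        x += 1
        red_led.toggle()
    core_zero_dur = time.ticks_diff(time.ticks_us(), start)

main_start = time.ticks_us()
core_one()                         # runs first, entirely on core zero
core_zero()                        # then this one, also on core zero
main_dur = time.ticks_diff(time.ticks_us(), main_start)

green_led.off()
red_led.off()
print("core_one duration (us):", core_one_dur)
print("core_zero duration (us):", core_zero_dur)
print("difference (us):", core_zero_dur - core_one_dur)
print("total run time (us):", main_dur)
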
Now I'm going to run it again. We already see data from an earlier run down here, but I'll run it again and then fast-forward through it. You can see that the green LED is lit now, and after that function finishes, the red one will light up. Okay, I'll fast-forward.

Down here is the data we collected from the run, and the total run time is 18.85 seconds, which seems quite reasonable for counting to a million while toggling an LED every time through the loop, so half a million on/off cycles, and then doing all of that twice. We also recorded the duration, in microseconds (millionths of a second), for core zero and core one; that's these two lines. The time difference between core zero and core one comes to negative 180 microseconds, which converted to seconds is 0.00018 seconds. The two core functions are running as fast as they can go, but the calling operations can add a little bit of delay, and that accounts for these very small differences between the run times. The important thing to notice is that when we ran this demo, the green LED came on first and then the red, meaning these operations were performed sequentially, just as we called them in the main code, so everything ran essentially in core number zero.

So now let's look at taking one of these functions and running it in the second core, core number one, and letting it run simultaneously while core zero is running. In theory, that should roughly double the performance. At the top, in the comments section, I again explain what the program is doing, with a lot of the same description as before. Most importantly, we're going to run two identical functions, one in a thread and the other from the main program in core zero. When run, it takes approximately nine and a half seconds, as opposed to approximately 19 seconds in our previous example, and that's to run everything, including both functions. If we comment out the thread start line in the main code and run it, the program will still take approximately nine and a half seconds, meaning that whether I run the threaded function or not, the cycle time is the same, but without it I'm doing half the amount of work.

The thing that may help visualize all of this: any non-threaded program, anywhere we're not importing the _thread library and starting a thread, is always running in core zero, and any time we start a thread, it runs in core number one. Now, the thing to understand is that we could literally have multiple threads set up in our program, but only two things can run at once: the main program in core zero, and one of those threads in core number one. Keep that in mind; you'll probably try it, as I did, and find out that only one thread can run at a time alongside the main program. Main is always running in core zero, and core number one gets whichever thread we're calling into action at the time. I would recommend sticking to just one thread, because I ran into a few issues with more than that in my early experimentation with the Pico.
Let's look at our imports: there's _thread again, and this time we're actually going to use it. We've got our LEDs, and the variables for holding time are still globals. Yes, I know everybody hates global variables, but they're wonderful little devices in small programs such as we'd find in a typical MicroPython project. Our two core functions, core_zero and core_one, are identical; I haven't changed them yet. The main change in this program versus the previous one is right here: of course we record our start time, and then we have this _thread function, or method I should say, start_new_thread, with core_one passed to it. So I'm saying: start core_one, this function up here. I'm also allowed to pass in arguments, but we'll get into that in a more advanced video. For now, this is literally all that's required to make the function run in core number one. Right after that line it starts working, and my main code continues: I call my core_zero function, this one here, and that runs, and when we're done with all of that we print out our data just as we did in the previous example.

So we'll go ahead and run it. We'll see now that both the green LED and the red LED are flashing at exactly the same time, so the functions are running simultaneously, and you'll see that the total run time was 9.8 seconds, substantially faster than the 18.8 seconds in our previous example. Our two cores ran very close to the same speed, about a 0.2 second difference this time as opposed to the 0.00018 second difference before, but you can see that this is nearly a doubling in processing capability, so we can get a lot of work out of this other core.

As you may be guessing already, I picked this type of computational function deliberately. Inside these two functions I'm doing very simple things: I'm toggling LEDs on and off, I could also be monitoring an input, and I'm writing to a local variable; x isn't communicated outside either of these functions in this particular case. I'm doing basic mathematics in there, but I'm not doing anything with the peripherals, which I have not tested yet, and I'm not doing anything with Wi-Fi on the Pico W; others have reported odd problems with threading and those. What I am doing is leveraging this capability for raw processing, just going through and computing numbers. You could be processing data from any of the inputs you've collected, and if you've got a bunch of numbers to crunch, this is a way to split the load between core zero and core one, do some of the computations in one and some in the other, and get a tremendous improvement in efficiency. So be careful what you try, start out as I am doing if you're experimenting with it, and of course work from my example programs, which, by the way, are on our companion website (Making Stuff with Chris DeHut) for you to download. I usually post the source code the day after the video gets published to YouTube. Hopefully you get an idea of how simple it is to force a thread to run in core number one and take advantage of that processing capability: we literally changed one line of code from a function call to a thread-start call, and that makes the function run in the other core. Truly quite simple.
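The changed main section of that second program, again as a sketch rather than the author's exact code: the two functions, LED objects, and duration globals are the ones from the previous sketch, and the short wait near the end is my own addition (so the printout doesn't race the thread), not something described in the video.

import time
import _thread

# core_one, core_zero, core_one_dur, core_zero_dur, green_led, and red_led
# are assumed to be defined exactly as in the previous sketch.

main_start = time.ticks_us()

# The one-line change: launch core_one() on core one instead of calling it here.
# The empty tuple is the (required) argument tuple for the thread function.
_thread.start_new_thread(core_one, ())

core_zero()                        # meanwhile, main keeps working on core zero

# My addition: wait for the thread to record its duration before printing.
while core_one_dur == 0:
    time.sleep_ms(10)

main_dur = time.ticks_diff(time.ticks_us(), main_start)
green_led.off()
red_led.off()
print("core_one duration (us):", core_one_dur)
print("core_zero duration (us):", core_zero_dur)
print("total run time (us):", main_dur)   # roughly half of the sequential total
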
Which is perhaps why it takes the MicroPython folks so long to get full functionality onto a new microcontroller. Platforms like the Arduino boards and even the ESP32 have been around much longer, so there's been time to work out the really difficult pieces, whereas the Pico is still rather new. Yes, it's frustrating for all of us, but we've got to cut them some slack.

Now let's see what happens in our next example, which deals with accessing data from two different cores and the problems that brings up, and then how we deal with that locking mechanism and why it's important. In this case we've shortened up the example. We've got a variable called shared_var, set to zero, and it's going to be modified by core number one; it will hold the result of all the massive computations we do in this function, the one we want to run off in the background so we can do other work while those calculations take so long to perform. The function is very similar to what we had before. We've got an internal counter that we set to zero, and we go into an endless loop that runs forever and ever. The counter increments by one every time through the loop, and we take a half-second break in between these massive computations; imagine that sleep of 0.5 seconds is when it's gathering data, because we could do other things in this function as well, gathering and collecting information from inputs or other devices. At that point our shared variable is set to zero, and then, while the shared variable is less than half a million, we add one to it, so by the time that loop is done the shared variable equals 500,000. Then, when that's done, we add the counter value, the one incrementing once per pass through the primary loop, so the result will always be 500,000 plus whatever the counter has climbed to. Think of that as the result of all these massive computations.
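Sketched out, the whole demonstration, including the main reading loop described next, looks something like this. The variable names and delays follow the walkthrough, but the exact listing is my reconstruction.

import time
import _thread

shared_var = 0

def core_one():
    global shared_var
    counter = 0
    while True:                        # runs forever on core one
        counter += 1
        time.sleep(0.5)                # stand-in for gathering data between calculations
        shared_var = 0
        while shared_var < 500000:     # the "massive computation"
            shared_var += 1
        shared_var += counter          # final result: 500,000 plus the pass count

_thread.start_new_thread(core_one, ())

while True:
    time.sleep(0.25)                   # a quarter second of other work in the main loop
    # Problem: nothing stops this read from landing mid-computation, so we often
    # see partial values instead of the finished result.
    print("shared_var:", shared_var)
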
Down here in our main code section, we're just starting that thread: right here, core_one, start that thread running, and it starts running and doing its job. Below that we've got what we'll call our main program loop, an infinite loop where we do a bunch of work that takes 0.25 seconds and then access the shared variable. Now, ideally we would never want to access the shared variable until all of those computations are done. There are other ways to do what I'm doing here, but I've set it up this way because it demonstrates the concept of locking data. So I'm going to run this, and we'll see that every quarter of a second I get the current status of the shared variable, and it's constantly changing until it gets over five hundred thousand, and then it just keeps changing and we keep getting data. If the goal is to get this data only after the computations are done, we're failing miserably with this algorithm and this way of dealing with it. So let's see how the locking mechanism, the semaphore, can prevent us from accessing that data until our thread is done processing it. That'll be our next example, thread number four.

Now we're going to get into a few more terms, and I've tried to use variable names that help guide us along; data_key will be one of those names. Imagine, if you will, that while our thread is doing all that processing, we want to lock that area up so nothing can access either the calculation in progress or the result itself. So we go through and calculate that shared variable, half a million additions, and once we're done we release the lock. Then somebody else, anywhere in this program, can put the lock and key on it and take possession of the data; they can read it, add to it, or manipulate it further, and then they too release it when they're done. Whoever is working on the data has to hold the lock, and the lock can only exist in one place at a time. That's pretty much what the comments here explain.

Now let's look at the program. This is the same program as before, but we've added in the semaphore, the locking functions. The first thing, up here, is that we create a semaphore, that fancy name for a locking mechanism, by allocating a lock, and I call it data_key. I could call it whatever I wanted, but it's a way for me to keep track: this is the key that locks data in this program. Then we come into our core_one function, the one that runs in core one as a thread. Almost everything is the same, but this line right here says data_key.acquire. If nobody else holds the lock, we get it, we lock it, and we have acquired this area of code, and the other core is prevented from getting in. If somebody else already has it, we wait right here in the code until they release it. So you can get stuck in your code sometimes, and you have to pay attention: when you take a lock, you must release it, otherwise nobody else can ever get at the data. That means error trapping is even more important in a threaded program than it is in a non-threaded one.
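The examples in this video acquire and release the lock directly, but given that warning it's worth sketching the defensive version too: wrapping the locked section in try/finally so the lock is released even if the work in between raises an error. This pattern is my addition, not something shown in the video; data_key and guarded_update are just illustrative names.

import _thread

data_key = _thread.allocate_lock()

def guarded_update():
    data_key.acquire()            # blocks until no one else holds the lock
    try:
        # ... work on the shared data here ...
        pass
    finally:
        data_key.release()        # always runs, so the other core is never left stuck
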
So here we acquire the lock, and I've added some print statements so we can follow along in the terminal and see how the timing of all this plays out. Then we go through and do all of our calculations just as in the previous example, and once all of that calculation is done, you'll notice I have data_key.release. At that point this function releases control of the data and can go about doing other work.

Down in our main loop I've added a few more variables just to limit the run time. There's a variable called f that we increment once every time the loop runs, and a break statement down below that exits the loop after 20 reads; that way I can get to talking and this thing doesn't keep running. While we're in that loop we sleep for an eighth of a second and then try to fetch the data. As soon as I want to fetch the data, I acquire the key, and I'll sit there and wait until I can get it if the thread currently has it. Once I have it, I get the data, or further manipulate it, do whatever I want with it here, and when I'm done playing with it in the main loop I release it. That allows the thread in core number one to lock and manipulate the data again for the next iteration of whatever it's doing. And that's really all there is to it: it's just a matter of locking data, or locking sections of code, to prevent others from accessing them.

So we'll go ahead and stop the program that's currently running, and I'll hit run. You'll see it fetched, got data, fetched, got data: what it's doing is grabbing the data while the thread is in its sleep, which is why we see several "got data" lines in a row, and then finally it says fetching data, and then data unlocked when we release it up in the thread. So this is, from our perspective, a very simple mechanism for protecting data.

Now look at the function up here: I've got a fair amount of code locked up with our data key, one, two, three, four, five lines of code. That could be a whole bunch more code, whatever we want to protect in a block we're actively acting on. It could be inputs, it could be outputs, or data as in this case, or peripherals; we can lock all of that up, keep everything else out of it until we're done, release it, and then down below, or somewhere else, somebody else can gain access to that data, variable, input or output pin, peripheral, or whatever the resource is.

This should get you pretty far along with experimenting with threading on the Raspberry Pi Pico while the feature is in this experimental phase. And again, it is experimental, but in truth I've had pretty good luck with it by keeping things simple and not going off on more adventurous tasks such as trying to communicate over SPI or I2C or serial, or whatever the case may be. I'm trying to avoid any of that, and obviously no network calls, but if you want to try it, know that you're going to be in experimentation mode.
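Put together, the locked version looks roughly like this. Again a sketch: names such as data_key, shared_var, and f follow the walkthrough, but the listing itself is my reconstruction rather than the author's exact code.

import time
import _thread

data_key = _thread.allocate_lock()     # the semaphore, our key to the shared data
shared_var = 0

def core_one():
    global shared_var
    # Re-imported inside the thread; as noted below, the author found the thread
    # did not recognize sleep until it was imported within this function.
    from time import sleep
    counter = 0
    while True:
        counter += 1
        sleep(0.5)                     # stand-in for gathering data
        data_key.acquire()             # lock the data while we compute
        print("core one: data locked")
        shared_var = 0
        while shared_var < 500000:
            shared_var += 1
        shared_var += counter
        data_key.release()             # finished; core zero may read it now
        print("core one: data unlocked")

_thread.start_new_thread(core_one, ())

f = 0
while True:
    f += 1
    if f > 20:                         # stop after 20 reads so the demo ends on its own
        break
    time.sleep(0.125)                  # an eighth of a second of other work
    print("main: fetching data")
    data_key.acquire()                 # wait here until core one releases the data
    print("main: got data", shared_var)
    data_key.release()                 # hand the key back for the next calculation
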
There was something else I wanted to show you that I found rather fascinating, and it just dawned on me. Normally, in a function within a program like this, we don't have to import a library inside the function if it's already imported in the main program. But when I was working on this example, I kept getting an error that it didn't know what sleep was; it was undefined. That puzzled me, and after a while of checking whether I'd misspelled the word sleep, which I could certainly do, it turned out I had to import it inside the function running in the other core. Which I guess makes sense: we're in another core, effectively another processor, and it apparently isn't automatically made aware that we needed that library loaded there. So that's another fascinating aspect of how this threading operates and runs on the Raspberry Pi Pico.

My suggestion is this: download these examples, play around with them, and maybe use them as a framework for some crazy experimenting of your own. Over on the Raspberry Pi website, in the forums, in the Pico section, there have been a good number of discussions about threading, and if you run into problems where you think something should work, share it in the forum there. They would of course advise you to report it to the MicroPython forum as well, but either way, communicate what your findings are, in hopes that the good folks at MicroPython can further refine this feature and make our lives a little easier and a little happier.

Well, I'm going to end this video now; it's run very long, and my clocks got out of sync so I'm a little lost on time, but this has been a great adventure. I spent a couple of days preparing all this code and figuring out how to explain it, and I'm really hoping you got a clear picture of threading and working with multiple cores in a microcontroller. That's it for this video. Thanks for watching, and I look forward to seeing you in the next video.
Info
Channel: Making Stuff with Chris DeHut
Views: 1,165
Keywords: Thread, Threading, Multitasking, multiple cores, core, core0, core1, Raspberry Pi Pico, RP Pico, Microcontroller, Electronics, Electronic components, Python, Micro Python, Interface, Real world computing, Maker, Making, Chris DeHut
Id: ZEgqrNXuBvk
Length: 33min 27sec (2007 seconds)
Published: Thu Apr 20 2023