In-depth: Raspberry Pi Pico's PIO - programmable I/O!

Captions
Hey! In this video we are going to take a look at the PIO peripheral of the new Raspberry Pi Pico, or more precisely of the new RP2040 microcontroller designed by the Raspberry Pi Foundation. If you've seen some of the impressive stuff that's been done with the Pico, you might have noticed people hooking up DVI, VGA and other pretty high-speed buses directly to the Pico, even though it doesn't have native peripherals for a lot of these buses. The way most of these things are achieved is by making use of PIO, the programmable I/O peripheral. The RP2040 contains two of these PIO blocks, and they are pretty amazing: you can think of each of these PIO blocks as containing four tiny specialized cores doing super fast and freely programmable I/O operations.

In this video we will take a deep dive and check out what PIO is, how it works and how it's programmed. This will include a more or less complete description of all the supported instructions, and I've created timestamps so you can directly jump to each of them. So if you want to come back to this in the future, you should be able to simply click the instruction or detail you want to learn about in the description and it should take you right to it. I also want to thank Luke Wren from the Raspberry Pi Foundation for helping me double-check the correctness of this video. Luke is one of the main engineers behind PIO and he posts about some pretty interesting stuff on Twitter, so you should definitely follow him.

So let's dive in. Each PIO peripheral contains four state machines, and all four state machines are connected to the same instruction memory, which can hold up to 32 instructions. Now, this might not sound like a lot, but it's more than enough to do some really impressive things. It's important to know that all state machines are using the same instruction memory and that the memory has four read ports, which means that all state machines can access the memory at the same time without any delay or blocking. Also, the instruction memory itself
is filled by the rest of the system, for example by code running on one of the ARM cores.

Each state machine also has two FIFOs, each with a length of four times 32 bits: one for sending data into the state machine and one for sending data from the state machine back to the system. A nice feature is that if you don't need one of the FIFOs, you can reconfigure them so that both point in the same direction, effectively doubling either the transmit or the receive FIFO size. The FIFOs are one of the ways in which the state machines communicate with the rest of the system, and so we can directly push data into them or pull data from them, for example from code running on the CPU.

Now, as said, the PIO is for dealing with I/O, and so we need a way to connect our state machines to the GPIOs available on the RP2040. This is done by the I/O mapping, which lets us, per state machine, map pins as either inputs or outputs or both into the state machine. Finally, the PIO also contains an interrupt system; more on that later.

Now let's take a look inside of the state machine. First, let's look at the registers. Each state machine has three registers: the program counter, which is a special register and points to the currently executing instruction in the instruction memory, and two 32-bit wide scratch registers called X and Y. The state machine also has two shift registers: the input shift register, called ISR, and the output shift register, called OSR. Both of these are connected to the FIFOs. Then we also have a clock divider, which adjusts how fast our state machine will execute. The clock divider is directly connected to the system clock, and so our state machine can run at the full 133 megahertz and drive I/Os at that speed. The clock divider can divide the system clock by 1 to 65536 with a fractional precision of 1 divided by 256.
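To make the divider arithmetic concrete, here is a minimal Python sketch of how one might pick a divider for a desired state machine clock. The function names `pio_clkdiv` and `effective_hz` are made up for this illustration and are not part of any SDK, and the 133 MHz system clock is simply the figure quoted above; a real board may run at a different frequency.

```python
SYS_CLK_HZ = 133_000_000  # assumed system clock, the figure quoted in the video


def pio_clkdiv(target_hz, sys_clk_hz=SYS_CLK_HZ):
    """Return (integer part, fractional part) of the divider.

    The fractional part counts steps of 1/256. Hypothetical helper,
    not an SDK function.
    """
    divider = sys_clk_hz / target_hz
    if not 1 <= divider <= 65536:
        raise ValueError("divider must be between 1 and 65536")
    fixed = round(divider * 256)  # snap to the nearest 1/256 step
    return divmod(fixed, 256)


def effective_hz(int_part, frac_part, sys_clk_hz=SYS_CLK_HZ):
    """Clock rate the state machine actually runs at for a given divider."""
    return sys_clk_hz / (int_part + frac_part / 256)


if __name__ == "__main__":
    parts = pio_clkdiv(2_000_000)
    print(parts)                 # (66, 128), i.e. 66 + 128/256 = 66.5
    print(effective_hz(*parts))
```

Because the divider is quantized to 1/256 steps, the effective rate can differ slightly from the requested one, which is why the sketch reports it separately.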
The state machine is also connected to the IRQ system; you will see how we can use that for syncing state machines and alerting the CPU later on. Finally, each state machine also has access to the GPIOs via the I/O mapping we just saw. Now, there are four different I/O mappings for each state machine: there are input pins, output pins, set pins and side-set pins. We will see how these are used when we come to the PIO instruction set.

Now let's take a quick look at the shift registers. Each of these shift registers is 32 bits wide and they can be configured to either shift left or right. The output shift register can, for example, be used to shift data directly onto the output pins. In this example we are shifting the two least significant bits from the shift register directly into I/O 0 and I/O 1. On the left, the register is then filled with zeros. Second, the input shift register works basically the same way, so for example, if we want to read data from a pin and send it to the CPU, we can simply shift the value from the pin into the shift register. We can also shift in multiple I/Os at once, super useful for example for parallel bus systems.

As mentioned earlier, the shift registers are directly connected to the FIFOs. This allows us to push the contents of the ISR into the receive FIFO and to pull data out of the TX FIFO into the OSR. What's even better, shift registers keep track of how many bits have been shifted into or out of them, and they can be configured to automatically push or pull data once they are full or empty, respectively. The threshold for what defines being full or empty is freely configurable by the programmer.

Before we go on, let's also quickly look at the FIFOs themselves. Each FIFO has four stages with a size of 32 bits each, and on one end we have our CPU or other system peripherals, and on the other end we have our state machine. The FIFO is aware of how much data has been pushed into or pulled from it, and so whether the FIFO is full or empty is detectable by our code. And that's basically
all there is to the state machine: quite simple, but also quite powerful, as you will see.

Let's also quickly talk about the interrupt system. Each PIO has eight interrupt flags, which can be used to synchronize the state machines or provide lock-like functionality. The first four flags are also exposed to the CPU and can trigger interrupts there.

The last big thing we have to look at before heading to the instruction set is the I/O mapping. The RP2040 has 30 I/Os, and with the I/O mapping we can connect them to our state machines in different ways. One thing to note is that the PIO I/O mapping internally works with 32-bit wide registers, and so from the mapper's perspective there are actually 32 pins, even though only 30 physically exist. As mentioned before, each state machine has four different sets of GPIO mappings. Let's start with the input mapping. For the input mapping we can define the input base pin, for example GPIO 7. This pin will then, from the perspective of the state machine, be pin 0, and all the other I/Os are simply counting up and wrapping around after I/O 31. The output mapping works basically the same way, except that we also need to define how many pins we want to map as output, for example 1, 3 or 8, or even 32, again wrapping around after I/O 31. Note that the state machine can also control the pin direction of all pins except for the input pins, and that we can have overlapping mappings. The set and side-set mappings work identically to the output mapping, with the exception that we can map a maximum of five I/Os. And so, for example, we can map I/Os 19 to 23 as set pins and I/Os 5 to 8 as side-set pins. And as said, all of these mappings can overlap without any issues.

Now let's dive into the instruction set. PIO is programmed using a custom assembly language which has nine instructions and is assembled using the pioasm tool. The first instruction we will look at is the set instruction. The set instruction takes two arguments, destination and data, and the instruction will write
whatever is in data into the destination. Data must be a value from 0 to 31, and for the destination we have a couple of options. pins will allow us to write directly to I/O pins, and here's where the I/O mappings come into play. As explained, we have four different mappings, and as the name suggests, set will write to whatever pins are mapped into the state machine as set pins. We can also write to pindirs, which are the pin directions: a zero for a pin sets it as an input and a one sets it to be an output. Finally, we can also use set to write a value directly into one of the scratch registers X and Y. And so, for example, set pins, 1 would turn on the first mapped I/O, set pindirs, 1 would turn the first mapped I/O into an output, and set pindirs, 0 would turn it into an input.

Next let's look at the jmp instruction. A jump instruction allows us to jump to different parts of our program, and so the jump target can be a value from 0 to 31. Remember that we only have space for 32 instructions, and so this is enough to perform an absolute jump to anywhere in our instruction memory. In practice, instead of using an actual value, you will see that most code will use a label, which is then converted into an absolute address by the assembler. The jump instruction also supports a couple of conditions. For example, !x or !y will only perform the jump if X or Y is zero. x-- or y-- will check if X or Y is not zero; if that's the case, X or Y will be decreased by 1 and then the jump is performed. And x!=y will only perform the jump if X and Y are different from each other. We can also jump depending on the state of a single pin. This pin ignores all the I/O mappings and instead has to be set explicitly; the SDK exposes this with the sm_config_set_jmp_pin function. Last but not least, we have the !osre condition: here a jump will only be performed if the output shift register is not empty.

Now let's look at some examples. Here we
have a label, loop_start, two set instructions which will first turn an output on and then off, and then a jump back to loop_start. So this is an endless loop that will turn on the output for one cycle and then leave it off for two cycles.

Next let's look at the mov, or move, instruction. This instruction will copy whatever is in source into the destination. There are quite a lot of destinations available here, so let's walk through them. pins, you guessed it, will copy whatever is in source into pins. Now, mov, unlike the set instruction, uses the output I/O mapping. Next, we can copy something into our scratch registers X and Y. And then we have something interesting: exec. If we copy something into exec, it will be decoded as an instruction and executed in the next cycle, so in theory we can execute PIO code from external sources like our pins or whatever. We can also copy something into the program counter, which will set the address of the next instruction that will get executed, and we can copy something into the input and output shift registers.

The sources are similar. We can copy from pins, which will use the input pin mapping in this case. We can copy from our scratch registers. We can copy null, which will simply set the destination to 0. And we can copy from status; status is a special register that can be configured to mean different things, such as FIFO full or FIFO empty and so on. And finally, we can also copy from our shift registers ISR and OSR. A cool feature is that you can prepend flags to your sources: for example, an exclamation mark or a tilde will copy the inverted value, and two colons will copy the source bit-reversed.

Next we have the in instruction. The in instruction will shift bit count bits from source into the ISR, the shift register that goes towards the rest of the system. Bit count can be any value between 1 and 32, and the sources behave identically to before. Note that pins will use the input GPIO mapping.

The out instruction allows us to shift data from the OSR into a
destination. Again, bit count can be 1 to 32, and again the destinations are pretty self-explanatory after the previous instructions. Note, though, that out uses the output pin mapping.

Next we have the push instruction, which pushes whatever is in the ISR into the receive FIFO and clears the ISR. push has a couple of optional arguments. iffull will only push if the ISR is considered full; the ISR is considered full when a configurable threshold of bits has been shifted into it. If block is specified, the push will block in cases where the FIFO is full and wait for the FIFO to have space before continuing execution. noblock ensures that the push will never block; however, if the FIFO is full, it will remain unchanged, but the ISR will be cleared. push without any arguments is identical to calling push block, and so by default push will block.

pull is the opposite of push: pull will read data from the transmit FIFO into the output shift register. ifempty will cause it to pull only if the output shift register is considered empty, and block or noblock will block or not block if the transmit FIFO is empty.

Next we have the irq instruction. This instruction will set or clear one of the eight interrupt flags of the PIO. irq_num contains the flag number, and so it has to be between 0 and 7. And then we have a couple of options. set and nowait are the default options, and they basically mean: set the flag and don't wait for it to be cleared. wait will set the flag and then wait for it to be cleared again before continuing, making it possible to use it as a locking mechanism, and clear will clear the flag without waiting or anything. Finally, we also have the optional rel argument, which allows us to set interrupt flags depending on which state machine we are in. If rel is specified, the lower two bits of the IRQ number are modified by taking the two bits, adding the state machine number to them and then performing a modulo 4 operation. This allows us to use different interrupt flags per state machine while running the same
program on all of them. Let's look at some examples. irq 1 will simply set interrupt flag 1 without waiting for it to be cleared; if it's already 1, it will simply stay at 1. irq wait 1 will set IRQ flag 1 and then wait for it to be cleared again, and irq clear 1 will simply clear interrupt flag 1.

Now the final instruction: wait. There are different types of wait instructions. Let's start with the first one, wait gpio. This instruction waits until the GPIO specified in num has the value specified in polarity. Note that wait gpio completely bypasses the pin mapping and instead uses absolute GPIO numbers. Next we have the wait pin instruction, which works similarly to wait gpio but uses the input pin mapping. And finally we have wait irq. This waits for an IRQ flag to be set. Polarity works a bit differently here: if polarity is 1, the wait instruction will clear the interrupt flag once it's set; if polarity is 0, the interrupt flag will not be changed. wait irq also supports the optional rel argument, which again will modify the lower two bits of the interrupt flag number by adding the state machine number to them, modulo 4. And that's all there is to it: all nine instructions.

Now, before we finish, let's have a look at three more features. The first one is the delay functionality. If you read PIO assembly, you will often see numbers in square brackets at the end of the line. These are used to increase the number of cycles an instruction takes. Normally, any non-blocking instruction takes exactly one cycle. Now, often we want to perform cycle-precise I/O for timing reasons, and let's say we want to create a square wave. This code has three instructions: the first one sets the I/O, the second one clears it, and then we jump back to the beginning. Now, in this case our square wave would look like this: the one is half the length of the zero. Not great for a square wave. Now, we could insert an additional no-operation instruction that just wastes time, for example mov y, y, which does literally nothing. This gives us a perfect
square wave, but it also wastes one instruction slot. Luckily, the PIO language has the delay feature. Instead of doing this, we can simply add a 1 in square brackets behind our first instruction, and now the instruction will execute and then wait for one more cycle. We still get the perfect square wave and we've also regained one instruction slot. The delay can be between 0 and 31 cycles, aka 5 bits. However, the number of bits available as delay depends on another functionality called side-set.

Side-set is the only GPIO group we haven't looked at yet, and it's used by a pretty cool feature: with each instruction we can toggle the I/Os mapped as side-set. For example, by adding side 0 we turn the side-set I/O off, and with side 1 we can turn it on. We can set and clear five individual pins this way. Pretty sweet. This is super useful when implementing fast clocked signals such as SPI, as we can toggle the clock at the same time as we are shifting data out. To enable side-set in the assembler, we need to specify the number of I/Os we want to use as side-set using the .side_set directive. Note, though, that each side-set pin will reduce the bits available for delays by 1.
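As a quick way to see this trade-off, the following Python sketch computes the largest delay that still fits once some of the five shared bits are used for side-set. The helper name `max_delay` is made up for this illustration. It assumes the accounting described in this video: one bit per side-set pin, plus one extra bit when side-set is made optional.

```python
DELAY_SIDE_SET_BITS = 5  # bits shared between delay and side-set per instruction


def max_delay(side_set_pins=0, optional=False):
    """Largest delay value in cycles for a given side-set configuration.

    Hypothetical helper for illustration; `optional` models the extra
    bit consumed when side-set values may be omitted per instruction.
    """
    used = side_set_pins + (1 if optional else 0)
    if side_set_pins < 0 or used > DELAY_SIDE_SET_BITS:
        raise ValueError("side-set configuration does not fit in 5 bits")
    return (1 << (DELAY_SIDE_SET_BITS - used)) - 1


if __name__ == "__main__":
    print(max_delay())                       # 31: all five bits are delay
    print(max_delay(side_set_pins=1))        # 15: four bits left for delay
    print(max_delay(side_set_pins=5))        # 0: no room for delays at all
```

So a program using a single side-set pin for an SPI clock can still delay up to 15 cycles per instruction, while using all five side-set pins leaves no delay bits.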
Now, by default, if we enable side-set, we have to specify the value of the side-set pins on every instruction. However, if we add opt to the side-set setting, this becomes optional and the side-set pins will just retain the last set value. This comes at a cost, though, as it will reduce the number of bits available for delays by one more. Side-set can also be configured to set pin directions instead of setting I/O values; this is enabled by appending pindirs to the .side_set directive.

Finally, let's talk about program wrapping. A lot of PIO programs will consist of what's basically an endless loop, and normally, when the last instruction of your PIO program is executed, the state machine will jump back to the first one automatically. But we can change that behavior using program wrapping. Let's say we have a program that has some setup code at the start and then a bit of code that should be executed in an endless loop and generates a square wave. This square wave will execute in four cycles. However, the PIO engineers figured that this is such a common use case that they introduced program wrapping. With program wrapping we can tell the state machine at which point the program ends and, even better, from where it should start again. We simply insert the .wrap_target directive at the point to which the state machine should jump and add .wrap to the instruction after which the state machine should wrap around. This not only saves an instruction, but also, because we don't have to delay because of the jump instruction, our square wave can run twice as fast.

Now, that was quite the deep dive into the PIO architecture and the programming instruction set. In a future video we will take a look at some example PIO programs and see how they work. If you already want to have a look, I've linked the pico-examples repository in the description so that you can already read some PIO code. There's also the often overlooked chapter 3 of the Pico C SDK book, which shows a lot of examples on how to use PIO in
your C projects. I hope you enjoyed this video, and I'll see you on this channel again soon. Thank you!
Info
Channel: stacksmashing
Views: 53,392
Keywords: ghidraninja, stacksmashing, hacking, hardware hacking, raspberry pi, pico, making, electronics, RP2040
Id: yYnQYF_Xa8g
Length: 17min 19sec (1039 seconds)
Published: Tue Mar 09 2021