[011] USB Debugging with sigrok

Captions
[Music] Hi, and welcome back to the OpenTechLab. In today's video I'm going to be talking all about USB, the humble communication protocol that we use to connect all of our devices to our laptops. It's quite interesting, it's got a few different layers to it, and I'll try to explain some things about it.

Specifically, I'm going to be focusing on this little rig I have here. Some of you may have seen it before if you've seen my video about 3D-printed mounts and the mount that I made for this little development rig. This is something I've been working on as part of my day job. What it consists of is a PC motherboard, an Intel NUC board, on the left here, and a little USB peripheral that has an STM32, and the two are connected together by this ribbon cable. They communicate with each other using USB full speed.

Now, the microcontroller on this little board is an STM32F0, and I've been developing the firmware for this thing. It turned out to have a few different pitfalls that I ran into, but there are also a few ways in which I was able to learn quite a lot about what was going on between the device and the PC by using sigrok. I've spoken quite a lot about sigrok and logic analyzers, and this is going to be another video where I focus on some of the amazing features that sigrok has, because using sigrok I was able to capture the packet flow going back and forth between the PC and the peripheral board here. That really dug me out of a hole that I would have been scratching my head over for a very long time if I hadn't had that tool available. So I'm really excited to go through all this. I think it's going to be quite interesting, and there's quite a lot to get through, so let's get started.

At the heart of this project is this little peripheral board, and the heart of the peripheral board is this STM32F042 microcontroller. This is a little ARM-based microcontroller. It has a Cortex-M0 core, which is its built-in processor, and
it has a few different peripherals: it has 32 kilobytes of flash, and it has a USB slave controller, which is how it connects up to the PC. The function of this board is quite simple. It's got a couple of relays for controlling some outputs, it's got a couple of opto-isolators for receiving some inputs, and it can also communicate over a couple of channels of RS-485 signaling, and that's about all there is to this thing.

So this is my development setup. You can see in the middle I've got the board, and it's plugged into my development PC through this cable going off to the right. At the back I've also got a USB hub, and into that is plugged the DSLogic Plus logic analyzer, which is what I'm going to be using to snoop on the various signals coming out of this board. I've also got this Chinese clone ST-Link, which is a USB debugger. This can be used to load firmware into the microcontroller, and it can also be used for single-stepping through the code and things like that.

If we take a closer look at this thing, you can see that I've had to make a fair few modifications to attach the various things that I need to this board. In general, if you're setting up these development boards, it's well worth taking the time to make sure that what you've got has a reasonable degree of mechanical stability, because otherwise things just get broken and drop off, and you can end up spending your whole time chasing ghosts that are actually caused by your wires dropping off, and that's a big waste of time.

So what I've got here is a 0.1-inch header on the side over here, and I've got this superglued down to the board. The pins on it are wired off to the various debug pins on the microcontroller, and I've also got a pin here which is a digital output, which I can use just for sending general signals off to the logic analyzer. Then on the right here I've got a Molex header that I'm using to plug in the USB
cable, and this has been mod-wired up to the pins over here. I've got a little loop on the right attached to the ground plane, which makes it nice and easy to attach oscilloscope or logic analyzer ground connections, and I've also got another little loop just here for the same purpose. In addition to this, there are a couple of other digital output pins that I've got in the middle here. These are just little bits of copper wire that I've attached to a couple of spare pins on this unpopulated footprint. Then, to attach the logic analyzer to the USB, I've attached a couple of pins of a pin header here, and this is a nice stable thing that I can clip the hook probes onto. So now, with this whole setup in place, we're in a position to start capturing some USB packets.

So here we are in sigrok PulseView, and as you can see we're connected up to the DSLogic logic analyzer. I've got it configured to collect half a million samples at 50 megasamples per second, which means that the capture length is 10 milliseconds. The reason for the 50 megasamples per second sample rate is that we're capturing USB signalling, and it's communicating in USB full speed, which is sent at 12 megabits per second. Because this is the bit rate, the Nyquist criterion requires us to sample at at least 24 megasamples per second, and when you're capturing with logic analyzers it's typically advisable to oversample by two times or more, which would take us up to 48 megasamples per second. So I've set it to 50 megasamples per second, because that's the nearest option in the list, and it seems like a good choice and it works well.

Here in this capture you can see there is a series of packets. These are all being sent by the PC, but no actual data is being transferred, so these are the packets that the PC sends when the USB bus is in the idle state. If we zoom in on one of these packets, you can see the individual bits that it's made out of, and if we zoom in even closer you can see the
individual samples, and you can see that each of these bits has roughly four or five samples in it, although it varies because of timing differences between the logic analyzer and the USB bus itself.

Now, in many ways USB resembles a differential signaling system, similar to CAN bus and Ethernet. You can see here that we have two data lines that we're capturing, D- and D+, and the two signals being sent are mostly a mirror image of each other. But what separates USB from a truly differential system is that in some states it will send a pulse on one of the lines but not on the other. You can see that at the end of this packet the PC puts a negative pulse on the D+ line, and this pulse doesn't have a mirror image on the D- line. So what we have here is really a two-channel single-ended system, which is why we can capture these packets off the wire with a single-ended logic analyzer like the DSLogic, and we don't need a setup involving current transformers or differential probing or anything like that, which would be necessary to read the bits off the wire in a current-driven system such as Ethernet.

So now that we've got packets of data captured, we can use sigrok's decoding features to decode the meaning of the pulses. I'm going to go into the decoders menu and select USB signalling. You can see that when I added the decoder, it automatically associated the decoder's inputs with the two data lines, because the names match the inputs, which works very nicely. What this decoder is doing is decoding the zeros and ones off the wire. USB uses non-return-to-zero inverted (NRZI) line coding, and it also uses a kind of bit stuffing to make sure that there is never a long run of zeros or ones in a row, so the USB signalling decoder decodes these pulses into the actual symbols
that are being transmitted down the wire. The next level of decoding we can add, if we go into the menu here and select stack decoder, is the USB packet decoder. This decodes the meaning of these symbols into the actual fields within the packet. You can see the opening section of this packet is a sync word, followed by an ID for the meaning of the packet, a frame number, and a five-bit CRC checksum at the end, and there's just a little bit of packet lead-out here.

So what you can see here is that we have an SOF packet, which means start of frame, and what we have in the overall capture is a series of these SOF packets, spaced out exactly one millisecond apart. In each of these SOF packets the PC is saying that it's got no message for the device, and it's an opportunity for the device, once it's received this, to start up a message transfer if it has anything to say back to the PC, which in this example it doesn't, because of course the bus is idle. One of the things to take away from this is that USB is very, very timing sensitive, which is why a hardware controller is necessary to handle the real-time aspect of the low-level USB communication.

Now, the chip that we're using has a structure very similar to pretty much any microcontroller you're likely to encounter. At the heart of this chip we have a processor where the software runs, in this case an ARM Cortex-M0, and it controls various peripherals around the outside of the device through a series of internal buses. For the most part the peripherals are designed to implement various forms of communication, so that the chip can tell other chips what to do and communicate with other aspects of the product. So here I have the block diagram of the USB controller, and the job of this block is to take care of every aspect of the USB
communication that can't feasibly be handled in software. There are a few different things in this block. You can see at the top the two USB data lines coming in, D+ and D-, and the first thing they're connected to is this USB PHY. The PHY's job is to take care of receiving the bits and reading them off the wire, and it also contains the hardware necessary to drive signals onto the wire when transmitting.

Then in the middle here we have this block, and this has a variety of functions, mainly to do with the real-time aspects of the communication. You can't just put packets onto a USB bus willy-nilly. It has to be initiated by the PC, and the timing requirements are quite strict for that, so this little block in olive color takes care of that. It also takes care of the flow control, so if the PC wants to send a message but the chip isn't ready to receive it, it can automatically tell the PC to go away for a while. Equally, the PC might want to ask if the chip has something to send, and this block can take care of telling the PC whether there's anything that the chip is ready to send or receive. If there are packets ready to send or receive, they're transferred into the buffer memory, and they can also be transferred back out of it. This packet buffer memory is accessible from the processor core over the APB bus, which is just the internal bus that is used within the STM32F0 to transfer data in and out of memory. Then we just have a few registers over here, and their function is to configure the whole USB core to do what it's meant to do.

Now, the firmware is built with the C programming language and it uses libopencm3. libopencm3 is a really nice library of functions that makes it easy to program a whole series of different microcontrollers from many different manufacturers. All of these microcontrollers are built around ARM Cortex cores, which is the case for the STM32 that we use, and
what this library basically offers us is a couple of things. Firstly, it makes it much, much easier to program the various peripherals on the chip. Normally you have to write certain values into various registers to make a peripheral behave as you want it to, and this just wraps all that up into a nice API, a nice interface of functions that you can call from your software, so you don't have to focus so much on the low-level programming. It also offers a communication stack, particularly for USB. There's quite a lot more to USB communication than the hardware itself can handle, and you don't want to get bogged down in writing your own USB stack, so libopencm3 has a built-in USB stack that you can use. Your application can just focus on what you need to solve, rather than having to deal with all the complexities of conforming to the USB specification properly.

So now I want to move towards capturing a real conversation off the USB bus, and to do that I'm going to need some test firmware. This is the program I have. It's less than 300 lines of C code, and most of it is taken up with these static buffers here. They declare various bits of information about the device that are sent up to the PC when it first connects, and that includes its capabilities. The buffers here declare that the device claims to support a virtual serial port, and that it also has a built-in DFU interface, which is device firmware upgrade, meaning that the firmware can be upgraded over USB. These are just here as a sort of dummy set of interfaces to send to the PC.

Then, if I scroll down to the bottom, we've got the main function. This gets invoked when the chip first powers on and runs for the whole lifetime while the microcontroller is running. The first thing we do is set up some clocks, then we initialize the USB stack here, and then we set up some GPIOs, which are used for sending debugging signals to the logic analyzer.
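As a rough illustration of what the static descriptor buffers mentioned above contain, here is a minimal Python sketch that packs and unpacks the standard 18-byte USB device descriptor. The field layout follows the USB 2.0 specification. The vendor and product IDs, the string indices, and the helper names `pack_device_descriptor` and `unpack_device_descriptor` are placeholders chosen for this example, not values taken from the firmware in the video.

```python
import struct

# Field layout of the standard 18-byte USB device descriptor
# (USB 2.0 specification, section 9.6.1). "<" selects little-endian,
# B is one byte, H is a 16-bit word.
DEVICE_DESCRIPTOR = struct.Struct("<BBHBBBBHHHBBBB")

FIELDS = (
    "bLength", "bDescriptorType", "bcdUSB", "bDeviceClass",
    "bDeviceSubClass", "bDeviceProtocol", "bMaxPacketSize0",
    "idVendor", "idProduct", "bcdDevice", "iManufacturer",
    "iProduct", "iSerialNumber", "bNumConfigurations",
)


def pack_device_descriptor(id_vendor, id_product):
    """Build the raw bytes a device returns for GET_DESCRIPTOR(DEVICE)."""
    return DEVICE_DESCRIPTOR.pack(
        DEVICE_DESCRIPTOR.size,  # bLength: always 18
        0x01,                    # bDescriptorType: DEVICE
        0x0200,                  # bcdUSB: USB 2.0
        0x00,                    # bDeviceClass: class is given per interface
        0x00,                    # bDeviceSubClass
        0x00,                    # bDeviceProtocol
        64,                      # bMaxPacketSize0: endpoint 0 packet size
        id_vendor,
        id_product,
        0x0100,                  # bcdDevice: device release 1.00
        1, 2, 3,                 # string descriptor indices (placeholders)
        1,                       # bNumConfigurations
    )


def unpack_device_descriptor(raw):
    """Split raw descriptor bytes back into named fields."""
    return dict(zip(FIELDS, DEVICE_DESCRIPTOR.unpack(raw)))


# Hypothetical IDs, for illustration only.
raw = pack_device_descriptor(id_vendor=0x1209, id_product=0x0001)
fields = unpack_device_descriptor(raw)
print(raw.hex())
print(hex(fields["idVendor"]), hex(fields["idProduct"]))
```

In C firmware the same bytes would normally live in a constant struct or array, and the interface descriptors for the virtual serial port and DFU would follow the same fixed-layout pattern inside the configuration descriptor.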
And we'll be using those in a minute. In the middle here we have this loop. This is the main loop, and all that it's doing right now is running the USB stack through this usbd_poll function. I've also got a couple of these GPIOs here, so that we get an output that is set while usbd_poll is running and then cleared. That way we'll be able to see the timing of this loop, so we'll be able to see how much time is being taken by this function as it runs and as packets are sent and received. So it's pretty simple.

The only other thing that I've implemented here is this callback. I've registered the vendor control callback in here, which basically says that this function will be invoked any time the PC sends a vendor message, which is a custom, product-specific message. We're not going to do anything with the data of the message from the PC. We're just going to toggle one of the other output lines, so that we can see in the logic analyzer when the PC's message was received by the microcontroller.

Now, on the PC side we need some software that we can use to send the messages to the firmware. Here I've got a really simple program written in Python that uses the PyUSB library. I really like this way of working. It makes it really quick and easy, when you're prototyping, to send some custom messages to the device. What I have here is a couple of lines of code to locate the device by its vendor ID and product ID and open it up so we can talk to it. Then we have a line here that initiates a control transfer to the device. It sets bmRequestType to 0x40, which means that it's an outgoing request, that it's a vendor-specific custom request, and that it applies to the whole device. The request number is one, so if you had multiple different types of function that you could invoke in the device, this number would vary. Then we've got a couple of arguments that we can pass in that request, and also a payload of data that goes along with the
request as well. These numbers here aren't going to be handled by the firmware at all, it will just ignore them, but having them here will be useful because we'll be able to see them when we read the communication off the wire.

So now that all the software is taken care of, we're in a position to start capturing some packets off the bus. We're here once again in PulseView, and I've lined up a terminal window behind it, which I will use to invoke the Python script. But before we do that, I want to do one more capture with the bus in the idle state. You can see the SOF packets once again on the D- and D+ lines, and if we zoom in you can also see that we've now got a pulsing on channel 5. You might remember that the reason for that is that I'm setting the output high during the period while the usbd_poll function is running. So in the repeating waveform we're seeing here, one cycle of the waveform indicates one cycle of the main loop, and between the rising and falling edge is the period of time while the libopencm3 USB stack is executing one iteration. In this case, because all we're getting is start-of-frame packets, these are being silently dismissed by the hardware, and software never has to deal with them at all. That's why the period of this waveform is completely regular, because in every case the loop is doing no work and returning quite quickly.

So now let's go ahead and capture some real packets. To do that, I've set the logic analyzer to trigger off the rising edge of a pulse on logic analyzer channel 3, which is wired up to pin PB6 on our microcontroller. You might remember that this is set up within the firmware to send a pulse out when the vendor control message callback is invoked, so we're going to get a little notification when the firmware thinks it has received a packet from the PC. I've set the logic analyzer to have a 50% pre-trigger capture ratio, so hopefully
we're going to see our event right in the middle of the sweep. So now I click run, and this will set the logic analyzer to wait for our trigger event. Now I can run our script, which is the control test Python script. If I go ahead and run that, let's see if we capture something. There we go, we have a nice sweep there, and you can see right in the middle of the sweep we have a little pulse on PB6, and that's the firmware showing that the callback got invoked. You can see there are a few different stages to this communication, and you can also see that while some messages were being received by the microcontroller, the duty cycle of this waveform changes, because the usbd_poll function has a little bit more to deal with now that some messages are being received. The whole transaction we have here takes about 63 microseconds to complete.

To interpret the meaning of the waveforms, let's add some decoders. I've got the USB signalling decoder, I'm going to stack on top of that the USB packet decoder, and on top of that I'm going to stack the USB request decoder, so that's three decoders stacked on top of each other. You can see the USB request decoder has given us one big blob that indicates the whole structure of the transaction, and it shows us the headers of the control message that we sent. You can see the CAFE and D00D values from wValue and wIndex, although the endianness has changed a bit, and you can see the request number here and the 0x40 request type from the Python script. You can see the contents of the payload that we set up over here, and then we can see the response from the microcontroller, which is an ACK saying that it accepted this control request.

But to really understand what's going on here, let's zoom in on the individual stages of the transfer. It's divided up into a few different stages. First of all, the PC says to the microcontroller, I want to set up a transfer of a control message to endpoint 0, and then it goes ahead: the PC
transfers the headers to the microcontroller, and you can see the various arguments of the control message here, to which the microcontroller says ACK, which means it has acknowledged and accepted the header. Then the PC follows up by saying that it wants to set up the transfer of the payload, and it goes ahead and transfers the payload bytes here. After those have been transferred, the microcontroller says ACK once more, so it's happy to have received those bytes.

Then the PC goes straight ahead and starts trying to set up an inbound transaction. The reason it's doing this is that it's trying to get the microcontroller to respond with a return value from the vendor control message, whatever response the microcontroller wants to send back. The microcontroller isn't ready to send that back, so it sends back NAK, not acknowledged, and so the PC goes away for a little while. In the meantime our vendor control callback gets invoked, and it takes a little while, because of course our microcontroller isn't ever so fast and the PC is a little bit quick, so the microcontroller needs time to deal with this request. Here's our handler getting called within the firmware, and at just about the same time the PC tries once more: have you got a response for me? The microcontroller says NAK, and that's because of the timing of this. Even though our vendor control callback has been invoked here, the response still isn't quite ready to send back to the PC. So here we get one more invocation of usbd_poll, and this time the PC asks for a third time, have you got a response for me, and the microcontroller says yes, I have, so it sends back a response rather than the NAK. And this time, because there's this middle stage in the middle here, the microcontroller is saying that it hasn't got any data to send back. Rather, it's sending a packet that is an OK but contains zero payload bytes, and then that's the end of the
transaction. So there's certainly a little bit of a dance going on, quite a lot of handshaking between the device and the PC, but all this happens quite quickly. It's done within a few tens of microseconds. And that is how control messages are sent into USB devices.

For more information about the structure of the various types of USB transfer, there's a whole chapter about it, chapter 4 of the USB in a Nutshell guide at beyondlogic.org. You can see they've got a few nice diagrams showing essentially what we just read off the wire, and then more information about the structure of various other types of transfer, such as interrupt, isochronous and bulk transfers, and so on. There's a whole load more information there that is really helpful for getting started developing USB peripherals.

Now, in one of my previous videos I did a segment about how to do packet capture with Linux's usbmon feature, which allows you to capture the packets that are being sent and received by the PC. It's worth asking what the benefit is of using a logic analyzer to snoop on the USB traffic versus just logging it in the PC, and there are a few answers. The first is that it allows you to see a lot more low-level and real-time detail. We can see a lot of the invisible protocol that is silently handled by the hardware of the host controller and the peripheral's internal USB controller. For example, those stages that we saw where the hardware responded with NAK packets are not something we would ever see in usbmon, because the hardware would handle that silently without the kernel ever having to get involved, and without the kernel being able to make a log of it. On top of that, if there is any kind of protocol violation, any kind of corrupt packets going along the wire, that's not something we're going to see in usbmon either, because those will be discarded by the low-level hardware as well. Another benefit of using a logic analyzer is that it allows
us to do a side-by-side comparison with various real-time debug signals, the pulses that we set on those output lines as in the previous demo. That's a really, really powerful thing, and it's quite necessary for debugging certain kinds of real-time issues. So overall, having a logic analyzer gives us a lot more low-level information that we're not going to see with PC-based logging. On the other hand it's a lot harder to set up, and usbmon for the most part gives us everything we need, so most of the time I find myself using usbmon, but the logic analyzer is a good tool to have ready if I'm doing something a little bit more tricky.

So now let's talk about the scenario that prompted all of this. The story behind this issue is that I was developing the firmware for the microcontroller, and I'd got pretty much to the end of adding all the functionality that needed to be there. I was testing the control messages that were being sent into the device, and I found this very rare, very sporadic issue that would sometimes occur. To demonstrate the problem, I've modified the control test script to not just send a single message to the device, but instead to send a whole series of messages as fast as possible and to check that each one is processed correctly. So let's go ahead and run the script and see what happens. There's a brief delay, and messages are going through. There we go, here we have an error. The error we're getting is a pipe error, which is a generic catch-all error message from the kernel saying that there was some kind of communication issue over USB, and thereafter the device becomes completely inoperable and you can't send any messages to it.

To help explain the scenario in a bit more detail, I've drawn this diagram. We've got the microcontroller on the left and the PC on the right, and of course the prime mover in all this is the USB host controller, which is directed what to do by the
Linux kernel. Then in userspace we have our Python script, which uses the PyUSB library, which in turn uses libusb to tell the kernel to send messages through to our device. And then of course we have our firmware, which is built on top of libopencm3, and that's used to send and receive messages through the slave controller.

Now, the problem we have here is that we have so little visibility of what is actually going wrong. We know for one thing that the PC is putting the bus in the stall state, and it does that when there is any kind of miscommunication or error on the bus. At that point, as far as the PC is concerned, the device is not talking sense anymore, so it gives up trying to have a conversation with it. It just says this device is done, and it doesn't try to send through any more messages until the user comes and unplugs it and plugs it back in again. We don't know much about what went wrong, because all the software in the PC knows is that there was some error. With usbmon, if we try to log the packets, all you see is an error in response to the last control message that we tried to send, and no detail about what the error was. Perhaps the host controller knows, but it's pretty much impossible to get any information out of it, so we don't know much about what the miscommunication was on the wire.

Furthermore, we don't know what state the microcontroller got into that caused this. It could be a problem with our firmware, it could be a problem with libopencm3, and it could even be a problem with the silicon hardware of the slave controller, although that's quite unlikely for this processor, because it's been used quite extensively. So it's unlikely that there are unknown errata in the behavior of the USB slave controller in the STM32, but it could happen, you can never completely rule these things out. We just have no idea what state this thing is in when the error occurs, so we have a real fog over
everything. We can't see what's going wrong, and so we're in trouble, because we can't really ask for help in any way. If we were to just start posting questions in forums, it's not as if we have any information to help someone analyze what's going wrong in the setup, so we're pretty much stuck.

Given that the firmware for the microcontroller was at quite a late stage of development, there was certainly quite a bit of complexity in play. That can be a real problem in bare-metal applications, because there's little protection to prevent one part of your application from causing chaos in another part if it decides to malfunction. If one aspect of your firmware starts going bonkers and writing into random memory addresses, you'll just see some really hard-to-understand corrupted memory, or you could have some issues with interrupts, or whatever it may be. So it's very, very hard to understand what's going wrong if you have no clue and you have a complex application.

This pseudocode I have here indicates how the overall thing looked at this point. As we saw in the demo, we have some setup functions at the top, and then we have a main loop that runs forever. Inside that we have the usbd_poll function, and then, once USB is all set up, we also have some additional functionality, which takes care of all the rest of the functions of the software. The first thing I did was start to eliminate these functions, and as I did that I found that little by little the problem seemed to become less and less likely to occur. At first I thought that one specific function was causing the issue, but in the end I found that no matter what order I removed them from the firmware, as I got down to removing all of them, or almost all of them, the problem would slowly disappear. This gives us a little hint. It suggests that it's timing related, and to confirm that theory I removed all of this code from
the firmware altogether and replaced it with a little delay that should just waste a period of time. Indeed, I found that if the delay loop was set to a short period then there would be no error, if I had it set to a really long period then the error would happen instantly every time, and if I had it set somewhere between the two then it would occur somewhat sporadically.

This is really, really weird, because even though we've got none of our own code in here, we do have usbd_poll being called. Of course usbd_poll needs to be called somewhat regularly to make sure messages are sent and received through the USB slave controller, but if there's some delay it really shouldn't matter at all, because of USB's flow control. The slave controller takes care of telling the PC to wait if the buffers inside the slave controller are full, so even if the firmware is super slow in getting back to calling this usbd_poll function, it really shouldn't matter, because the PC will just wait until the microcontroller is ready to receive more messages. And yet it does seem to matter. If the delay is too long then the protocol breaks, and so the question is, why is that?

There must be something odd happening inside usbd_poll, because at this point I've eliminated all of my own code, and the only active element left is libopencm3 code. The problem is that there's a lot of code inside this function, and a lot of code that I personally am not familiar with, so it's a real needle-in-a-haystack situation to figure out what state this usbd_poll function is getting into. Fortunately, because libopencm3 is open source software, and of course we love open source software on the OpenTechLab channel, we get to have a look inside the code. So here I am in the st_usbfs_poll function, which is the implementation of usbd_poll on this particular microcontroller. There's a fair bit going on in here. There's a little bit of code, and this code then calls through to a whole bunch of
other function pointers and callbacks that get invoked in various ways, and there's certainly quite a lot of functionality here, and it's not at all obvious what's going wrong. So it was at this point that I first decided to see if I could use sigrok to capture the signals off the bus, and I figured that it might give me some kind of a clue. But to make things more difficult, the only fast logic analyzer that I had access to at the time was this Openbench Logic Sniffer. As you can see, the Openbench Logic Sniffer is a bare-board logic analyzer: it doesn't come in an enclosure, and it's certainly quite a bit less powerful than the DSLogic that I'm using in today's demo. Its main limitation is that it doesn't have any streaming capabilities; it can only capture samples into internal RAM, and that RAM is quite limited in size. It can only capture just over six thousand samples, and therefore, if we're capturing at high speeds, the window in which we get visibility of the traffic on the wire is a really, really short period of time, which means that we have to trigger the capture very, very close to the moment of the failure. Now, I had to scratch my head about this for a minute, but I came up with a solution that turned out to work extremely nicely. It's easy enough to trigger a logic analyzer with our development board, because of course we've got plenty of output lines that we can use as trigger signals to the logic analyzer. But the problem is that we don't know what state the microcontroller is getting into when the error occurs, and so we can't generate that signal and trigger it with this board. We have to trigger it with the PC somehow, because only the PC knows when the error occurs, which is a little bit more difficult. Now, I could have just relocated my setup to some kind of PC with easily accessible output lines to wiggle, for example a Raspberry Pi would have served okay, but I decided to do something quick and ugly with one of these USB serial
modules. This is just like a normal serial port that you can plug into USB, but the main difference is that the output voltage is at TTL levels, so between 0 and 3.3 volts rather than the normal plus or minus 12 volt swing that you get with RS-232. These modules are really cheap: you can pick them up for about a dollar fifty from Banggood and places like that, and if you flip it over you can see it's got the pin markings on the back. The key thing here is that when we want to trigger the logic analyzer, all we have to do is tell this little thing to emit a character, and that character will involve a series of pulses sent down the TXD line here, and we can set the logic analyzer to trigger off the edges in that. Now, there are some risks with this, because of course there are certain delays involved in getting the PC to send a character through to this device, so it might not have worked, but I found that in practice it worked all right. So now I've got the serial device plugged into the logic analyzer, let's see if we can use it to trigger some packet capture. OK, so here I am sitting inside tmux, and you can see I've got some kind of epic split-screen setup, and then I've got PulseView overlaid in the corner. To begin with, let's have a look at the source code of the firmware. I'm going to open up main.c here, and if I scroll down to the bottom you can see the main function. We've got our setup stuff at the top, and then we've got our main loop, just as we had in the pseudocode, and you can see this time we've got this delay function here, which is just counting up to a period defined by this constant. At the moment the delay period is set to be relatively short, and this value is below the threshold where the error kicks in. So now if I run make, we can go ahead and build the firmware. There we go, and you can see if we look here it's spat out this dio.fw firmware, and this is the file we need to
load into the device. OK, so now I want to load this firmware into the microcontroller, but first I'm going to need to watch the kernel log, because this is going to be helpful so that we can see when the kernel detects the appearance and disappearance of the demo board. By running dmesg -w we can watch the messages that the kernel makes as they arrive. So now I want to install the firmware, and to do that I'm going to be using the ST-Link debugger, and to take control of that I'm going to be using OpenOCD. If I paste this command in here, you can see there are a couple of config files that get fed in on the command line, and these just define the profile of the device and the ST-Link debugger. I don't want to go into too much detail about OpenOCD right now, but when I run this it's going to start up a couple of network servers connected to the local network interface, and so we can control OpenOCD by connecting to those through telnet. So let me run OpenOCD here, and this is now sitting in the background waiting for commands, and then I can connect to the local server with this command on port 4444. OK, and here we've got a prompt. The first thing I want to do is reset the device, the demo board, and put it into halt mode, which resets it and causes it to freeze, and you can see that the kernel is reporting that the device is disconnected. So now I want to install the firmware, and to do that I'm going to run this command, which just writes an image into the flash within the device after erasing it. We put in the path to our firmware file and specify the start address where we want to write it to, and this 0x08000000 is the address where the flash starts in the memory map of the STM32. So let's go ahead and do that. It takes a second or so, and there we go, the firmware is installed. Now the device is still in the frozen state, in the reset state, so if I run reset once more, but without the halt argument this time, we'll
reset it, and reset it into the running state. There we go, and you can see our device has connected and been registered by the kernel, and you can see the kernel is saying there is a new ACM device, which is basically the kernel reporting that it detected the virtual serial port capability that the firmware claims to have. So now here I am sitting inside the control test Python script, and again it's just set up to do a single transfer; there's no repetition set up or anything like that, like we had just now. Then if we have a look at PulseView, you can see that we've got all the channel inputs set up. We've got our trigger channel set here; this is connected to that USB serial device, and it's what we'll be using to trigger the capture, so it's set up to trigger on a falling edge. Then I've set the pre-trigger capture ratio to 90%, which means that 90% of our five million samples will be captured before the trigger point. So now let's click Run, which will put sigrok in the state where it's waiting for a trigger event, and now we can go ahead and run our first test. To do that I'm going to run the control test script, and straight after that runs we're going to echo a single character into the ttyUSB0 device. So now let's go ahead and run that. There we are, and you can see we've got a capture in PulseView, and this returned with no error. So this shows that with that short delay in the firmware everything's working just fine. Let's zoom in on this, and what you can see is a series of command packets, and commands that are identical to the ones we saw earlier, because of course all is going well at the moment. OK, so now I'm going to modify the firmware and set the delay to 128 loops of my little NOP loop here, and I found previously that with this value this is enough of a delay to trigger the error. So I'm going to go ahead and quit, build the firmware, there we go, and I will halt the device, flash it, and reset it, and it's connected up to the
kernel again. OK, let's get PulseView ready to capture the trigger, and then let's run our control test once again, and let's go. There we go, and we've got a capture, and also you can see that we've got a pipe error this time, so this is the error coming out of Python with our miscommunication. So let's have a look at the USB packets that we've got. You can see at the start here the PC does a SETUP, so it wants to set up an outgoing message going out to the device, and it sends the arguments of the control message here, and it doesn't get any response at all from the device. So it tries again and sends the same SETUP and the same data, and it gets no response once more, and it tries one more time, and then it just gives up on the control message, and thereupon we get our pipe error on the PC. Then a few milliseconds later, just down here, we get our trigger, the character coming out of our USB serial device. So now we know what's going wrong: the microcontroller is giving no reply to the PC when it attempts to set up the control message transfer, just silence, and the PC tries three times and after that it just gives up, and indeed our vendor control callback is never invoked. Now that we know what's going wrong, we need to try and figure out what state the firmware is getting into that would cause all this. To figure that out, we're going to use some of those trace output pulses to see where the flow of execution is actually going. So here I am in the vendor control callback, and of course this is never getting invoked because things are going so badly, so there's no sense in taking up one of my debug GPIOs in here, because we know it's not going to get invoked. So we can take this one out, and I'm going to move it somewhere else: I'm going to move it to this function here, st_usbfs_poll. We saw it earlier; it's the innards of usbd_poll in the USB stack. Looking through the code, I'm not too suspicious of most of this, apart from this
block here, which is the block that gets triggered when messages are received inside the USB slave block. So I think it's worth putting our GPIO PB6 here, wrapping it around this little if statement; it'll be interesting to see. Then of course we've got one more GPIO available to play with, and I'm very interested in this little line here, which is the only function invocation inside this block, so I'm going to wrap it around here, except the GPIO number is GPIO 7, and we need to set and then clear. One more thing: we need to add a missing header at the top, because this code doesn't normally manipulate GPIOs. OK, save and quit, build, there we go, built correctly. Halt the device, flash the firmware, restart the device, prepare PulseView waiting for the trigger, and then run our test case. Bang, there we go. So what have we got in here? I've reordered the signals to make it a bit easier to see how this fits together, and I've updated the labels to correspond to the code that these pulses are wrapped around. At the bottom we have usbd_poll, just as we had at the start. We've got the wrap around the if block, and you can see there's a significant period of delay, and that's because it's entering the body of that if block every single time, because it thinks there's some kind of received message every single time, which is very, very interesting. Then within there, every single time, it's also going into that user callback function. So it's interesting that every single usbd_poll is always taking the same flow, even before the PC tries to start sending our message, which suggests that way back before we even trigger the capture, way back before the PC tries to send a message, the device is already blocked up with some message that it's received, and every time it's just thinking it's received something. For whatever reason there's some kind of message stuck inside the USB slave controller, I would say. Now, to try and figure out what's really going
on here, let's move our GPIOs a bit closer in and see where the code flow's going. Now, as I was searching through this, I found it particularly helpful to have three debug signals, orange, yellow and green, because that way on each test run I could change the code location of one of these signals while keeping the other two the same. I think with fewer than three it's a lot easier to get into a position where you think you see a pattern but actually you're chasing ghosts, because the software is not actually running the same way from one execution to the next, whereas with three signals, because you are only changing one of them and the others remain the same, it's a lot easier to visually maintain confidence that something strange isn't happening that you didn't expect. And if you can have more than three, that's even better, of course. So to prevent this from getting too tedious, I've already done the trial-and-error searching, and I've been drilling down into the user_callback_ctr function call, and it has resolved into this function, the usbd_control_out function, which basically is the state machine for handling the data transfer stage of these messages, the outgoing data from the PC. I've wrapped the whole function with GPIO 7, and the state that is actually problematic is this error state, the default case handler, which causes a call to stall_transaction. So what's actually going on here is that we are typically going to be halfway through the transfer of a packet, and the PC will either say "here's some data", or "this is the last data you're going to get", or "this is the status of a transfer you sent up to the PC, Mr. Microcontroller", and if it's not one of those things then the state machine is now in a really confused state, because it thinks it should be transferring some data when in fact that's not what's going on. So something very weird is going on here, and you can see in PulseView that on every single loop we're getting all the way through to that stall_transaction thing. So obviously
the thing that causes this problem has happened a long time before we ever try to send a message to it with PyUSB. So the next thing to do is to try and figure out when this stall_transaction actually begins, and usefully we can just set this up as a trigger: instead of the USB serial device from the PC, we can trigger off the first time this failure ever occurs after reset. To capture this event, I've put the device into the halted state, and I've got PulseView waiting for a trigger on the first time that stall_transaction ever happens. So let's reset the device. There we go; now let's have a look at what we've got. Looking at this trace, you can see there's quite a flurry of USB activity, and this is what happens when a USB device first comes out of reset. What's going on here is that when the device first connects to the USB bus, the PC will send it a whole load of queries to retrieve all the metadata contained within the device, and that contains the device descriptor and all of its interface descriptors and so on, everything the PC needs to know about the device itself. Our error seems to be kicking in some way through this process. I've done a few different runs to confirm this, and it always seems to happen when we have a SETUP OUT transaction here, and then the transaction is immediately followed by a SETUP IN transaction. You might remember we were in the outgoing event handler, so it can deal with this little group of messages here, but then it gets really confused, though, because for whatever reason it's not really expecting to get an incoming packet here. The PC is sending completely legitimate messages; it's just that the state machine has got confused. So having collected all these details, I have plenty of information to be able to go to the libopencm3 people and have a productive conversation. I joined the IRC channel and started talking to them about my problems, and several people were
very, very helpful indeed, and I particularly want to call out Vegard Storheil Eriksen. He had been working on aspects of the USB stack, so he was quite familiar with the nature of some of the things I was encountering, and I was able to send through screenshots showing the flow of control, the path along which the failure was occurring, and he was able to help me figure out the exact cause of the error. To understand the cause of the error, let's have another look at the trace we captured in PulseView. On the left here we have the outbound transaction, and then we have the fateful inbound transaction, and thereafter the USB state machine of libopencm3 goes into the stall_transaction function on every single iteration. So what exactly is going on here? If we zoom in on the outbound transaction, we can see something that looks kind of reasonable. We have a SETUP packet, and then we have the arguments of the control message being sent, and we have the hardware responding with an ACK: the microcontroller is happy to receive this header here. Then the PC tries to follow up by sending the outbound transfer's data payload, and the microcontroller is not ready to receive this, because usbd_poll has not been invoked in the interim, and so the hardware automatically responds with a NAK. This happens again, and then eventually usbd_poll gets invoked once more, the header packet has been processed, and so now the microcontroller is set up to be ready to receive the data payload, and on the fourth time the microcontroller responds with an acknowledgment, so this transfer is actually accepted, and so on for the whole outbound transaction. Now, for whatever reason, when the transaction finishes the microcontroller is still in the state where it thinks it's handling an outbound transaction, and so along comes, after a few microseconds, this inbound SETUP transaction. It tries to set up the arguments for the inbound transaction, and the
microcontroller responds with an ACK, but of course the software of the libopencm3 USB stack still thinks it's in the outgoing mode, so actually it should have responded with a NAK until it had finished processing the previous set of packets. Thereafter it all gets very confused, because the libopencm3 USB stack thinks it's in an outbound transaction and the PC is talking to it about an inbound transaction, and so it goes into the error state thereafter, because it's just thoroughly confused. Now, the reason this rarely rears its ugly head is that when the delay is short enough and usbd_poll is called often enough, then due to the timing it's not possible for the state machine to get confused in this way, so most software would never fall into this trap. But if there's enough processing being done between calls to usbd_poll, then this error can occur. With this understanding of the issue, zyp from the libopencm3 project was able to produce some patches, and these patches were included in the mainline, and they completely resolved all my problems. Well, that's just about it for this video. If you're interested in finding out more, check out the show notes, and you'll find links to various things around the web and all the source code of everything I've covered here today. I want to give a big, big shout-out to my 10 Patreon subscribers and a couple of other individuals who've made major donations to the channel. I really appreciate the support; it makes a huge difference to what I'm doing here. One of the things you may have noticed is that I've been working on my studio setup, and so hopefully things are looking a little bit better for you. I've got a brand new camera, and so I'm going to be presenting everything in 60 frames a second for you to enjoy, and if you're supporting the channel, that's something you've helped enable, and I really appreciate it. If you're thinking of supporting the channel, there's also Bitcoin if you'd
like to donate via that method. And for everyone: if you liked the video, give it a thumbs up and subscribe, and leave your comments down below. I appreciate every comment I receive, good, bad, questions, feedback, whatever it may be; I appreciate all of it. So hopefully I'll see you next time on the OpenTechLab.
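The failure described in the transcript, where the hardware ACKs a new SETUP while the software state machine still believes it is in an outbound data stage, can be illustrated with a small toy model. This is only a sketch under stated assumptions and is not libopencm3's code or a description of the STM32 USB peripheral: the class ToyControlEndpoint, the packet names, the single-slot latch that a new SETUP overwrites, and the three-attempt retry limit are all hypothetical simplifications chosen to reproduce the symptom.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()       # no control transfer in progress
    DATA_OUT = auto()   # SETUP processed, waiting for the OUT data payload
    STALLED = auto()    # state machine fell into its error (default) case


class ToyControlEndpoint:
    """Toy model: hardware latches one packet; software only reacts in poll()."""

    def __init__(self):
        self.state = State.IDLE
        self.latched = None  # assumption: a single-slot receive buffer

    def receive(self, packet):
        """Hardware side: return the handshake the host would see."""
        if packet.startswith("SETUP"):
            # Assumption: a SETUP is ACKed without consulting the software
            # and replaces whatever was still waiting in the latch.
            self.latched = packet
            return "ACK"
        if self.latched is not None or self.state is not State.DATA_OUT:
            return "NAK"  # not ready for data until poll() has caught up
        self.latched = packet
        return "ACK"

    def poll(self):
        """Software side: consume the latched packet and advance the state."""
        packet, self.latched = self.latched, None
        if packet is None or self.state is State.STALLED:
            return
        if self.state is State.IDLE and packet == "SETUP_OUT":
            self.state = State.DATA_OUT
        elif self.state is State.IDLE and packet == "SETUP_IN":
            self.state = State.IDLE      # toy: answered at once, nothing to track
        elif self.state is State.DATA_OUT and packet == "DATA":
            self.state = State.IDLE      # payload consumed, transfer finished
        else:
            self.state = State.STALLED   # unexpected packet for this state


def run(host_packets, poll_after_each_ack):
    """Send packets in order; poll promptly only if poll_after_each_ack is set."""
    ep = ToyControlEndpoint()
    log = []
    for packet in host_packets:
        for _attempt in range(3):        # assumption: host retries a few times
            handshake = ep.receive(packet)
            log.append((packet, handshake))
            if handshake == "ACK":
                break
            ep.poll()                    # time passes; firmware polls before the retry
        if poll_after_each_ack:
            ep.poll()
    ep.poll()                            # the firmware gets round to polling eventually
    return ep.state, log


PACKETS = ["SETUP_OUT", "DATA", "SETUP_IN"]
print(run(PACKETS, poll_after_each_ack=True)[0])   # State.IDLE
print(run(PACKETS, poll_after_each_ack=False)[0])  # State.STALLED
```

With prompt polling the toy state machine returns to idle after each stage. With late polling the data payload is first NAKed, then accepted, and the following SETUP arrives before the software has left its outbound state, so the next poll lands in the error case, which mirrors the timing dependence seen with the delay loop.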
Info
Channel: OpenTechLab
Views: 52,592
Keywords: sigrok, logic analyzer, usb, stm32, stm32f042, openocd, PulseView, DreamSourceLab, DSLogic, linux, open source, electronics, embedded
Id: 4FOkJLp_PUw
Length: 58min 37sec (3517 seconds)
Published: Mon Sep 25 2017