DEFCON 17: An Open JTAG Debugger

Captions
So a couple of years ago I did run a Mac, and I was working with the MSP430 microcontroller. This was before virtualization had gotten to the point that it's at today, where you can actually share a USB device with your virtual operating system, and it was just before Boot Camp came out. So I had this Macintosh, it was running OS X, it was Intel compatible, and there was not a damn thing I could do to debug an MSP430 on it, because the MSP430, sorry, the MSP430 JTAG programmer, requires a driver for a USB-to-serial chip that's inside of it. Linux had this driver; OS X did not. Even if OS X did, there's no Macintosh application for debugging the MSP430 chips, and the Linux one required a closed-source library. So even though you could debug a chip under GCC and Linux, there was nothing that you could do to debug that same chip from a Macintosh workstation.

And then new chips were released which weren't supported on Linux. This closed-source library that would allow GDB, the GNU debugger, to debug an MSP430, they began to allow it to atrophy and they stopped patching it. So today I have several MSP430 devices, given to me by friends or purchased, which I cannot debug or develop for using the official tools, because the official tools are out of date and I run Linux. If I ran Mac or OpenBSD I would have further issues. So today I suffer through X11 hell every time I get on stage just so that I can retain compatibility with that one piece of hardware without running Windows.

So to remedy this I've begun building my own JTAG adapter that's open source and that supports as many chips as I can get my hands on. In some cases this can be done with the official documentation. The basics of how JTAG is supposed to work are very well documented. It's very simple in the hardware sense, so while it's a low-level protocol, the method with which you send an instruction or send data is very
standardized.

So these are the badges from Neighborcon, which some of you might have attended over the past few days. It was a two-day conference that took place during Black Hat but was a lot more neighborly and was free to attend, and at Neighborcon we gave away 180 of these.

Just a quick disclaimer: dealing with these things at such a low level will make you dyslexic. I wasn't when I began working on this, but then I had to deal with differences between big-endian and little-endian for the way in which a byte fits into a word, and beyond that I had to deal with the ordering of the bits within a byte, because in some cases the most significant bit comes first and in some cases the least significant bit comes first. Even in JTAG on the MSP430, data comes LSB first and instructions come MSB first, or vice versa, I can't really remember right now. Oh, and then some things occur on the rising edge of a clock and some things happen on the falling edge of a clock. In SPI you write your output on the rising edge and you sample your input on the falling edge, but then some SPI chips do it the other way around, and it's just hell to deal with. And then some things are inverted and some things are uninverted. So for example the reset pin is actually "reset" with a bar over it or an exclamation point before it, and that means that it's inverted: the chip is held in its reset state when the reset pin is low, and it's in its normal, non-reset state when the reset pin is high. This is done for reasons of physics and hardware convenience. And so I'm going to screw some of this minutiae up. If you're trying to build JTAG based upon my spoken documentation at this conference, you're going to get a couple of things backward, so please double-check the minutiae in writing.

JTAG itself is a testing standard. The idea is that if you're manufacturing circuit boards, or, when this first came out, computers, you have many different chips on these boards and you have to know that the
chips are wired to one another properly, that you have good solder joints, and all of that stuff. So decades ago they began to use shift registers on the edges of the boards so that you could debug an individual board in isolation without removing it from the containing computer. This was back when a single computer's CPU would be made of stacked boards rather than a single chip of custom silicon. And when they began to build chips in silicon they needed this even more, because if you manufacture a chip that's too small for a technician to stick needles into without extreme effort, and it's running too quickly for things to be caught on a scope, on top of all of these other issues, what happens when you do a production run and it doesn't work? You'll find yourself with 10,000 units that are utterly worthless, with no idea of how to repair them or rebrand them or anything like that, and more importantly you don't know where you screwed up, so the next time you try it, it's likely that you'll have the same problem.

So while in software recompiling takes a couple of minutes, or a couple of hours for an extraordinarily large operating system, in the case of hardware you're looking at a turnaround time of months between deciding that you made a mistake and actually getting to try the next iteration of your design. So you need some sort of debugging, both to verify that what you intended to put into the design was fabricated correctly in hardware and to determine that the software that you're running works properly, because when computers began to come up in this form it was necessary to single-step and to debug the software that they were running, even though the chips themselves had too little memory to store a debug monitor or an operating system like Linux. So the JTAG machine exists within almost every chip for the sole purpose of debugging it, ensuring that it was manufactured correctly, programming it, all of these different things. And because it has to exist in
hardware for the hardware to be manufactured, it's a very interesting thing to play with as a hacker, because you can take the chip apart: you can write instructions on it, you can debug software on it, you can do all of these crazy things that are useful for development but also for breaking into things. You can, say, single-step a program on physical hardware, waiting to see what happens. If you're trying to fuzz a wireless sensor you can run it at one one-thousandth of real time and the device will still work. The radio doesn't care how slowly the microcontroller speaks to it, and the application itself will chug along, so you can fuzz these devices and actually watch the stack grow just as you would in a simulator, but on actual hardware, without having to re-implement all of the hardware in simulation.

So you can use this to program a device or to read the programming out of it. If you purchase a device you can dump its firmware and then begin to analyze it and look for problems, or you can clone it: you can manufacture your own unit with similar hardware and then flash the original software into it. And the access control mechanisms of this are only intended to prevent that. They don't really care that you might be able to debug, they don't really care that you might be able to view RAM. All they worry about is that you might be able to copy the contents of flash memory into another chip and then execute that or sell it, making a counterfeit product.

So this is my DEFCON badge, and to hack this I decided I'd actually hack the Freescale JTAG standard, because you can't program the Freescale chips very easily from Linux. It's the same situation that the MSP430 had when I began to work on it. So if you see these little testing pads on the back of your badge near the chip, and you can look at your own badge for this, you don't have to rely on the screen, these are wired up. This is a diagram that someone lent me in the hardware hacking village. I'm not sure where he got it,
but probably from your CD. And if you wire all of the pins up correctly you can then speak to your badge externally. Unlike the bootloader, which connects to the serial pins, I think it's those two there, this never goes away. So even though you might be able to damage the bootloader of your chip through software, there's nothing that you can do to hose JTAG. After the lecture we'll be blowing mine up in the name of science, so I'll need to swap it out with another chip, and when I put that fresh chip onto my board it won't have a bootloader and it won't have the software that it currently has. It won't be blinking, it won't be running anything, so the only way to flash new software onto it is by JTAG. And this being undocumented, there's a little challenge in figuring out exactly how JTAG works for it. Freescale intends to publish it at some point, but for now it's a black box that can only be had under an NDA, which I can't sign because I intend to release my implementation as open source.

So then once this connector is on, you can run it to a JTAG debugger or an adapter. This adapter will be connected to by your development environment, whether it's a commercial IDE or whether you use open-source tools. The standard itself only tells you how to populate two registers and then send a trigger for execution. So when I say that it's a standard, I don't mean that chips can be read and written in a standard way. The standard verbs only allow you to pass traffic from one chip to the next or to identify a given chip, and there are a few others that are semi-standard that might be supported. In the case of the MSP430, no other verbs from the standard set are supported. So if you want to read or write memory, if you want to blow the JTAG fuse, and especially if you want to write flash memory, you have to understand how the chip works and how it expects to be told what programming method to use. So to quickly review,
and I swear that this will be brief, there's the Serial Peripheral Interface. The way that this works is that you drive one line to tell the chip that you're ready to speak to it, and then it begins counting bits. You send it eight bits, and at the very end of it, it might do something or it might just wait for more traffic. And for every bit that you send you receive a bit, so it's always an exchange of information. If you want to write one byte to an SD card, for example, you are also receiving a byte. It might be garbage, but it is received nonetheless, and that allows for a nice trick that I'll explain near the end.

JTAG works similarly, except that you've got two registers instead of one and you actually mark when you're finished writing that register. So you can send both an instruction and a data value, and the data value will be of a fixed width, but the width is unique to each chip, as is the instruction's. So while an instruction might be 8 bits wide on the MSP430, on other chips it could be 16 bits, and the data register varies more. Some FPGAs have odd widths, 237 bits of width to the data register, and you have to determine these widths before you can begin to reverse-engineer the JTAG protocol of a given chip or a given family.

So the GoodFET is an open debugger, under the BSD license on both its hardware and its software. If you need something more liberal, just ask me for it. And feel free, if you want to make your own production run, do it, you have every right to. I only ask that you send a couple of neighborly gift units to me, and I don't even legally require that.

As far as the architecture goes, you've got C running inside of an MSP430, compiled by GCC. To compile the firmware, link it, install it, and restart the machine so that it's executing, you just type make install. It's quite neighborly in that regard if you're used to the UNIX environment, because you don't need to deal with the embedded development tools or IDEs that are specific
to each chip. So you don't have to know anything about the MSP430, which I use on this board, in order to begin adding support for new protocols. And then I use Python for the workstation code, and the Python is written in a style that's very easy to copy or port. I don't use proper class inheritance or things like that, because it hides the code that you're looking for when you're trying to copy something.

And then you use it like this. You say goodfet, dot, and then the name of the chip to choose your Python script, followed by a verb, followed by maybe a file name and a range. So here I'm asking it to dump from the data memory of a Chipcon radio to a file called foo, from dead to beef, or sorry, from dead to death. Having done that, I wind up with this Intel hex file called foo, and when I call head on it you see that it's all ones. This is because I chose a region of RAM that doesn't actually exist on the target chip, but I could dump software in the same way, and I can do it from multiple chips with the same adapter. So while that was taken with the Chipcon radio, which supports a non-JTAG debugging standard, the same thing works with, say, the MSP430, with its rather awkward JTAG debugging implementation.

Now the pins on SPI are rather simple. You just have master in, slave out, which is how the board receives input from the chip; master out, slave in, which is how the chip receives input from the board; a clock line; and a slave select line. You write data on the rising edge, you read it on the falling edge, and it's full duplex by necessity, so every time you send something you have to receive something as well. If we want to send a bit, we also receive a bit. Say we just want to send a single bit in a session, and this SPI device was a single bit wide. We would drop the clock, then output our bit at this edge, then raise the clock to sample. So both the master
and the slave write some bit here and then they read it here, and that's how they exchange their information. So neither knows what the other is writing until the transaction is completed.

If you compare this to a regular serial line, it's a lot simpler even though it requires more wires. On a regular serial line you have timing, so you have to know exactly how wide each bit will be. You can't pause it in the middle, you can't use non-standard timing, you can't do any of the things that make SPI easier. Instead what you do is you keep the line high until you're ready to transmit, then you send a 0 followed by your actual traffic. So in this case we'd have the 0 here as the start bit, and then just to its right, this first upper bar, that would be a one bit of data, followed by a stop bit, which is always a 1, and then nothing. But you couldn't really tell where this ended on a scope. Visually, I mean, you don't know which of these would be ones, and you don't know how wide this is. This might be five bits for all you know, it might be eight. And it requires each device to have a standard clock. So if you remember dialing into bulletin board systems, if you typed in 2400 8-N-1, that would be eight bits wide, no parity bit, one stop bit, at 2400 baud, and you had to know exactly how fast you were communicating.

JTAG doesn't care about speed. You can run it faster or slower and the slave will automatically match the master. And you have two registers to populate: you have an instruction register and a data register. You don't know the instruction set and you don't know the register width until you actually look up the documentation for a chip, which might be secret. And it's implemented in hardware, so there's no actual program receiving the JTAG sequences on the target device. Instead it is interpreted by a state machine, and you push your way around the state machine with the TMS line.

Let me grab some water. If someone would grab me a beer, that would be neighborly. Blue Moon or Sierra Nevada. So
this differs from SPI in that you have two shift registers instead of one, and that you actually say at some point, "I am done transmitting this register." You never say that in SPI. By the time you raise the slave select line you're saying, "This transaction is over, I'm not speaking to you anymore, I don't care about any reply, I'm never going to hear from you again." So this allows JTAG to have more complicated commands and things like that. The pins themselves are just output and input, there's a clock line, and there's also a mode select line, TMS, which is how you move around the thing. And then there are two-wire variants of this, such as Spy-Bi-Wire. Spy-Bi-Wire works by having just a clock line and an everything line. The everything line multiplexes between the four signal lines on the clock, so you'll send one, then the other, then the other, then the other.

The state machine itself is called a TAP. It has the two registers that we discussed. There's also a Run-Test/Idle state, which is where the chip sits when it is held or single-stepping instructions, if it's paused. In this state the chip isn't doing very much, but it has time to work, so sometimes when you're writing flash memory you might go through the Run-Test/Idle state at a particular frequency. In a bit I'll get to how you can damage the chip by setting the wrong frequency, and perhaps some of you can figure out how to work that into an exploit.

There's also a fuse which denies your access. If the fuse is blown in hardware, this means that you're physically locked out of the device. You are unable to read or write it, and you're not allowed to send any commands except for the one that forwards data on to the next device in the chain. And if you can get past this fuse, then you've broken the copy protection of the chip, and you're able to access information that it expects to hold privately. Further, if the fuse is in software, then instead of physically blowing something up, you're waiting until boot time and then you're reading a value
out of flash memory, and that is setting a control register which denies your access. When this happens you can sometimes break the chip by slowly ramping its voltage up, or by violently jerking it when you start it, or by starting it without a clock [inaudible], such that it never gets a chance to read the fuse setting until after you've already done something malicious.

The state machine itself looks like this. You've got your run check state, which is a fuse check state, where it actually looks to see whether or not you're allowed to enter. This is followed by the Run-Test/Idle state. You see ones and zeros in this diagram. That's the value of TMS, and you advance from one state to the next on any given TCK cycle. So here where you've got a zero, and it's a bit obscured by the X11 garbage, but if you see the zero with the arrow circling around it, that's jumping out of Run-Test/Idle back into itself. So if you want to stay in any given state, you have to be in a state that allows you that luxury, and then you have to know whether it's a 1 or a 0 that loops back. From here, if on the first edge TMS is high, you'll move into the Select-DR state. Then you can set it either high, to move into the Select-IR state, or low, to fall down into the Capture-DR state.

And this is arranged so that you can actually navigate back to the Run-Test/Idle state from anywhere, not knowing where you are. So if you were to fall into a random state, you can move back to the beginning. This isn't terribly relevant on a modern JTAG device, because very few clock edges are missed, but if you're trying to break into a device you can hit it with a flash bulb, and on those without protection sometimes you fall into a random state. You might find yourself here in the Exit-DR state, moving into Update-DR, where it would maybe read a byte of memory by mistake or something. But if you can move back to the beginning, then you can start
sending commands, then you can start breaking into the chip.

The fuse check itself is performed in an interesting way. You've got these lines here on TMS, and you have to have two pulses, and it has to end high. Well, why does it have to be two pulses? We'll get back to that in a minute, but what happens if you send zero, and what happens if you send a million? Is it more likely to fail with one or with 100, and which way is it likely to fail?

The instructions themselves are here. You can read and write registers, you can capture data values, you can push things into different registers, and everything else is done as a combination of these macros. So in the documentation for the MSP430, which is incomplete but still quite helpful, you'll see that it sends the prepare-blow-fuse instruction into the instruction register. Then you apply the programming voltage, which will be several volts more than the chip normally takes, as is necessary to blow the fuse. You follow that with the command that actually blows the fuse, then you wait a millisecond, then you remove the programming voltage. And the interesting thing is that at the end you're told to reset, because if you don't reset you don't actually leave the JTAG state, and you won't even know that the fuse is blown, because it's not checked anywhere except at the beginning.

So you can glitch it. You can drop the voltage very, very quickly and then raise it very quickly, so the reference voltages change, and then whatever is being sampled as an input to a latching flip-flop is sometimes mis-sampled and it will latch incorrectly. So if you do this at power-up on a chip that reads the value out of flash memory, you might wind up with all ones or all zeros held in that register instead. I mean, this is a probabilistic attack. It doesn't work very well on any given attempt, but you can try it a hundred times, a thousand times.

This is what it looks like on a scope. So going back to the diagram, you can see in this line here the actual pulses that
are required. You are told to send two. So there are two ways of implementing this. You can either have it default to failing the fuse check and then transition into passing once it succeeds, such that if any of the pulses says that you're allowed through, you become allowed through. Or you can begin in a pass state and then fall into a failure state, such that if you try it zero times you automatically fall through. So in the first case you can just try a million times and keep on glitching them, and if you ever succeed then in the end you've broken your way into the chip. In the second case you can just not try anything, just never send edges. And when implementing the JTAG adapter... a second beer, this is extra neighborly. Thank you kindly, neighbor.

So because this is hardware, and because it's implemented by a hardware engineer who's never had to worry about security in the way that it's commonly dealt with in IT, he's never had to deal with one of his chips being broken into and had to come into work early on a Sunday morning because of it. He really doesn't care as long as he gets paid at the end of the project, and even if he does care, he doesn't think that anyone's looking at how the JTAG implementation of his chip works. The only exceptions are smart cards, and even then they still make mistakes, and they still design machines as if all of the rules of digital logic were followed, when that's not true if the attacker can change the analog supply voltage and can change it quickly. Because these are analog transistors implementing a digital computer, they still behave as analog transistors.

So back to the state machine itself. You can see that each of these two columns is used for populating a register, and you can see that the two columns are pretty much just copies of each other, but one is further out from the idle state. If you wanted to add a third register you could, just by putting it further out here. And then there are primitive sequences that you
use for these exchanges. So inside of the firmware of the GoodFET there's a command, jtagtrans8. It takes in a byte and it returns a byte, and all this is doing is swapping that byte out. This is called by two functions, dr_shift16 and ir_shift8, which populate the 16-bit data register of the MSP430 and the 8-bit instruction register of the MSP430. The instruction register one should actually swap the byte around, because the bytes are sent in opposite directions: one of them comes most significant bit first, the other comes least significant bit first. And from these you build higher-level functions. There's jtag430 setpc, haltcpu, releasecpu, all of the different things that you would need in a debugger. These are documented, but others are not. I haven't yet figured out how to set a hardware breakpoint. I haven't yet figured out how to read registers other than the program counter. I'm working on it, and I've got the recordings to do it, but the point is that those things are not in the documentation, and getting them can be a bit fun.

It's also worth noting that writing flash memory is very different from anything else that you might do over JTAG. On most chips you have to send, besides the debugging waveform, another waveform, a clocking waveform, and it has to be of a specific frequency within a particular tolerance. I'm sure you're aware that solid-state disks were initially decried as unreliable because flash memory can only be written so many times, and that this is managed by write controllers that actually map the physical blocks of the flash memory to the virtual blocks that your PC sees, to do wear leveling, so that if you write the same file in the same location a million times it doesn't destroy the disk. If your timing is wrong when writing flash you can actually destroy the flash early, which gives you a chance to experiment with exactly how it's destroyed, and I'll get to that in a bit.

This is how the program counter is set. First you send the control register, and you set that
to 0x3401. This is just a magic constant, but the individual bits are defined in the documentation. Following that you shift the instruction register, and you tell it that you want to execute 0x4030 to load the instruction register, sorry, to load the program counter. From there you send these clock pulses, and then you write in the value that you want in the program counter through dr_shift16. You've got to clear the line after that, and then you have to shift in the instruction register command to capture the value that is held within the data register to the program counter. So this is rather complicated, and it's not how things would behave if you were receiving commands in software. If you were receiving them in software you would just have, say, a single verb to say that you wanted to set the program counter, then you'd have the value itself. But there's considerable value in being able to automatically come up with this function for a given chip, and it's not so complicated that it would be impossible to exhaust all possible ways of doing it.

Now for writing flash, the pulses on the MSP430 need to be 350 kilohertz, give or take 100, and writing this in pure C is quite difficult. TI tries it in their implementation, and between versions of their JTAG debugger the flash will actually be written with different timing, because the optimizations that the embedded compiler supports have changed. This must be hell for them to maintain and debug. So in my implementation I went the other route and I did it in pure assembly. In the red, which is quite difficult to see on the screen, you can see that I actually counted the number of clock cycles these instructions would take, to figure out how long each body of the loop took, and I calculated that at 3.68 megahertz I had 7 to 14 clock cycles allowed per iteration of the loop in order to get the right timing width. My initial implementation was in C. I thought, it's a square wave, what the hell does it matter
what speed it is, and in doing so I damaged the flash. This frequency can come internally or externally, but in either case you have to set it somewhere: you have to make the internal clock be divided by the proper value, or you have to provide an external waveform of the proper value. And this is a hell of a lot harder to figure out without having observed it, because it might be too high or too low, and I mean, how many attempts do you get? If you get this wrong seven times you've destroyed your chip, and you no longer have anything to play with.

So when they're damaged, writing becomes sort of semi-permanent, and it creates erratic writes on future attempts. On one of the chips that I damaged, I wrote 0xDEAD with slow timing. My clock was 35 kilohertz instead of 350. And then I erased that entire segment, including this word. When you erase flash memory it goes up to ones, and then when you write it you're actually clearing out individual bits to get zeros. If I wrote anywhere else within the same segment, suddenly the erasure for this word would fall apart, and the next time I read it back I would see the original value that I'd written previously. And then if I were to write anything else there, the two words would be bitwise ANDed. So if you write DEAD and BEEF into the same address you get 9EAD: you wind up with a zero in the result if there's a zero in either of the initial values. But this is after an erasure. This is after the programmer assumes that the region has been cleared, and this would even have passed a verification for the erasure, because just after the erasure it was all ones. It would not pass a verification after the second word had been written, after BEEF was written.

So it might be possible to work this into a sort of exploit. On the MSP430X2, which is the very most recent chip from Texas Instruments, you can actually kill the last word of flash memory, not quite the last but near the last, which controls JTAG access. So I can take an
MSP430 and I can damage a word of its memory, and then there's nothing that you can do to that same chip to lock me out of JTAG. Now working this into a meaningful exploit is rather difficult, because I would have to do this to a chip, sell the chip to your factory, then grab the unit that included that chip out of the factory to steal your design or something like that, and there are easier ways of extracting such things, but it's still worth thinking about.

And then there are physical questions to answer about each chip. For example, how wide are the registers? On the MSP430 the instruction register is 8 bits wide and the data register is 16 bits wide. On newer MSP430s it's actually 20. So it's handy to be able to figure out such things quickly, and you can do this because the registers are cyclical. It's shifting in on one end and out on the other. You're writing the most significant bit as you're reading the least significant bit, again with the dyslexia, but it makes sense if you view it as a ring: data comes in one side, it shifts all the way around, and then it comes out the end. So if you shift a thousand zeros into that register and then you shift a single one in, and you count the number of shifts it takes for that one to come back out, you can determine the width of that register. And it won't execute the instruction that you send it, because unlike SPI you actually tell it when you're finished. So you don't have to worry about accidentally shifting in a command that will destroy the chip, because you're never telling it to execute the command. So in this case you can just keep spinning around in this region of the data register until you've figured out how wide it is.

And you could automate figuring out how to read from memory by flashing a chip with a particular value at a particular address using the official tools, then running back and trying every instruction until you can get it back out. You can automate this for reading. You can't automate it for flashing
because of how complicated it is and how many chips you would destroy in the process. But if you're unable to do that, you can still look up in the documentation of the chip how to write flash memory from inside of the chip, how to write a program that'll write flash memory. And on the chips that allow you to execute from RAM, which are the minority but not a small minority, you can just write a program to write to flash memory, throw that program into RAM, and execute it there. On some chips, officially, you're even supposed to write flash memory by debugging in individual instructions. It's telling the CPU: run this command, run this command, run this command, run this command, where they're all CPU instructions, not JTAG instructions. So because it isn't very elegant, and because software can't keep up with sniffing the traffic, you can use either a logic analyzer or a TAP emulator. This is a logic analyzer recording of an MSP430 being written by the Spy-Bi-Wire protocol. Every one of these clumps of clock edges is actually a sequence of four values being written, because on each edge you choose one of the four lines and you write it out to the target. So if you decode this by script (now, a friend of mine wrote scripts to handle it; he's quite the neighbor to do that for me) you can actually dump it out and see what's being shifted in and where. And it's also possible to build a TAP emulator. If you had, say, 10 chips, or let's say 2 chips of the same model, and you wanted to program them at the same time, you could just connect them in parallel but leave one of the TDO pins disconnected. Then you could send whatever traffic you liked into the chip, the commands to read and write flash memory, and while you will only hear responses from one of those chips, both chips will be written. So the same way, you could run it out to a state machine that does nothing but capture the transactions in hardware and then flush them out to SRAM. And when you're done programming the chip, or when you're done doing a
particular small action on the chip, you can read the contents of that SRAM to figure out exactly how the command that you asked the debugger to do was performed, and in doing so you can reverse-engineer the JTAG protocol of a given chip. So prototype boards are available for free. They can presently program the MSP430, the Chipcon CC2430, and a few other chips, as well as SPI flash ROMs and I2C EEPROMs. The EAGLE CAD schematics and layout, and Gerbers, and firmware, and even the client are all available for free under the BSD license. You can grab them from goodfet.sourceforge.net. And if you'll... do we have the question room? Yeah? Okay. So if you'll follow me to room 106: a neighbor and I in the Hardware Hacking Village assembled a stun gun out of a camera, so we're trying to see exactly how much voltage you can run through the JTAG pins without damaging the rest of the chip. We haven't quite figured out what this will do, so it might blow up the chip entirely, or it might just break the JTAG pins. But it stands to reason that if you could burn out the input and output pins of the JTAG state machine, then when an attacker tries to break into the chip through JTAG he'll have no way to communicate with it. The Ember EM250 vulnerability that I presented at Black Hat recently could be fixed by this, because you can't connect to the chip externally once the pins are broken. In any case, sparks are going to fly, and this thing shocks like living hell. Yeah, so room 106, and we can dump that and then blow up a chip. Thank you for your time, and be neighborly.
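Two of the mechanics described in the talk, measuring a register's width by shifting a marker bit through it and flash writes combining with old contents by bitwise AND, can be sketched in simulation. This is a toy model under stated assumptions, not the speaker's code: `ShiftRegister`, `measure_width`, `flash_write`, and `ERASED` are hypothetical names, nothing here talks to real hardware, and a real probe would also have to walk the JTAG state machine to reach the shift states.

```python
ERASED = 0xFFFF  # an erased 16-bit flash word reads back as all ones


class ShiftRegister:
    """Toy JTAG register: each bit shifted in at TDI pushes one out at TDO."""

    def __init__(self, width):
        self.bits = [0] * width

    def shift(self, tdi):
        # The oldest bit leaves on TDO as the new bit enters on TDI.
        tdo = self.bits.pop()
        self.bits.insert(0, tdi)
        return tdo


def measure_width(register, flush=1000, limit=1000):
    """Return how many shifts a single 1 needs to travel from TDI to TDO."""
    # Fill the register with zeros so the marker bit is unambiguous.
    for _ in range(flush):
        register.shift(0)
    # Shift in the single marker bit.
    register.shift(1)
    # Keep shifting zeros and count until the marker comes back out.
    for count in range(1, limit + 1):
        if register.shift(0) == 1:
            return count
    raise RuntimeError("marker bit never came back out")


def flash_write(cell, value):
    """Model a flash write: bits can only be cleared, so the result is old AND new."""
    return cell & value


if __name__ == "__main__":
    # Widths quoted in the talk: 8-bit IR, 16-bit DR, 20-bit DR on newer parts.
    for width in (8, 16, 20):
        print(width, measure_width(ShiftRegister(width)))
    # Writing BEEF over a word whose DEAD was never really erased.
    stale = flash_write(ERASED, 0xDEAD)
    print(hex(flash_write(stale, 0xBEEF)))
```

Because the marker is never followed by an update step, the model only counts shifts, which mirrors the point that the target does not act on what is shifted in until it is told the transfer is finished.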
Info
Channel: Christiaan008
Views: 19,320
Keywords: jtag, lecture, programming, usb, debugging
Id: k3ac5iBcfnQ
Length: 42min 4sec (2524 seconds)
Published: Sun Jan 16 2011