FPGA simulated on a GPU - GPURTL Google CTF Finals 2019 (reversing)

Video Statistics and Information

Captions
This video is going to be a capture-the-flag write-up of a challenge from the Google CTF Finals 2019, so this was recorded quite a while ago. The challenge is a hardware challenge: it's ultimately an FPGA bitstream reversing challenge, but it also has a GPU component and a Rust component to it, and it's a very interesting type of software. I had a chance to sit down with the challenge author to talk about the software, how it was created and what went into it, and I also had a chance to talk to Gynvael, who you hopefully know from his excellent YouTube channel, who also does crazy security streams and videos. Gynvael was testing the challenge, so it's very interesting to hear his approach when trying to solve it. The interview is quite long, as you can see in the length of the video, but I think it's a very interesting conversation. The CTF challenge is of course interesting in itself, but I think there's just so much more around it that you can learn by listening to the author or Gynvael describing certain problems and how they approach them. I personally get a lot out of just people talking about the stuff they are doing, and it's not super condensed, but I still think it's very interesting and insightful. So I hope you enjoy this style of video, and let me know if I should do something like this again in the future.

So you are the author of the GPURTL challenge? Yes. [Music] Yes. So the name includes GPU, which I assume has something to do with shaders or GPU code, and then we have RTL, which might be related to resistor-transistor logic maybe? Very good. RTL specifically stands for register transfer level in this particular context, which is usually relevant when doing FPGA design, which was the inspiration for this challenge. My general goal with the challenge was to try and develop something that a traditional disassembler just can't deal with. Okay. I've done a bit of hardware design on FPGAs before, so I know how difficult it is to debug what FPGAs are doing, and there are whole suites of proprietary tools that do this. So I thought it'd be pretty cool to go down that route and make something that looks a little bit like an FPGA. In fact, the goal was to have a reusable thing that could simulate FPGA designs, any FPGA design; I wanted to actually make it a useful tool beyond just the CTF, and then make a CTF challenge out of it as well. All right, and all of this running on a GPU because... yeah, it's more difficult? Also parallelism. I mean, FPGAs are indeed parallel devices, and that works out with the way the shaders are set up, so that it can run in parallel. Yes, so it runs everything in parallel, which didn't come without its fair share of issues, which I can talk about later. I can imagine.

Yeah, so Gynvael, I believe, has done the test run. Yes. And he showed me a part of how he kind of approached this challenge, with a weird setup, so I'm very curious now to see the other side of this. Okay, so he probably showed you the big graphic which showed how various things were connected, where you can see the various blocks in the program? No, he didn't. He showed me something very colorful, where x was the time, each clock cycle was a thing there, and then he printed the register outputs of each LUT and color-coded how far they are away from the output. It looked crazy, but there were some interesting insights from that.
Do you synchronize it with, like, a clap or something? Well, now you have a clap, you can synchronize it. Perfect. Yeah, so this is a picture from when I solved the challenge, and you can tell me what the challenge is about by looking at this picture. Okay, wait, so the challenge that you download, is that a regular binary to run? Okay, that's a great question. You get a binary, you get a binary file which you have no idea what the format is or what it is about, and you get a Lua script. Okay. And the Lua script has things like inputs, outputs, the correct output, and pins, pin numbers. Ah. And yeah, that's basically what you get, and this is a picture I got when I was close to solving the challenge. Okay, wow. A couple of guesses: it could be a memory dump of some sort, but there are also lines, so it could be a time diagram of, like... if you say pins... I'm trying to see, there are some variations in those, so I don't know, is this zoomable, or is this already the full thing? This is, yeah, it's quite long actually. Okay, but there are different colors, so that's why I thought it might be memory and the different colors just encode certain different values of it. Cool, I think you're super close actually.

So, to keep you from guessing: the challenge was basically an FPGA simulator implemented with Rust and a GPU, so in shaders basically, and what you get, the binary file, is the bitstream for this FPGA. The FPGA has over 2000 lookup tables, or logic cells, whatever you want to call them, and each one has basically four inputs, then a lookup table, and two outputs actually: one output which is a normal output, and the other is a register, a flip-flop output. And what you see here on this image is basically the register values from all the gates. So there aren't over 2000 shown here, because this is already only the part relevant to the actual solution. The diagram looks like this because I have absolutely no idea how to solve this kind of challenge; this is the second time I solved a challenge like this, the first time I used a similar method, and this time I also ended up using this method. It's actually pretty good, because on this picture, if you know what to look for, you can deduce the actual algorithm which is implemented inside the bitstream, and from that it's pretty close to getting the solution.

Can you say again what the axes are? Okay, so this is time: the y, the vertical, is time. The large blocks, I think 32 pixels high, are a cycle, a clock cycle in the FPGA, and each line inside the cycle is a micro-cycle, which is basically propagation from one register to another; as in, one gate has already calculated its result and that result propagates to another gate which is connected, and another, and another, and at some point they stabilize for the cycle, and that's what the last line in each cycle basically tells you. Wait, okay, I don't get it; are these not different gates encoded in there? Each pixel, each column... each pixel is one gate, or rather one register, the register which sits at the end of the gate. Okay, okay. And so these changes here, when it's on or off, it's just lighter or darker? And this was time, so it was on and it turned off at some point. And what does the color represent? The color represents the distance from the output, meaning the purple ones are the output, and this is distance one from the output, distance two, distance three, and so on; so these are basically directly connected to the output pins, and these ones are next to them, and so on and so forth.
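A visualization like the one Gynvael describes is easy to reproduce once you have your own simulator dump. A minimal sketch in Python, assuming you already have a hypothetical trace (one register snapshot per micro-cycle) and a per-gate distance-from-output map; both inputs and the output file name are made up for illustration:

# Sketch: render register values over time as a PPM image, colored by each
# gate's distance from the output pins. `trace` is a list of micro-cycle
# snapshots (lists of 0/1 register values) and `dist` maps gate index to hop
# distance; both are hypothetical dumps from a re-implemented simulator.

def render_trace(trace, dist, path="trace.ppm"):
    palette = [(200, 60, 200), (60, 60, 220), (60, 180, 220),
               (60, 200, 120), (220, 180, 60), (230, 120, 40)]
    height, width = len(trace), len(trace[0])
    with open(path, "w") as f:
        f.write(f"P3 {width} {height} 255\n")
        for row in trace:                        # y axis: time (micro-cycles)
            for gate, bit in enumerate(row):     # x axis: one column per register
                r, g, b = palette[min(dist[gate], len(palette) - 1)]
                if not bit:                      # darker shade while the bit is 0
                    r, g, b = r // 3, g // 3, b // 3
                f.write(f"{r} {g} {b} ")
            f.write("\n")

Whether you order the columns by distance or by pin number is then a one-line change, which is exactly the trade-off discussed next.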
So these ones, for example, the orange ones, are really deep inside the network, looking from the output side. Is there a particular reason why you didn't sort it by exactly that? So it's also sorted, by pin number, and the reason I didn't sort it by distance was that I wanted to be able to easily figure out which pin it is. Ah, okay, because you are searching for a particular pin. Yeah, basically the output pins. Or is there a single output bit at the end? Actually, as you can see, you can tell how many outputs there are by the size of this; there are I think 256 or 128 outputs, something like that.

Okay, so basically to restate what the challenge is about, it's like this: you get the FPGA simulator, right? The first thing you have to do is figure out that it's an actual FPGA simulator, then you have to figure out how it simulates the FPGA and what the bitstream format in the file is. Once you have that, which is partly reversing a Rust binary and partly reverse engineering compiled shaders, then you get the bitstream, which you can decode, and you know it's actually the lookup tables and how they are connected. Then you approach the real challenge, which is to figure out what it does. You know where the input is, and you know the input is basically the flag split into bits, right? And you know the output is going to be some value which is 256 bits long or something like that, and you have the correct output at the end, which you have to arrive at for the challenge to tell you that you succeeded. And it's definitely not just boolean logic, because you were also interested in the cycles, so there's an algorithm implemented; it's not as easy as doing some correlation of flipping input bits and seeing what happens. Exactly, exactly. And honestly, I had absolutely no idea how to reverse a bitstream, so this was a learning experience for me.

Yeah, so to get the data that you used for this graph, you had to instrument the simulator? Excellent question: I actually had to reimplement it in Python. Okay. So yeah, I've written a lot of code in Python to dump different things: to get the graph of how the gates are connected and try to reason on that, to get the lookup tables and try to convert each lookup table back into a boolean logic function, or a set of if-else statements, a tree of if-else statements. And based on that, based on looking at this colorful picture and another colorful picture I can show you (it only shows the cycles, there are no micro-cycles here), you can see in this output column, where the color encoding is identical, that it first does something weird with the first 64 bits, then the second 64 bits, and so on. So it tells you basically how the input is processed and how the output is processed. Then you know about these 64-bit blocks, you know there are four rounds, and you can see how many cycles one round takes, and you would find out that it's actually 32 cycles. And based on that information, if you put what I already told you into Google, like 64 bits, 32 rounds, 32-bit variables, you get the name of the algorithm which is used here. It's a cryptographic algorithm, an encryption actually, and based on that you can do basically a known-plaintext attack to get the key and then be able to reverse it and get the flag. 24 hours? Well, that's how long it takes to test it while learning; I'm pretty sure people who actually know what to do in this case would have it easier.
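The "convert the lookup table back into a boolean logic function" step he mentions can be sketched as follows; this assumes you have already decoded each cell's 16-bit truth table from the bitstream, and it produces an unminimized sum-of-products rather than a real if-else tree:

# Sketch: recover a readable boolean expression from one 4-input LUT.
# `table` is the decoded 16-bit truth table; bit i is the output for input
# combination i. The names a..d stand in for whatever signals the four
# input addresses of that cell actually select.

def lut_to_expr(table, names=("a", "b", "c", "d")):
    terms = []
    for i in range(16):
        if (table >> i) & 1:                              # minterm i outputs 1
            lits = [n if (i >> k) & 1 else "!" + n for k, n in enumerate(names)]
            terms.append("(" + " & ".join(lits) + ")")
    if not terms:
        return "0"
    if len(terms) == 16:
        return "1"
    return " | ".join(terms)                              # unminimized sum of products

print(lut_to_expr(0x6996))   # 0x6996 happens to be 4-input XOR (odd parity)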
...color-coded how far they are away from the output; it looked crazy, but there were some interesting insights from that. Well, that's such an awesome way of doing it. So that answers a question I had for him, because I didn't understand it particularly well either. But yes, that is actually one way, if you're doing FPGA design, of approaching debugging an FPGA: looking at how the values progress through a system like that, and visualizing it as a colorful graph is a really good way of doing it. But yeah, let me show you the other side of the coin. Yeah, so more of the source code level stuff. Yeah, let's see.

So this is the actual source code for the project. What the challengers are actually given is a binary called gpurtl and then two additional files, ctf.bin and ctf.lua, and this is what they look like. I'm going to totally spoil it with the file names here, but yeah, t.bin and t.lua are the original file names. So this is what the Lua script looks like: for starters there's a bunch of variables, really just addresses, and then the main logic is in this execute function, which, from what the challenger can see, appears to write some input bytes to some bits, which were defined further up along with the input (oh, and you can see the solution here). It then runs a bunch of iterations until it is done, and it measures doneness by reading an output bit. There's also a check to make sure that we're not hitting a bug, because that occasionally did happen, and then we read the output and check: is it what we expect or not?

Are these bugs related to... are you also simulating hardware behavior, like rising edges rising slowly, so that it might glitch? Yes; the basic idea was that in order to make the performance better, I cut a lot of corners in terms of the correctness of the implementation, which on the Intel GPU of my laptop works fine, but on the NVIDIA GPU of my workstation did not work fine. Hence why I now have that check for convergence failure, but the current version of the challenge, the one that the challengers are solving, does not hit that bug on either platform, 100% of the time. So it is a correct, albeit slow, implementation, with plenty of chances for optimization, but yes, this is generally what happens, and as you can see, that is indeed the correct input to make it work.

And then this is the bitstream. Yes, at the top of the file you have "gpurtl", then "pc", and I can't quite remember what those values mean; then you have what appears to be four bits, and an eight, and some other metadata, and then you get this block which all looks very similar, all the "ca ca 08" bytes, and it's hopefully fairly obvious that they come in blocks of four bytes, so 32 bits, and that becomes quite important. Or is it 32 bits or 64 bits? I can't quite remember, we'll look at the source code in a moment. This is the programming for the lookup tables that are being simulated on the GPU. So this is actually generated code from a hardware description? Does this look similar to bitstreams of actual commercial FPGAs? Obviously you didn't implement everything, but what I'm asking is, could somebody who is experienced with FPGA reversing maybe recognize a little bit already? Quite possibly.
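The execute logic he walks through boils down to a short driver loop. A hedged Python rendition (the real script is Lua; the sim methods and the address/iteration constants below are hypothetical stand-ins for the bindings he describes, not values from ctf.lua):

# Sketch of the execute() logic described above, against a hypothetical `sim`
# object mirroring the exposed bindings. All constants are placeholders.

INPUT_ADDR, OUTPUT_ADDR, DONE_BIT = 0x0, 0x0, 0x0
MAX_ITERATIONS = 10_000

def execute(sim, candidate_flag, expected_output):
    sim.write_input_bytes(INPUT_ADDR, candidate_flag)   # feed the candidate flag
    for _ in range(MAX_ITERATIONS):
        sim.run_iteration()                             # one FPGA clock cycle
        if sim.read_output_bit(DONE_BIT):               # the 'done' pin goes high
            break
    else:
        raise RuntimeError("never finished (convergence failure?)")
    out = sim.read_output_bytes(OUTPUT_ADDR, len(expected_output))
    return out == expected_output                       # correct flag?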
I haven't really looked in detail at how commercial FPGA bitstreams look, but I was basing it on the model of: you synthesize your design and then generate a bitstream. This happens to be what the GPURTL bitstream looks like, but it should in theory be very similar to commercial FPGAs as well, in that we are programming a bunch of lookup tables, which is encoded in this logic, and then we are specifying... well, while a typical FPGA specifies how the wiring is, where each crossing of wires has a programming bit associated with it, that doesn't really make sense here, so instead we have addressing, and I'll show you how that works when we have a look at the shader. But it's basically the same pattern all the way down until the very end, where somewhere along here, I believe it sort of starts here, we have the jump table. This was to implement long jumps across the memory of the GPU, and you'll see why that's relevant when we look at the shader.

I think the right thing to do now is actually have a look at what the shader looks like before it gets compiled into horrendous... whatever it is. So it's this: this is what the shader looks like. We get some programming, which is 64 bits, so I was slightly wrong there; we've got a data buffer, we have the programming buffer and we have a jumps buffer, just as I explained, which comes directly from the binary input. We've then got some constants, but they're not particularly relevant here. What we do in the main function is: we unpack the programming at an index which is defined by the current invocation, we then, for a bunch of cycles, do this lookup operation, storing the result back into the data buffer every time, then we've got some memory barriers and an attempt to make it converge better (it turns out they didn't really work, so we had to do something else), and then the last bit is a copy into the register when there is a rising clock, the thing I mentioned. Okay.

I've never really written shaders; I have written a little bit of shader code in a computer graphics class, and I don't remember there being a main function, I feel like it would look different. So if you're writing graphics code on a GPU, you're typically writing a vertex shader and a fragment shader, and what those do is take a buffer of vertices and manipulate them in some way, and the fragment shader takes a fragment and manipulates it in some way; manipulation is things like calculating the color of a fragment, moving a vertex, or applying texture coordinates. This is a compute shader, and compute shaders are the most generic form of shader: they just have a main function and then they do stuff. And so you specify work groups; that's sort of what the overall layout of the shader looks like. Is a shader program, a compute shader like this, executed on each GPU core? Do you have to think about this in terms of massive parallelism? Yes, exactly. This shader, this block of code, runs in parallel on every compute unit on the GPU, and that's really why GPUs are suited for this kind of task, because each element that needs to be calculated is independent of the others. Okay, now the only confusion I have left is the orchestration: how do you tell which one should calculate which part? Right, so that all comes through these indices. Well, I've defined them as macros here because I was lazy, but if we have a look at config index, which is defined up here, it is the global invocation ID's y component times the work group size in x, plus the global invocation ID's x component.
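Functionally, each GPU invocation owns one cell and re-evaluates it every micro-cycle; on a CPU you can just loop over the cells. A rough Python equivalent of that inner loop, under the assumption that each decoded cell carries a 16-bit truth table and four source addresses (a simplification for illustration, not the exact buffer layout):

# Sketch: a CPU re-implementation of what every shader invocation does in
# parallel. `data` holds two values per cell: the combinational (LUT) value
# at index 2*i and the registered value at 2*i + 1.

from collections import namedtuple

Cell = namedtuple("Cell", ["table", "inputs"])   # assumed decoded representation

def run_cycle(cells, data, micro_cycles=32):
    for _ in range(micro_cycles):                 # let values propagate and settle
        new = data.copy()
        for i, cell in enumerate(cells):
            sel = 0
            for k, src in enumerate(cell.inputs): # gather the four input bits
                sel |= data[src] << k
            new[2 * i] = (cell.table >> sel) & 1  # combinational output
        data[:] = new
    for i in range(len(cells)):                   # rising clock: latch the registers
        data[2 * i + 1] = data[2 * i]
    return data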
What this effectively means is that you get a number from zero up to the size of the data buffer, because that's the way I've programmed it, and that is the index this particular invocation is working on; the GPU will execute one of these kernels for every one of those numbers, every one of the so-called work units defined by work groups, which is how compute shaders work. What I've actually defined at the top of the file is a local size of 256, which corresponds to the block size in the binary file, and then we increment the global y for every block, so that's why that particular expression makes sense in this case. I could have also done it in x, but I didn't.

So then the LUT index: there is an offset, and then you have twice the config index. Why two times the config index? Because every cell is actually made up of two values, a register and a LUT. The LUT one changes every cycle, it gets updated every single time; the register one only gets updated when there is a rising clock, in order to try and simulate how hardware would work. Obviously this is an oversimplification of hardware, but it worked for the challenge. Yeah. Then we look at the lookup function, which looks at a bunch of places in the data buffer, pulls them out, and we use the four bits we have selected from places in memory to select one of those 16 bits, and that then gets written back as the data value, which is very similar to how FPGA LUTs work.

And then there is some special stuff in get bit address to deal with jumps. The basic idea was that the majority of lookups from a LUT will be local, very close to the local cell, so we should optimize for that case. If the value is less than 0x800, we assume it is defined as a relative lookup; it's actually relative, where 0x400 is the current value, anything higher than 0x400 points to increasing memory locations, anything less than 0x400 is lower, and that's where you see the LUT index line. So this is just a code speed optimization, or a code size optimization more than anything. So, fortunately, you don't have to use a full address. Yes, I don't need a full 32-bit jump address for every single bit of every single LUT; I only need them for the ones that are jumping across lots of memory. I don't know how much of a difference this makes, it probably actually slows down the program, but it was something I decided to implement. So that's how the shader works.

And this is what... they don't get the source code, but they have the compiled shader, and they need to reverse engineer it? Yes, this shader is compiled into SPIR-V, which is the Vulkan intermediate representation of shaders. I actually don't know what that looks like, but it is reversible; there's nothing special about it, it's just a standard compilation step. You do need to understand SPIR-V to understand what this is doing. Yeah, okay. And based on reversing this, they will understand what the data, the bitstream, is? Yes, or at least how it is set up. So understanding how this shader works directly influences how to understand that bitstream. Yeah.

Now I think it's probably a good time to actually look at the gpurtl binary itself. The binary is written in Rust and it was compiled in release mode, but otherwise there was no special obfuscation going on.
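The relative/long-jump addressing described above could look roughly like this; the 0x800 and 0x400 thresholds come straight from his explanation, while the way a long-jump value indexes the jump table is a guess:

# Sketch of the relative/long-jump addressing. Values below 0x800 are relative
# offsets centered on 0x400 ("this cell"); larger values fall back to the jump
# table of full addresses. How a long-jump value indexes that table is a guess.

def get_bit_address(raw, lut_index, jumps):
    if raw < 0x800:                        # relative lookup, 0x400 means "here"
        return lut_index + (raw - 0x400)   # above 0x400 -> higher addresses, below -> lower
    return jumps[raw - 0x800]              # long jump: absolute address from the table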
The reversing of the Rust binary was meant to be not easy, but at least straightforward; it's not meant to be the majority of the challenge. It turns out that reversing Rust is quite difficult, but it's still not the exciting bit of the challenge; the exciting bit is the bitstream reversal. I can probably show you some of the source code, there's not much that's interesting in it. Is it just the graphics APIs to set up the GPU, and then something like shared memory for the GPU to get the bitstream? Exactly, it's just a lot of setup and glue code really. Here you see the high-level implementation of the simulation: we set up a bunch of buffers, we then set up this iteration, which copies the input ports, runs a bunch of cycles, and copies the output, that's one iteration, and then we pass all of this into Lua and let the Lua script execute it. So this is again just glue code to get the Lua thing working, and then these are the methods that are available from the Lua code itself, which we saw before: read output bytes, read output bit, write input bytes, write input bit, the current iteration count, and run an iteration. Very straightforward. I could show you the GPU code as well; this is the sort of low-level Vulkan programming, it's not particularly nice, it's mostly just... I assume you could get this almost as a template? Yeah, pretty much, this is just a lot of glue, and remember I was writing all of this to try and be as generic as possible. I want this to be a useful thing in the future as well, not just for the challenge, so at some point I hope to open source it. That would be nice. Well, I guess if this goes live on YouTube then people can already take it; what's the license, should I mention it? So there's nothing particularly exciting here, it's just a lot of glue code really.

Yeah, so that's the binary, and hopefully once teams have looked at the binary, worked out how it works, and seen and reversed that shader, they can start to understand the bitstream, and then they will perhaps want to write some sort of decompiler for the bitstream, to turn it into a model of a netlist of some kind, at which point it becomes an FPGA challenge instead of a gross GPU RTL challenge. [Music] At that point you just need to reverse the hardware, I guess. Yeah.

But one thing that's really cool, at least one thing I find to be really cool, that challengers will never see is the code that took the Verilog and turned it into the bitstream, and I'm very proud of this code. It's not complete yet, but it's all of this stuff in the synth directory. I thought you were about to say that you're glad they don't see it because it's embarrassing and ugly, but no, you're proud of it. No, I'm very proud of it, it was fairly difficult code. So let's have a look at t.v. This is the actual Verilog that generates the bitstream for the challenge: you've got a top-level module, it takes a plaintext input and a clock, and outputs the ciphertext and done, and it's got some wires; anybody who understands Verilog will know what this does. Then there's the module itself, which is this, and then there's some state tracking. The rest is fairly generic code; in particular you may notice this line here: that is the key, the key for the encryption algorithm, which, if a team is paying attention and is good at bitstream reversing, they should be able to figure out, that there is this ASCII key being used, because this code isn't very well optimized.
So Gynvael told me he tried to find the key somewhere but he couldn't find it at all; do you believe it should be recognizable in the compiled bitstream? So it might not be very easily recognizable, but there will be constants used that may be hidden as LUT programming, so it may not be immediately obvious. I don't know, it might be obvious, it might not be. But wait, you said your code directly takes this Verilog code and turns it into a bitstream? Yes. Do you apply any kind of optimizations? There are some optimizations run by the tooling that I use, which is Yosys, a synthesis tool. Okay, because I was theorizing with Gynvael earlier that maybe there's some compiler step that sees that those are constants and just, really ugly, folds them in somewhere. It's quite possible that that happens, which makes me a bit sad, because it would be nice to have the string pop out at some point, but it's also the kind of thing you would only find at the end of the challenge. Yeah, maybe this should be an extra challenge for the challengers: find out what the key to this algorithm was, which would be quite amusing. Yeah, you could have turned that into a second reversing challenge. Yeah.

But that is indeed the key, and it does sort of give you a hint as to which algorithm is in use: it's TEA, the Tiny Encryption Algorithm, which I chose because this is the encryption algorithm, it is just those two lines, which was very nice. Observing that this bitstream appears to encrypt things in 64-bit blocks and appears to have a very simple, small state, a team may very quickly understand that it's an encryption algorithm, and a little bit of research will lead to the Tiny Encryption Algorithm. So that is a totally viable way of solving the challenge: just recognizing that it is an encryption algorithm with a very simple step. But you still need the key, and that will be the difficult bit. Now, that actually leads to a very good point: as soon as a challenger has realized it's the Tiny Encryption Algorithm, they know somewhere within this encryption algorithm there is a key, and they know what the key will look like, so they can reverse the bitstream to figure out what the key was. And as soon as they have the key, they don't need to care about any of this code anymore; they can just use the C implementation, which is available on Wikipedia, and do everything on the CPU at that point. Yeah, that's a totally viable way of breaking this challenge, and one that I wanted to make sure was still available.

Do you see different options for other kinds of side-channel ways of looking at this? Gynvael was re-implementing the whole thing in Python. [Laughter] But I don't know, I think... yeah, a re-implementation of the shader component is a very good way of getting introspection into what the code is doing, because code running on the GPU is typically very difficult to access. However, I did provide the Lua interface as a very easy way of manipulating what's going on, so I expect some teams will be writing crazy things in Lua that dump the entire state of the memory every cycle and work out what goes on. In terms of side channels, I'm not sure. [Music] I don't imagine there are any timing side channels, I just imagine looking at the structure. But the Lua script is the trigger for each cycle that is executed? Yes.
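For reference, the algorithm he names really is tiny. This is a Python transcription of the well-known public TEA description (64-bit block as two 32-bit halves, 128-bit key as four 32-bit words, 32 rounds, delta 0x9E3779B9), not the challenge's own code; the example key is made up:

# Reference TEA round function, transcribed from the public description.

MASK = 0xFFFFFFFF
DELTA = 0x9E3779B9

def tea_encrypt(v0, v1, k):
    s = 0
    for _ in range(32):
        s = (s + DELTA) & MASK
        v0 = (v0 + (((v1 << 4) + k[0]) ^ (v1 + s) ^ ((v1 >> 5) + k[1]))) & MASK
        v1 = (v1 + (((v0 << 4) + k[2]) ^ (v0 + s) ^ ((v0 >> 5) + k[3]))) & MASK
    return v0, v1

def tea_decrypt(v0, v1, k):
    s = (32 * DELTA) & MASK
    for _ in range(32):
        v1 = (v1 - (((v0 << 4) + k[2]) ^ (v0 + s) ^ ((v0 >> 5) + k[3]))) & MASK
        v0 = (v0 - (((v1 << 4) + k[0]) ^ (v1 + s) ^ ((v1 >> 5) + k[1]))) & MASK
        s = (s - DELTA) & MASK
    return v0, v1

key = (0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210)   # example key, not the challenge's
assert tea_decrypt(*tea_encrypt(1, 2, key), key) == (1, 2)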
Okay, and so that's what you're saying: they can get at the GPU memory, at what's going on. Yeah, one could imagine that there might be some side-channel attacks in that way, by... I don't know. Yeah, you could very quickly see the structure of what the encryption algorithm is doing every clock cycle, and that is in fact intended: it should be possible to recognize the encryption algorithm, recognize the block length, recognize the operations in use, and from that work out the encryption algorithm. It's a well-known encryption algorithm; it's not something that you need special knowledge to figure out.

Okay, so that's what the Verilog looks like, and then the really cool bit is the synthesis tool. What the Makefile does is: it takes the Verilog and initially runs it through Yosys, which is a synthesis tool for FPGAs; Yosys then optimizes the entire netlist and turns it into... so remember, the Verilog code has things like multiplies, XORs, additions, all of that sort of stuff; Yosys turns it into lookup tables and registers only, which is great because that's what I implement, at which point it writes it out as a BLIF file. I then take that BLIF file and generate the bitstream from it with custom code, custom Python code, as you can see in this second step. Then I can quickly show you what the synth code actually looks like. Is "place" the interesting one? No, place is just the top level. What I did is write a bunch of code that takes the BLIF file and turns it into a Python representation of registers and lookup tables, along with where the variables came from, and it can also spit that back out as a bitstream. So that's how all of this works. I also had to calculate the jump addressing, because that's the primary goal of this program: to calculate what the jump address actually has to be, whether it's relative, whether it's a long jump, etc.

The main idea of this, and this is one thing I didn't get to implement in the end, you can see it here, annealing: I wanted to be able to take the BLIF file and then optimize the layout of the lookup tables in order to reduce the number of jumps required, because that shortens the length of the program, but I couldn't quite get it to work in time, so you have the unoptimized version instead, which is a little more homogeneous and easier to read. But that's how that bit of programming worked, and it does indeed work; well, annealing doesn't, but the rest does. Crazy.

Is your plan to get it to a level where... I mean, it can already run pretty much any FPGA Verilog file? Yep, any standard Verilog it should be able to handle. Yeah. Do you think, especially running it on a GPU, it is much faster than simulating on a CPU? With optimization, yes, definitely it should be; the architecture is a lot more amenable to running this sort of computation.

Yeah, the one thing, and this is probably a good time to talk about it, the one thing that came up as a problem time and time again was convergence. On an FPGA you have sort of the guarantee that when a clock cycle has rolled through the FPGA, every register has changed at approximately the same time. What this means is that the updated values the lookup tables on the FPGA are calculating will all become available a fixed time after that clock cycle; this is what timing analysis checks. With a GPU you have no guarantees about which work group runs when.
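The Yosys-to-bitstream step rests on BLIF's representation of each LUT as a .names block listing the truth-table rows that produce a 1. A hedged sketch of folding one such block into a 16-entry table (real BLIF also has .latch lines and off-set covers, which this ignores; the input-bit ordering is a convention chosen for this sketch):

# Sketch: fold one BLIF ".names a b c d out" block into a 16-bit LUT table.
# `rows` are the cover lines that follow the header, e.g. "1-10 1",
# where '-' is a don't-care. Input k is mapped to bit k.

def names_block_to_table(rows):
    table = 0
    for line in rows:
        pattern, value = line.split()
        if value != "1":
            continue                                  # skip off-set rows
        combos = [0]                                  # expand don't-cares
        for k, ch in enumerate(pattern):
            if ch == "1":
                combos = [c | (1 << k) for c in combos]
            elif ch == "-":
                combos += [c | (1 << k) for c in combos]
        for c in combos:
            table |= 1 << c                           # output 1 for that combination
    return table

print(hex(names_block_to_table(["1--- 1", "-1-- 1"])))   # OR of the first two inputs -> 0xeeee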
Okay. You saw a couple of memory barriers; the thing you didn't see was the changes I actually had to make to the Vulkan implementation (I have raised bugs) to put pipeline barriers in the right place. What that means is that the GPU actually has to stall, finish all of its work, and hand back to the CPU before it can start the next cycle, and that is slow; you don't want GPUs doing that, but there's no way of getting it to work correctly otherwise. Okay, so do all the compute instances also share... is all the memory shared? Yes, the memory is all shared. And you can't implement some kind of mutex through that, where every compute instance has to raise a bit saying it is ready, or so? Nope, can't do it. On Intel that would probably work just fine, or at least with the Intel open source drivers on Linux on this particular machine it would probably work fine, but it seems that NVIDIA is a lot happier about running one work group to completion. So what it would do, and this is speculation by the way, is perhaps run the first work group, which would wait on this mutex, and it would just sit there waiting forever, because it would never schedule the others, and that's a totally valid execution model. Yay, concurrency. But yeah, there's plenty of optimization that can be done, I'm sure, and I imagine it could become quite a useful tool for simulating large hardware designs, because that is a significant problem when simulating FPGA designs: the CPU is a serial processor, it doesn't have many cores, it's difficult to simulate things like that. But then you have concurrency guarantees on a CPU that you don't get on a GPU. Yeah, so there are trade-offs, and I'd really like to see this project become a useful tool in the world of hardware design, but we'll see. Yeah, I would love to be able to spend some proper time on it, but as a sort of 20% project here at Google I think there's a limit to how much time I can spend on this. But that's interesting, that's an exciting project. I really hope that teams will solve it at some point too. Yes, I hope that, and hopefully they can also talk about it and it wasn't a remote player or so. Yeah, thank you so much. Anything else you need from me? No, that was great. Cool.

So the challenge was solved by one team, and you had a chance to talk to them; how did it go for that team? So the one team that managed to solve GPURTL proceeded just about how I would have expected them to. At the beginning they disassembled the binary, they pulled out the shader code and ran it through a disassembler, which, funnily enough, gave them GLSL code very similar to the source code; I'm impressed by the disassembler they used. They quite quickly worked out that it was running this shader on a sort of data buffer, and they actually decided to implement a simulator in Python so that they could introspect the program a little better. It's interesting, because that was also the solution that Gynvael chose. Yes, exactly, so it looks like people are a lot more comfortable probing values on a CPU than they are on a GPU; what a surprise. But they did use the Lua code, they augmented the Lua code in order to work out whether their simulator was correct or not, and they eventually managed to extract a working simulation of the entire program.
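In a CPU or Python re-implementation like the ones Gynvael and the winning team wrote, the convergence problem he describes largely disappears, because you can simply iterate the combinational update until nothing changes. A sketch, reusing the simplified Cell/data layout from the earlier snippet:

# Sketch: settle the combinational values by iterating to a fixed point,
# instead of relying on a fixed micro-cycle count plus GPU pipeline barriers.

def settle(cells, data, max_micro_cycles=64):
    for micro in range(max_micro_cycles):
        changed = False
        for i, cell in enumerate(cells):
            sel = 0
            for k, src in enumerate(cell.inputs):
                sel |= data[src] << k
            val = (cell.table >> sel) & 1
            if data[2 * i] != val:
                data[2 * i] = val
                changed = True
        if not changed:                    # converged: nothing moved this pass
            return micro
    raise RuntimeError("did not converge (combinational loop?)")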
And they quite quickly worked out that the hardware being described within the GPURTL was effectively running one of two potential functions in a Feistel structure, which, if you're familiar with cryptography, you know is a way of creating (let me see if I can get this right) a block cipher from a one-way function. It involves XORing things in two blocks and swapping the values a bunch of times, for n rounds. With some further analysis, and by using their simulator, they found out what the black-box function for each round of the Feistel structure was. They didn't analyze it any further; all they needed was that function, because it allows you to implement the Feistel structure, and with a bit of knowledge you can use the exact same function to decrypt instead of encrypt, because of the way Feistel structures work. So once they had that black-box function, they just wrote a decryptor and decrypted the expected output back into the input, without knowing what that function actually did at all. When I went and chatted to them, they didn't know what algorithm it was; they knew it was an encryption algorithm, but they assumed it was some sort of custom encryption algorithm. So I gave them a little hint on the actual encryption algorithm, that it used the Tiny Encryption Algorithm, and told them, yeah, there's reference code on Wikipedia if you want to verify what your thing was doing. But it was really cool to see that they had enough experience with cryptography to recognize the Feistel structure and then realize that you don't actually need to analyze it past that stage: as long as you've got that iteration function, that black-box function they pulled out, you can implement a decryption algorithm very simply, because the key was static by design, so the function didn't change with the input; that was to make it easier and also to allow this sort of attack. So it's really cool how they didn't actually need to reverse the entire algorithm; they just worked out it was a Feistel structure. Here we go: in the Tiny Encryption Algorithm the key feeds into how that one-way function actually works, so they didn't need to care about the key at all. Once they had the function they could reverse the Feistel structure and just decrypt, so they built a decryptor without knowing the key, which is quite cool. Okay, awesome, thank you. [Music]
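The team's key-less decryption trick generalizes to any balanced Feistel network: once you can evaluate the round function F as a black box (for example by driving an extracted simulator), you can run the rounds backwards without ever learning the key. A generic sketch with a toy stand-in for F; note that TEA itself deviates a bit from this textbook form, so this illustrates the idea rather than the exact cipher:

# Sketch: generic balanced-Feistel encryption/decryption given only a black-box
# round function F(round_index, half). With the key baked into the hardware,
# "calling F" can simply mean running the extracted circuit once.

def feistel_encrypt(left, right, rounds, F):
    for r in range(rounds):
        left, right = right, left ^ F(r, right)
    return left, right

def feistel_decrypt(left, right, rounds, F):
    for r in reversed(range(rounds)):
        left, right = right ^ F(r, left), left
    return left, right

# toy stand-in for the extracted black-box round function
F = lambda r, x: ((x * 0x9E3779B9 + r) ^ (x >> 7)) & 0xFFFFFFFF
c = feistel_encrypt(0x12345678, 0x9ABCDEF0, 32, F)
assert feistel_decrypt(*c, 32, F) == (0x12345678, 0x9ABCDEF0)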
Info
Channel: LiveOverflow
Views: 49,749
Keywords: Live Overflow, liveoverflow, hacking tutorial, how to hack, exploit tutorial, gpurtl, google ctf, capture the flag, gynvael, gpu rtl, shader, fpga on gpu, fpga reversing, bitstream, verilog, hardware reversing
Id: 3ac9HAsfV8c
Length: 43min 3sec (2583 seconds)
Published: Tue Aug 25 2020