Behind Enemy Lines - Reverse Engineering C++ in Modern Ages - Gal Zaban - CppCon 2019

Video Statistics and Information

Captions
Hello everyone, can you hear me okay? My presentation is called Behind Enemy Lines: Reverse Engineering C++ in Modern Ages. My name is Gal. I'm a security and vulnerability researcher. I do a lot of reverse engineering, and specifically reverse engineering of C++. I also like to play the clarinet, and this is my Twitter handle.

So we are going to start with binary creation. Whenever you have source code, you have the compiler and the linker, and then the binary is created. But what if you would like to reverse the process and go from a binary to code? That process is called reverse engineering.

There are some difficulties when you do reverse engineering of C++ code. The first is that local names and types don't appear in the binary, since after compilation you don't have those things unless you compile in debug mode; then some of them create symbols and you can have partial information about names and types. Another is that the optimizations the compiler does make a binary very complicated to look at afterwards, because it is hard to tell whether the user wrote the code this way or whether the compiler was responsible for this behavior. This is something that becomes clearer with time.

The other thing I want to mention is static and dynamic analysis. Static analysis means that you have a binary and you look at the code without executing it: you look at the assembly and work out the logic without executing. Dynamic analysis means executing the code, looking at the assembly while it is being executed, and understanding the logic from that. One more thing I want to mention is that there is also hardware reverse engineering. This talk is going to focus on software reverse engineering, but hardware is also something you can reverse engineer, and you
can then understand how a board looks and things like that, but this presentation will focus on software.

So what is it good for? There are many purposes for reverse engineering. The first is vulnerabilities: when you have a binary and want to understand it and find vulnerabilities, sometimes you don't have the source, so you need to reverse engineer, find bugs, find vulnerabilities and try to exploit them. The second thing you can do is understand the logic of the code or an algorithm in order to solve complex problems like synchronization, optimization and things that you probably know from your day-to-day job.

So, the reversing process: you have a binary, then a disassembler creates assembly code from the binary, and afterwards you have a decompiler and a human. The decompilers that exist today are mostly compatible with C, but with C++ there are still a lot of problems with the decompiled code. Specifically, you have all the polymorphic classes and the virtual calls, and it is very hard to produce correct and complete decompiled code, so a human always has to go over everything. So the work is based on the decompiler output and the assembly together.

And what is the most effective way to learn reverse engineering of C++? You would all answer: of course, playing Chicken Invaders. But the point is that I'm not really good at that game; I can't win without cheating. So we are going to use the knowledge we have in reverse engineering in order to win that Chicken Invaders game. So what do we have? We have a binary of the game, we have the tools for disassembling it, we have our knowledge of C++ and we have some knowledge of reverse engineering. The tools for reverse engineering include Ghidra, Binary Ninja and IDA. Specifically, we are going to use IDA: it is my favorite tool and it is also very comfortable to work with.

So, assembly basics,
specifically for the MSVC compiler, because this is what we are going to see during the talk. First, we have the important registers. The first is RAX: it stores the return value of functions. Whenever you have a function call, the value returned from the function is going to be stored in RAX. I will also mention that the difference between RAX and EAX is that RAX is the x64 64-bit register and EAX is the 32-bit register, so it also depends on the architecture that the code was compiled for. The second register that is important to know is RCX: it usually stores the pointer to the object. This is specific to MSVC, and we are going to see it a few times during the talk. One more register is RSP: RSP is the stack pointer; it points to where we are on the stack. RBP is also one of the registers that is important to know: it stores the pointer to the base of the stack frame.

The last thing is that MSVC uses the fastcall calling convention, so it uses registers to pass parameters to functions when that is possible. It is also important that some types cannot be passed in registers: for example, unique pointers and shared pointers will be pushed onto the stack. So these are the registers that are used when fastcall is used, and once all of those registers are used, the other parameters will be pushed onto the stack.

Okay, so now that we have had a small introduction to reversing and to what we need to know about the MSVC compiler, we are going to start on reverse engineering of C++. First, we are going to start with dynamic object creation. This is a simple piece of code. It doesn't do much, but it has a class named person, and at the end you have a main. The most important part is this line, the creation of the object. The first part is the new, which calls operator new, and the
size is the parameter that is given to the function. I will also mention the weird name that you can see there: GZ, underscore, operator new. The reason is that GZ are my initials, and whenever I change the name of a function I add my initials, in order to see the difference between symbols that came from the binary and functions that I renamed. So when you see GZ at the beginning of function names in my slides, it is because this is a name I changed.

The second part is person: the call to the constructor. Here too you can see my initials, but with a weird first letter, j. It means that the function I renamed, the GZ person constructor, was wrapped by a function that just calls it, which is a jump function. In order to represent that, the IDA disassembler adds a j at the beginning of functions that just call other functions. But what you can see here is that there is a function call to the constructor of person, and the value stored in RAX, which is the return value of this constructor, is passed to a local variable.

So now we are going to focus more on the constructor itself. We have a few ways to recognize a constructor in assembly. The first thing you can see is the vtable assignment. This happens in a lot of cases, when you have a vtable; if the class doesn't have virtual functions, there will be no vtable. But in this case you saw that there were three virtual functions in our class, so here we have the assignment of the vtable to the first eight bytes of the object. The second part is the parameter: you can see that the private member that the person class has is moved to offset 8 from the beginning of the object, which means that, because the vtable is in the first eight bytes, the
members come afterwards; in this case, right at offset 8.

Okay, so virtual calls. This is a main part for reverse engineers: understanding virtual calls, because they are very complicated in assembly. For example, you have this code. This is the source code of a virtual call: you have the virtual function and you have the virtual call itself. But in assembly, what you have first is the move of the virtual table into RAX. Second, we can see the virtual call itself. In this case, qword ptr with the brackets means that you take the value stored at the address that is in the brackets. Here you have RAX plus 8, which means you take the vtable and use offset 8 from it, which means the second function, since we are on the x64 architecture, where each vtable entry is 8 bytes; the code was compiled for x64.

Okay, so what have we covered so far? We have basic knowledge of assembly, dynamic object creation, the basic constructors and the virtual calls. Now that we know all of that, we can go into the reversing of the game.

So, strings in the binary. The first thing most reverse engineers do is look at the strings. Strings are very simple and human readable, and you don't need to go straight to assembly before you look at what you have in the strings. Every time you have a string, you can just press X in IDA, our tool, and see where the string was referenced from. In our case you can see "you won the game". This is kind of an important string for us if we want to win the game, and we would like to see where it was referenced from. You can see it was referenced from a function of the controller. The second question you might ask yourself is: why do we have names? Before, I said there are no names. This binary was compiled with debug symbols, and for this reason I can see the names and I can see partial
types. So during the talk it will be easier to explain the concepts, but usually you don't have names and you don't have all of those things. If something looks a bit weird, I will mention each time that those things are there because of the debug symbols.

So, the controller function: this is the function that the string was referenced from, and we are going to focus first on the spaceship. At the beginning of the function I saw that there is a shared pointer to a spaceship. Again, the reason that I have the name and can see that it is a pointer is just that the code was compiled in debug. So we are going to take a deeper look at this object.

First, the constructor. We talked about constructors before, but now we are going to focus on this behavior: you can see there is a call to the game object constructor, and afterwards you have an assignment of the spaceship vtable into the first bytes of the object. RCX, as I said, is usually the pointer to the object, and this is what you can see here as well. So we would like to understand why there was a call to game object. Some of you might guess by now, but we want to see it in more detail, and we can see it next in the spaceship vtable.

So this is the spaceship vtable as it is represented in IDA. I know that you might be used to seeing a vtable a bit differently, but this is how IDA represents it. The main thing is that you have many functions from game object but only one from spaceship. The name of that function is move object. As I said, the vtable contains functions from both game object and spaceship. The reason is that spaceship actually inherits from game object. This is why, inside the constructor of spaceship, there was a call to the game object constructor, and also why the vtable combines functions from both game object and spaceship. So we would like to understand game object a bit more
before we continue.

So, the game object constructor. We can see that there is an assignment of the vtable to the first 8 bytes of the object, and afterwards we can see the initialization. You can see a call to get desktop mode; the return values of the function are stored in EAX and then moved to the relevant offsets in the game object. The reason that EAX is the register being used is that the return values are integers, which means that although we are on the x64 64-bit architecture, we still use the 32-bit registers to pass values that are smaller than that, in this example integers, which are 4 bytes.

Okay, so this is another representation of the vtable of game object, one that is more similar to what we saw before. You can see that here you have a purecall in the vtable of game object. Again you can see the j at the beginning of the purecall, which means that it is a function that just calls purecall. So let's recap what we have understood so far, because the reason we have a purecall is exactly why we saw in spaceship a move object function that was unique to the spaceship.

First, from all the assembly that I saw, I could write some kind of pseudocode. In this example of the pseudocode you can see the class spaceship: it inherits from game object and it has one virtual function, move object, which it implements. The second thing is game object: I just copied the names that I saw in the assembly, and I also added the pure virtual function that we saw, which was actually move object. Now I have its name, because I could see the name in the derived class, spaceship.

Okay, code flow. What we are going to do now is come back to the controller code that we saw, come back to the
function and understand what happens there, because from what we have understood so far, this is the important one. First, after the spaceship shared pointer, I saw that there is a call to fill wave vector. The name and the purpose are quite similar: it begins by creating a vector and then fills it with classes derived from wave interface, in this case the class wave two. You can see that wave interface is the element type of the vector in the first line and wave two is the class that is being used, so we can assume that it is one of the derived classes of wave interface.

Okay, so what more did we have? After looking more at this fill wave vector function, I could see that there is a wave boss call, which means that one of the classes that were used, one of the elements in the vector, was a wave boss class. Wave boss is quite important for us, because what we wanted in the beginning is to win the game, and wave boss is probably, I will assume, the big chicken boss that you need to fight and that is impossible to pass. This is my assumption, but we need to check it and look further in order to give a concrete answer. So if I summarize, the fill wave vector function, from what I could work out from the assembly, looks something like this: it just does emplace back of the objects, and that is how it looks in general.

Now we can go back up one level. This is the main function we looked at: the function that referenced the string and the one that called fill wave vector. What I saw there is that there is a virtual call. There are many problems when reverse engineering C++ code, but a virtual call is a major one, since we don't always have the ability to reverse the code dynamically, so you need to understand the flow statically in order to work out what the target of the virtual call was, and this is a very hard task to do. So,
in any case where you can do dynamic analysis when you have a virtual call, this is what you do, and this is what we are going to do now: we are going to go for our easiest solution, which is dynamic analysis. You have the ability to attach a debugger in IDA and debug the assembly code. In our case I placed a breakpoint on the virtual call. You can see it there, and on the right side you can see the registers that I talked about; RAX stores the value of an address. So now we can go to the address and see what is there. We can see that the virtual call is to wave three's create wave: the address is in the vtable and the offset is that of the create wave function.

Now we are going to look at the virtual call again, just to show you that it happens again. Again you have an address, and in this case it is wave four. So each time you have the virtual call, a create wave function is called depending on the object, and we know that this object was part of the vector. After looking a bit more, we saw that the code iterates over the elements of the vector, chooses the relevant object and makes the virtual call to create wave. After a few iterations I arrived at the stage where I was supposed to deal with the boss. When I needed to deal with the boss, I looked at what happened in assembly: I again stopped at the breakpoint and saw that the object being called is wave boss. So now we have a concrete understanding that wave boss is the object responsible for the level with the boss.

So, examining wave boss. You can see an example of the boss. Okay, the wave boss constructor. First we have what we have learned so far: we have the vtable, which is stored in the first bytes. Then we have some member. The number is 15, but we still don't know what this member is, and we will need to have
further checks before we decide what to do with it. One more thing we can see here is the vtable: you can see that it is the vtable of chicken boss and not of wave boss. The reason is that wave boss inherits from chicken boss and doesn't implement any new functions, so it can use the same virtual table.

Okay, so now we want to know what exactly we want to change in order to win, because this is our main goal after learning reverse engineering. The process was quite similar to what we have done so far, so I will summarize what I did and what I figured out. Chicken boss is the base class of wave boss, and there is a full inheritance chain: wave boss inherits from chicken boss, which inherits from chicken base, which inherits from game object, as we already saw before. Game object has many children, like the chickens and the spaceship. After understanding that this is an extremely important object, I could come back and understand more about it. I understood that one of the members of game object is the number of lives that an object has, so this number that we saw, the 15, is actually the number of lives the boss has.

So now we want to change the number of lives, because one is enough for us: we just need to shoot it once and we will win. Here you can see both the assembly and the hex representation of the instruction. You have the number 15, which is represented by 0F 00 00 00. We will change it to 1, which means we patch the binary, and now we have a new value. And now we can just shoot the boss once and win.

And just when I thought it was over, I arrived at the scoreboard. What happens with the scoreboard is that I wasn't able to enter my name before, because I wasn't good enough to
succeed. Now that I was good enough, I could get onto the scoreboard and enter my name. If you look, there is a small line at the top left that says "well done, congratulations". Every time you enter the scoreboard, a new message appears. It can be "hooray", "first place", "well done", "congratulations"; it can be lots of things, but it is randomized every time you enter a new score. Then one time I entered my score and this is what happened, and this is not anything that I would assume is supposed to be there: it is my name, three dots and the score, which is a bit weird. You can see that the same name and score are in the scoreboard. So I thought: okay, this is interesting, let's dig in and see what happens and why.

First, I looked at the scoreboard function. This function is actually responsible for choosing the strings for this sentence. It picks two strings from the global strings, concatenates them and calls the allocator for each one of them. Before we go into what happened, I will explain a bit about the small string optimization. In MSVC, which is our compiler, what happens is that it checks whether the length is smaller than 15. You can also see the assembly lines that show this: it compares the value to 15, and if it is smaller, it goes to a branch that stores the string on the stack, inside the string object itself, and if it is bigger than 15, it allocates the string on the heap. It is also important to say that different compilers handle this differently. I am not going to get into it now, but for example GCC also uses a size of 15, while Clang uses 22.

Okay, so cheers. Cheers is the first string that was used for our sentence, or what was supposed to be a real sentence. You can see in the assembly that for cheers the string length is 8; we can see it being stored in RSI. The source is a global string that is cheers with an exclamation mark, and you also
have the destination, which is an address on the stack. So this is the first string we had: we have cheers, and we saw that it is stored on the stack. The second string that I saw was hooray. Hooray is also stored on the stack; the process is very similar to how it looks for cheers, so I am just going to move on and show you how the stack looks. You have cheers and hooray both on the stack. On the right you can see approximately how the stack looks, and on the left how it looks when I debug the code and look at the stack.

So, allocating a new winner for the scoreboard. What happened is that after I won, I could enter my name and the score, and my name appeared in the scoreboard. In the part responsible for that, you have a function call that gets the name that I entered, creates the string of my name, the three dots and the score, and then calls the allocator for it. And what happened is that the address used by the allocator is the same address that cheers and hooray had. I don't know if you remember the value, but it was basically the same address that was used for the cheers and hooray strings on the stack. So if we examine the stack before and after the call: before, we have cheers and hooray, as you can see again on the stack, and afterwards you have Gal and my score on the stack in the same place.

Okay, so this is weird, but now we have more understanding of what happened. What we have figured out so far is that you have two global strings that are concatenated and can be stored on the stack or on the heap depending on their length, and you also have the name and the score, which can likewise be stored on the stack or on the heap depending on the length. But the big question that all of you probably have is what the code behind it is and why this weird behavior
happens. The answer is string view, from C++17. There are previous talks that explain this situation, Victor's and Nicolai's talks, so if you want to get deeper into understanding why and how it happens, you can watch those talks; I also have references to other blogs and presentations you can look at. But in general, a string view is implemented like this: you have a size and a pointer to the string that the string view uses. A string view doesn't own the memory, which means that it has no control over the object's lifetime, which means that in our case the return value from the function created a dangling reference, and we could actually write other data there, because of the dangling reference that came from misusing a string view.

To make it clear, I wrote what I think the pseudocode is. This is what I saw in the assembly; I could figure out approximately how the code looked. First, I wrote some of the strings I saw among the global strings that make up the sentence in the scoreboard: well done, first place, all of those. Second, you can see a function that concatenates two strings, stores the result in a string view and then returns it. Afterwards you can see the function call in main, and after this function call you see the function that is responsible for creating the string with Gal, three dots and the score, the string that actually overwrites what the returned string view points to. So this is approximately how it looks. It is all based on what I saw in assembly and on understanding the flow and the objects that I saw there, and this is how you can look at assembly and recreate code only from what a binary can reveal to you.

So, to summarize, I want to give you what you need to take from my talk. First, reverse engineering can explain a program down to the smallest details: you can see
everything that moves from one place to another, and you can see all the flow, so reverse engineering is a good solution for understanding specific things in programs. Also, reverse engineering, as you saw, specifically for C++, is very complicated. There are a lot of difficulties that you will see when you start reversing, and this is one of the things I wanted to show in this talk. Another thing is that you can understand a lot: as you saw, we understood a bug, and we could also overwrite the message with our string, only by looking at assembly. There was no source, only assembly, and we could still understand those small details and figure out how the program looks. The last thing is: always think that there will be a reverse engineer at the end who will look at the code and try to find bugs and exploit them. If you don't know one hundred percent what will happen with your code, or what the result of some action will be, it might have severe consequences in the end. So always assume that there will be an evil person at the end who will try to do malicious things with your code. Know what you are writing and try to make it as clear as you can. Thank you for your time.

Okay, two questions. I noticed the nop mnemonic in the code. What is the use of it, if you can tell me? In one of the first slides you showed a no operation. What was the reason that it's there?

Which one do you mean, the vtable?

No, the nop, no operation.

Ah. Sometimes the compiler adds nops in the code. A nop does nothing, but it is part of what the compiler does: sometimes you can see nops being added, and it is part of how the compiler organizes the code.

Okay, so it's not only data that needs to be aligned but functions as well, right?

Yes.

Okay. Another thing, maybe I'm the only one, but it's still not clear to me
about this j prefix. If it's only me, then don't bother.

Okay, let's go back a lot. So what happens is that sometimes, for a function like our constructor, for example the person constructor, the compiler creates another function, and all it does is jump to the real function. I can't give you a specific answer for why, but it happens specifically when the code is compiled in debug. In release it doesn't happen that much, but in debug you have a function where all that occurs inside is a call to a different function. IDA, the disassembler we use, recognizes this behavior as a pattern and just adds a j, for jump, to the name of the function. So I changed the name of the inner function, and the other function, which only calls that function, is why the j is being added. Is it clear now?

Okay, thank you very much. If you have any other questions, feel free. [Applause]
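The class relationship recovered from the vtables in the talk can be sketched in C++ roughly as follows. This is a minimal sketch and not the game's source: the identifiers (`GameObject`, `Spaceship`, `ChickenBoss`, `MoveObject`, `MoveAll`), the return type and the spaceship's life count are assumptions made for illustration; only the shape (a base class with a pure virtual function, derived classes that override it, and the boss's 15 lives) comes from the talk.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical base class. Under MSVC x64 the vtable pointer occupies the
// first 8 bytes of the object and the data members follow it.
class GameObject {
public:
    explicit GameObject(int lives) : lives_(lives) {}
    virtual ~GameObject() = default;
    // Pure virtual: this slot shows up as a purecall entry in the base vtable.
    virtual std::string MoveObject() = 0;
    int lives() const { return lives_; }

private:
    int lives_;
};

// The derived constructor first calls the base constructor and then installs
// its own vtable, which is the pattern seen in the disassembly.
class Spaceship : public GameObject {
public:
    Spaceship() : GameObject(3) {}  // life count is an assumption
    std::string MoveObject() override { return "spaceship"; }
};

class ChickenBoss : public GameObject {
public:
    ChickenBoss() : GameObject(15) {}  // the 15 seen in the boss constructor
    std::string MoveObject() override { return "boss"; }
};

// Each call below compiles to an indirect call through the object's vtable.
std::vector<std::string> MoveAll(
    const std::vector<std::unique_ptr<GameObject>>& objects) {
    std::vector<std::string> moved;
    for (const auto& object : objects) {
        moved.push_back(object->MoveObject());
    }
    return moved;
}
```

Because the call target is only known from the object's vtable at run time, a static reader of the assembly sees just an indirect call, which is why a breakpoint on the call site is the quickest way to resolve it.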
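The patch described in the talk, changing the boss's 15 lives to 1, amounts to rewriting a 32-bit immediate inside an instruction. The sketch below shows the idea on an in-memory byte buffer. The function name `PatchImm32` and the sample instruction bytes are assumptions for illustration and are not taken from the game binary; a naive scan like this can also match unrelated bytes, so on a real binary the offset would come from the disassembler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Replaces the first little-endian 32-bit occurrence of `from` with `to`.
// Returns true when a replacement was made.
bool PatchImm32(std::vector<std::uint8_t>& code, std::uint32_t from,
                std::uint32_t to) {
    for (std::size_t i = 0; i + 4 <= code.size(); ++i) {
        // Assemble the 4 bytes at position i as a little-endian value.
        std::uint32_t value = 0;
        for (std::size_t b = 0; b < 4; ++b) {
            value |= static_cast<std::uint32_t>(code[i + b]) << (8 * b);
        }
        if (value != from) {
            continue;
        }
        // Write the new immediate back in little-endian order.
        for (std::size_t b = 0; b < 4; ++b) {
            code[i + b] = static_cast<std::uint8_t>((to >> (8 * b)) & 0xFF);
        }
        return true;
    }
    return false;
}
```

As a usage example, the bytes `C7 41 10 0F 00 00 00` encode `mov dword ptr [rcx+10h], 0Fh`; after `PatchImm32(code, 15, 1)` the immediate becomes `01 00 00 00`.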
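The small string optimization mentioned in the talk can be observed from C++ by checking whether a string's character buffer lies inside the `std::string` object itself. This is a rough sketch: the helper name `StoredInline` is an assumption, and the exact size threshold is implementation-specific, so only the long-string case is guaranteed to behave the same everywhere.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Returns true when the character buffer lives inside the std::string object
// itself (the small-string case) rather than in a separate heap allocation.
bool StoredInline(const std::string& s) {
    const auto object_begin = reinterpret_cast<std::uintptr_t>(&s);
    const auto object_end = object_begin + sizeof(std::string);
    const auto data = reinterpret_cast<std::uintptr_t>(s.data());
    return data >= object_begin && data < object_end;
}
```

On the common standard library implementations a short string such as an 8-character greeting is expected to report `true`, while a 100-character string reports `false` because its buffer is heap-allocated.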
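The string view misuse reconstructed in the talk, and one possible way to avoid it, can be sketched as follows. The function name `MakeBanner` and the sample strings are assumptions; the buggy shape is kept only as a comment because running it is undefined behaviour. Returning an owning `std::string` is one fix among several, not necessarily the one the game's authors would choose.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Buggy shape, as described in the talk (do not use):
//
//   std::string_view MakeBanner(std::string_view a, std::string_view b) {
//       std::string joined = std::string(a) + std::string(b);
//       return joined;  // the view points into `joined`, which dies here
//   }
//
// The returned view then refers to dead storage, which a later string (the
// player's name and score) can end up reusing.

// Safer shape: return an owning std::string so the caller controls lifetime.
std::string MakeBanner(std::string_view a, std::string_view b) {
    std::string joined(a);
    joined += b;
    return joined;
}
```

A `std::string_view` is only a pointer and a size, so it is safe as a parameter type here, where the caller's strings outlive the call, but not as a way to return a temporary result.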
Info
Channel: CppCon
Views: 14,467
Rating: 4.8134112 out of 5
Keywords: Gal Zaban, CppCon 2019, Computer Science (Field), C++
Id: ZJpvdl_VpSM
Length: 36min 47sec (2207 seconds)
Published: Fri Oct 18 2019