How the C++ Compiler Works

Video Statistics and Information

Captions
Hey, what's up guys, my name is The Cherno, and welcome back to my C++ series. Today we're going to learn all about how the C++ compiler works. So let's take a step back and think about this for a minute. What is the big picture here? What is the C++ compiler actually responsible for? We write our C++ code as text, and that's all it is: it's just a text file. Then we need some way to transform that text into an actual application that our computer can run. In going from that text form to an actual executable binary, we basically have two main operations that need to happen: one of them is called compiling, and one of them is called linking. In this video we're just going to talk about compiling. I've actually made another video specifically covering linking, so you might want to check that out as well; the link will be in the description below.

With that being said, the only thing that the C++ compiler actually needs to do is to take our text files and convert them into an intermediate format called an object file. Those object files can then be passed on to the linker, and the linker can do all of its linking things. But anyway, we're talking about compiling here. The compiler actually does several things when it produces these object files. Firstly, it needs to preprocess our code, which means that any preprocessor statements get evaluated then and there. Once our code has been preprocessed, we move on to more or less tokenizing and parsing, and basically sorting out this English C++ language into a format that the compiler can actually understand and reason with. This basically results in something called an abstract syntax tree being created, which is basically a representation of our code, but as an abstract syntax tree. The compiler's job at the end of the day is to convert all of our code into either constant data or instructions. Once the compiler has created this abstract syntax tree, it can begin actually generating code. Now, this code is going to be the
actual machine code that our CPU will execute. We also wind up with various other data, such as a place to store all of our constant variables. And that's essentially all the compiler does. It's not crazy complicated. It of course does get very complicated as your actual code complexity grows, but that is the gist of it; that's all that it does. We're going to go ahead and jump into this and take a look at what each stage actually does, so that you guys can see how it all works. So let's go.

OK, so here we've got a simple Hello World application. You might remember this from the How C++ Works video that I recently made. We've basically just got this main function which calls Log, which is actually defined inside this Log.cpp file, and it simply prints our message to the screen. We wind up with a simple application which says Hello World. If we pop over into our output directory, to Debug here, you can see that it's generated a HelloWorld.exe file, and then back in the project directory, in Debug, it's generated a Main.obj and a Log.obj file. So what the compiler has done is generate object files for each of our C++ files, for each of our translation units.

Now, every .cpp file that our project contains, that we actually tell the compiler "hey, compile this .cpp file", every single one of those files will result in an object file. These .cpp files are things called translation units. Essentially, you have to realize that C++ doesn't care about files. Files are not something that exists in C++. For example, in Java your class name has to be tied to your file name, and your folder hierarchy has to be tied to your package, and there's all this going on because Java expects certain files to exist. In C++ that is not the case. There is no such thing as a file; a file is just a way to feed the compiler with source code. You're responsible for telling the compiler what kind of file type this is and how the compiler should treat it. Now of course, if you create a file with
the extension .cpp, the compiler is going to treat that as a C++ file. Similarly, if I make a file with the extension .c or .h, the compiler is going to treat the .c file like a C file and not a C++ file, and it's going to treat the .h file like a header file. These are basically just default conventions that are in place. You can override any of them; that's just how the compiler will deal with a file if you don't tell it how to deal with it. I could go around making .cherno files and telling the compiler to compile them, and that would be totally fine, as long as I tell the compiler "hey, this file is a C++ file, please compile it like a C++ file". So just remember: files have no meaning. OK, remember that; it's important.

That being said, every C++ file that we feed into the compiler and tell it "this is a C++ file, please compile it" will be compiled as a translation unit, and a translation unit will result in an object file. It's actually quite common to sometimes include .cpp files in other .cpp files and create basically one big .cpp file with a lot of files in it. If you do something like that, and then you only compile the one .cpp file, you're going to end up with one translation unit, and thus one object file. So that's why there's that terminology split between what a translation unit is and what a .cpp file actually is, because a .cpp file doesn't necessarily have to equal a translation unit. However, if you just make a project with individual .cpp files and you never include them in each other, then yes, every .cpp file will be a translation unit, and every .cpp file will generate an object file.

Now, these are actually pretty big. You can see this one's 30 kilobytes and this one's 46 kilobytes. The reason for that is that we're including iostream, and that has a lot of stuff in it, so that's why they're so big. And because of that, they're actually pretty complicated. So before we dive in and take a look at what's actually in the files, let's create something a
little bit more simple. I'm going to right-click on Source Files, hit Add, New Item; it is going to be a C++ file. I'm going to call it Math.cpp and hit Add. Over here I'm just going to write a very basic multiply function which multiplies two numbers together. I'm not going to include any files in here or anything; I'm just going to write a very simple function. It's going to return an integer, it's going to be called Multiply, and it's going to take two parameters, int a and int b. It's then going to create a result variable which stores the result of a times b, and then we're going to return that result variable. Nice and simple, that's it. Let's hit Ctrl+F7 to build that file. You can see over here that it's built successfully. I'm actually going to just resize this a little bit, just to make it easier, so now you can see the output window a bit better. If we look back into our output directory, you can see that we've got this Math.obj file now, and it's four kilobytes.

Before we take a look at what exactly is in that object file, let's talk about the first stage of compilation which I mentioned earlier: preprocessing. During the preprocessing stage, the compiler will basically just go through all of our preprocessor statements and evaluate them. The ones that we commonly use are include, define, if and ifdef. There are also pragma statements, which tell the compiler exactly what to do, but we'll talk about them in another video. So let's take a look at one of the most common preprocessor statements that we have, #include. How does that work? #include is actually really simple. You basically specify which file you want to include, and then the preprocessor will open that file, read all of its contents, and just paste them into the file where you wrote your include statement. And that's it. It's really, really simple, and I'm about to prove that. So over here I'm just going to make a header file. I'm going to right-click on Header Files, hit Add, New Item; it is going to be a
header file, and I'm going to call it EndBrace, and then click Add. OK, we're going to wipe out whatever was in this file, and I'm just going to type in a closing curly brace. That is it; that's our entire file. So now back in Math.cpp, you can see that we have reasonably written a closing curly bracket here for our Multiply function. Let's go ahead and wipe that out. If we compile our file now by hitting Ctrl+F7, you can see that the compiler complains about the left brace being unmatched at the end of the file. So instead of fixing this like a normal person and just adding in our ending brace, let's go ahead and include our EndBrace header file. I'll type in #include "EndBrace.h", and there we go. Let's hit Ctrl+F7 to compile that, and look, it compiles successfully. Of course it did, because all the compiler did was open this EndBrace file, copy whatever was in here, and then just paste it into here. And that is it. OK, header files solved. You should now know exactly how they work and how you can use them.

There's actually a way we can tell the compiler to output a file which contains the result of all of these preprocessor evaluations that have happened. If we bring back our include of EndBrace, and then right-click on our HelloWorld project and hit Properties, under C/C++ and then Preprocessor, I'm going to set Preprocess to a File to Yes. Make sure that you're editing your current configuration and platform so that these settings actually apply. Let's hit OK, and then we're going to just hit Ctrl+F7 to build this again. If we bring up our output directory, you'll see this new .i file, which is our preprocessed C++ code. Let's open this in a text editor so that we can look at it. OK, so here you can see what the preprocessor has actually generated. You can see that our source code had this include of EndBrace, and yet the preprocessed code has just inserted our end brace that was in that .h file that we've included. Pretty simple stuff. Let's add some more preprocessor
statements and see what it does. So back in our file, I'm going to restore our end brace, because I'm getting tired of looking at that include. I'm then going to come up here and define something. I'm going to define the word INTEGER to be int. Now don't ask me why I would ever do this; it's just an example. The define preprocessor statement will basically just do a search for this word and replace it with whatever follows. So let's replace our int here with the word INTEGER, so that we actually return INTEGER. We can also do the same here. Let's hit Ctrl+F7, and if we look back at our file, you can see what's happened: it just looks normal, int result. If we were to do something stupid here, like write the word Cherno, and hit Ctrl+F7, if we go back to our file you can see that now there's Cherno Multiply and Cherno result. Pretty cool stuff.

Let's play around with this a little bit more. Let's bring back our int, we'll get rid of this define, and instead what I'm going to do is actually just use something called if. The if preprocessor statement lets us include or exclude code based on a given condition. So over here I'm just going to write #if 1, which in other words means true, and then just write an #endif at the end of this function. I'll hit Ctrl+F7, we'll go back to our preprocessed file, and you can see that it looks exactly like it does here without this if. If I go back here and switch this off by writing in 0, Visual Studio will fade out our code to show that it's disabled. I'll hit Ctrl+F7 and take a look at this file: we have no code. So it's another great example of a preprocessor statement.

All right, one more: we'll look at include. Let's get rid of our #if 0, and then I'm going to write #include <iostream>, the massive, massive iostream. Let's hit Ctrl+F7, let's look back at this, and well, take a look at this. We have in here 50,623 lines, and there's our function at the very bottom. And look, this is all that #include <iostream> has done. Now of course iostream also includes other files, so it's
kind of like rolling a snowball down a hill. You can now hopefully see why those object files were so big: because they included iostream, and that is a lot of code. All right, great, so that's the preprocessor. Once that stage is done, we can move on to actually compiling our C++ code into machine code. If we go back to our project here, I'm going to get rid of the include because we don't need it, and I'm just going to hit Ctrl+F7. You should now see in our preprocessed file that we're back to normal. In fact, I'm actually going to go into HelloWorld, hit Properties, and then disable that Preprocess to a File. If you actually read what Preprocess to a File does, you'll see that it actually does not produce an .obj file, so we need to disable it so that we can actually build our project. I'll hit OK, and then we'll hit Ctrl+F7 to build our .cpp file. You'll see that we should now get a Math.obj file which is actually up to date.

So let's take a look at what's actually inside our .obj file. If we open this file with a text editor, you'll see that it's binary, which doesn't really help us too much. But part of what is actually inside here is the machine code that our CPU will run when we call this Multiply function. So because this is just binary and completely unreadable, let's convert it into a form that might actually be more readable by us. There are several ways we could do this, but I'm just going to use Visual Studio. I'll right-click on HelloWorld and hit Properties; under C/C++ and then Output Files, I'm going to set Assembler Output to Assembly-Only Listing, and then I'm going to hit OK, and we're going to hit Ctrl+F7. Inside our output directory you should see a Math.asm file. Let's go ahead and open that with a text editor. OK, so this is basically a readable result of what that object file actually contains. If we go down over here, you'll see that we actually have this function called Multiply, and then we have a bunch of assembly instructions. These are the actual
instructions that our CPU will execute when we run the function. I'm not going to go into huge detail about all the assembly code now; I might save that for another video. But if we take a look over here, you'll see that our multiplication operation actually happens here. Basically, we load the a variable into our EAX register, and then we perform an imul instruction, which is a multiplication instruction, on the b variable and that a variable. We're then storing the result of that in a variable called result, and moving it back into EAX to return it. The reason this kind of double move happens is that I actually made a variable called result and then returned it, instead of just returning a times b. That's why we get this moving of EAX into result and then moving result into EAX, which is completely redundant. This is another great example of why, if you set your compiler not to optimize, you're going to wind up with slow code, because it's doing extra stuff like this for no reason. If I go back to my code and I actually get rid of that result variable by just returning a times b, and then compile this, you'll see the assembly looks slightly different, because we're just doing an imul on b and EAX, and then that's it. EAX is actually going to contain our return value.

Now, all this may look like a lot of code, and that's because we're actually compiling in Debug, which doesn't do any optimization and does extra things to make sure that our code is as verbose as possible and as easy to debug as possible. If we go back into our project and right-click here, hit Properties, I'm going to go over here into Optimization. Under the Debug configuration, let's select Maximize Speed. If you try and compile this now, it'll actually give you an error, because you'll see that O2 and RTC are actually incompatible. So we'll have to go back over here into Code Generation and make sure that our Basic Runtime Checks are set to Default, which basically won't perform runtime checks. This is basically just
code that the compiler will insert to help us with debugging. Let's hit Ctrl+F7 and look at that assembly file again. Wow, that looks a lot smaller. We've basically just got our variables being loaded into a register, and then the multiplication, and then that's it. Pretty simple stuff. You should now have a basic idea of what the compiler actually does when you tell it to optimize: it optimizes.

This is a pretty simple example, so let's take a look at something a bit more advanced. We'll take a look at a slightly different example, in which we don't actually take anything in here, but I decide to do something like 5 times 2. We'll save that file. I'll go into my properties and make sure that I disable optimization. So let's hit Ctrl+F7 and take a look at our file. You can see that what it's done is actually really simple: it's simply moved 10 into our EAX register, which is the register that will actually store our return value. So if we take a look at our code again, it's basically just simplified our 5 times 2 to be 10, because of course there's no need to do something like 5 times 2, two constant values, at runtime. This is something called constant folding, where anything that is constant and can be worked out at compile time is worked out at compile time.

Let's make things more interesting by involving another function. So for example, I'm actually going to write a Log function, which is going to log a certain message. Of course, I don't actually want to make it log anything, because that would mean I have to include iostream, which would drastically complicate this, so I'm just going to get it to return the message that it received. Over here in Multiply, I'm going to call Log with the word Multiply. I want to change this back to take a and b, and we'll return a times b. Let's hit Ctrl+F7. All right, so let's take a look at what our compiler has generated. If we scroll down a bit, you'll see that we've got this Log function, which doesn't really do much; this actually will just return our
message. You can see that it's moving our message pointer into EAX, which is our return register, as we've established. So this is the Log function. If we scroll up a little bit, you'll see the Multiply function, and then all we have here is a call to Log. So right before we actually do our multiplication by using imul, we actually call this Log function. Now you might be wondering why this Log function is decorated by what seems like random characters and at signs. This is actually the function signature; this needs to uniquely define your function. We'll talk more about this in the linking video, but essentially, when we have multiple .obj files and our functions are defined in multiple .obj files, it's going to be the linker's job to link all of them together, and the way that it's going to do that is by looking up this function signature. So all you need to know here is that we're calling this Log function. That's what the compiler will actually do when you call a function: it will generate a call instruction. Now in this case it might be a little bit stupid, because you can see that we're simply calling Log and we're not even storing the return value. Basically, this could be optimized quite a bit. If we go back here and we turn on optimization to Maximize Speed and hit Ctrl+F7, you'll actually see that that just disappears entirely. Yep, the compiler just decided "that does nothing, I'm going to remove that code".

You should basically now understand the gist of how the compiler works. It will take our source files and output an object file which contains machine code and any other constant data that we've defined. That's basically it. And now that we've got these object files, we can link them into one executable which contains all of the machine code that we actually need to run, and that's how we make a program in C++. Pretty simple. Make sure that you check out my video on how the linker works so that you can see the next step. But apart from that, I hope you guys enjoyed this video. Hope you learned something
new. You should now have a basic understanding of what the C++ compiler actually does, and that's going to be really important when it comes to debugging, and also when we get into more advanced topics in the future. Make sure you follow me on Twitter and Instagram, and if you really like this, you can support me on Patreon. I'll see you guys next time. Goodbye.
Info
Channel: The Cherno
Views: 489,324
Keywords: thecherno, the, cherno, project, thechernoproject, c++, how c++ works, learn c++, c++ tutorial, game, programming, development, engine, game programming, game development, how to make a game, tutorial, source, code, complete, game engine, how to make a game engine, opengl, glfw, C (Programming Language)
Id: 3tIqpEmWMLI
Length: 17min 55sec (1075 seconds)
Published: Sun Apr 16 2017