A Brief Introduction to LLVM

Captions
Okay, so I'll start off with the man behind the project, Chris Lattner. Let's give a face to Chris - this is how he looks. He is currently 37 years old. He graduated from the University of Portland with a bachelor of science, then moved over to the University of Illinois and graduated there with a master's in computer science. As part of his work at the university he had a research project which later evolved into LLVM, and he was one of the two main researchers assigned to it. This is a good example of how work you do at the university you're currently studying at, whether it's a bachelor's program or a master's, can evolve into something really big - as you now know, LLVM is a major project that is used throughout the software industry.

Currently, and since 2005, he is a compiler architect at Apple. He joined Apple on the basis of LLVM, because Apple was looking for a way to optimize their compiler toolchain, and LLVM was probably the most innovative and best project out there. So again, his research turned into commercial work for Apple. That's about Chris.

Now, LLVM is not an acronym - it's an actual name. It used to be an acronym, but now it's simply the name of the project. The project itself is an umbrella project: it envelops a lot of different parts of the developer toolchain, specifically the compiler toolchain. It has debuggers, it has optimizers, it has intermediate representations, it has a lot of things. Let me start explaining those things with an analogy to the way things were before.

What you see here is the monolithic structure of compilers - the way many of them still are, and definitely the way all of them were before LLVM came along. Imagine these circles represent different parts of the process, going from lexical analysis and parsing all the way down to machine code. For any given language and any given target you had this one big structure. If you wanted a new language, you had to throw everything away, create a new structure, and populate it with these circles. As you can probably imagine, that is not efficient: if you had, say, M languages and N targets, you would need M times N compilers, which is very inefficient in terms of effort.

Now, if you get rid of that structure, you can transition over to a scheme where you have a set of components that you can rearrange and delegate to specific groups to work on. This is what we've been taught in class: you have the source, the scanner, the parser, and the whole sequence is necessary, but if you break it up into parts, different parts can be developed by different people. Being good at parsing doesn't mean you're good at optimization, so people can dedicate their efforts to what they're best at. Since this is a very long sequence, I'll be focusing on just the two parts that LLVM is famous for: the IR, or intermediate representation, and the optimizer.

Okay, so we can get rid of all the other parts and start with the IR. There are a couple of ways of going about an IR. The first one is structured: you can have a graph- or tree-based system where you take your tokens and build a graph from them. The other approach is a tuple-based system, which is flat: you have a sequence of tuples that usually collect three values - the opcode and the two operands. And then there's the stack-based structure, which is similar to the tuple-based flat structure, but instead of having a tuple of three values together, you would push X, push Y, and then call the add operation, which pops those two and adds them.
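As a rough illustration of those two flat approaches - this is not from the talk's slides, just a minimal sketch with made-up names - here is how "z = x + y" might be encoded in a tuple-based IR versus a stack-based one:

    #include <string>
    #include <vector>

    // Tuple (three-address) form: each instruction is a flat record
    // holding an opcode and its operands, e.g. (add, x, y) -> z.
    struct TupleInstr {
        std::string opcode;   // "add"
        std::string lhs, rhs; // the two operands
        std::string result;   // where the computed value goes
    };

    // Stack form: operands are pushed first, then "add" pops both
    // values and pushes their sum back onto the stack.
    enum class StackOp { PushX, PushY, Add };

    int main() {
        std::vector<TupleInstr> tupleIR = {{"add", "x", "y", "z"}};
        std::vector<StackOp> stackIR = {StackOp::PushX, StackOp::PushY, StackOp::Add};
        (void)tupleIR;
        (void)stackIR;
    }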
All right, so let's look at the LLVM IR. What it is is a low-level language that is kind of similar to assembly, and it is RISC-like - RISC being a reduced instruction set computer. That makes it more efficient: you can break large, complex commands into tinier commands, and ideally they execute faster. It is also strongly typed, which is unusual for an assembly-like language - usually they're untyped. Here you cannot, say, assign an integer to a float, that sort of thing. And like assembly, where you work with registers, assign values to registers and pull values from registers, this works with registers as well - but it has an infinite set of them. Instead of AX, BX and so on, it has %0 through whatever number you want.

Okay, now let's go to the optimizer. From the previous charts, there are three parts: the front end, the back end, and in the middle sits the optimizer. That's a very vague term, I guess, but hopefully by the end of this you'll get a good sense of what it means and what it is to optimize. What the optimizer of LLVM allows you to do is this: if you take a whole bunch of input languages, or source languages, and a whole bunch of targets, then with the IR that LLVM provides you can translate each of those languages over to that IR and then translate that IR to any target you want, as long as there is a way to translate LLVM IR to that target. This allows for a lot of modularity, as you might guess: with whatever number of targets you have - x86, PowerPC, ARM, whatever - and whatever number of source languages you have, you just need one optimizer to link the two together. You're decoupling the structure, and obviously that is a good thing for development effort.

There are a couple of approaches that existed before and still exist. Take, for example, Java. There you have your Java bytecode, and it runs on the JVM, the Java virtual machine. What this allows you to do is write your code, compile it into the bytecode Java provides, and then run it on any JVM. This is good - it is part of what LLVM provides - but it limits you to a specific kind of structure, and that structure is the Java object model, which is bloated, shall we say. It has a lot of parts, and not every language is a good candidate to translate over to that sort of structure.

You also have the approach where, since C is so ubiquitous and GCC is a very good compiler at what it needs to do - which is compile C and C++ - a lot of languages first compile their source into C, use that as a sort of intermediate representation, and then use GCC to compile it further to machine code. An example of that would be Python: you can compile Python over to C and the C over to machine code. And obviously you have GCC, which is very popular - probably one of the biggest, if not the biggest, open source compiler projects, and a very old project at that. A couple of things come from that: you have a lot of structure that is mediocre, not ideal - a lot of parts of GCC are outdated and bloated - and it's monolithic, so everything is coupled together. If you would like to write a front end for GCC, for some sort of custom language, you'd have to rewrite, well, not all of it, but very major parts. So that's inefficient, and LLVM IR takes care of all of that.

Okay, so that's the IR: taking your source language and parsing it over to some intermediate language. But that is still a language. Now, why do we do that? Would anyone care to guess why you would parse a source language into some intermediate form instead of breaking it straight down into machine code? I don't see any hands going up... [student answer] Better optimization - exactly, yeah. With a representation like that - assembly-like, modular, strictly typed, that sort of thing - you can do analysis on it and get rid of redundancies, and I'll show you how they do it. First of all, you look for a specific pattern that you know is redundant: you take your intermediate representation and search for patterns. Then you pass those patterns on to objects that transform them into something more efficient. Then you check whether it is a safe transformation - whether the transformation will break the rest of the code or not. If you confirm that, then you execute and complete the transformation.

Okay, so that was an overview; I'll get into the specifics of the IR and how it looks with an example. Let's say a typical C function, add1, takes two arguments, a and b, and returns their sum - pretty simple. Now, here is how this looks in LLVM IR, the assembly-like language I was talking about. You have pretty much the same structure: you have your function, you have your two arguments, and the types are there - it's strictly typed, so this is an integer that's 32 bits wide. And again, one of the things that is specific to this language is that the variables are not mutable: once you assign something to a variable, you can't change that value; you have to create a new variable and assign whatever operation you want to it. That's why you have this tmp1: you assign the sum to it, and then you return the sum.
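To make that concrete, here is roughly what that example looks like. The slide itself isn't visible in the recording, so the exact register names are a reconstruction, but this is the standard shape of LLVM IR for such a function:

    // The C function from the slide: add two integers and return the sum.
    int add1(int a, int b) {
        return a + b;
    }

    // Compiled to LLVM IR (for example with: clang -S -emit-llvm), it lowers
    // to something along these lines -- an i32 function with two i32 arguments,
    // virtual registers (%a, %b, %tmp1) drawn from an unlimited pool, and no
    // mutation of values once they are assigned:
    //
    //   define i32 @add1(i32 %a, i32 %b) {
    //   entry:
    //     %tmp1 = add i32 %a, %b
    //     ret i32 %tmp1
    //   }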
The final bit about the LLVM IR is that it exists in three different forms, and at every point you can go back. You start off with the text, the assembly-like language I just showed you. That gets transformed into the in-memory data structures the optimizer uses. And once that is done, you can save it as bitcode and store that on disk. But if you want, you can get back from bitcode to the same assembly-like language and go around the loop again. At every point these representations are the same thing, just in different forms, and you can always get back - sort of like Java bytecode: if you decompile it, you won't see the exact same variable names, but you can still see the same structure you had before. The same applies to LLVM IR.

Okay, now I'll give you three examples of patterns that you can optimize. Let's start with the first one - how would you optimize this? Exactly, it's zero. How would you optimize this one? Okay. And the last one - boom. This is kind of simple, but you get these types of things in your code, and in order to optimize your program you need to get rid of them. So, having this set of patterns and ways to optimize them, we can move over to the actual implementation. If we find things like this in our intermediate code, you obviously get the sense that subtracting a value from itself is zero, that subtracting zero changes nothing, and the same kind of reasoning applies to the third example.

With these, you then go to the optimizer, and here you get to see the whole structure. The structure is pretty simple: you have your instruction, which is an object, and a delegator that hands the optimization off based on whatever operation you were conducting - in this case subtraction, so it goes to the subtraction simplifier - and then you get the value at the end.

Let's begin with the first part. This is the for loop that iterates over all the blocks in your code - just imagine going line by line. And this is the object that delegates: depending on the operation, in this case subtraction, it hands the instruction over to another routine that takes care of the common patterns and the ways to optimize them. These are the three patterns for the three optimizations we discussed - you have your X - 0 = X and the others. This is C++, by the way; LLVM is written in C++, so this is just a method, called SimplifySubInst, and depending on the pattern it returns a specific value: if you subtract zero from something, you get that same thing back; if you subtract A from A, you get zero; and the third operation simplifies back down to A. These are the first and second operands of your operation, and then you get the value, which is the end result of your optimization.
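Here is a rough sketch of that structure. These are not LLVM's real classes - just toy types with the same shape (an instruction with an opcode and two operands, a per-opcode simplifier, and a loop that swaps in the simpler value) - and the third pattern, (X*2) - X, is my assumption about which example was on the slide:

    #include <vector>

    // Toy model of the structure described above -- NOT LLVM's actual API.
    struct Value {
        virtual ~Value() = default;
    };

    struct Constant : Value {
        int val;
        explicit Constant(int v) : val(v) {}
    };

    enum class Opcode { Add, Sub, Mul };

    struct Instruction : Value {
        Opcode op;
        Value *lhs, *rhs;
        Instruction(Opcode o, Value *a, Value *b) : op(o), lhs(a), rhs(b) {}
    };

    // The subtraction simplifier: return a simpler Value if a known
    // pattern matches, or nullptr if nothing applies.
    Value *simplifySub(Value *op0, Value *op1) {
        // X - X -> 0
        if (op0 == op1)
            return new Constant(0);
        // X - 0 -> X
        if (auto *c = dynamic_cast<Constant *>(op1); c && c->val == 0)
            return op0;
        // (X*2) - X -> X   (assumed third pattern from the slide)
        if (auto *mul = dynamic_cast<Instruction *>(op0))
            if (mul->op == Opcode::Mul && mul->lhs == op1)
                if (auto *c = dynamic_cast<Constant *>(mul->rhs); c && c->val == 2)
                    return op1;
        return nullptr;
    }

    // The delegator: dispatch on the opcode to the matching simplifier.
    Value *simplifyInstruction(Instruction *inst) {
        switch (inst->op) {
        case Opcode::Sub: return simplifySub(inst->lhs, inst->rhs);
        default:          return nullptr; // other opcodes not handled here
        }
    }

    // The outer loop: walk every instruction in a block, and whenever a
    // simpler value exists, make every user of the old value use it instead.
    void simplifyBlock(std::vector<Instruction *> &block) {
        for (Instruction *inst : block) {
            Value *simpler = simplifyInstruction(inst);
            if (!simpler)
                continue;
            // Stand-in for LLVM's replace-all-uses step.
            for (Instruction *user : block) {
                if (user->lhs == inst) user->lhs = simpler;
                if (user->rhs == inst) user->rhs = simpler;
            }
        }
    }

In LLVM itself the analogous pieces are the instruction simplifier (SimplifySubInst and friends) and Value::replaceAllUsesWith, and the real code uses LLVM's pattern-matching helpers rather than the hand-rolled checks above.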
That value then replaces the original expression throughout your source - or rather, the assembly-like intermediate representation - and then it's back to the same loop again. This looks kind of simple, right? It's not complex at all, but this simple operation does a lot for the efficiency of your code.

All right, in summary: Chris Lattner, the guy who invented this thing, is a compiler architect at Apple and has been working there since 2005. He invented LLVM as part of a research project at a university. He also designed Swift, and that's why I picked him from the myriad of options to give a presentation on - Swift is my favorite language - and he's an awesome dude for having done all those things. LLVM is an umbrella project that contains a lot of tools in the compiler toolchain; the two main parts we've talked about are the LLVM IR and the optimizer. It has a modular design, and that is the main thing that allows it to be used throughout the industry, letting people work on specific parts of the compiler. And it has a unique intermediate representation, which I showed you, which is the most important part of the design: a first-class, RISC-like, strongly typed language, and, as I showed you on the three-circle slide, it is isomorphic across the textual form, the in-memory form, and bitcode - you can move between them without any problem. All right, that's my part - if you have any questions, I can answer them. Okay, I guess that's it. Thank you.
Info
Channel: Morgan Wilde
Views: 52,716
Rating: 4.8463507 out of 5
Id: a5-WaD8VV38
Length: 20min 27sec (1227 seconds)
Published: Tue Jan 26 2016