2016 LLVM Developers’ Meeting: L. Hames “ORC -- LLVM's Next Generation of JIT API”

Captions
All right, so the talk today is going to be about LLVM's JIT APIs: past, present, and future, mostly focused on the future. It's going to be about 50 minutes, depending on how fast I get through it, and at the end we'll have five or ten minutes of question time for audience members, to make sure you were paying attention. I'll be praising MCJIT, which is our current baseline API, and telling you why the design is really excellent; then I'm going to spend some time beating MCJIT with a stick and telling you why the implementation is kind of broken and why we need to move off it onto something new. The bulk of the talk will be spent introducing the ORC APIs that I believe can provide MCJIT's replacement.

Before I dive into the talk, a word on the code examples: there are going to be a few code examples on the slides today, and everything you see is available in the "Building a JIT" tutorial on llvm.org. That's where you want to go for real code; the code on the slides has been simplified in places just to keep the size down and make it more presentable.

Before we dive into the APIs themselves, we should stop and think about the use cases for LLVM's JIT APIs, because that's going to drive the API design. There's a broad range of JIT use cases for LLVM, and quite a diverse set of requirements, so I've picked a few examples that reflect that diversity: the Kaleidoscope tutorials, LLDB expression evaluation, high-performance JITs, and interpreters and REPLs. I'll go over each of these in more detail now.

The Kaleidoscope tutorials (a lot of people will be familiar with these) are a tutorial series on llvm.org that teaches you how to build a simple language using LLVM. Their purpose is to introduce you to LLVM IR and the LLVM APIs, and the end goal of the series is to build a REPL for this simple functional language. The great thing about the LLVM JIT APIs for this use case is that once you've constructed your IR, you don't have to worry about platform-specific things like system linkers and system libraries: you can just add your IR to the JIT and run it in place, in memory. This is an important use case for us to support because it's an entry point for a lot of people into the LLVM community, and the only real restriction it places on our JIT APIs is that they should be simple and safe for basic usage, which is something we want anyway.

The next use case is LLDB expression evaluation, which is another one a lot of people will be familiar with. When you're in LLDB debugging a program, you break on a breakpoint or pause the program, and you can evaluate an expression in the source language that you're debugging. LLDB uses Clang, or whatever other front end you're using, to turn that source expression into LLVM IR, and then it uses the LLVM JIT APIs to execute that IR in the context of the process that you're debugging. The interesting thing here is that LLDB can debug programs on the other end of a network connection, and those programs don't have to be running on the same architecture as the machine you're running LLDB on. So LLVM's JIT APIs need to support cross-target compilation the same way the static compiler pipeline does, which is an interesting feature for JIT APIs to have to support.

Next up is high-performance JITs, and I'm thinking here of things like shader compilers, or the top-tier optimizers for tiered JITs.
People who are using LLVM for high-performance JITing want to get the best quality code out of their JIT for whatever target they're running on, so what they really want is the ability to configure the optimization and codegen pipelines. Apart from that, in a lot of cases high-performance users just want the JIT APIs to be as simple as possible; in particular, if you're building a tiered JIT you don't want LLVM to force JIT features onto you. You want to be able to slot LLVM's JIT APIs into your existing JIT infrastructure.

And finally we've got interpreters and REPLs. What we need for this use case is lazy compilation. The great thing about building a language on top of LLVM is that once your front end can produce IR, you have a static compiler for a whole bunch of different targets; it would be great if the JIT APIs let us build a JIT-compiling interpreter just as easily. But to do that we don't want to have to compile all the IR in our program up front; we need to be able to compile each function the first time it's executed, otherwise we're going to have horrible launch times. If you had assumed that this is what LLVM's JIT APIs were already doing, this might be news to you, but LLVM's JIT APIs have not always supported lazy compilation: some have, some haven't. The other thing we really need for interpreters and REPLs is equivalence with the static compiler pipeline. You don't want your front end to have to know whether it's generating code for a program that's being compiled ahead of time or one that's going to run through the JIT; you want to emit the same IR and have it behave the same way either way. But that means giving meaning to IR constructs like symbol linkage and symbol visibility that have traditionally been meant for the static linker, so that's something we want to do in our JIT APIs.

These use cases introduce, as I said, requirements that have some tension between them. We want the APIs to be simple for beginners but configurable for advanced users. We want them to be cross-target for LLDB, but if you're using LLVM to add scripting support to your application, you want to be able to run the JITed code in-process so that it can interact with your application code. We want things to be lazy for interpreters, but non-lazy for high-performance users. We can support all of these requirements with LLVM's JIT APIs, but we can't do it behind a single interface, which is what we've historically tried to do.

That interface is called ExecutionEngine. The core of this interface is actually really simple and intuitive. You have an addModule method that lets you add an IR module to the JIT. You have a getPointerToFunction method that lets you get a pointer to the compiled code for any given function in your IR. And you have an addGlobalMapping method that lets you create a mapping between a declaration in your program and a definition you've already compiled into your application; this allows JITed code to access global variables and functions in your application, if that's something you want to support. So this core is actually pretty nice and pretty easy to understand, although, as it's written, there are some problems with it, as we'll see.
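For reference, that core looks roughly like the following. This is a paraphrase based on the description above and the ExecutionEngine header of that era, not an exact copy of its declarations:

    // Paraphrased core of the ExecutionEngine interface (not the full class).
    class ExecutionEngine {
    public:
      // Take ownership of an IR module so its definitions can be compiled and run.
      void addModule(std::unique_ptr<Module> M);

      // Return the address of the JITed code for a function in the added IR.
      void *getPointerToFunction(Function *F);

      // Map a declaration in the JITed IR to a definition that already exists in
      // the host application, so JITed code can reference application symbols.
      void addGlobalMapping(const GlobalValue *GV, void *Addr);
    };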
But over time, as LLVM's JIT implementations have acquired different requirements, this API has grown all sorts of non-orthogonal dials and levers and options and callbacks, and the end result is that it's no longer safe or easy to use at all. Most people get by by setting up an ExecutionEngine once, backing away slowly, never touching it again, and hoping it just keeps working.

So let's look at the implementations of this interface. There have been three over LLVM's history. The first one we now refer to as the legacy JIT: it was introduced in LLVM 1.0 and persisted through to LLVM 3.5; it was removed in the LLVM 3.6 timeline and didn't make it into the 3.6 release. This is the implementation that introduced the ExecutionEngine interface, and it did support lazy compilation, but only for in-process compilation; you couldn't do cross-target compilation with it. In LLVM 2.9 we acquired a new JIT implementation called MCJIT, based on the then fairly new LLVM MC layer. This also implemented ExecutionEngine, and it supported cross-target compilation, so it was the first implementation to support LLDB's expression-evaluation use case, but it lost support for lazy compilation. That means when the legacy JIT was deleted, we lost lazy compilation from LLVM. Then in LLVM 3.7 we acquired the ORC APIs. These are our forward-looking APIs, so I'll talk about their feature set later in this talk, but importantly they don't implement ExecutionEngine: they make a very deliberate break with that history, to get away from the baggage it carries.

ORC's design is, however, inspired by MCJIT's, because MCJIT has a really interesting design. MCJIT is really just the static compiler in memory, with a little JIT linking stage tacked on the end. When you add a module to MCJIT using the addModule method in ExecutionEngine's interface, that module gets run through the LLVM codegen and MC layers to produce an object file in a buffer in memory. If you were to dump that buffer out to disk, you would have a valid object file for your target platform. MCJIT doesn't dump it to disk; it keeps it in memory and runs the RuntimeDyld linker over it to turn that object file into raw executable bits for your target platform, and then you can use ExecutionEngine's query interface to grab back addresses in those raw executable bits and run the functions.

This design represents really efficient code and tool reuse: we're reusing the whole static compiler pipeline. That means, for instance, that when somebody adds support for a new instruction set extension, we don't have to do anything special in the JIT to get access to it; we get it for free. It also means that if anything goes wrong in the JIT, it either goes wrong in the RuntimeDyld component, which is fairly isolated, or it goes wrong in the static compiler as well, in which case we can debug the problem in the static compiler rather than needing a separate set of tools for debugging and fixing our JIT. From the static compiler pipeline we also inherit support for cross-targeting: the static compiler has no problem targeting other architectures, so MCJIT inherits that. Now, MCJIT won't help you a whole lot beyond that (it won't allocate memory on your remote machine, it won't help you ship the bytes out there), but it will compile and link those bytes for a remote target.
But we also inherited from the static compiler pipeline a lack of laziness: the static compiler compiles whole modules up front and doesn't defer any work, so neither does MCJIT. Overall, though, this design is very good and it has served us very well. The problem is that MCJIT's implementation is hidden behind the ExecutionEngine interface, so we can't get access to it directly. The MCJIT class definition is buried in the library; there is absolutely no way for you, as a user outside the library, to get an MCJIT pointer that you could call operations on directly. That meant any feature MCJIT needed to expose had to be surfaced through the ExecutionEngine interface, which caused the interface to bloat over time, and at the same time MCJIT can't actually support all of the operations that are in the ExecutionEngine interface.

I said you didn't really want to see how bad ExecutionEngine had gotten, but I will show you this one bit. This is our symbol query function, and it looks fine as written, but it was written for the legacy JIT, which was in-process only. That's why it returns a void pointer: the result was originally an address in your own process. MCJIT does cross-target JITing, so the address you get back is not necessarily an address in your process; you don't want to dereference it, and you don't want to cast it to a function pointer in your process. Worse still, because you can compile cross-target, you could be a 32-bit MCJIT instance compiling for a 64-bit target, and that 32-bit void pointer can't even hold an address in the target process, so the result is likely to be wrong. We fixed this by adding some new methods that return uint64_t, and some other variants, and we ended up with this. That's not even all of it: I haven't shown the overloads for those functions, and there are more. The thing that kills me about this interface is that underneath all of it there is one string symbol table with uint64_t values in it, and all of these methods are just querying that. When you use a Function pointer as the key, we just call getName on the function and then look up that string symbol table. It really doesn't need to be this way; it's just what we've ended up with over history.

Just a couple more things before I finish beating on ExecutionEngine. (I have a lot of aggression to work out against this interface; it's been a couple of years, so I appreciate your patience.) Being stuck behind the interface means you have limited visibility into internal actions in the JIT: if we haven't surfaced a callback for some action that the JIT takes, you don't get to know about it. If you want to know when the JIT starts compiling an object, or starts compiling an IR module, there's no callback for that; you can't find out about it. And last of all, MCJIT is pretty terrible about memory management. Because the symbol query interface lets you query based on pointers into your IR, MCJIT will take ownership of a module you add but never throw it away, since it doesn't know when you're done making queries based on it. The upshot is that if you're using MCJIT and you're not being careful, you're likely to end up with three copies of your JITed program in memory: the IR that you added originally, the relocatable object file that was produced by the compiler, and the final raw executable bits. If your JITed program is tiny, that's probably not a big deal, but as your JITed program grows, that becomes really awful.
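For a flavour of the query-interface growth described a moment ago, the accumulated lookup methods look roughly like this (a paraphrase of that era's ExecutionEngine header; the extra overloads mentioned above are omitted). Underneath, every one of these consults the same string-keyed table of uint64_t addresses:

    // Legacy, in-process-only queries: return pointers in the host process.
    void *getPointerToFunction(Function *F);
    void *getPointerToNamedFunction(StringRef Name);

    // MCJIT-era additions: return target addresses as uint64_t.
    uint64_t getFunctionAddress(const std::string &Name);
    uint64_t getGlobalValueAddress(const std::string &Name);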
All of these problems prompted me to think: okay, let's go back, take MCJIT's design, which we really like, and rethink the implementation. That's what led to ORC. ORC is short for On Request Compilation, which is just another way of saying lazy compilation. Remember, we lost that feature when the legacy JIT went away, and it's something we really wanted back, so that was the genesis of the name. The name is also kind of an in-joke: we already have the ELF object format and the DWARF debugging format, and I felt the bad guys were under-represented, so I wanted to address that. Some people have expressed distaste for this process of coming up with the acronym first and working backwards to the expansion, but as Tanya pointed out, the history of the LLVM project tells us this is the way you want to go: the acronym is what you're going to be stuck with, so that's what you want to nail down.

So we're going to rebuild MCJIT in a modular way. Take the design as it stands now; this is buried in the library, so you can't get at it, but this is what it looks like. The first thing we do is break it up a little and move the definitions up into headers where everybody can get access to them. We split it into a compile layer and a link layer. I'll show the interface to these layers in a moment, but it looks like the core of ExecutionEngine, just tidied up a little: you can add a module, and you can query for symbols. If you add a module to the compile layer, it goes through the codegen and MC layers to produce an object file, which gets added to the layer below, the link layer, which links it and produces raw executable bits. If you make a symbol query on the compile layer, it just forwards the query down to the layer below.

Even with this really small bit of decomposition we get some benefits. One is that we can now test these layers in isolation; for instance, we can unit test the link layer, which is something we couldn't do before. The other is the ability to observe some events without needing callbacks any more: you have a composition point, so you have some visibility into how the JIT works. If you want to see when an object file gets produced, you don't need a callback; you can write your own class that conforms to the layer interface and slip it in between the compile and link layers, and now you have a notification layer that gets notified whenever an object is produced in the JIT.

The interface that lets us do this is a concept; there's no class that defines it, but any class that conforms to it counts as a layer. You have an addModule method that takes a module, a memory manager, and a symbol resolver. The memory manager is going to own the executable bits that the JIT produces, and the symbol resolver is responsible for symbol resolution; it plays the same role that the global mapping did back in ExecutionEngine, but it lets you do it on a per-module basis rather than having one mapping for the whole JIT. Then we have our symbol query interface, which is now mercifully a single function, findSymbol, which uses a string symbol name as the key for lookup. This means ORC doesn't have MCJIT's problem of having to keep the intermediate representation alive in case you use it for queries: as soon as ORC is done compiling your module, it can throw away its reference to it, and if you haven't retained any references yourself, the memory can be freed up front. There's also a boolean argument to findSymbol that's meant to help with static-linker compatibility: it says whether you want to be able to see hidden symbols in the modules you've added to the JIT. If you don't know what those are, the flag probably doesn't matter to you and you don't need to use it, but if you want to match static linker behaviour, this lets you do it. Finally, there's a call that doesn't have a counterpart in ExecutionEngine: removeModule. This allows you to remove a module from the JIT and free any resources associated with it, like the compiled code.
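As a sketch, the layer concept looks something like this. There is no base class to inherit from (any class with this shape counts), and the exact parameter, handle, and symbol types have varied across LLVM releases, so treat the signatures below as illustrative rather than exact:

    // Illustrative shape of an ORC layer (a concept, not a real base class).
    class SomeLayer {
    public:
      // Opaque handle identifying a module added to this layer.
      // (uintptr_t is a stand-in; the real handle type is layer-specific.)
      using ModuleHandleT = uintptr_t;

      // Add a module. MemMgr owns the executable memory produced for it, and
      // Resolver answers the symbol lookups made while it is being linked.
      ModuleHandleT addModule(std::unique_ptr<Module> M,
                              std::unique_ptr<RuntimeDyld::MemoryManager> MemMgr,
                              std::unique_ptr<JITSymbolResolver> Resolver);

      // Look up a symbol by name. ExportedSymbolsOnly controls whether hidden
      // symbols are visible, to match static-linker behaviour.
      JITSymbol findSymbol(StringRef Name, bool ExportedSymbolsOnly);

      // Remove a module and free the resources tied to it (e.g. compiled code).
      void removeModule(ModuleHandleT H);
    };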
Let's take a quick look at what it looks like to put some of these things together. These three declarations basically get us back to MCJIT: an object linking layer; a SimpleCompiler, which is a utility that knows how to turn IR into object files; and a compile layer that takes a reference to the link layer, so it knows where to send its output, and a reference to the compiler, so it knows how to compile modules. When you're composing things like this yourself, I imagine you'd write your own JIT class that wraps this up and exposes whatever interface makes sense for your use case; in these examples I'm just going to write the code in straight-line fashion. Once you've composed these three things, actually using them to run code is reasonably simple: you add a module to the JIT along with a memory manager and a symbol resolver, call findSymbol to get back a reference to the symbol you're looking for, get its address, cast it to a function pointer, and call it. That's all it takes. Hopefully, if you've never seen the JIT APIs before, this is encouraging: it really only takes a few lines of code to get to the point where you can execute LLVM IR in-process through the JIT, so this is hopefully good for beginning users.

I think this interface is good for advanced users too. Remember, one of the things they needed was direct configuration of the optimization and codegen pipelines. In MCJIT you had to do that by going through ExecutionEngine's interface and toggling various compiler flags; here, you supply the compiler, so you can compile the IR however you want, and you have total control over that. The other thing that isn't shown on this slide is that this is much better at memory management: because we're using string keys for lookup, as I mentioned, ORC can throw away your LLVM IR as soon as you're finished with the addModule call, so you only have one representation of your program in memory at any one time.
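Pieced together, the straight-line version being described looks roughly like the following. This is a condensed paraphrase of that era's BuildingAJIT tutorial code rather than the exact slide code; names like TM (a TargetMachine) and M (the module being added) are assumptions, and the precise class names and addModule/getAddress signatures have changed in later releases:

    // Compose an MCJIT-equivalent JIT out of ORC layers.
    orc::ObjectLinkingLayer<> LinkLayer;                  // JIT linker (RuntimeDyld)
    orc::SimpleCompiler Compile(*TM);                     // IR -> object file
    orc::IRCompileLayer<decltype(LinkLayer)> CompileLayer(LinkLayer, Compile);

    // A per-module symbol resolver; the two lambdas are explained below.
    auto Resolver = orc::createLambdaResolver(
        // In-image lookup: let modules in the JIT see each other's symbols.
        [&](const std::string &Name) {
          if (auto Sym = CompileLayer.findSymbol(Name, false))
            return Sym;
          return JITSymbol(nullptr);
        },
        // External lookup: fall back to symbols defined in this process.
        [](const std::string &Name) {
          if (auto Addr = RTDyldMemoryManager::getSymbolAddressInProcess(Name))
            return JITSymbol(Addr, JITSymbolFlags::Exported);
          return JITSymbol(nullptr);
        });

    // Add a module, then look up a function, cast its address, and call it.
    auto H = CompileLayer.addModule(std::move(M),
                                    std::make_unique<SectionMemoryManager>(),
                                    std::move(Resolver));
    auto FooSym = CompileLayer.findSymbol("foo", /*ExportedSymbolsOnly=*/true);
    auto *Foo = (int (*)())(intptr_t)FooSym.getAddress();
    int Result = Foo();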
Now, I glossed over how you get the memory manager and the symbol resolver. A memory manager, as I said, is something that owns the executable code produced by the JIT; it frees that memory when it's destructed. A memory manager is anything that inherits from the RuntimeDyld memory manager interface, so you can implement custom memory managers for your use case if you want to optimize, but there's one called SectionMemoryManager that provides a good default for most people; if you don't need to optimize, you can just take it off the shelf. Next up is the symbol resolver. I found myself playing a lot of games with symbol resolution while building ORC, so I wrote a function that makes it reasonably easy to construct one of these resolvers: createLambdaResolver. As the name suggests, it takes a pair of lambdas and gives you back something that conforms to the symbol resolver interface. The reason it's a pair of lambdas rather than just one is, again, about matching the behaviour of the static linker. In the static compiler pipeline there are two different reasons the static linker would look up the definition of a symbol. One is the obvious case: you need its address, because you're trying to call the symbol or somebody's taking a reference to it. The other reason is that the linker has a weak or common linkage definition of that symbol, and it needs to look elsewhere to see whether anybody else has a stronger definition that would override it. That's why there are two lambdas: they correspond to those two kinds of lookup, and the lookups have different scope. If you don't know much about static linkers: when you're looking for overriding definitions, you look around elsewhere in your program, but you don't look into the libraries you're linking against. So the first lambda implements this kind of in-image lookup, and I usually implement it by pointing the lookup back into my JIT, so that if I add multiple modules they can all see each other's symbols. The second lambda implements external lookup, and I usually implement it by having it fall back to looking in my process for functions defined there; that way my JITed code can call into code in my process.

So, the story so far, before we get more abstract: we have layers that wrap up JIT functionality and make it composable, and the idea is that you should be able to build custom JITs with this API by composing layers. Memory managers take care of memory ownership, and symbol resolvers handle symbol resolution. But everything we've seen so far is really just a refinement of MCJIT. We've tidied up the API a little and done better with memory management, but we haven't added any new features yet, and that was supposed to be part of the point of the ORC project. The promise was that new layers would provide new features.

The first new layer we added was called CompileOnDemand, and if you had been missing lazy compilation as a feature, I have good news for you. When you add a module to the compile-on-demand layer, nothing gets compiled up front. Instead, the layer scans over the module you've just added and, for every function in it, builds a stub, and arranges that the first time you call that stub you jump back into the compiler, which extracts that function from the module into its own module and runs it down through the compiler. When you query for symbol addresses in the compile-on-demand layer, those queries resolve to the stubs, so everybody always agrees about the addresses of functions in this world. That means you can JIT code for languages where function addresses matter, like C: you can take a function pointer to a function that hasn't been compiled yet, and you still have a valid address for it.

Let's look at what this is like to use. This was our example from before, without laziness; we can turn it into a lazy compiler for IR just by adding the compile-on-demand layer on the end. We have one extra CompileOnDemandLayer declaration, hooked up to the compile layer below, and now we add our module to the compile-on-demand layer instead and look up the symbol there. No work gets done up front: when you reach the last line, you're calling foo's stub and nothing has been compiled yet. When we make that call, we jump into the compiler, compile the foo function, and then run it.
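Rendered as code, the lazy version adds roughly one more declaration on top of the earlier composition. This again paraphrases the BuildingAJIT tutorial of that era: the compile-on-demand layer's constructor also wants the callback and stubs managers introduced next, plus a partitioning functor, and the helper names below reflect that era's API rather than today's:

    // Re-entry points for lazy compilation (explained in the next section).
    auto CallbackMgr = orc::createLocalCompileCallbackManager(
        Triple(TM->getTargetTriple()), /*ErrorHandlerAddress=*/0);

    // Stack a compile-on-demand layer on top of the compile layer.
    orc::CompileOnDemandLayer<decltype(CompileLayer)> CODLayer(
        CompileLayer,
        [](Function &F) { return std::set<Function *>({&F}); },  // one function per partition
        *CallbackMgr,
        orc::createLocalIndirectStubsManagerBuilder(Triple(TM->getTargetTriple())));

    // Add the module to the lazy layer instead; nothing is compiled yet.
    auto H = CODLayer.addModule(std::move(M),
                                std::make_unique<SectionMemoryManager>(),
                                std::move(Resolver));

    // This resolves to foo's stub; foo itself is only compiled when called.
    auto FooSym = CODLayer.findSymbol("foo", true);
    auto *Foo = (int (*)())(intptr_t)FooSym.getAddress();
    int Result = Foo();   // first call triggers compilation of foo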
But to do this, the compile-on-demand layer needs help from a couple of components. It needs an indirect stubs manager, which is just something that lets you create named indirect stubs, that is, indirect jumps via pointers. The indirect stubs manager also helps you update those pointers, so you can change where the stub for any given IR function points. And then we need somewhere to point those stubs initially, which is where the second component comes in: the compile callback manager. This lets you create compile callbacks, which are re-entry points that get you back into the compiler and allow you to compile a function.

So let's really dive into what this looks like. Say we have a function bar that's already been compiled, and a function foo that hasn't been compiled yet. We have a stub for foo, which is just an indirect jump via a pointer, and we have to have some initial address for that pointer: that's going to be a compile callback for the foo function. When we call foo, we jump to the stub; the stub, because there's no implementation yet, jumps to the compile callback; and the compile callback jumps through a piece of code called the resolver, which gets us back into ORC and LLVM and lets us compile the function. The resolver is also responsible for saving all of the program's state as it was at the moment you made the call. We go through the compiler, produce an implementation for foo, and exit back through the resolver, which restores the program's state to the way it was just before the call; then we enter the implementation of the function and eventually return to the original caller. On the way back through the resolver we also update the implementation pointer for the stub, to take the compiler out of the loop, so that all future calls through the stub jump directly to the implementation of the function.

That's how this is implemented under the hood, but the more interesting question is what the API to control it looks like. Say you have one of these stubs managers and callback managers: how do you actually use them? If you want to lazily compile a function, you create a compile callback for it first, and then you use the stubs manager to create a stub for it. You give the stub a name, you initialize it to point at the address of the compile callback, and then, because we're trying to match static linker behaviour, you give it a linkage type; we'll say it's an exported symbol. Now we need to tell the compiler how to compile foo if anybody ever calls it, and you do that with the compile callback's setCompileAction function. You give the compile action some function that compiles the code you want to run, and it has to return a target address for the implementation of the function once it's compiled. So if anything jumps to the compile callback, we run this action (I've written it as a lambda just to keep it easy), and then jump to whatever target address the lambda returns. For now I'm going to fill the lambda in with a printf of "hello world" as my "compilation" step, and just return zero. Now we can look up our foo function in the stubs manager, get its address, cast it to a function pointer, and call it: that jumps into the lambda, prints hello world, and then the code jumps to zero and crashes. But we know how to turn this into a real compiler; we saw it in the very first code example. If we have access to a compile layer and we have some IR for our foo function, we can just add that IR to the compile layer inside the lambda, then look up foo and get its address. It really isn't much extra code to turn this into a working example that lazily compiles foo from IR, directly through the callback interface.

And the great thing about accessing this interface directly is that you can now push the laziness further up your compiler pipeline. LLVM only has access to the IR representation of your program; we don't have your AST representation in the LLVM libraries, but you do, and you have your IRGen method, so you can put IRGen inside the lambda. I've changed the example so that we now IRGen from the AST inside the lambda: now you're compiling lazily from ASTs. Having direct access to this API allows you to be as lazy as you would like to be in compiling your program.
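Put into code, the flow just described looks roughly like this. It's a condensed paraphrase of the lazy-compilation chapters of the BuildingAJIT tutorial, using that era's manager APIs; CallbackMgr, CompileLayer, and Resolver are the components set up earlier, the stubs manager is created for the target triple, and irGenFooFromAST is a stand-in for whatever your front end provides:

    // Stubs manager: creates named indirect stubs (jumps via updatable pointers).
    auto StubsMgr = orc::createLocalIndirectStubsManagerBuilder(
        Triple(TM->getTargetTriple()))();

    // 1. Create a compile callback: a re-entry point into the compiler.
    auto CCInfo = CallbackMgr->getCompileCallback();

    // 2. Create a named, exported stub that initially points at the callback.
    StubsMgr->createStub("foo", CCInfo.getAddress(), JITSymbolFlags::Exported);

    // 3. Describe how to produce foo's implementation when the callback fires.
    //    The action returns the compiled code's address, and we update the stub
    //    pointer so later calls bypass the compiler entirely.
    CCInfo.setCompileAction([&]() {
      auto M = irGenFooFromAST();                       // stand-in: AST -> IR module
      CompileLayer.addModule(std::move(M),
                             std::make_unique<SectionMemoryManager>(),
                             std::move(Resolver));      // a lambda resolver as before
      auto Addr = CompileLayer.findSymbol("foo", false).getAddress();
      StubsMgr->updatePointer("foo", Addr);             // take the compiler out of the loop
      return Addr;
    });

    // 4. Calling through the stub triggers the compile action the first time.
    auto FooSym = StubsMgr->findStub("foo", true);
    auto *Foo = (void (*)())(intptr_t)FooSym.getAddress();
    Foo();   // IRGens and compiles foo lazily, then runs it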
A quick recap on this lazy compilation feature: callbacks and stubs give you direct access to lazy compilation and allow you to push laziness earlier in your compiler pipeline, and if you're happy to produce all your IR up front, the compile-on-demand layer provides off-the-shelf laziness for LLVM IR. So ORC supports arbitrary laziness with a reasonably clean API.

That's laziness. I want to introduce just one other layer that we've got in-tree at the moment, which is very abstract: the transform layer. This is a layer that lets you run an arbitrary transform over any module that you add to the JIT, which is useful if you want to, say, do some logging. We run the transform over anything that gets added, and symbol queries get forwarded to the layer below. You could add one of these above the compile-on-demand layer, for instance, to apply some lightweight optimizations up front; maybe you inline all your getters and setters, because you don't want to be going through stubs for every one of those, but you wouldn't apply heavyweight optimizations up front, because that defeats the purpose of being lazy. The nice thing about having these layers compose is that we can also add an extra transform layer below the compile-on-demand layer. That one only has modules added to it lazily, so any optimizations we put in the transform layer beneath compile-on-demand only run on code that actually gets executed.
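In-tree this idea exists as an IR transform layer. Here is a sketch of stacking one above the compile layer, paraphrasing the optimization chapter of the BuildingAJIT tutorial; optimizeModule is a stand-in for whatever lightweight transform or logging you want to run, and the module pointer type varies by release:

    // A transform layer wraps another layer and runs a functor over every module
    // added to it before handing the module to the layer below.
    using TransformFtor =
        std::function<std::unique_ptr<Module>(std::unique_ptr<Module>)>;

    orc::IRTransformLayer<decltype(CompileLayer), TransformFtor> OptimizeLayer(
        CompileLayer,
        [](std::unique_ptr<Module> M) {
          return optimizeModule(std::move(M));   // stand-in: run cheap passes, or just log
        });

Modules would then be added to OptimizeLayer instead; the same kind of layer stacked below a compile-on-demand layer would only ever see the functions that actually get executed.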
all the other users but the other big feature of MC jet that was really interesting was this remote jet support so just a reminder that's the ability to execute code on different processes on a different machine or even on a different architecture one of the really interesting things that this allows you to do is sandbox your edit code if you're getting something like C that's not particularly memory safe and the code that you're getting has a memory leak and you're doing that jetting in process with now your jet itself has inherited that memory leak if you can jet across processes now you can sandbox the code that you're jetting you can have tighter security restrictions on that jet in code now MCG 8 supported this but it required a lot of manual work from you to use it you were responsible for shipping the bits to the remote end laying out the memory there changing the memory permissions and making sure the code executed I wanted to make that a little bit easier so there's now an orc remote target client server pair that provides a high level API for doing remote this thing gives you remote mapped memory managers stub managers and callback managers it'll handle remote simple queries for you and it allow you to execute remote functions so this was our example from a moment ago where we lazily compiled from AST in our process let's turn that into a lazy remote compilation of AST to a different process we just add a remote target variable up the very top there and then we use that remote target to create our stubs manager and our callback manager when we add the module to the jet we create a memory manager using the remote target and then when we finally call it we're no longer casting to a function pointer and jumping to the stub because this isn't a stop in our process anymore instead you use the remote target to call the function in the remote process and the remote target API has prototypes for a few common function types so alright let's try this out there we go ok so on the left hand side of this screen we have a JIT client this is where the compiler lives and this is where our JIT source lives on the right hand side of the screen we have a terminal and in that terminal I'm going to suborn the jet server process conceptually the easiest way to think about this is it's a blank process it doesn't know how to do anything it just has this tiny JIT server protocol sitting in there the JIT server protocol knows how to allocate memory how to set page permissions and how to jump to code doesn't know anything else now I'm going to jump over to my client here like good computer compiler scientists I'm going to start by opening hello world oops and I'm gonna run hello world Oh first I have to actually connect to this thing there we go I've connected to my JIT server I don't know if you can see because the fonts rather small but it's done a handshake the JIT client has found out what the target triple is for the thing it's compiling for now I can hit run on my hello world example and you'll see a prints hello world on the JIT so the server-side and then prints jet program finished so on the client side we compile this function pushed it over these two it's connected by a TCP connection pushed it over the TCP connection and ran it on the JIT server side and each of these functions was compiled lazily as we hit it we've only got a main function in the source file but we're using the i/o streams header so if you're wondering where all the other functions came from that you can see little boxes down 
All right, let's try this out. On the left-hand side of the screen we have a JIT client; this is where the compiler lives and where our JIT source lives. On the right-hand side we have a terminal, and in that terminal I'm going to spawn the JIT server process. Conceptually, the easiest way to think about it is as a blank process: it doesn't know how to do anything, it just has this tiny JIT server protocol sitting in it. The JIT server protocol knows how to allocate memory, how to set page permissions, and how to jump to code, and nothing else. Now I'll jump over to my client, and like good compiler scientists we're going to start by opening hello world. First I have to actually connect to the server; there we go, I've connected to my JIT server. I don't know if you can see it, because the font is rather small, but it's done a handshake, and the JIT client has found out what the target triple is for the thing it's compiling for. Now I can hit run on my hello world example, and you'll see it print "hello world" on the JIT server side and then print "JIT program finished". So on the client side we compiled this function, pushed it over a TCP connection (these two are connected by TCP), and ran it on the JIT server side, and each of these functions was compiled lazily as we hit it. We've only got a main function in the source file, but we're using the iostreams header, so if you're wondering where all the other functions came from, the little boxes down the bottom, they all came from the iostreams header. The boxes are the functions: functions that haven't been compiled yet are red, and they turn green as they get compiled. This is straight-line code, so there's nothing very interesting going on; let me pull up something a little more interesting.

This program has two functions in it: one called foo, which prints "invoked foo" on the screen, and one called bar, which prints "invoked bar". The main function just sits in a loop reading input off the command line: if I type foo it runs the foo function, if I type bar it runs the bar function, and if I type anything else it does nothing. I'm going to put the mic down for this. [Music] Thank you very much, Chris. Okay, the interesting thing about this demo is that we haven't entered any input yet, so we haven't even compiled the functions in this program that deal with command-line input. The first time I type some nonsense, you'll see a bunch of functions light up as we compile the code needed to deal with that input: the string comparisons and all the basic things this program has to do. I can type more nonsense, and nothing new gets compiled. If I type foo for the first time, we have to compile the foo function, and this is also the first time the program has written anything to standard output, so we have to compile all the library functions that deal with standard output too. So I type foo, we print "invoked foo", and we compile all that code on the other side. The bar function down the bottom still hasn't been compiled (I think it has unfortunately been chopped off the screen), but if I type bar, bar also gets compiled, and now we've compiled all of the program that we can. So we can lazily compile over a TCP connection. I've only done this x86-64 to x86-64, but you can do it across architectures as well. I'll type "done" and our JIT program finishes.

That's all well and good for toy examples; let's throw something a little bigger at it. I'm going to pull in the IR this time, because I don't want to spend time compiling all the C code: this is the IR for 403.gcc from the SPEC benchmarks, which we run every night in the nightly test suite when we're testing LLVM. So this time our blank JIT process is going to have an entire compiler poked into it, one function at a time, over the wire, and you're going to see it compiling preprocessed C code and outputting assembly in the JIT server; we're going to turn our blank process into a compiler. I'll hit run; this pauses for a moment while it parses all of the textual IR, and then we see a much bigger grid of functions pop up. Here we've compiled enough to produce the data section, and in a moment we'll have enough code to produce the text section. There we go. That's compiling roughly six hundred thousand lines of C code with a compiler that we poked over the wire, over a TCP connection. This is all done with in-tree stuff (you can do this with what's in-tree in LLVM today), and for something that hasn't been performance-tuned yet, it's remarkably responsive.

So, remote JIT support: it's actually really easy to do remote JITing with ORC, and remoteness is orthogonal to laziness and all the other features; you can be remote and lazy, or remote and non-lazy.
Before I dive into the great things you could do with this, do consider the security implications: if you use this in your program, you wouldn't be the first program to support remote execution of arbitrary code. Please sandbox the server, authenticate the client, and take some action to secure the channel you're talking over, so that random people aren't poking code into your machine. Treat this like mains electricity: it's very useful, but safety first.

This kind of API does give us some interesting opportunities, though. We could potentially enable new development modes with it. I would love to have just an edit-test cycle rather than an edit-compile-test cycle. If we could get the JIT and the build system to cooperate, it would be great to get to the point where, when I change the code for one of my LLVM optimizations, I can just rerun the application and have the JIT and the build system compile only the functions that are needed to rerun that one test. That would speed things up a bit. And this kind of dynamic development cycle gets a lot more interesting when you start considering remote development, which is something we might all be doing a lot more of in the future. If I'm developing for my Raspberry Pi, I don't want to develop on the Pi, because that's slow; but if I develop on my desktop I have this heavyweight edit-compile-deploy-test cycle. I would love to be able to just push my code across to the Pi and have the compiler figure out what needs to be done. At the other end of the scale, you can imagine this being useful for distributing work to compute clusters: you could have very simple compute nodes that just know how to run this JIT server protocol, and you poke code into them. I know LLVM's JIT APIs have been used before for doing distributed database queries over large data sets, and this could make that kind of work easier too.

A quick word on ORC versus MCJIT: it's the same underlying architecture, static compiler plus JIT linker, but ORC offers a strict superset of the features and a more flexible API; it supports remoteness and laziness, and it has better memory management. So I think we should consider deprecating MCJIT and moving on to ORC. To ease that transition there is an OrcMCJITReplacement class that is a bug-for-bug reproduction of MCJIT on top of ORC: if you're using MCJIT right now, you can flip a switch on EngineBuilder and be moved over to this replacement class, and I would urge you to try that. If everybody's happy doing that, we can delete the MCJIT class, and you'll still have the same API and the same feature set. Of course, the long-term goal is to kill off ExecutionEngine entirely, so this is an interesting time to start thinking about designing new JIT support for lli and the C API.

It's also an interesting time to start thinking about contributing new layers and components. A lot of the people who have looked at this API on the dev lists have expressed an interest in, for instance, hot-function recompilation support. We don't have anything in-tree that does that yet, but it would be easy to build a layer that adds instrumentation to detect hot functions and recompiles them at higher optimization levels. There's also, sadly, plenty of API cleanup to be done: all the core abstractions are in place for ORC, but there's a lot of room to put more polish on them. I could use a lot of help with architecture support: I've implemented support for laziness for x86-64, i386, and ARM64, but that still leaves a lot of architectures without support for lazy JITing.
We could also really use some work on RuntimeDyldELF to clean the codebase up: we inherited that from MCJIT, it's quite old now, and it shows. If you want to get involved, I'd encourage you to check out the Building a JIT tutorial, which runs through all the things you saw today on the slides, and then go to llvm.org/bugs; there's an ORC component, and that's where I'll be filing bugs. So, that's it.

[Q&A] I think that mic is turned off; you can either come to this mic here, or I can run this one out. We've got some questions over that way.

Q: What's the schedule for doing the deprecation?
A: Let's talk that over with the MCJIT clients. But honestly, I switched LLDB, which was using MCJIT, over to this OrcMCJITReplacement class about three months ago, and nobody noticed. So honestly, we could delete it by the end of the conference if everybody's happy.

Q: How do you handle multi-process or multi-threaded stubs, where you're compiling and then another thread calls the same stub?
A: We don't yet, but I would really like to. There are a couple of missing pieces associated with threading; we also don't have thread-local storage relocation support, and I would love to get that. That is, by the way, a really interesting problem: on multi-core machines you can be compiling ahead of where you're going to execute, speculatively. I really like that idea.

Q: Is there any performance overhead compared to MCJIT?
A: No, I don't think there is; it depends a little on your use case. Somebody pointed out that if you add tons of small functions to this JIT, the symbol query interface at the moment doesn't scale well, but that's an easy fix. There's nothing that's obviously slower; I think ORC is basically strictly better than MCJIT.

Q: When you're doing cross-target JITing, how do you find things like the shared libraries on the system, if you have different sysroots or anything like that?
A: For the remote symbol query, when you build the server class you give it a lambda that describes how to do symbol lookups. I've just used dlsym here, so we're only looking up symbols in the server process, but the RPC that the remote target server is based on is extensible: you could add an extra layer to the protocol so that you can also tell the remote process to load dynamic libraries.

Q: And is that needed for every target you support?
A: There's no target-specific code in this; you can do it in plain C++ code, and it's a pretty simple utility. I've actually added something like that for a demo once; it's very easy.

Q: I really like the demo you showed. You showed that the functions are compiled on demand, one at a time, but the input is a large module, for the GCC case for instance. Does that mean you extract the function and create a new module to send to the JIT?
A: Yes. That's the only way LLVM really knows how to compile a module any more. The old JIT had this weird, crufty ability to compile one function in a module; MCJIT compiles whole modules, so every time we hit a new function, we yank that function out into its own module.

Q: So it comes back to the previous question about performance overhead: you have some overhead because you need to clone the function into a new module?
A: Yes.

Q: And the optimizer no longer sees the full module, so there's no interprocedural optimization in the same way as if you were processing the full module?
A: Right, yes. This is an interesting distinction to make. I interpreted the previous question as "is there any overhead to the ORC APIs": there's no overhead to the ORC APIs themselves versus MCJIT, but to the extent that you use laziness, yes, you are going to lose optimization if you're yanking functions out into their own modules. Laziness has overhead.

Q: On your example with the hook into the compiler and then updating the pointer to point to the compiled function: you still have to jump twice on every invocation. Why can't you do something like an ifunc, where you just update the original call site?
A: That's on the to-do list. There's no reason we couldn't do that, and I would love to see functionality for that as well.

Q: One thing you could do: in your stub you put some kind of PGO instrumentation, a trace that measures how long things are taking, and then you can decide to optimize these functions rather than those ones, because the stub is telling you that one is hot and taking too long.
A: Yeah, that's another bit of functionality I would love to see in-tree; we just haven't gotten there yet.

(Moderator) Okay, we have time for maybe one more question, if there is one.

Q: Given that interface around laziness, can ORC handle the case where you would like to lazily JIT a large body of functions at once, so that you could potentially optimize them together?
A: Yes. Through that API you have direct control over all of this: if you wanted to compile multiple functions in that lambda and then update a whole bunch of pointers, you could definitely do that. And the compile-on-demand layer already has support for it: by default you get a single function, but you can give it a partitioning function and say, "when I call foo, ask me what other functions I want to compile along with it"; you get a callback and you can say, "go and compile foo plus all these other ones together". That's supported.

Okay, great. Thank you, Lang. [Applause]
Info
Channel: LLVM
Views: 4,682
Rating: 4.9292035 out of 5
Keywords: 2016 LLVM Developers' Meeting, LLVM
Id: hILdR8XRvdQ
Length: 49min 21sec (2961 seconds)
Published: Tue Dec 06 2016