RubyConf 2015 - Inside Ruby's VM: The TMI Edition. by Aaron Patterson

Reddit Comments

Now I get why the IBM J9 team was poking at ruby jumping to memory addresses instead of managing one unified heap.

👍︎︎ 1 👤︎︎ u/wbsgrepit 📅︎︎ Nov 22 2015 🗫︎ replies
Captions
(upbeat western music) - Hey, we should start now, because it is time to do that. One sec, let me take a photo. (laughter) I've gotta use this selfie stick. (laughter) (cheering) Alright, alright, alright. Thank you, thank you. So I just want to say thanks to Justin for giving that really incredible talk earlier. It was actually really good, so everybody give him a round of applause. (applause) It's definitely the talk that we all want our coworkers to watch. All the ones that are not here and not watching the live stream. (laughter) Anyway, so before I get started I want you to know that me, Justin, and Gary, we all met before RubyConf and we swapped all of our slides, so I'm going to be giving -- like, Justin did a really good job, but that was my presentation that he gave. I'm going to be giving either his talk or Gary's talk. I'm not sure which one yet. I haven't actually seen the slides, so please, please bear with me. Alright, so let's try to do this. Okay. So, JavaScript. (laughter) I'm not sure whose talk this is yet. Could be Gary's, could be Justin's. Okay, let's see. Let's go to the next slide and see what that is. (laughter) I can't tell, I really can't tell. Alright, alright, let's get to some serious business. This talk is called Ruby VM Internals The TMI Edition. My name is Aaron Patterson. You know me on the internet as tenderlove. If you don't recognize me, like I look different than I do online, this is what I look like online, in case you don't recognize me here. This is my cat, Gorbachev Puff-Puff Thunderhorse. I've got stickers of him, so if you'd like a sticker, you can come say hello to me and I will give you a sticker of him. This is my other cat, SEA-TAC. SEA-TAC, Facebook, YouTube, Instagram. That's her name. We call her choo choo. We also have a sticker of her now, finally, so you can get that one too. I'm on the Ruby Core Team and I'm on the Rails Core Team. 
This doesn't mean that I know what I'm talking about, it just means that I'm very bad at saying "no". (laughter) This is very true. So, I work at Red Hat. I'm an engineer at Red Hat. I'm on the ManageIQ team, and we build an open source application that manages clouds, so if you have a cloud at your work we can manage it with our software. We manage anything from regular clouds to rainy clouds to snowing clouds. All of these, we can manage those. And the application is open source so you can go check it out here on the GitHubs. I signed up for the RubyConf 5k. Anyone else do this? (applause) (cheering) So, I signed up by accident. (laughter) I thought it was a retirement plan. (laughter) It turns out it's just a bunch of people who are going to run. And there's literally no point, it's just a run. (laughter) Anyway, I'm really, really excited to be here in Texas. It's a really great place, I'm glad to be here in San Antonio. I'm excited to be here in San Antonio because I heard that San Antonio is really famous for ice cream. I don't know if you know this. They're famous for ice cream. So my wife and I arrived last night and we went to dinner and I decided to order dessert and I wanted to get pie. There was no pie emoji, so I just put a pizza pie in there. So I wanted to get a pie, but I was afraid that the waiter might forget the ice cream, so I said to him, "Remember the a la mode." (laughter) (applause) I've been laughing at that one all day. Like, "Hey Ebi, listen to this, listen to this. "I'm gonna say this in my talk. "No, really." Alright. Okay. So let's do this, let's do this for real. Alright. I was gonna name this talk Stupid Ruby VM Tricks, but then I realized that the tricks aren't very stupid, so that didn't make sense. And then they're also not tricks. So it was just Ruby VM. So I'm going to talk about Ruby's Virtual Machine and its internals. And in case you can't tell, I'm very, very nervous. I've never given this talk before, and so I'm scared. 
But we will get through this together, and the upshot is that I'm right before lunch, so if I end too quickly, everyone will be happy, we get to go to lunch anyway. Also, this is the very first day, so by the end of the conference, everyone will have forgotten about my talk, so it doesn't matter how poorly I did anyway. Plus I probably also won't die on stage. Hopefully. Though if I did die, I would be dead and it wouldn't matter how bad I did. (laughter) So, warning, warning, this is a tech talk. There is actually a lot of code in here. I apologize in advance. There will be code, and not all of it will be Ruby. Much of it will actually be C code, so I'm sorry. So we're gonna talk about Ruby's Virtual Machine. We're gonna talk about its internals. And this is true, except that this talk is actually a talk about failure and how I failed. I decided that I would try to write an ahead of time compiler. I thought that would be really really fun, I'm gonna do that, and I failed at doing that. So this talk is going to be about that, the things that I learned on my way, and I don't think it's actually a failure, I'm just not finished yet, so I want to rebrand failure as "potential for success". So, it's there, it's going, I know how to do it, it's just not done yet. But I also thought, I really think data analysis is kind of a cool thing, although after this morning's keynote I'm worried about the data analysis that I did. I decided to do a time break down of my feeling of being successful. Like, when do I feel successful? I decided to log all that and figure it out. So I've actually broken that down into a pie chart, and this is what it looks like. And you can tell that this pie chart is legit because one of the pie pieces is actually extracted, and moved outside so you know that this is for real here. Anyway, so it says I failed, but at least I learned something, right? Right? I learned something. Does that matter? The answer: no. 
(laughter) Alright, so let's dive in, let's dive into this. As I was thinking about this, I was thinking, alright. We're going to do some ahead of time compiling. What does it mean when we do ruby and myprogram, we said Ruby, run myprogram. What does that mean? So I was sitting there thinking about it. I'm like well, okay. So when we think about running a Ruby program, we can actually break that down into two distinct steps. There's two distinct things that happen when you run a Ruby program, and these are the two things. The stuff that happens before the program runs, and then running the actual program. Those are our two distinct things there. And you'll notice that one thing naturally goes to the other. We say okay, well the stuff that happens before the program runs, that happens before running the program. (laughter) Then we run the actual program, and then the program goes. But this is actually a loop, too. I'm using some funny terminology here but there is a grain of truth to this. So we say like, okay, well there's some stuff that happens before the program runs, like parsing, compiling, etc. and then we run the actual program. And this actually happens in a loop. If you think about it, anytime you do eval, which you're not doing in your code, please don't write eval in your code. But if you do do eval, you're essentially going back to the beginning of this thing here, as well, right? You're doing that, you may be doing this in a loop over and over, whatever it is, depending on your program. Now, if we think about the actual details in what these two steps are we have, on the left here we have the Lexer, Compiler, Parser, and on the right side we have the Virtual Machine. And I don't really want to talk about the lexer and parser. I'm gonna dedicate about two slides each to those things. 
Most of the time I wanna spend on the stuff that happens between the parser and the compiler, right in there in that white space, then the compiler itself and also the virtual machine. So I'm gonna talk about the lexer now. This is the lexer, it exists. All it does is it takes your program there on the left, which is a string, and it turns it into some tokens for you. That is it. Those come out on the right side. Alright. So that's all it does, turns those into tokens. There you go. Done. Now the parser, it also exists, and what it does is it takes those tokens and tries to make sense of them and actually converts them into an AST on the right here. Okay, there we go. Great. It exists as well. And if you want to know more about this stuff, I'm not going to talk about this stuff, but if you wanna know more about it you should go to SoManyHs' talk at 3:05 today. She is going to talk about this very detailed. I've seen her talk and it's very good. I recommend it. So at this point, this is where Ruby 1.8 and Ruby 1.9 diverge. So Ruby 1.8 interprets the AST, so what happens in Ruby 1.8 is we actually have this tree representation of the code and the interpreter just walks along and says okay, well, we have made a Method Call and we're doing array.each as in the previous example, so we have to figure out what is array, then we walk back up to the Method Call, then we go down and call .each on that, and then we call into the block, and then we go back up to the top. That is the way the interpreter works in Ruby 1.8. With Ruby 1.9, we switch to using a virtual machine, YARV, which stands for "Yet Another Ruby Virtual Machine", and then my (VM) there is kind of redundant, so it's "Yet Another Virtual Machine (Virtual Machine)". But anyway, Ruby 1.9 switched to a virtual machine, and what happens here is we actually have a compile phase where we take that AST and we turn it into some codes, right there. 
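A note on the lexer and parser steps described above. The slides are not part of the captions, so this is only a rough sketch of the same idea using Ripper, the parser interface that ships with CRuby's standard library. Ripper is an assumption here, not something the speaker shows, and its nested-array tree is not the internal NODE structure discussed later in the talk.

```ruby
require "ripper" # standard library on CRuby

source = "[1, 2].each { |x| p x }"

# Lexing: the source string becomes a flat list of tokens.
# Each entry starts with [[line, column], token_type, token_text].
tokens = Ripper.lex(source)
token_texts = tokens.map { |entry| entry[2] }
p token_texts

# Parsing: the tokens are arranged into a tree (nested arrays here).
tree = Ripper.sexp(source)
p tree.first
```

Joining the token texts gives back the original string, which is a quick way to see that the lexer only splits the program up and does not interpret it.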
So instructions, and we actually evaluate those instructions rather than walking an AST. So, what is a virtual machine? Well, I'm glad you asked that question, because a virtual machine is like a real machine, (laughter) but it's virtual. I've gotta kill some time here, I'm so nervous. (laughter) Alright, alright. So we have a computer. Let's say we have a real computer, this is a real machine with real code. It's got some assembly language on the right there. These instructions are actually defined by the chip that's inside the machine itself. It's a real machine, we have real instructions, and this chip defines those instructions. The capabilities of this assembly language are defined by the capabilities of the processor. Now what's interesting about virtual machines is that virtual machines give us freedom. The reason they give us freedom is because they're essentially imaginary. We're making these up, they're virtual, they don't necessarily exist. We implement these machines in software. But these machines can have any instructions that we want them to have. They give us the freedom to innovate on the machines themselves. So we can sit around, and this is actually a picture of my cat imagining her virtual machine, just thinking, "What instructions should I imagine up today?" So, there are two types of virtual machines. There are stack based and register based virtual machines. You should go look up register based virtual machines on Wikipedia later. We're only gonna talk about stack based VMs because that's what Ruby's virtual machine is. Ruby's virtual machine is a stack based VM. So you think to yourself, what is a stack based VM? What is that? A stack based VM is very much like a calculator, okay? It's very much like a calculator that I use, which is an HP, and I used an HP all throughout high school and then into college, before I dropped out of college. And the reason that I used an HP calculator is because it says it's Rad, right there. 
You can see this calculator is rad. Anyway, for those of you that have not used an HP calculator this is what it's like to use one. Let's say we want to perform this calculation, 9 times 18. The way that you actually do this is you say okay, on the HP you go 9, you hit 9 then you hit enter, then you hit 18, then you hit enter. And what happens is you end up with two numbers that are on the stack there in the lower right, you'll see 9 and 18. Then you hit the star key, and then that multiplies them. It pops those two numbers off the stack and it pushes the multiplied number back onto the stack. Now you may be thinking to yourself, "That's so much work, why bother doing all of that?" And my answer to you is, go back to your TI-83 Plus, we don't want you here. (applause) (laughter) Ah, yes. Actually, one of the other reasons that I liked having an HP calculator in high school is that no one would borrow it from me. (laughter) They'd say, "Oh, can I borrow your calculator?" "Yes, do you know how it works?" "Yes, of course it's a calculator." "Are you sure you know how it works? "Let me show you how it works." "Why would you do that? "I don't want to borrow your calculator." "Okay, cool." Anyway, so if we think about this calculator, the calculator has instructions. These are instructions, we're doing 9, enter, 18, enter, and times, and when we run that, we see that it's working with a stack. We say 9, enter, and it pushes that 9 onto a stack. So we actually have a stack here on the right. We push that 18 on, and we see that pushes those numbers up the stack. And then we hit times, it pops those numbers off the stack and pushes the resulting value back onto the stack. If you look at the instructions on the left hand side, we can actually directly translate those into YARV instructions, into VM instructions. 
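The calculator walk-through above can be sketched as a toy stack machine. This is not YARV itself: the instruction names are borrowed from the talk, while the `run` method and the array layout are made up for illustration.

```ruby
# A toy stack machine. Each instruction is [name, optional_operand].
def run(program)
  stack = []
  program.each do |name, operand|
    case name
    when :putobject
      stack.push(operand)        # like typing a number and hitting Enter
    when :opt_mult
      right = stack.pop          # like hitting the * key:
      left  = stack.pop          # pop two values...
      stack.push(left * right)   # ...and push the product
    else
      raise ArgumentError, "unknown instruction: #{name}"
    end
  end
  stack
end

program = [
  [:putobject, 9],
  [:putobject, 18],
  [:opt_mult]
]

final_stack = run(program)
p final_stack
```

After the three instructions run, the only value left on the stack is the product of 9 and 18.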
So the instructions on the left are equivalent to the instructions on the right, except on the left hand side, that's our calculator, on the right hand side, that's our virtual machine. So when you're thinking about virtual machines, virtual machines should not intimidate you. It's just like using your HP calculator. The thing that should intimidate you is C code. (laughter) C code should intimidate you. It sucks, it's not very fun. Anyway, so we have our YARV code here, and when we look at this YARV code, these three instructions, this is our program, I like to think about this as our program. And our program is just an array of instructions, that we're just iterating through that array. And that number 9 and 18, those are instruction parameters, and then on the left side those are the instruction names, so putobject is a name, opt_mult is a name. Alright, so how do we get this "machine code"? We get the machine code through the compiler, Ruby's compiler, and Ruby's compiler is actually a multi-pass compiler. We're gonna look at the different steps that the compiler makes. We go through these particular steps, we have an AST which goes to a linked list, we perform optimizations on the linked list and we end up with byte code coming out at the end and we execute that byte code in the virtual machine. So we can actually access that byte code through this class called RubyVM::InstructionSequence, it's available on your Rubys today. And we're gonna be following that code through and seeing what it actually does when we compile code. It's essentially the same thing, evaluating code with this is essentially the same thing that the virtual machine does when you do ruby myprogram.rb. So the way you use it is just like this. You can say okay, pass it a block of code, or pass it a string of code, and then disassemble it. So if I run this code and print it out the results will look like this. You don't need to read it. 
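The slide with the actual call is not in the captions. A minimal sketch of the kind of usage described, assuming CRuby (other Ruby implementations may not provide `RubyVM`), could look like this. The exact listing varies between Ruby versions, so no output is shown.

```ruby
# Compile a string of Ruby source without running it.
iseq = RubyVM::InstructionSequence.compile("9 * 18")

# Human-readable listing of the compiled instructions.
listing = iseq.disasm
puts listing

# The same compiled code can also be evaluated directly.
result = iseq.eval
p result
```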
That's just all of the instructions for that particular chunk of code. So if you're curious about what instructions that code is equivalent to, you can just run this and find out. Alright, so the first step is to get an AST. Now what is an AST? AST stands for "Abstract Syntax Tree". And the way that we can get that is via this function called rb_compile_string. Now, the return value there, this is C code by the way, that return value NODE, NODE is actually an AST node, it's just one of them, it's actually the top of the AST. So if you look at the AST, if we have that particular AST, that NODE* is actually the top there. It's pointing at that top one, alright? So the next thing we do is we translate that AST into a linked list. Okay, so what is a linked list? For those of you that don't know what a linked list is, linked lists were invented in 1836. They were actually invented at the battle of à la Mode (laughter) by Alexander Graham Link, and we actually have, it's really cool, I Googled and I was actually able to find some historical photos of Alexander Graham Link. He looked like that. (laughter) So actually, really, really, a linked list, all it is is just a list of items that are connected by pointers. In this case, this is a singly linked list, and actually in our case we're going to be working with doubly linked lists, so they actually link back up. The place where that linked list is generated is inside this function, rb_iseq_compile_node, and we pass it our node. So that node that we got previously, the AST node? We pass that in right there. Now, like any great program, or any great C program, the return value is actually right there, one of the parameters. Not only that, but that's actually our final product. That's actually the compiled down byte code itself, alright? We're gonna actually have to dive into this. 
This function is actually responsible for generating the linked list, doing any optimizations on the code, and then actually creating the byte code itself. So one function with one responsibility, right? (laughter) Love that. Anyway, so, if we dive into that function we'll see that we end up with this. There's a bunch of branches inside, I'm just showing an excerpt here. And all those branches are based on the nd_type of that node. Now what is nd_type? Okay, I don't know what this is, so... And by the way, this is my thought process as I'm going through reading this code, and unfortunately I was watching Justin's graph of all the different time things, and he's like, "Oh, this part gets bigger, "and this part gets bigger." and I'm like, "Oh man, I wish mine was that small." Anyway, so what is nd_type? nd_type is very simple defined as this. (laughter) Basically what we really care about is there is a flags member of the struct that gets passed in. Now, we're looking at the C code and we don't know -- what is that flags value? We can't really tell, so I decided to do something dangerous. We're gonna do something extremely dangerous and we're actually going to reach inside Ruby itself, get a handle to the function that compiles all of our stuff down to an AST, and actually call that from Ruby. So if we do that, here is an FFI program that actually gets a handle to rb_compile_string, calls it with some stuff, and then prints out our node. So we'll say right there is rb_compile_string, we're getting a handle to that C function itself, which we're not supposed to do. And this code will probably not work on your installations. I compiled my Ruby with -O0 and -g so all of the symbols are available. This might actually be gone on yours. Anyway, we're going to call it with an empty file name with some code, and we're going to print out what the pointer is that it returns. And unfortunately we get an error, but that's fine, because it dumped out the flag value which is one. 
(laughter) Yes. Literally sitting at the computer doing that kid, that meme. Yes! Alright, now, which branch do we follow? We know what nd_type is going to return, it's going to return one. So if I look up NODE_SCOPE, that value == NODE_SCOPE, I go find that and I see that it's inside of an enum. That's actually the very first one that's an enum so it's equal to one. Yeah, we found it. Great. So this is the branch that it actually falls, it falls through a switch statement down to this default switch statement that calls COMPILE. Now, COMPILE is a macro and I didn't put the macro into these slides because I really don't want to hurt you that much before lunch, but basically what it does is this kicks off the recursive process of the AST, so we're actually gonna walk that AST recursively and produce a linked list. The main function that it calls is iseq_compile_each. This is our recursive function. Now the very top of the function looks like this. It returns an instruction sequence, takes compiled node, and it walks all the way through it. Now, I didn't put this in the slide because this one function is over 2,300 lines long. So, this one function is over 2,300 lines long and it handles every single type of AST node that you get in your AST. So let's just look at one. Let's look at "true". How does it handle "true"? Well, this is the case statement. It's just a very giant switch statement. This is the case for true, so if the AST is true, or for the true value in your Ruby program, we actually call ADD_INSN1, and what this does is it adds a new linked list item, so we get a new item added to our linked list. This is the object name, it's just put -- or that's the instruction name, the instruction is putobject, and then this is the actual value. That's the parameter that we're giving to the putobject instruction. 
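The recursive walk described above can be sketched in a few lines of Ruby. Everything here is hypothetical: the mini-AST format, the `compile_each` method, and the plain Array standing in for the linked list. The real iseq_compile_each is C code with one switch case per node type.

```ruby
# Hypothetical mini-AST: each node is an array whose first element is its type.
#   [:true]                        the literal true
#   [:fcall, :p, [arg_nodes...]]   a receiverless call such as `p true`
def compile_each(node, list)
  type = node.first
  case type
  when :true
    # Same idea as the ADD_INSN1 call for true: append one instruction
    # named putobject whose operand is the value itself.
    list << [:putobject, true]
  when :fcall
    _, name, args = node
    list << [:putself]                            # the receiver is self
    args.each { |arg| compile_each(arg, list) }   # recurse, like COMPILE
    list << [:send, name, args.length]
  else
    raise ArgumentError, "unhandled node type: #{type}"
  end
  list
end

ast = [:fcall, :p, [[:true]]]
instructions = compile_each(ast, [])
instructions.each { |insn| p insn }
```

Each `when` branch plays the role of one case in the big switch statement, and the recursive call is what turns a tree into a flat, ordered list.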
So if we print out the code, so let's just say we print out "p true" and look at the instructions that come out, we can see right here we actually got putobject, that was our name, and you can see the parameter to that instruction is true, just like we saw in the code. So, I want to look at a little bit more complicated example, which is If Statements, and you'll notice, we're not going to walk through every one of these things, but you'll notice that it actually handles each branch of the if, so it looks at your conditionals, the body, and then the else clause. The important thing to notice here is if you see the macros called COMPILE, those are the things that recurse and any macro that starts with ADD_, that actually adds a new link to your linked list. So if we look at the linked list, we'll see something that ends up looking like this, where we have an instruction sequence putself, we're putting self up there, we're calling putobject. So putself just puts self onto the stack, putobject puts true onto the stack. And then we say call p and that pops off the stack and actually executes the function. So, why use a doubly linked list? Why do we have this doubly linked list phase? The reason we have this is because we're actually gonna be doing some manipulations on this list, and mutating a doubly linked list is much easier than, say, mutating a C array, right? We can actually link and unlink very easily with a linked list, and that's where we get into the optimization step. And the place we find that, again, is in the rb_iseq_compile_node, and that calls this function called iseq_setup. Now iseq_setup is responsible for doing the optimizations as well as doing the byte code. Again, two, one responsibility, right? That's great, I love it. If we look at that function it calls iseq_optimize, and this is where our compile time optimizations happen. 
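To make the point about doubly linked lists concrete, here is a small sketch. The `Link` struct and the helper names are invented for this note. It shows that removing an instruction from the middle only touches its two neighbours, which is why the list form is convenient for the optimization passes.

```ruby
# Illustrative only: one link per instruction, pointing both ways.
Link = Struct.new(:insn, :prev, :next)

# Build a doubly linked list from an array of instructions; returns the head.
def build_list(insns)
  head = nil
  tail = nil
  insns.each do |insn|
    link = Link.new(insn, tail, nil)
    tail.next = link if tail
    head ||= link
    tail = link
  end
  head
end

# Unlinking rewires the two neighbours; nothing has to be shifted or copied.
def unlink(head, link)
  link.prev.next = link.next if link.prev
  link.next.prev = link.prev if link.next
  head.equal?(link) ? link.next : head
end

def to_array(head)
  out = []
  link = head
  while link
    out << link.insn
    link = link.next
  end
  out
end

head = build_list([[:putself], [:nop], [:putobject, true], [:send, :p, 1]])
useless = head.next            # the [:nop] link
head = unlink(head, useless)
p to_array(head)
```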
And these, I'm calling these out as different than our virtual machine optimizations, and we can see what those optimizations are if you run this code, InstructionSequence compile_option, you can actually see all of the optimizations that are available in Ruby's VM. We're only gonna focus on two of these. We're gonna look at peephole optimizations and we're gonna look at specialized instructions. So these are two that are turned on. Peephole optimizations are essentially eliminating dead code, and I don't mean dead code like your code that doesn't execute, I mean useless instructions. So let's say we generate a bunch of instructions from this, we walk the AST, we generate a bunch of instructions, but some of them happen to be useless. We can eliminate those. So that's what these peephole optimizations do. So, for example, let's say we generate a jump, it jumps to a label, and immediately after that label we do another jump. Well, what's the point of jumping to LABEL 1 when we can just jump directly to LABEL 2, right? So this is the type of eliminations that peephole optimizations do, so they pull out those instructions that are useless. We also have specialized instructions, and these specialized instructions are for all of your special occasions, including weddings... (laughter) These random thoughts also pop into my head while I'm doing this stuff. All of your special occasions. Alright, anyway, so let's look at foo.bar versus foo + bar, okay? If we look at regular method dispatch foo.bar versus saying foo + bar, what is the difference? Well, in the case of foo.bar, we actually have to look up that method and call it, and in foo + bar we know that there is this method called + and we know the location of that and maybe that's the one that we can call. So, with our regular method dispatch, we'll say okay, we're just gonna send that method. We call send. Go ahead and do it. 
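The jump-to-a-jump case described above can be sketched like this. It is a toy on a plain Array with made-up `[:label, name]` and `[:jump, name]` entries, not the actual pass in Ruby's compiler.

```ruby
# If a jump lands on a label that is immediately followed by another jump,
# retarget the first jump to go straight to the final destination.
def thread_jumps(insns)
  label_index = {}
  insns.each_with_index do |(op, arg), index|
    label_index[arg] = index if op == :label
  end

  insns.map do |insn|
    op, target = insn
    next insn unless op == :jump

    visited = []                       # guards against jump cycles
    until visited.include?(target)
      visited << target
      following = insns[label_index.fetch(target) + 1]
      break unless following && following.first == :jump
      target = following[1]
    end
    [:jump, target]
  end
end

program = [
  [:jump, :label_1],
  [:putobject, 1],
  [:label, :label_1],
  [:jump, :label_2],
  [:putobject, 2],
  [:label, :label_2],
  [:putobject, 3]
]

optimized = thread_jumps(program)
optimized.each { |insn| p insn }
```

The first instruction ends up jumping directly to `:label_2`, skipping the pointless stop at `:label_1`.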
We call send for foo to figure out what foo is, 'cause it could be a method, in this case it thinks it's a method, and then we call send for bar on the return value of foo, right? Now, again, if we look at regular method dispatch for foo + bar, it looks exactly the same. We're doing send for foo and then we do send for bar and then we add the two together. Okay, now what specialized instructions do is they say, "Well, you know, "yes, it's true that calling + is a method send, "but on the other hand, maybe, "unless you're using rails, "hopefully + is not monkey patched "so you're probably gonna be calling "the real + method itself, "and we can optimize for that." So that's what specialized instructions do. So in this particular case, those sends turned into opt_send_without_block. That used to be a send right there, but now it's opt_send_without_block, so it's a specialized instruction in case you're calling the method without a block, and in this case we are not, so we use that one. Again, here we have that other send was translated, and you'll see down here we're doing opt_plus instead of calling a send. Before, we had three sends, now we have these three specialized instructions. So these specialized instructions do less work. Okay, so as a reminder, we haven't actually run any Ruby code yet. This is all in the stuff that happens before your code runs phase. So we're gonna continue on from linked lists. (laughter) We have a bit more work to go. Alright, byte code. So inside, we're gonna pop back up the stack a second, we're inside iseq_setup. And then the next thing that we need to do, the thing that actually converts our byte code or our linked list into actual byte code is this function called iseq_set_sequence. Very descriptive, right? We all clearly know from this function name that we're gonna make byte code here. So what is byte code? Our byte code in the Ruby VM is just a list of the integers, okay? That's all it is. 
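One way to see the difference for yourself, assuming CRuby, is to compile the same source twice, once with default options and once with the `specialized_instruction` compile option switched off. The options hash is passed as the fifth positional argument to `compile`. The listings differ between Ruby versions, so none is reproduced here.

```ruby
source = "foo + bar"

# Default options: specialized instructions are on.
specialized = RubyVM::InstructionSequence.compile(source).disasm

# Same source with that one compile option switched off.
plain = RubyVM::InstructionSequence.compile(
  source, nil, nil, 1, { specialized_instruction: false }
).disasm

puts specialized
puts plain
```

Compiling does not run the code, so it does not matter that `foo` and `bar` are undefined.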
It's literally just an array of integers. Those integers are actually addresses but we'll get to that a bit later. It's just a list of numbers, and in fact, this fact will come back to "byte" us. (laughter) And random photo, I don't know. Alright, so, we have raw instructions and our raw instructions actually end up in this VALUE*, this is a list of our instructions, okay? That is just an array of integers, it's actually an array of addresses, as I said, pointers there. Anyway, we add instructions to the list, this is actually how it gets stuck into the list, so you'll see generated_iseq, we're actually inserting that byte code right into the list there. And any of those parameters, for example, we saw putobject 9 when we were doing the calculator example, those are gonna go into the byte code down here. I have omitted a lot of code in here because this function handles every single type of linked list node, right? Alright, so for example, we have this thing like putstring, for example, this is one of our examples here, we say putstring, putstring is our instruction, foo is the instruction value. Now we've got a list of integers. Yes, we finally have our byte code. We are ready to run this stuff. We're ready to do it. This is where the virtual machine comes in. Now, we say to ourselves, "Okay, "how does a virtual machine work?" We need to know what these byte codes actually do, and to find that out go look inside of insns.def, plus or minus some n's and s's, that is the file, it has every single instruction in it. This is the instruction layout. This is the format, it has the byte code name, that'll be the name. This is the operands to the byte code. So these two things, the source of data is from our list of integers. The instruction name and the instruction operands come from that list. Now, these other values, the pop_values and the return values, those are stack manipulating values. 
So we're actually gonna pop things off the stack and then we're gonna push other things onto the stack. So here's an example of an actual one, this is an actual instruction. This is for the putstring instruction, and you can see the name of the instruction is putstring. We have one parameter, which is a string. We don't pop anything off of the stack and we push a value onto the stack. So next we'll look at VM optimizations, and these are different from our compile time optimizations. And we can see the optimizations that our VM has by running this code. There's a constant called OPTS on there. And you'll see we have direct threaded code, operand unification, and inline method cache, and I'm only gonna talk about direct threaded code today. You can look up the other ones online. If you want to manipulate any of these, there's actually more optimizations you can do. Check vm_core.h. You can tweak stuff in there if you want to. So let's talk about different types of virtual machines. For example, we have native execution machines, which is what Ruby 1.8 was like. We just have a chunk of code and we actually just walk through and execute all of that. Okay? That's our baseline thing. The next thing is we have virtual machines that are called decode and dispatch virtual machines, and the reason they're called decode and dispatch is because we have our chunk of code here that gets translated to a bunch of instructions and we have a central loop. That loop loops through the instructions, looks up the instructions from a hash table, and then goes and executes the particular function that it needs to call for that instruction, right? That one returns, and then the thing loops back up on itself and goes and executes the next one. But we can actually do better than this. We can do better than this, we can do threaded interpretation. So we can say okay, well, we don't wanna do this, this loop is costly, we don't want to be executing this loop over and over again. 
So what we can do is when we generate the virtual machine, we can take that central part of the loop, the part that dispatches, and we can insert that at the bottom of every one of those instructions. So we can say alright, we're actually gonna take that, stick it at the bottom of the instruction, and as soon as we get to the bottom of that we're gonna go execute the next one, have that lookup code right there so we can actually hop directly to the next one and continue on, so we don't have that loop. Alright, so the important thing to take away from this is that our VM is actually generated and that eliminates the loops. Now, we can even do better than this. We can say well, remember we're doing that lookup from a hash table. We have to say okay, putstring, where does that go? We'll look that up in the hash table and then we go call that function. We can do better than that. What if the instruction was actually the address of the function that we wanted to call? So now instead of going and doing that hash lookup, we can say alright, we have the pointer, we know where that function is, let's directly jump there instead. This is called direct threaded interpretation, and this is what Ruby's VM does. So we eliminate that dispatch code and we can jump directly from the end of one instruction to the next instruction. So no more lookup, we just go all the way through. So, alright. Cool, we've gone on a tour of the VM internals. Let's talk about failing at AOT compiling. As I was going through this trying to put together an ahead of time compiler, I thought to myself, you know what? Our byte code is just a list of integers. That's all it is. No problem. The integers are addresses, we'll just take that list, write it out to a file, later on we can read that file in. We can say this is me, you know, "Aha, if it's a list, we can write it to a file, we can load the file back in and execute that list." Seems like a good plan, right? Right? 
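The direct threaded idea can be imitated in Ruby, with heavy caveats. In this sketch the "byte code" is a flat array whose instruction slots hold the handler lambdas themselves, so no name lookup happens at run time, and every handler ends by calling the next handler, so there is no central loop. All names are invented for this note. Ruby does not eliminate tail calls by default, so the toy uses one stack frame per instruction, whereas the real VM jumps between instruction bodies in C.

```ruby
# Flat code layout: handler, operand, handler, operand, ...
# The dispatch step that a central loop would do is repeated at the
# bottom of every handler instead.
DISPATCH_NEXT = lambda do |code, pc, stack|
  following = code[pc + 2]
  following ? following.call(code, pc + 2, stack) : stack
end

PUTOBJECT = lambda do |code, pc, stack|
  stack.push(code[pc + 1])             # the operand sits right after the handler
  DISPATCH_NEXT.call(code, pc, stack)
end

OPT_MULT = lambda do |code, pc, stack|
  right = stack.pop
  left = stack.pop
  stack.push(left * right)
  DISPATCH_NEXT.call(code, pc, stack)
end

NAME_TO_HANDLER = { putobject: PUTOBJECT, opt_mult: OPT_MULT }.freeze

# One-time translation: names are replaced by the handlers themselves,
# which stands in for "the instruction is the address of its code".
def link(program)
  program.flat_map { |name, operand| [NAME_TO_HANDLER.fetch(name), operand] }
end

code = link([[:putobject, 9], [:putobject, 18], [:opt_mult]])
result = code.first.call(code, 0, [])
p result
```

The hash lookup happens only inside `link`, before execution starts. A decode and dispatch machine would do that lookup again for every instruction it runs.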
Well, unfortunately past Aaron was not very bright. Past Aaron, he was an idiot. But he's smarter now, though still not too smart. Alright, let's take a look at this step in the code where we say alright, we're gonna stick our instruction into the generated list, and down here we're gonna stick our value in. Now that parameter, you may have noticed there that it's actually a VALUE which is a heap allocated object. It's actually a Ruby object. It is a Ruby object, it is a pointer to a Ruby object, okay? It is a heap allocated Ruby object. That means that the next time we actually run the program, that location is bad. It's gone away, it's different, because the next time we allocate that string it's gonna be in a different location. We can't look it up again. The pointer always changes. So we actually have to write that object to the disk and then load it up later. So, it's not an impossible task, it's just something that we have to overcome. So let's do end stuff here. I've been talking about VM internals. I hate giving a talk without giving some practical applications to this, so let's talk about a few practical applications and then wrap this thing up. So, we can tweak optimizations. If we know that our VM is gonna be faster with some particular optimizations, we can go test those out, recompile our Ruby, and get our programs running a bit faster depending on those optimizations. We can understand what our code is really really doing, so we can take that Ruby VM instruction sequence, compile it down, look at the assembly for that, and understand what it's actually doing. So, I encourage you to go take a look at these two examples. I'll post these slides online so you can try it out. But look at the difference between the byte code generated between those two. I think you'll be surprised at the difference when you look at them. Another thing I'd encourage you to do is browse iseq_compile_each. 
I know it's 2,000 lines long, or over 2,300 lines long, but it's chunked up into switch statements so you can find various constructs in there. So if you wanna understand how Ruby handles if statements or case statements or begin, any of those things, you can look it up in this particular block. For example, we can look at this code and say, okay, here's a quiz for all of you. Which one of these is faster? I actually tweeted about this a little while ago. If you go through my tweets you'll find it. Which one of those is faster? I'm not gonna tell you. If we have time for Q and A maybe I'll tell you. Now that I've changed this, notice I have changed only one line. Now which one is faster? And why? Why is it faster? So if you go look through that code you can find it. So remember these things, if you're gonna remember anything from this talk, go look at this file, insns.def, iseq_compile_each, iseq_compile_node, and if you wanna know more, there's actually a YARV architecture document checked in to Ruby itself. It's called yarvarch.ja, you can find it there. Unfortunately, it's written in Japanese. (laughter) The good news is that I can read Japanese, so it's fine for me, but then I found that there is actually an English version, so I opened that up, yarvarch.en. This is the entire contents of the file. (laughter) So if there's anything else you want to know about it, I recommend reading Ruby Under a Microscope. Also there is a book called Virtual Machines that's mostly about actually virtual machines themselves, not programming languages, but containerization, however that has a lot of stuff in common, so you can learn from that as well. Thank you very much. I have stickers, come say hi to me. (applause) (upbeat western music)
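As a hedged illustration of the "understand what our code is really doing" suggestion from the talk, the sketch below compiles two snippets and prints their instruction listings. It assumes CRuby, since `RubyVM` is not available on other implementations. The two snippets are not the ones from the slides; they are stand-ins picked for illustration, the helper name `bytecode_for` is hypothetical, and the exact instructions printed vary between Ruby versions.

```ruby
# Compile a snippet of Ruby source and return its human-readable
# instruction listing. CRuby only.
def bytecode_for(source)
  RubyVM::InstructionSequence.compile(source).disasm
end

# Two ways of building the same string; the listings differ.
listing_plus = bytecode_for('name = "x"; "hello " + name')
listing_interp = bytecode_for('name = "x"; "hello #{name}"')

puts listing_plus
puts listing_interp
```

Comparing the two listings side by side shows which instructions each form compiles to, and those instruction names can then be looked up in insns.def.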
Info
Channel: Confreaks
Views: 7,291
Rating: 4.7666669 out of 5
Id: CT8JSJkymZM
Length: 39min 10sec (2350 seconds)
Published: Fri Nov 20 2015