GopherCon 2015: Rick Hudson - Go GC: Solving the Latency Problem

Captions

Okay, so today we're going to talk about the Go GC. This is going to come out in a month, in 1.5, in August, and we think we've got this latency problem solved for you, and that's what we're going to talk about.

So before we go on, I want to talk about my co-defendants and my friends. This is the Cambridge runtime team. Austin, this is a new hire, he's a very big-brained type of guy. Sitting next to him is Russ, who we heard from this morning. I'm Rick Hudson, and David Chase just joined us a few months ago. So that's the team. And sometimes we have discussions about the other Go, the Go you play with a board and little pebbles, and sometimes those discussions, in this way, the discussions about the language, are much more removable.

So, okay, let's talk about making Go go. We have this concept of a virtuous cycle. It's an economic term, and basically it says that if we write software, and the software is written in Go, then our dream is that when the next piece of hardware comes out from one of the hardware companies, that software, unchanged, will run better. Meanwhile, since we have this great new hardware, we're able to write better software, and we're going to create this virtuous cycle where every generation of hardware enables a new, better generation of software, which just gets better with the next generation of hardware. That ran this industry for 30 years, until about 2004.

But as we all know, the hardware people are still putting 2x the transistors on the chip, but it's not giving us 2x the frequency anymore, for good, solid physical reasons. What we're noticing is that more transistors now translate into more cores, and Go is making a bet: more cores, better software. But the hardware folks will not put more cores in their hardware if the software isn't going to use them. So it's this balancing act of each side staring at the other, and we're hoping that Go is going to break through on the software side.

So, long
term, Go wants to reestablish, or establish, this new virtuous cycle. Short term, we need to increase Go adoption; Russ said a lot of things this morning on doing this. The number one barrier, at a low technical level, has been GC latency. Mostly it's described by our users as "the GC is too slow," but it's really that the GC takes too much of a pause.

About a year ago I was hired, and we said, well, how are we going to solve this problem? What are we going to do? And, being engineers, the first thing we did was look for a workaround. Okay, maybe we can't solve this problem, we need a workaround. So let's go back to the basics: when's the best time to do a GC? Well, it's when nobody is looking. So we're going to put a camera on the PC and we're going to look at your eyes. Now, when you look away, we're going to do a GC, and we think there's enough time when you're looking away to slip that GC in. Well, I'm not sure that's what Google hired me for. They had another project that was about to fail, you know.

So anyway, okay, we can't follow their eyes, so let's distract them. The idea was we'll just pop up a network wait icon. We won't tell them what it is, we'll make them think it's the Wi-Fi, we'll stick "waiting" under it, and they'll be happy as hell: you know, GC's not the problem. So those were the ideas. I'm not sure Russ approved of them, but we thought they were good.

But the fact of the matter is, what we really wanted to do, to be honest, was to trade throughput for reduced GC latency. I've had this ongoing discussion about throughput, and I'll just be real honest with you: we're going to make your program slower. Not by much. "Trading throughput" is just a fancy way of saying that. But what we're going to give you is the number one thing you guys have been requesting, which seems to be reduced GC latency. You'll see numbers further on showing how much throughput we lost, actually very little, and you'll see great numbers for
latency. So, latency. Let's talk a little bit about the other half of this stuff, and that's latency. What is latency? If you don't know what time is, it doesn't make sense.

So what's a nanosecond? Grace Hopper, the great computer scientist, came and gave a talk at my college when I was an undergraduate, and she gave me an 11.8-inch piece of wire and said, that's a nanosecond. That's the maximum distance electricity can travel in a nanosecond, at the same speed as light in a vacuum, in space: 11.8 inches. And then here's a picture of her holding up a microsecond: 984 feet. That's how far light or electricity will travel in a microsecond. One mile? No. In a microsecond it will travel 984 feet; in 5.4 microseconds it will travel a mile. So when you start talking about microseconds, you're talking about a very, very short amount of time. To get through the onion, from the talk we saw this morning, to get to the outer onion, that's hundreds of microseconds.

So let's talk about milliseconds: one one-thousandth of a second. If you want to read one megabyte, only one megabyte, that's going to take a whole millisecond if you've got an SSD. If you've got the old mechanical stuff, where heads have to move, for Christ's sake, that's going to take 20 milliseconds to read a megabyte from that disk.

Now let's get into the human stuff. There's this thing that psychologists call perceptual causality. It basically says that if you do something with this hand and you see something with your eye, and you see it within 50 milliseconds, you perceive it as being caused by whatever you did. So if you move your mouse and the cursor moves within 50 milliseconds, you're happy as a clam; you figure the cursor is moving because you're moving your mouse. If it takes more than 50 milliseconds, you do what I do: you shake your mouse, thinking for some reason that's going to help. I don't know why we do that, but everybody does.

Okay, 50 milliseconds and up: you ping something across the country, you ping
something across Denver, yeah, that's going to take about 50 milliseconds. How long does it take you to blink your eyes? 300 milliseconds. So now that we have a concept of what a millisecond is, let's see how much GC we can do in a millisecond.

So the first thing we did, we said, well, let's look at Java. These guys have 20 years on us. But Java is different from Go in a lot of ways, and I'm going to talk about how it's different with respect to the memory system. First of all, we have thousands of goroutines. Java was invented at a time when you had tens of Java threads running on a single core; we have thousands of goroutines running on lots of cores. We do synchronization via channels; they use objects and locks.

One of the big advantages, and I sort of fought to keep this on the slide, is that our runtime is written in Go. That means that if the GC is causing the compiler problems, the guy in the next office comes over to me and says, the GC is causing the compiler problems, let's figure out how to fix it. If I was writing in C, that wouldn't have been a problem, and that's what they're doing over in Java land.

But the single most important thing is that Go gives you control over spatial locality. If you have some information and a bunch of backing store, like an array of floats, or an array of chars in the case of a string, you can actually put those in the same object and allocate the memory for them at the same time. That sticks these things together in memory, so whatever happens to one part happens to the other part, and that's very valuable because now the programmer has control over that. Whereas in the Java world, the Lisp world, and the Smalltalk world, these would have been connected by a pointer, and the garbage collector would be moving them around independently of each other. That, as you can imagine, is going to cause all kinds of cache problems and greatly slow down your program.

So the decision was: let's not go to Java, let's build a GC for Go, for the
language Go, and let's get rid of this old stuff. And that's what we did.

So, all GCs have certain things in common. For people not familiar with GC, to make the rest of the talk work well, I'm going to go through a couple of slides explaining the phases of the GC. We have our heap over here, which is a bunch of objects connected with pointers. During the scan phase we have to figure out what pointers allow the application to get into the heap. These pointers are located in your registers, where you get direct access; or in the backup of the registers, which is the stack; or they're located in globals. And that's it. So if we know all the pointers in those locations, we know all the ways that the application program, or the application goroutine, can get into the heap. We gather these things during something called the scan phase. So that's one phase.

The next phase is the mark phase. It's straightforward: we take the pointers we gathered, we go to the heap, and we do a sort of transitive walk, and every time we run into an object we mark it as being reachable from the program. And that's great.

Now, a GC, if it controlled the world, would just stop all the application threads and tell them not to bother it, it's busy doing important work. But we want these application threads to continue to run, and they're going to run around and start changing those pointers. So we're going to have to ask the application program to tell the GC, "I'm changing a pointer." In this case, it allocates a new object, then it creates a pointer to it. The GC has already marked that object as reachable, so it will never know about the green object. So in the mark phase the program has to tell the GC. It uses something called a write barrier to inform the GC of what it's doing, and the GC has to take advantage of that and know about it. That is inherently more expensive than simply stopping the program and not letting it make those changes. So these write barriers and this communication is what's
going to slow things down a little bit.

So the next phase is the sweep, and this one is pretty simple. We've got all the objects marked, so we'll just run through memory, and if something's not marked we know it's free, and we'll let the application program reuse that space. And on and on.

So, in one slide, this is what we do. We start off with the write barrier turned off. We are letting the application run as fast as possible. It has to check to see if the GC has been disabled; if it has, it just goes ahead and slams a pointer into the slot that it wants to. On an out-of-order machine the check has no dependencies, so it'll just fly. You won't even notice it; it'll actually be very fast on modern hardware.

At some point we decide it's time to do a garbage collection. At this point we say, okay, before we scan these stacks, we're going to have to stop the world, stop all of these threads, or goroutines, excuse me, and get ready to do the garbage collection. So there's a brief stop-the-world there, and we're talking a sub-millisecond stop-the-world. That's the little one. So you're going to lose a millisecond. Then we scan the stacks, just like I said, and that we can do while the goroutines are running.

During that stop-the-world we turned the write barrier on, so from now on the application program is going to tell the garbage collector what it's doing. We then go through the mark phase. We mark everything concurrently; again, the write barrier is telling the GC what's going on. When that's done, we want to finish: the mark has done as much work as is profitable, it's marked almost everything, and it wants to coordinate one last time with the application program. To do that, we actually do another stop-the-world. We finish things up, we do some bookkeeping, we take care of finalizers, which are really a pain, but we take care of that and some other things. And when that's done, we turn the write barrier off, since we don't need it anymore, we go into the sweep
phase, and we restart all of the application threads. And when that sweep's done, you know, rinse and repeat, we're good to go. So that, basically, is the algorithm. We have correctness proofs in the literature; you can come get them from me if you need them.

So let's look a little bit deeper into what's going on. I think you all understand this: the application's running along, it stops, and then all of the CPUs are used during the GC. The application gets no time, and the GC finishes a little later. This is what 1.4 did, 1.3, 1.2, on back.

But in the new world, in 1.5, it's going to be different. We still have these big gaps where the application program runs with no write barriers turned on. But then, when we turn the write barrier on, we have that brief stop-the-world; that's your left goalpost there, typically a millisecond or less. Then we go into the actual garbage collection. We give the GC 25% of the CPUs, and that's the lower blue bars here, and that's the GC working as hard as it can.

Now, the evil mutator, the evil application program, can allocate a lot, and to keep that under control we actually ask it, upon allocation, to assist the GC. So if you're not allocating, you're not going to be asked to assist the GC. This is how we make sure that we finish the GC without having to increase the size of the heap: we simply slow down the mutators that are allocating, that are using the resource we're trying to take care of. So that's the algorithm, this is how it works.

Let's get on to the slide of the show. This is the punchline. What we have in the blue and the red is 1.4 and 1.5 latencies. These are seconds on the right-hand side, those are gigabytes on the bottom, and as you can see, 1.5, which is the yellow, or orange, I guess brown, dots there, is right down there on the axis. So this is the slide that nobody thought we could put up there. Let's show how close that is to the bottom. This is the garbage benchmark. Now, I just
dropped that scale by 1000x, so this is 1000 times better on the scale here. And here we go: there's a couple thousand points down here. As you can see, they're all pretty much below two milliseconds, with a couple of stragglers. The OS people say that could well be my UNIX box causing some of those. We don't know what they are, but we're not worried about it.

You also might notice that there's a slope here. This is the value of having visualization. We now know what's causing that slope, and we know how to fix it. It'll be fixed for 1.6, but as Russ said this morning, everybody has to play by the rules. We didn't figure it out until the 1.5 close date, and Austin and I weren't even going to bother to ask Russ, because we knew what the answer would be. So that'll be fixed.

So what's the cost, and what can we do about it? Here's the cost. These are various numbers. Splay is a benchmark we implemented from the JavaScript Octane suite, because we didn't want to cheat. If you look on the far left, we have like a 50% increase in the heap size over the live heap, which is well below, about half, what the default is. The second pair of dots, up and down, is the default, 100%. But if you increase it to 200 or a little bit beyond, you really start to get the actual cost, the throughput cost, down low. These are throughput numbers.

Sometimes you do even better. For example, in JSON the drop-off is even more dramatic, and then you start seeing that we're getting faster if you give us more heap. For single-threaded programs it's great, because we don't stop a single-threaded program, we just let it keep on running. We want to keep the compiler guys happy. Tell them their compiler doesn't have to stop for a GC; it works wonders in the office.

So that's the story, and this is what we're going to have for 1.5. This is a victory lap for us. It's a big deal, and we have to get this out to people, and we have to start telling them that GC is
not a barrier anymore. They can use Go. And we need to attract C programmers, because we've got a better place for them now, and the reason they've always been using, that the GC was in their way, is no longer true.

In 1.6 we're going to be tuning for a sweet spot, but we don't know where that sweet spot is. We want it to be more predictable, we want higher throughput, we want even lower latency. We know how to get the lower latency now. But that 1.6 work is going to be use-case driven, and it's going to be the use cases of people in this room. So you need to talk to me, and I'm talking to some of you already. It's pretty exciting, but that will decide what we do in 1.6.

And once we've done that and gotten this word out, we're going to increase Go adoption. And once Go demonstrates to the hardware people that it's a viable way forward, and they're all on board with this virtuous cycle, because they win just as much as we do, we're going to establish a virtuous cycle, and we're going to use Go to do it. And that's it for the talk, and I'll take your questions.
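
Editor's sketch: the scan, mark, and sweep phases and the write barrier described in the talk can be illustrated with a toy collector. This is not the Go runtime's implementation. The names Heap, Object, WritePointer, StartMark, Mark, and FinishAndSweep are hypothetical, and the toy runs on a single goroutine, so "concurrent" marking is only simulated by interleaving calls. It reproduces the talk's "green object" case: an object allocated and linked from an already visited object while marking is under way.

```go
package main

import "fmt"

// Object is a toy heap object: a name, outgoing pointers, and a mark bit.
type Object struct {
	Name   string
	Refs   []*Object
	Marked bool
}

// Heap is a toy heap. Roots stands in for the registers, stacks, and
// globals that the scan phase gathers.
type Heap struct {
	Objects        []*Object
	Roots          []*Object
	BarrierEnabled bool      // true only while a collection is marking
	work           []*Object // marked objects whose pointers are not yet visited
}

// Alloc creates a new, unmarked object on the toy heap.
func (h *Heap) Alloc(name string) *Object {
	o := &Object{Name: name}
	h.Objects = append(h.Objects, o)
	return o
}

// shade marks an object reachable and queues it so its pointers get visited.
func (h *Heap) shade(o *Object) {
	if o != nil && !o.Marked {
		o.Marked = true
		h.work = append(h.work, o)
	}
}

// WritePointer is what the application calls instead of a raw pointer store.
// With the barrier off it is a cheap check plus the store; with the barrier
// on it also tells the collector about the new target.
func (h *Heap) WritePointer(src, dst *Object) {
	if h.BarrierEnabled {
		h.shade(dst)
	}
	src.Refs = append(src.Refs, dst)
}

// StartMark turns the write barrier on and scans the roots.
func (h *Heap) StartMark() {
	h.BarrierEnabled = true
	for _, r := range h.Roots {
		h.shade(r)
	}
}

// Mark drains the work queue, visiting every pointer of every queued object.
func (h *Heap) Mark() {
	for len(h.work) > 0 {
		o := h.work[len(h.work)-1]
		h.work = h.work[:len(h.work)-1]
		for _, r := range o.Refs {
			h.shade(r)
		}
	}
}

// FinishAndSweep finishes marking, turns the barrier off, frees every
// unmarked object, and returns the names of the freed objects.
func (h *Heap) FinishAndSweep() []string {
	h.Mark()
	h.BarrierEnabled = false
	var freed []string
	live := h.Objects[:0]
	for _, o := range h.Objects {
		if o.Marked {
			o.Marked = false // reset for the next cycle
			live = append(live, o)
		} else {
			freed = append(freed, o.Name)
		}
	}
	h.Objects = live
	return freed
}

func main() {
	h := &Heap{}
	a := h.Alloc("a")
	b := h.Alloc("b")
	h.Alloc("garbage")
	h.Roots = []*Object{a}
	h.WritePointer(a, b) // barrier off: just the store

	h.StartMark()
	h.Mark() // a and b are now marked and fully visited

	green := h.Alloc("green")
	h.WritePointer(a, green) // barrier on: the collector learns about green

	fmt.Println(h.FinishAndSweep()) // prints [garbage]
}
```

If WritePointer skipped the shade call, the collector would never revisit a, so green would stay unmarked and be swept even though the program can still reach it. That lost-object case is what the write barrier exists to prevent.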

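Editor's sketch: the heap-size percentages mentioned near the end of the talk (50%, the default 100%, 200%) match the GOGC setting, which a program can also change through debug.SetGCPercent in the standard runtime/debug package. The following is a minimal way to observe the trade-off on your own machine, not the benchmark used in the talk. The helpers churn and measure and the Result type are hypothetical names, and the numbers printed will vary by machine and Go version.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
	"time"
)

// sink keeps the allocations below from being optimized away.
var sink []byte

// churn allocates roughly total bytes of short-lived garbage in 64 KiB chunks.
func churn(total int) {
	const chunk = 64 << 10
	for allocated := 0; allocated < total; allocated += chunk {
		sink = make([]byte, chunk)
	}
}

// Result summarizes the collections that happened during one measurement.
type Result struct {
	Cycles     uint32        // completed GC cycles
	TotalPause time.Duration // summed stop-the-world pause time
	Elapsed    time.Duration // wall-clock time of the workload
}

// measure runs work with the given GC percentage, then restores the old one.
func measure(percent int, work func()) Result {
	old := debug.SetGCPercent(percent)
	defer debug.SetGCPercent(old)

	runtime.GC() // start from a freshly collected heap
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	start := time.Now()
	work()
	elapsed := time.Since(start)
	runtime.ReadMemStats(&after)

	return Result{
		Cycles:     after.NumGC - before.NumGC,
		TotalPause: time.Duration(after.PauseTotalNs - before.PauseTotalNs),
		Elapsed:    elapsed,
	}
}

func main() {
	for _, percent := range []int{50, 100, 200} {
		r := measure(percent, func() { churn(256 << 20) })
		fmt.Printf("GOGC=%d: %d cycles, %v total pause, %v elapsed\n",
			percent, r.Cycles, r.TotalPause, r.Elapsed)
	}
}
```

A larger percentage lets the heap grow further between collections, so the same workload should trigger fewer cycles at the cost of more memory. This workload keeps almost nothing live, so it mostly shows the cycle count changing. A program with a large live heap is a better test of the throughput cost the talk describes.
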
Info

Channel: Gopher Academy

Views: 18,862

Keywords: programming, golang, gophercon, software development

Id: aiv1JOfMjm0

Length: 22min 51sec (1371 seconds)

Published: Tue Jul 28 2015