Instrumenting the real-time web: Node.js, DTrace and the Robinson Projection

I am Bryan Cantrill, VP of Engineering at Joyent, and I'm going to be presenting today on instrumenting the real-time web, focusing in particular on two technologies and a projection: Node.js and DTrace.

First, and I debated whether I should even have a slide on Node at all, or whether Node should simply be assumed at this point: how many of you have heard of Node.js? OK, that's good, you're all at the right conference. Those of you who didn't put your hands up may be looking for the conference next door, or something else going on around here. Node.js is something a lot of us have heard about. How many of you have actually implemented in Node.js? OK, a decent number of you. And how many have deployed into production on Node.js? A smaller number of hands, but about what you would expect given where the technology is. So a lot of you have heard about it, a bunch of you have implemented in it, and some of you have deployed on it.

So this is a little snippet of Node.js. The beautiful thing about Node is that you can write a complete, somewhat useful program: the hello world program is a web server, in six lines of code. Really beautiful. Node is, as virtually all of you know, but just to reiterate, a JavaScript-based framework for building event-oriented applications. We're seeing a lot of energy behind Node, and that's because Node is really a confluence of three historical ideas, three ideas that have been brewing for a long time.

First, of course, is JavaScript's support for asynchrony. JavaScript has terrific support for asynchrony. For those of you who've been doing this for a long time, and some of us have been here for a long time, remember when you first picked up JavaScript. Maybe it was as early as 2000; it probably wasn't, it was probably more like 2003 or 2004. JavaScript was kind of a "for dummies" language, and you begin to play around with it, and you have that moment of strange attraction. Remember that? That moment of shame,
guilt: "I think I'm attracted to JavaScript, and that's not right. I have a computer science degree; I'm not supposed to be attracted to JavaScript." But as you get into JavaScript, you realize, damn it, there's a powerful language sitting over here in the browser. This is a really interesting language. I am a kernel engineer, so I'm C-bigoted, and I view it as dynamic C: it has the syntax of C, but it is a fully dynamic language with great, rich support for asynchrony, including closures, which are really critical to JavaScript. It's always frustrating to me that JavaScript books don't begin with "Chapter 1: Closures. I know this is going to be hard, but it's important." Closures are really critical for JavaScript, and they allow asynchrony in the browser, which of course we can leverage on the server side. So that is one trend that's been brewing for, really, 15 years.

The second major trend is the fact that JavaScript VMs have become very high-performing. Thank you, V8. V8 really kicked off an arms race in which the battlefield is performance, and we now see, depending on how you count, somewhere between three and five professionally written VMs for JavaScript competing on performance. That is terrific for all of us, terrific for humanity, really great for JavaScript, and it opens up some new opportunities, including Node.

And then of course the third thing you need to add to these two other ideas: you take JavaScript's rich support for asynchrony, you put that on a ripping-fast VM, namely V8, and then you add to it the system APIs that God intended, namely the UNIX APIs. Yes, God did intend that. I'm sorry, my apologies, I hate to break the news to you, but that actually is God's intent. I'm here to convey that to you. He wanted me to make that really clear today in the talk. I didn't want to go on about it, but he really wanted me to emphasize it.
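The "hello world is a web server" claim and the point about closures can be made concrete with a short sketch. This is not the slide from the talk, which the transcript does not reproduce; the names `makeHandler` and `startServer` are hypothetical, and the only real API assumed is Node's built-in `http` module.

```javascript
// Illustrative sketch only; not the snippet shown on the speaker's slide.
const http = require("http");

// makeHandler returns a closure: the inner function keeps access to
// `greeting` and `hits` long after makeHandler itself has returned.
function makeHandler(greeting) {
  let hits = 0;
  return function handler(req, res) {
    hits += 1;
    res.writeHead(200, { "Content-Type": "text/plain" });
    res.end(`${greeting} (request #${hits})\n`);
  };
}

// Nothing blocks here: listen() returns immediately, and the callback
// runs later, once the event loop reports that the socket is bound.
function startServer(port, onReady) {
  const server = http.createServer(makeHandler("Hello World"));
  server.listen(port, "127.0.0.1", () => onReady(server));
  return server;
}
```

Calling something like `startServer(8124, (s) => console.log("listening on port", s.address().port))` would start the server; the port number is an arbitrary choice for illustration. The per-handler counter is there only to show state captured by a closure surviving across asynchronous callbacks.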
There it is: the UNIX APIs are the APIs that God intended, and that is why, when you're coding along in Node, it feels right. You go to reach for an interface, you go to reach for an API, and it's right there, and it's right there because it's just the UNIX API, but it's asynchronous. You take these three things together, and all of a sudden you can deliver new kinds of applications. That's what we've seen: an explosion of interest, and to me that explosion of interest was never clearer than with Node Knockout. This is a programming competition that was held in August of 2010 and was funded by a bunch of folks including Joyent, and Joyent got the opportunity to host the programming competition.

Now, when I heard that we were hosting a programming competition, my first thought was that it's like a friend telling you that they are contemplating a career in stand-up comedy, and you think to yourself: I'm embarrassed for you. You are going to embarrass yourself, and you're going to embarrass me. That's what I thought when we were hosting a programming competition: this is just going to embarrass the environment. Like four people are going to show up. How many people go to a programming competition? People don't engage in these kinds of things on a regular basis. I was very surprised to learn that the competition, which only had 224 seats, had sold out, and in fact had sold out weeks before the contest, to the point where people were looking to join up with other teams and so on. An amazing amount of interest in what was a very nascent programming environment.

So there's this tremendous interest in the Node Knockout competition. It is a weekend-long competition in which teams of one to four endeavor to build something neat and something usable with Node. These guys are all going to show up on Friday night and code all
weekend and then be judged on Sunday. At Joyent we viewed this as a great opportunity, because here we had a bunch of folks who were going to organically write these new applications. What could we learn about these applications by providing an environment, by hosting them? That's really what we wanted to understand.

One of the things we considered: could we build a leaderboard for this thing? It's a programming contest, and there's no real notion of who's ahead; it's judged at the end. But maybe we could do a real-time leaderboard of some form or another. Ryan Dahl, the inventor of Node, who works at Joyent, had a really great idea: why don't we instrument incoming connections? People are going to begin to deploy these programs onto the Internet, so let's instrument incoming connections, watch where they're coming from, geolocate those IPs, and throw them up in real time on a world map. That sounded like a great idea. It would be pretty neat, and it would allow us to see where everyone is coming from.

Now, as I have done very often in my life, I let my intuition get ahead of the data, which is almost always wrong. I did think to myself: well, it's going to be kind of neat, but let's face it, everything is going to be coming from the Bay Area. I'm sorry to be so Bay Area-centric, but the contest is in the Bay Area, everyone's in the Bay Area, it's going to be a bunch of Bay Area traffic. So, all right, whatever, let's do it; I guess there could be someone somewhere else in the world. So I said that sounds exciting, let's go do that.

In order to do that, there were some technical problems that we needed to go solve, and we didn't have much time to do it. You're going to see a parallel between Node Knockout and what we did in preparation for Node Knockout. We needed the
instrumentation to be entirely transparent. We want to instrument these incoming connections, but we don't want to have to do log analysis. We want people to be able to write their own application, and we don't want them to be able to see the fact that they are being instrumented. And we want to instrument pretty high up in the stack: we want to see Node connections; we don't want to see SSH connections and other connections that people might be making. So we really wanted it to be transparent and non-invasive, and these constraints are of course a natural fit for DTrace.

I don't want to bore you, so: how many of you have heard of DTrace or know about DTrace? Some of you, OK, great. How many of you have used DTrace? And for how many of you has DTrace saved your ass? Let me ask that. OK, you all are awesome. When DTrace saves your ass, it makes a difference in your opinion of DTrace. I remember the first moment DTrace saved my ass personally. It was obviously very early on in DTrace's development, and you realize that this is actually going to be something that's important for ass-saving.

DTrace was developed at Sun Microsystems (shed a single tear) circa 2003 by me, Mike Shapiro, and Adam Leventhal. We developed it for Solaris 10, but it was open sourced along with the rest of the operating system in 2005 and has subsequently been ported to many systems, including the Mac here.

Should I show you a two-second demo of DTrace? Is that OK? I see heads nodding, so let's go look at that. It is just so beautiful that I can actually go onto my MacBook here. This is going to be all right. So let's just run dtrace without any options. I'm on the Mac now; this is just running on my own Mac. In fact, you'll see words here that were not in the original. So if I run dtrace without any
options, I just get a very UNIX-style help message. What I'm actually going to do is instrument all I/O in the system. What we've done is instrument the code paths that initiate I/O, and if I come back over here and start up some I/O... OK, so we can see a probe firing whenever we do any I/O. This is telling us the CPU and the probe ID, and this is telling us where we are in the kernel when we actually hit that probe.

That's not very interesting, that we just did I/O, but with D in DTrace I can attach arbitrary actions to a probe. So for example I can say printf with "%s did some I/O"... Sorry, I'm doubly blind over here. Where are you? There you are. I'm missing a quote. Damn it, don't do this to me. What's that? Reduce the font size? Right, OK. Now if I come back over here... can we get someone to do some I/O? Yes, thank you, Google; thank you, Chrome. "Google Chrome did some I/O." OK, that's more interesting.

But in DTrace we don't have to just get data out every time something happens; we can actually aggregate. Aggregation is a first-class citizen in DTrace. So I'm going to use the at-sign (@) notation, I'm going to aggregate on execname, and I'm going to take the aggregating action count(). Now I go do some I/O in Chrome, and I'm going to go look at the latest photos from Representative Weiner, of course, to see what's going on with him; it seems to get lewder and lewder by the day. Let's go back over here, and if I Control-C, now I just see a table: Google Chrome.

Now let's make that a little more interesting by not doing I/O. Let's do syscall:::entry, and then let's add tick-1sec. So what I've built is a little stat tool that shows me the system calls per second by application. What I did is
aggregate on syscall:::entry, which instruments every system call entry point, keyed by the app name, the execname, and then add this tick-1sec probe that fires once a second, prints that aggregation out, and then clears it. So you can pretty quickly get to something that's actually useful.

Now, if we wanted to go explore what was going on on my laptop: one thing about DTrace, if you haven't used it, is that in DTrace, as in a good cocktail conversation, the answer to one question typically provokes the next question. Like: what the hell is mDNSResponder doing? On a reasonably regular basis we're getting five hundred-some-odd system calls per second out of mDNSResponder, which seems a bit rich. If I wanted to go explore that, I could add a predicate here. So this is syscall:::entry with a predicate that the execname equals... I think it was mDNSResponder with a capital R? Probably not. Oh, lucky, yes. So now I can aggregate by probefunc; that will tell me the system calls that mDNSResponder is making, and I can see that we're getting a lot of kevents there.

I may want to understand where we are in the application when we do that, so let's aggregate by user stack and probefunc. I have to type it the correct way, of course. I think the terminal is getting me again; the terminal's murdering me a little bit here. There we go: by probefunc and ustack(). Now when I Control-C, I'm going to see where I am in the application when we made these calls. So I can very quickly go into the application and begin to dive around and so on. DTrace is very quick for these sorts of ad hoc queries, and has been used a lot to debug lots of performance problems, most recently by me. Say you notice at midnight that your Mac starts going nuts on I/O.
Don't get frustrated about that: just get a shell, run some DTrace, and you'll very quickly discover that Spotlight is doing all of its re-indexing at that particular time. How do you discover that, by the way? Let's go back to that io probe, and instead of aggregating on execname, let's aggregate on the actual file name that we're doing I/O to: args[2]->fi_pathname, which is documented. Now if I go over here and click reload on my Weiner scandal, and I think there's been a new photo posted since the last time I clicked reload, that's the beauty of that scandal, it keeps on giving, now we can see exactly what we were doing I/O to. The Apple guys have a harder time resolving paths than we did in Solaris, so you don't get as much path information here, but you can clearly see where it is in its cache and so on, doing I/O. There's lots more you can do, and we'll see some examples of using DTrace specifically on Node, so you get a flavor for the other kinds of things we can do. So that is a whirlwind overview of DTrace, with a lot of mistyping and some terminal conspiracies.

So we wanted to use DTrace to instrument these Node contestants, but how can we actually do that? There are lots of ways, because you just saw me instrumenting the kernel there: when I was instrumenting system calls, when I was instrumenting I/O, I was instrumenting the operating system kernel. We could do it with the kernel, but we actually want to get higher up in the stack, and getting higher up in the stack proves problematic. It's problematic because DTrace instruments the system holistically, which is to say from the operating system kernel. DTrace always executes in the kernel, and it's instrumenting the entire system, which is great for the ability to instrument anything and everything, but it's really
difficult when you want to instrument high-level interpreted environments. You can imagine DTrace in the kernel looking way up at this interpreted environment, and it's very hard to make sense of what's going on up there, especially when code is being recompiled, JITed, and so on. So we have a mechanism in DTrace that allows userland to help DTrace out and help it understand what to instrument, which we call user-level statically defined tracing, or USDT. Some interpreted environments like Ruby, Python, and PHP have added USDT providers that instrument the interpreter itself. If you google DTrace plus Ruby, or DTrace plus Python, or DTrace plus PHP, you'll see examples of people watching code flow through PHP, or through Ruby, or through Python, which is incredibly valuable, of course: as a developer you can see inside your code in a way you couldn't previously. But that's a little too fine-grained; you're instrumenting at too low a level. It's a little too painful to be instrumenting the interpreter in that way, and it doesn't really work in JITed environments, especially if the JIT is highly dynamic, as it is with the JVM and certainly with V8. Things are moving around too much, so you can't rely on that kind of methodology.

So for Node we really had to take a very different tack. Now, given the nature of what we wanted to do, remember, we wanted connections: when connections are established, that's not that hot a path. My standards for a hot path may be different than yours. A hot path for me is once you crack the five-hundred-thousand-per-second line, five hundred thousand of something per second. That's very hot, and then bumming cycles is actually going to be a win. If you're called less frequently than that, and certainly this path is, the cycles are probably not going to be observable. So what we are going to do is
create a JavaScript function that maps to a C++ module, effectively part of Node, that has USDT probes in it. This has been pushed to the Node repo, so you can see this for yourself: from the Node code in these paths, it makes a call in JavaScript that lands in C++, which then has the USDT probe. And we use a little trick we developed called is-enabled probes to minimize probe effect.

What do is-enabled probes do? The way DTrace works, by the way, if you're wondering how DTrace operates: DTrace has zero probe effect when not enabled, because you're running the optimal program text. What DTrace does is instrument that actual program text; it changes what your program is doing. The technique it uses depends on the system it's operating on: SPARC or x86, userland or the kernel. For USDT we added an additional is-enabled probe that allows code to conditionally say: if this probe is enabled, I want to do this extra work. What that actually turns into is a call that will always return false, and then we hot-patch the text to return true in the case that the probe is enabled. That's what an is-enabled probe is. So we added some is-enabled probes to really minimize the disabled probe effect.

So we added a bunch of probes. We added things that were of interest to Node programmers, notably http-server-request, http-server-response, and so on. I've got some one-liners here for information, but I think it would be a lot more exciting if we actually went and ran this on a system in production. This is where I begin elective surgery; I may look back on this moment as the moment where the stupidity began. But OK, here we are: I am in production. I'm sure the terminal is going to give me a hard time. I'm sure that while attempting to write a D script I will manage to write
rm -rf /, because the terminal will have conspired against me. That is actually the failure mode, so stick around; this could be very, very exciting. OK, there we go. This is on the East Coast, so we're getting a little bit of latency there. Let's do http-server-request, and let's just get a count here. So what have we done now? OK: "matched 313 probes". What happened there? I've actually just revealed the tenancy on our no.de service: there are 313 running Node processes on this box, each of which is in its own virtual OS container, and we'll get to that in a little bit. In the time delta between me hitting carriage return and seeing "matched 313 probes", we went off to each of those processes and changed their program text to now vector safely into DTrace. We're just going to get a count of the number of times we're hitting that, and we hit that, over the course of whatever it was, 30 seconds, 10,000 times.

Now it would be interesting to not just count. Let's aggregate on zonename; there's nothing incriminating there. So now what I'm going to do is determine which of the tenants is doing this, if I can spell zonename correctly. Great. Ah, OK. This is perhaps not surprising; perhaps this is always the case: we have one tenant doing 436 ops during that period of time, and the other tenants are not doing quite as much. That's interesting. So that's the zone name; let's aggregate on the URL, how about that? This could actually get awkward, depending on what the stuff is. You never actually know what you're going to see when you do this; this is actually a bit gutsy. If someone's running a porn server on no.de, we're going to find out right now. OK, wow, that's a lot of different URLs. Huh. Is this legitimate activity? Well, note to myself: we'll go look into that later.
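Stepping back from the demo for a moment: the is-enabled idea described before it (do the expensive work only if someone is actually tracing) can be sketched in plain JavaScript. This is an analogy only. Real USDT is-enabled probes work by hot-patching program text, as the talk explains; the `Probe` class below is hypothetical and simply uses a list of consumers. Nothing here is Node's actual DTrace provider.

```javascript
// Hypothetical stand-in for a USDT probe; this is NOT Node's real provider.
class Probe {
  constructor(name) {
    this.name = name;
    this.consumers = []; // an empty list means the probe is "disabled"
  }

  // Cheap check, playing the role of an is-enabled probe site.
  isEnabled() {
    return this.consumers.length > 0;
  }

  enable(consumer) {
    this.consumers.push(consumer);
  }

  // `buildArgs` is invoked only when someone is listening, so a
  // disabled probe never pays for constructing its arguments.
  fire(buildArgs) {
    if (!this.isEnabled()) return false;
    const args = buildArgs();
    for (const consumer of this.consumers) consumer(this.name, args);
    return true;
  }
}

// Probe name borrowed from the talk; the argument fields are assumptions.
const httpServerRequest = new Probe("http-server-request");

function onRequest(req) {
  httpServerRequest.fire(() => ({
    url: req.url,
    method: req.method,
    remoteAddress: req.remoteAddress,
  }));
}
```

The design point carried over from the talk is that the disabled path costs one cheap check and nothing else; the argument object is built lazily inside the callback passed to `fire`.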
You never know what happens in production. Crazy things happen in production, and we've definitely got interesting things going on here. We can do all sorts of things here, and indeed, probably after this presentation, I will, to understand who is doing it and what, understanding latency and so on. I'm not going to type it in here; it would be too excruciating to watch. But let me show you the examples we have here.

So we've got some one-liners up there. The first one is just doing something similar to what we saw, and the second one is aggregating based on remote address. That's what we want, right? That's what we're going to use to get to the leaderboard. The third one is looking at garbage collection start. We'll talk about our experiences in a little bit, but one of the things we definitely learned about Node is that garbage collection really matters. It's not counterintuitive that garbage collection would be important, but it's very important to know when you're being GC'd. I would say, and Tim and other Node folks (that was supposed to be "Node folks"; if you transpose that you get... excuse me), others in the valued Node community should feel free to disagree with this, but I would say in my experience, nine times out of ten when you're hitting GC, you've got a memory leak in your application: you are holding on to memory inadvertently, and GC is running hard, but fixing GC is not the problem, and getting you more memory is not the problem. You are a drug addict and you need to hit rock bottom; give you more memory and it's going to go straight up your arm. Anyway, gc-start and gc-done are a good way to understand how much this is actually affecting the performance of your application. And then the more interesting one down here is we're actually
taking the latency from server request to server response. Now, this is complicated because Node is event-oriented. One of the beautiful things we had with thread-oriented programs and servers is that you could always associate work with the thread. That was the whole point: you bound those two together, so you could watch the thread progress and you always knew the body of work it was doing. Node undoes that, beautifully so, and righteously so, but it means it's much more difficult to correlate the request to the response. One thing that we do is rely on the fact that HTTP pipelining is kind of dead; no one really does pipelining on the same connection. So we can use the file descriptor as a key: we watch the file descriptor that the request came in on and the file descriptor that the response went out on, never mind what it did in the middle, and then we have the wall time for the HTTP response. So we've got a bunch of stuff that's interesting there. That's been pushed, by the way; it wasn't pushed before Knockout, but it's been pushed now.

So the USDT provider is in place for Node. How could we meaningfully instrument these things? How do you instrument the contestants? For that we have to talk briefly about the virtualization model that we have for our platform-as-a-service offerings at Joyent. We use OS-based virtualization, so we've got virtual operating systems running here, and just as you saw me do on the live system, from the global zone you can see across all those virtual OS instances, which is incredibly powerful. If we were using hardware-based virtualization, you wouldn't have a way to do that. With hardware-based virtualization, where what you're virtualizing is the x86 architecture, the instructions, you don't have that level of visibility: you can't see across machines
without SSH-ing into them, effectively, without having a connection into them and talking to the operating system. So this is a major win for OS-level virtualization. Hardware virtualization has its place too, but this is an area where OS-level virtualization is a major, major win. We're able to use DTrace to cut across all these guys, and we can actually see what folks are doing.

So given all that background, we had an architecture for the leaderboard. We're going to define a connection establishment and teardown to be a tick, and then we're going to have a daemon that instruments all of the virtual OS instances by running in the global zones of all the compute nodes that we had running for the contest. We'll pull that data periodically: in the center of the diagram, we pull it periodically from the daemons running on the compute nodes, which I call tickerd. We pull that into leaderd; leaderd merges all this stuff together, and it's sitting behind a load balancer. These two are equivalent, so these guys are sitting behind a load balancer.

Why would we put something behind a load balancer when it's just a programming contest? Something you should know about this: if this feels over-engineered, it's because I was scared shitless that we were going to launch this thing, that the leaderboard would effectively be the first Node application that someone had launched in highly visible production, and that it would simply melt, and the conclusion would be: Node sucks. That's great; awesome programming contest. I was having nightmares about this, so we went way out of our way, and we were way conservative, to assure success here.

In any case, we wanted to build this architecture. How should we build it? Well, come on: Node, of course. So leaderd and tickerd were both done in Node. We're doing HTTP
GETs every 500 milliseconds, and those GETs are satisfied by leaderd. leaderd is asynchronously doing GETs of tickerd every 100 milliseconds, and tickerd is pulling data out of the kernel from DTrace every hundred milliseconds. So if you're a contestant in Node Knockout, the latency from someone establishing a connection with the Node.js server that you just pushed to seeing that pop up on a world map is 700 milliseconds, the sum of the three polling intervals (500 + 100 + 100). Obviously you could tune that down, but we were trying to pick a number that was basically humanly acceptable and that was still not going to melt this infrastructure, which was again my concern, my fear.

So that was the architecture and the deployment. In terms of building it, we needed a libdtrace add-on for Node, so I did that; that was pretty straightforward. We needed a GeoIP add-on for Node. I'm kind of embarrassed to say this, because I think this is kind of a knock on Node, that everyone writes their own module for everything. I really, really tried to use the node-geoip module, but it and I had some disagreements of principle, so I wrote my own. Sorry. But that was very straightforward; that's obviously a very well understood problem. Then there's HTTP keep-alive for leaderd and tickerd. One of the things about Node is that it is so easy to communicate in HTTP that you just kind of do it until you prove to yourself that HTTP won't work. It's so easy that it's kind of the default interconnect, even when it's got nothing to do with what you're doing.

So it was very quick to build: about 400 lines of code for leaderd, 500 lines of code for tickerd. Far and away the most challenging, time-consuming, and brittle part of this was getting Git statistics, trying to get the right goddamn options to git log to give me the statistics that we wanted, because we also
wanted to have statistics of pushes going up and how frequently people were pushing to their repos. That was far and away the most complicated part of this whole thing, which I think speaks highly of Node, and I think it's going to be a criticism of Git and its command-line options. I think we can infer it that way.

There were front-end challenges too, of course. The first front-end challenge is how you present this geo-information visually. We've been talking about very modern problems; now we talk about a really age-old problem, as old as our knowledge that the earth is round. When a sphere is projected onto a flat surface, you're going to lose something: distance, shape, size, bearing. Some of those have to give. This is like CAP for cartographers: they just have to give up something. Sorry. It must be very refreshing to know that you have to give up something, but it means they're always going to be dissatisfied, and there are many different map projections because that dissatisfaction is unresolvable. That's kind of a beautiful thing, I guess. But the two most common projections are both problematic.

So this is the equirectangular. I don't know how well you can see that; hopefully you can. You can see a bunch of ice up there; there's Greenland. This is what I would call the lazy computer scientist's projection. It simply takes longitude and latitude and treats them as coordinates: you're taking polar coordinates and just treating them as coordinates on a grid, effectively. And OK, great, I appreciate that that's only one line of code for you. Congratulations, it's very terse. It has also distorted the world. In particular, I don't know if you can see it there, Siberia is psyched: Siberia has just been promoted to world power up there. That's that dark mass across the top. Greenland is
definitely psyched. Greenland's like, yeah, I am bigger than Australia, I like this projection. Everyone else has been kind of squished; it looks like it's been hit with a hammer. It's just not very appealing. This is what Yahoo Maps uses; MapQuest uses this. It's commonly used because it's so easy to code to.
Oh, this projection. Now, it's very hard for me to talk about this projection without flying into a rage, but I'm really gonna do my best. So this is the Mercator projection. If you can't see anything but Greenland and Antarctica, that's the point. I mean, what is this, a conspiracy between Greenland, Ellesmere Island and Antarctica to crush the world? And the reason I'm so mad about this one is this is the projection we grew up on, right? When your school teacher went and pulled down the map (assuming you're as old as I am, and I hope you are), that is the projection that was on there. And I remember, at a very young age, the young Bryan Cantrill feeling some incipient rage: wait a minute, Greenland is not bigger than Africa. Come on. The only thing this is useful for, it's useful for one thing and one thing only, and that is getting on a boat, pointing it at the New World, and going. And that is why I think we've romanticized the Mercator projection. Great for navigators; for everyone else it's a disaster. And Greenland... hey, wait a minute, the Norse were navigators. All right, this is a Norse conspiracy too, but we're just gonna dismiss that.
All right, so the Mercator sucks; the equirectangular kind of sucks. Then we have, ah, the Robinson projection. Beautiful projection. You guys are like, yeah, that's about right. I understand we've got to give something up, but it looks so nice. It's like: Greenland, whack, get back, no one even lives there, man, get back in your place. Africa, good, yeah, you're big, prominent, you've got giraffes there. Australia's here. Antarctica is kind of nicely
there. The Robinson projection is a really beautiful projection. So when we talked about this, I said, let's do a Robinson projection. And I really should have checked the math behind the Robinson projection before I suggested this, because you'd be forgiven, as I was, for assuming that the Robinson projection is, you know, a projection: that there's actually a mathematical transformation that takes you from a coordinate on earth to an x and y offset. Ah, no. I love this. This is the quote from the inventor of the Robinson projection, Arthur Robinson: "I started with a kind of artistic approach. I visualized the best-looking shapes and sizes. I worked with the variables until it got to the point where" (I can't read that without laughing) "if I changed one of them, it didn't get any better. Then I figured out the mathematical formula to produce that effect. Most mapmakers start with the mathematics." Yeah, well, most mapmakers start with the earth. I mean, at what point, Arthur H. Robinson, are you no longer a map? You're a drawing.
So I was dismayed. Horrified. The way I got to that quote, by the way, was totally backwards. It's in the Wikipedia page, but I started with the math, and with the math you're like, what the... it looks like this guy took different parts of the earth and just took a hammer to it. So it's a mess. There is math in there, but as Arthur H. Robinson tells us, the math is at the very, very end. So I was like, all right, forget it, let's give up on that. But the thing is that the Mercator is so offensive, the equirectangular is so offensive, and, you know, I work with terrific folks who get as angry about things as I get, sometimes angrier, which is really scary. Rob on our team stepped up and pulled it off at the last possible moment; he essentially got that working just before the
competition started. That's up on GitHub if you want to check it out. I actually put it up there; I know a couple of folks asked about it, so we've been trying to get it out there. He actually, I believe, ported that from Fortran. You've got some Fortran code, yeah; that's the world we're in.
All right, so what did this look like once we actually got up and going? If there's any way to bring down the lights, or to bring up the saturation a little bit... I mean, that looks like the earth at night, which I guess is fine. Okay, so a couple of things about this. This is what the leaderboard looked like during the competition; I'll post my slides so you can get a better view of this. First of all, you can take my Bay Area hypothesis and just throw that out. This is not the Bay Area; this is a lot more than the Bay Area. You'll notice there are a lot of random colors on here. Each color corresponds to a team. What would happen is you would go to the leaderboard, and as folks came in, the lights would start to twinkle as people came to these various sites.
Did anyone play the game Swarmation? Did you see that? Swarmation was funny. Swarmation was one of the first ones that kind of went viral over the weekend. These guys developed it early in the weekend (Swarmation was a Node Knockout project), it started to go viral, and you could watch it go viral around the globe. Anyone watching the leaderboard found out something interesting about Swarmation, and I don't know what to conclude from it. Swarmation is a game where you're a little box and you're moving around, and all the other boxes are players, and you need to make shapes that have been indicated by the game. Basically, everyone is panicked and no one is in charge, which is kind of an interesting metaphor for life in a lot of ways. One interesting thing about Swarmation is that it was very popular in Eastern Europe, in a statistically relevant fashion. And you could watch this: these guys are playing at, like, 3:00 in the morning, connecting to it from the middle of Siberia.
But that's not what's on here. What's on here is this thing, Anansi. This is a MapReduce project. What they did is they put their JavaScript code into browser code that's being downloaded from major news sites, and this code would contact their server and ask for work to do. The idea is that you could run MapReduce jobs across the planet by hijacking, and I mean that in kind of a literal sense, people's computers. You go to a website and all of a sudden you're like, oh, well, my lap is hot, what's going on? Oh, it's the MapReduce project doing something that they want to do on something that I own. It was a little sketchy. But these guys claimed they had six and a half million monthly uniques signed up for this before the contest started. And this was shortly after this happened: all of a sudden, like, Sunday at 11 a.m.
or whatever, the world starts going nuclear. Literally: see Japan over there? They knew a guy in Japan, I guess. Then the eastern US, and then Uruguay, of all places; who the hell knows. In any case, we saw that really melt. It was a lot of fun to actually watch that; you could see this go on over the contest, which is really fun.
So in terms of our experience with this: the leaderboard very quickly got about a thousand active users. It could have gone a lot higher. Like I said, we were ready for the Internet to show up, basically. I know "ready" is the wrong word: scared shitless that the Internet was gonna show up. So we over-engineered it. CPU utilization: you know, the big question about node is, okay, but it's all in one thread, and what if it goes compute bound? Well, CPU utilization was totally uninteresting. What was interesting was network utilization. Node can drive bandwidth with surprisingly little CPU, and I think that as these applications begin to proliferate, this is going to change our mental model for these applications, because node can work the network hard with surprisingly little CPU.
In terms of reliability, it was a total champ. The tickerds stayed up for the entire contest, the whole week; they stayed up until we ultimately took the service offline to repurpose the hardware. The leaderd did die twice over the course of the weekend and the subsequent week. Both of those were due to memory leaks in node; both of those have since been fixed.
As a programmer, a software engineer, I'm used to being enthusiastic about something, wading in, and then having that enthusiasm naturally attenuate. That didn't really happen with node; I actually got more enthusiastic the more lines of code I wrote, which is very strange for me. It actually worked like a champ. It was great. It was very fun to code in. It is one of the very few experiences I've had of, like: oh my god, I think I'm
done, I think it works. Oh my god, it works. What am I gonna do now? I don't know, break it, I guess, so it doesn't break. So that was great.
We did have a significant issue. Of course, you know, it's always this way: the things that we thought very, very carefully about, and that we were paranoid about, were fine. The things that we added as afterthoughts, like throwing this little canvas-based, real-time updating map into the leaderboard, were actually causing, like, laps to be on fire; that could have actually burned down a house. And in particular, the browser would crash. "Aw, snap." Yeah. I'm sorry, I'm sorry. You know, my doctor told me: don't bring up Chrome, don't bring up the browser crashes, we've been over this. I'm not even gonna go there. Let's just say that if I said "aw, snap" when your kernel panicked, you wouldn't be so psyched. Okay? It's not "aw, snap"; it's, like, let's go fix the goddamn problem there. I'm sorry.
And it was very interesting to watch these things as they went viral. So I think that the question for us... you know, this is one of those play-within-the-play things, a very kind of meta thing, where you realize: oh my god, we were in Node Knockout all along. We were a Node Knockout contestant with the leaderboard. We were doing something very much like what the other contestants were doing; namely, we were all building these data-intensive systems. And the light kind of went on, like, oh wow, of course, that's what this thing is really, really good at. It's really good at building these data-intensive real-time systems, these web-facing real-time systems. And that's why you're seeing a lot of games, of course, in node, but also anything that wants that kind of real-time interactivity. And if you look at the early adopters of node, they were all folks that were, you know,
like Matt Ranney, and what he was trying to do with Voxer. Those guys were trying to do it with Python, trying to do it with other environments, couldn't do it, came to node out of frustration, and have found they've been able to do data-intensive real-time with node. So this was clearly, we felt, kind of a megatrend that we were seeing. And it's hand in glove with what we're seeing, of course, in devices: the proliferation of devices, and that we think there's gonna be much more real-time interactivity. We don't have real-time interactivity on our devices today; we think that's coming, and we think that node is gonna help accelerate that.
And so of course this needs an acronym. So, unfortunately: data-intensive real-time. Hey, DIRT. All right, I dig DIRT. So we've got CRUD, ACID, BASE, CAP; now we have DIRT, data-intensive real-time. And I do think we're seeing more and more things like this. I think the important thing here is that what node highlights, what the success of Node.js in this particular domain highlights, is that when you're building a real-time system that's facing the web, the challenge with those systems is not CPU, by a long shot. The challenge in those systems is long-latency events. The I/O, the remote systems, whatever it is: those events, those are the challenge. It may seem obvious, but I think we have fixated on the CPU element of that when we talk about real-time systems, and for data-intensive real-time we do have a different animal.
So I think that's kind of the big lesson to take away from this. As a reminder, a real-time system is one in which we measure the correctness of the system based on its timeliness. If it's late, it's wrong. That's what a real-time system is. And what "wrong" means (does wrong mean that someone's sad, or that someone died?) is hard real-time versus soft real-time, right? So a real-time system
needs to be on time to be correct, and in those systems it does not make sense to look at operations per second. We've got to lose our collective religion around operations per second. It's important; I wouldn't want to not have the data at all. But we really need to collectively focus on latency. Latency is the only metric that actually matters. And by the way, when you take latency, don't distill it into a single number. Don't average it out. If you take your I/O latency, average it out, and get ten milliseconds, what does that mean? Well, it could mean a lot of things. It could mean that your I/O is essentially like a rocket and you have these 5,000-millisecond outliers, which can definitely happen. Or it could mean that everything is at ten milliseconds. The difference there is important, and this poses challenges to both instrumentation and the way we actually visualize the system.
So we're thinking about this problem; obviously we think other people are gonna be thinking about it as well. We think that a lot of people need to be thinking about this problem in terms of how we instrument and visualize. Here is one attempt that we took at it for our no.de service. This is a screenshot of our real-time analytics, or cloud analytics, in no.de. What this is showing is a heat map: you've got time on the x-axis, latency on the y-axis, and we can see dynamically where our latency is. You can see banding and so on, and we've already used this to find some really interesting things in the cloud.
So, where are we on time? We're not good on time... or bad on time. But actually, if folks want to just quickly go to rm.no.de, port 8001, if you don't mind. Let's see, it should still be up. Yes. Okay, here is our world map. All right, thank you. All right, now stop. I mean, no, no, you can keep doing it. All right, so it's very hard to see here, but what we've got is
a world map up there that you really can't see at all, unfortunately. But this is you guys coming in here; you're actually being geolocated on a world map that is, like, at night. So that's hard to see; you'll have to come up, or go see it for yourself later. And what we can actually go do here... oops, what did I just do? I just closed that. Chrome is like, oh, are you making fun of my "aw, snap"? I will "aw, snap" you, my friend.
So let's go over to the analytics real quick, and then I will get off the stage. Sorry; I know, I did start on time, let the record reflect. So let's do server operations, and let's break that down by remote IP address and latency, and what we're gonna see here, if you guys are still hitting it... there you go. Oh, it's hard to see, because they're all down there. Okay, so this is the actual latency, in real time, of you guys going to that silly little node app that just shows the user-agent mix. And actually I should go over and see what our user-agent mix is. And we've got the ability to drill down on this dynamically, and so on.
This is just a first approach. I think there's a lot more to do here collectively. I think the important bit is that the real-time web is absolutely coming, to the degree it's not already here, and that we, in both development (software engineering) and operations, need to plan for it. And we need to plan for it by thinking about the metrics that we're going to consider, and those metrics really need to be latency. So thank you very much.
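The point in the talk about not averaging latency can be made concrete with a small sketch. This is a minimal illustration in JavaScript (chosen because the talk is about node), not code from the talk or from Joyent's cloud analytics: the function names `mean`, `percentile` and `heatmapCells` and all sample values are assumptions made up for the example. It builds two hypothetical workloads that both average ten milliseconds, one uniform and one mostly fast with a single outlier of about 5,000 milliseconds, and then shows how events can be counted into (time bucket, latency bucket) cells, which is the kind of data a heat map with time on the x-axis and latency on the y-axis would draw.

```javascript
// Arithmetic mean of an array of latencies, in milliseconds.
function mean(samples) {
  return samples.reduce((sum, x) => sum + x, 0) / samples.length;
}

// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are at or below it.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Count events per (time bucket, latency bucket) cell. Each event is
// { timestampMs, latencyMs }; keys are "x,y" strings where x is the time
// bucket index and y is the latency bucket index.
function heatmapCells(events, timeBucketMs, latencyBucketMs) {
  const cells = new Map();
  for (const { timestampMs, latencyMs } of events) {
    const x = Math.floor(timestampMs / timeBucketMs);
    const y = Math.floor(latencyMs / latencyBucketMs);
    const key = `${x},${y}`;
    cells.set(key, (cells.get(key) || 0) + 1);
  }
  return cells;
}

// Two hypothetical workloads (illustrative numbers, not measurements).
// "steady": every operation takes 10 ms.
// "spiky": 999 operations at 5 ms plus one outlier of 5005 ms.
const steady = new Array(1000).fill(10);
const spiky = new Array(999).fill(5).concat([5005]);

// Both workloads have the same average...
console.log("mean:", mean(steady), mean(spiky));
// ...but the median and the worst case tell them apart.
console.log("median:", percentile(steady, 50), percentile(spiky, 50));
console.log("max:", Math.max(...steady), Math.max(...spiky));

// Bucket the spiky workload into 1-second by 10-ms cells, pretending
// one operation completes per millisecond of wall-clock time.
const events = spiky.map((latencyMs, i) => ({ timestampMs: i, latencyMs }));
console.log(heatmapCells(events, 1000, 10));
```

The average hides exactly the distinction the talk cares about; the per-cell counts keep the outlier visible as its own cell far up the latency axis instead of folding it into a single number.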
Info
Channel: Bryan Cantrill
Views: 5,915
Rating: 4.84 out of 5
Id: _jS_XkCkpVI
Length: 47min 33sec (2853 seconds)
Published: Sat Feb 03 2018