A Crash Course On Worker Threads - Rich Trott, University of California, San Francisco

Captions
It's three o'clock and I believe in starting promptly, so let's do this. If you came here to see the worker threads crash course, you're in the right place, and thank you. But I do question your judgment, because Joyee Cheung is giving a talk in the other room about how Node bootstraps itself, and she's awesome, and it's going to be an amazing talk, and it's totally where I want to be right now. But I'm here. I mean, I'm happy to be here too. But if you want to leave now because I've insulted you, you totally can. This is also probably going to be less of a crash course and more of an introduction, but, you know, a not-so-gentle introduction. I don't know, we'll see how this goes. Anyway, you've made the wrong choice to be here, so hello friends, thank you.

I talk fast. I'll try not to, but actually I won't try not to, because I have a half hour and I ran through this talk about an hour ago and it was way over. So I cut a bunch of stuff out, mostly jokes though, so don't worry about it. It just won't be as funny as I wanted it to be, but hopefully I'll have as much material.

My name is Rich. I work for the University of California at their San Francisco campus, at the library. Right about now is when I imagine people going, "He works in the library? What is he doing here?" in sort of a Jim Gaffigan voice. That just makes me relax and feel good about things. But anyway, as it happens, I do a lot of work on Node core, and I'm also the author of a rock opera about a steakhouse. I don't know if the sound on this is plugged in, but let's see if this works here. I don't know. Okay, so it's coming out of the projector, I think. Okay, cool. So anyway, I didn't come here to play examples of the rock opera, but I did want to point it out because all of the links to blog posts and documentation and videos and stuff are at palsfamilysteakhouse.com. I don't think I have a link to the slide deck, which I guess I'll add after I'm done here. Okay, anyway,
but I'm not here to talk about that stuff. I'm here to talk about Node. But first, some disclaimers. The views expressed are my own and not necessarily those of my employer. That's a pretty standard disclaimer. The views expressed are my own and not necessarily those of Node. There are a lot of other people involved in Node, a few of them are here right now, and naturally we don't all see everything the same way, hence this disclaimer.

With that out of the way: Node. Hey, have you heard about worker threads? They were introduced in Node 10.5.0 but required a command-line flag to use in that version, so use Node 12 if you don't want to use a command-line flag, which you don't. Specifically, use Node 12.11.0 or newer, although I don't think there is a newer version in the 12 release line. There's 13.x out as well, but anyway, 12.11.0 is the first LTS version where support is officially stable rather than experimental. So yeah. Oh no, it's, whoa, 13.1, because at the time of this writing it's the most recent. Yes, not 12.3; 12.13.1 is the most recent. Great, okay, super.

So anyway, worker threads. Hey, what are they? They're kind of like web workers, but different. For example, if you're used to web workers, there are shared workers; no such thing here. They're also kind of like threads in other programming languages, but not. If you've used threads in other programming languages, cool. If you haven't used threads in other programming languages, also cool. It's going to be fine, don't worry.

So, JavaScript, right? We all love JavaScript, and we all know JavaScript is single-threaded. Even if you had no idea what that meant before you saw this slide, there's a 100% chance that you've heard the phrase "JavaScript is single-threaded," because JavaScript is single-threaded. Arguably. I actually don't want to have that argument, though. The point is that your code can do one thing at a time. That's why this program never exits: there's only one execution thread handling this code, so the code in the set
timeout, which would break us out of the while loop, never gets to run, because the code in setTimeout won't run until the while loop relinquishes control of the event loop. So this is going to run forever, or until you hit Control-C, or until you turn off your computer, or whatever. But it won't exit cleanly, and whatever causes it to exit, it's not going to be that setTimeout. This is called blocking the event loop. If you already understand what that means, great. If you don't, trust me and look it up later. There are some good YouTube videos of talks about it. My favorite is "What the heck is the event loop anyway?" by Philip Roberts. I think that was at JSConf EU, I don't know. Anyway, there's a link at palsfamilysteakhouse.com.

Now you may be thinking, "Hey, this is cool and everything, but Node is asynchronous. I don't really have to worry about this blocking-the-event-loop stuff." You're probably actually not thinking that, or you wouldn't be here. But anyway, Node can do many things at once, like handle multiple simultaneous HTTP requests or read multiple files, and that's true, but it's mostly true for input/output, for I/O. If you're doing, say, data-sciencey stuff or graphics processing or something that's CPU-intensive, then let's just say the default state of things is not as asynchronous.

Prior to worker threads, the usual way people would offload CPU work in a non-blocking way in Node was the cluster module, and if that's working for you, great. But here's the thing: cluster spreads your workload out across multiple Node processes with independent memory and so on, so sharing large amounts of data is often problematic, and each process consumes the full amount of RAM required by Node. This can be inefficient, although again, if it's working for you, then great. But it doesn't work great for a lot of things, and even for things that it does work for, worker threads will often be better, because worker threads are more lightweight and they are better at sharing data. So let's dive in. Here's a
hello world example, and we're going to go through it step by step. The first line pulls in three things from the worker_threads module: the Worker class, the isMainThread boolean, and the parentPort object. The Worker class will be familiar to you if you've used web workers. If you haven't used web workers, there's a great blogger about the worker class, and that's Karl Marx. But we'll explain the Worker class, along with isMainThread and parentPort, later. Just know that they all come from the worker_threads module.

We'll start with isMainThread. We use it to make sure we're not inside a worker thread. We're checking that we're in the main thread so that we know it's okay to launch a worker thread. If we didn't do this check, then this code might launch a worker thread that launches a worker thread that launches a worker thread, ad infinitum, or until you run out of, you know, stack space or something. I don't know what you'd run out of, actually. That'd be a good experiment. Anyway, this kind of check is usually only necessary when the code for the worker thread and the code for the main thread are in one file, which is probably not what you want to do for production code, but it works great for a hello world example. So there you go.

So we know we're in the main thread, and what we're going to do is create a worker thread. We use the constructor for the Worker class, that's what new Worker is, and we pass it __filename, which, as you probably know, is the special Node variable that contains the name of the file currently being executed. And if you didn't know it, now you do. You can create a worker thread to run any JavaScript file you specify. Here we're specifying the file that's currently executing, but you can also pass a string path to any file, or you can pass in a string containing code to be executed, although to do that you need to pass a second argument to the constructor
that tells it that you're doing that. I tend to avoid that, because a string as a blob of executable code raises the same kind of hives that eval might, because that's basically what it is. But it's an option if it makes sense for your use case.

So we're creating a worker. Now let's listen for messages from the worker. This is the usual event listener syntax in Node. Remember, we're still in the main thread, not the worker thread. We're listening for message events on the worker we've created, and when we get one, we use console.log to print the message. And that's it for the main thread.

Remember, we were in an if block that checked whether we were in the main thread, so now let's use else and do the right stuff for when we're in the worker thread. The only thing we're going to do is use parentPort to send a message using its postMessage method. In the main thread, parentPort will be null, so if we want to send a message to the worker from the main thread, we use the postMessage method that's on the worker instance. But in a worker thread, parentPort.postMessage can be used to send messages to the main thread. So let's use it to send a message that says "hello world," and that's the end of the file. You'll remember that in the main thread we had a listener for messages from the worker, and that listener prints the message. So if you run this file, it will print "hello world." Not terribly useful, and there are much easier ways to do that, but it does introduce the very basic concepts of worker threads.

So now let's do something just as contrived as this, but more interesting. Perhaps you remember the game Six Degrees of Kevin Bacon. If not, it's simple: given the name of any actor in a film, your job is to connect them to Kevin Bacon in six or fewer steps, in the following manner. Let's say you're challenged to connect Katy Perry to Kevin Bacon in six or fewer steps. Katy Perry was in Zoolander 2 with John Malkovich, and John Malkovich was in Queens Logic with
Kevin Bacon. Boom: Katy Perry to Kevin Bacon in two steps. I've seen neither of those movies.

There are already websites that solve Six Degrees of Kevin Bacon by using IMDb data. Several years ago I wanted to do this for musicians playing on recordings of individual songs, so I made a site called Music Routes, but it's been broken for a long, long time. So let's fix it.

First, a surprise: there's no usable database available for which musicians played on which tracks. A lot of people think AllMusic has it. AllMusic has album data but not track data. A lot of people think MusicBrainz has that information, but it has artist data and not individual data. And a lot of people think Discogs has that information. Discogs just copies whatever's on the album sleeve, which means that, for example, it will tell you that Rudy Sarzo played bass on Ozzy Osbourne's Diary of a Madman. As everybody knows, he didn't; he was just in the credits. Lee Kerslake played drums and Bob Daisley played bass. That's the end of the Ozzy Osbourne information for this talk, but y'all know just a little bit more now about Ozzy Osbourne.

So anyway, that brings us to Wikidata, which has some data along these lines, but less than you'd think. And that's cool, it's Wikidata, everybody can add information to it. But it's also unusably slow for the many, many queries we will need to make in our searches. So I built my own database and published it. It's very incomplete, but it'll do for here. I also built a rudimentary little visualizer for it, which we might get to later. I don't know.

In order to solve these things, we can use breadth-first search. I am now about to give the world's worst overview of breadth-first search. Here it goes. Let's go back to connecting Katy Perry to Kevin Bacon. Step one: is Katy Perry === Kevin Bacon? That's a JavaScript triple equals there in the middle. The answer, obviously, is no. Don't be ridiculous. Step two: find everyone that was in a movie with Katy Perry.
Do any of those people happen to be Kevin Bacon? The answer is no. It includes John Malkovich and other people. Step three: find everyone that was in a movie with any of those people that were in a movie with Katy Perry. Do any of those people happen to be Kevin Bacon? The answer is yes, so we're done. Congratulations, you've just witnessed the worst explanation of breadth-first search ever.

Now let's do a slightly better explanation. This will be the second-worst explanation of breadth-first search ever. We're going to connect Katy Perry and Kevin Bacon, but this time not through movies. This time let's do it through music. Kevin Bacon has a band with his brother, Michael Bacon. The band is called the Bacon Brothers, and I'm not making that up. Fun fact, as you can see in this photo from Wikipedia: Michael Bacon's nose has never been successfully photographed.

So let's see if we can connect Katy Perry to Kevin Bacon via music. Step one: is Katy Perry Kevin Bacon? No. Get out of here with that nonsense. So here's a visualization of Katy Perry in the middle and everyone she recorded with on her album One of the Boys, which I'm sorry to say is the only Katy Perry album that I have in the data set. You can open a pull request to fix that if you want to correct this horrible injustice. Anyway, Katy Perry is that circle in the middle, like I said, and each circle in the surrounding ring is someone who is one step away from Katy Perry because they played with her on the album. So somewhere in there is legendary session horn man Jerry Hey. There he is. There's also Eurythmics guitarist Dave Stewart at the bottom, who goes by David A. Stewart because, literally, and I'm not making this up either, there are too many Dave Stewarts out there. There's a David A. Stewart in there. Notably absent from the ring, though, is Kevin Bacon.

So now imagine we take each of those circles in the ring around Katy Perry and we find everyone who has recorded with each of these people. We would take all those people
and make an outer ring with circles for each of them. I didn't do that, for a lot of reasons. One, I'm lazy, but also because there would be way too many circles to fit on the slide. We're going to get to that in a minute, but yeah, the number of circles is going to grow exponentially, or exponentially-ish, with each ring. So you wouldn't want to see all the circles, but I did scrawl this ugly blue line to sort of represent that outer ring. Kind of like Saturn, you know, a little ring around the planet of Katy Perry. Anyway, I'm here to tell you something exciting about that outer ring, though: it totally contains Kevin Bacon. So that's basically breadth-first search. Here are the results, if you don't believe me. There you go. Yay, Jon Bon Jovi. Who would have thought?

Okay, so that is breadth-first search. Let's implement it. No, just kidding. For purposes of this presentation, it's an implementation detail. There are trade-offs and various ways of implementing it, and I don't really want to get into it, and I'm talking way too fast as it is. But you can check out this repository for how I implemented it, as well as the other algorithms we're going to talk about in a little bit. The important thing is that our approach will keep the CPU busy rather than do a bunch of work up front. This is so that we can see how cool worker threads are, but it's also a legitimate trade-off one might make in the real world. It's not always worth it to spend time up front pre-processing your data if it's time-consuming, takes up too much storage, et cetera.

So here's what it looks like. Let's step through this. The first line gets all the tracks of the start person. Sorry for the long variable names, but it made sense when I wrote it in the repository. Anyway, let's say it's Aretha Franklin. We're going to put all the tracks in index zero of an array of tracks for the start person. The index indicates how many steps we've
gone from the start individual. And this line populates the corresponding array of individuals that are the source of those tracks above, so in this case it's an array of just one individual ID. It's just Aretha Franklin. The two lines starting here do the exact same thing for the target person. Let's say it's Carrie Brownstein. This line checks to see if we have a match by seeing if there are one or more tracks in both lists.

Lastly, this while loop runs until a match is found. This line adds the individuals and the tracks that result from going out one more step from the start individual than we've gone so far. So it's all the people on tracks with Aretha Franklin; then the next time it runs, it's all the people on tracks with those people who are on tracks with Aretha Franklin, and so on and so on, getting exponentially slower as more and more data is involved in deduplication and queries. And this line updates the matches list, so the while loop will stop if we've found a match.

So each time that while loop runs, remember, each of these rings is exponentially-ish more work. Longer paths will take longer. There's a bit of a solution hinted at by the use of an otherwise unnecessary array in that previous code, but first of all let's check how breadth-first search performs on my laptop. Here's a run with the results. It took more than 14 seconds just to do the breadth-first search. That's a lot. We can do better, even without worker threads, by doing bidirectional search.

Really quickly, here's how bidirectional search works. First, Katy Perry is not Kevin Bacon, despite the striking resemblance evident in that photo. Again, we look for everyone that is connected to Katy Perry and check to see if Kevin Bacon is in there. He's not. Now we bounce back to Kevin Bacon and we get everyone connected to him. We check to see if there is an overlap in the two sets of musicians. If not, then we do another query. We do one for Katy
Perry's cluster of people: still no match. We do one for Kevin Bacon's cluster of people: still no match. You know, until there's a match.

So let's visualize it like this. That oval on the left is Katy Perry, and that oval on the right is Kevin Bacon. Katy Perry, Kevin Bacon: different people, there's space between them. So there's a bunch of little dots, but those are all people, just like Katy and Kevin. Those are all the people who have played with Katy Perry. Still no Kevin Bacon. This is breadth-first search. We're going to do another query, and this query starts to get expensive. I didn't include all the dots that should be in there, but imagine like 800 times as many dots, and still no Kevin Bacon. We do another query, and this one's even more expensive, but finally, there's Kevin Bacon.

All right, so now bidirectional search would go this way. Let's check. Okay, all the people who played with Katy Perry: Kevin Bacon's not in there. All the people who played with Kevin Bacon: Katy Perry's not in there. Oh, look at this, we do one more query, and Kevin Bacon and Katy Perry have some overlap. Jon Bon Jovi or someone is in the middle there. Dave Stewart, somebody is. Anyway, so we're doing fewer expensive queries. We're doing the same number of queries, because we have to do as many queries as it takes to get the number of steps to connect the two people, but we're doing less expensive queries. By the way, if you need Big O notation to show how that works, or if my explanation sucks so bad that you've no idea what I'm talking about, there's a Wikipedia article.

All right, how am I doing on time? I've got 12 minutes. Okay. Bidirectional search is just like breadth-first search, right down to the comments, except for the contents of the while loop. So let's go ahead and zoom in on that while loop. You can see that we do a breadth-first search starting from the start individual, then we check
if there's a match, and if not, we do a second breadth-first search starting from the target individual, and we repeat until we find some overlap. Holy moly, we went from 14 seconds to less than 3. It's awesome.

But wait, this talk is about worker threads. Why be bound to a single thread? Rather than doing one breadth-first search over here and checking, and then doing another one over there and checking, and then doing another one over here, why not just run two threads doing simultaneous breadth-first searches, racing to see which one can return an overlapping individual first?

So to create our worker threads this time, we are calling new Worker again, and this time we're putting the worker code in a file called worker.js. There's also a new thing over there in the second argument, which is a workerData property. This allows us to provide the ID of the individual to start with. So workerData gets serialized and sent over to the worker, which then deserializes it into its own copy of the data, which is basically what happens when you postMessage data as well.

Now, worker threads can do this awesome magical thing where, if you do things just right, you can share memory and also transfer memory buffers between the main thread and the worker thread. Sharing memory doesn't actually resemble sharing nachos like this, but I needed an image. We're not doing this in this app; workerData just sends a copy. But if your data is of a predictable size and format, and if there's a lot of it, look at the docs for information on sharing memory or transferring it. It will be useful.

In addition to the shared memory stuff, there's pooling. For this application, we always need two workers, you know, one for Katy Perry and one for Kevin Bacon, and I don't care about the cost of starting one up, just waiting until I need it and starting one up. But in an application where your needs are more dynamic and you're trying to get the absolute best
performance you can, you'll want to investigate having a pool of workers that you use as needed. There are npm modules that can help you if you don't want to implement pooling yourself.

So over in worker.js, reading the workerData value is done like this. You import the workerData property from the built-in worker_threads module, then read the data, and then read the value of the id key.

We're going to context switch back to the main thread. We have an error listener that simply throws any unexpected errors from the worker, and we have a callback that we use when we receive a message from the worker. The index here is used to distinguish the results from Katy Perry's worker thread and Kevin Bacon's worker thread, so we might use 0 for Katy Perry's thread and 1 for Kevin Bacon's thread.

Let's head over to the worker code again and see how the worker invokes the callback. When each worker is created, it grabs all the tracks the individual is on and sends them, along with the individual, back as an object to the main thread. That postMessage will cause the callback in the main thread to execute. So here's the callback, and again, the index is a value that lets us use the same callback for Katy Perry's tracks as we use for Kevin Bacon's tracks. We also get all the individuals from the listed tracks, and we check to see if there are any tracks that are on both lists, thus indicating that the expanding circles are overlapping and we can stop. If we have a match, we call a function called done. We'll check that out in a second. If we're not done, we send a message to the worker to go get us another expanding ring of tracks and individuals.

I'm not going to show the worker code that listens for the message, as it's pretty similar to what we've already seen, plus I feel like I'm rocketing through this fast enough. But if it gets the value "next," it gets the next set of search results and sends them back to the main thread. Just know that to receive the message, the
workers listen for the message event on the parentPort object. But I do want to talk about that done function. It removes the listeners we have for both workers, and then it calls this method that's on all workers called terminate. What terminate does is end the worker thread and return a promise that resolves to the exit code of the worker thread. If we have cleanup code or whatever, and we want to make sure the worker threads exit cleanly, we can put this in an async function and await the value, but in this case we don't. It's going to exit with an error code because we're terminating it while it's running. We could also send it a message and have it end gracefully, but that's extra code and overhead we don't need in this case. This just says, "End execution immediately, please." But we could do something more elegant if we wanted to. And lastly, we print our results.

So let's see how this performs. Remember, single-threaded breadth-first search took over 14 seconds. Single-threaded bidirectional search: under three seconds. Oh my gosh, it's less than 700 milliseconds now. I can't believe it. Unbelievable. This should be illegal.

Now I have to admit that the main motivation here, as you might have picked up by now, wasn't really to talk about worker threads, as awesome and as exciting as they are. It was to write a program to efficiently find out how far Pal's Family Steakhouse is from Lil Nas X. The answer, by the way, is six degrees. It starts with Lil Nas X, of course, and the first degree is Billy Ray Cyrus, who performed on Old Town Road. I was as surprised as anybody to find out that as recently as 2009, that's what Billy Ray Cyrus looked like. The second degree is country star Mary Chapin Carpenter, who along with Billy Ray Cyrus was on Dolly Parton's song Romeo. Man, Dolly Parton gets her own slide because, you know, you need to stop and just pay your respects
to Dolly Parton. The aforementioned Romeo was on her thirty-second studio album. She wrote it, she produced it. People who aren't country music fans, and I'm not really one myself, don't realize the extent to which she was in control of her sound and her career. She's a legend and a force to be reckoned with, so don't mess with Dolly.

Also, since we're starting from Old Town Road and everything like that: the analogous legend and force to be reckoned with in Node is Anna Henningsen. She's the one most responsible for implementing worker threads. She basically did it single-handedly. As far as all things Node go, it's extremely difficult to give Anna too much credit, so you should totally just start applauding right now. Yeah, that's right. She'll be one of the people on the Node Technical Steering Committee panel at 9:00 a.m. tomorrow morning, so check that out.

Anyway, back to this nonsense. After Dolly Parton's track, Mary Chapin Carpenter goes through Saturday Night Live bandleader G. E. Smith, and Tom Waits, and a trumpet player named Chris Grady, and then on to a track that I was on. Anyway, why should I restrict the fun of vanity exercises like this to me? You can head over to this Glitch URL and try some stuff out. And since, you know, I do want to just take a... okay, let's see here. This is not working the way I expected it to. Okay, let's see here. Yeah, so let's try Kermit the Frog, because my daughter... it doesn't matter why. I did Kermit the Frog earlier. I did Kermit the Frog to Bob Dylan, but that didn't seem as much fun as some other things. What, the Pope? The Pope is, yeah, a musician. I don't think he's in the discography, but we could try, like, Miles Davis or something. I'll try that. Here you go: four steps from Kermit the Frog to Miles Davis. So anyway, you search for people, and, oh, this is where the visualizer comes in. So here's Herbie Hancock and everybody he's played with, and
then, you know, if you want to know what Randy Brecker was on, well, he was on this Herbie Hancock thing. But if you want everything Herbie Hancock is on, there it is. You get the idea. Okay, so anyway, have fun with that Glitch thing. Yay. Okay, present, please. This is going to throw me back to the beginning. Don't throw me back to the beginning. Yeah, okay, so there you go. There's also a link at palsfamilysteakhouse.com. Thanks. I think, oh my gosh, I came in like four minutes under. That's awesome. Okay, yeah, wow, everybody stayed, it seemed like, and that's just fun. I hope I didn't completely waste your time. Thank you very much. [Applause]
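
The slide with the program that never exits is not captured in the captions. A rough sketch of the same idea follows. One deliberate change is an assumption of this sketch, not the speaker's code: the loop here gives up after about 100 ms instead of spinning forever, so the effect can be observed and the program still ends. The variable names are made up.

```javascript
// A bounded variant of the blocked-event-loop demo (the talk's version loops forever).
let timerFired = false;

// Ask for the callback to run as soon as possible (0 ms).
setTimeout(() => {
  timerFired = true;
  console.log('timer callback finally ran');
}, 0);

// Busy-wait for about 100 ms. Nothing else can run on this thread meanwhile.
const start = Date.now();
while (Date.now() - start < 100) {
  // spinning on purpose
}

// Still false: the callback cannot run until this synchronous code returns.
console.log('timer fired during the loop?', timerFired);
```

Run with node, this prints "timer fired during the loop? false" first and "timer callback finally ran" afterwards, because the timer callback has to wait for the synchronous loop to hand control back to the event loop.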
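
The bidirectional search described in the talk alternates one ring of expansion from each side until the two sets overlap. The sketch below shows that idea on a tiny in-memory graph. It is not the implementation in the speaker's repository: the real code works with tracks and individuals from a database, while this version assumes a plain person-to-person adjacency map, and the function names and single-letter node names are hypothetical.

```javascript
// Expand one ring outward from a frontier, recording each new node's distance.
function expand(graph, frontier, seen) {
  const next = [];
  for (const node of frontier) {
    for (const neighbor of graph.get(node) || []) {
      if (!seen.has(neighbor)) {
        seen.set(neighbor, seen.get(node) + 1);
        next.push(neighbor);
      }
    }
  }
  return next;
}

// Smallest combined distance over nodes both sides have reached, or Infinity.
function shortestOverlap(seenA, seenB) {
  let best = Infinity;
  for (const [node, distance] of seenA) {
    if (seenB.has(node)) best = Math.min(best, distance + seenB.get(node));
  }
  return best;
}

// Number of steps between start and target, or -1 if they are not connected.
function degreesOfSeparation(graph, start, target) {
  const seenStart = new Map([[start, 0]]);
  const seenTarget = new Map([[target, 0]]);
  let frontierStart = [start];
  let frontierTarget = [target];
  let fromStart = true;

  while (frontierStart.length > 0 && frontierTarget.length > 0) {
    const steps = shortestOverlap(seenStart, seenTarget);
    if (steps !== Infinity) return steps;
    // Alternate sides so neither ring grows much larger than the other.
    if (fromStart) frontierStart = expand(graph, frontierStart, seenStart);
    else frontierTarget = expand(graph, frontierTarget, seenTarget);
    fromStart = !fromStart;
  }
  const steps = shortestOverlap(seenStart, seenTarget);
  return steps === Infinity ? -1 : steps;
}

// Toy graph: a - b - c - d, plus a - e, and f connected to nobody.
const graph = new Map([
  ['a', ['b', 'e']],
  ['b', ['a', 'c']],
  ['c', ['b', 'd']],
  ['d', ['c']],
  ['e', ['a']],
  ['f', []],
]);

console.log(degreesOfSeparation(graph, 'a', 'd')); // 3
```

In the worker-thread version from the talk, each side's expansion would run in its own worker and the main thread would only do the overlap check.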
Info
Channel: node.js
Views: 17,741
Id: wT4lg9oiMvI
Length: 26min 36sec (1596 seconds)
Published: Thu Dec 19 2019