Node.js Background Jobs: Async Processing for an Async Language

So we're going to talk about background jobs in Node, and how basically everything you do in Node is better than in any other language, except for the terrible ideas, which will be far worse. There's a repo that goes along with this talk; feel free to try it out and explore at home, just don't blame me when your production servers catch on fire. So yeah, everything is better in Node, including the bad ideas.

For the point of this talk, we're going to use a hypothetical situation which maybe you've come across: someone comes to your website, they sign up, you write a record to the database, then you want to send them some sort of email, like a welcome email, and then respond to the client with a 200 OK, and life goes on and everybody's happy. There are a couple of strategies for how to do this, and we're going to go through some of them today. You can do it in the foreground, inline; you can do it in parallel, breaking out to another thread or process to handle it; you can fork, having a whole other running Node instance to handle emails; you can have a queue system, writing it to a database that somebody gets to later; and there's other stuff we're just not going to talk about for time reasons. There are lots of strategies you can use here to get this email out, and again, they're all way better in Node than in other languages.

I'll start with the foreground. This example was still on the PHP home page as of maybe a week ago. I started programming with PHP, and of course everything's inline, everything's a script, and it's really easy to do stuff. But in this classic example, that mail() line blocks the request thread: you hit the web page, it's going along, then you wait while the SMTP stuff happens, socket things happen, it comes back, and it moves on. So the client is actually waiting for quite a while for this to happen.

So how do we do this in Node? We're going to use the nodemailer npm module, which is awesome; it makes it really easy to send email really quickly. It has all these different transports, Gmail being one of the easiest to use, but there's also SMTP and some other layers. For all the examples we're going to use nodemailer, which provides an asynchronous send-a-message command.

Stealing code almost directly from the Node website, we can build a web server, and this is the most dangerous web server you'll ever see, by the way: every request will send an email. So again, don't do this at home, don't do this in production. You see at the bottom there, the path is going to be our mail information: to, subject, message. We take the path, parse it, just split on slashes, and make our email that way. A really terrible, really bad idea, but it gets the point across. Then that transporter send call is from nodemailer, and this is how we send the email, and it's asynchronous.

So if you run this, which I'll do over here: in this tab I'll start up the foreground example, and it says here's how to use it, on port 8080. So I'll send myself an email, subject "SF Node lecture", body "stuff goes here". We'll send that and count: it took about one and a half seconds to actually send the email, which is how long the actual web-server part plus SMTPing the email out took. And, there we go, I got an email. Awesome, right? Done, talk's over. Okay, cool. So that worked, everything's great.

It's better in Node than in other languages: at least you're not blocking anybody else when you do that. In the PHP example, or an equivalent Ruby example, you would be blocking a web worker, whatever that means in your language, so for the time it took to send that email, nobody else could use that worker.
At least in Node, because it's asynchronous, other people can still be making requests and getting stuff done, even if this client still had to wait that second and a half. Which isn't that terrible, but when I did this on my couch it was at least two, and I could imagine it being worse. So again, it's better than most places, but still bad: you're still spending server resources; maybe I don't want to spend the RAM to load nodemailer in my process; and eventually there's a lot of socket stuff going on in the background, and bad things can and will happen if a lot of clients are sending these emails, especially in my example. Again, it's terribly unsafe; don't ever do that in production.

So let's go to the next one. What's next? I have a note to myself here: we're not going to talk about actual threads; it's a dangerous topic, especially in Node. But what can you do to pass the work of sending the email off to another background... thing (can't use the word thread)? What that means in Node is that you can just ignore the callback. That is a thing you can do. It is not recommended, and again, don't ever do this, but it's allowed. In our previous example, the only change is to wrap the asynchronous call in something that just doesn't care: if you don't give me a callback, I just won't call one. The web-server block is now just "send an email, just do it, don't care."

So if I run the second server here (kill the first one, start up example two), it's the same API, and I can do the exact same thing, and I get my response, my body payload, stupidly fast. I have no data to send back, so I'm not sure if the email worked or if anything happened, but the client's done. And maybe I got a new email... I did. Awesome. As far as the client is concerned, that kind of worked, but again, things can go wrong; maybe the mail server isn't there. It's a terrible idea: don't do it.
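That "ignore the callback" trick might look like this sketch; `sendAndForget` is a hypothetical name of mine, and the transporter is assumed to have a nodemailer-style `sendMail(options, callback)`.

```javascript
// Fire-and-forget: start the async send and never wait on the result.
function sendAndForget(transporter, mail) {
  transporter.sendMail(mail, (err) => {
    // The talk's "maybe a little better" variant: at least log failures,
    // since nobody else will ever see this error.
    if (err) console.error('email failed:', err.message);
  });
  // No return value, no callback for the caller: it moves on right away.
}
```

In the web-server handler this becomes `sendAndForget(transporter, mail); res.end('ok');`, which is exactly why the response comes back stupidly fast.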
But it is a way to speed up the client response. And it's fascinating: outside of Node, JavaScript, and a couple of other asynchronous frameworks, it's very rare that you can actually do this, where you split the work off and never worry about collecting it again. It's a unique property of Node. Again, don't ever do this; we're moving on, never to speak of it again. But it's a neat property of JavaScript that you could handle tasks this way. Maybe if you're logging, it's okay; maybe if the callback logged when it didn't work, that's a little better. But if you're going to log it anyway, maybe there's a more elaborate system you could build to handle these jobs.

So let's go to the next one. This is the most over-engineered example possible. Node has the notion of cluster. The cluster API is experimental, but it has been experimental for some time; it's fairly stable, people use it, and it makes it really easy to take one process, fork out others, and leave a communication channel up between them. A little wireframe, for example: here we have one master process that we start, and it has two children, a web-server child and an email-sending child, and they can all communicate through the master process and pass information around.

So this is our main logic block. There's a really easy way to tell if you are the master, and then you do some master stuff; and then you can check environment variables. When you fork, one of the few things you can pass to your children in Node is environment variables, so we'll pass an environment variable called ROLE, and it will tell a child whether to be a server or a worker; that's how it knows what to do. Then we'll change our web-server block: instead of calling nodemailer directly, we're going to call process.send. In the cluster world, process.send means: if I'm a child, tell my master something. You can send the master JSON blobs.
So what we're going to do here is send that email blob up to the parent, and we can still respond to the client really quickly and just pass that information along. It's like ignoring the callback, except at least the data went somewhere: we can log it, we can handle it, we can catch errors and do other things with it.

Let's move on to the master loop. We're going to have a master process which boots up and, every second, every 1000 milliseconds, checks on its children. What that means is we can do the fork thing. If you look here, this is the magic right here: cluster.fork. We say, hey, take myself, take the script I'm running, make a new version of myself, and pass these environment variables down, like ROLE=server. Everything else here is really just checking on the state of the children. It's an event emitter, so when a child comes online, log it; if a child dies, delete it from my list; and if I get a message from a child (here's where the web server would pass on that email information), what did I get? I got this JSON blob, that message, and because I'm the master I can pass it along to the other child, the worker, to send the email. The email worker's setup is very similar: we watch it, give it a different role, and move on.

So I'll show you what this looks like when it's running. Let's start example three. Here we're logging the PIDs: we have one master here, he started up and started his children, and he has two children, the web-server guy and the worker guy. They both came back with messages, and that's how you can see they're here. We can log the PID, log the state, things like that. And again, there's a GitHub repo with all this code we can look at later; this whole example I think is only about 130 lines. And in 130 lines, which is really cool, you get all this stuff.

Here's the email worker in its entirety. You get a message queue, throttling, retries, all this kind of fancier logic with this type of process. What the email-worker child does is, every time he gets a message, process.on('message') just adds it to an array: store the messages, do something with them later. And he has a main loop he runs every so often, where he pops one email out of the array. This is throttling, right? Every second, pop; every second, pop, if I have one. So if a lot of emails are coming at him, he'll just store them up (assuming the process doesn't crash or anything terrible) and then send them one by one. It's a little better than being able to hammer that web API like I could earlier: I could send a million requests, but at least I'm only going to send one email every second. Slightly better. Now we can also do some retry logic: if there's an error when I try to send the email, I can re-enqueue it, so now I have retries, and I can send messages back up to the parent: hey, it worked; hey, it didn't work.

So let's give it a whirl. Because we're using AirPlay I was actually going to turn off Wi-Fi and show you how the retrying worked, but that's not going to work. Again, it's the same API, so I'll just do this again. I still get my response really quickly, and then eventually, sometime later, I get the email. So this is the response in the web server; then the worker eventually goes, oh, I'll try to send this email; then eventually it sent the email (I probably should have put timestamps on that), and it was about a second later. So that's really cool.

In Node, this forking thing, this cluster thing, even though it's an experimental API, is really powerful. This message-passing thing is really not available in many other languages, at least not with this easy an API, and it's an awesome way to do stuff like this and get this kind of functionality. So now we have queueing.
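The email-worker child's loop might look something like this sketch; the function names are mine, not the repo's, and `sendMail` stands in for the real nodemailer call.

```javascript
// Email-worker child: process.on('message') pushes jobs into an array, and a
// once-per-second main loop pops at most one (throttling) and tries to send
// it, re-enqueueing on failure (retries).
function createEmailWorker(sendMail) {
  const queue = [];

  // One pass of the main loop: pop at most one queued job and try it.
  function tick() {
    const mail = queue.shift();
    if (!mail) return;               // nothing queued; sleep until next tick
    sendMail(mail, (err) => {
      if (err) queue.push(mail);     // retry: back of the line for a later tick
    });
  }

  return {
    enqueue: (mail) => queue.push(mail), // what process.on('message') wires to
    tick,
    size: () => queue.length,
    start: (ms = 1000) => setInterval(tick, ms), // every second, like the talk
  };
}
```

Because `tick` pops at most one item, a burst of incoming messages just accumulates in the array and drains at one email per second.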
we have IPC I can kill any of those worker processes and they'll be restarted if I kill the web server eventually come back with email worker will still be running and vice versa because they are truly different pits and different processes they won't share RAM they won't share resources if you had like two CPUs maybe on a machine you can they can each be using a CPU you get a lot better so what's bad here now is it's still only on one host like the cluster module only works on one physical server it has to be able to talk you know to whatever else is on the server but we're getting better right so now we're getting retry logic we're getting logging or getting lots of other stuff that wouldn't have before so we're building them up we're getting a little further into the realm of possibility to use on production so let's talk about remote queues so the key piece of the preview was having this email right having a queue of things to do that could be worked at a later time that was really a synchronous into like a whole different thread than the guy hitting the wipers fast hitting the web servers so the important things about a good queue let's take a little sidebar here so like things that are good about queues they're observable you can measure them you can see like what's in there you can retry them even in our previous like terrible example we put emails back in the queue if they failed which is an awesome like feature that we didn't have before so like a quick note for this example we're news Redis how many people have heard of or have used Redis is anyone not heard of Redis okay cool so Redis is a very tiny C++ app I think that is basically a in-memory data store it's like a somewhere between memcache in like a relational database so it has much fancier data structures including an array which is really all you need to make a queue this is the MVP for a queue I can put things into the array I can like wait a while and I can pop things out of the array and I'm 
guaranteed that when I pop I only really get it once and I guess that's it that's all you really need for this queue Redis has some other stuff like hashes and sorted sets and fancier data structures but to do this kind of asynchronous array thing like you really just need array so we're going to talk about a module called node rescue which we use a TaskRabbit rescue if you don't know is kind of a semantic way to describe things that live in Redis so they're jobs to be processed later so there's data structures around like here's a job here are the arguments here's the names of the files they live in so I can pull them out later and so node rescue is a module that just talks to Redis and lets you do things like connect to a queue and then I can in queue things for later like you know put the job ad in the math queue and add the numbers one and two and those will be processed later by a worker and rescue has a really nice UI so here's kind of example of the data structure so it has like a rescue namespace than a queue namespace and then the email queue would be the name of the queue that it goes under this is called Redis commander by the way it's another node app it's a viewer for Redis it's pretty great and then rescue has a UI so you can this happens to be the Ruby UI it's a Ruby and Python and Java and node ecosystem Ruby happens to have the coolest UI so I just picked that one but it's really really introspective also you can see the name of the queues how many workers are running and things like that so back to our web server so we have this method send email and this does exactly what we did before it builds up this email payload it tries to send the email but now it's in this thing called a job - so if you notice back up here when we're talking to rescue one of the things you pass into it is jobs so it's a way to kind of tail like once I get a job of a certain name what method do I call what function do I call to handle it so our job hash can just be this 
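A toy, in-memory model of that resque shape may help; the real node-resque keeps these lists in Redis so any server can enqueue or work them, and the names below are mine, not the module's API.

```javascript
// Named queues hold { class, args } blobs, and a jobs hash maps each job name
// to the function that handles it.
const queues = { math: [], email: [] };

// The jobs hash: "once I get a job of a certain name, what function do I call?"
const jobs = {
  add: (a, b) => a + b, // the talk's math example
};

function enqueue(queueName, jobName, args) {
  queues[queueName].push({ class: jobName, args });
}

// Pop exactly one job (each pop hands a job out once) and dispatch it
// through the jobs hash.
function workOne(queueName) {
  const job = queues[queueName].shift();
  if (!job) return null; // empty queue: nothing to do
  return jobs[job.class](...job.args);
}
```

Swap the plain arrays for Redis list commands (RPUSH to enqueue, LPOP to work) and you have the persistence and multi-server sharing the talk is after.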
Take your data, build the email payload, and send it off. And if we're in the web server, instead of doing process.send we do queue.enqueue. It's a very similar line of code; the only difference now is that before we boot our server, we connect to Redis first (that's the queue-setup line at the bottom down there), and then we boot it, and every time we get a request into the web server, we enqueue it in Redis. Redis is really fast, and the web server no longer has to do the SMTP part of sending the email. It's a little slower than what we had before, but at least it guarantees us persistence now, and it's a database, so other servers can read it, and things like that, which is a lot better.

In node-resque it's really simple to start a worker: the two lines at the top. It's very similar to starting up the queue; you pass it the connection details, the names of the queues it's to work, and again that jobs hash, and then it boots, you say start, and it goes on its merry way forever. It's an event emitter, so you have tons of callbacks again, and we can log stuff if we want: it'll tell you the state of what it's doing, like I'm working, I'm paused, I failed, I crashed. You get all these events you can act on, but really the worker just keeps calling into that jobs hash and doing whatever it says: do one, try again; do one, try again. It also sleeps if it has nothing to do, then wakes up in five seconds and tries again, to save some RAM and CPU as it goes. It's really simple; that's all it does.

But that's fine; lots of other languages can do this, and resque is implemented in other languages. So why is it better in Node? That's the whole point of this talk: what's better here? Node is asynchronous, right? It has this event loop thing; it's only one thread, but it's really awesome at doing things that wait and come back: go talk to a database, wait for the answer, then process again. Everybody thinks of this as handling web requests, or at least socket requests, but many of these background jobs are going to be the same kind of thing: talk to the database, wait for an answer; talk to an SMTP server, wait for an answer; talk to somebody's API, wait for an answer. It's kind of the exact same shape as web requests. So as long as your jobs are non-blocking, or at least partly non-blocking, you can probably be doing more than one at the same time, and that's the trick to this resque stuff.

Rather than having just one resque worker per process, like the Ruby and Python implementations do, in Node we can do this multiWorker thing. We can say: have at least one worker and up to a hundred workers, and be smart somehow about knowing whether I'm allowed to do more or fewer. If my CPU is bound, don't make any more workers; but if my CPU is free and I'm just waiting on I/O, I can probably make some more workers and get more worked. So how do you do that? It's actually really simple, in theory, to ask if Node is blocked: you set a timer, and if it takes longer than the interval you asked for to get your timeout back, you're blocked. I stole this from a module TJ has, called blocked, which does exactly that: he has a timer, he uses the process time, he waits so many milliseconds, and if the callback doesn't come back on time, he knows what his delay is. And there's a kind of buffer: you have to allow for maybe 10 to 20 milliseconds to let your CPU catch up. But as long as you're within that buffer, you're probably on time, you're probably processing okay; otherwise, you're blocked. And only this, I don't know, 15 lines of code is really all you need to know if your CPU is pegged (the CPU of the Node process). As long as whatever you're calling is wrapped in either a setImmediate or a process.nextTick, you'll be able to make sure it gets called when the CPU is free.
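A sketch of that blocked check; the names and default values here are mine, not the blocked module's, but the idea is the same: schedule a repeating timer and measure how late each tick fires.

```javascript
// How far past the expected interval did a tick actually fire?
// prev/now are process.hrtime.bigint() nanosecond timestamps.
function lagMs(prev, now, intervalMs) {
  return Number(now - prev) / 1e6 - intervalMs;
}

function watchBlocked(onBlocked, intervalMs = 100, bufferMs = 20) {
  let last = process.hrtime.bigint();
  const timer = setInterval(() => {
    const now = process.hrtime.bigint();
    const lag = lagMs(last, now, intervalMs);
    last = now;
    // A late tick means synchronous work held the event loop past the buffer.
    if (lag > bufferMs) onBlocked(lag);
  }, intervalMs);
  timer.unref(); // don't keep the process alive just for this check
  return timer;
}
```

A multiWorker-style scheduler can call `watchBlocked` and only spin up additional concurrent jobs while no blockage is being reported.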
And you can see how long that takes with this bit right here. So we have a couple of examples. We're going to have a blocking sleep (remember, terrible ideas): this is how you block the entire Node CPU, while (true), just waiting for time to pass. This takes the CPU to 100%; it just sits there and does nothing until the time passes. And the equivalent is the async sleep, which is just a setTimeout that does nothing, and that's the freest kind of waiting. So we have these two jobs to demonstrate this: the slow job, which uses our blocking sleep, and the setTimeout job, which is free.

[Audience question: what does "pegged" mean?] Yeah, pegged. Pegged is an older term that means the CPU is as used as it can be: you're always doing arithmetic on it, you're always working it. It's when your computer fan starts making weird noises at you; it's doing complex math as opposed to just waiting. The event loop is really smart (and I'm not smart enough to understand exactly how it works), but it knows: if I'm not doing any I/O right now, I can chill, sleep a little bit, and come back and check again if there's something for me to do. But if you say while (true), add one, or the old Fibonacci example, it's always doing math; that uses up your CPU as much as possible, and no other I/O gets through until it's done. So if you have one web request that says do this blocking sleep for ten seconds, no other web requests can be processed, because the CPU is fully used, fully claimed; once that job is done, Node will start processing the next jobs as it goes. That's the term pegged: basically claimed, used, fan spinning.

The whole point of these background jobs is that they probably use more CPU than just the web server.
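The two demo jobs can be sketched like this:

```javascript
// Blocking sleep: pegs the CPU by busy-waiting in a while loop.
function blockingSleep(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // Busy-wait: pure compare-and-loop, 100% CPU, no other I/O gets through.
  }
}

// Async sleep: setTimeout with a no-op body, the "freest" kind of waiting.
// The event loop stays available for other work the whole time.
function asyncSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
```

Run `blockingSleep(10000)` inside one request handler and every other request stalls; run `asyncSleep(10000)` and nothing else even notices.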
The whole point of splitting these up is that you want the web server to be really responsive and fast for client requests, and you offload any tricky job to these workers on the side. So it's entirely possible these workers are doing CPU-intensive operations, and that's fine, because it won't affect the end user as long as you have enough workers. And the whole goal here is to save money, right? You want to work as many of these jobs at once as you can, and whether you can has to do with whether your CPU is free.

I think I have only a couple more minutes left, so I won't run the example unless anybody wants to see it. All right, let's do it. So, example five. We have a couple of queues here and zero workers running. I'll enqueue ten of those high-CPU jobs and a hundred of the just-sleep jobs, if I did this right. So here's our queue full of jobs, and at first it only works one at a time, one by one, every second. Then once it realizes the CPU is free, it starts spinning up more and more of these workers to work the sleep jobs, and it's smart about going up to the limit of a hundred that we set. Then once the jobs are done, it spins back down. Doing this in Node is as simple as this bit right, whoops, this bit right here: just checking if my CPU is pegged, and then doing more asynchronous work. We can stare at this if you want, but eventually it'll work all the way through and clean itself up.

So what have we got in Node? It's way better than other languages, because in only about 15 lines of code you have this amazing way to tell how much work you're doing in your thread, in your process, which is really new and unique and something to note. That's in the node-resque package now; it's called the multiWorker, because I'm great
at naming things. And that's a great feature of Node. So I think that's it. These slides are up, the GitHub project is up, so you can play with this stuff, and there are some docs in the resque project if you want to see how it's implemented. And now we finally have a good way to send email: we have a background job, we have a database to store it, we're not wasting resources in our web server, and hopefully no one will get spammed.
Info
Channel: PubNub
Views: 35,758
Rating: 4.9049034 out of 5
Keywords: node background jobs, node background tasks, node.js background jobs, node.js background tasks, node.js, node
Id: XL-nCvj2DO0
Length: 23min 6sec (1386 seconds)
Published: Wed Mar 11 2015