Ryan Dahl: Original Node.js presentation

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments

I have seen it few times already, but never realised that he had promises there from the beginning. That's a surprise. I thought node was always callback based since the beginning.

👍︎︎ 5 👤︎︎ u/Capaj 📅︎︎ Feb 20 2015 🗫︎ replies

dude is nervous as hell, that's how you know it's genuine

👍︎︎ 5 👤︎︎ u/DJDarkViper 📅︎︎ Feb 19 2015 🗫︎ replies

He left the node project in 2011. I've wondered what he's been up to since then.

👍︎︎ 4 👤︎︎ u/Joghobs 📅︎︎ Feb 20 2015 🗫︎ replies
Captions
I'm going to talk about node so briefly so note is is a server-side j/s platform it's built on Google's v8 it does it IO in a very special way which I will describe in great detail it uses the common J's module system and it's written in C which has confused a lot of people so it's a rather large C project actually so the thesis is that IO has to be done differently we're doing it wrong everything the way that we're thinking about doing IO really makes things difficult so writing servers and writing any sort of application is difficult because of how we're doing io so a lot of web applications have such line of code like you query a database and then you return a result and then you use the result and so the question is what is your web framework doing well that while this line of code is running so in many cases you're not doing anything at all you're just sitting there while the database is waiting to respond and that might be a that the database is in San Francisco or it might be thank you there's a lot of reasons okay well the point is is that you can't just wait for it to respond there's a big difference between what happens inside your CPU inside your memory and what happens when you go to something outside of that if you go to a disk or if you go to a network if you have to do a TCP connection to a different server even if it's in your same hosting center then you're talking about millions of clock cycles instead of hundreds of clock cycles or tens of clock cycles so you can't do nothing so obviously better software can do better than just wait for the database to respond it can multitask so you can have different threads of execution running so the question is is that the best that we can do and I think you can look at two popular web servers and see how they're doing i/o and decide if what what they're doing right and what they're doing wrong so here's a benchmark which is possibly not visible to but it shows a concurrency on like the number of concurrent clients on the web server on the horizontal axis and on the vertical axis it shows requests per second and so this is engine X in Apache and what you see is that the the engine X is is responding faster not so much faster twice as fast three times as fast especially when you get to higher concurrency who cares but the big difference is when you look at versus memory so again on the horizontal axis is is a the number of clients on your server but now on the vertical axis you have memory and it looks great online okay so the point is is that Apache uses tons of memory when you start getting a lot of clients if you have 3,000 people like connecting to your Apache server you're using many megabytes of memory where as engine X stays very stable it stays with a small footprint and the question really is what's the difference between these two and the big difference is that Apache uses threads for each connection and engine X uses an event loop okay so a to be slightly technical the first benchmark kind of tells you that context switching between the different threads which is what Apache does is is not free it costs some CPU time and the second one is that each thread takes memory and that ends up being a lot of memory and kind of in the building tight little server thing community if you want to make a little engine X server or you want to make a little IRC server everybody knows that you can't use threads for connections this is not the right way to do concurrency the right way to do concurrency is to have a single thread and have an event loop so you do something and then you're done with it and then you do something else and you're done with it but that requires though is that what you do never takes very long you have to have non-blocking i/o so apache uses OS threads there's other threading systems you could have green threads or you could have co-routines these can help the situation very much but it's still machinery it's still something that needs to be done you have to you have to do something to create the illusion that when you connect to the database your program halts you don't move on to the next line until it comes back it's an illusion your program is not halting it's doing all these other things but it looks like it's halted so I say that threaded concurrency is a leaky abstraction it's it creates pains you have locking problems you can you have these memory problems it's hard to think about it's not a very good abstraction for what's really happening on your computer so code like this where you call a function and it connects to some server and returns something from that server as if no time has passed and then you're going to use the result beyond that this somehow either requires blocking the entire process or you're going to have to have some sort of threading system maybe it's co-routines but it's probably going to require multiple execution stacks but you could have the code like this where you make the query to the database and instead of waiting for the response inside that function you give it a callback in one format or another when this happens you can your execution can run right through that statement make that request and continue doing other things when the request comes back millions and millions of clock cycles later you can execute the callback there's no machinery involved in this all you need is a pointer to that callback so this is how we need to do i oh if you want very fast high concurrency servers you have to design them like this so okay you say yeah but everybody's talking about threads you know my boss says that Java threads are super cool blah blah blah why isn't everybody doing this why should you believe me um well there's there's two reasons uh I think they can be summed up in two reasons anyway and it would be cultural infrastructural so I think that there's a cultural bias against doing non-blocking i/o you know your first program that you ever do IO with is something where you enter your name and then you get the results somebody types that in but you block on that function you don't do anything else and then you print out the name so we're taught to like demand input and we kind of treat it the same way on on sockets so we connect to the database and all right give me the give me the give me the response so people see code like this where instead of waiting you just give a call back and they say I can't do that that's that spaghetti code that's too complicated I think we should reconsider this I don't think it is necessarily more complicated maybe in the simplest cases like this but when you start writing an IRC server it becomes very natural to write code like this and the other reason would be the missing infrastructure so remember if you're on an event loop you can never block on i/o you can never wait for the database to respond because you're in a single threat if you ever wait everything else shuts down we're in a threaded environment you can wait occasionally or for very long periods of time so the problem is is that we just don't have like libraries available to us to do this sort of non blocking i/o so just a couple of examples like POSIX has a asynchronous file i/o specification but it's really hard to find such libraries man pages often don't state if a functions going to access the disk or not you just don't know which is important closures and anonymous functions you don't have in C so that makes writing such evented code difficult and things like live MySQL client don't support asynchronous queries I think they actually might do asynchronous queries but not asynchronous connections and Aengus asynchronous DNS resolution is is really difficult to find so there are certain solutions you might have heard of event machine or pythons twisted or if you come from Perl any event these are libraries that provide such an event loop with non-blocking sockets and it is fairly easy to use them to create efficient servers but I think that users are kind of confused as how to use this if you're using Ruby there's all of these libraries available you have a vent machine in here and you want to know how do I use these other ones and usually the answer to that is you can't because the MySQL library for Ruby blocks on everything so you can't just throw that into an event loop but people don't know that and so users really still require some knowledge of event loops or non-blocking i/o which almost nobody has and so it's doesn't abstract the problem very well but yeah luckily javascript was was designed in such a way that that it was built for an event loop the browser side javascript that you have is an event loop you create a button somebody clicks it you get a unclick callback this is exactly what one needs to do event loop stuff so yeah it has anonymous functions and closures when you're on the browser you only get one call back at a time you're not getting multiple callbacks and you don't need to lock variables and the i/o is just done through those callbacks so I think everybody in this room who is familiar with Java JavaScript is already kind of prepared for writing evented servers you don't you you don't really need to know that much more so right so that was a long motivation so what I want to make is a a non-blocking infrastructure so that you can make very highly concurrent servers and you don't need to know about it we're going to abstract all of the difficult non-blocking event loop you don't need to know about that it's just going to be callbacks so right so so let me just give some design goals and then I'll give some actual practical examples of how to build a server so first of all no function should perform IO to receive disk or network information from or from another process you have to have some sort of callbacks so you can never have that sort of function that makes some query that returns something that's not allowed it should be low level so I want everything to be able to stream in and out you never I should never force the user to buffer data and if you're familiar with like Ruby on Rails or something a lot of place force you to buffer data in one way or another I don't want to make those choices for you it's low level people can build on top of that so if they want to buffer their data then they can do that but at my level at the node level it should not make any sort of decisions like that and similarly I shouldn't remove any functionality at the POSIX layer so for example have closed TCP connections everybody's shrugging but there's good stuff to be had that everybody ignores um and so I should also have like some built-in support you don't want to write everything I want you I want it to be low level but I think TCP and DNS and HTTP are infrastructural protocols they're very important and so there should be very good support for them in such a system and in particular with HTTP it should it's going to have many features so we're going to have chunked request chunked responses I'm going to have keepalive and importantly it should be you should be able to get a request and respond to it at will so you should be able to hang requests and this is what you need for comet style applications if you want to do a long pole then you have to hang that request and wait till you have something to tell the user and finally the the API needs to be familiar so if I'm going to have a timer I'm going to call it set timeout it's going to it should look like browser JavaScript and where it's not browser JavaScript where I'm talking about POSIX stuff then it should use the POSIX names that is I don't want to reinvent what people are doing I just kind of want to present an idealized version of that and then finally it should be platform independent at the moment I don't compile on Windows but there's no reason I can't and I hope to okay so now now some actual examples so note is you have to compile it so you have to download it so there's no binaries is what I want to say and you there's no real dependencies other than Python so it should be fairly easy to to build it's not it's not very difficult or at least it should not be okay so here's here's your first example this program is going to output hello after having waited two seconds so first first we require the SIS model which we need to output some some data this is the common j/s require uses the semantics defined by commonjs then we set a timeout for 2000 milliseconds importantly the callback inside is not done right now first we print hello which comes after that two seconds later world is printed out so node after the world is printed out node exits so that's kind of important when there's nothing else to do on the event loop if there's no more timers than it exits it's the end of the process okay so so here's what you would do you would put it into a file called hello world GS and then run it with the node program you get hello to send kids later world in the process exits okay mmm-hmm so now we're going to change the hello world program and what we'll do this time is is we're going to go in a loop so we're going to use the the set interval function and we're going to loop forever and print out a message and then what we want when the user kills it when they hit control-c that it prints out a message and then exits and this is to demonstrate the special process object in node and the how you set a signal handler the for the interrupt signal the SIGINT okay so so again we do the require we just need one function so let's just pull it out into a variable in the next three lines we set up the interval which calls that callback every 500 milliseconds right and then the last couple lines we're setting up a signal handler so you do this by calling this add listener function which is a should be fairly familiar if you're used to the dome then you put you you print out goodbye and then you you exit the process so this this process object emits events when it receives a signal so it emits the cig event there it would be the same for any other signal that the process might get so so this is like the dome you just add a listener to catch what the process is doing there's a couple of other things on the process like the PID the program arguments the environment current working directory memory usage useful things so the the way that process emits these events is is pretty typical for node so many objects emit events it's kind of the fundamental paradigm in in node so like for example a TCP server would admit a connection event every time somebody connects and like if you have somebody's doing in HTTP upload it would emit the request object to admit a body event each time you get a packet of that upload so somebody's uploading a stream a movie or so to your server and you get body body body so yeah all objects that emit events are instances of the event in meter class okay so so here's here's the first TCP server example so we're going make a TCP server it's going to listen on port 8000 and then when somebody connects we're going to send the the peer a message we're going to say hello then we're going to close the connection so a very simple TCP server so we have to require the TCP module that's what we do in the first line and then we create a server object s and we that's a TCP server and then we add a listener for the connection event and we get this object C from from the connection event that's that's our connection we send it hello we close and then finally we have to start it listening on port 8000 so if we tried that then we would start the we would put that code in this server yes and you can tell that to local host at a port 8000 and you get hello and then the the server closes the connection okay so we can simplify this a little bit so the the connection listener does it you don't need to call this add listener you can just pass it to the constructor and so so we can simplify this program until just a couple lines by by doing this so file i/o and note is non-blocking this is something that's usually very hard to do and note it's quite hard not to do it's a bit the opposite which is good I mean the the way that you should be doing things should be easy in the way you shouldn't be doing things should be difficult okay so so what we're going to do is we're going to a look at the last time somebody modified EDC password so first we require the post six module which has all of our file i/o operations and we pull out this stat function we require our puts function get that from the sis module and then we call stat on an EDC password and it returns a promise which I'll talk more about and then we add a callback to the promise which gets called when when the stat operation was complete and then finally we print out the the modified time so the these these promise objects are pretty common all the file operations return a promise and so a promise is is an event emitter which which emits a success or an error event so if you do some file operation you don't want to block because that's going to be a long time you can't just shut down your server while you spin the disk so you send off this request to the to the disk okay tell me what time that that file was modified I'm going to go do other things and then eventually it comes back and it says success I got the answer for you here it is or error so the the promise dot add callback was was just API sugar for for promise dot add listeners for success okay so we're getting progressively more complicated now we're going to do an HTTP server we have to require the the HTTP module and we create a HTTP server object and the callback for this which is called on each request gives you a request object in a response object so then we send that the header the the 200 success code and content type text plane and we send body hello send body world and then we finished the response this is slightly more complicated than if you were at the narwhal talk before Tom promotes the the J SGI specification this is more complicated than that but it's it's for a good reason J SGI requires you to Iraq to rap it's it's the same thing you have a function and then you return the result so all the processing happens in a single function which as I said at the beginning is something we want to avoid if you have to connect to the database then you don't want to you don't want to have to respond in one function now I think that there's some ways to do that with with J SGI but i like this i think it's it's simple enough so let's let's try this out so we put the the code into HTTP server j s we run curl on it and we get hello world ok now slightly more complicated we're going to do the same thing we're going to have an HTTP server which outputs hello world but instead of outputting at all at once it's going to output hello and then it's going to output world two seconds later so what we do is we put a set timeout in there so we wait for 2000 milliseconds in the the request callback so you might wonder like okay who cares why would anybody want to do this but this is important because while the server doesn't shut down when when the set timeout is called it doesn't say set timeout it's not asleep you're not sleeping for two seconds and not doing anything else you're serving requests so here's a request here's a request here's a request and then suddenly okay that one's done that one's done that one's done this is the exact behavior that you need for commet style application if you need to do a long pull you have to be able to hang requests in a ficient way and this demonstrates how you can do that this isn't the right way to do a long pull thing but it demonstrates that you can hang requests so let's try it out we put into a file HTTP server to call it with node run curl we get hello two seconds later we get world a streaming web server okay we can also use we can also call programs with node one would hope so we can do that with the XE command of course this also returns a promise because it's something that doesn't happen immediately it doesn't happen in memory but you know if you do LS slash that might have to spin the disk and so there's some time that happens between the time that that starts in the time that that finishes so we have to have a callback so we we add a callback to the promise object which is returned from and then we output the output okay so but I told you before that I never would force people to buffer data and that just buffered the data so there's also a lower-level way to call sub process to call it to make a child process where you can stream the data through through the standard i/o so you can stream that if you're you LS on a huge directory you don't want to buffer all of that data in memory you want to stream it to the to the to the parent process and pars it and handle it do whatever with it but hopefully not buffer it so this is a simple form of inter process communication so this is an example of that where we're going to launch the cat program like the UNIX cat program which whatever you you send to it it just sends the same thing back and so we create the child process in line three we add a listener for output so every time that there's some output it calls that that callback and then in the last couple lines we write data to the cat process and so that's sending that hello world and line eleven and twelve to the standard in of the cat process and then finally we call closed which closes the standard end of the cat process and a cat terminates when it's standard in is is complete okay so we can strip we can create sub processes we can create child processes and stream data in and out of out of them okay so so now I have a demo for you I wrote an IRC server in node just a hack really but just to demonstrate what you can do so maybe this will work maybe this won't work but let's let's try to go to I or see no DJ s dot org and go on to the node.js channel okay so so I have my my terminal here I'm actually logged into the server so well first of all I'm going to connect to the server hopefully it's still running and it hasn't crashed okay so so now I'm connected and I I will join the the node jf Channel okay so so if if people did that you can see people talking so so this this IRC server is is running on node this is an IRC client of course what I want to do is actually show yes I want to I also have a repo library a read eval print loop and so since we have an event loop we can add all sorts of io to this this is one process but we can add a repo to it without thinking about it because what we do is we do all the on our event loop we we do all the connections we send all the messages we come back and then we can do the repo things if we wanted to add an HTTP server to that that would also be possible all in the same process just by going around the event loop and since nothing blocks we can just add as much IO as we want to so I show you the repo now kind of okay so it's running on screen so I'm just going to open the screen there it is okay so this is the ircd repo thing and so what I can do is is I have total control over the IRC server so for example I can kill some of these users off yeah okay I have full control I can also make people like say things okay anyway the point is we have a we have a repo to to the IRC server and of course if I just control see it then everybody will go away and they're offline now because the IRC server is gone and if I turn it back on then it's just there okay so so that's my my demo I'm not a very good presenter okay yeah so so check out the the code if you want it's it's just 400 lines or so I think this demonstrates that node abstracts the real problem of writing an IRC server away because if you look at the code of that it's just okay somebody connects okay I have a list of users okay I send this user that message it's it's really quite easy to follow in my humble opinion whereas if you try to sit down and write an IRC server say in Ruby I think you'll find it very difficult okay I guess you can use event machine and you might find it okay but I I think that this really abstracts the the the problem of writing a concurrent server so if you want to throw together say a message queue demon start up node type out a hundred lines of JavaScript and there's your message queue demon that that does whatever specific thing that you need it to do okay so so just briefly uh I talked about the internal design of node so note is not some big monolithic app it's it's a bunch of C libraries that are kind of hacked together in various ways so the v8 is not a little seed library it's a huge C library that's by Google of course the Lib Evie event loop and Libby IO thread pool are a really nice little libraries and they're both written by Mark layman there's a HTTP parser which is a quite advanced it could do all sorts of streaming sort of things that's by me and a socket library by me too and then you you DNS is is a DNS resolver which is important so the way that I do this is a there's a lot of system calls that you might do if you have to access the filesystem those system calls those POSIX system calls can block and so what I do is I have a thread pool underneath everything and I say okay I want to do this system call I want to read a directory I pack it up and then I send it off to the thread pool it does stuff and then it comes back signal handlers are usually asynchronous from the rest of your execution stack and this this thread pool thing these things are kind of asynchronous from the main node j/s event loop and so they have to notify they have to be marshaled back into the main event loop you have to say okay wait your turn alright now we can process the signal handler you can't just do it immediately when the result comes from the event loop you can't just say stop everything we're going to do this you have to say okay all right now we can process you and we I do this by by using a pipe from the thread pool in and from signal handlers in and you can select on that pipe another thing that I did is I'm worried about what happens if you pipe a huge file into node so say this this file is 200 megabytes and has a list of domain names and you want to look them all up so each line is a domain name and you want to do dns resolution on them but you can't block the dns resolution else it's going to be really slow what you want to do is read it and stream it into your node process read the lines do the DNS lookups and this should all be on the same event loop so the problem is is that in UNIX or in any POSIX system the standard EDD file descriptor is going to refer to a file and you can't select on files you can't add them to your event loop you can't just read from them because that's going to block so what you do is you create a pumping thread and you have a pipe and so you do these blocking reads from the file pump them into the pipe a which goes into the main application and in this way you can stream data into the the server in a non-blocking way with just one extra thread these are the sort of things that you might have to do know about if you were going to write this yourself you don't have to do that it's under the it's down below the the API that the users deal with if you interested look in the source directory at the coupling okay so so what's next I have to fix some some API issues there's various things that are very kind of ugly I want to have more modularity I want to have a break up the libraries in the DLL so that the the coronoid process is very small and that if you need the HTTP server you would load a DLL to some shared object to to get the HTTP parser for example I'm going to include some libraries for like MySQL and Postgres into the core distribution I need to improve performance that's always an issue there's there's a some low-hanging fruit that I know about that we can that we can pick and make it faster TLS support is on its way and then finally I would like to do some sort of web worker like thing which would probably just extend the the child process object so that you can create processes and kind of do IPC between them in a in a nice way so right now the version is 0.1 point 17 I will release 0.2 which will be kind of the first version that I would hope that other people would use at the moment I think it's a bit hacky and if your experimental then please use it but if not then wait for 0.2 and I think that will be good because I will freeze the API or at least some of it and so you can kind of build on it with some confidence that I won't change it out from underneath mm-hmm okay so yes any questions so the the question is if like if you make changes to the source file will it will it load in those changes no but Felix and I have been hacking on this a bit this weekend it's something I'd like to add yes so the question is what do I do with blocking rights when you have a socket there's a right buffer in the kernel and you can only write so much data to the right buffer before it gets filled up this kernel can't push out all the data to the network as fast as it wants to there's some buffer that that can possibly fill up so if you're streaming a file out of the socket to somewhere else then that can block and note it doesn't block it buffers the data it will allocate data internally if you do that and what you could do is get a callback win that buffer when the kernel buffer drains so what you can do if you want to stream a file to somebody you start sending it and then you send a say a megabyte which may fill up the buffer may not fill up the buffer and then you wait till it drains and then you send another megabyte and so in this way you can stream data the the right buffer will never fill up what will happen is all allocate memory so in in user space so so rights don't block um yes so the question is what is my stance on common Jay s so common Jay s I think has a lot of good proposals so the module system I'm using there's a binary proposal and there's a package proposal which which look very good at the moment common Jas is only ratified I think the the module proposal and the assert this testing library proposal so it doesn't define things like IO so I think those discussions will be ongoing I think there's some people who want it more in a blocking state I want it more in a vented state and so we'll fight it out over the next couple months or so yes a little bit I would like to have more money for it writing your own open-source project requires a lot of effort and yeah it needs funding yes
Info
Channel: stri8ted
Views: 200,521
Rating: 4.9613924 out of 5
Keywords: ryan, dahl, javascript, node.js
Id: ztspvPYybIY
Channel Id: undefined
Length: 48min 32sec (2912 seconds)
Published: Fri Jun 08 2012
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.