Stream Into the Future (NodeJS Streams)

Captions
I am Matteo Collina, @matteocollina on Twitter, so please follow me — I'm trying to hit 10,000 followers by the end of the year, so please help me with that. I'm here today to talk about Node.js streams, the future of Node streams, and how you can help or provide feedback. A couple of things first: I work for a company called NearForm. We have a nice booth in the booth area, we're doing a raffle with some smartwatches and earbuds, and we donate some money to charity afterwards, so please swing by. We are a professional services company that does all things JavaScript, so if you have problems with your JavaScript and your team needs some help, please come by — we'd have a lot to talk about.

So let's get into streams. How many of you have used Node streams? Okay, a few. How many of you understand Node streams? Okay — I'm not sure I trust you. [Laughter]

A stream is like an array, but over time: instead of having a big chunk of data in memory that you iterate over, you receive data along the way and process it over time. This is fantastic, because it enables us to crunch an insanely high amount of data with a limited amount of memory. That's why we use streams — for file processing and a lot of other things — and they're a key part of Node. The problem is that Node streams are really complicated, and most people don't really understand streams at all; even some core contributors struggle a little when it comes to reviewing those tricky PRs. It's a really complicated codebase that evolved a lot over time, and in fact it emits a lot of events.
Node streams are based on EventEmitter, and they emit a lot of events. There's the 'data' event, which you've probably used. There's the 'readable' event, which you probably haven't used much, because it's not very popular. There's 'close'. There's 'end', which has a meaning different from 'close', by the way. There's 'finish' — and you'd think 'close', 'end' and 'finish' all kind of mean the same thing, right? No, they have three completely different meanings. Then there's destroy(), to tear the whole thing down. And I haven't even mentioned 'error', because you probably know that an unhandled 'error' event in Node is a really big deal, so there's that to take into consideration too.

Just to recap: 'data' and 'readable' are for reading data. 'end' and 'finish' are emitted when streams are ending, to some extent — 'end' is on the readable side, 'finish' on the writable side — so when a readable stream is done, or a writable stream is done successfully. Successfully only: if there's an error or an abrupt termination, they are not emitted. 'close' is emitted when the underlying resource is torn down, and it will also be emitted in the case of failures.

Once upon a time, the only way to interact with a stream was by using the 'data' event, and you will find a lot of these examples everywhere. I call this the firehose example, because essentially you are receiving as much data as fast as the source can provide it, which turns out to be a very bad idea: it's really hard to pause the source and say "please slow down, this is too much data, I don't know what to do with it." Also take into account that this pattern has been the source of some very nasty bugs — and if you put an async function here, you are creating a lot of dangling promises, so don't put async there.
James has a talk tomorrow about broken promises — go to his Broken Promises talk, it's a really good one to dig into this. Backpressure with the on('data') model is really hard: how do you stop? There is a pause() method on streams, but pause is only advisory. So at some point a new API was introduced, the 'readable' event, to provide a pull-based model. Instead of a firehose of data, it's: "notify me when there is data available, and then let me read that data from you." So streams have a push model and a pull model implemented in the same codebase — how does that sound? Easy to maintain? This is actually probably the best way to interact with a stream, because you pull data from it, so you use the internal buffer in the most optimal way. Still — not really readable code, to be honest. It is more performant, though, because this way you only read the data that you need, so it's much gentler on the stream's buffering.

Error handling is still super complex. Streams are probably the number one source of memory leaks and file descriptor leaks — and file descriptor leaks are really "fun" to diagnose. So, this slide is how NOT to stream a file over HTTP. Don't do this, okay? How many of you have done this? A few people. Don't do this at all: it will create a memory leak. Can you see why? It's obvious, right? It's such a simple API, what could go wrong? Well, if you look at this .pipe(res): if the response errors, or the other side closes the connection, the read stream is never closed or destroyed, so it stays alive.
And it stays alive forever, because there is nobody consuming that stream — and that's a big problem. Essentially, if your code uses .pipe() without some special error-handling magic, you probably have memory leaks in your application. So don't use .pipe(). What you need to use is pipeline(), a new utility we added in Node core — in 10, I think, or maybe 8... 10, anyway. It's also available as an open source module under the name pump, and it's released inside the readable-stream module as well. Pipeline is actually really cool, because it tears down all the streams: if one of them errors, it closes all the others. Basically it makes sure there are no memory leaks, no leaks of any sort. So please use pipeline; don't use pipe.

You might want to ask: why can't you change the behavior of pipe to do the right thing? Well — backward compatibility. We cannot break everybody; there is a lot of code out there that relies on that behavior, so we can't really change it. You can also use a framework to serve files, by the way — that's probably the best option, because there are a bunch of things to get right. Don't serve files manually; use a framework, it will do the right thing for you.

Note that all the best features of Node.js are based on streams. As good as that sounds, this is an aging codebase with a lot of history: fs depends on it, HTTP depends on it, HTTP/2 of course, and a bunch of other stuff — standard input and standard output too. So it's really important to understand and evolve this codebase in the best way without breaking everybody, because the biggest problem with developing streams in Node core — and any Node core collaborator will tell you this — is that if you touch them, you can really break everybody.
You can really break everybody — so: don't break everybody. There is one big problem, though: the JavaScript community uses async/await everywhere now. How many of you use async/await? Good, everybody. Is there somebody who does not use async/await at all? We can probably have a beer afterwards. Anyway, everybody has moved to async/await, and the biggest problem is that streams and async/await do not really mix well together. If you've tried, you've probably seen some `new Promise` and some shenanigans to get the thing to do what you want — I hope I'm hitting some nerves here. The way to do it is not clear, not easy and not straightforward. Essentially we — the Node core contributors; by the way, I'm part of the team that maintains Node streams — are not doing a good job of serving you folks on this.

We cannot really change the things that are already there without breaking everybody, because everybody depends on them. For example, we cannot really change how streams work, because that would break whatever web framework you're using that uses streams, or break some of the file processing you do. We can do some surgical changes, but we can't really do much. So then the question becomes: how can we improve the usability of streams, considering that people love async/await?

The number one mistake people make with streams is `on('data', async ...)`. Don't do that, okay? Right now I'm working on a fix — I did a PR to provide some safety net — but the net result today is that you will have a memory leak if you do that. And so the thing I've been working on for the last few years has been to provide async iterator support on streams.
Async iterator support on streams — how many of you know about async iterators? Hey, fantastic. It's one of the best features that has been put into a recent version of Node; it came in Node 10. By the way, Node 8 goes out of maintenance on the 31st of December this year, so stop using Node 8. Since January 1st, all supported versions of Node.js will have async iterators, which is fantastic — we can use them everywhere, whichever Node version we're on.

What does this look like? Well, first up, you can have an async generator. This is a very specific way to write a function that can spit out things you can iterate on with for await...of. Now, what you probably don't know is that there is a full protocol underneath that powers this, so you can write your own async iterators without using the function syntax — more or less like you can write your own synchronous iterators for other objects. If you want to use for...of on any object, you can provide that capability, because there's a protocol: your object needs to implement certain methods.

This is pretty interesting — can we use it with streams? Well, when this was being standardized we were working with TC39 to validate that we could use it with streams, and we could. So things are compatible right now: you can use this code today, no flags needed. You can create your stream and then for await and iterate over that stream. Pretty cool, right? You just process those chunks, and that's it — when the for loop ends, it ends. I don't need any crazy events, I don't need any shenanigans.
I don't need to do anything crazy — I can just do my thing and be done with it. And note that if I break — if I use break in my for loop — it just tears down the stream automatically. Again, I don't have to do anything about it. It handles the 80% basic use cases, and the other APIs are still there as low-level components.

These shipped on Node 10 as well — I need to update these slides — so they're in all supported versions of Node, though part of this was only recently ported to Node 12. In the last release of Node we added Readable.from: you can pass it an iterator or an async iterator, and it will automatically convert it into a stream, which is pretty fantastic — you don't have to do anything. I can just use it to convert a function, an object or an array into a stream. It simplifies testing a lot, for example — because you need to test your streams, right? I hope all of you write unit tests; if you don't, go start writing unit tests. And then use pipeline — always use pipeline.

With this you can also write some crazy Inception-style stuff: start with an async generator function, convert it into a stream, and then iterate over the stream. Now, this has some little side effects. One of the key concepts of streams is data buffering: in order to be performant, a stream will buffer data, so your async generator will get called a little bit more, up front, to fill the stream's buffer. It has a slightly different behavior than just iterating over the async generator directly. Still, it's doable, and you can use these things to level the playing field, which is pretty nice: you can just accept a parameter that is an async iterable and async iterate over it.
You just iterate over it, and you can pass an async generator, a stream, or any other object that implements that protocol, and your code will be the same. It's pretty powerful.

Okay — demo time. Hopefully the demo gods will be with me, that's an important piece. I have seven examples; I'm not sure I'll go through all of them, we'll see. This is the first one: old-school streams. You extend the class, you implement the _read method, and in _read you iterate over a big array, push all those values in, then push null. This is how you would have implemented this a long time ago. You run it, and it just spits out the values. Note that this won't run on Node 6, by the way — we fixed some stuff — but Node 6 is long gone, so we don't have to worry about it anymore.

Now let's look at another version of the same code. This one just uses Readable.from — note that we still use on('data') to process the data — but in this case we just pass the array in and get a stream back. This is really powerful for testing, for example, because this way we can stay compatible without breaking a sweat. We can run this code, and it works exactly the same.

You can also go into full Inception mode, to some extent, and pass in a generator function. Note that this one is all synchronous — a generator is synchronous, an async generator is asynchronous. You can do this, run it, and you still get the same output as before, but now you're generating it from a function instead, which is pretty useful.
Again, we can of course go from an iterator to an async iterator using this API: we have our generator here, we pass it to Readable.from, and then we async iterate over it. Why? Because it's all pretty much generic. We can still run the code, and it still spits out 1024 numbers — no magic.

As I said, we can also go full Inception mode and do the async-iterator-to-async-iterator thing. Note that if I swap this in, it still does the exact same thing. There is, as I said, a little bit of difference in how much data is fetched from the async iterator function — that's something to take into consideration — but that's it.

These are still streams, though, so you can also use the full APIs. For example, if you have a file you want to read as text — as strings — you need to set the encoding: you set it to utf8 and it still works. Let's run example six: it spits out the same file. A lot of magic here. And note that when you do this, you can do whatever processing you want in the loop.

By the way: always use promisify. If you want a promisified version of anything, don't do it manually. The chance that you'll make a mistake doing it manually — or that some colleague will make a mistake six months after you've done it correctly — is very high. Promisify is there for a reason, and it's also really fast. Here we have a sleep function built from setTimeout. There are a lot of nasty things that can happen: if you just forget a return in your `new Promise` callback, you will create dangling promises, memory leaks and very bad things. So avoid, again, broken promises. And we can do a for await...of here — you see it's now taking some time.
It takes some time — and note, this is interesting: do you know why this is a single chunk? Yes, because the file is read as a whole from the disk in one go. Interesting, right? If we do the same thing over here instead, you will see that it now does what it says on the tin. Okay, I'll keep moving forward.

The last thing I want to show you is an example on the server. Note that when you're using async/await on the server, you always need to put a catch handler in there. Make sure of it: if you don't put a catch handler, you are going to create a memory leak. There's a module called make-promises-safe — the name is exactly what it implies — or you crash on unhandled rejection. This is a big conversation and it's not the focus of this talk, but if you run your code without crashing on unhandled rejections, you very likely have some memory leaks in your code. Anyway: always put a catch handler when you are calling into a promise-based or async function from a non-async function. So what do we do here? We just count the chunks and count the length, and then we pipe things through. I'm running this now, and you can see it's all working fine — and it's actually really fast.

Note that using async iterators is really fast — as fast as any other way of processing streams in Node. So use them: there is no big performance overhead. We have optimized async iterators to the bone; the code in there is really nasty, but I've profiled it.

So let's talk a little bit about what is missing. The first thing that is missing is how to write a Transform stream using async iterators — something that you could use to process the data along the way.
You could use it to process the data along the way. This is actually a very nice idea: you have a transform where you just pass in an async generator function that accepts an async iterator as input — so it has this nice symmetric input and output — and then you just for await the thing and yield the results. It's a pretty nice pattern for transforming data. Pretty cool, right? I love this. And then, if you convert this into a Transform stream, you can use it with pipeline and the rest of the streams ecosystem modules. Note that this is currently work in progress — hopefully this PR will land; it got stuck at the beginning of October, and I'm hoping to unstick it this weekend: we have the Collaborator Summit here in two days, so hopefully I can get it unstuck and landed eventually.

I would love to avoid creating a Transform completely, so you could just pass a single async generator function that implements that pattern for you. This way you avoid creating another stream and can just compose functions as much as you like, which is pretty interesting. This is not implemented yet — this is fantasy code, I'm living in wonderland — but hopefully it will hit your screen, maybe in Node 14, and hopefully we'll backport it.

Now, I've talked about readable, I've talked about transform; let's talk about writable — I haven't talked about writable at all at this point. Writable... well, I don't know what to do with writable, that's the honest story. If you want to write something, you need to consume an async iterator, right? But you want a write operation that actually returns a freaking promise. The problem is that our current write() does not return a promise, and it's very hard to create that behavior.
Why? Well, I would like to be able to write this code: we have a writable, and then we `await write(chunk)`. This code is correct — but all the magic is in building that write. The problem is in the error-handling model of promises versus streams. Streams emit an 'error' event as soon as it happens; a promise is like quantum physics — it errors only when you look at it. That's the core of the problem, and this is why using promises with streams is really hard: one thing errors straight away, the other only when you look at it, and those two are really hard to mix and match. You can see this code doing this kind of shenanigan to make it happen: essentially we register an 'error' handler and cache the error, so that whenever you call this write function, it returns a promise that errors when you look at it. It's really bad code, but I don't have anything better. So if you have any ideas — any promise expert who wants to give it a shot — I really welcome brainstorming; this is still open.

So how does it all work? Well, almost all of it is implemented in JavaScript, so you can use all of this stuff in your own code: learn the pattern and use it for other things. I'm running a little bit out of time, so I hope you don't kick me out. This is an extract from Node core: essentially we implement the Symbol.asyncIterator primitive as a function that calls our builder and returns an async iterator. To build something like this, you just need to provide that. Note that if you want to build an async iterator, you should use this type of pattern if you don't want to use an async generator function.
An async generator function is nice and really cool to use — but if you want to do some of the harder stuff that we're doing, or provide the mapping that we do in Node core, what you need to do is create an object that has the async iterator prototype and implements methods like next and return: next is called to get the next element, return is called when you exit. That's it. There are a few other bits, but they are less important; those are the two key ones. Internally, in Node core, we wrap the 'readable' event. This has some consequences, and one of them is that it's really performant — that's the good side. It also handles backpressure, buffering and all those things. This is a long link, sorry — it points to the source of the implementation. I'm going to open it up for you and do a quick scroll: it's about 200 lines of code. Not that bad, not that much — but a lot of very ugly stuff in here, sorry, lots of edge cases.

So: there is no performance penalty in using async iterators over the other methods, and that's the key part. When you talk about promise-based APIs, saying there is no performance penalty is pretty big, and I'm not underselling it: typically promise-based APIs are far slower than callback-based ones; this one is on par, just so you know.

One last thing before I finish: WHATWG streams. WHATWG streams are what you use with fetch in the browser — not in node-fetch; node-fetch does not have WHATWG streams, which creates a bit of incompatibility between the two worlds. This is what WHATWG streams look like: there's this pipeTo method — pretty complex again. The key point is that streams-based APIs are really complex: if you want to implement one of these, you need to implement this thing, which has a controller and a lot of other stuff.
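The Node core extract isn't reproduced in the captions; here is a small hand-rolled sketch of the protocol being described — an object whose Symbol.asyncIterator returns an iterator implementing next() and return() directly (the `countdown` example is made up):

```javascript
// next() is called for each element; return() is called on early exit
// (break, throw), which is where a real implementation releases resources.
function countdown(from) {
  return {
    [Symbol.asyncIterator]() {
      let n = from;
      return {
        next() {
          return Promise.resolve(
            n > 0 ? { value: n--, done: false }
                  : { value: undefined, done: true }
          );
        },
        return() {
          n = 0; // stand-in for tearing down an underlying resource
          return Promise.resolve({ value: undefined, done: true });
        }
      };
    }
  };
}

const result = (async () => {
  const seen = [];
  for await (const v of countdown(3)) seen.push(v);
  return seen;
})();

result.then((seen) => console.log(seen)); // prints [ 3, 2, 1 ]
```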
All that stuff is really complicated — again, not really easy — so I'll just skip through it. We would like to have WHATWG streams in Node core, to make things more consistent; you've probably seen my talk on this, and it makes some sense to have them in core. However, we want to stay compatible with the Node ecosystem, so we use Node streams. How much time do I have? Three minutes, okay — sorry. However, WHATWG streams are going to be async iterable, and that's cool: they implement the same pattern. So I hope one day to be able to do this kind of thing, where we start from a WHATWG readable, consume it using an async generator function, pass it to a Node writable, and it all just works — all the interaction managed via the async iterator protocol. That would be pretty fantastic.

Do you want to get involved? Please reach out. I will be at the booth, so come find me in the next few days if you're interested in this stuff or have questions. Again, we are NearForm; again, there is a raffle and so on. Okay — thank you. [Applause]
Info
Channel: Coding Tech
Views: 33,225
Keywords: node.js, nodejs, streams, javascript
Id: aTEDCotcn20
Length: 33min 32sec (2012 seconds)
Published: Sat Jan 04 2020