The Big Elixir 2018 - Eli Kim - Leveraging GenStage to Implement Your Own Event Bus

Captions

Welcome, Eli Kim.

Hello. Just working. Oh my god, I don't know why you guys are listening to me. All right. I really wanted to play the whole clip, but I was told that a minute was too long, so I had to clip it to that. You don't have to deal with it.

That's me. I'm Eli. I'm known as monkey patcher on Twitter and Instagram, if you're into that, and [unclear] on GitHub. This is the obligatory [unclear]: Frame.io is a video review and collaboration platform that helps video teams of all sizes produce better video together. Excuse me. We process over a hundred thousand video uploads per day for our five hundred thousand users and are used by teams such as Vice, BuzzFeed, Turner, and NASA. A lot of our core platform is powered by Elixir, having run it in production for nearly a year.

Anyway, back to me. I've been writing Elixir for about three years. I have terrible stage fright, if you can't notice already. I've never spoken at a conference before, first time, and I have crazy amounts of imposter syndrome. Thank you. I was about to do some live coding, but I decided against that. But anyway, it's gone.

Today we'll cover what an event bus is, why you should use one, how we use our event bus at Frame, we'll go over building an event bus, some lessons learned after, um, production, and tons of lists along the way.

So what's an event bus? This is the definition I found at ribbit org, which I think is like a Java event bus. But if you think of it as a giant megaphone, it receives messages and passes them along to all of its consumers. And with all tech things there's a ton of jargon, so we'll get that out of the way first. A message, or event, is any object that can be posted to the bus. A consumer, a listener, or a subscriber is a component, or pretty much anything, that wants to receive events and hopefully do something with them. And you'll hear "to post" in this talk, which is to pass or send an event.

And this is all pretty abstract. Hopefully this example makes it a little more concrete. There's no crazy
animations here, so you just have to close your eyes. So when you think of a four-way intersection, the cars never really talk to each other. You never, like, roll down the windows and scream, "It's time for you to go." Instead there's a traffic light that kind of acts as an event bus. It sends three different types of messages, red, yellow, green, to communicate to the cars, at whatever lane they're in, that it's their time to go. So in this case the cars act as consumers and the traffic light acts as an event bus. You could even consider the sensors as something that communicates to the event bus, which it then translates into the lights. How am I doing? Good? All right.

So there are a ton of pros to using an event bus. The main reason you should use one is that the logic for a particular side effect is isolated. What this means is you never have a mix of concerns. So if you have, like, an email service and a push notification service, those two pieces of code will never be in the same file. Better yet, as you'll see down the line, the code for, like, creating a user will have nothing to do with sending an email. It also isolates errors, so if you have an email service and for some reason you push up bad code, it won't affect your whole system, hopefully. And lastly, bringing up a new consumer is really easy if you have the infrastructure already set in place. All you have to do is maybe spin up another GenServer to consume some events.

And with pros there always come cons. Our job is just picking trade-offs. Integration testing is really difficult: if you want to do a test where you create a user and an email is sent through this event bus, you do have to ensure that. There's deployment order; you might have to manage multiple VMs. You also have to worry about dropped messages. Paul's talk at the end of the day yesterday showed that it's really hard to make sure these messages are all in order.

So if you look at this code snippet, it looks fine. It's not that bad,
right? It takes some parameters, puts them into the database, and if it works we send an email and we ship it. It's fine. Three months later we have a PM that comes back and says we have to implement push notifications. So you add another function at the end of the pipeline, send push notification, you have, like, two more function heads where you just handle those cases, and you ship it. It's fine. But eventually it turns into something like this, where you send some push notifications, you implement webhooks, you want audit logs, you want to send an SMS, et cetera. And it looks like this, where you just get deeper, and then suddenly you're just taken over by the wave. It really doesn't scale, right?

If you duplicate this code, all those side effects... Are you still watching this? All right. All those side effects, and you have to implement those for every service function, for every CRUD operation, for every resource, every time someone signs in, every time someone leaves a comment. It's just going to get out of hand. And there's a better way. I hope this is you, or this. And there is, I promise.

So this is what our code looks like on a high level. There's a context function which is responsible for the core business logic. It's responsible for authorizing users, seeing if they're allowed to do it or not, and anything that has to happen on the spot, like maybe taking money. You probably don't want to make that async. And the last thing it's responsible for is sending the event to the event bus. The event bus then shouts to all the consumers, "Hey, this thing happened," and then each consumer is able to handle that message as it wants to.

So instead of having that huge pipeline, we change our code to something like this. The one black-box thing here is notify. If things go well, we call this handle notify function, and what that does is it standardizes the message that we get into a struct that we use, and then posts it
to the event bus. And our events look like this. Before, we passed "user created", and we have this macro called event, and all it does is create a struct with some standardized keys so that our consumers know exactly what to expect. Gotta, like, talk for 30 minutes. Mmm. Right.

So with these messages we create a protocol for whatever things we have to handle. In this case we have a Trackable protocol, and what it does is it receives any sort of event and then turns it into a track. We also have this fallback to Any, which means if we don't want to handle the message we just return an :ok tuple, or sorry, an :ok atom. If we do want to, we define an implementation of Trackable for that event and do whatever we need to there. So this makes our consumers very light. We have this base consumer, which I could talk about a little bit later, and all it has to do is call track on that event and enqueue it, which, if it receives an :ok, is pretty much a no-op. Otherwise we send it to our batch processor. That's pretty cool.

And I promised we'd make an event bus, so here we go. The requirements are that we subscribe to the event bus, we post an event to the event bus, we make sure we broadcast the events, and our final criterion is that we don't overload our consumers. This is the API that we'll slowly fill in.

And this is the first example. It just uses a GenServer underneath. So start_link and init, these are all functions that you have to implement when you're using a GenServer. In this case we give it a name over there just so it's easy to call later, and we initialize it with an empty list of subscribers. Get subscribers takes the subscribers and passes them back as a response. Subscribe appends the subscriber, sorry, the PID, to the subscribers. And this is where the broadcasting bit happens, right? All it does is, for every subscriber, we send the message to that subscriber, and that's it. Filled in, it looks like this. And I ran a
benchmark. I had to keep the numbers low because eventually it gets really slow. But it's really fast: we sent, like, a thousand messages in a few microseconds, which is crazy. But we don't meet the criterion of not overloading our consumers. In this case we just fire and forget, and hopefully the consumer can handle it.

So here's a slightly better example. We pretty much just have an ack here. For each subscriber, we pass the message as a call and wait for the OK [unclear]. And this is an example of a mock consumer. We just initialize it and sleep for a random time, in this case anywhere between 100 and 500 milliseconds, and then we just send the response back. Our benchmark changes a little bit, where I start mock consumers instead of passing itself as a subscriber. And I try running the benchmark and it's just not going to happen. It's way too slow. The reason why is because here we have to wait for the ack. We go through subscriber per subscriber, making sure we wait till we receive a response, which means that if you have one subscriber in here that's really slow, it just takes way too long.

And we could do this even slightly better, and we do that by using Task.async_stream. What that does is it takes the subscribers, passes each one to a function, starts a task for it, and then waits for everything to come back. The problem here is that, let's say the call doesn't return an :ok, you'll probably crash the event bus, which kind of sucks, but hey, we've got to do it anyway. And it finished the benchmark in around 55 seconds. It went from 300 microseconds to 55 seconds. That's, I think, a couple orders of magnitude, but that's because we have to make sure we don't overload our consumers.

There's a better-er way, and you do this by going to this blog post announcing GenStage, which leads you here. If you don't know what GenStage is, it's a behaviour, I'm going to just read it off, for exchanging events with back-pressure between Elixir
processes. This is exactly what we want in the case of not overloading consumers. You scroll down, anyway, let's scroll down, you copy all the code and you paste it in. So this is our event bus. All right. This stuff stays the same. In the init you have to say that it's a producer, we initialize it with an empty queue, and we have to say that it uses the BroadcastDispatcher, because you want to broadcast all the events to all the consumers. Notify just takes whatever event it gets and puts it into the queue. And handle_demand is what we have to implement. In this case we just pop whatever demand we get from the queue and pass it along.

Our mock consumer changes a little bit as well. We have to implement this handle_events, I think I have a box for it, yeah, and we sleep for each event, you know, just to make it fair. And it actually turns out it's a little bit slower than Task.async_stream. It finished at 150 seconds, as opposed to Task.async_stream, which finishes in 55. You have to keep in mind that GenStage, in my opinion, really isn't a performance mechanism, although it's really fast. What GenStage gives you is pull-based rather than push-based events. But because of that we're able to get rid of our ack completely.

But wait, there's more. There's a better-est. I hope this works. Oh, the sound's gone. There's a beep-beep at the end, but you didn't get to hear it. So the better-est implementation uses ConsumerSupervisor, and what that does is it spawns a task, kind of like Task.async_stream, or sorry, it spawns a child process for each event that comes in. So we use ConsumerSupervisor, our init changes a little bit, and we have this worker that spawns, sleeps for a certain time, and dies. And it's really fast. It finished in around three hundred milliseconds, which I think is as fast as you can get, because we have a sleep in there from 100 to 500 milliseconds.

And to show how ConsumerSupervisor works, I have a small demo. Let's see.
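The slides are not part of the captions, so the following is only a minimal sketch of the GenStage version described above, modelled on the broadcaster example in the GenStage documentation rather than on Frame.io's code. The module names `EventBus` and `MockConsumer`, the `notify/1` function, and the `max_demand: 5` setting are assumptions for illustration, and the script assumes the `gen_stage` Hex package can be fetched with `Mix.install`.

```elixir
# Assumption: run as a script; Mix.install fetches the gen_stage package.
Mix.install([{:gen_stage, "~> 1.0"}])

defmodule EventBus do
  use GenStage

  def start_link(_opts \\ []) do
    GenStage.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  # Queue an event for broadcasting; returns immediately.
  def notify(event), do: GenStage.cast(__MODULE__, {:notify, event})

  @impl true
  def init(:ok) do
    # State is {queued events, demand that could not be satisfied yet}.
    # BroadcastDispatcher delivers every event to every subscribed consumer.
    {:producer, {:queue.new(), 0}, dispatcher: GenStage.BroadcastDispatcher}
  end

  @impl true
  def handle_cast({:notify, event}, {queue, pending}) do
    dispatch(:queue.in(event, queue), pending, [])
  end

  @impl true
  def handle_demand(demand, {queue, pending}) do
    dispatch(queue, demand + pending, [])
  end

  # Pop at most `demand` events off the queue and emit them.
  defp dispatch(queue, 0, events), do: {:noreply, Enum.reverse(events), {queue, 0}}

  defp dispatch(queue, demand, events) do
    case :queue.out(queue) do
      {{:value, event}, queue} -> dispatch(queue, demand - 1, [event | events])
      {:empty, queue} -> {:noreply, Enum.reverse(events), {queue, demand}}
    end
  end
end

defmodule MockConsumer do
  use GenStage

  # `report_to` is a hypothetical pid that gets told about each consumed event.
  def start_link(report_to), do: GenStage.start_link(__MODULE__, report_to)

  @impl true
  def init(report_to) do
    # The consumer asks for at most five events at a time; this demand is
    # the back-pressure that replaces the explicit ack.
    {:consumer, report_to, subscribe_to: [{EventBus, max_demand: 5}]}
  end

  @impl true
  def handle_events(events, _from, report_to) do
    for event <- events, do: send(report_to, {:consumed, self(), event})
    {:noreply, [], report_to}
  end
end
```

After `EventBus.start_link()` and one or more `MockConsumer.start_link(self())` calls, `EventBus.notify(:user_created)` queues the event and each subscribed consumer receives it once it has demand. Events posted while no consumer has outstanding demand simply wait in the queue.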
See if I could do this right. OK, see this? So our application looks like this. We just spawn the event bus and the mock consumer, and our consumer looks like this. Nothing really changed. On line 11 we have a max_demand of five, just so we could show it going in and out, and the sleeps got a little longer so you could see the PIDs changing. At least I'm not live coding.

So we start Mix, you could start the Observer, and then you could see the event bus running and our ConsumerSupervisor. And we can send an event, "hello". You could see the PID pop up, and after maybe five seconds it'll disappear. But the cool thing is, if you do something like this, you'll see the tasks pop up, and then you'll see the PIDs. Can you read that? You could read the PIDs, and they'll cycle through as the tasks die off. So when one dies off, the supervisor spawns another one, and it'll slowly go through all of these events, all 10,000 of them, which will probably take a really long time. That's the demo. I breezed through this. Damn. All right.

So our event bus processes around one and a half million events per day, peaking at around 2.2, and it takes about 0.01 nanoseconds for us to consume each event, which means all of our background jobs, like emails, push notifications, all of that happens in this short span of time. And our p99 for our API for the last three months has been 101 milliseconds, which I think is pretty awesome.

And a year later, in retrospect, this pattern actually worked really great. Recently we implemented webhooks. Well, I didn't do it, Steve did it, and he did it in, like, two weeks, because we had all of these events already passing through. So he just spawned a consumer for the webhooks and was able to just listen in on the events that came in.

Yeah, this is still a thing, and I think we're OK with it. We unit test our functions really well. All of our service functions are tested in a way that we can guarantee that the event is sent off, and all of our
consumers are tested to make sure we can consume every single message that's passed in. The cool thing is our event bus really isn't a bottleneck, and if it is, we can easily replace it. A few slides back we had this handle_demand function. If you want to rip it out and maybe put in Kafka instead, we could just put it in there and then have handle_demand pull in the messages from Kafka. Maybe. In theory.

That was my talk, and I have to thank... I lost my notes. Thanks, Frame, for sending me down here. Chris: I have to plug his podcast because Desmond isn't here to do it for him. If you like listening to a British guy being told he's wrong, listen to Elixir Talk. Michael, Zack, and Steve for enduring all my practices, and also Brian, Joseph, and Nicky for having me down here. Thank you.

[Applause]

Thanks. OK, questions?

That was a good talk, thanks, I really appreciate it. So you're definitely not the first person that I've heard kind of shout this strategy from the rooftops, that the benefits far outweigh the cons for a lot of people. But one of the things I've noticed that has been a trouble for me when I've tried to use it is the indirection. You know that this thing starts an event, but you're looking at it and you're thinking, OK, what's going to happen? When I do this, 30 things might happen. How do I find all of those things, and how do I know what they are? I've been told the trade-off is worth it, but I'm curious if you've developed strategies, now doing this for a year, to make it easier to draw those lines and sort of predict how your system's going to behave.

I mean, luckily we don't have, like, 30 consumers. Maybe there's, like, ten. And the best thing I have is I just grep for the event. We have a list of events that all use that event macro, and they're all in one file, so we could at least see what events are fired and we
could grep for that.

Yeah, that makes sense.

Any other questions?

Great talk. Couldn't tell that that was your first time until you told us. How do you guys handle disaster recovery? I'm still pretty new to Elixir, but it seems like, since this is all handled with processes, if you lose the nodes you might lose any events that haven't been processed.

Yeah, so disasters aren't really something we handle really well. In the past year we did implement retries and failure logs, but as I mentioned, one of our cons is that you do have to worry about dropped messages. We don't really have a persistence layer for it yet, but it's something that's in the works.

I was also going to ask a question about persistence, but since that was just asked, I guess what I was going to mention is that one of the nice things about doing the consumers as processes is, if you want to figure out what's going to happen, you just look at what processes are running under that consumer list, so that's kind of neat. I do happen to know that José, who wrote GenStage, is kind of working on a version of that that'll have a mechanism for hooking in persistence, so hopefully that should get a lot easier soon.

Is Paul here? All right, great.

You mentioned Kafka in passing, kind of towards the end. It's a technology we're sort of maybe looking at where I work as well. We're for now using [unclear] GenStage thing in place of that. Can you walk me through what you guys have been thinking, why you chose to not go with it initially, why you might in the future, that kind of thing?

So I have a secret, and it's that I joined after the GenStage thing was made, so I can't really tell you why and what decisions, what thought processes, went into it. But I can tell you that it's worked fine and we don't have to worry about deploying Kafka. Maybe that's something cool.

Any other questions?

[unclear] Sorry, I think you may have
said, but how many consumers are you typically spinning up versus the broadcasters or publishers?

Um, I think anywhere between 10 and 12. I haven't checked recently. I mean, we just added the webhook one, and we also don't have to worry about it.

Going once, going twice. OK. Thank you, Eli.

[Applause]
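For reference, the three pre-GenStage iterations described in the talk (fire-and-forget `send`, a sequential acknowledged call, and `Task.async_stream`) can be sketched with the standard library alone. This is a rough reconstruction from the spoken description, not the code on the slides: `NaiveBus`, `SlowConsumer`, the `mode` argument, and the message shapes are all assumed names, and the three variants are folded into one module only to keep the sketch short.

```elixir
defmodule NaiveBus do
  use GenServer

  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, [], name: __MODULE__)
  def subscribe(pid), do: GenServer.call(__MODULE__, {:subscribe, pid})
  def subscribers, do: GenServer.call(__MODULE__, :subscribers)

  # mode is :send, :call, or :async_stream (hypothetical switch for this sketch).
  def post(event, mode \\ :send) do
    GenServer.call(__MODULE__, {:post, event, mode}, :infinity)
  end

  @impl true
  def init(subscribers), do: {:ok, subscribers}

  @impl true
  def handle_call({:subscribe, pid}, _from, subs), do: {:reply, :ok, [pid | subs]}
  def handle_call(:subscribers, _from, subs), do: {:reply, subs, subs}

  def handle_call({:post, event, mode}, _from, subs) do
    broadcast(mode, event, subs)
    {:reply, :ok, subs}
  end

  # 1. Fire and forget: fast, but nothing stops a slow consumer's mailbox growing.
  defp broadcast(:send, event, subs) do
    Enum.each(subs, fn pid -> send(pid, {:event, event}) end)
  end

  # 2. Acknowledged, one subscriber at a time: total time is the sum of all delays.
  defp broadcast(:call, event, subs) do
    Enum.each(subs, fn pid -> :ok = GenServer.call(pid, {:event, event}) end)
  end

  # 3. Acknowledged, concurrently: total time is roughly the slowest subscriber.
  # The tasks are linked to the bus, so a failed call takes the bus down with it.
  defp broadcast(:async_stream, event, subs) do
    subs
    |> Task.async_stream(fn pid -> :ok = GenServer.call(pid, {:event, event}) end,
      max_concurrency: max(length(subs), 1)
    )
    |> Stream.run()
  end
end

defmodule SlowConsumer do
  use GenServer

  def start_link(delay_ms), do: GenServer.start_link(__MODULE__, delay_ms)

  @impl true
  def init(delay_ms), do: {:ok, delay_ms}

  # Acknowledged delivery: simulate work, then reply :ok as the ack.
  @impl true
  def handle_call({:event, _event}, _from, delay_ms) do
    Process.sleep(delay_ms)
    {:reply, :ok, delay_ms}
  end

  # Fire-and-forget delivery arrives as a plain message.
  @impl true
  def handle_info({:event, _event}, delay_ms), do: {:noreply, delay_ms}
end
```

With a handful of `SlowConsumer` processes subscribed, `NaiveBus.post(:hello, :call)` takes about the sum of their delays, while `NaiveBus.post(:hello, :async_stream)` takes about as long as the slowest one, which matches the shape of the benchmark results described in the talk.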

Info

Channel: The Big Elixir

Views: 929

Rating: 5 out of 5

Keywords: Elixir, Erlang, Functional Programming, GenStage, OTP

Id: ffhCUKI2_ho

Length: 24min 47sec (1487 seconds)

Published: Sat Dec 08 2018