Keynote: Rethinking Serverless with FLAME - Chris McCord | ElixirConf EU 2024

[Applause] That intro hyped me up; I don't feel like my title does as much justice to it, but the title is Rethinking Serverless with FLAME. Something more inflammatory would be "burning serverless to the ground," which is also my goal, but let's talk about what that means.

Before I do, I have to plug where I work. I work at Fly.io; they support my Phoenix development, so I get to spend the vast majority of my time working directly on Phoenix and building things like FLAME. Anyone here a Fly.io user? Raise your hands. Okay, awesome, a lot of people, cool, keeping me employed. If you're not familiar with Fly, it's a place to deploy basically anything that runs in a Docker container, which is anything, and it's by far the best place to deploy your Elixir application. You can run a single command and have your app running as an Elixir cluster across the planet. I'm currently running something like a 33-node cluster across multiple continents and it just works. We'll see an example of that; it's not the 33-node cluster, because the UI would be too messy, it's more like nine nodes, but it's still pretty cool, so definitely check it out.

But we're here to talk about auto scaling and why I consider it harmful, or at least why the way developers think about what "auto scale" means is harmful. The issue I've had with autoscale goes back, for me, to the Heroku days. If you've been around programming long enough, Heroku was one of the first platforms-as-a-service that abstracted away your web server. I deployed my first PHP app to production in 2003, and I had to call Rackspace and go on a sales call to rack a server somewhere. So we had the Rackspace days, where you called somebody, spoke to a salesperson, and they charged you an obscene amount for bandwidth; that's how you deployed a server. Then companies came along that made it really easy to deploy a server: they just gave you an instance somewhere. And then companies like Heroku popped up and said, we're going to abstract that; we'll put a layer in between where you just give us your code, we run your code, and you don't have to worry about the servers.

The moment companies said "we'll take your code and manage those pesky servers," this idea of auto scaling came up, because if they can take your code and put it on a server, they can say, "well, we'll just take your code and put it on four servers." So Heroku gave you this slider in the UI, you'd say "I want to scale up," and now you're just running more web servers. Awesome, autoscale. You could even pay for private spaces and they'd do it for you instead of you sliding things up and down. So autoscaling, for me, goes back to those days of "we can auto scale your app because we'll just run a bunch of web servers," or Heroku dynos as they were called, or Heroku workers.

But this has a serious problem. Here's our web server: we're running Phoenix; if we're doing background jobs we're using Oban; and anything else our service does, like FFmpeg work for video transcriptions, runs right alongside it. So in the Heroku model, when those video transcriptions get super expensive, what do we do? Oh, I know what to do: auto scale!
Awesome, more web servers! More web servers, we just horizontally scale. This is what's been beaten into our heads: this is how we scale our applications. But it has obvious problems. The Heroku model of "just run more web servers" relies on the load balancer to split requests, and yes, that gives us more capacity, because those FFmpeg jobs are really expensive and we're using the load balancer to distribute the HTTP requests. The problem is that we're still putting our web requests, our APIs, our LiveViews, our HTML pages through the critical path of these FFmpeg jobs. I think a lot of us at least sense that this is not ideal, especially coming to Elixir.

So where do we go from here? Auto scaling at the web server doesn't make sense, so a lot of us reach for vertical scale: Elixir is super good at it. We can scale horizontally when we need to, but we can scale vertically much better than most folks. If we get past that step, a lot of us end up with: a web server, which I can scale horizontally, maybe doing some standard Oban background job work, and then a big beefy FFmpeg server with a dedicated Oban queue that churns through my CPU-intensive work. At least at that point I'm splitting the CPU-intensive jobs out of the critical path of my UI or my API. In an ideal world we get here architecturally, and it's a good place to be. I'm a happy Oban user; at Fly we're actually Oban Pro users, so definitely check Oban out and buy the Pro license.

But this still has an issue: when we want to scale horizontally, those big beefy FFmpeg servers are expensive, and we have only a few options. We can lean on something like Kubernetes and have it spin up a bunch of microservices for us; we can use Oban's dynamic scaling feature, which can monitor a job queue and try to provision servers on different backends; or we start splitting our app into microservices or AWS Lambda. We haven't had a good answer in Elixir for how to go from here, which is at least more ideal than frantically scaling web servers, to something that can auto scale and still stay within the Elixir ecosystem.

And I think what we really want is not "auto scale." What I like to think about instead is a synonym that means the same thing: elastic scale. We need to get away from the term auto scale, because it's overloaded, and think about elastic scale: something that can grow and shrink back down. And not just elastic scale, but granular elastic scale. That's what's missing from this ideal architecture: how do we go from just running our Elixir code to granularly scaling the operations inside our app? Starting more web servers only scales the top level of the app; we're not actually scaling the things we need to scale. We have all these operations in the app, and if we can handle a thousand users on one server, running ten servers lets you say "I can handle 10,000 users now." But that's not really it, right? The thing we're trying to scale usually isn't the web traffic. I mean, if it is, sure: if you can run a thousand connections on a server and connections are what you're trying to scale, start ten more web servers.
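As a side note, a hedged sketch of the "dedicated Oban queue on a beefy box" split described above. The env var and queue names are assumptions, not from the talk; the idea is just that the FFmpeg node works only the heavy queue while web nodes work the routine ones:

```elixir
# config/runtime.exs — illustrative only
queues =
  if System.get_env("FFMPEG_NODE") do
    [ffmpeg: 2]                 # the beefy box only works the CPU-heavy queue
  else
    [default: 10, mailers: 5]   # web nodes work the routine queues
  end

config :my_app, Oban,
  repo: MyApp.Repo,
  queues: queues
```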
But usually the things in our app that we're trying to scale, the bottlenecks we hit, are individual, granular pieces, and that's what we should be thinking about when we think about auto scale. It's not just scaling at the top level, it's not just throwing everything into a background job and working through a queue; it's finding the actual individual pieces of our app that are hard to scale and granularly scaling those.

But this is a hard problem, because what we really want to do is write naive code in Elixir. Here's something that shells out to FFmpeg to generate thumbnails: very heavy, CPU-bound, IO-bound. I can write it on my laptop in a few lines of code and it just works. Wouldn't it be nice if I could write something like that and not have to think about elastic scale, or manage those pesky servers, or think about my infrastructure, or learn Kubernetes, or split into microservices? And there's a name for that: if we don't want to manage those pesky servers, what could we call it? Oh, we could call it serverless. It's a dumb name, but I don't want to run away from it. In my mind, AWS Lambda came on the scene and branded these things as serverless to solve a real problem. I know people deploy entire apps into Lambda for reasons that have nothing to do with elastic scale, and I think that's extremely silly, but it did solve, and continues to solve, a real problem of granular elastic scale. If a client during my consulting days came to me and said, "we're using Elixir but we want to encode videos," I'd have to say: you don't have many good options to stay in Elixir unless you want to run a ton of orchestration code or deploy microservices, so do that in a Lambda. And that bothered me; I didn't have a good answer.

AWS's tagline for Lambda is also weird: "run code without thinking about servers or clusters." Because Heroku is serverless by that definition, right? git push heroku: I wasn't thinking about the server. fly deploy is serverless: I'm not thinking about the server, it just works. Thinking about clusters? Elixir just clusters together. So even the tagline is silly to me, but the promise, and the thing Lambda solved well, was compute that's elastic: you pay only for what you use, and it runs a granular piece of your code. It has all these caveats, but it does solve a real problem.

The issue I've seen with Lambda, among many, is that it balloons into this outrageous complexity of services. A screenshot of this was shared with me and I thought it was fake, I thought it was a joke: it's a reference architecture for hosting WordPress on AWS. Not Lambda specifically, but the screenshot is real; that URL exists, I fact-checked it and took this screenshot myself. The point is that it isn't just "Lambda gives you a simple way to deploy code and run it elastically"; once you buy into the AWS ecosystem it becomes this labyrinth of nonsense. And look, auto scaling is in the middle here somewhere; I don't even know what those arrows mean, I guess it's scaling EC2 instances, I'm not sure. And this is for a WordPress site. Ridiculous.
Something more applicable, though, is another reference architecture AWS published, for video encoding. This is directly applicable to the kinds of problems I'm trying to solve in Elixir with elastic scale, and this is how AWS recommends you do it. You don't have to think about those pesky servers or clusters, right? You just think about putting the file on S3. S3 is fine, actually; object storage makes sense. But then that thing triggers a Lambda, your Lambda spins up and calls a Step Function (I've never used those), that calls another Step Function for some reason, and surprise: AWS Lambdas have a 15-minute execution limit, so you can't actually encode videos there if they need more than 15 minutes. So you have to call into another proprietary service, Elemental MediaConvert; that runs, you pay for it, it does something with CloudWatch, goes through another Step Function, triggers Amazon SNS, which is like Phoenix PubSub except you pay per broadcast, and then it goes into Elemental MediaPackage, I don't know what that does, puts the result on S3, and surprise: you can't get the result back into your app, so you put it in SQS, which you also pay for, and then back in your app you write an SQS consumer so you can finally say "okay, this video has thumbnails now." And CloudFront is in there too, probably as the CDN. So this is awesome: I'm not thinking about servers, I'm just thinking about proprietary services that call proprietary services that call proprietary services.

It's silly, but it does solve a real problem: this does give me elastic compute. It's just that I'm paying at every step of the process. The promise of Lambda was that you only pay for these granular pieces, you only pay for what you use: the job comes up, the Lambda runs, you pay for that compute, it goes down, you save money, you don't have to manage servers. Instead you're managing something like eight proprietary services, and there's a middleman toll at every step: you pay for S3, you pay for the Lambda spin-up time, you pay for the MediaConvert service, you pay for the SQS insert. These things add up; people are spending orders of magnitude more money on Lambda than they would running something themselves.

So how can we make this better? We can use FLAME. FLAME is my attempt to just burn this serverless space to the ground, and it's not just Lambda. Cloudflare Workers: it's like Lambda, but if you could only run JavaScript, and only a proprietary version of JavaScript. This whole space is extremely silly, but it does solve a real problem, so how can we remove the need for these services? I came up with FLAME. It's an acronym: Fleeting Lambda Application for Modular Execution; there's kind of a Phoenix theme there. The acronym is actually important. I burned many of my ChatGPT rate-limit credits trying to come up with something that encapsulated everything I wanted to convey, so if you don't remember the acronym, that's fine, FLAME is a cool name, but there's a lot we wanted to convey in this name.
Fleeting is important. The idea of FLAME is that we can run your app, your whole application, on an ephemeral, short-lived server. What if we could treat your whole application as the Lambda idea? We co-opt the term and just call into slices of it. Start your whole app: you can use PubSub, you can do database inserts, because it's running your whole application, but we modularly call into slices of your code at any time. That's the FLAME model, because what we really want in a perfect world is to write our naive code, and then when we come to that code and think "how are we going to scale this elastically?", wouldn't it be nice if we could just take that slice of code and scale it elastically? We wrap it in a FLAME call, and that block of code now runs on an ephemeral server running your whole application, and it auto scales for us.

That's what FLAME does. It lets you come in and wrap any code in your current app. That code can do what it needs to do, like shelling out to FFmpeg, and notice I'm doing a Repo.insert_all, and that's fine, because again, I'm starting my whole app. My supervision tree starts as normal (I'd probably run a smaller database pool since I'm not serving many web users), but I can still call my Repo, I can still do PubSub broadcasts, and I'm using all the primitives of the language. It's not the nerfed runtime of a Lambda; it's our whole app on the BEAM.

Now, you may have noticed: "okay Chris, this code back here is kind of cheating." The three lines of naive code take a video URL, and FFmpeg will accept a URL, download the file, and transcode it, but I have to make that video URL public somewhere. You could say I was cheating, because AWS had to put the file on S3; I was glossing over the fact that AWS had to move the file close to a Lambda to pull it off and process it. That's true, but no problem: we're running on the BEAM, so we can just send the file over distributed Erlang, because that's how file IO works: everything's a process. What I can do in FLAME, if I need to get that file onto the server instead of having FFmpeg download it, is write two lines of code. I call File.stream! on the parent, which opens a file (that's actually a process), say I want to stream it in 2K chunks, and then I Enum.into inside the FLAME call, which is running on another node, and it just streams the file over. Two lines of code and I have the file on the node that needs to process it. I didn't have to put it on S3, wait, read it back off, and delete it later. And this is just BEAM primitives; I have everything available at my fingertips.

You also may have noticed that we're closing over state here: there's a video struct and an interval, and those just get closed over in the closure and run on the other node. I didn't have to think about that; the BEAM gives us that. So I'm not rewriting my code in any way: I write a FLAME call, and worst case I maybe write two lines of code to move the file around. And you'll notice, as you use FLAME, time and time again, that you have to change almost nothing.
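A hedged sketch of the shape being described here: the naive thumbnail function, the same function wrapped in a FLAME call, and the two-line file streaming trick. Module, schema, and pool names are assumptions, not the talk's exact code:

```elixir
defmodule MyApp.Media do
  alias MyApp.{Repo, Thumb}

  # The naive version: shell out to FFmpeg and insert the results,
  # all in-line on whatever node happens to be running this code.
  def generate_thumbnails(%{id: vid_id, url: url}, interval) do
    tmp_dir = Path.join(System.tmp_dir!(), Ecto.UUID.generate())
    File.mkdir_p!(tmp_dir)

    {_out, 0} =
      System.cmd("ffmpeg", ["-i", url, "-vf", "fps=1/#{interval}", "#{tmp_dir}/%04d.png"])

    thumbs = for path <- Path.wildcard("#{tmp_dir}/*.png"), do: %{vid_id: vid_id, path: path}
    Repo.insert_all(Thumb, thumbs)
  end

  # The elastic version: exactly the same work, wrapped in FLAME.call so it
  # runs on an ephemeral runner node. The video map and interval are simply
  # closed over and carried along by distributed Erlang.
  def generate_thumbnails_elastically(video, interval) do
    FLAME.call(MyApp.FFMpegRunner, fn ->
      generate_thumbnails(video, interval)
    end)
  end

  # If the file only exists locally, stream it to the runner over distributed
  # Erlang instead of making it publicly downloadable first.
  def generate_thumbnails_from_upload(%{id: vid_id, path: local_path}, interval) do
    parent_stream = File.stream!(local_path, [], 2048)

    FLAME.call(MyApp.FFMpegRunner, fn ->
      tmp_path = Path.join(System.tmp_dir!(), Path.basename(local_path))
      Enum.into(parent_stream, File.stream!(tmp_path))
      generate_thumbnails(%{id: vid_id, url: tmp_path}, interval)
    end)
  end
end
```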
Everything we do in Elixir and on the BEAM is typically process-based, so even if you have processes, LiveViews, GenServers doing some work and talking to the system, and now you realize "shoot, I need to run this somewhere else, but how do I get the results back?", it turns out you just run it somewhere else. Since processes are location transparent, you almost never have to change code; the messages get sent back and forth, and we'll see examples of that. The moment you do need to do something like send a file, it's a couple lines of code, and again, you have the whole BEAM at your fingertips, because it's running your whole app.

And if we look at how this works internally, it's super simple. The remarkable thing for me is that no one has done this yet, and I think we're the only platform that can really do it effectively; there's very little heavy lifting that my code is doing in FLAME, it's a remarkably small library. What happens is: when you do a FLAME call, we take your block of code, and there's a Fly backend built in. Jim showed that there's also a FLAME Kubernetes backend, so FLAME can work on any infrastructure that has a programmable API that says "please give me a server and run my code on it." If you have infrastructure that does that, and most of us do, the Kubernetes backend can run pretty much anywhere, because you can run Kubernetes pretty much anywhere; you could even run Kubernetes on Fly, people do that. In any case, there's a built-in Fly backend, but the goal is that wherever your app currently runs, on whatever your current infrastructure is, it should also be able to launch FLAME. It's not Fly-specific, though obviously Fly is a great place to do it.

So what happens is: when we hit this block of code, FLAME asks "do I have a hot runner for this named pool, like FFMpegRunner?" If I do, I just send it the work. If I don't, I POST to the API (this is the Fly API here), and I pass the FLAME parent node in an environment variable: here's the IP and the name to connect back to. We boop a server into existence, and it can be a big beefy server, much bigger than our web server. It starts your whole application, the whole supervision tree, and when it comes up there's a FLAME process that says "ah, I have this special environment variable, I'm going to connect back to the parent you told me about," and the parent says "you're online, fantastic, I have work for you," and we send the function over distributed Erlang and it gets run.

The amazing thing is how little code this takes. In the Hacker News thread on the FLAME blog post, people were speculating, "I'd love to see how they did the function marshalling and encoding, how they pulled that off." Literally, the code on the slide here, that's it, I'm not kidding: Node.spawn_monitor, a node, a function reference, done. That's the amount of code I had to write. You take this anonymous function, it's closing over state, and that video struct is going to change at runtime, and distributed Erlang says "no problem, I'll just send that over and run it." I'm not thinking about the state I'm closing over; it just gets encapsulated, and distributed Erlang takes care of it.
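A conceptual sketch of that core mechanism, not the library's actual code: error handling, timeouts, and pool bookkeeping are omitted, and the names are assumptions.

```elixir
# Roughly: "send the closure to the runner node, block for the result."
remote_call = fn runner_node, func ->
  caller = self()

  {pid, ref} =
    Node.spawn_monitor(runner_node, fn ->
      # func is the anonymous function passed to FLAME.call, with whatever
      # state it closed over carried along by distributed Erlang.
      send(caller, {:flame_result, func.()})
    end)

  receive do
    {:flame_result, result} ->
      Process.demonitor(ref, [:flush])
      result

    {:DOWN, ^ref, :process, ^pid, reason} ->
      exit(reason)
  end
end
```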
Node.spawn_monitor, a node, a function: that's it, that's all we had to do. So that thing runs somewhere else, good deal. Everything that happens in that function, like the Repo insert, just works, and when it completes, the FLAME runner sends the result back to the caller. The caller is blocking in the meantime, and back at the call site we get the result of Repo.insert_all just as if we'd called it in-line on the parent node. And based on your configuration, if the FFmpeg runner has no more FFmpeg jobs, it just idles down: it calls System.stop, your infrastructure stops it and cleans it up, and you only pay for what you use. You're only paying for this elastic compute while there's work; when there's no work left, it shuts down and you save your monies.

I'm stealing this quote from José: when I was showing him FLAME and we were discussing the architecture of it, he said, "this is a great example of solving a problem versus removing the problem." I love this quote, because AWS Lambda and Cloudflare Workers solve a problem. They help you solve problems, all those proprietary services; it's a demonstrable, irrefutable fact that they solve a problem. If I want to do elastic video encoding, they help me solve that problem; they just help me solve it with half a dozen proprietary services. FLAME removes the problem. I'm writing my code, I think "oh shoot, I need to elastically scale this," I write a FLAME call, and then I'm not thinking about those elastic services. I'm not thinking about how, back at the blocking call site, I get the result back: do I put it in some message queue and pay for that? I'm not thinking about putting the result back into the database and paying for SQS. All of that goes away. I'm not thinking about the development and testing story either: if I'm using this AWS Lambda pipeline, how do I test it locally, develop against it locally? Do I run a simulated version of AWS? Do I connect up to AWS directly and develop, test, and run CI against it? If you talk to most people, they just don't test it; in my experience, you just test it in production. So it's not "what if we abstracted it" or "what if we hid the problem"; it's "what if we just did none of that?" That's what the BEAM and FLAME allow us to do, because this is six proprietary services and something like eleven steps to solve a problem where FLAME comes in and says: no, you just wrap your current code in two lines, and it goes away.

The only other thing you need to do is configure the FLAME pool: two lines of code and something in your supervision tree. Your FLAME pool is just a process. It sits there, you give it a named runner, and here's where I can configure essentially how much money I want to pay. I can say: I have these FFmpeg jobs to do; if I want scale-to-zero behavior, I give the pool a minimum size of zero, so none run until there's work; and I can run up to some maximum, so I don't have unbounded growth. I don't know if AWS even gives you that feature; I think you can just set spending alerts and pray.
Here, though, I literally can: I can say I never want to run more than 20 FFmpeg nodes, so I have this elastic pool with a ceiling. And I can set a max concurrency, kind of like Task.async_stream, so at any point I can run, say, 10 concurrent FFmpeg operations per elastically scaled server, and if I exceed that, the pool makes callers wait and eventually times them out. Then I can tune this to my needs. I can set these things to stay hot: idle_shutdown_after means "once you start up, hang around for 30 seconds, and if you have no work, kill yourself." As a sidebar: at Fly we use Elixir, but we also use a lot of Go and Rust, so when people at Fly hear me talk about BEAM patterns, supervising your children, killing your children, brutally killing them, it always sounds very weird and very brutal. Anyway, this is all we have to think about: what do we want our scale to be, and do we want scale-to-zero behavior?

If we don't want scale to zero, maybe we never want a cold start, so we can say we always want min 1 running. That means I pay for an FFmpeg server 100% of the time, because I always want one available and never want to endure a cold start. I could also do a warm start: at application startup it blocks the supervision tree and provisions one. And I can even set an idle shutdown for those minimum runners, so I can always cold deploy with a pool hot and ready to service work, which means users right after a cold deploy won't immediately hit a cold start, and I can still shut that runner down if there's no work. It gives you a lot of knobs to turn.

This idea of a cold start is kind of a contentious topic. On Fly we can take your existing Elixir app and have it up and ready for work in 3 seconds, sometimes 3 to 5 seconds, which in my opinion is outrageously fast: we go from no provisioned server to a provisioned server running your whole app in about 3 seconds. But the Hacker News comments will say, "my AWS Lambdas can run jobs in milliseconds," which is kind of a lie. If you actually look at the AWS docs, once a Lambda is hot it can run work really quickly, which is the same for Elixir. But if you read the fine print, AWS Lambdas have a cold start, and that cold start varies from under 100 milliseconds to over a second. AWS is careful to say that under production workloads you will very infrequently hit cold starts, and a FLAME pool is the same: under production workloads the pools will be hot, but you will infrequently hit a cold start, and that cold start will be on the order of a few seconds, 3 to 5, much like a Lambda. There's also nuance here, because the pool I'm running services concurrent jobs. Lambda is a one-to-one process; your FFmpeg runner is running concurrent operations. It's not like I'm spinning up a whole app just to run one thing (you could do that with a max concurrency of 1), but often you want to provision a big server and get the most out of that hardware.
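A hedged sketch of the pool configuration being described, sitting in the application's supervision tree. The pool name, sizes, and timeout are illustrative assumptions, not the talk's exact values:

```elixir
# in MyApp.Application.start/2
children = [
  MyApp.Repo,
  {Phoenix.PubSub, name: MyApp.PubSub},
  {FLAME.Pool,
   name: MyApp.FFMpegRunner,
   min: 0,                          # scale to zero: no runners until work arrives
   max: 20,                         # never grow past 20 runner nodes
   max_concurrency: 10,             # up to 10 concurrent calls per runner
   idle_shutdown_after: 30_000},    # a runner with no work for 30s shuts itself down
  MyAppWeb.Endpoint
]

Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
```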
So I'm going to spin one of these things up, and it's not "another user comes in, start another server, wait 3 seconds": it's servicing concurrent operations. If I hit my pool limits under high growth, users could incur a cold start, because currently the pool is somewhat naive: we wait until the pool is at capacity to provision a new server. There are options for us to look at arrival rates and pre-provision, but right now you can infrequently hit cold starts, and when that happens it's a few seconds and you're ready to go. We'll see examples of cold starts in a demo coming up, but the point is that AWS also has cold starts. AWS keeps your Lambdas hot, and we keep your Elixir nodes hot, and they're ready for work in milliseconds. Actually even faster than that: once the node is up and running, we just send it a function to execute. We're not even advertising "we can run your workloads in milliseconds"; it just runs, you're not even thinking about it, it's the latency of the network, which is often sub-millisecond.

We can also do other interesting things. Lambda is stateless, and that works well for them: Lambda is the idea of a function, you have input, it does some work, it produces a result. But on the BEAM and in Elixir we have stateful systems, so we also have processes in our application that might be doing expensive work, and we'll see an example of this in a moment. In my app I wrote that naive thumbnail code you saw, but then I wanted to hook it up to a LiveView, and I wanted it to not just process a file: I wanted it to take a LiveView upload and, as the bits were coming over the wire, generate thumbnails while the file was still being uploaded. It's a pretty interesting demo, something that would be very difficult in most languages and that Elixir makes trivial. But I wanted this stateful thing where a LiveView starts up, it's sending the chunks of a file upload from the client into FFmpeg, FFmpeg is sending standard out back, we're looking for the PNGs that get generated and sending them back to the LiveView. So we need a process in between to manage that FFmpeg shell: receive the chunks, receive standard out. And that is a super expensive process, something that can bottleneck the system, doing CPU-bound work, so it's something we'd typically put under a Task.Supervisor or a DynamicSupervisor with start_child: we want it in our system, supervised; it can monitor the LiveView and shut itself down. But how do we dynamically and elastically scale processes like that?

FLAME has this idea of place_child. It's process placement: please run this child spec, just find somewhere to run it, with the same max concurrency limits as a FLAME call. We'll see an example of this. In Elixir we build these unique stateful systems and we want to elastically scale those too; I think the FLAME pattern is applicable to other ecosystems, but place_child is something near and dear to what we in particular can do.

All right, let's see some actual cool stuff. We'll see if the demo gods are in my favor; as long as the internet works in a room full of people, I think we'll be solid.
The first thing I want to show off is a thumbnail generator, it's right here. Okay, the page loaded, that's good. This is running on Fly, and it takes a user upload. We'll see the code in a moment, but it's using a LiveView UploadWriter, which lets you do more than write the upload to a temporary file on disk: it gives you access to the upload chunks as they come over the wire. We take those chunks and send them to a process that runs FFmpeg; that process looks for PNGs coming out of standard out and sends them back to the LiveView to show on the page while the file is still being uploaded. Nothing ever touches disk, actually, which is pretty amazing. In this case we want it to elastically scale, so we use FLAME place_child; we change one line of code. And I have this configured to scale to zero, specifically to show a cold start, because I want to dispel the myth that cold starts are a showstopper; that's the first thing people on Hacker News bring up. I could have provisioned min 1 in my pool, always had one ready, and masked and hand-waved this, or I could have just said "under production workloads users will infrequently hit a cold start." No: let's see a cold start. What happens is I choose a file, the upload writer starts up, it hits the FLAME place_child, and the gap between the progress bar appearing and the progress actually starting is my cold start, so we can see what it's like as a user.

I'm going to upload a puppy video here. Okay: two one-thousand, three one-thousand, four one-thousand, five one-thousand, whoops, six. Okay, 6 seconds, a little slow for a cold start, but still good. That provisioned the server, started my whole application, connected back to the parent, started the FFmpeg process, and now that thing is up and running and sending messages back to the LiveView. I changed no code other than one line, and it generated thumbnails of a puppy for me, which is super cute. Now that runner is hot. Actually, I think I may have set it for a 10-second idle-down; we'll see if we catch it. No, so it's going to do another cold start, perfect, we'll see how long it takes. Oh, zero thumbnails; that wasn't a FLAME problem, that was, I think, a LiveView race. Let's try again. Okay, we caught it hot, it's up and running. So we can configure these things hot or not, and I'm changing one line of code.

If we go to the actual code: this is the LiveView UploadWriter for this thumbnail app. It's a behaviour; it lets you hook into the chunks coming over the wire, so you write a few lines of code and say "I want those chunks." In init, when the user drops a file into the app and we start the upload, we call this function, and open is where the magic happens in this process. When I wrote this demo it wasn't using FLAME; it was using a DynamicSupervisor. I had this child spec in my app and I was starting it under a DynamicSupervisor; it's essentially just a GenServer that starts up, opens a port to FFmpeg, and manages IO into that shell. The LiveView sends chunks over, so this write_chunk function sends a message to that process, which writes the user's chunk to FFmpeg's standard input, and what comes out of standard out is a bunch of messages from that shell.
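A hedged sketch of the upload writer being described, including the one-line DynamicSupervisor-to-place_child change. The ThumbnailGenerator module, its functions, and the pool name are assumptions; the callback shapes follow the Phoenix.LiveView.UploadWriter behaviour:

```elixir
defmodule MyAppWeb.ThumbnailUploadWriter do
  @behaviour Phoenix.LiveView.UploadWriter

  @impl true
  def init(opts) do
    # Originally: DynamicSupervisor.start_child(MyApp.ThumbSup, {ThumbnailGenerator, opts})
    # The one-line change for elastic scale: place the same child spec on a runner node.
    {:ok, pid} = FLAME.place_child(MyApp.FFMpegRunner, {ThumbnailGenerator, opts})
    {:ok, %{pid: pid}}
  end

  @impl true
  def meta(state), do: %{pid: state.pid}

  @impl true
  def write_chunk(chunk, state) do
    # Forward each uploaded chunk to the FFmpeg-owning process; it writes the
    # chunk to FFmpeg's stdin and parses thumbnails out of stdout.
    :ok = ThumbnailGenerator.write_chunk(state.pid, chunk)
    {:ok, state}
  end

  @impl true
  def close(state, _reason) do
    :ok = ThumbnailGenerator.close(state.pid)
    {:ok, state}
  end
end
```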
And in those messages we're looking for PNGs. Since we're not writing the files to disk, FFmpeg just spams standard out with all the PNG data, so we look for a PNG preamble with binary pattern matching; as soon as we see a new preamble, we know everything before it was the previous thumbnail, so we accumulate that and send a message to the caller, which is the LiveView process. The LiveView gets a message saying "here's a PNG," does a boring assign, and the thumbnail shows up on the web page. Pretty cool, but now: we have this cool demo, how do we elastically scale it? We go back to our thumbnail open, write FLAME place_child with that same child spec, and carry on with life. Now this thing starts up on a new node and runs there, and this GenServer keeps doing its thing. I had to change nothing else, because it's just sending messages back to the caller, and that caller happens to be a LiveView, and that LiveView happens to be a process, and that process is location transparent, so everything works. I'm not thinking about where things live, because the BEAM was made to be location transparent. When you hear "location transparency," this is what really drives it home: these things can live anywhere, and you don't really appreciate it until you run something somewhere else and go "wait, that just works?" and then "oh, right, of course: it's a process, it's a message send."

So there's our thumbnail generator. I didn't have to put the file on S3, I didn't have to put anything on SQS. Imagine how you would have written this with AWS. I don't know how; you can't open a socket to a Lambda, to my knowledge, so I don't even think this would be possible. You'd have to wait for the upload to complete, put it on S3, that triggers a Lambda, the Lambda starts up, and at that point maybe Amazon SNS can send a message to a browser directly, I'm not sure, but eventually it can call back into your app, you can use Phoenix PubSub to tell the LiveView it has all the thumbnails, and they'd all show up at the end. But you're persisting all these things, and what if the user hasn't even committed to saving the file yet? We're still staging this information, and then you have to go clean everything up on S3. No: we can just do this in-line. So there's our puppy.

Another example I have is LiveBeats, which you may have seen: it's a social music application, listen to music with your friends, kind of a showcase for LiveView. I can upload music here, and it uses Phoenix Presence, but in this case I'm also using Nx with Bumblebee and Whisper, and here's where a cold start really matters, so I want to show you a very cold start. We'll talk about ML workloads in a bit; I think FLAME can work really well for elastic ML workloads, but you have to really consider cold starts there, because a lot of time is spent just moving these large models around. You have to get them onto the machine that starts, and they can be tens of gigabytes.
And then loading those models, compiling them, getting them into memory can take tens of seconds; in this case it's usually about 30 seconds for me to load Whisper with Bumblebee. So I'm going to upload a song, and we'll see how it goes on conference Wi-Fi; this is actually solid. As it's being uploaded, LiveView is parsing this MP3; you'll notice the artist attribution just showed up, pretty cool, and we didn't have to touch disk there. Now I hit save. In the naive approach I was just running an async task after the insert to call into Bumblebee and send messages back to the LiveView, but to change that from something that didn't scale into something that scales elastically, again, we'll see how I change one line. When I save this, it's going to spin up a flame and run that Bumblebee code. Maybe. There we go. Oh, we've got some users here, cool. So this is going to take about 30 seconds. I like this song too. Fifteen seconds now. This is a true cold start, and this is what ML workloads are like: if you've ever deployed anything ML, you understand you have to plan for this; you can't just spin these things up and run something instantly. So we're waiting; that thing is compiling the model right now, loading it; we hit 30 seconds, okay, and now we're getting transcriptions, and it's working, running in a flame.

If I delete this file, I'll be able to catch the thing hot. Again, I could have set min 1 and always had one running, so it's not that you'd always pay this; I just want to demonstrate that if you want scale-to-zero behavior, you have to consider cold starts. For a lot of workloads, like that thumbnail generator, 3 seconds on an upload that will take a minute or ten minutes doesn't matter, but something that takes 30 seconds to become useful does matter. So if we catch this thing hot, the runner hasn't idled itself down yet, and we'll see how fast we get transcriptions. Okay, at this point it should actually be faster than real time, because that node was already running Nx and Bumblebee and was ready to go. If you were taking this into production you would probably not run min 0, or you'd have a warm one start up, and then maybe if your app had no real users you could scale it down to zero, but again, cold starts for ML workloads have to be seriously considered. And presence is working; I haven't used LiveBeats with other people in about a year, so this is actually fun. Your code works, look at this, it's cool. So that's running Nx and Bumblebee.

Let's see what that code looks like. Okay, let's go into LiveBeats. You can see I'm configuring my FLAME backend here with a different amount of CPU and memory, so these runners don't have to match your web server: you can spin up beefy hardware. Let's go see what I'm doing; I think it's in the media library. When the file is inserted, after we actually consume the upload, we call this async_transcribe, and the way I wrote it initially was just Task.Supervisor.start_child. This was actually the first time I'd played with Bumblebee, when I wrote this demo.
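As context for that roughly 30-second figure, a hedged sketch of the kind of Whisper serving setup each runner would load at boot. The model choice, EXLA compiler option, and names are assumptions; the talk doesn't show this configuration:

```elixir
# At application start (so every FLAME runner pays this once, at boot):
{:ok, model_info} = Bumblebee.load_model({:hf, "openai/whisper-tiny"})
{:ok, featurizer} = Bumblebee.load_featurizer({:hf, "openai/whisper-tiny"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "openai/whisper-tiny"})
{:ok, generation_config} = Bumblebee.load_generation_config({:hf, "openai/whisper-tiny"})

serving =
  Bumblebee.Audio.speech_to_text_whisper(model_info, featurizer, tokenizer, generation_config,
    defn_options: [compiler: EXLA]
  )

# Supervised under the app, available as a named serving on every node:
children = [{Nx.Serving, serving: serving, name: MyApp.WhisperServing}]
```

Downloading the model, compiling it, and loading it into memory is where the cold-start seconds go, which is why min 1 or a warm start makes sense for ML pools.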
And it was like: this is amazing, machine learning in Elixir. But then, okay, how do we scale this? If we go into the audio module, this is doing expensive work. We're calling into FFmpeg, which is kind of the unsung hero of this talk: we shell out to FFmpeg to convert the audio file to WAV format, specifically what the model expects, and then we call Nx.Serving batched_run. Those are two heavily CPU-intensive operations, or GPU, depending on what we're running. So we need to scale this; what do we do? This is difficult. And when this thing finishes, it sends messages back to the LiveView, so how would we even put it on a Lambda? I guess we could pay for a proprietary API, and that could be one way around it, but no: I just want to run this myself. What we can do is, instead of Task.Supervisor.start_child, a FLAME cast. FLAME.cast is like FLAME.call except it's async, and that thing calls my FFmpeg code. And look, it's doing a PubSub broadcast; is this where I need to hook into an external service? No: this is running your whole app. The A in FLAME, application, has specific meaning. I'm doing my PubSub broadcast and it just works, because it's distributed Erlang running your whole app, with your PubSub process started in your supervision tree. Oh, it's doing a Repo.update_all? No problem, you can do database writes inside flames; it's running your whole app. But we're only calling into this slice of code, and we're running that slice elastically.

I changed one line of code, and this thing still does its work, still streams those results back to the LiveView, which is on another node. I don't care about that; I didn't have to change any of my app to account for the fact that it's running somewhere else. Messages just arrive and we do an assign, or in this case I'm using streams: a couple of lines of code and the UI gets real-time updates, but it's getting them from a node that was dynamically provisioned somewhere else in our infrastructure. The messages just arrive. Seriously, it's ridiculous; I don't understand how we've had these kinds of features on the BEAM for so long and no one had put them together like this.
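A hedged sketch of the one-line change being described in LiveBeats' media library. Module and function names (including the convert_to_wav! helper and the serving name) are assumptions:

```elixir
def async_transcribe(%Song{} = song) do
  # Before: Task.Supervisor.start_child(MyApp.TaskSup, fn -> transcribe(song) end)
  # After: same work, fire-and-forget on an elastically scaled runner node.
  FLAME.cast(MyApp.WhisperRunner, fn ->
    transcribe(song)
  end)
end

defp transcribe(%Song{} = song) do
  # Shell out to FFmpeg to convert the audio to the WAV format the model
  # expects, then run the Whisper serving. Both are CPU/GPU heavy, which is
  # why this slice is the thing being scaled.
  wav_path = convert_to_wav!(song.mp3_path)

  %{chunks: chunks} = Nx.Serving.batched_run(MyApp.WhisperServing, {:file, wav_path})

  for segment <- chunks do
    # PubSub and Repo both work here: the runner is the whole application.
    Phoenix.PubSub.broadcast(MyApp.PubSub, "song:#{song.id}", {:transcript_segment, segment})
  end
end
```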
There's another demo I want to show, which is a world page speed application. I haven't tweeted about this; it's the first time you're seeing it. This is another example I put together. You know how Google has its PageSpeed test, where you can see how fast your page loads and you get a score? But most of us deploy in something like us-east-1, and our website is super fast if we live in the US, and slow everywhere else, or we lean on a CDN to deal with the slowness of our assets. So: what does your page speed look like as seen by users around the planet? I wrote an app to do that. It's ridiculously simple, but let's see what it looks like. And again, I have scale-to-zero behavior here, just to show that, like I said, I'm not masking cold starts.

When I hit go, it's going to start headless Chrome, which is expensive: something that's hard to elastically scale, uses a lot of memory. We're loading a website, because to know the page speed you actually have to run a web browser: you need to download the JavaScript, evaluate it, download all the images, and we're essentially waiting for the window load event to report back the actual page load time as seen by users around the world. So we need to run a flame on different continents, running headless Chrome, and update the LiveView with each result. When I hit go, you'll see the status for each metric say "starting browser"; that "starting browser" phase is the cold start time, and the moment it goes from "starting browser" to "loading page" is when the app has started and is running your code. Let's load Hacker News here. Okay, apparently autocomplete didn't work. Oh. Who was on this? Get off this! Okay, you ruined my demo; I really wanted to show cold starts there, so now we have to wait 30 seconds. We hit it hot because we had active users; this app is like, I'm going to make so much money, I have users. Okay, so this is still calling those flames that were started up around the world. Stop using the app, because I do want to show a cold start. I guess I could do an app restart, but hold on, I'm going to try it again. The problem is that every time I hit these runners, they bump their idle timeouts.

And if we load World Page Speed itself, I can just brag about how fast this app is, because this app is running in all the regions I'm running headless Chrome in, so it's going to be super fast. I'm going to wait a few seconds here, as long as people aren't running this in the background. Stop. Okay. The cool thing is: let's see how fast World Page Speed itself is. In Illinois, in the US, actually one of the slower ones here: 160 milliseconds. Tokyo, Japan: 58 milliseconds. Amazing. We're running an Elixir node for this LiveView web app in all the countries you see here, and that's why it's fast in all those regions: we don't have to go across an ocean to load the web page. 58 milliseconds in Tokyo because we have an Elixir node running in Tokyo, and they're all clustered together. I'll come back to this and we'll try to show a cold start after we see the code. No, people are still using it. Stop.

All right, let's see the code for this. It really would have just said "starting browser," but it was cool to see, five seconds later, what it did. What the flame is doing here is booting your whole application, starting headless Chrome, and sending a message back to the LiveView at each step of the process, telling you "okay, we're starting the browser, now we're loading the web page," and as the results come in it sends those as messages too. So if we load that LiveView up: when you hit the handle_event for "go," we do a start_async, just an async process, and again, you have the whole BEAM. This whole demo, seriously: I've been doing this for ten years, and every time I write anything interesting with the BEAM, it still defies reason how ridiculous it is.
But let's look at this. What I want to do in this example is run a browser in all those places to know how fast the web page is from each of them, so whichever LiveView you land on, wherever you are in the world, I need to go kick off all those browser timing runs. So I get a bunch of nodes: the members list is using Phoenix Tracker internally, but I'm basically just asking "give me all the nodes in the cluster." Then I start_async in the LiveView and use :erpc.multicall. It's just built into Erlang; it does an RPC call: you give it a list of nodes, it runs the function on all of them and blocks until they all return a result. Inside that function I'm passing the parent pid along, because our parent LiveView process needs to get results back from all these other nodes discovered on the cluster. Multicall runs this function on another node. To time navigation before FLAME, I just had this code: a browser module with a time_navigation function that runs headless Chrome underneath and returns a timing result with a bunch of performance metrics. When it gets the result, it sends the parent a message, and that parent is the LiveView pid: either "we're complete, here are the metrics" or "there was an error." And this is happening on another node: we did :erpc.multicall, it ran on another node, and the fact that it ran on another node doesn't matter, because the parent is a process and processes are location transparent, so the message makes it back to the LiveView in the country where you started and the page just updates.

So you're thinking, wow, :erpc.multicall is awesome, but how would we elastically scale this? We can't be running headless Chrome for every user that visits the web app; all of our UIs, all of our LiveViews: we want to keep that expensive thing out of the critical path. Well, I just wrap the thing that's already running on another node in a FLAME call, which puts it on yet another, ephemeral node. We close over the parent there, and that parent is a process, so it's location transparent; it doesn't matter that we're already on another node and are now saying "actually, go run on one more node." When that flame does its work, it sends the parent a message, and we don't have to route back through the multicall node: the message goes directly back to the LiveView process, and we changed no other code. What I'm trying to say is: I wrapped it in a FLAME call and it just worked. It's a FLAME call inside a multicall, and it's ridiculous.
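A hedged sketch of the pattern just described: fan out to every node in the cluster, and on each node push the headless Chrome run onto an ephemeral FLAME node, with the LiveView pid closed over the whole way. Browser.time_navigation/1, the pool name, and the message shapes are assumptions:

```elixir
def handle_event("go", %{"url" => url}, socket) do
  parent = self()
  nodes = [Node.self() | Node.list()]

  {:noreply,
   start_async(socket, :timings, fn ->
     :erpc.multicall(nodes, fn ->
       # We are already on some node in the cluster; FLAME pushes the
       # expensive browser run onto yet another, elastically provisioned node.
       FLAME.call(WorldPageSpeed.BrowserRunner, fn ->
         send(parent, {:starting_browser, Node.self()})

         case Browser.time_navigation(url) do
           {:ok, metrics} -> send(parent, {:complete, Node.self(), metrics})
           {:error, reason} -> send(parent, {:error, Node.self(), reason})
         end
       end)
     end)
   end)}
end
```

The results arrive as plain messages to the LiveView process, handled in handle_info, regardless of which node or flame produced them.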
Now let's see if we can get a cold start. Nobody load this, don't do it. "Starting browser," awesome. So what's happening here: we're provisioning a machine, running your whole app there, starting the app; loading page, loading page, loading page: it booted headless Chrome, it's loading the web page, it's sending messages back as this happens, and it just did this across the world. Like, what? Really. So this is what FLAME enables, and this cold start example was, like I said, 3 to 5 seconds, and that's 3 to 5 seconds to boot your whole app with Chrome inside it. Chrome is not small; we're all familiar with Slack-sized things, hundreds of megabytes. The Docker image of my app has headless Chrome included, and no problem, we just boot it up and run it there. And on the idea that distributed Erlang doesn't work at internet scale: I've done this enough times that, no, it just works. If the fiber were cut between us and São Paulo, that one result wouldn't show up, but the rest of the app would keep working. So again, FLAME lets you just wrap things and not think about it, and there are all kinds of interesting things you can do without paying AWS for them. Imagine how you'd have done this with AWS: how would I have gotten the results back to the UI in time? I probably would have paid several proprietary services, and it would have been much slower.

The goal of FLAME is to be unremarkable, which is funny: FLAME is almost remarkable in how unremarkable it is. You change a couple lines of code, so when people tweet their FLAME code, even for me, you're excited to see it, but showing off your FLAME example is... a FLAME call. You go, "oh, okay. That's it." But that's the goal. Elastic scale should be a boring decision on the BEAM. With FLAME, it should be as boring as "do I want concurrency?" When you're writing code and you realize this thing could run concurrently, what do you do? You reach for Task. It should be as boring a decision as that. I want concurrency: I reach for Task. I want to make sure this thing actually completes, I want durability: I reach for Oban. And now there's a third boring question to ask yourself: does this need elastic scale? Okay, wrap it in a FLAME call. It should be no different from the decision to reach for a Task, and then you carry on with life, shipping features. It should be boring, and it is boring, because you're changing almost no code to do it.

And you can combine these aspects. Sometimes we want concurrency, sometimes durability, sometimes elastic scale. Often you might say, "I want to make sure those thumbnails actually get generated: a user uploaded a file, and if it fails for whatever reason I want it retried." So you combine FLAME with Oban, and you decouple your durability from your elastic scale. We can have an Oban job fire up that says "I need to encode this video; if it fails, it gets retried." Oban also has features to dynamically scale your app, and it lets you hook into queue scaling manually; the only missing piece is that you'd want Oban to reach into FLAME's pool growth callbacks, so that as FLAME scales up and down, Oban's queue concurrency also scales up and down, sending more jobs when there's more capacity and fewer when there isn't. But you can get both of the guarantees you want here: durability and elastic scale. Or, for the headless Chrome case, I don't need durability at all.
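A hedged sketch of combining durability (Oban) with elastic scale (FLAME) in the way just described; the worker, queue, and pool names are assumptions:

```elixir
defmodule MyApp.Media.ThumbnailWorker do
  use Oban.Worker, queue: :media, max_attempts: 3

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"video_id" => video_id}}) do
    video = MyApp.Media.get_video!(video_id)

    # The job gives us retries if anything fails; the FLAME call gives us
    # elastic compute. The call blocks until the runner returns, so a crashed
    # runner simply fails this attempt and Oban retries it later.
    FLAME.call(MyApp.FFMpegRunner, fn ->
      MyApp.Media.generate_thumbnails(video, 5)
    end)

    :ok
  end
end

# Enqueued from wherever the upload is consumed:
# %{video_id: video.id} |> MyApp.Media.ThumbnailWorker.new() |> Oban.insert()
```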
Or, if I'm running headless Chrome in real time to get performance reports, I don't need to put that in my Postgres database to be pulled off by a job later, because I just want to report back to the UI: was I able to access that website? If I wasn't, I just report an error to the UI. So sometimes you want durability and sometimes you don't, and you reach for the same tools: like I said, Task, or an Oban insert, or a FLAME call. You can combine them or use them separately; it depends on the guarantees you want. And I think a missing part of this model is edge computing. Edge computing is also an overloaded term; somehow it has turned into running JavaScript on the edge, and I honestly don't know why. This is a whole other talk, but it's weird to me: with things like Cloudflare Workers or Lambda at the edge, AWS has a service for running at the edge and you can only run Lambdas there, which is strange. Why not just let you run anything? Because if you can already run all these things at the edge, the idea is, well, I can run close to the user, so my JavaScript single-page app can talk to my JavaScript on the server. But if you're already running on the edge, why not run your whole UI on the edge, close to the user? Anyway, that's another talk. I'm not trying to discount the edge: it makes sense to run close to a user, and I think all of us intuitively understand the CDN model. It makes sense because the speed of light is a thing, so running close to the user is a necessity. When I want to serve my assets close to a user, my JavaScript, my stylesheets, my images, I put those things on a CDN at the edge, close to each user, so web pages load faster and aren't waiting on the speed of light. I think all of us understand this, but the same thing applies to elastic compute, and not all of us think about that. When I'm running these FLAMEs, it's even more important than the CDN idea. If a user uploads a file in Europe and I spin up a Lambda in the US, I have to get that file over there. Do I put it on S3? Then it takes time to hop across the ocean, and I'm potentially paying egress fees; if I have a server running in Europe and I need to put the file in a bucket located in us-east-1, that's a huge egress fee. This matters from both a speed-of-light standpoint and a cost standpoint. What you really want is to take the CDN idea: you run your server at the edge, and then you put the requested resources as close as possible to the requester. For a user requesting some JavaScript, you want that JavaScript as close as possible to them. When an Elixir application says, I want elastic compute, you want that elastic compute as close as possible to the Elixir node that asked for it. Just like in our web page speed example, if my server in Tokyo said, oh shoot, I want to do this elastically, let's start a Lambda, and a Lambda in us-east-1 said, I got you, fam, that would not be doing what we want; you have to run the compute in Tokyo for that to make sense. What you really want is for sending messages and data back and forth to be as fast as possible, like that file stream.
That line makes a big difference if I'm in Europe calling into a node in us-east-1. What I really want is that when I make a FLAME call, the node is provisioned as close as possible to the parent. On Fly, the best case is the same physical rack server as the node your app is running on. If you ask for a new Fly machine and it knows a host has your Docker image cached locally, it will pick that host by default. So if there's capacity on the server running your current app, and you're requesting a machine of the same shape, the same CPU and memory, it will just run it there. Best case, you're literally running on the same rack server; worst case, you're running in the same region, essentially the same data center. By default it just does what you want. This is the whole thing of solving a problem versus removing the problem: when you ask for compute, it runs that compute as close as possible to the Elixir node that asked for it. That's just how it works, and that's just how it should work; if you're using Kubernetes or whatever, ideally you have a similar setup. And again, this matters: message sending between nodes matters, the speed of light is a thing, especially when you start moving assets around. We've seen some use cases. Anything that is heavily IO-, CPU-, file-, or media-bound is an obvious fit; FFmpeg is the unsung hero here. We saw ChromeDriver, which I think is super interesting; I kind of want to make my own AI agent and just unleash it on the internet: go crawl around and do stuff. In all seriousness, if you were building a Googlebot today, you would use FLAME to do that: start headless Chrome, drive it from Elixir, go crawl the internet, build a search index, do AI, do whatever you need. That would be super cool, and you could reach for FLAME for that. The boring business decision most of us have faced is PDF generation, and this would be perfect for it. As a consultant, a user asks for something like, I want to be emailed a PDF of my sales report monthly, and you wonder, how do we generate a PDF? We could reach for a PDF library and hand-build the PDF, but how do we make it look nice? Oh, a browser: it's easy to code a web page and make it look nice, so wouldn't it be nice to just take a web page and convert it to a PDF? I've done this at least a dozen times in the past: you have a web page that renders the PDF document as plain HTML and CSS, you ask the browser to convert that to a PDF for you, and then you email it to the user. This is a perfect example: write it naively, use headless Chrome, wrap it in a FLAME call, and now it scales. And data pipeline ETL is another one, where these spiky, bursty workloads come in, you need to do a bunch of work on some data, and then it goes away.
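To make the PDF case concrete, a minimal sketch, assuming a FLAME pool named `MyApp.PdfRunner` whose Docker image includes headless Chrome, plus hypothetical `MyApp.Pdf.render/1` and mailer helpers (none of these names are from the talk):

```elixir
defmodule MyApp.Reports do
  # Render the existing HTML report page to a PDF on a FLAME runner,
  # then email it. Only the headless-Chrome step leaves the app node.
  def email_monthly_report(user) do
    url = MyAppWeb.Endpoint.url() <> "/reports/#{user.id}/monthly"

    pdf_binary =
      FLAME.call(MyApp.PdfRunner, fn ->
        # Hypothetical helper: points headless Chrome at the page (e.g. via a
        # library like ChromicPDF) and returns the printed PDF as a binary.
        MyApp.Pdf.render(url)
      end)

    MyApp.Mailer.deliver_report(user, pdf_binary)
  end
end
```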
And then ML, I think, is actually a really interesting use case for FLAME, but we kind of saw that cold starts really matter. If you've done any serious work with these big models, folks will intuitively know this, but just getting these things started and running is real work: you're moving tens of gigabytes around just to get the machine running. Just to start, you need to get the model there, and that has to be sent over the network; once it's there, you have to load it, so you're talking tens of seconds. I think this is really interesting, but you have to consider cold starts. Most likely you're not doing scale-to-zero behavior if you're deploying ML in production today, unless you don't care about a thirty-second latency, and maybe your background job doesn't mind that. If you're actually deploying a model for real, you're probably going to run the minimum amount possible and keep one hot, and you can still dynamically, elastically scale as needed using FLAME. And again, you can run different kinds of hardware for the same application: you take your current Docker image, and on Fly I can say I actually want to run this one with an Nvidia A100, so I'm running my app on a GPU as needed in FLAME, while my regular app runs on plain CPUs, whatever I need. This is a super interesting aspect, but again you have to really consider cold starts. You're going to want a minimum of one, probably, and your idle shutdown is probably going to be longer, because you're spending so much time starting these things and getting them onto the GPU that you don't want to throw them away. So you're probably running a pool of these at all times and just handling the bursty workloads as they come in.
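A rough sketch of what such a pool definition might look like in the application's supervision tree; the pool name, sizes, timeouts, and the Fly backend options (especially the GPU settings) are assumptions for illustration, so check your backend's documentation for the exact keys it supports:

```elixir
# In MyApp.Application.start/2 children: a warm, GPU-backed FLAME pool.
# `min: 1` keeps one runner hot to avoid model cold starts, and the long
# `idle_shutdown_after` avoids throwing away runners that were expensive
# to boot and load onto the GPU.
{FLAME.Pool,
 name: MyApp.MLRunner,
 min: 1,
 max: 4,
 max_concurrency: 2,
 idle_shutdown_after: :timer.minutes(30),
 backend: {FLAME.FlyBackend, gpu_kind: "a100-40gb", cpu_kind: "performance"}}
```

A regular `FLAME.call/2` into this pool then runs on the GPU machines, while the rest of the app keeps running on CPU-only machines.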
That's all I have, so definitely check it out; I want to hear what folks are building with FLAME. On backends: like I said, we have a FLAME Kubernetes backend, which Jim published as a package, and there's interest in a GCP backend that doesn't use Kubernetes and just hits the API directly. The Fly backend is literally about 150 lines of code, and the Fly Kubernetes backend is less than 200 lines of code, so FLAME can work on pretty much any infrastructure. If you're interested in having it work on whatever infrastructure you have that isn't Kubernetes or Fly, come talk to me, because the code really is just: you add Req to your project and make a post request. It's an absurdly small amount of code, because the BEAM just does everything. So yeah, that's all I have, thanks a lot. Questions? No questions? Thank you, Chris. Hello, hello, there we go. Okay, we've got two questions here that are probably worth answering, and then we've got a break. Are FLAME calls testable? What's the testing approach? I was supposed to mention this; I thought I did. One of the key things with Lambda I mentioned was testing, but I didn't directly address it: how do you test this? This is another case of solving a problem versus removing the problem: you just test it. The FLAME backend for dev and test is, by default, the FLAME local backend, and guess what the local backend does: it just runs your function. We have a runtime locally, we have a compute resource already running our app, so we just run your code. So the story is, the code you wrote naively and were already testing, when you wrap it in a FLAME call, still just runs on your laptop, and you just run your tests. Testing is: write your tests like you normally would, and you don't have to worry about those pesky servers. So yeah, that's the answer. The next question is, how do you recover from a crash in the FLAME process? You don't, in a sense; you recover the same way you would if that thing were in a task and you asked how to recover from a crash in that task: are you monitoring the task, are you going to restart it? It's a long answer, but you handle it the same way you would handle a crash in any other process you started. If you don't want that thing to crash, you could trap exits, which is very rarely what you want to do, but it makes no difference here: where the code is running is really the only difference. You could even remotely monitor whatever is running inside the FLAME, and if that thing went down, it's the same strategy you'd use to handle the failure wherever you were running it. So that's the answer. Okay, all right, thank you very much, let's give him a big hand. [Applause] [Music]
Info
Channel: Code Sync
Views: 4,961
Keywords: elixir language, functional programming, elixirconf, Chris McCord
Id: GICJ42OyBGg
Length: 64min 3sec (3843 seconds)
Published: Tue Apr 30 2024