Stateful Serverless Applications - Guillermo Rauch @ PrismaDay 2019

Captions
[Music] So I have a lot to talk about today, because this serverless thing is really, really big, and it's also really, really new, and a lot of people mean different things when they talk about serverless. So we're going to have a great time today. The talk is divided into three parts, and I'm introducing an innovation: I'm color-coding this presentation, each part gets a color, so there's yellow, pink, and green. Part one is going to be very much to the point: what the heck is serverless, again? And I say "again" because there have been several attempts to define it, and I feel like the industry as a whole doesn't quite agree on what it means when it talks about serverless. Part two is about how state actually works in serverless, because the title of the presentation is "Stateful Serverless", and you can already anticipate that someone's going to say, "No, but serverless is stateless." You're right, and you're also not right, so we're going to talk about that later. Part three is how to make this whole thing work: how do you use the state, how do you leverage the statelessness, but how do you also solve for statefulness, because a lot of applications need state and they need to be stateful. So let's start with part one: what is this serverless thing, again? I'll refer to this universal wisdom that opinions are like butts: everyone has one. Probably someone said this in the past; I'll attribute it to Socrates. And I'm going to take that to the next level, the serverless level, which is that definitions of serverless are like butts: every platform has one. And by the way, I operate a platform, so I have a lot to say about this. So what's our definition of serverless? Our definition of serverless is pretty uncontroversial. Way back in the day, a lot of you in this room used to deploy applications by just writing a bunch of HTML files, CSS files, and some dynamic magic files, like PHP files or Perl files. Yes, you were doing that.
And with that simple heuristic, we can imagine what a minimum viable serverless app would look like. In my opinion, the most minimal serverless app would have a static file and a special serverless function, or lambda, file. The static file is pretty uncontroversial; we all know them. The magic file is one that, when a request comes in, instead of being served directly from the cache at the edge of a CDN, for example, gets executed on demand; it's a function. For those of you who are familiar with the JAMstack, you can think of this as the "AMstack", because there's no need for JavaScript to be inside the HTML file, but we can also call it JAMstack as well. Functions can be written in any programming language in this serverless ecosystem; the only constraint is that they have to run to completion. When the request comes in, the code runs, it needs to do stuff, and then it has to complete. It doesn't keep running later on; that's a critical constraint. And typically a serverless function looks like this: just the most fundamental primitive that you could find in larger and larger frameworks. If you've been writing Express or things like that, you always end up with something that is just a request handler, and that's what a serverless function is. This file system that I spoke about, in our view of the world, needs to be globally distributed to take full advantage of the model. And this is something that we didn't get to do back in the day with PHP: we can put all those static files and all those lambda files all over the world. There are some caveats to "all over the world" for functions, but we'll talk about them later. Another really nice thing that comes as a consequence of this model (it's not part of the definition, but a consequence) is that there is no restart button, there is no server supervision, there's not a lot of the monitoring that you would do; there are no systems
to upgrade, no security patches to issue to the layers of infrastructure, and so on. Contrary to popular belief, it's not about lock-in. A lot of people say, "Oh, this sounds great, but it's all about lock-in, I don't want it." But in my mind it's more about wanting a simpler interface to the world: I want to write only the business logic of my code, so of course all the rest is hidden from me. I don't see that particularly as lock-in; I see it more as, well, it's hidden from me because I wanted it to be hidden from me. It's an abstraction. One critical distinction with the metaphor I used earlier is that when you would deploy your PHP files, your server knew everything about PHP; it was literally a special case. For all the older people like me in the audience: you would set up something like mod_php or mod_perl, which would say, "when a .php request comes in, do this," and you would never upgrade PHP, and it would be full of security bugs and stuff. Yeah, that was back in the day. But lambdas, or serverless functions, are really sweet in that you can plug in any runtime. Going back to our definition, they're sort of magic zip files: you can think of serverless functions as magic zip files that contain your code and a reference to a runtime. That could be PHP, but it could be Node, or anything in the future. As I was making this presentation, I realized: OK, I'm talking about lack of lock-in, which I strongly believe in, and I'm also talking about this idea of runtimes, but many of you know that when you deploy a serverless function, you tend to hard-code some of the known runtimes of the cloud provider. So I wanted to introduce a slight tangent, the Rauchy conjecture, which goes back to my original model of the world of static files and special magic files: once WebAssembly gets more sophisticated, we could potentially simplify this model to, well, just a bunch of static files and magic Wasm files. These magic Wasm files would be universally deployable. They would run on
your local machine, on a cloud machine, on every machine. As WebAssembly continues to proliferate, you basically have a build target that is universal both for development and production, and covers the vast majority of runtimes. So node_modules and a lot of the things we deal with when packaging our functions will cease to be a problem. This is part of the Rauchy conjecture; it's like a dream of ours. And because that problem ceases to exist, the universe will go into a big crunch and all history will stop, because when we solve the node_modules problem, we've solved all the problems in the industry. That's great. So, in our serverless model, a request or event triggers an invocation, and the invocation is completely isolated, and this is where the idea of statelessness begins to proliferate. The developer doesn't have to think about multi-threading; you don't have to think about scheduling; you don't have to think about what CPU or memory threshold your process has to hit before you scale it into another copy. You don't worry about any of that, because one invocation happens and you get your own dedicated underlying infrastructure just for that thread, and then another invocation happens and you get another one, and another one, and another one. There's a lot that you will cease to worry about: event loops, potentially multi-threading; whether you need async and await keywords or not will become more questionable; even the memory models. Why would you have a generational garbage collector if you're booting up and down all the time? What about arena garbage collection, where you have the assumption that, because you're living throughout a request cycle, you can clean memory a lot more efficiently? So, having said that, and having described this model where an invocation comes in and some state is actually provisioned, let's go deeper into how state actually works in this model.
So, the hidden truth about containers, and this might surprise some of you, is that there are containers; there's something underneath. Your function doesn't just magically execute; there's some piece of infrastructure underneath that provides the necessary isolation, resource allocation, and so on and so forth. You can model this as a container: even though, for example, in the AWS Lambda world it's not a Docker container, the underlying technology is pretty similar. So you can imagine that every time one of your functions gets invoked, it goes inside a dedicated, lofty, comfortable container just for that invocation. There are servers and there are containers, but they are abstracted from us; that's the important thing to keep in mind. I wanted to visualize this: a request comes in, and a function instance gets provided for you. When your request comes in, something is being allocated under the hood, and then, as we talked about with this run-to-completion idea, when the response goes out, what's very interesting is that this function doesn't just get evicted from the universe. The function goes into what I'm going to call today "function purgatory". The function becomes cool, in a way: it doesn't run anymore, but some resources associated with that function continue to live on. No code is running, but a snapshot of that function gets frozen and put into this purgatory. If another request comes in, what we most likely end up with is that the Lambda scheduler, which is very good, and I'll talk more about it a little bit later, will retrieve that frozen snapshot of your function instance, and I'll talk about what comes back with it, and it will reuse it. And this is where the cold-versus-hot problem stems from. This is what's considered a warm invocation, because, for example, V8 booted up, a lot of your JavaScript code that had to do something at runtime ran, it did some memory allocation and stuff, and then it got frozen, so the next time a request comes in,
all that work doesn't have to be done again. A lot of people tend to think that you can get away with pre-warming functions; this is usually not the case, because what happens is you can get a spike of requests, and perhaps the warm one is still busy with your first request. So as your requests proliferate, all these functions have to be met with more provisioning, more underlying containers, and so on, so they all get created. And when all of those respond, boom, they all go to purgatory, so you're going to have roughly that same amount of concurrency in the hot pool for requests that come back later. Then, eventually, these are going to be really evicted. You can think of the eviction process as something gradual; you can almost think of it as garbage collection. And the beauty of this is that you're actually not paying while the function was frozen, and it has a tremendous scalability benefit, because, since it ran to completion and got sent to purgatory, the memory and CPU space can be shared with any of your other functions, or even any other customer. So there are some very important benefits to this model, and I'll show an example of this later: if no traffic comes in, you're not paying for anything. But what's interesting, as we start to discuss statefulness and state, is that when the functions coming back are specific ones that were in the purgatory, their state is getting restored. There are a few things that happen when this state gets restored that don't get discussed enough, I think. For example, if you had setInterval or setTimeout pending by the time your function froze, when it comes back from purgatory you might get some unexpected invocations from scheduled work that, of course, could not run on schedule, because Node cannot do work while the process is frozen. Process trees can actually freeze too: when Lambda or other function schedulers put your function into
purgatory, they freeze the entire process tree, or a process group. So, for example, if you were running ffmpeg, or you were trying to do some batch stuff, everything gets frozen, and the fun fact is that everything gets restored, and you can take advantage of this with very interesting use cases. Sockets might need to reconnect: as we start discussing databases and memory-sharing techniques, something to always be mindful about is, what is the lifecycle of the socket? What is the lifecycle of the connection to the database? You kind of have to think of your socket as being primarily useful during the lifecycle of the function, and not something to be relied upon. For example, when you think of a socket pool in the stateful world, you can rely on that socket pool being there, and you can rely on some monitoring process to ensure the health of all the sockets, almost a control loop that is keeping count of how many sockets you have in your socket pool and making sure they're all connected and healthy. Maybe you're saying, basically, "I hate socket pools and stuff." Yeah, I understand. The interesting thing for us, as we discuss statefulness, is that memory is intact, which has the benefit of giving us a powerful memoization optimization. Of course, memory being there is also what makes the subsequent hot invocations so fast, because your JavaScript code is not running again and allocating a bunch of stuff in memory. And the /tmp directory is also there. The throughput, in terms of IOPS, capacity, and performance, of /tmp and memory are different, but they both have very valid use cases. The bottom line is that you should take advantage of caching; although, as we'll see later, you can't fully rely on it. With this, I think we can put down this idea that lambdas are stateless. No, lambdas, or our serverless functions, are actually quite stateful; it's just that the model in itself doesn't let you
fall for typical stateful patterns. When you try to write to the file system in a traditional container, every write is welcome; however, in the serverless function world, you'll be told, "Oh, you can only write to /tmp." And actually, that's a good thing, because then you don't have any surprises about the lifecycle of your function and so on. So they're stateful in some implementation details, but the mental model is that they're still stateless, and they get recycled very often. To be more precise, the state is ephemeral, and when the functions come back and new requests get assigned to them, you don't have any control over that; therefore this recovery of state could be a little bit surprising to a lot of people. I wanted to demonstrate this with an experiment. A long time ago, and some of you might know me from a technology called Socket.IO, there was this experiment launched on Twitch called Twitch Plays Pokémon: one single stream of a Pokémon emulator controlled by a lot of people concurrently. What I did is I took that idea and built it entirely based on Node, JavaScript, and the web browser, and I called it WePlay. And I relied a lot on servers; I used the hell out of servers. It was so serverful: it used WebSockets and servers and lots of in-memory state. I called it WePlay, and it was very successful. Then I launched another one that was a shared Windows XP computer, and that one got really weird, and it made it to vice.com; you can look up the story, because people did a lot of weird stuff on it. The reason I like Pokémon is that you can't really do a lot of weird stuff with it. When I left servers behind, as I started dedicating most of my focus to serverless, I wanted to rebuild that experience. I wanted to rebuild that demo, which is really cool, but the primitives at the time... the primitives were there, but we couldn't just, you know, take that exact code and be
like, "Oh, just redeploy it on serverless," boom. No, it's more complicated than that. So what I did, as I was explaining how this state stuff worked to an incoming junior developer at our company, is I created a little demo of it, which was just running a JavaScript emulator of that game inside a lambda function. All this function did was: it would receive a request, it would boot up the emulator, it would execute n steps of the CPU of the emulator, and then it would render to an HTML5 headless canvas and stream the result to the user as part of the response; it would just spit the PNG frame right back out. So the model was very simple: to advance the emulation, because each invocation advances the emulator, just hit refresh. And it actually worked; it's pretty funny. You have to press refresh a lot, and you have to make it through the initial hurdle of the credits screen, but yeah, it actually is pretty fast, surprisingly fast. And that's my hand pressing that button. Cool, so this is Pokémon Yellow running inside one lambda. And this is where it gets really interesting. The next thought was, of course: instead of just manually hitting the button, let's write some JavaScript magic to refresh this lambda in an infinite loop. And it actually worked really well. This is not accelerated in any form or fashion; that's the game doing some explosions and stuff, and this is JS Bin, because I was explaining this demo to one of our employees, and I was like, what if we just, you know, do this weird stuff? And it worked great. And this is where I was talking about the idea of the state of recovered lambdas. We were like, how is this working? Where is that state? How is it still advancing, instead of refreshing and restarting everything from scratch? It's weird. The reason that worked is because I put the emulator into the global state, and this is quite surprising, but this is all the code that was necessary for this demo. It's
not fake news; this is it. Notice that I put the emulator outside of the function. What's happening is, and I mentioned the AWS Lambda scheduler being really good, I was doing that infinite loop of requests, and they were all reusing my function from the purgatory, and my function from the purgatory had all the state of the emulator, which had been initialized only once, and that's why it's so fast. I don't even need WebSockets; those are so last decade, I just need a refresh loop. Just kidding, don't say that, I didn't say that, don't quote me. But it works really well. But again, it only kind of worked, because as my coworker started opening another tab of JS Bin, we started seeing different stuff coming in. The emulator would sometimes get reinitialized and whatnot. It reminded me of a scene in the beginning of the game where you have to pick your starter Pokémon. The mental model is that function one has Bulbasaur, function two has Charmander, and function three has Squirtle. This is kind of the world we ended up with, because all these functions were being generated, and they were all picking up from different states. We had a multiverse, and we didn't want that, because it's no fun; the fun of this demo was that we can all collaborate in one really messed-up game. You can actually still go to this demo, and I'm not going to bore you with the implications of refreshing, but you can clearly see that we could have spent a lot of time here, and I could play the entire game by just hitting the refresh button. Yeah, but we're not going to do that. So, as I was going through the proposal for this talk and this idea of statefulness, I realized: OK, if we really want stateful, we can do stateful. I had written this demo 95 days ago, and then I said, OK, I'm going to really make this stateful, instead of faking statefulness. So I spent a few hours on it: how can we share that state? How can we take
it out of the lambda and have every invocation share it? So, I'll give you stateful. How do we make this work? It's actually very complicated, and I don't recommend that you do it at home, but the premise is very useful and universal, and you've probably done this already if you used, for example, PHP way back in the day, because you needed memcached, for example, as a layer. The traditional architecture that Facebook got started with was LAMP: PHP and MySQL. Oh, MySQL is not fast enough for sessions and stuff like that? Let's add memcached. So we have PHP, memcached, MySQL; this is kind of the world we're going into. So what do we use when we need shared memory? Typically... Microsoft SQL Server? No: Redis. What I did is I created a Redis instance that would hold all the shared state between all these function invocations. When a request comes in, what I changed is that I use an ETag to determine: is this user at the latest version of the game? If they are at the latest version, they are the candidate to be the chosen one. What I mean by "chosen one" is that they can be the invocation that actually advances the emulation, which I call here a write. What we need now is a little bit of coordination. We have a lot of containers, and it brings me sort of nightmares of Kubernetes and etcd, but we need it in this case: we need a distributed mutex, so that, ideally, only one invocation does the write. If the lock cannot be acquired, then that person just gets a read of the emulation. The interesting thing about this is that the model remains very simple: if the user pressed a key, I just attach it to the incoming request, the one that I was showing earlier that I was refreshing the game with. And, this is where it gets expensive, the entire state of the emulator is serialized and saved to Redis. Boom, simple. But, as I mentioned earlier, we can still take advantage of this idea of affinity, because for a lot
of people, they get lucky, and there might be some interesting stuff that you can reuse as your invocations come in. So I ended up with a new version of this that I decided to boot up specifically so that it's close to us and gives us low latency. I'll share quickly what I did. At the beginning of the game, they ask you for the name of the main character; I said Prisma. And then they ask you the name of your opponent, and it was Oracle, and that was really funny. And I'll show really quickly that, indeed, we were able to accomplish our mission of glorious shared statefulness. I made it work on my phone, and as I play on my phone, I'm able to play with the version that I'm seeing on my screen as well. [unintelligible] So anyone can also join and try to contribute a key, and it's going to be hard, because everyone's going to compete for it, but it gets really, really funny. So much so that I pushed a US version as well, and people are playing the game, because I left it up, and I don't know who took my main character this far. So, I'm going to talk briefly about what went into sharing state with lambdas, and some conclusions. The first conclusion is: shared emulation is not the intended use case. The idea is very compelling, but obviously servers have a slight advantage here. Connecting and disconnecting to Redis within the lifecycle of the invocation actually works remarkably well. I stress-tested the game by issuing a lot of requests, trying to hit the thousand-concurrency limit, and I was seeing operations per second on my Redis instance grow, but connections remained fairly stable, because they were connecting and disconnecting very quickly; I was seeing connections being established and destroyed within a fraction of a millisecond. So that works really well. Deploying functions close to your database is very, very key: when I was testing this demo from here to the US, the experience wasn't
that great. So what I did is I just added a region identifier to my configuration, and the function moved. Which, by the way, was a great reminder, as I was working on this, that this idea that you can just make a tiny adjustment and your function goes to where your database is, is quite promising. Leveraging the lambda state continues to be a great strategy for speeding up execution of subsequent requests. And specifically, something that's worth noting is that in the serverless world, function memory size and CPU allocation are proportional: if you demand a lot of CPU, you get three gigabytes of memory, for example. So the recommendation here would be to try to use as much of that memory as possible, because you will be paying for it; it will be allocated regardless. Something very interesting as well is that lambdas don't have a flat time to first byte: as we talked about earlier, they could come from that function purgatory, or they could be brand new allocations, and it's kind of not under your control what the stable latency of your function is going to be. Now, you can do a lot of things to optimize the boot time of your functions, but at the end of the day, the user ideally should always go to a consistently fast source, namely a cache; a cache that could be produced by another function, or just static files. The idea here being: your front end will continue to be powered by the technologies that you're already using, but you will use more of these functions as sort of a sidecar. And another successful strategy: when you compute something that could be very expensive, like our emulation, caching it at the edge will give all your subsequent users and visitors an incredibly fast experience. All right, thank you so much. [Applause] [Music]
Info
Channel: Prisma
Views: 3,274
Rating: 5 out of 5
Keywords: #PrismaDay, #Prisma, #Berlin, #databases, #applicationdevelopment, #AppDev, #conference
Id: lUyln5m6AhY
Length: 29min 24sec (1764 seconds)
Published: Tue Jul 16 2019