Prisma in Production: How to Load Test Your API with k6

Captions
- (gentle upbeat music) Welcome to another live stream hosted by Prisma. I'm your host, Daniel, and today I have a very, very special guest. We have Simon, also known as Simme, who's joining us from Sweden. Simme is a Developer Advocate at k6, and we'll discover more about k6 very soon, but the short version is that k6 is an open source load testing tool. And so today we're going to talk about load testing as part of this new series where we're talking about running Prisma in production. I'll give a little bit of background about Simme. So first of all, Simme, hey, it's nice to have you here.
- Hey, good to be here. Thanks for having me.
- So Simme is a Developer Advocate, public speaker and meetup organizer from Sweden. He's been working in tech for the last 10 years or so in many different roles, ranging from full stack developer to systems architect, Scrum Master and ops engineer. And in the last couple of years he's put a lot of his time into DevOps practices, cloud deployment automation and creating highly efficient teams. And today we're going to discuss load testing as part of your production strategy, so to say. So without any further delay, Simme, welcome.
- Thanks.
- So today, before we get into the topic of load testing and reliability in production, I'd love to hear more about your concrete background and what you do at k6.
- Yeah, sure. As you said, I've been working in tech for a little over 10 years in different roles. Most of the time I've been a consultant, either an independent consultant or through a consulting agency, working with customers ranging from small startups all the way to enterprise-level customers, like Access or CPA Global, and helping them work with software all through their stack, basically. I have a lot of experience in JavaScript as well as Go and C#. But I also like to tinker with teams, just as you said, and especially to work with DevOps practices or agile practices, trying to create a workspace where everyone feels that they can contribute to something meaningful and create good work.
- I see, I see. And it's interesting that you mentioned you've worked with smaller clients and bigger clients. I'd love to hear a bit about your thoughts when it comes to reliability and some of the engineering practices: how do they differ when you're working at a small company versus a more enterprise company? What are the different concerns that are at play here?
- Well, one of the biggest concerns for sure, as you can imagine, is the economic constraints, right? If you work in an enterprise organization you usually have a lot more budget than you do in a startup environment, where you have to make do with whatever you have and try to be as cost efficient as humanly possible. And that brings a whole other set of challenges, regarding whether to self-host things or whether you should go for the cloud, that you don't really have to concern yourself with as much in an enterprise environment, where you probably have an architect or a team of architects who have decided on some kind of reference architecture for whatever you want to build, and you'll have to fit your solution into that, so to say. So that's certainly different.
And personally, I can see appealing things about both setups, but startups are really, really interesting for sure, due to the high pace and the kind of flat structure; what do you call it, the power distance is very small compared to in a larger organization.
- Right. And so when we speak about reliability in production, there are a lot of different terms that are often thrown around. There's DevOps, and there's Site Reliability Engineering, which is a whole branch of software engineering really focused on this, that I think emerged inside Google and then expanded to the broader public once they published their SRE book. So there are a lot of different terms that fall under this. How do you think, in general, when it comes to deploying to production and thinking about all of the things that concern an app, or even just an API, when deploying and serving that to real users?
- Yeah, I'm personally not that interested in roles and departments and that kind of thing, but rather try to think of it as a team delivering something together, right? And then you need different competencies and different people on that team to make that possible. And Site Reliability Engineering is a great term, 'cause it encompasses everything that you basically need to do to get a performant and reliable service. And, yeah, I'm sorry, could you repeat the question again? 'Cause I might have misinterpreted you.
- No, no, I think that was a great introduction. Perhaps we can talk about some of the practices? I remember reading in the SRE book about these three terms, SLAs, SLOs and SLIs, which I think stand for Service Level Agreement, Service Level Objective and Service Level Indicator. These are three measures that you come up with as part of your strategy: they start from what you want to deliver to your users and they span all the way to how you measure that in the application. What is your approach to thinking about that when it comes to the concrete practices?
- Yeah, I mean, SLAs are usually kind of hard, right? 'Cause they're usually tied to some kind of agreement with a customer where you, for instance, set financial terms for what would happen if your service wasn't available as promised in your agreement. So that's probably the most critical one of the three. Service Level Objectives more describe what goals you have internally for your quality of service, or your reliability or uptime, for instance. That's something that you'd like to aim for as a team, or as a delivery unit, or whatever you wanna call it. So you say, okay, these are our objectives for our service level. They usually tie to the service level agreement, which in turn ties to an overarching agreement that has financial terms and things like that. And the Service Level Indicators serve as metrics that tell you whether you've met your service level objectives or not. So it's kind of like refining it all the way from the agreements down to the smallest piece of it, which is the indicators.
- And so what are some interesting indicators, assuming I'm working, say, on a new startup where we're building a GraphQL API or even a REST API? What are some of the interesting indicators that come to mind when we think about this?
- Uptime is definitely one of them.
Probably the easiest one to measure as well. But things like response times, or error rates when you try to interact with, for instance, an API, or the time to the first meaningful paint, or to your first interaction with a server, all of those are great indicators that you can use to piece together your service level objectives for sure.
- Let's get a bit into k6. So what is k6, and how might it relate to all of these service level indicators, or metrics in simpler terms?
- You could say that k6 is primarily a load testing tool or a performance testing tool, but it's also like a Swiss army knife for doing other things related to reliability and performance as well. For instance, chaos engineering is one of the topics that is perfectly possible with k6, or with k6 and other complementary tools, but...
- Can we pause there? What is chaos engineering, for the viewers who might not be familiar with the term?
- Yeah, sure. Chaos engineering is when you apply a scientific method to your reliability testing and try to introduce turbulence in your systems to measure, or rather observe, how they react to this turbulence. So basically you try to provoke your service into failing, and make sure that it doesn't, or build away that failure.
- Got it.
- And for k6 as a tool, I guess the metric that we're most concerned with, or rather the primary metric that you could use as a service level indicator, would be the response time. Given that you have, say, 10 users on your page, how long does it take until they get a meaningful response from the server? What happens if you scale it up to 10,000 users? Are the numbers still the same, or does it take significantly longer to get something meaningful from the server?
- Got it. Yeah, and I think we're gonna jump into a demo at some point during this live stream, and I guess we'll actually see how all of this works. So if you're tuning in and you're like, "Okay, I like all the theory, but show me the actual code and how it works", we're gonna get to that very soon. And you mentioned cloud native approaches to deployment. With the rise of all of these public cloud services, AWS, Google Cloud Platform, Azure and many, many more, how does that influence architectures today? And what role do architectures have? Do you have any thoughts on that?
- Yeah, for sure. I mean, one of the biggest challenges with writing good applications today is that they are quite complex, and you want to move as fast as possible, 'cause you want to be able to get value from whatever investment, either in time or in money, it doesn't really matter, but you want to get a return on that investment as soon as possible, or rather provide value to the customers. So the classical way of deploying a monolith and having a release train every six months providing a new version of your software, that doesn't really cut it today, right? You need to move a lot faster than that. Many companies that I've worked for have been aiming for multiple deployments a day. So in that kind of reality things start to get really complex, and it starts to get complex to measure as well whether your service is performing well or badly and whether it's still on par with your requirements, for instance SLAs and SLOs.
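For readers who want to see what encoding such indicators looks like in practice, here is a minimal k6 sketch, not taken from the stream itself, that expresses two hypothetical SLIs as thresholds; the endpoint URL and the numeric targets are placeholder assumptions.

```javascript
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  vus: 10,
  duration: '30s',
  thresholds: {
    // SLI: 95% of requests should complete in under 500 ms (placeholder target).
    http_req_duration: ['p(95)<500'],
    // SLI: fewer than 1% of requests may fail (placeholder target).
    http_req_failed: ['rate<0.01'],
  },
};

export default function () {
  http.get('https://example.com/status'); // placeholder URL
  sleep(1);
}
```

If any threshold is violated, k6 exits with a non-zero code, which makes objectives like these enforceable in CI.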
So, given that you want to move this fast, you don't really have time to, and you can't really afford to, invest a lot of time into running your own hardware or your own data center, 'cause that's usually not part of your business-critical path, right? Or your business-critical flows or revenue flows. So you kind of want to think about that as little as possible, and then using these cloud native providers like AWS, Azure or GCP makes total sense. And it does affect your architectural decisions, 'cause in that environment you don't wanna spin up a monolith as you did, say, five or ten years ago. You want your architecture to be very modular and you want to enable teams to move independently of each other. So you get a lot of small pieces scattered throughout your system architecture rather than this big monolith that you can easily reason about.
- And that's where the term microservices comes in, right? That's basically the breaking up of a monolith into multiple components that can move independently of each other, but also scale independently of each other.
- Yeah, exactly.
- Or how would you think about it?
- And you can even take it one step further with offerings like AWS Lambda or Azure Functions or Heroku providing just a runtime for a single function. So you run it as functions, or serverless as it's called, rather than running it on an actual server or an actual microservice in a container environment or anything like that. So for sure.
- Yeah. I recently ran a survey where I was trying to gauge where the Prisma community is at in terms of how they approach deployment, and obviously in the Node.js community Lambda, and serverless in general, is very, very popular, because it provides, I think, a very powerful (indistinct) that most developers find easy to pick up and easy to reason about. And it is indeed the most popular approach: I think from the survey, and perhaps I can find the link and share that in the comments, more than 50% of the users who participated are using serverless. Now, when working with a database this can be quite challenging. We made a video, I think last week, about how to set up PgBouncer if you're working with PostgreSQL. PgBouncer, in case you or one of the viewers isn't familiar, is a connection pooler for Postgres, which makes it easier when you're working with serverless functions that connect to the database. And this is because database connection churn is quite expensive: opening up and closing a connection to a Postgres database actually has a lot of overhead. And your serverless functions are constantly spinning up, and they scale elastically. So if, say, a thousand requests come in, you can actually scale those Lambda functions to handle those thousand requests; however, the bottleneck now becomes your database. And so actually this was the big motivation for me to have you join me today, because in that video I actually did some load testing for serverless. Now, serverless also has this cold start problem, and there are a lot of nuances that really need to be addressed. And I think until you run the load tests, you don't really know how well your application will perform. Obviously this depends on other traffic that you might be serving while you're running these load tests. So there are a lot of different details. And perhaps this is a good moment to introduce k6.
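To make the connection-churn point concrete, here is a sketch of the usual mitigation when pairing Prisma with serverless functions, assuming an AWS Lambda style handler and a DATABASE_URL that points at PgBouncer; the model name and URL shape are illustrative, not taken from the video.

```javascript
// Assumed DATABASE_URL shape (PgBouncer in front of Postgres):
//   postgresql://user:password@pgbouncer-host:6432/blog?pgbouncer=true&connection_limit=1
import { PrismaClient } from '@prisma/client';

// Instantiated outside the handler so warm invocations reuse the same
// connection instead of paying the connect/disconnect overhead every time.
const prisma = new PrismaClient();

export async function handler() {
  // Hypothetical query against an illustrative `post` model.
  const posts = await prisma.post.findMany({ take: 10 });
  return { statusCode: 200, body: JSON.stringify(posts) };
}
```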
So k6 is an open source load testing tool. You mentioned that it's used in a bunch of different environments; perhaps you can mention how it is different from other load testing tools out there?
- Yeah, one of the main differences with k6 compared to other load testing tools, or most load testing tools I'd say, is that it's really focused on the developer experience and developer ergonomics. So instead of having, for instance, a UI where you point and click and build up these diagrams or these hierarchical structures of tests, you actually write it in JavaScript. So if you, for instance, use Node.js or something similar, then you can actually integrate that code, somewhat at least, into k6 as well. For instance, sharing models or things like that would be perfectly possible, which makes it really convenient to work with in your everyday workflow, right? 'Cause you write code every day for a living as a developer, and you don't really want to leave that environment and go into these GUIs which you probably aren't that familiar with. You want to keep using the same tools as you usually do. Say, for instance, that you have the perfect Vim setup, then you want to continue using it for writing your load tests as well. And k6 really tries to cater to that. In the company, or in the project, we're all engineers, so we try to go back to ourselves and what we would appreciate as developers in a tool. And that's basically what makes k6 so nice, or at least I think it's so nice, as a developer tool.
- Yeah, yeah, I can attest to that. Having worked with Apache ab, I think, and also with Vegeta, I really had a nice experience starting out with k6, and I think we share a similar philosophy. At Prisma we really have a strong focus on: what will this feature be like for our users? Does it make sense? We're trying to really create a nice developer experience and to eliminate as much toil and unnecessary work as possible. So I wanna pause for a moment before we jump into k6 with the demo, and tell all the listeners to feel free to use the chat. We're seeing all of the comments coming in. So thank you for joining from Indonesia and from Chicago in the US, and I think from Cameroon too, if I recognize that flag correctly. Thank you all for joining. And I think now is a good moment for us to start the demo. So I'll pull up my screen here and I'll adjust the labels. Here I have a little REST API that is deployed to DigitalOcean, using the DigitalOcean App Platform. This is the console that you're seeing here, and I have this actively deployed. In fact, if I open it, I have just a status endpoint. I'll make it a bit bigger so that the viewers can see. Let's look a bit at some of the code that we have here. So this is all written in JavaScript, and it is using a framework called Fastify. Here I've defined a bunch of different endpoints, and what we saw just a second ago was this status endpoint. This REST API is backed by a Postgres database, and the model for it was created with Prisma. So what you can see here are the different Prisma calls: to create a user, there's an endpoint here to create a single post, to leave comments on a single post, to like different posts, to delete posts and to get posts. And we also have a feed endpoint which returns the 10 most recent posts and their author.
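As a rough idea of what the demoed API might look like, here is a minimal Fastify plus Prisma sketch of the status and feed endpoints; the model and field names (post, author, createdAt) are assumptions for illustration rather than the actual repository code.

```javascript
import Fastify from 'fastify';
import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();
const app = Fastify();

// Health check with no database round trip.
app.get('/status', async () => ({ up: true }));

// Feed: the 10 most recent posts together with their authors.
app.get('/feed', async () => {
  return prisma.post.findMany({
    take: 10,
    orderBy: { createdAt: 'desc' },
    include: { author: true },
  });
});

app.listen({ port: 3000 });
```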
And just to demonstrate this, I can open up the feed endpoint, and what we see here are indeed different posts and their related authors. If I go back to the root of the repository, I'm just gonna briefly show the database schema, and this is it. So we have a user table, a post table and a comment table, and they have one-to-many relationships: a single user can have many posts, and a single post can have many comments. And there's also a many-to-many here, because different users can like different posts. So that's roughly the schema. This is of course an example, but some of the patterns here apply to many different situations; these are all just relational primitives. And I see there are some comments coming in, so thank you all for joining. Nicole has joined, Mohammed has joined from Paris, and David B from France. Thank you all for joining. So I have this API deployed to DigitalOcean, and I'll share some more of the interesting details that we have here. This is running on a very basic instance: it's one gigabyte and one virtual CPU, about 10 bucks a month. And we also have a database, and the database is the minimum production database that DigitalOcean offers, which can accept up to 22 connections. And so what I've done is I've gone ahead and already set it up in a way that Prisma will utilize those 22 connections. And I think this is a good moment to look at the load test script. So I have here a k6 load test script. This is probably a good moment to mention that with k6 you can define the load test script in JavaScript. You can also use TypeScript, I believe.
- Yeah, you can use TypeScript, but you'd have to transpile it into JavaScript before running. But it's definitely perfectly possible to use TypeScript as well.
- Right, yeah. Okay, so this was one of the very first load testing scripts that I've created, and there might be some things that are unconventional here. But at the very beginning here, what I'm doing is creating these trends. Perhaps you can elaborate: what are trends and when are they useful in a load test?
- Well, k6 has a couple of different metric types, right? And trends are one of them. When it comes to trends specifically, they allow you to calculate the minimum, maximum, average and percentiles of whatever values you provide to that metric. So for instance, in this case, you might want to know, on average, how long it took to create users at a specific point, or something like that. And other than that, you could also have used, as you also do I think, Rate metrics, or even Gauges or Counters, to do similar things, but tailored to other use cases, right?
- So here I'm basically using the trends in order to capture that, and we'll take a look at this here. Basically this script exports a default function, and this default function is essentially the load test. I'm doing a bunch of preparation here and configuring some options. These are two quite important options: vus stands for virtual users, and this is essentially the number of virtual users that will be making requests. And the second one is duration. So if this load test runs for 10 seconds, during these 10 seconds 20 virtual users will make the requests that we've defined in this main default function.
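For orientation, this is roughly what the top of such a k6 script looks like: custom Trend metrics plus the vus and duration options discussed here. The metric names are illustrative, not copied from the stream.

```javascript
import { Trend } from 'k6/metrics';

// One Trend per endpoint so their durations can be compared in the summary.
export const feedDuration = new Trend('get_feed_duration');
export const statusDuration = new Trend('get_status_no_db_duration');

export const options = {
  vus: 20,         // 20 virtual users iterating in parallel
  duration: '10s', // for 10 seconds
};

export default function () {
  // The requests themselves go here; see the fuller sketch further below.
}
```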
Here I'm just defining the base URL, loading that from an environment variable, and defining a bunch of constants for the different endpoints. Now we come to the actual test, and the actual requests in this load test happen using this http module. I believe this isn't the standard HTTP library provided by Node, because k6 runs this in a different way.
- Yeah, exactly. This is a native HTTP client written in Go, actually. So I wouldn't dare to say that it's more performant, but it should be at least equally performant to the Node.js one. And it supports all the basic HTTP methods, as well as some k6-specific options, for instance if you want to tag your requests to easily be able to find them later in your bunch of metrics. So yeah, it's k6 native for sure.
- So after I make this request here, I run a check in order to make sure that the status was 200. The nice thing, I guess, about checks is that you can run these different checks throughout your load test, and then in the end it will tell you how many of them passed or not, based on the condition that you pass in here. The condition is that the status is 200. And then there is this status trend, to which we add the duration of the request. This is probably the most interesting metric as a starting point: we said in the beginning that one of the important things to measure, besides whether your requests are successfully served, is how long it took for the requests to be served. And I believe that this duration also includes the network latency between the place from which the load test happens and the server. So if you're running a load test locally against an API that is in a far and remote country... I'm situated in Germany, and assuming I were to run this load test against an API deployed in the San Francisco region, then I would expect these durations to be longer, because it just takes physically longer for those requests to get served.
- Yeah, for sure. They include, or rather they represent, the full end-to-end time for a request, so from initiating on your side to the server and back again, yeah.
- And then I'm doing a bunch more requests to get the feed, create a user, and so on. And there's also a little sleep duration so that we don't overload the API with unrealistic traffic. So what I tried to do here, and I'd love to hear your thoughts about whether this makes sense, is that I've defined a sleep duration to simulate what it would be like for a real user to, say, use this blog that is backed by this API. The idea would be that they might load the page and the feed will be requested, and then the user will create an account, and then they'll create a post. And while it's not a hundred percent realistic, there's no single user flow, right? Users are free to use the website or the API in any way. But this is somewhat trying to simulate the kind of traffic pattern that a normal user would produce.
- Yeah, it looks great. I mean, if you wanted to elaborate further on this, you could for instance think about how long the user will spend on each page of the website. For instance, if you have a list of posts, maybe you will have a longer so-called think time before actually loading a post than you would the second time around, when you load the list of posts.
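Putting the pieces just described together, here is a self-contained sketch of the request, check, trend and sleep pattern; the endpoint path, metric name and BASE_URL environment variable are assumptions for illustration.

```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Trend } from 'k6/metrics';

const BASE_URL = __ENV.BASE_URL; // e.g. k6 run -e BASE_URL=https://my-api.example.com script.js
const statusDuration = new Trend('get_status_no_db_duration', true); // true = report values as time

export const options = { vus: 20, duration: '10s' };

export default function () {
  const res = http.get(`${BASE_URL}/status`);

  // Checks never abort the test; the summary reports how many passed.
  check(res, { 'status is 200': (r) => r.status === 200 });

  // Record the full end-to-end duration of this request in the custom trend.
  statusDuration.add(res.timings.duration);

  // A little think time so the virtual user doesn't hammer the API unrealistically.
  sleep(1);
}
```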
So it definitely makes sense to consider things like think time or pacing and to vary that a bit, to make sure that the user you simulate actually behaves like an actual user. But in most cases, starting out with a static sleep makes total sense, 'cause it's easy and you might not even know yet what the think times or the pacing of an actual user are.
- Okay. So at this point, I also have this repository cloned locally; I'll make this a bit bigger. Okay, so this was the load test. Now I'm gonna open up my terminal. Generally you can install k6 using brew or any kind of package manager; perhaps someone from the k6 team could drop a link to the installation page, but it's really quite easy to get started. And so I'm ready now to run this load test. I'm just defining this environment variable because I don't want to keep it statically inside my load testing script, because there could be multiple environments. And then it's really just k6 run and you pass the script. Now we can go on.
- Yeah, go on. And as you can see here, the test quickly ramped up to 20 virtual users. And you can see how it iterates over the default function as many times as it can during the duration that you specified. In this case, we managed to do 107 complete iterations, which include all the requests that you put into your default function.
- I see. And then in total we had 642 HTTP requests, so that is about 56 requests per second on average. And I think what's particularly interesting is this bunch of rows. These are the different trends, right? So these are the results for the different trends, and we have a bunch of columns for each one of them. Do you wanna work through each one of those columns?
- Yeah, sure. First we have the average, which is, as you might guess, the average duration that each request took. And this one is kind of interesting, 'cause it gives an indication of how our service is performing on average. As you can see, these timings are somewhat higher than the minimum duration; they're more on the higher end of the spectrum. Next up, we have the minimum duration, which is the shortest time it took for a request to complete, followed by the median and the maximum duration. And then the last two are probably the ones I think are the most important: you have the duration for the 90th percentile as well as the 95th percentile. What that means, basically, is that 90% of the users had a better time than that, or 95% of the users had a better time than that. So all of these results are in milliseconds.
- And the percentiles, I think, are a really important note to make, right? Averages can be very misleading, especially if the variation between the different durations is really large: you have some really slow requests and you have a lot of fast requests. The average doesn't really represent very well what your overall performance is like, and this is why the percentiles are useful. Is that a correct way to think about it?
- Yeah, for sure. I mean, say for instance that your API has started to return empty responses for some reason, because some service you have further back in your stack has started to... or has fallen over, and you get empty responses back. Then those would usually be really, really fast, thus lowering the request duration.
While, for instance, if you wait for a timeout, then that would probably take a while, maybe even a minute, before it actually times out, thus skewing your result the other way. And while we probably only want to include metrics for requests that actually returned a 200, so we know they are successful, we're gonna have outliers there as well, for instance. So looking at the average might not serve you that well in terms of knowing what the average user experience will be. Instead, by looking at, for instance, the 95th percentile, we know that the tail behind it will be quite short, right? So we're not gonna have that many requests that are worse than this. So if you wanna guarantee a user experience, or perceived performance, at some specific level, then that's probably a way more interesting metric to look at than either the average or the median.
- I see. And this is specifically percentiles, right?
- Yeah, specifically percentiles.
- Okay, so someone has a tip for us. Let's see what it is.
- Yeah, Mihail Stoykov from the k6 team.
- Oh, okay, cool. So then the output is human readable. Okay, I believe I came across that. So if I open up the load test again: for all of these trends, I can set this second argument to true; oh nice, the autocompletion says it's isTime. So then it will output the results as time. Back to the trends. Thank you, Mihail. I think I was looking through the docs at some point trying to find this. So let's run this again, and I wanna also point out a couple of things in the results. So I run this now again and we're already at 20 virtual users.
- And I can mention while we wait, for any of the watchers or viewers to whom virtual users is a new concept: virtual users are basically parallel runtimes, right? So while they're called virtual users, or concurrent runtimes rather, you can think of them as separate instances of your default function that will loop over your actual test function.
- Yeah, that makes sense, that's a good one. So now we see it in milliseconds, and I think it's particularly interesting to see that this status endpoint, which doesn't make a round trip to the database, is obviously much faster than the rest of the endpoints. Those have to, as a request comes in, actually send an SQL query using Prisma to the database, and that has to return, and obviously that affects the performance quite significantly. Now, I've played a lot with the different configurations here in order to get these numbers. This is a Prisma-specific tip, but I'll open up the page in the Prisma docs; we have deployment documentation, let's see. And it details a lot of the differences that matter if you're deploying to a serverless platform versus a long-running platform as a service. This API is deployed to the DigitalOcean App Platform, which is a long-running model. That means that you have essentially a virtual server running your Node.js application, and that Node.js application is handling multiple requests. One of the things that you can set in the connection string is the connection limit, the number of connections for the connection pool that Prisma uses. And what I've done is I've set this to 22, which is the maximum number that my database can accept. I'll open this up; perhaps I can go into the page for this.
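As a concrete illustration of the connection-limit tip, here is a sketch of how the connection string parameter and Prisma Client might fit together; the URL, datasource name and credentials are placeholders, not the actual configuration used in the demo.

```javascript
// Assumed DATABASE_URL shape with the pool size pinned to what the managed
// database accepts:
//   postgresql://user:password@db-host:5432/blog?connection_limit=22
import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient({
  datasources: {
    // 'db' must match the datasource name in schema.prisma (an assumption here).
    db: { url: process.env.DATABASE_URL },
  },
});

export default prisma;
```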
So if you're using DigitalOcean, one of the great things is that they have a bunch of different options here, and the nice thing is that they have the ability to set up PgBouncer. This is obviously not relevant if you're deploying to the App Platform, but I'm just pointing it out. And indeed, there's a connection limit of 22, and this has already been configured in the environment variable. I'm not gonna show it right now, but there's an environment variable here called DATABASE_URL, and that's where I added this connection limit parameter and set it to 22. This obviously improves the performance, because you're utilizing the maximum number of connections. And as far as I know, you can send basically one query at a time per connection to the database; you can't send multiple queries per connection, otherwise they will just be queued up. So only one query gets into the database connection at a time, and you should obviously exploit that fully.
- Would you say that it still makes sense to run something like PgBouncer, even if you're running Prisma?
- So, it really depends on where you're deploying to. If you have an architecture where you might have multiple instances of your application scaled up automatically for you, then it's probably a good idea to use PgBouncer, because you don't wanna exhaust the 22 connections, 'cause that can lead to user errors. And the moment that you're having user errors, I think it doesn't really matter what the duration of your requests is, because you're actually failing requests. So the first priority should obviously be to avoid failed requests, and then to optimize the performance. So yeah, I think it would be a good idea. In fact, I think it might be interesting to run an experiment and see how this performs if we're using a connection pool in front of the database; let's see if we have time. We're about 20 minutes in and there are a bunch more things that we wanted to cover. Before we dive into PgBouncer, I think it's worth talking about: what do you do with these results? So you've got those results, you look at the 95th percentile and you say, "Okay, that seems maybe reasonable, maybe not." How do you think about the results of the load test and how do you make them actionable?
- Well, it really depends from application to application, or system to system, right? In your case, you've done a really good job with setting up metrics that are actually relevant for being able to benchmark this. For instance, we know that getting the status without a DB takes only 133 milliseconds, while getting the feed, which includes a lot of posts, takes a whole 1.3 seconds. So already there we know that there might be some things we can trim to get the feed loading to be a lot more performant, right? For instance, since it is a feed from something that looks somewhat like a forum, we could for instance add some kind of caching to make sure that we don't actually go to the database every time we poll the feed endpoint, and by doing that reduce the time to maybe even what it takes to do a round trip without a database. So there are multiple things we can do here to improve or optimize this specific use case. But it's hard to say something in general that applies to all users.
The only advice I can give for that, I guess, is that you should establish a baseline. When your system performs as you want it to, and you have conditions that you think are reasonable, take a baseline, as in: check what metrics you get then, and use that as a comparison for all your future performance tests. That way you'll be able to see whether you improve or whether things get worse over time. So comparing to yourself is probably the best way of making sure that you can take action on whatever you are seeing in your results.
- So essentially the idea is: first, you probably wanna configure the database connection details, all of those parameters, to maximize whatever resources you have available. And then, based on that, run some load tests and come up with some kind of baseline that you're happy with. I mean, assuming that you've already deployed and you are serving traffic and it seems fine, you do a load test and keep track of whatever results came out of that. And then you try to essentially maintain that as your user base grows. And the moment that you start seeing the latency increase for some of these requests, you might introduce something like caching, as you suggested. Do you wanna dive a bit deeper into some of the different approaches to caching in general and what the implications are? Because obviously caching solves one problem, but then introduces another, and that is that you might potentially be serving stale content.
- Yeah, for sure, and that's also why I mentioned that this seemed like a forum, or a blog as you said. 'Cause given that that's the case, for the actual feed of posts it's maybe not really relevant that the data is perfectly fresh at every request. So you have to make that kind of judgment yourself: can we tolerate it if this data goes stale for, say, one second? If we feel that that's a business trade-off we can afford, then that could be a great way to optimize at that point. But as you say, for content that is interacted with a lot, it probably doesn't make sense to cache it, but for data that you, for instance, post once and then don't touch that much again, it definitely makes sense to add some caching, as that is not as prone to change. So instead of looking it up and running through all these expensive queries, comparing the ID to whatever or what have you, you could just cache that result, 'cause you know that it will probably be the same in a second as well, saving you a lot of expensive computation time.
- Yeah. Also, I recently came across this stale-while-revalidate approach that I think is starting to gain a lot of momentum. Are you familiar with this caching approach?
- No, please explain.
- So I think the rough idea is that when you make a request, you're always getting served from cache, but the request coming in marks, "Hey, this data should be refreshed in the background." And so what happens is that your backend service or your caching layer will, upon a request coming in, actually refresh whatever's in the cache. That way you're always serving content from cache, unless it's the first request, but based on the traffic you're regenerating the cache on demand in the background. So you sort of get the benefits of fresh data. Obviously you have some situations where, if say an endpoint doesn't get a request for a long time, the first user to request it might get stale content, depending obviously on how you configure that.
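A minimal sketch of the simple time-based caching described above, assuming a Fastify route, Prisma Client and a one-second staleness budget; the TTL, model and field names are illustrative only.

```javascript
import Fastify from 'fastify';
import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();
const app = Fastify();

let cachedFeed = null;
let cachedAt = 0;
const TTL_MS = 1000; // tolerate up to one second of staleness (assumed budget)

app.get('/feed', async () => {
  const now = Date.now();
  if (!cachedFeed || now - cachedAt > TTL_MS) {
    // Cache miss or expired: hit the database and refresh the cache.
    cachedFeed = await prisma.post.findMany({
      take: 10,
      orderBy: { createdAt: 'desc' },
      include: { author: true },
    });
    cachedAt = now;
  }
  return cachedFeed;
});

app.listen({ port: 3000 });
```

A stale-while-revalidate variant, as discussed here, would instead return the cached value immediately and trigger the refresh in the background.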
You can also configure it so that after a certain period it will just bust the cache, and then the first request might be slow because it will have to go to the database. But there's a great benefit, I think, to caching in general, and that is that it reduces the overall load on your database, thereby making those requests that do end up going to the database a lot faster, 'cause you don't have this fight for the same resources.
- Okay, so what you're saying basically is that you decouple the actual serving of a response from the database call, right? So the next user will be getting your response, so to say?
- Hmm, yeah.
- That makes a lot of sense for sure. Especially if you have high volumes, so the time since the last call is maybe never higher than a couple of milliseconds; then you're basically serving fresh data.
- Yeah. Another thing that I found interesting in the results here, I'll open it up on the side. So let's look at this. We have this endpoint here, and really, this is the simplest endpoint; it just returns a static object. And if you look at the results for it, it varied quite a lot: the average was 87, the minimum was 27 milliseconds, but the max was 197. And I think that this is already the kind of latency you see because the Node.js event loop for that server is already quite loaded. So it's doing essentially some context switching, so to say, between the different things in the event loop, and this is why we're seeing such high variance. If we were to disable all of the different calls to the other endpoints in the load test and just call the status endpoint... or, you know what, I'll leave all of them, but I'll reduce the number of virtual users and perhaps increase the sleep duration a little bit more, and I think we're gonna see much, much better results. Now, so we spoke a bit about how we... oh, there we go. We see that there isn't so much variance: the maximum was a hundred, but even the 95th percentile was 65, whereas here it was 133. So just looking at all of them and comparing them here, you see much better results. So it's always good to know, I guess, what your rough load is like. You spoke about setting up a baseline, but I guess coming up with that baseline can be done by just looking at real traffic, right?
- Yeah, for sure. And if you don't have any real traffic yet, then you could use some kind of estimate of what you expect to have once you go live, right? So say, for instance, that you're launching a new service; maybe you have some prior experience from other services you've launched. Okay, so we think we might get 200 users the first week; okay, so let's see how that would perform with what we currently have. And then you can iterate from that. But at the same time, which I'd like to just mention as well, it does make sense to get a sense early on of what the maximum load you can handle actually is, by increasing the load until your server actually falls over. 'Cause then you get a hard limit where you need to be cautious: if you approach that limit, you know that you will need to put in additional engineering effort to increase it, right? So being aware of your maximum limit, or your limit before your critical limit, that makes total sense to do early.
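For the "increase the load until the server falls over" idea, a k6 script can ramp virtual users in stages rather than holding them constant; the targets, durations and threshold below are arbitrary placeholders.

```javascript
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  stages: [
    { duration: '1m', target: 50 },  // ramp up to 50 virtual users
    { duration: '2m', target: 200 }, // keep pushing towards 200
    { duration: '2m', target: 500 }, // well past the expected load
    { duration: '1m', target: 0 },   // ramp back down
  ],
  thresholds: {
    // A rising error rate is a rough signal that the breaking point is near.
    http_req_failed: ['rate<0.05'],
  },
};

export default function () {
  http.get(`${__ENV.BASE_URL}/feed`); // placeholder endpoint
  sleep(1);
}
```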
And so I would definitely suggest, or advise, to test for that as well, and not just for what you expect.
- And would that be done essentially by trying to increase the number of virtual users until you start really seeing errors coming back, and then you know, okay, that's more or less roughly the upper threshold that my API can handle? Is that right?
- Yeah, yeah, that would be a good start for sure. And also, just another thing that I saw in the chat: Nicole actually pointed out that it could be good, for instance, for your status no-DB endpoint to actually do some kind of comparison against something in the response as well, other than just that it's returning a 200, 'cause you could be served 200s that are completely useless because they contain only garbage data. So for instance, in this case, you're responding with up: true, right? Or something like that from your...
- Yes, we can open that up. Here we have it.
- Yeah, then you could check the response and make sure that it actually contains the key up with the value true.
- We could do that in the load test. That would be another check. And then I would check that r.body...
- Yeah.
- ...is equal to... Or I would probably want to parse that, because it's JSON, right? And I think I'm doing that somewhere here.
- Yeah, there's a built-in for that. You can do r.json(), I think it is.
- Okay, as a function. Okay, I see. Really nice. Oh, that's a function.
- Yeah. And then .up, for instance, and then just compare that to true.
- Let's give that a go and see what configuration we had. So this is something I guess we didn't point out before, but all of these checks, we see them at the top in green. Now we see we have this other one, and indeed that was true for all of them.
- Yeah.
- If some of them fail, it will essentially show how many of them failed, right?
- Yeah, exactly. 'Cause what you could have here, for instance, is that if you run microservices, or you run an auto-scaling replica set or whatever to actually scale the service up and down depending on the load, then you could have one instance in that replica set that responds with up: false 'cause something is wrong, but it still responds with 200, 'cause the request was okay but the health of the service is actually not okay. So checking for that might catch some outliers, or some false positives, that you otherwise wouldn't catch.
- Got it. And so we're going to wrap up soon, so this is a good moment for the viewers: if you have any questions, this is a good time to ask. We have Simme here who's been really helpful so far. And I'd love to hear your thoughts about this: in many situations, your services aren't gonna be used from one single geographical region. And if you're deploying a relatively simple app that isn't deployed to multiple geographical regions, then you probably wanna also get a rough idea of how slow it is for users joining from different countries. How do you think about that, and how can that be achieved with k6?
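The body check suggested in the chat could look roughly like this in k6, verifying not only the status code but also the parsed JSON; the endpoint path is an assumption.

```javascript
import http from 'k6/http';
import { check } from 'k6';

export const options = { vus: 1, duration: '10s' };

export default function () {
  const res = http.get(`${__ENV.BASE_URL}/status`);

  check(res, {
    'status is 200': (r) => r.status === 200,
    // r.json() parses the body, so a 200 that carries garbage data still fails this check.
    'reports up: true': (r) => r.json().up === true,
  });
}
```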
- Well, it definitely makes sense to do that, just as you say. For instance, the user experience will be a lot different depending on whether you're in Japan or in New York: if you host your service or your system in US East, for instance, then your users in New York will have a significantly better experience than the ones you serve from, say, Tokyo or Kyoto. And while k6 itself, the (indistinct) tool, doesn't have any support for running distributed load tests, as in geographically distributed load tests, the k6 Cloud service that we offer actually does. It's basically the same as the k6 OSS, but it's managed and you'll be able to specify which load zones you want your load to be generated from. And other than that, we also have a bleeding-edge alpha project where you'd be able to use a Kubernetes operator to spin up multiple k6 tests on different nodes concurrently and run the tests that way. That one you'll be able to find as open source on GitHub in the k6io/operator repository.
- I see. So if users wanna do load testing that is geographically distributed, you have a cloud offering that does that, and I imagine it also visualizes the results; I think I briefly looked at some of the results that I had in a cloud test run. And the alternative to that would be to use something like the k6 operator, which would allow you to essentially run your own load testers distributed across your Kubernetes cluster.
- Yeah, exactly. And then you could, for instance, have a couple of nodes in different places that all spin up and run their experiments concurrently, and that would give you a sense of what that would look like. But as I said, that's a really alpha project, so there are a lot of kinks and hiccups with that one that you might stumble upon. So if you want to just load test from multiple regions at the same time, then the k6 Cloud has you covered for sure.
- Hey, great. So we're coming towards the end of this live stream; we didn't have any questions so far. Are there any other thoughts that you wanna leave with the audience before we wrap up, Simme? (laughing)
- Yeah, that's a hard one, right? To come up with on the spot. But I definitely think that more people should test, whether it's performance tests or unit tests or integration tests. I definitely think that all teams should consider whether they have solid practices for that integrated into their everyday developer workflow, and if they don't, they should probably look at that. 'Cause I mean, we all do ship code from time to time, and we need to make sure we've got that covered with enough quality measures for sure. So if you want to talk more about tests, or anything related to testing or tech development, feel free to hit me up on Twitter and we can discuss it further.
- So you have his Twitter handle below here. And there was one thing that I sort of skipped, if we have a couple more moments, if you don't mind, Simme. I'm gonna share my screen again. So I've already configured this to run in GitHub Actions. I have a pipeline here, a workflow I believe is the term in GitHub Actions, and one of them is to run the tests. And that will really just use Jest in order to inject requests into the API. In this test environment, the API will actually run against a real Postgres database, and this happens here: we have this Postgres service container, and then the tests run and they actually inject real requests that go to the API.
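A sketch of the kind of test described here, using Jest together with Fastify's inject() so requests hit the API without binding to a port; the build() factory and the expected response shape are assumptions for illustration.

```javascript
// Hypothetical factory that builds the Fastify app without starting a server.
const { build } = require('../src/app');

describe('status endpoint', () => {
  let app;

  beforeAll(async () => {
    app = build();
    await app.ready();
  });

  afterAll(async () => {
    await app.close();
  });

  test('reports the service as up', async () => {
    const res = await app.inject({ method: 'GET', url: '/status' });
    expect(res.statusCode).toBe(200);
    expect(res.json()).toEqual({ up: true });
  });
});
```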
I'm also gonna share the URL to this, if you're interested in exploring it. And the interesting thing is that, just before the live stream, I set up k6 to also run; there's a GitHub Action built for that. And here I'm running this load test in GitHub Actions, and as you can see, we have all of the results. I just wanted to point that out because it's also a useful approach: you can use this to run the cloud tests with the hosted service, but you can also just run them locally from the GitHub Action and you'll have the results in the action run. So that was the last thing that I wanted to demo. Before we wrap up, I also want to say that we have a lot more content coming up in these live streams on running Prisma in production. We're gonna talk a lot more about continuous integration, some of the workflows for integrating Prisma into a continuous integration pipeline, and doing some more advanced things like testing and a lot more. So stay tuned, and if you haven't already, hit the subscribe button so that you'll be notified of all the upcoming live streams. Simme, I'd really, really like to thank you for joining me today. It's been a great pleasure having you and learning from your experience and knowledge, and I'm really excited about k6.
- Thanks for having me. It was a blast for sure.
- All right. So on that note, goodbye.
- Goodbye.
Info
Channel: Prisma
Views: 1,752
Keywords: REST, API, Prisma, Deployment, Data, PostgreSQL, JavaScript, Database, Backend, TypeScript ORM, Object Relational Mapper, continuous integration, ORM, Serverless, PgBouncer, Cloud, Node.js, Load testing, Open source, k6, SRE, DevOps, Cloud Deployment, GraphQL, DigitalOcean
Id: ZV01hyVR1yw
Length: 63min 50sec (3830 seconds)
Published: Wed Feb 10 2021