- (gentle upbeat music) Welcome to another live stream hosted by Prisma. I'm your host, Daniel, and today I have a very, very, very special guest. We have Simon, also known as Simme, who's joining us from Sweden. Simme is a Developer Advocate at K6, and we'll discover more about K6 very soon. But the short version is that K6 is an open source load testing tool. And so today we're going to talk about load testing, as part of this new series where we're talking about running Prisma in production. And so I'll give a little bit of a background about Simme. So first of all, Simme, hey, it's nice to have you here. - Hey, good to be here. Thanks for having me. - So Simme is a Developer
Advocate, Public Speaker and Meetup Organizer from Sweden. He's been working in tech for the last 10 years or so in many different roles, ranging from Full Stack Developer to Systems Architect, Scrum Master and Ops Engineer. And in the last couple of years, he's put a lot of his time into DevOps practices, cloud deployment automation and creating highly efficient teams. And today we're going to discuss load testing as part of your production strategy, so to say. And so, without any further delay, Simme, welcome. - Thanks. - So today I'd love to, before
we sort of get into the topic of Load Testing and
Reliability in Production I'd love to hear more about
sort of your concrete background and what you do at K6. - Yeah, sure. As you said, I've been working in tech for a little over 10
years in different roles. Most of the time I've been a consultant, either an independent consultant or through a consulting agency, working with customers ranging from small startups all the way to enterprise-level customers, like Access or CPA Global, and helping them work with software all through their stack, basically. I have a lot of experience in JavaScript, as well as Go and C#. But I also like to tinker with teams, just as you said, and especially to work with, say, DevOps practices or agile practices, trying to create a workspace where everyone feels that they can contribute to something meaningful
and create good work. - I see. I see. - And so you mentioned
it's already interesting that you mentioned you
worked with smaller clients and bigger clients, and I'd love to hear a
bit about your thoughts when it comes to reliability and some of the engineering practices: how do they differ when you're working at a small company versus a more enterprise company? What are the different concerns that are at play here? - Well, one of the
biggest concerns for sure, as you can imagine is the
economic constraints, right? So if you work in an
enterprise organization you obviously usually
have a lot more budget than you do in a startup environment, where you have to make do with whatever you have and try to be as cost efficient as humanly possible. And that brings a whole other set of challenges, regarding whether to self-host things or if you should go for the cloud, that you don't really have to concern yourself with as much in an enterprise environment, where you probably have an architect or a team of architects who have decided on some kind of reference architecture for whatever you want to build, and you'll have to piece your solution into that, so to say. So that's certainly different. And personally, I can see the joys or cool things about both setups, but startups are really, really interesting for sure, due to the high pace and the kind of flat... what do you call it... the power distance is very small as compared to in a larger organization. - Right, and so when we speak about reliability in production, there are a lot of different terms that are often thrown around. There's DevOps and there's
Site Reliability Engineering which is sort of like a whole
branch of software engineering that is really focused on this, that I think emerged inside Google and then sort of expanded to the broader public once they published their SRE book. So there's a lot of different terms that also fall under this. How do you think about it in general, when it comes to deploying to production and thinking about all of the things that concern an app, or even just an API, when deploying that and serving it to real users? - Yeah, I'm personally not
that interested in roles and departments and that kind of thing but rather try to think of it as a team delivering something together, right? And then you need to have
different competencies and different people on that
team to make that possible. And site reliability engineering is a great term, 'cause it encompasses everything that you basically need to do to get a performant and reliable service. And... yeah, I'm sorry, could you repeat the question again? 'Cause I might have misinterpreted you. - No, no, I think that was a
great sort of introduction. What are some of the practices
perhaps we can talk about? I know that there's for example, I remember reading in the SRE book, there were these three
terms, SLAs, SLOs, and SLIs and the three I think stand
for Service Level Agreement, Service Level Objective, and then Service Level Indicator. And so these three measures
that you sort of come up with as part of your strategy, and they start from really what you want to deliver to your users and span all the way to how you measure that in the application. What is your approach to thinking about that when it comes to the concrete practices? - Yeah, I mean SLAs, they are usually kind of hard, right? 'Cause they're usually tied to some kind of agreement with a customer, where you for instance set financial terms for what would happen if your service wasn't available as promised in your agreement. So that's probably the most
critical one of the three. Service Level Objectives
more describe what you do, what goals you have internally for your quality of
service or your reliability or uptime for instance. So that's something that
you'd like to aim for as a team or as a delivery unit or whatever you wanna call it. So you say that, okay these are our objectives
for our service level. They usually tie to the
service level agreement which in turn ties to
an overarching agreement that has financial terms
and things like that. And the Service Level Indicators, they indicate or they
serve as metrics to explain to you whether you've met your service level objectives or not. So it's kind of like fine-graining it, all the way from the agreements down to the smallest piece of it, which is the indicators then. - And so what are some
interesting indicators? Assuming I'm working, say, on a new startup where we're building a GraphQL API or even a REST API, what are some of the
interesting indicators that come to mind when
we think about this. - Uptime is definitely one of them. Probably the easiest
one to measure as well but things like response
times or error rates when you try to interact
with, for instance, an API, or time to first meaningful render or first meaningful paint, or to your first interaction with a server, for instance. All of those are great indicators that you can use to piece together your service level objectives, for sure. - And let's get a bit into K6. So what is K6, and how might it relate to something like all
of these service level indicators, or metrics in simpler terms? - Well, you could say that K6 is primarily a load testing tool or a performance testing tool, but it's also like a Swiss army knife for doing other things related to reliability and performance as well. For instance, chaos engineering is one of the topics that is perfectly possible with K6, or with K6 and other complementary tools, but... - Can we pause there? What is chaos engineering, for the viewers who might not
be familiar with the term? - Yeah, sure. Chaos engineering is when
you apply a scientific method to your reliability testing and try to introduce turbulence in your systems, to measure, or rather observe, how they react to this turbulence. So basically you try to provoke your service into failing, and make sure that it doesn't, or build away that failure. - Got it. - And for K6 as a tool, I guess the metric that
we're most concerned with or rather the primary metric that you could for instance use
as a service level indicator would be the response time: given that you have, say, 10 users on your page, how long does it take until they get a meaningful response from the server? What happens if you scale
it up to 10,000 users? Are the numbers still the same or does it take significantly longer to get something
meaningful from the server? - Got it. Yeah, and I think we're gonna
jump into a demo at some point during this live stream and I guess we'll actually
see how all of this works. So if you're tuning in and you're like, "Okay, I like all the theory, but show me the actual
code and how it works." We're gonna get to that very soon. And you mentioned when it comes
to like there's cloud native approaches to deployment, with the rise of all of
these public cloud services: AWS, Google Cloud Platform,
Azure, and many, many more how does that sort of
influence architectures today? And what role do architectures play? Like, does the architecture of your application have an impact on this? Do you have any thoughts on that? - Yeah, for sure. I mean, one of the biggest challenges with writing good applications today is that they are quite complex, and you want to move as fast as possible, 'cause you want to be able to get value from whatever investment, either in time or in money, it doesn't really matter, but you want to get a return on that investment as soon as possible, or rather provide value to the customers. So the kind of classical way of deploying a monolith and having a release
train every six months and providing a new
version of your software, that doesn't really cut it today, right? You need to move a lot faster than that. Many companies that I've worked for have been aiming for
multiple deployments a day. So in that kind of reality things start to get really complex and it starts to get
complex to measure as well whether your service is
performing well or badly, and whether it's still on par with your requirements, for instance SLAs and SLOs. So, given that you want to move this fast, you don't really have time to, and you can't really afford
to invest a lot of time into running your own hardware
or your own data center. 'Cause that's usually at least not part of your business critical path, right? Or your business critical
flows or revenue flows. So you kind of want to think about that as little as possible and then using these
cloud-native providers, like AWS, Azure or GCP, that makes total sense. And it does affect your
architectural decisions 'cause in that environment, you don't wanna spin up a monolith as you did say, five years, 10 years ago. You want your architecture
to be very modular and you'll want to enable
teams to move independently of each other. So you get a lot of small pieces scattered throughout your like system architecture rather than this big monolith that you can easily reason about. - And that's sort of where the term microservices comes in, right? That's like basically the
breaking up of a monolith into multiple components that can move independently of each other, but also scale
independently of each other. - Yeah, exactly. - Is that how you'd like to think about it? - And you can even take it one step further with offerings like AWS Lambda or Azure Functions, or Heroku, providing just a runtime for a single function. So you run it as functions, or serverless as it's called, rather than running it on an actual server or an actual microservice in a container environment or anything like that. So for sure.
- Yeah. I recently ran a survey; I was trying to sort of gauge where the Prisma community is at in terms of how they approach deployment, and obviously in the Node.js community, Lambda and serverless in general are very, very popular, because they provide, I think, a very, very powerful (indistinct) that most developers find easy to pick up and easy to reason about. And it is indeed the most popular approach: I think from the survey, over 50% of the participants... and perhaps I can find the link and I'll share it in the comments... more than 50% of the users who participated in the survey are using serverless. Now, when working with a database, this can be quite challenging. And we made a video, I think last week, about how to set up PgBouncer if you're working with PostgreSQL. And PgBouncer, in case you're not familiar, or if one of the viewers isn't familiar, is a connection pooler for Postgres, which makes it easier when you're working with serverless functions
that are connecting to the database. And this is because the
database connection churn is quite expensive: opening up and closing a connection to a Postgres database actually has a lot of overhead. And if your serverless
functions are constantly spinning up and they scale elastically. So if, say, a thousand requests come in, you can actually scale those Lambda functions to handle those thousand requests. However, the bottleneck
now becomes your database. And so actually this
was the big motivation for me to have you join me today because in that video I actually did some load testing for serverless. Now serverless also has
this cold start problem, and there are a lot of nuances that really need to be addressed. And I think until you run the load tests, you don't really know how well your application will perform. Obviously this depends on other traffic that you might be serving while you're running these load tests. So there are a lot of different details. And perhaps this is a good moment to introduce K6. So K6 is an open source load testing tool. You mentioned that it's used in a bunch of different environments; perhaps you can mention how it is different from other load testing tools out there? - Yeah, one of the main
differences with K6, compared to other load testing tools, or most load testing tools I'd say, is that it's really focused
on the developer experience and the developer ergonomics. So instead of having for instance a UI where you point and click
and build up these diagrams or these hierarchical structures on tests you actually write it in JavaScript. So if you're for instance use
node JS or something similar then you can actually integrate that called somewhat at
least into K6 as well, for instance sharing models or things like that would
be perfectly possible which makes it really convenient to work with in your everyday workflow, right? 'Cause you've write code every day for a living as a developer and you don't really want
to leave that environment and going to these gooey which you don't probably don't
are that familiar with you. You want to keep using the
same tools as you usually do, say that for instance, that you have the perfect VM set up, then you want to continue using them for writing your load tests as well. And K6 really tries to cater to that we are all in the company
or all in the project where we're all in engineers. So we try to go back to ourselves and what we would appreciate as developers as well in a tool. And that's basically what makes K6 so nice or at least I think it's
so nice as a developer too. - Yeah, yeah. I can attest to that. Having worked with a patchy I think I be and worked also with Vegeta. I really had a nice experience in, starting out with K6 and I think this is, we
share the similar philosophy. At Prisma we really have a strong focus on what will this feature
be like for our users? Does it make sense? Like we're trying to really create a nice developer experience and to laminate as much
toil and unnecessary work. So I wanna pause on that moment before we sort of jump
into case K6 with the demo, and I want to tell all the listeners that feel free to use the chat. We're seeing all of
the comments coming in. So thank you for joining from Indonesia and from Chicago in the US. And I think from Cameroon too if I can recognize that flag correctly. So thank you all for joining. And I think now this is a good moment for us to start the demo. So I'll pull up my screen here
and I'll adjust the labels. And so here I have a little REST API that is deployed to DigitalOcean, and this is using the DigitalOcean App Platform. This is the console that you're seeing here, and I have this actively deployed. And in fact, if I open it, I have just a status endpoint. I'll make it a bit bigger so that the viewers can see, and let's look a bit at some of the code that we have here. So this is all written in JavaScript, and it's using a framework called Fastify. And so here I've defined a bunch of different endpoints, and what we saw just a second ago was this status endpoint. And this REST API is backed by a Postgres database, and the model for it was created with Prisma. So what you can see here are the different Prisma calls: to create a user, there's an endpoint here to create a single post, to leave comments on a single post, to like different posts, to delete posts and to get posts. And we also have a feed
endpoint which returns the 10 most recent posts and their author. And just to demonstrate this, I can open up the feed endpoint, and what we see here are indeed the different posts and their related authors.
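For readers following along at home, a feed endpoint like the one described can be sketched in a few lines of Fastify and Prisma. This is an illustrative sketch, not the demo repository's actual code; the route path and the createdAt field are assumptions.

```javascript
// Hypothetical sketch of a feed route, assuming a Post model with an `author` relation.
const fastify = require('fastify')();
const { PrismaClient } = require('@prisma/client');
const prisma = new PrismaClient();

fastify.get('/feed', async () => {
  // Return the 10 most recent posts together with their author.
  return prisma.post.findMany({
    take: 10,
    orderBy: { createdAt: 'desc' }, // assumes a `createdAt` timestamp field
    include: { author: true },
  });
});

fastify.listen(3000); // Fastify v3 style; v4 takes an options object instead
```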
And so, back at the root of the repository, I'm just gonna briefly show the database schema. And this is the database schema: we have a User table, a Post table and a Comment table, and they have one-to-many relationships. A single user can have many posts, and a single post can have many comments. There's also a many-to-many relation here, because different users can like different posts. So that's roughly the schema. This is of course an example, but some of the patterns here apply to many different situations; these are all just relational primitives. And I see there are some comments coming in, so thank you all for joining. Nicole has joined, and Mohammed has joined from Paris, and David B from France. So thank you all for joining. I think... so I have this
API deployed, right, to DigitalOcean, and I'll share some more details about some of the interesting details that we have here. So this is running on a very basic instance: it's one gigabyte and one virtual CPU, about 10 bucks a month. And we also have a database, and the database is sort of the minimum production database that DigitalOcean offers, which can accept up to 22 connections. And so what I've done is I've gone ahead and already set it up in a way that Prisma will utilize those 22 connections. And I think this is a good moment to look at the load test script. So I have here a K6 load test script. And this is probably a
good moment to mention that with K6 you can define the
load test script in JavaScript. You can also use TypeScript, I believe. - Yeah, you can use TypeScript, but you'd have to transpile it into JavaScript before running; but it's definitely perfectly possible to use TypeScript as well. - Right, yeah. Okay, so this was one of the very first load testing scripts that I've created, and there might be some things that are unconventional here. But at the very beginning here, what I'm doing is creating these trends. Perhaps you can elaborate: what are trends, and when are they useful in a load test? - Well, K6 has a couple of
different metric types, right? And trends are one of them. When it comes to trends specifically, they allow you to calculate the minimum, maximum, average and percentiles of whatever values you provide into that metric. So for instance, in this case, we might want to know how many users were created at a specific point, on average, or something like that. And other than that, you could also have used, as you also do I think, Rate metrics, or even Gauges or Counters, to perform similar things, but kinda tailored to other use cases, right?
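For reference, declaring these metric types in a k6 script looks roughly like this; the metric names below are made up for illustration.

```javascript
import { Trend, Rate, Counter, Gauge } from 'k6/metrics';

// Trend: collects values and reports min, max, average, median and percentiles.
const feedDuration = new Trend('feed_duration');
// Rate: tracks the ratio of true values, e.g. an error rate.
const errorRate = new Rate('errors');
// Counter: a cumulative sum, e.g. total users created during the test.
const usersCreated = new Counter('users_created');
// Gauge: keeps only the latest value added.
const lastResponseSize = new Gauge('last_response_size');
```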
- So here, I'm basically using the trends in order to capture that, and we'll take a look at it here. So basically this script exports a default function, and this default function is essentially the load test. I'm doing a bunch of preparation here and configuring some options. These are two quite important options: vus stands for virtual users, and this is essentially the number of virtual users that will be making requests. And the second one is duration. So if this load test runs for 10 seconds, during these 10 seconds 20 virtual users will make the requests that we've defined in this main default function. Here I'm just defining the base URL, reading that from an environment variable, and just defining a bunch of constants here for the different endpoints.
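As a point of reference for the structure being walked through, a stripped-down version of such a script might look like this; the endpoint and values are placeholders, not the exact demo script.

```javascript
import http from 'k6/http';
import { sleep } from 'k6';

// 20 virtual users iterate over the default function for 10 seconds.
export const options = {
  vus: 20,
  duration: '10s',
};

// The base URL is read from an environment variable, passed on the command line:
//   k6 run -e BASE_URL=https://example.com load-test.js
const BASE_URL = __ENV.BASE_URL;
const FEED_URL = `${BASE_URL}/feed`; // placeholder endpoint

export default function () {
  http.get(FEED_URL);
  sleep(1);
}
```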
Now we come to the actual test, and the actual requests in this load test happen using this http module. I believe this isn't the standard HTTP library that is provided by Node, because K6 runs this in a different way. - Yeah, exactly. This is a native HTTP client, written in Go actually. So I wouldn't dare to say that it's more performant, but it should be at least equally performant to the Node.js one. And it supports all the basic HTTP methods, as well as some k6-specific options. For instance, if you want to tag your requests, to easily be able to find them later in your bunch of metrics. So yeah, it's K6 native for sure. - So after I make this request here, I run a check in order to make
sure that the status was 200. The nice thing I guess about checks is that you can run these different checks throughout your load tests. And then in the end it will tell you how many of them passed or
not based on the condition that you pass in here. The condition is that the status is 200. And then there is this status trend to which we add the
duration of the request. And this is probably the
most interesting metric as a starting point: we said in the beginning that one of the important things to measure, besides whether your requests are successfully served, is how long it took for the requests to be served. And I believe that this duration also includes the network latency between the place from which the load test runs and the server. So, I mean, if you're running a load test locally against an API that is in a far and remote country... I'm situated in Germany, and assuming that I was to run this load test against an API that is deployed in the San Francisco region, then I would expect these durations to be longer, obviously, because it just takes physically longer for those requests to get served. - Yeah, for sure. They include, or rather they represent, the full end-to-end time for a request. So from initiating on your side, to the server, and back again, yeah.
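Putting the pieces just described together, the request, the check and the custom trend boil down to something like this; a sketch with illustrative names rather than the demo's exact code.

```javascript
import http from 'k6/http';
import { check } from 'k6';
import { Trend } from 'k6/metrics';

const statusDuration = new Trend('status_duration');

export default function () {
  // Issue the request through k6's native HTTP client.
  const res = http.get(`${__ENV.BASE_URL}/status`);

  // Checks are tallied at the end of the run: how many passed, how many failed.
  check(res, {
    'status is 200': (r) => r.status === 200,
  });

  // res.timings.duration is the request duration in milliseconds
  // (sending + waiting + receiving), so it reflects the round trip to the server.
  statusDuration.add(res.timings.duration);
}
```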
- And then I'm doing a bunch more requests to get the feed and create a user. And there's also a little sleep duration, so that we don't overload the API with unrealistic traffic. So what I tried to do here, and I'd love to hear your thoughts about whether this makes sense, is I've defined a static sleep duration to simulate what it would be like for a real user to, say, use this blog that is backed by this API. And so the idea would be that they might load the page and the feed will be requested, and then the user will create an account, and then they'll create a post. And while it's not a hundred percent realistic, there's no one sort of user flow, right? Users are free to use the website or the API in any way. But this is somewhat trying to simulate the kind of traffic pattern that a normal user would generate. - Yeah, it looks great. I mean, if you wanted to elaborate further on this, you could for instance think about how long the user will spend on each place in the website. For instance, if you have a list of posts, maybe you will have a longer so-called think time before actually loading a post than you would the second time around, when you load the list of posts. So it definitely makes sense to consider things like think time or pacing and try to vary that a bit, to make sure that the user you simulate actually behaves like an actual user. But in most cases, starting out with a static sleep, for instance, makes total sense, 'cause it's easy and you might not even know yet what the think times or the pacing of an actual user are.
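If you later want to move beyond a static sleep, varying the think time is a small change; a purely illustrative sketch:

```javascript
import { sleep } from 'k6';

// Sleep for a random duration between min and max seconds,
// to roughly imitate a real user's variable think time.
function thinkTime(min, max) {
  sleep(min + Math.random() * (max - min));
}

export default function () {
  // ... request the feed ...
  thinkTime(2, 5); // user skims the list of posts
  // ... open a single post ...
  thinkTime(1, 3); // user reads the post
}
```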
- Okay, so at this point, I think I also have this repository locally cloned; I'll make this a bit bigger. Okay, so this was the load test. Now I'm gonna open up my terminal, and generally you can install K6 using brew or any kind of package manager. Again, perhaps someone from the k6 team could drop a link to the installation page, but it's really quite easy to get started. And so I'm ready now
to run this load test. So I'm just defining this environment variable, because I don't want to keep it statically inside my load testing script, since there could be multiple environments. And then it's just really k6 run, and you pass the script. Now we can go on. - Yeah. - Yeah, go on. - And as you can see here, the test quickly ramped
up to 20 virtual users. And you can see how it iterates over the default function
as many times as it can during the duration that you specified. In this case, we managed to
do 107 complete iterations which include all the requests that you put into your default function. - I see. And then in total we
had 642 HTTP requests. So that is about 56 requests
per second, on average. And I think what's
particularly interesting is these bunch of rows. So these are essentially what we see here. These are the different trends, right? So these are the results
for the different trends. And we have a bunch of
columns for each one of them. Do you wanna sort of work through each one of those columns? - Yeah, sure. First we have the average, which is, as you might guess the average duration
that each request took. And this one is kind of interesting, 'cause it gives us some kind of sense, or an indication, of how our service is performing on average. And as you can see, these timings are somewhat higher than the minimum duration; they're more on the higher end of the spectrum. Next up, we have the minimum duration, which is, yeah, the shortest time it took for a request to complete, followed by the median and the maximum duration. And then the last two are probably the ones I think are the most important: you have the duration for the 90th percentile of users, as well as the 95th percentile. So what that means, basically, is that 90% of the users had a better time than that, or 95% of the users had a better time than that. So all of these results are in milliseconds, including the percentiles. - I think this is a really important note to make, right? Averages can be very misleading, especially if the variation between the different durations is really large: you have some really slow requests and you have a lot of fast requests. The average doesn't really represent very well what your overall performance is like, and this is why the percentiles are useful. Is that a correct way to think about it? - Yeah, for sure. I mean, say for instance that your API has started to return empty responses for some reason, because some service you have
further back in your stack has started to... or has fallen over, and you get empty responses back. Then those would usually be really, really fast, thus lowering the request duration. While, for instance, if you wait for a timeout, then that would probably take a while, maybe even a minute, before it actually times out, thus skewing your result in the other way. And while we probably only want to include metrics for requests that actually returned a 200, so we know they are successful, we're gonna have outliers there as well, for instance. So looking at the average might give you a really bad idea, or it might not serve you that well in terms of knowing what the average user experience will be. Instead, by looking at, for instance, the 95th percentile, we know that the tail behind that will be quite short, right? So we're not gonna have that many requests that are worse than this. So if you wanna guarantee a user experience, or experienced performance, that is at some specific level, then that's probably a way more interesting metric to look at than both the average and the median.
- I see. And this is specifically percentiles, right? - Yeah, specifically percentiles. - Okay, so we had someone who has a tip for us, so let's see what it is. - Yeah, Mihail Stoykov from the k6 team. - Oh, okay, cool. So, "and then the output is human readable." Oh, okay, I believe I came across that. So if I open up the load test again: for all of these trends, I can set this second argument to true, and the nice auto-completion says, "Oh, it's isTime." So then it will output the results as time values. So, back to the trends. Thank you, Mihail. I think I was looking through the docs at some point trying to find this.
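Concretely, the tip is the second argument of the Trend constructor; a minimal sketch:

```javascript
import { Trend } from 'k6/metrics';

// Passing `true` as the second argument (isTime) marks the trend as time-based,
// so the end-of-test summary formats its values as durations instead of plain numbers.
const statusDuration = new Trend('status_duration', true);
```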
So let's run this again, and I wanna also point out a couple of things in the results. So, running this now again, we're already at 20 virtual users. - And I can mention, while we wait, that virtual users, if that's a new concept to any of the watchers or viewers: virtual users are basically parallel runtimes, right? So while they're called virtual users, or concurrent runtimes rather, you can actually more think of them as separate instances of your default function that will loop over your actual test function. - Yeah, that makes sense. That's a good one. So now we see it
obviously in milliseconds and I think it's particularly interesting to see that this status endpoint, which doesn't make a
round trip to the database is obviously much faster than
the rest of the endpoints. For those, as a request comes in, the server has to actually make a call, or send an SQL query, using Prisma to the database, and that has to return. And obviously that affects the performance quite significantly. Now, I've played a lot with the
different configurations here in order to get these numbers. This is sort of a Prisma-specific tip, but I'll open up here the page in the Prisma docs; we have deployment documentation. Let's see. And it details a lot of the differences that matter if you're deploying to a serverless platform versus to a long-running sort of platform as a service. And this API is deployed to the DigitalOcean App Platform, which is a long-running model. That means that you have essentially a virtual server that is running your Node.js application, and that Node.js application is handling multiple requests. And one of the things that you can set in the connection string is the connection limit for the connection pool that Prisma uses. What I've done is I've set this to 22, which is the maximum number that my database can accept. I'll open this up. Perhaps I can go into the page for this. So if you're using DigitalOcean, one of the great things
is they have a bunch of different options here. But the nice thing is that they have the ability to set up PgBouncer. This is obviously not relevant if you're deploying to the App Platform, but I'm just pointing this out. And indeed, there's a connection limit of 22, and this has been already configured in the environment variable. I'm not gonna show it right now, but there's an environment variable here called DATABASE_URL, and that's where I added this connection_limit parameter and set it to 22. This obviously improves the performance, because you're utilizing the maximum number of connections. And as far as I know, you can send basically one query at a time per connection to the database. You can't send multiple queries over a single connection at once; otherwise they will just be queued up. So only one query at a time goes over each database connection, and you should obviously exploit that fully.
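For reference, the pool size is a query parameter on the connection string that Prisma reads; a sketch with placeholder credentials (in the demo the real URL lives in the DATABASE_URL environment variable):

```javascript
// Example connection string shape (placeholders, not real credentials):
//   postgresql://USER:PASSWORD@HOST:5432/DBNAME?connection_limit=22
// connection_limit caps Prisma's connection pool, here at the database's 22-connection limit.

const { PrismaClient } = require('@prisma/client');

// Prisma normally picks the URL up from the datasource in schema.prisma (env("DATABASE_URL")),
// but it can also be overridden programmatically; `db` is the assumed datasource name.
const prisma = new PrismaClient({
  datasources: {
    db: { url: process.env.DATABASE_URL },
  },
});
```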
- Would you say that it still makes sense to run something like PgBouncer, even if you're running Prisma? - So, it really depends on where you're deploying to. If you have an architecture where you might have multiple instances of your application scaled up automatically for you, then it's probably a good idea to use PgBouncer, because you don't wanna exhaust the 22 connections, 'cause that can lead to user errors. And the moment that you're having user errors, I think it doesn't really matter what the duration of your requests is, because you're actually failing requests. So the first priority should obviously be to avoid failed requests, and then to optimize the performance. So, yeah, I think it would be a good idea. In fact, I think it might be interesting to run an experiment and see how this performs if we're using a connection pooler in front of the database; let's see if we have time. We're about 20 minutes in, and there are a bunch more things
that we wanted to cover. I think before we sort of dive into PgBouncer, it's worth talking about what you do with these results. So you've got those results, you look at the 95th percentile and you say, "Okay, that seems maybe reasonable, maybe not." How do you think about the results of the load test, and how do you make them actionable?
application to application or system to system, right? In your case, you've
done a really good job with setting up metrics
that are actually relevant to being able to benchmark this. For instance, we know that
the get status without a DB that takes only 133 milliseconds while getting the feed, which includes a lot of posts that takes a whole 1.3 seconds. So already there, we know that there might
be some things we can trim to get the feed loading to be a lot more performant, right? For instance, since it is a feed from something that looks like somewhat of a forum or something like that, right? Yeah, then we could actually, for instance, add some kind of caching, to make sure that we don't go all the way to the database every time we hit the feed endpoint, and by doing that reduce the time to maybe even what it takes to do a round trip without a database. So there are multiple things we can do here to kind of improve or optimize this specific use case. But it's hard to say something in general that applies to all users. The only, I guess, advice
I can give for that is that you should do a baseline: when your system performs as you want it to, and you have sort of the conditions that you think are reasonable, take a baseline, as in check what metrics you get then, and use that as a comparison for all your future performance tests. So you'll be able to visualize whether you improve or whether it gets worse over time. So comparing to yourself is probably the best way of making sure that you can take action on whatever you are seeing in your results.
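One way to turn such a baseline into something the tool enforces, rather than something you eyeball, is k6's thresholds option; this isn't in the demo script, just a sketch with made-up numbers.

```javascript
export const options = {
  vus: 20,
  duration: '10s',
  thresholds: {
    // Fail the run if the custom feed trend's 95th percentile regresses past the baseline.
    feed_duration: ['p(95)<1500'],
    // Fail the run if the built-in request duration metric gets too slow overall.
    http_req_duration: ['p(95)<800'],
  },
};
```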
- So essentially the idea is: first, you probably wanna configure the database connection details, all of those parameters, to sort of maximize whatever resources you have available. And then, based on that, run some load tests and come up with some kind of a baseline that you're happy with. I mean, assuming that you've already deployed, you are serving traffic and it seems fine: you do a load test, you keep track of whatever results came of that, and then you try to essentially maintain that as your user base grows. And then the moment that you start seeing that the latency increases for some of these requests, you might introduce something like caching, as you suggested. Do you wanna dive a bit deeper into some of the different approaches for caching in general and
what are the implications? Because obviously caching
solves one problem, but then introduces another. And that is that you might be potentially serving stale content. - Yeah, for sure and
that's also why I mentioned that this seemed like a
forum or a blog as you said, cause given that that's the case maybe the actual feed of posts, it's not really relevant that that data is perfectly fresh at every request. So you have to do that
kind of judgment yourself, whether we can tolerate it if this data goes stale for, say, one second. If we feel that that's a business trade-off that we can afford, then that could be a great way to optimize at that point. But as you say, for content that is interacted with a lot, it probably doesn't make sense to cache it, but for data that you, for instance, post once and then don't touch that much again, it definitely makes sense to add some caching to that, as that is not as prone to change. So instead of looking that up and running through all these expensive queries, for instance comparing the ID to whatever or what have you, you could just cache that result, 'cause you know that it will probably be the same in a second as well, saving you a lot of expensive computational time. - Yeah, also, I recently came across this stale-while-revalidate approach that I think is starting to gain a lot of momentum. Are you familiar with this caching approach? - No, please explain. - So I think the rough idea is
that when you make a request, you're always getting served from cache, but as the request comes in, that marks, "Hey, this
data should be refreshed in the background." And so what happens is, your backend service or your caching layer will, upon a request coming in, actually refresh whatever's in the cache. And so that way you're always serving content from cache, unless it's the first request, but based on the traffic you're regenerating the cache on demand in the background. So you sort of get the benefits of fresh data. Obviously you have some situations where, if say an endpoint doesn't get a request for a long time, the first user to request it might get stale content, depending obviously on how you configure that. You can also configure it so that it will, after a certain period, just bust the cache, and then the first request might be slow because it will have to go to the database. But there's a great benefit, I think, to caching in general, and that is that it reduces the overall load on your database, thereby making those requests that do end up going to the database a lot faster, 'cause you don't just have this fight for the same resources.
- Okay, so what you're saying basically is that you decouple the actual serving of a response from the database call, right? So for the next user, they will be getting your response, so to say? - Hmm, yeah. That makes a lot of sense for sure. Especially if you have high volumes, so the time since the last call is maybe never higher than a couple of ms. Then you're basically served fresh data.
- Yeah, another thing that I found interesting in the results here is this really simple endpoint; I'll open it up by the side. So let's look at this. We have this endpoint here, and really, this is the simplest endpoint: it just returns a static object. And if you look at the results for it, they varied quite a lot. The average was 87, the minimum was 27 milliseconds, but the max was 197. And I think that this is already the kind of latency that you see because the Node.js event loop for that server is already quite loaded. So it's doing essentially some context switching, so to say, between the different things in the event loop, and this is why we're seeing such high variance. If we were to disable all of the different calls to the other endpoints in the load test and just call the status... I think that, or, you know what? I'll leave all of them, but I'll reduce the number of virtual users and perhaps increase the sleep duration a little bit more. And I think we're gonna see much, much better results. Now, so we spoke a bit about how do we... oh, there we go. We see that there isn't so much variance: the maximum was a hundred, but even the 95th percentile was 65, whereas here it was 133. So I mean, just looking at all of them and comparing them here, you see that there are much better results. So it's always good to know, I guess, what your rough load is like. And you spoke about setting up a baseline. But I guess coming up with that baseline can be done by just looking at real traffic, right? - Yeah, for sure. And if you don't have
any real traffic yet, then you could use some
kind of measurement of what you expect to have
once you go live, right? So say, for instance, that you're launching a new service; maybe you have some prior experience from other services you've launched, so you think: okay, we might get 200 users the first week. Okay, so let's see how that would perform with what we currently have. And then you can sort of iterate from that. But at the same time, which I'd like to just mention as well, it does make sense to, early on, get sort of a sense of what the maximum load you can handle actually is. So increase the load until your server actually falls over, 'cause then you get a hard limit where you need to be cautious: if you approach that limit, you know that you will need to put in additional engineering effort to increase it, right? So being aware of your maximum limit, or your limit before things get critical, makes total sense to do early. And so I would definitely
suggest or advise testing for that as well, and not just for what you expect. - And would that be done essentially by trying to just increase the number of virtual users until you start really seeing errors coming back, and then you know: okay, that's more or less roughly the upper threshold that my API can handle. Is that right? - Yeah, yeah. That would be a good start for sure.
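In k6, that kind of ramp-up is usually expressed with stages instead of a fixed vus count; a sketch with arbitrary targets:

```javascript
export const options = {
  // Ramp the number of virtual users up until the API starts failing,
  // then back down, to find the rough upper limit.
  stages: [
    { duration: '1m', target: 50 },  // ramp up to 50 VUs
    { duration: '2m', target: 200 }, // keep pushing
    { duration: '2m', target: 500 }, // well past the expected load
    { duration: '1m', target: 0 },   // ramp down
  ],
};
```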
And also, just another thing that I saw in the chat: Nicole actually pointed out that it could be good, for instance for your status, no-DB endpoint, to actually use some kind of comparison against something in the response as well, other than just that it's returning a 200. 'Cause you could be served 200s that are completely useless, because they contain only garbage data. So for instance, in this case, you had a... you're responding with status up: true, right? Or something like that from your... - Yes, we can open that up. Here we have it. - Yeah, then you could check the response and make sure that it actually contains the key up with the value true. - We could do that in the load test. That would be, yeah... and then that would be another check. And then I would check that the... r.body. - Yeah. - Is equal to... and then, or I would probably want to parse that, because it's JSON, right? And I think I'm doing that somewhere here, so. - Yeah, you have a built-in for that. So you can do r.json(), I think it is. - Okay, as a function. - Okay, I see, really nice. Oh, that's a function. - Yeah. And then .up, for instance, and then just compare that to true.
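Spelled out, the check being discussed would look roughly like this, assuming the status endpoint lives at /status and responds with { "up": true }:

```javascript
import http from 'k6/http';
import { check } from 'k6';

export default function () {
  const res = http.get(`${__ENV.BASE_URL}/status`);

  check(res, {
    'status is 200': (r) => r.status === 200,
    // r.json() parses the response body, so we can verify the actual payload,
    // not just the status code.
    'reports up': (r) => r.json().up === true,
  });
}
```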
- Let's give that a go, and let's see what configuration we have here. So this is something, I guess, we didn't point out before, but all of these checks, we'll see them at the top, in green or so. Now we see we have this other one, and indeed that was true for all of them. - Yeah. - If some of them fail, essentially it will show how many of them failed, right? - Yeah, exactly. 'Cause what you could
have here, for instance, is that if you run microservices, or you run auto scaling, say a replica set or whatever, to actually scale the service up and down depending on the load, then you could have a service, or one instance in that replica set, that responds with up: false 'cause something is wrong, but it still responds with 200, 'cause the request was okay, but the health of the service is actually not okay. So checking for that might catch some outliers or some false positives that you otherwise wouldn't catch. - Got it. And so, as we're sort of
starting to wrap up soon, this is a good moment for the viewers: if you have any questions, this is a good time to ask. We have Simme here, who's been really helpful so far, and I'd love to hear your thoughts about... so, in many situations, your services are gonna be used from more than one single geographical region. And if you're deploying a relatively simple app that isn't deployed to multiple geographical regions, then you probably wanna also get a rough idea of how slow it is for users joining from different countries. How do you think about that, and how can that be achieved with K6? - Well, it definitely
makes sense to do that just as you say, for instance, the user experience
will be a lot different depending on if you're in Japan as compared to if you're in New York and you host your service or your system in US East, for instance, then your users in New York will have a significantly
better experience than the ones you serve
from, say, Tokyo or Kyoto. And while K6 itself, the (indistinct) tool, doesn't have any support for running distributed load tests, as in geographically distributed load tests, the k6 Cloud service that we offer actually does. So it's basically the same as the K6 OSS, but it's managed, and you'll be able to specify what load zones you want your load to be generated from. And other than that, we also have a bleeding-edge alpha project where you'd be able to use a Kubernetes operator to spin up multiple K6 tests on different nodes concurrently and run the tests that way. So that one you'll be able to find as open source on GitHub, in the k6io/operator repository. - I see. So this would be a way, if users wanna do load testing that is geographically distributed, you have a cloud offering that does that. And I imagine it also visualizes... I think I briefly looked
at some of the results that I had in a cloud test run. And the alternative to that would be to use something like maybe the K6 operator, which would allow you to essentially run your own load testers distributed across your cluster, your Kubernetes cluster.
experiments concurrently. And then that would give you a sense of what that would look like. But as I said, that's
a really alpha project. So there are a lot of kinks and hiccups with that one that you might stumble upon. So if you want to just load test from multiple regions at the same time then the K6 cloud has
you covered for sure. - Hey, great. So we're coming towards the
end of this live stream. We didn't have any questions so far. Are there any other thoughts that you wanna leave with the audience before we wrap up, Simme? (laughing) - Yeah, that's a hard one, right, to come up with on the spot. But I definitely think that more people should test,
whether it's performance tests or unit tests or integration tests. I definitely think that all teams should consider whether
they have solid practices for that integrated into their everyday developer workflow. And if they don't, they
should probably look at that. 'Cause I mean, we all do ship code from time to time, and we need to make sure we've got that covered, that we did enough quality measures, for sure. So, if you want to talk more about tests, or anything related to testing or tech development, feel free to hit me up on Twitter and we can discuss it further, for sure. - So you have his
Twitter handle below here. And there was one thing that I sort of skipped; if we have a couple more moments, if you don't mind, Simme, I'm gonna share my screen again. So I've already configured this to run in GitHub Actions. And so I have a pipeline here; a workflow, I believe, is what the term is called in GitHub Actions, and one of them is to run the tests. And that will really just use Jest in order to inject requests into the API. In this test environment, a real Postgres database is actually spun up, and this happens here: we have this Postgres service container, and then the tests run and they actually inject real
requests that go to the API. I'm gonna also share the URL to this, if you're interested in exploring it. And the interesting thing is that, just before the live stream, I set up K6 to also run; there's a GitHub Action built for that. And here I'm running this load test in GitHub Actions, and as you can see, we have all of the results. And so, I just wanted to point that out, because that's also a useful way: you can use this to run the cloud tests with the hosted service, but you can also just run them locally from the GitHub Action, and you'll have the results in the action run. So that was just the last
thing that I wanted to demo. And before we wrap up, I also want to say that
we have a lot more content coming up in these live
streams that we wanna cover on running Prisma in production. We're gonna talk a lot more
about continuous integration and some of the workflows
for integrating Prisma into a continuous integration pipeline and doing some more
advanced things like testing and a lot more. So stay tuned. If you haven't already, hit the subscribe button, and that way you'll be notified of all of the upcoming live streams. Simme, I'd really, really like to thank you for joining me today. It's been a great pleasure having you and just learning from your experience and knowledge. And I'm really excited about K6. - Thanks for having me. It was a blast for sure. - All right. So on that note, goodbye. - Goodbye.