Load Testing Kubernetes: How to Optimize Your Cluster Resource Allocation in Production [I]

Video Statistics and Information

Captions
Well, thanks everybody for coming out and sticking around for the last talk of the day. This is my first ever talk at a conference, so this is really exciting. If I've learned anything about talks from other people, you always start off with a question, so my question is: who here is setting resource limits on their pods or containers? Okay, quite a few people. Now, who here is using load testing techniques to set those resource limits? Not quite as many hands, but still a few people. Interesting. Today I want to convince you that using load testing techniques to set those limits is a good idea.

I'm an engineer at Buffer, and a lot of what I do is help us transition away from a monolithic application, where everything is written in PHP and a few front-end frameworks, toward Kubernetes. We've had a fair amount of success with this, and at the moment about 75% of all of our production traffic is served by Kubernetes.

I'm going to jump into a case study. We started with a pre-existing endpoint on our monolith, one of our higher-throughput endpoints. This particular endpoint's responsibility is to serve the number of times a link has been shared within the Buffer app. Buffer is a social media application, so knowing how many times somebody has shared your blog is important information; people actually use this to put a button on their blog showing how many times it has been shared, so you can gauge interest in something. We eventually settled on a design using Node and DynamoDB; it happened to be the right performance and price range for what we were trying to accomplish.

So we built the service and started to roll it out to Kubernetes. We had about four replicas to start with, and we manually verified that things were working with curl. We shifted about 1% of our traffic over to this service and things were looking great; we had monitoring hooked up with Datadog, and you could barely notice there was any load on each of the containers. Same story at 10 percent. Then we scaled up to 50 percent of all of our traffic, and this is where things started to get a little hairy. The first thing we did was scale our replicas up five times, so we had 20 pods running. This helped, but we still got that dreaded OOMKilled. So we shifted our traffic back onto the monolith and started to investigate what was actually going on.

What had happened is that I had copied and pasted a deployment file from another service, an old service that didn't have as much traffic, and we were still pretty new to this at the time. That deployment file contained some resource limits, so when we ran kubectl describe, the pods were reporting OOMKilled.

Let's talk a little bit about what resource limits actually are. They're something that can be set on both CPU and memory. When they're not set, containers can run unbounded and can actually take up all the resources on the node they're on. When the limits are exceeded, Kubernetes is going to go through and kill those pods.

So how do we go about actually setting these things? It's helpful to discuss what optimal limits look like. It means that pods have enough resources to complete whatever tasks they're given, whether that's serving HTTP or a worker doing work on a queue, and also that each node can run the maximum number of pods, so there's no waste.
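As a point of reference for what the talk describes, CPU and memory requests and limits are declared per container in the Deployment spec. This is a made-up sketch, not Buffer's actual manifest; all names, images, and values here are placeholders:

```sh
# Hypothetical sketch (not the speaker's manifest): a Deployment whose single
# container declares explicit CPU/memory requests and limits.
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: links-service              # made-up name for illustration
spec:
  replicas: 4
  selector:
    matchLabels:
      app: links-service
  template:
    metadata:
      labels:
        app: links-service
    spec:
      containers:
      - name: links
        image: example/links-service:latest   # placeholder image
        resources:
          requests:                # what the scheduler reserves on a node
            cpu: 100m              # a tenth of a core
            memory: 128Mi
          limits:                  # exceeding these gets the container
            cpu: 200m              # throttled (CPU) or killed (memory)
            memory: 256Mi
EOF
```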
It's also helpful to talk about the ways things can go wrong, and the ways they can go right, when you're allocating resources. The first one is under-allocation. This one is really obvious, and it's what happened to us: we didn't have enough resources allocated, and what Kubernetes does is recognize that one of your limits has been crossed and kill the container. The other one is over-allocation. This is where you've given your pod or container more resources than it will ever actually use for the load you're trying to serve. This is a tricky problem to detect, because things can work for a while; you can be handling all of your load and things look great, but it becomes a problem when you start to scale up your replicas. Let's say you've got one container that's wasting five megabytes of memory; scale that up to a hundred replicas and you're wasting five hundred megabytes, and so on as you scale up.

Here's a little picture that illustrates this. If I give one workload half the resources on a node, that means I can only run two of its replicas on that node. If I give it the appropriate amount of resources, that's an extra container the node can be running. If you take away anything from this talk, I think it's this: if you set your container resource limits correctly, you're going to save money and you're going to be utilizing the resources that Kubernetes provides; that's one extra pod in that example.

Let's talk a little bit about how Kubernetes monitoring works under the hood. There's a lot going on in this graph, but I wanted to show it first because I think it's important. It shows that there's a centralized component in the middle called Heapster that's collecting information and also providing an API to other parts of the system. Each node effectively feeds into Heapster, and the Kubernetes master makes that information available. Heapster also pushes things off to a storage backend, which is pretty important too.

Let's dig a little into the details. The first piece is cAdvisor, which monitors all of the containers on a given node and gathers information about network, CPU, filesystem, and memory utilization. On top of that, the kubelet gets information from cAdvisor and uses it to make decisions about whether things should be killed, and also for some monitoring. Then on top of that there's Heapster, which essentially aggregates the information from all the kubelets on all the nodes, makes it available, and pushes it off to a storage backend. That could be InfluxDB with Grafana, it could be Google Cloud Monitoring, and there are a bunch of third-party backends you could also use.

So how do we go about setting limits? The goal is really just to understand what one pod can handle. You start with a really conservative set of limits; they start really low. You change one thing at a time and observe the changes. You try to be scientific about it, because if you change too many things it's hard to understand what actually happened when you changed something.
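Before changing anything, it helps to see where a pod currently sits relative to its requests and limits. A few standard kubectl commands cover this; this assumes a metrics pipeline such as Heapster (or metrics-server on newer clusters) is running, and the pod name is a placeholder:

```sh
# Current CPU/memory usage per pod (served by Heapster / metrics-server).
kubectl top pods

# Requests, limits, and recent events (e.g. OOMKilled) for a single pod.
kubectl describe pod <pod-name>

# How much of each node's capacity is already requested/limited; this is
# where over-allocation shows up as wasted headroom.
kubectl describe nodes
```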
There are a couple of testing strategies I'm going to employ to actually set the limits. The first one is a ramp-up test: we start with one client and then scale up the number of clients, watching what happens to the response times. The idea is that you eventually want to find a breaking point; you're looking for major changes, where maybe you cross a threshold and all of a sudden you're getting a bunch of 500s. After that, we essentially take the slice right before things broke and operate just under that breaking point for an extended period of time. What we're looking for here is any major change in response times; maybe you get some variance, but what you want to see is pretty consistent load and pretty consistent response times. This is also where you make fine-tuned adjustments.

So I'm going to do a demo: I'm going to set limits for an etcd pod. [The speaker spends a moment adjusting the projector and font size.] I'm watching the pods, and I've got one nginx container running right now. If you remember, I said I was going to be setting limits for etcd. The setup is an nginx server proxying requests to etcd, and I'm also exposing a loader.io token. I'm using loader.io to run these load tests, and I need to expose a token to give it permission to run them; that's why this looks a little funky, and I thought it was important to call that out.

I'm going to create a deployment, and while that's creating I'm going to take a look at what the deployment file actually looks like. It's pretty low on resources: 50m of CPU, which is roughly one twentieth of a core, and four megabytes of memory. It looks like the container came up, so I'm going to exec into it to grab a shell, and I'm just going to show you that there's some data in that etcd service. Don't worry, this is expected: the resources are so constrained here that I can't even run a command, so it's a pretty good sign that I should give this thing more room to breathe.

I'm going to edit the deployment and give it a bit more memory, increasing it to 250, and we're going to watch that container restart. If you remember the monitoring discussion from before, I'm actually going to tap directly into cAdvisor and use it to look at the resource limits. The container came up, so here's cAdvisor. It's a little clunky to get to, but once you're there you can see the resource limits showing up, including the increased memory. It shows CPU, and in this case I've only got one core, so total usage and usage per core are going to look the same, and there are also some memory graphs to look at.
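The demo flow above boils down to a handful of kubectl commands; the pod and deployment names here are placeholders, since the full manifest isn't shown on screen:

```sh
# Watch the pods while the deployment rolls out.
kubectl get pods -w

# Grab a shell inside the etcd container; with limits this tight, even a
# simple command can fail to run.
kubectl exec -it <etcd-pod-name> -- sh

# Raise one limit at a time, then watch the container restart with the new
# resources.
kubectl edit deployment <etcd-deployment-name>
kubectl get pods -w
```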
With the 250 I set, I have a lot more room to breathe now; if I had looked before, memory would have been at about a hundred percent. Since things are looking good, I'm going to start running that ramp-up test. Here's a previous test: we're going to go from 0 to about 250 clients. You'll see that the CPU utilization is starting to creep up a little. The data is not exactly real-time, but it's about as close as you can get. As you can see, memory jumped up a little bit, but not quite as much as the CPU. I'm still waiting for another update; it's at 0.05. Just from having done this before, I know this is the bottleneck: it has hit its resource limit, and it flatlined once it reached that point. That's how I know what I need to increase now. If I take a look at about 30 seconds, the halfway point, I'm at around 450 to 500 milliseconds of response time. If this was in fact the bottleneck, I would expect that number to go down.

So I'm going to edit the deployment again and increase the CPU to 500m. Unfortunately, this isn't going to come up, and I kind of wanted to show this because it's something that can happen when you're setting resource limits. If I do a describe on that pod, I can see that it failed to get scheduled because there wasn't enough CPU available. This particular cluster is really, really small, so it's expected to do that. I'm going to set it to something a little more reasonable and rerun the test, and what I'm expecting now is to see something better than that roughly 500 millisecond response time at the 30-second halfway point. Unfortunately, the cAdvisor view changes, or the IDs of things change, when the pod goes down, so I have to go grab it again; you wouldn't want to do this in production.

We're at about the halfway point and we can already see we're at around 133 milliseconds. At the moment CPU is in fact our bottleneck, and I don't have any more resources on this machine to give it, so this is effectively the most I can do with this setup. You can also see that we've hit our resource limit of 0.15. At this point I can take a look at loader.io, which is pretty nice because it shows you what requests per second you were actually doing, and it looks like we're somewhere between 800 and 900 requests per second. So the next test, the duration test, I would probably want to run with something like 800 clients.

I've got another test prepared; it's going to run 800 clients over one minute. If you're doing this for production, you probably want to run it as long as you can: something more than 10 minutes, maybe an hour, depending on what you have bandwidth for. I'm going to go ahead and kick that test off, and really what you want to see is that once things level out, your response times remain relatively flat. This graph is looking kind of interesting; usually it sticks around 100 milliseconds, so I would probably want to dig into what happened there and why it increased for a certain period of time. But it could also be that somebody here is running load tests against this right now.
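The demo drives load with loader.io. If you want to reproduce the same two tests locally, a generic HTTP load tool works as a stand-in; the example below uses ApacheBench (ab), which is my substitution rather than what the speaker ran, and the host and numbers are placeholders:

```sh
# Ramp-up test: step up the number of concurrent clients and watch for the
# point where response times or error rates break.
for c in 10 50 100 150 200 250; do
  echo "== $c concurrent clients =="
  ab -q -t 30 -n 200000 -c "$c" http://<service-host>/ \
    | grep -E 'Requests per second|Time per request|Failed requests'
done

# Duration test: sit just under the breaking point for as long as you can
# afford (the talk suggests more than 10 minutes, maybe an hour).
ab -q -t 600 -n 2000000 -c 200 http://<service-host>/ \
  | grep -E 'Requests per second|Failed requests'
```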
When you're going through this process, it's really important to keep a failure log: write down the ways that things failed. Things are going to break when you do this, especially when you get to the point where you're crossing the threshold of a particular pod. Some of the things we've seen: memory slowly increasing; CPU pegged at the resource limit; 500s; high response times. A little more interesting is when you see large variance in response times; we had a cron job running that was causing one of those, and it was pretty interesting to catch that. Dropped requests are another common one.

When we went through this process with the links service, what we were really trying to accomplish was understanding how things break. For us, things don't feel production-ready until you go through a process like this, because you don't really understand what's going to happen when you push things toward the edge. Really it's about increasing predictability, and that's important because when things are predictable you can understand how to scale up; you have a sort of unit of scaling, though it's not exactly linear. It's also important to do this with more pods to get closer to the load you want to match, but really it's just about predictability.

Just looking forward with Kubernetes: there are so many great tools for ops and for things done at the cluster-wide level, but I think developers really want to be able to dig into one thing at a time, get their hands dirty, and do some debugging. There's some tooling that could exist that just doesn't yet, and I think that's the natural progression of Kubernetes. Some of the talks earlier today reflected this too: developers want to be using Kubernetes as well. So I just want to say thanks, everybody, and open it up for questions.

[Audience question about JVM tuning] You should do a talk! Most of the experience we've got is with PHP and Node applications, so we haven't really done a whole lot of JVM tuning; I can't really speak to that, sorry.

The question was: have we thought about automating this process? The answer is yes, we have. loader.io actually exposes some API endpoints, so we could get to the point where we automate this. Maybe working with the Deis team and packaging some of those things up, there might be something cool where you could automate that whole flow. But yeah, we want to do that.
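A very rough sketch of what that automated flow could look like, using kubectl set resources to step through candidate limits and a local load tool in place of the loader.io API; all names, values, and thresholds here are assumptions:

```sh
# Hypothetical automation loop: for each candidate CPU limit, roll the
# deployment, wait for it to settle, run a short load test, and log the
# result so the runs can be compared afterwards.
for cpu in 100m 200m 300m 400m; do
  kubectl set resources deployment <deployment-name> \
    --requests=cpu="$cpu",memory=256Mi --limits=cpu="$cpu",memory=256Mi
  kubectl rollout status deployment <deployment-name>
  echo "== cpu limit $cpu ==" >> results.txt
  ab -q -t 60 -n 500000 -c 200 http://<service-host>/ \
    | grep -E 'Requests per second|Failed requests' >> results.txt
done
```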
The next question was how do you deal with noisy neighbors, and that's a really good question. Doing load testing like this, you want to be as close to what your production setup looks like as possible, and if your neighbors are in fact noisy in production and you scale up your traffic with real load, you're still going to be seeing those effects. So I think it's okay to be load testing within the bounds of your real traffic, because that's closer to getting a real story of what's happening, rather than the other way, which would be spinning up another cluster and running the test on a cluster that's running only that one thing. So I guess I like that there are noisy neighbors; it actually helps you. Any other questions?

[Audience question about instance types] Yeah, you're absolutely right. I think that's a really hard question to answer because it kind of depends on the load you're trying to handle. Some people, like the folks doing AI work, might need GPUs on their instance type, or maybe you need instances with lots of memory because you're running Redis on them or something like that. For us, more compute tends to be better; we use the compute instances, though I can't remember exactly which one, xlarge or something. I'm more on the application side of things.

The question was: do we have resource limits on all of our pods? The answer is yes, we do. And do we use LimitRange? We do not; I'm going to look into it. Any other questions? Well, anyway, thanks everybody. [Applause]
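For reference on that last question: a LimitRange is a namespace-level object that applies default requests and limits to containers that don't declare their own. A minimal sketch, with purely illustrative values:

```sh
# Minimal LimitRange sketch: containers created in this namespace without
# their own requests/limits get these defaults applied.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
spec:
  limits:
  - type: Container
    default:               # default limits
      cpu: 200m
      memory: 256Mi
    defaultRequest:        # default requests
      cpu: 100m
      memory: 128Mi
EOF
```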
Info
Channel: CNCF [Cloud Native Computing Foundation]
Views: 6,735
Rating: 4.8933334 out of 5
Keywords: CloudNativeCon Europe 2017, KubeCon, CloudNativeCon + KubeCon Europe 2017, CloudNativeCon 2017, CloudNativeCon Europe, CloudNativeCon + KubeCon, CloudNativeCon + KubeCon 2017, KubeCon Europe 2017, CloudNativeCon, KubeCon Europe, KubeCon 2017, CloudNativeCon + KubeCon Europe
Id: _l8yIqMpWT0
Length: 31min 26sec (1886 seconds)
Published: Mon Apr 10 2017