Kubernetes pod autoscaling for beginners

Captions
This is a video about scaling pods on Kubernetes. When I started using Kubernetes for the first time, I got really excited, because traditionally, every time I had to create new web applications, I had to go and provision infrastructure first. When you have many web applications, like front ends and back ends, this can be pretty tedious, and it can slow development teams down while they wait for you to provision infrastructure. With Kubernetes, I can simply hand over a YAML file, and developers can plug that YAML file into their CI/CD pipelines and deploy it to Kubernetes with ease. This process takes a couple of minutes rather than days, and no one has to constantly provision infrastructure anymore.

The deployment file represents the source of truth for our infrastructure. It tells us what container image to deploy, what port to expose, what health probes to monitor, any configs and secrets that may be required, resource requests and limits, as well as the number of replicas. When you start out with Kubernetes this is great, but sometimes hard-coding the number of replicas is not ideal. Let's say the demand on your application grows over time: you might need eight pods during the day to cope with the traffic, and at night you might only need one.

To demonstrate this, I'm going to run a simple Kubernetes cluster using kind. I'm running Kubernetes 1.18, and kind creates the cluster inside a Docker container. Because I'm running in Docker, I've allocated 6 CPUs, so I'll have a Kubernetes machine with 6 CPU cores. To demonstrate, I have a simple Golang application that runs a web server and generates a bit of CPU load with a for loop every time it gets a web request. I also have a very simple deployment YAML file that tells Kubernetes to run one replica of the application: this is the container to run, expose it on port 80, and here are a few resource requests and limits. To deploy it, I simply change directory to where that YAML file is located, run kubectl apply, and off it goes.
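The captions don't include the manifest itself, so here is a minimal sketch of what a deployment like this could look like. The name, image, labels, and memory and limit values are placeholders; only the single replica, port 80, and the 500-millicore CPU request come from the demo.

```yaml
# Hypothetical deployment manifest for the demo app.
# Only replicas: 1, containerPort: 80, and the 500m CPU request
# come from the video; everything else is a placeholder.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: application-cpu          # placeholder name
spec:
  replicas: 1                    # hard-coded count the autoscaler will later manage
  selector:
    matchLabels:
      app: application-cpu
  template:
    metadata:
      labels:
        app: application-cpu
    spec:
      containers:
      - name: application-cpu
        image: example/application-cpu:1.0.0   # placeholder image
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 500m            # the "sweet spot" the whole demo revolves around
            memory: 64Mi         # placeholder
          limits:
            cpu: 2000m           # placeholder; must allow the ~1500m spikes seen later
            memory: 256Mi        # placeholder
```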
Now, to simulate traffic, I'm going to deploy a simple traffic generator. If you take a look at the traffic-generator YAML file, it's just an Alpine container that sleeps in a loop, which lets me install a small load-testing utility and generate some traffic against our application. To start generating traffic, I exec into that pod, install a lightweight load-testing utility called wrk, and then run it with five connections over five threads. I give it a very long duration for the purpose of this demo, and I also tell it to close every connection, just so that it creates an unnecessarily high number of connections, and point it at our example application. That goes ahead and starts up our load tester.
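A sketch of those steps, assuming the pod is named traffic-generator and the app is reachable through a service called application-cpu (both names, and the duration, are assumptions):

```sh
# Exec into the traffic-generator pod.
kubectl exec -it traffic-generator -- sh

# Inside the pod: install wrk from the Alpine package repos, then run it
# with 5 threads and 5 connections for a long duration, closing every
# connection to drive the connection count up.
apk add --no-cache wrk
wrk -t 5 -c 5 -d 900s -H "Connection: Close" http://application-cpu
```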
So now that we have a high amount of traffic on the system and high CPU, I'd expect our application to be struggling and customers to be having a hard time. How do we know this? Usually you'd be running something like Prometheus and have a dashboard with alerts saying "high CPU", or you'd have customers complaining that the site is slow. That's great for getting us alerted, but how does Kubernetes know what's going on, and how can we make Kubernetes smart enough to react to the situation?

Kubernetes has a component called Metrics Server. Metrics Server runs in the kube-system namespace and monitors key metrics, such as the CPU and memory of pods and nodes. It gathers this information at an interval and writes it back to the API server, which means we can build pipelines around container metrics, like autoscalers. Metrics Server is built and maintained by the Kubernetes community; you can find it on GitHub, where the readme covers the use cases, the requirements, and how to deploy it. It's important to deploy the correct Metrics Server version for the Kubernetes version you're running. In this demo I'm on Kubernetes 1.18, so I headed over to the releases page and downloaded the components.yaml for Metrics Server 0.3.7. In my Docker development YouTube series GitHub repo I have a kubernetes folder containing an autoscaling folder with a readme, and there I list all the steps we'll go through in this demo to showcase the Horizontal Pod Autoscaler. Under that repo there's a components folder, and I pasted the components.yaml into the metric-server folder, so that's the Metrics Server I'm running here.

Now, if you're running a kind cluster, you'll probably want to disable TLS verification for Metrics Server in order to get it to work. If you take a look at the YAML file and go down to the container spec, I needed to add two arguments to get this to work with kind. If you're running Metrics Server in production, you'll want to remove those two lines.
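The captions don't name the two arguments, but for kind they are usually the kubelet TLS flags shown below. This is an inferred sketch of the container args, not a copy of the video's file (image tag as per the 0.3.7 release):

```yaml
# metrics-server container spec with the two kind-only args added.
# They disable kubelet certificate verification, which is fine for a
# local demo but should be removed in production.
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server/metrics-server:v0.3.7
        args:
        - --kubelet-insecure-tls                        # kind/demo only
        - --kubelet-preferred-address-types=InternalIP  # kind/demo only
```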
To deploy it, I just change directory to where that YAML file is located and run kubectl apply in the kube-system namespace. That deploys Metrics Server with all the role bindings and components it needs to run. To check it out, I can run kubectl get pods, also in the kube-system namespace, and we should see Metrics Server up and running. Give it a couple of minutes, and then kubectl top nodes should start showing metrics. After a couple of minutes, I run kubectl top nodes and can see that I have one node, currently sitting at 63% CPU with 162MB of memory used. I can also run kubectl top pods and see that my application is using 1499 millicores and 4MB of memory, and that my traffic generator is using a fair bit of resources as well.

So now Kubernetes knows exactly how much CPU and memory our application is using, how much CPU and memory our node is using, and how much CPU and memory we've allocated to each of our pods. This allows us to do some cool things. Kubernetes knows we have a six-core machine: one core equals 1000 millicores, times six, so we have 6000 millicores on this node. Kubernetes knows we expect each pod to use 500 millicores, as defined in our YAML file. Dividing 6000 by 500, the scheduler knows it can fit roughly 12 pods onto this machine. We can visualize each pod almost like a micro virtual machine with 500 millicores of CPU. Because of the high traffic load we saw earlier, our pod is currently sitting at 1499 millicores; it's effectively running at 300% of its requested CPU. Adding more pods lets the CPU usage spread out and come down for each pod. The more pods we add, the closer each pod gets to using roughly what we've allocated, and that's good: it's important that the allocated CPU is close to the desired optimal CPU for that workload.

So: we know we requested 500 millicores of CPU for our application, and we can see by running kubectl top pods that it's currently using 1463 millicores, almost 300%. We also know our monitoring systems are throwing alerts and customers are complaining that the site is slow, so it makes sense to scale the system up. I run kubectl scale and scale to two pods to see what happens. If I run kubectl get pods, we can see we have a second pod now. It might take some time for the metrics to populate, so you have to be patient; finally, kubectl top pods shows we're now averaging about 750 millicores of CPU per pod. As I add more pods, the CPU load spreads among them, which is really helpful when scaling up. So I scale up even further to get closer to that 500-millicore threshold we want: I run kubectl scale and go up to four replicas. kubectl get pods now shows four pods up and running, and once they're up, the CPU load is much better: much more evenly spread within each pod, and below the 500 millicores we've requested.

Every workload is going to be different, so it's important to look at your monitoring system, at things like latency and other metrics, to find the sweet spot for setting your CPU request value correctly. In this demo, the sweet spot for my application is 500 millicores, as specified in my YAML file; you want to find a value that lets you utilize as much of the CPU as possible. Every workload will be very different, but the fundamentals stay the same. That's why it's important to understand the impact of setting the right request values for CPU and memory in your deployment YAML.

Now let's take a look at how we can scale these pods automatically based on these values. The Horizontal Pod Autoscaler is what's used to scale pods, and it's supported by most Kubernetes versions. To check whether it's supported, run kubectl api-versions; you should see the autoscaling APIs listed, and in my case there are three autoscaling API versions. What we want to do is deploy a Horizontal Pod Autoscaler for this deployment, which lets us specify how much CPU we want each pod to use as the sweet spot. Basically, I tell the HPA to check and make sure that my application stays below 95 percent of that 500 millicores. To do that, I run kubectl autoscale, target my deployment, and say I want to scale between a minimum of 1 and a maximum of 10 pods, scaling up when CPU utilization goes above roughly 95 percent of the 500 millicores I've requested. This means that if a pod starts using more than about 480 millicores, we'll see a scale-up event. I go ahead and run this, and it creates a Horizontal Pod Autoscaler. For more information, I can run kubectl get hpa (adding -o yaml for more detail), which shows the percentages of CPU utilization: currently we're using 0% of the 95% target for this deployment, as we only have one pod and no load.
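The command from the demo, with the follow-up checks (the deployment name application-cpu is an assumption):

```sh
# Create an HPA: 1 to 10 replicas, targeting 95% of the requested CPU.
# 95% of the 500m request is roughly 480m per pod.
kubectl autoscale deployment application-cpu --cpu-percent=95 --min=1 --max=10

# Inspect current vs. target utilization and the HPA's decisions.
kubectl get hpa
kubectl describe hpa application-cpu
```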
If I start generating load again, you'll see the pod use roughly 1400 millicores, which means we'll probably be sitting at around 300 percent of the 95 percent target, and that will trigger a scale-up; the autoscaler will keep scaling until we're close to that 95 percent value. You can see traffic has come in and we're sitting at about 1600 millicores. If I run kubectl get hpa, we're at 320 percent, so this triggers a scale-up event. We can also describe the HPA by running the describe command: the HPA has done the calculations, it's currently sitting at 321 percent against the 95 percent target, and it's determined that it needs four pods to scale up and cater for this demand. kubectl get pods shows the autoscaler has spun up four pods in total, and kubectl top pods shows the CPU spread among them. We're still quite hot on CPU, at about 140 percent of the 95 percent target, so the desired replica count has gone up from four to six. kubectl get pods now shows six pods, and kubectl top pods shows we're now sitting roughly around the 500-millicore sweet spot, very close to the 95 percent target.

It's important to know that CPU can be very spiky; usage isn't constant, so you'll find the pod autoscaler might not always hit the 95 percent value on the spot. It might be slightly above or slightly below, and that's probably okay. The Horizontal Pod Autoscaler takes a while to scrape metrics from Metrics Server so that it doesn't scale up aggressively when brief CPU spikes happen. You can see that after waiting some time, utilization has come down from 105 percent to 74 percent, so we're now within the threshold at six replicas, sitting below that 500-millicore sweet spot.

Another important piece of information: Kubernetes 1.18 introduced the v2beta2 Horizontal Pod Autoscaler API, which allows you to supply scaling policies. This lets you override the default behavior and set scale-down and scale-up policies with the periods and values you want to use. So with Kubernetes 1.18 and up, you can override the Horizontal Pod Autoscaler's default values to cater for your needs.
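A sketch of what those policies look like in the autoscaling/v2beta2 API available from Kubernetes 1.18; the behavior values here are illustrative choices, not the defaults:

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: application-cpu            # assumed name, matching the demo deployment
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: application-cpu
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 95     # the 95% target used in the demo
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0    # react quickly to spikes
      policies:
      - type: Pods
        value: 4                       # add at most 4 pods
        periodSeconds: 60              # per 60-second window
    scaleDown:
      stabilizationWindowSeconds: 300  # wait 5 minutes before scaling down
      policies:
      - type: Percent
        value: 50                      # remove at most half the pods
        periodSeconds: 60              # per 60-second window
```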
Without really good monitoring and insight, it can be quite difficult to find the right request values for CPU and memory. There's another piece of software, called the Vertical Pod Autoscaler, that you can run in recommendation mode: it sits and monitors the actual CPU and memory usage over time and provides you with recommendations for what to put in your YAML file.

I hope this video was helpful and helped you build a foundation for how resource utilization works in Kubernetes, why it's important to allocate the right CPU and memory values, and how that matters for autoscaling pipelines. Be sure to like and subscribe, check out the community page in the description box below, and if you want to support the channel further, become a member. As always, thanks for watching, and until next time: peace.

Info
Channel: That DevOps Guy
Views: 11,305
Keywords: devops, infrastructure, as, code, azure, aks, kubernetes, k8s, cloud, training, course, cloudnative, az, github, development, deployment, containers, docker, rabbitmq, messagequeues, messagebroker, aws, amazon, web, services, google, gcp, autoscaling, hpa, pod, autoscale, auto, scale
Id: FfDI08sgrYY
Length: 13min 22sec (802 seconds)
Published: Wed Sep 02 2020