Understanding CPU & Memory with the Kubernetes Vertical Pod Autoscaler

Captions
Have you ever asked yourself how much CPU and memory your application needs? In Kubernetes this is very important: the scheduler needs to know how much CPU and memory to allocate to pods in order to schedule them onto nodes. Otherwise, all the pods will compete for and have access to all the CPU and memory on the node, and that can lead to resource contention. In traditional systems we deploy processes onto machines, and those processes have access to all of the machine's CPU and memory, which can lead to performance contention between them. In Kubernetes we deploy containers to a machine, and containers allow us to restrict the resources each container can see and use. Kubernetes leverages this feature so containers can share machine resources and lower the chance of resource contention.

So we know setting resource requests and limits on our pods is important for scheduling purposes, but what about autoscaling? For autoscaling in Kubernetes it's very important to set requests and limits on pods as well. Why is that? When we set resource request values, the scheduler knows how many pods it can fit onto a node. When the node becomes full, the cluster autoscaler can add a new node to allow scheduling to continue. When a pod uses too much of its requested CPU or memory, the pod autoscaler can add another pod to give the workload more resources. So now we can see why it's important to set request values for CPU and memory in our pods: they are used for both the scheduling and the autoscaling pipelines.

But how much CPU and memory do we actually need, and how do we determine what values to set? Traditionally you can use your monitoring tools, but in today's video we're going to take a look at a tool called the Vertical Pod Autoscaler, which is designed to deal with exactly this problem. So without further ado, let's go.

If we take a look at my GitHub repo, I have a kubernetes folder, and in there I have an autoscaling folder. In the autoscaling folder there's a readme containing the full Kubernetes autoscaling guide, where we talk about cluster autoscaling as well as horizontal pod autoscaling. As I mentioned earlier, the cluster autoscaler allows us to add more nodes to the cluster when the cluster becomes full, and the horizontal pod autoscaler allows us to scale our pods up and down based on metrics like CPU and memory. For this video we're specifically going to be looking at vertical pod autoscaling, which lives under the kubernetes/autoscaling/vertical-pod-autoscaling folder. There's a readme in there that shows the full vertical pod autoscaling guide with all the steps I'm about to show you today, so be sure to check out the link to the source code down below so you can follow along.

The first thing we're going to need is a Kubernetes cluster. I like to use a tool called kind, which lets me spin up disposable Kubernetes clusters locally in Docker containers, test features, and delete the cluster when I'm done. Getting a Kubernetes cluster is really simple: I just say kind create cluster, I call it vpa, and I run Kubernetes 1.19. I go ahead and run that, and it gives me a disposable one-node Kubernetes cluster in a Docker container. When that's done, I can say kubectl get nodes, and we can see we have a one-node Kubernetes cluster.
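As a minimal sketch of those two commands (the exact kindest/node patch version is an assumption; any 1.19 node image will do):

# spin up a disposable single-node cluster called "vpa" on Kubernetes 1.19
kind create cluster --name vpa --image kindest/node:v1.19.1

# verify the node is ready
kubectl get nodes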
Now, in order for autoscalers to work, they are driven by metrics, and to get metrics we need a component called Metrics Server. Most cloud providers have Metrics Server pre-installed in the kube-system namespace, so usually you don't have to do much. If you do need to install Metrics Server, head over to the Metrics Server GitHub repo, but just before you do that, make sure you scroll down, read the requirements, and check the compatibility matrix, as this may change over time. They indicate that we need Metrics Server 0.3.x for Kubernetes 1.8 and up. To download it, I head over to the releases page, where we can see version 0.3.7 and its components.yaml file; this is the cluster manifest you need to deploy Metrics Server. I've already downloaded that YAML file and placed it under kubernetes/autoscaling/components, in a metric-server folder, and I renamed it to metricserver-0.3.7.yaml.

For production you can take that file and apply it as is, but if you're using a tool like kind, you'll want to open the YAML, go down to the Metrics Server deployment, and add two arguments, --kubelet-insecure-tls and --kubelet-preferred-address-types=InternalIP, in order for it to work locally with kind. If you're applying my YAML file to production, make sure you remove those two lines. And remember, this is all documented in the vertical pod autoscaler readme under the Metrics Server section.

To deploy Metrics Server I change directory to the kubernetes/autoscaling folder and then run kubectl apply on that YAML file in the kube-system namespace. That applies Metrics Server into kube-system. I can then check that everything is okay with kubectl get pods, and we can see the Metrics Server pod is running. We need to give it a couple of seconds, and then we can run kubectl top nodes; when we do, the metrics come through, showing CPU cores and percentage as well as memory usage and memory percentage for the node. We can also get metrics for pods: in the kube-system namespace we can say kubectl top pods and see CPU and memory usage per pod. The Metrics Server periodically scrapes the cluster and gives us metrics for nodes and pods, and these metrics are then used to drive the autoscaling pipelines.

Now that we have a cluster and some metrics, let's go ahead and deploy a real-life workload, generate traffic against it, and get some real CPU and memory usage. In my vertical pod autoscaler readme I have a simple example application that we can build and deploy. If you want to check out the application code, head over to the kubernetes/autoscaling/components folder; in there I have an application folder with a small Golang application. This application exposes a web server on port 80 that we can hit, and every time a request comes in it runs a for loop to generate some CPU load. I also have a very simple Dockerfile that helps us build that application into a container, and a deployment YAML file we can use to run it. It's a simple deployment file that specifies one replica, says which container we want to run, and has basic resource requests and limits.

Building the application is really simple: I open up a new terminal, change directory to the kubernetes/autoscaling/components/application folder, and then say docker build -t and tag the image. That builds a container image that we can run as part of our workload.
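As a rough sketch of the build-and-push step, assuming a placeholder image name (swap in your own registry and repository, since the exact tag used in the video isn't shown here):

# build the sample app from the application folder and push it so the cluster can pull it
# (the image name below is a placeholder, not the one used in the video)
cd kubernetes/autoscaling/components/application
docker build -t <your-registry>/application-cpu:1.0.0 .
docker push <your-registry>/application-cpu:1.0.0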
Once the application has successfully built, I can go ahead and say docker push and push that image to a container registry; this allows Kubernetes to download and run the application. To deploy it I simply say kubectl apply and apply the YAML file. This deploys a service to expose the application, as well as a deployment with a single pod that becomes our workload. We can say kubectl get pods and see that our pod is up and running. If we say kubectl top pods, it says "error: metrics not available yet", because we have to be patient: it takes roughly 30 seconds for the metrics to start populating. After that we can run the command again and see CPU and memory coming through for the pod.

Now let's generate some traffic for this workload to create some CPU usage. In the application folder I also have a traffic-generator.yaml, which is basically a small Alpine container that sleeps in a loop; it gives us somewhere to run a small load-testing utility. To deploy it I say kubectl apply -f and apply the traffic generator YAML file. That deploys the small Alpine container, and when it's up and running I can kubectl exec into it. Because it's Alpine, I can simply say apk add and install the wrk load-testing package, a lightweight load-testing utility we can use to generate traffic. Then I generate some CPU load against my application by running wrk with five connections and five threads over a very long duration against my endpoint. This generates CPU load in the cluster for our application. You can see now, if we run kubectl top nodes, that the metrics coming from our node are very high: our CPU is currently at 71 percent and our memory is at 831. If I run kubectl top pods, we can see our application is now using 1277 millicores of CPU and 6 megabytes of RAM, so the CPU is quite busy on this pod.

So far, everything we've done is super simple: we've deployed a Kubernetes cluster, we have a workload running on the cluster generating CPU load and receiving traffic, and we have metrics. Now, if our application is using a lot of CPU and struggling, we may want to scale it up. As I mentioned before, the horizontal pod autoscaler can help us scale out more pods automatically based on the current CPU usage, and it is also driven by the CPU and memory request values that we've put in our deployment YAML file.

If we take a look at our deployment YAML file, we can see we have some resource requests and limits. The resources section is the heart of autoscaling and scheduling. The limits are fairly obvious: if the pod uses more memory than defined in the limit, the OOM killer will kill the pod and it will restart; if the pod uses more CPU than defined in the limit, the Linux kernel will throttle the pod's CPU. The request values are how much resources we need, in other words how much CPU and memory our application actually requires, and this part is tricky. If you ask a developer how much CPU and memory their application needs, you're probably just going to get a guess: either a value that's too high, so you end up wasting resources, or a value that's too low, so you end up running too many pods.
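Before changing anything, it helps to compare the configured requests with live usage. A minimal sketch, assuming the deployment is called application-cpu (the name is an assumption based on the repo layout):

# compare what the container requests with what it actually uses
# ("application-cpu" is an assumed deployment name; adjust to your manifest)
kubectl get deployment application-cpu \
  -o jsonpath='{.spec.template.spec.containers[0].resources}'
kubectl top pods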
The goal is to set this as accurately as possible. If we look at the resource requests, we're currently asking for 15 megabytes of memory and 500 millicores of CPU, which is the guesstimate we started off with. But if we run kubectl top pods, we can see the application is actually using 940 millicores of CPU, almost twice the amount we requested. There are two things we can do here: we can deploy a horizontal pod autoscaler, which will scale up and roughly double the number of pods automatically because our usage is twice what we requested, or we can adjust the resource request value to ask for more CPU, since we know it uses more. We could manually change the value to a thousand millicores, but that would still be a guesstimate, and it would only reflect the current CPU usage, not the usage over time.

If we take a look at the kubernetes/autoscaler GitHub repo and go to the vertical pod autoscaler, it says the Vertical Pod Autoscaler frees users from having to set up-to-date resource limits and requests for the containers in their pods. When configured, it sets these request values automatically based on usage, which allows proper scheduling onto nodes. So the VPA can provide up-to-date recommended CPU and memory request values, and it can either just recommend those values or automatically update the values on our pods as well. I would always recommend a step-by-step approach: set the request values to a fairly low number when you start out, deploy and run the VPA in recommendation mode to see what it says, manually take that value and update your deployments, and then monitor the usage over time. Most people manually update their deployments based on the recommendation values from the VPA. If you've gone through this process and you're happy with the values the recommendation process gives you, you can also run the update mode, which tells the VPA to automatically update pods with the recommended values as they're created. The updater is a nice-to-have feature, but it's not always needed.

To deploy the vertical pod autoscaler to my cluster, I change directory to the kubernetes/autoscaling/vertical-pod-autoscaling folder, and then I run a small Debian container: I say docker run -it, I mount my home directory that contains my kubeconfig file so I can connect to my Kubernetes cluster, I also mount the working directory, this entire GitHub repo, into a folder called /work and set that as my working directory, and I run the container with host networking so I can access my kind cluster. That gives us a little terminal where we can start running the VPA steps. Then I say apt-get update and install git, so we can git clone the vertical pod autoscaler repo, curl, so we can install kubectl, and a lightweight text editor. With that done, the next step is to run curl and download the latest version of kubectl, give it execute permissions, and move it to /usr/local/bin. Now kubectl is installed, so I can say kubectl get nodes and we can access our kind cluster.
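Roughly, that toolbox container looks like the sketch below; the image tag, mount paths, and editor choice are assumptions, and the kubectl download URL follows the pattern the Kubernetes docs used around the time of the video:

# throwaway Debian toolbox with the kubeconfig and the repo mounted in,
# using host networking so kubectl can reach the kind cluster
docker run -it --rm --net host \
  -v $HOME/.kube/:/root/.kube/ \
  -v $PWD:/work -w /work \
  debian:buster bash

# inside the container: tooling plus kubectl
apt-get update && apt-get install -y git curl nano
curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl
chmod +x kubectl && mv kubectl /usr/local/bin/
kubectl get nodes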
Next up, I'm going to download the autoscaler. I change directory to /tmp and say git clone to clone the kubernetes/autoscaler Git repo, which clones it into that temp directory. That's the Kubernetes autoscaler GitHub repo, so be sure to check it out, and also make sure you check out the right branch or tag of the repo and look at the compatibility matrix, so you deploy the right version of the VPA for your version of Kubernetes. Now that I have the autoscaler downloaded, I can change directory into autoscaler/vertical-pod-autoscaler. If we take a look at the contents of this folder, we can see there's a hack directory, and inside it is the vpa-up.sh script. The GitHub instructions say the install command is to run vpa-up, and this script basically applies the VPA components to the kube-system namespace. So I run ./hack/vpa-up.sh, and that applies all the components and pods to kube-system. To verify, I say kubectl get pods in the kube-system namespace, and we can see the autoscaler components are creating. There are three components here: the VPA recommender, whose job is to recommend request values for CPU and memory for pods; the updater, whose job is to go and update those pods; and the admission controller, whose job is to update pods as they're admitted to the API server.

Now that we have the VPA components running, let's take a look at what a VPA looks like for our application. In the kubernetes/autoscaling folder we have a vertical-pod-autoscaling folder with a vpa.yaml inside. The vpa.yaml is basically where we opt in to the VPA features: we set kind to VerticalPodAutoscaler, give it a name, and point it at a specific target, in this case an apps/v1 Deployment, our application-cpu deployment. We also give it an update policy set to "Off", which means we just let the recommender run and recommend CPU request values for us. To deploy that VPA I say kubectl apply and apply the vpa.yaml file, and that creates a VPA for our deployment. If we say kubectl describe vpa, we can run a describe on the VPA and it gives us the recommendations in the output: a lower bound, a target, an uncapped target, and an upper bound. You can see it's recommending a CPU value of about 1.5 cores, and if we look at the usage, the usage is about 1.3 cores. So the vertical pod autoscaler is giving us recommendations for the values we can go and apply in our YAML file: our pod is using about 1300 millicores, and the recommender is telling us we should set the request to about 1500 millicores.

At this point we can go back to our application deployment YAML file in the components folder, head over to the request values, and manually update them with the recommended value; that's one way of dealing with it. An alternative is to go to our vpa.yaml and set the update mode to Auto, which means the VPA updater will come back to this pod and actually update the values automatically.
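For reference, a sketch of what such a VPA object can look like in recommendation-only mode; the autoscaling.k8s.io/v1 API version and the application-cpu target name are assumptions, so compare against the vpa.yaml that ships with the guide:

# a VPA that only recommends values (updateMode "Off"); flip it to "Auto" to let the
# updater recreate pods with the recommended requests
cat <<EOF | kubectl apply -f -
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: application-cpu
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: application-cpu
  updatePolicy:
    updateMode: "Off"
EOF

# read the lower bound, target, uncapped target and upper bound recommendations
kubectl describe vpa application-cpu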
It's also important to note that the VPA will not touch pods if you only have one replica, because it tries not to disturb the deployment it's targeting. So in this case I'm going to scale my replicas up to two to add an additional pod to the deployment. After my deployment is scaled, I can run kubectl top pods, and we can see the pods are now using between 700 and 1000 millicores. I can then describe our VPA again, and you can see it's recommending we set the CPU to 1500 millicores.

To test the updater mode, we go to the update policy, set it to Auto, and then say kubectl apply to update the VPA. If we give it some time to react, we can see that a new pod was created eight seconds ago, and if I say kubectl describe pod on that new pod, we can see it now has a higher request value. So the VPA updater has gone ahead and applied the updated recommended value to the pod. But if we say kubectl get deploy and describe that deployment, we can see the request value there is still the original value I set: the VPA does not update the deployment. It's important to know that the VPA uses the admission controller to update pods as they're admitted to the Kubernetes API.

Now, if you don't want to create these VPAs yourself, there's a pretty neat tool built by the Fairwinds folks called Goldilocks. It has a fancy UI, it can target individual pods or an entire namespace, create VPAs automatically, and provide you with a dashboard of recommendations. You can find Goldilocks on GitHub, and it's very well documented on how to install it. To install it, I change directory back to the /tmp folder, say git clone, and clone the Goldilocks repo. Inside the Goldilocks repo there's a hack/manifests folder, so I change directory into it. Then I create a goldilocks namespace with kubectl create namespace goldilocks, apply the Goldilocks controller by running kubectl apply on the controller folder in the goldilocks namespace, and do the same thing for the dashboard.

As I mentioned before, Goldilocks allows us to target specific pods or an entire namespace, and we can enable it on a namespace by labelling that namespace. So I say kubectl label and label the default namespace with the enabled=true label; that tells Goldilocks to look at that specific namespace and create VPAs for all the deployments running in it. Then I also turn the update mode off by labelling the namespace with goldilocks.fairwinds.com/vpa-update-mode=off, which ensures the VPAs don't go and update the pods and simply provide recommendations. To see Goldilocks running, we can say kubectl get pods in the goldilocks namespace, and we can see the controller and the dashboard are up and running. I can then run a port-forward command to access the Goldilocks dashboard in the browser. If I run that port-forward and head over to the browser, we can see the Goldilocks dashboard up and running, and all the namespaces we label should start appearing in there so we can view them. The commands for these steps are sketched below.
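Putting those steps together, a rough sketch of the Goldilocks setup; the manifest paths, label keys, and dashboard service name and port reflect the Goldilocks docs of that era, so double-check them against the current README:

# install the controller and dashboard from the repo's raw manifests
cd /tmp && git clone https://github.com/FairwindsOps/goldilocks.git
cd goldilocks/hack/manifests
kubectl create namespace goldilocks
kubectl -n goldilocks apply -f controller/
kubectl -n goldilocks apply -f dashboard/

# watch the default namespace, recommendations only (no automatic pod updates)
kubectl label namespace default goldilocks.fairwinds.com/enabled=true
kubectl label namespace default goldilocks.fairwinds.com/vpa-update-mode=off

# open the dashboard locally
kubectl -n goldilocks port-forward svc/goldilocks-dashboard 8080:80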
Now, it's important to note that because we created a VPA manually earlier, we have to delete it to allow Goldilocks to manage its own VPAs for us. To do that, I break out of the port-forward, say kubectl get vpa, grab the name, and say kubectl delete vpa to remove the VPA we created. Then I port-forward to the dashboard again, and we can see Goldilocks has now created a VPA in the default namespace for our deployment. It hasn't populated yet, so we give it a couple of seconds and refresh the page, and finally we get a pretty neat dashboard for all the deployments running in our namespace. It shows the current CPU and memory limits and request values, and it gives you suggested changes for both the Guaranteed quality-of-service class and the Burstable quality-of-service class. So Goldilocks might help you in your Kubernetes cluster by providing recommendations and giving you good insight into all the request values and limits running in your cluster.

Remember to check out both my cluster autoscaler and my horizontal pod autoscaler guides, and be sure to check out the link to the source code down below and try these autoscalers for yourself. Let me know down below how you deal with Kubernetes autoscaling, and remember to like, subscribe, and hit the bell, and check out the link to the community server down below. If you want to support the channel even further, be sure to hit the join button and become a member. As always, thanks for watching, and until next time: peace.
Info
Channel: That DevOps Guy
Views: 8,597
Rating: 5 out of 5
Keywords: devops, infrastructure, as, code, azure, aks, kubernetes, k8s, cloud, training, course, cloudnative, az, github, development, deployment, containers, docker, rabbitmq, messagequeues, messagebroker, messge, broker, queues, servicebus, aws, amazon, web, services, google, gcp
Id: jcHQ5SKKTLM
Length: 22min 11sec (1331 seconds)
Published: Sun Nov 15 2020