EKS Cluster Auto Scaling (Kubernetes Autoscaler | EKS Cluster Autoscaler | EKS Autoscale Nodes)

Captions
In this video we're going to talk about how to autoscale EKS node groups using the Kubernetes Cluster Autoscaler. We will use both managed and unmanaged EKS node groups; however, AWS recommends using managed node groups, since they come with automatic Auto Scaling group discovery and graceful node termination. Let's get started.

First of all, we need to create the EKS cluster. I'm going to use eksctl, but you can use Terraform or even the AWS console. To use eksctl we need to define a config first (a sketch follows below). I'm going to give the cluster the name anton-putra, and, importantly, we need to define the version of our EKS cluster: it's going to be 1.20, and later we'll have to match the Cluster Autoscaler to it. Then we define a managed node group and an unmanaged node group. The autoscaler can work with both, but AWS recommends the managed kind because, before removing a node from the pool, a managed group gracefully drains the node and reschedules its pods onto other nodes; an unmanaged node group simply terminates the node, and that's it.

To create the cluster we run eksctl create cluster and provide the config with -f; it usually takes about 15-20 minutes. Once the cluster is ready, we verify the connection by typing kubectl get svc and checking that we can reach Kubernetes.

There are a few requirements before you can deploy the autoscaler. When you create the Kubernetes cluster, its node groups are created as Auto Scaling groups in AWS, and those Auto Scaling groups have to carry certain tags. In the EKS console you can see our cluster, anton-putra, with its version; if you then go to EC2 > Auto Scaling groups, you'll find two groups, the first for the managed nodes and the second for the unmanaged nodes. If you click on one and scroll all the way down, there are two tags that have to be present before you can use the autoscaler (written out below): the first is k8s.io/cluster-autoscaler/enabled with the value true, and the second is k8s.io/cluster-autoscaler/<cluster-name>, whose key ends with the EKS cluster name, with the value owned. Those two tags must be present; otherwise the autoscaler will not be able to discover the Auto Scaling groups and adjust the number of nodes.

You also have to keep in mind that each node group needs different minimum and maximum capacities. The autoscaler only adjusts the desired capacity based on the requirements of your workload, so if the minimum capacity is 1 and the maximum capacity is also 1, the autoscaler can't do anything because the two match; you have to leave it some room to adjust the number of nodes.
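The full config file isn't shown on screen, so here is a minimal sketch of an eksctl ClusterConfig consistent with what's described; the region, instance types, sizes, and node labels are my assumptions, not values from the video:

```yaml
# eksctl.yaml -- a minimal sketch, not the exact config from the video
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: anton-putra          # cluster name used throughout this walkthrough
  region: us-east-1          # assumed region
  version: "1.20"            # must match the Cluster Autoscaler image version later

managedNodeGroups:           # managed group: drains nodes gracefully on scale-down
  - name: managed
    instanceType: t3.small   # assumed instance type
    minSize: 1
    maxSize: 10              # min != max, so the autoscaler has room to work
    desiredCapacity: 1
    labels:
      role: managed          # assumed label, used later for nodeAffinity

nodeGroups:                  # unmanaged group: nodes are terminated outright
  - name: unmanaged
    instanceType: t3.small
    minSize: 1
    maxSize: 10
    desiredCapacity: 1
    labels:
      role: unmanaged
```

The cluster is then created with eksctl create cluster -f eksctl.yaml.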
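For reference, the two Auto Scaling group tags the autoscaler looks for, written out with the cluster name from this walkthrough, are:

```yaml
# Required on every Auto Scaling group the autoscaler should manage
k8s.io/cluster-autoscaler/enabled: "true"
k8s.io/cluster-autoscaler/anton-putra: "owned"   # key suffix is the cluster name
```

In this walkthrough the managed group already carries them, while the unmanaged group does not and has to be tagged by hand later on.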
The second requirement is to create an OpenID Connect provider. Go back to the EKS cluster, open the Configuration tab, then Details, and copy the OpenID Connect provider URL. Then navigate to IAM, go to Identity providers, and click Add provider: choose OpenID Connect, paste our URL, get the thumbprint, paste the default value sts.amazonaws.com for the audience, and create the provider.

With the provider created, let's create the policy that we're going to use for the autoscaler. Go to Policies > Create policy > JSON and paste the policy; you can find it in my GitHub repository, and the link will be in the description. This policy allows our autoscaler to adjust the number of nodes by setting the appropriate capacity (a sketch follows below). Click Next, then Next again, name it AmazonEKSClusterAutoscalerPolicy, and create the policy.

The second step is to create the role. Go to Roles, click Create role, choose Web identity, select the OpenID Connect provider we just created, and for the audience select the default one. Click Next: Permissions, attach the autoscaler policy we just created, click Next a couple more times, name it AmazonEKSClusterAutoscalerRole, and create it.

Now we need to adjust a couple of things. Search for the role, open it, and click Edit trust relationship. Here we update one condition: instead of the audience value, we provide the service account that will be used by the autoscaler, in the form of a namespace plus a service account name (I'll show you later where this value is used). We'll also need the ARN of this role in our Kubernetes objects. For now, update the trust policy, and that's it.

Now let's deploy the autoscaler. Create a folder, k8s, and create the first file for the autoscaler. First we define the service account, and here it's very important to add the annotation and replace its value with the ARN of the role; the ARN is shown on the role's page, so copy it and paste it in. Then come a ClusterRole, a Role, and their bindings; they are always the same, so you don't need to change them. The last item in the file is the Deployment itself.

The Deployment runs a single replica, and the safe-to-evict: "false" annotation on its pod is important. It's also important to match the image tag with the version of your EKS cluster: our cluster is 1.20, you need to match the major and minor versions, and you can use the latest patch version, which you can find under the cluster-autoscaler tags on GitHub. The latest patch for this particular version is 0, so I'm going to use the v1.20.0 image. There are also a few flags you can tweak, and you can follow the linked documentation to find more flags that change the autoscaler's behavior: the cloud provider, which is obviously aws, plus skip-nodes-with-local-storage=false, the expander, balance-similar-node-groups, and so on. The auto-discovery flag is the most important one: it's the mechanism by which the autoscaler discovers the Auto Scaling groups, and it discovers them by the two tags.
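The policy itself is pasted from the author's repository rather than shown in full; the commonly published Cluster Autoscaler policy looks roughly like this, so treat it as a sketch rather than the exact file from the repo:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeTags",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup",
        "ec2:DescribeLaunchTemplateVersions"
      ],
      "Resource": "*"
    }
  ]
}
```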
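The trust relationship edit typically ends up looking like this, assuming the conventional kube-system/cluster-autoscaler service account; the account ID, region, and OIDC provider ID are placeholders:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<ACCOUNT_ID>:oidc-provider/oidc.eks.<REGION>.amazonaws.com/id/<OIDC_ID>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.<REGION>.amazonaws.com/id/<OIDC_ID>:sub": "system:serviceaccount:kube-system:cluster-autoscaler"
        }
      }
    }
  ]
}
```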
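The service account from that first manifest, with the annotation that ties it to the IAM role, might look like this; the role ARN is the placeholder you replace with your own:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cluster-autoscaler        # must match the name in the role's trust policy
  namespace: kube-system
  annotations:
    # Replace with the ARN of the role created above
    eks.amazonaws.com/role-arn: arn:aws:iam::<ACCOUNT_ID>:role/AmazonEKSClusterAutoscalerRole
```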
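And a sketch of the relevant part of the Deployment's pod template, with the image pinned to 1.20 and the flags discussed above; the expander choice is an assumption, newer releases live under registry.k8s.io, and the full manifest (RBAC included) is in the video's repository:

```yaml
# Pod template excerpt -- a sketch, not the full Deployment manifest
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"   # never evict the autoscaler itself
spec:
  serviceAccountName: cluster-autoscaler
  containers:
    - name: cluster-autoscaler
      image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.20.0  # major.minor must match the cluster
      command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --expander=least-waste                # assumed expander value
        - --balance-similar-node-groups
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/anton-putra
```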
The first tag has to be present on the Auto Scaling group, as I showed before: if you go to EC2 > Auto Scaling groups, the managed group carries it, along with the second one. In the second tag you need to replace the cluster-name placeholder with your own cluster name, so go back to the EKS clusters page, copy the cluster name, and paste it into the flag. That's pretty much it; now let's deploy. Copy the relative path of the folder, run kubectl apply -f with that path, and it creates all those resources. You can check the logs with kubectl logs: the -l option selects the pod by its label (the autoscaler runs in the kube-system namespace) and -f stands for follow. You'll find a lot of useful output there, and when you need to debug, it's very useful to read all of it. (The commands are collected in the sketch below.)

Next, in another tab, type watch kubectl get pods; watch simply repeats the command every two seconds, and right now we have no pods in the default namespace. Another command, watch kubectl get nodes, does the same for nodes. We have one instance per node group, one for the managed group and one for the unmanaged group, so right now we have two nodes.

Now let's test the scaling ability of our autoscaler with a couple of examples. The first is a very simple nginx deployment with two replicas in the default namespace that targets the managed node group (sketched below). To pin it to that group we use affinity, specifically nodeAffinity with requiredDuringSchedulingIgnoredDuringExecution, and specify the key and value of the role label that our nodes carry. The other important part is the podAntiAffinity, which makes sure that pods from the same deployment cannot be scheduled on the same node; this is how we exercise the autoscaling policy, because one pod will end up in the Pending state and the autoscaler will have to expand the cluster. The second deployment is almost identical, except that it uses nodeAffinity to select the unmanaged nodes and therefore targets the unmanaged node group.

Let's deploy them. Right now we have only two nodes and zero pods in the default namespace. Run kubectl apply, and it creates those pods: now we have four, and, as expected, one pod per instance and two pods in the Pending state, because we don't have enough nodes. The autoscaler (you can see all of this in its log output) will try to expand the cluster and increase the number of nodes to fit those pods. This can take some time, maybe 5-10 minutes, so let's wait. It was actually quite quick: the managed group created a node, the node reached the Ready state, and after a few seconds the pending pod from the managed deployment was scheduled onto it.
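Collected in one place, the commands from this part of the walkthrough; the label selector on the logs command is an assumption, so check the labels in your own manifest:

```sh
kubectl apply -f k8s/                                      # create the autoscaler resources
kubectl logs -f -l app=cluster-autoscaler -n kube-system   # follow the autoscaler logs
watch kubectl get pods                                     # re-runs the command every 2 seconds
watch kubectl get nodes
```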
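A sketch of the first test deployment, assuming the role=managed node label from the earlier config; the unmanaged variant differs only in its name and the label value it selects:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-managed
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-managed
  template:
    metadata:
      labels:
        app: nginx-managed
    spec:
      affinity:
        nodeAffinity:                  # pin the pods to the managed node group
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: role          # assumed node label key/value
                    operator: In
                    values:
                      - managed
        podAntiAffinity:               # at most one pod of this deployment per node
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: nginx-managed
              topologyKey: kubernetes.io/hostname
      containers:
        - name: nginx
          image: nginx:1.21            # assumed image tag
```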
We still have a problem with the second one, though. The nginx-unmanaged deployment targets the unmanaged node group, and that group doesn't have the appropriate tags, so the autoscaler cannot expand it. To fix that, go back to the Auto Scaling groups: the first one, for the managed nodes, has both tags, but if you open the second one and scroll down, you won't find them. Click Edit, scroll down, and add the tags; you can simply copy them from the managed group: the first one with the value true, and the second one with the value owned. Click Update. (An AWS CLI equivalent of this step is sketched below.) Within maybe 30 seconds the autoscaler will discover that Auto Scaling group in AWS and set its desired capacity to 2. We still have one pod in the Pending state, and the autoscaler is about to fix that: it discovered the Auto Scaling group and is now creating a node for the unmanaged group. The node is not ready yet, and as soon as it becomes ready the pod will be scheduled. There it is: the autoscaler has set the appropriate desired capacity for each node group, and if we refresh the console you can see the min and max along with a desired capacity of 2 for both of them.

The autoscaler can not only increase the number of instances but also decrease it. To test that, we simply remove the deployments with kubectl delete, first one and then the other, which deletes all four pods. If you then go to the logs, you'll find messages showing the autoscaler identifying nodes that can be removed; it takes approximately 5 minutes to identify those nodes and then an additional 10 minutes to remove them. Right now it's 5:19, so in around 15-16 minutes it should remove those two nodes. Let's wait... and the autoscaler was able to scale down our cluster: we're back to two instances, one per node group.

Thank you for watching, please don't forget to like and subscribe, and I'll see you in the next one.
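For reference, the console tagging step can also be done from the AWS CLI; a sketch, with the Auto Scaling group name as a placeholder:

```sh
# Add the discovery tags to the unmanaged Auto Scaling group
aws autoscaling create-or-update-tags --tags \
  "ResourceId=<unmanaged-asg-name>,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=true" \
  "ResourceId=<unmanaged-asg-name>,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/anton-putra,Value=owned,PropagateAtLaunch=true"
```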
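And the scale-down test is just the deletion of the two test deployments; the file names here are assumptions:

```sh
kubectl delete -f nginx-managed.yaml
kubectl delete -f nginx-unmanaged.yaml
```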
Info
Channel: Anton Putra
Views: 1,488
Rating: 5 out of 5
Keywords: eks cluster auto scaling, eks automatic scaling, eks auto scaling, eks cluster autoscaler, eks autoscale nodes, kubernetes cluster autoscaler, eks cluster setup, eks cluster, eks cluster using terraform, eks cluster aws, eks cluster setup in aws, kubernetes autoscaler, kubernetes automatic scaling, kubernetes autoscaling memory, kubernetes autoscaling memory example, devops, sre, anton putra, kubernetes, aws, aws eks, eks, eks autoscaler, eks kubernetes, kubernetes aws, eks aws
Id: gwmdboC-BtE
Length: 20min 34sec (1234 seconds)
Published: Thu Jun 10 2021