I've got a couple of videos where we launch EC2 instances and then run a web server on top of them. In this video, we're going to dive into EC2 instance Auto Scaling, and into how we can use Load Balancers to distribute incoming traffic across multiple running EC2 instances. Auto Scaling is something we typically want if we're running serious workloads on AWS EC2, because Auto Scaling allows us to ensure that a single EC2 instance, a single server, doesn't get overwhelmed by incoming traffic. Of course, for trivial workloads, for basic websites, you might not need Auto Scaling. But once you have serious workloads, big websites that receive lots of traffic, and especially if you have spiky traffic, maybe a website with unpredictable traffic patterns, then Auto Scaling is something you want to enable to ensure that more or fewer EC2 instances, so more or fewer servers, are added or removed dynamically based on the amount of incoming traffic. And that's exactly what we will explore in this video. Now, as always, if you want to get started with AWS and get a thorough overview of all AWS services, you might want to check out my complete Cloud Practitioner course, where I also help you pass that entry-level certification, in case you're interested.

For this video, I prepared a basic web application, to which you find a link attached to this video. It is, in the end, a simple Node.js REST API, so a website, you could say, to which we can send requests on two specific endpoints, two specific paths, which then trigger different kinds of actions. On the main GET route, I, for example, read a file, and I read that file in a loop to generate a huge amount of work that has to be done on the server. I'm doing this for demo purposes, to show you how a server, how an EC2 instance, could get a lot of work it needs to do, how it could get overwhelmed by that work, and how Auto Scaling can then help with that issue.

As a first step, I will launch a new EC2 instance. For this instance, I use a regular Amazon Linux base AMI, and I will choose the t2.nano instance type, which is the weakest instance type I can select here, simply because that makes it even easier for me to show you how such an instance could get overwhelmed. Of course, the more hardware capabilities an instance has, the more CPU cores and memory, the harder it is to overwhelm it. And that's, by the way, an important takeaway: besides scaling by adding more instances, which is what we'll do in this video and which is called scaling horizontally, you could also scale vertically, which means you launch instances that are more capable, more powerful when it comes to their hardware profiles. That's something you could do if you have a workload that needs a lot of power, and it's, of course, something you should consider. Here, however, I want to introduce you to the AWS Auto Scaling service and to horizontal scaling, where we add more instances, and therefore I'll pick this weakest instance type to have an easier time showing this to you. I don't need a key pair, because I won't connect to the instance. I will select an existing Security Group, which opens ports 80 and 22. Actually, we don't even need port 22 here, so I could also create a new Security Group which only allows HTTP traffic, now that I think about it.
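If you prefer the CLI over the console, a minimal sketch of creating such an HTTP-only Security Group could look like this. The group name is hypothetical, and this assumes your default VPC:

```bash
# Create a Security Group that only allows inbound HTTP (name is a placeholder)
SG_ID=$(aws ec2 create-security-group \
  --group-name http-only-demo \
  --description "Allow inbound HTTP only" \
  --query GroupId --output text)

# Open port 80 to the world; no SSH rule, since we never connect to the instance
aws ec2 authorize-security-group-ingress \
  --group-id "$SG_ID" \
  --protocol tcp --port 80 --cidr 0.0.0.0/0
```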
And under advanced details, I'll paste in some Userdata, so a script that will be executed when the instance starts, which you also find attached to this video. In this Userdata, I install Git and Node.js, then I grab the demo code I showed you, and I start the web server in the background. With all that done, I'll launch the instance, and a single instance will be launched for now. It will take a couple of minutes for the web server to be up and running on that instance, so I'll be back once that is the case.

Now that the instance is up and running, we can grab its public IP or its public DNS name, so the domain address that was automatically assigned by AWS, and we could now send a request to this address and get back some data. Instead, I prepared a simple script which sends a huge amount of requests to that address, and I'm doing this to show you how the instance will start struggling to handle all those requests. So I will execute this script now. It'll take a short while to get started, to show something here in the terminal, but it will then send a huge amount of requests to that web API that's running on the EC2 instance, so on that virtual server in the AWS cloud. Here it is sending the requests, and all these requests are now hitting the web API running on that instance. We can see the impact of that if we go to the monitoring tab and refresh there. We'll have to wait a couple of minutes, because this is not a live status of the instance; instead, the data is aggregated, and therefore it takes a couple of minutes until we can see an effect here. I'll be back once we can see that effect.

So those requests were sent for quite a while, and now I'm back. What we can see in the monitoring tab is that this instance is under heavy load. It's basically using all its CPU capacity; it's definitely doing way too much work. And that's exactly where scaling becomes important, because this amount of traffic, which is of course simulated here, is simply too much for this instance. One solution could be, as mentioned before, to launch a more powerful instance, to choose a more capable instance type with more hardware. That would definitely help. An alternative option, or an additional option if you can't upgrade any further, if you don't want to, or if you simply want to combine both approaches, is to also scale horizontally, which means we don't just want a single instance up and running. And for that, we can use the Auto Scaling service.

To use Auto Scaling, I'll first of all terminate this instance. Then, on the EC2 service page, I'll go to Launch Templates on the left, because the first step is to create a launch template. A launch template simply is a blueprint for new instances that we want to start. It's not like an AMI, which defines the software that should be installed; instead, it's a configuration blueprint. It basically looks like the launch wizard from before if you take a look at it, but it won't launch an instance. Instead, we just create a blueprint here, and you'll see why we need that in just a second.
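For orientation, here's a rough sketch of what such a Userdata script could look like on Amazon Linux. The actual script is attached to the video and may differ; the repository URL here is a placeholder:

```bash
#!/bin/bash
# Rough sketch of a Userdata script (runs as root on first boot;
# the repository URL is a placeholder for the demo code)
dnf install -y git nodejs    # Amazon Linux 2023; older AMIs use yum instead

# Grab the demo code and install its dependencies
git clone https://github.com/example/demo-rest-api.git /home/ec2-user/app
cd /home/ec2-user/app
npm install

# Start the web server in the background on port 80 (root may bind to port 80)
nohup node app.js > /var/log/webapp.log 2>&1 &
```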
So here, I'll give this template a name like demo1, and I'll check this box so that, when we create the template, we get some information about which fields must be filled out to use this template for EC2 Auto Scaling, because that's why I'm creating it, as you will see later. As a first step, I will choose an AMI, and I'll use the same AMI as before. We don't have to include it here, but I want to, because I know that I will use this AMI. I will again choose the t2.nano instance type, so this weakest instance type, for demonstration purposes. I don't specify a key pair, but for the Security Group, I'll pick that launch-wizard-2 Security Group that opens port 80. I don't need any other configuration here, no storage configuration, but under advanced details, I will again paste in that Userdata script which downloads the code and starts the web server. And then I create the launch template.

Now, very important: with that alone, no instance is launched. If you take a look at the instances, I have no instances up and running. Instead of launching instances manually, I will now use the Auto Scaling service, another service offered by AWS, to do that. On the EC2 service page, you find it on the left side. As a first step, I will create an Auto Scaling group. We can give this group a name like as1, and we have to pick a launch template, and here I'll pick the launch template I just created. With that, we're telling AWS how future instances that are launched automatically by the Auto Scaling service should be configured. As a next step, we have to choose the VPC into which they should be launched and which subnets should be used. You can enable all subnets, and therefore all availability zones, or just a couple of subnets, whatever you want. You can also override some instance type settings, but here I'll stick to t2.nano, and then we can continue.

And now we already see the option to add a Load Balancer. What's a Load Balancer? Well, with the Auto Scaling service, AWS will launch or terminate EC2 instances for us automatically as we get more or less incoming traffic. So if we get a lot of traffic, if our instances struggle to keep up with it, then more instances will be launched by the Auto Scaling service. If traffic decreases, those excess instances will be shut down. But if we have multiple instances up and running, we of course must make sure that the incoming traffic is distributed evenly amongst them; otherwise, a single instance could still get overwhelmed. And that's what the Load Balancer service, another service offered by AWS, does for us. When we attach a Load Balancer, that service will make sure that incoming traffic is distributed evenly amongst the available instances. Now, there are two types of Load Balancers, as I also explain in my course: the Application Load Balancer and the Network Load Balancer. The main difference between them is that the Application Load Balancer focuses on HTTP and HTTPS and allows us to set up rules that route requests to different instances based on the content of an HTTP request, for example based on query parameters attached to it, whereas the Network Load Balancer does not look deeply into the incoming request and therefore is the right Load Balancer for low-level network traffic, not for HTTP or HTTPS traffic.
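As a side note, the same launch template could be created from the CLI. This is a hedged sketch, not the exact console configuration; the AMI and Security Group IDs are placeholders, and launch templates expect the Userdata to be base64-encoded:

```bash
# Base64-encode the Userdata script (GNU base64; the fallback covers macOS,
# where -w is not supported)
USERDATA=$(base64 -w0 userdata.sh 2>/dev/null || base64 < userdata.sh | tr -d '\n')

# Create the launch template (AMI and Security Group IDs are placeholders)
aws ec2 create-launch-template \
  --launch-template-name demo1 \
  --launch-template-data "{
    \"ImageId\": \"ami-0123456789abcdef0\",
    \"InstanceType\": \"t2.nano\",
    \"SecurityGroupIds\": [\"sg-0123456789abcdef0\"],
    \"UserData\": \"$USERDATA\"
  }"
```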
You can use both, but here we'll use the Application Load Balancer. With that selected, we have to give it a name, and we want an internet-facing Load Balancer, because we know that we have a web API here: the Load Balancer should take requests from the internet and forward them to that API, hence it must be internet-facing. We then have to choose the same subnets as we chose for the Auto Scaling group, because of course we want to distribute traffic amongst the instances in those subnets, the subnets which we also use for launching instances with the Auto Scaling service. And those instances listen on port 80; the web server I'm running on them listens on port 80.

Then there is a health check, which simply is used internally to find out whether an instance is up and running. If it's not responding properly to the health check, it will be replaced. You can also enable extra health checks performed by the Load Balancer service. A health check simply means that the Load Balancer service sends HTTP requests to the instance from time to time and checks whether it responds as expected. If it doesn't, it tells AWS, so to say, that the instance should be replaced. With that, we just have to make sure that we also create a new target group here, that's important, and that simply is the group of instances which will be used for load balancing. And since we add this whilst setting up Auto Scaling, that target group will match the Auto Scaling group: all instances that are launched by the Auto Scaling service will automatically be included in the target group that's used by the Load Balancer.

Then we can proceed, and now we can choose how we want to scale our instances. So now we're back in the Auto Scaling configuration. We can set a desired, minimum, and maximum capacity. Desired simply is the default capacity, which we always want to have, let's say one instance. Minimum often is the same, but sometimes you could have a higher default capacity which could temporarily be reduced; that's when minimum becomes important. And maximum, as the name suggests, is the highest number of instances you want to have up and running when needed, so the highest number of instances this service can launch when they are needed.

Regarding the "when they are needed" part, that's configured with the scaling policies, and here we want to set a target tracking scaling policy instead of no policy. With no policy, we basically have no scaling; then only failing instances would be replaced. But with a target tracking scaling policy, we can, for example, say that if the average CPU utilization of those instances goes above 50%, or maybe 40%, or whatever, let's stick to 50, then a new instance should be added to bring the average back down to that target. And if utilization stays below it, instances will be shut down again. You can of course play around with those settings. You can also add notifications, so that you get alerted when a scaling event happens, and you can add tags, as you can for most AWS resources. Then you can confirm everything and create the Auto Scaling group. This will also create the Load Balancer, since I configured that during the setup process. And since I set a desired capacity of one, one instance will be launched automatically. So here we will see that a new instance is being started, but if more instances are needed, they will be added.
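For reference, here's a rough CLI sketch of the same Auto Scaling group and target tracking policy. The group and template names match the ones used above; the subnet IDs and the target group ARN are placeholders:

```bash
# Create the Auto Scaling group from the demo1 launch template
# (subnet IDs and target group ARN are placeholders)
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name as1 \
  --launch-template LaunchTemplateName=demo1,Version='$Latest' \
  --min-size 1 --max-size 3 --desired-capacity 1 \
  --vpc-zone-identifier "subnet-0123456789abcdef0,subnet-0fedcba9876543210" \
  --target-group-arns arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/demo-tg/0123456789abcdef

# Target tracking policy: keep average CPU utilization of the group around 50%
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name as1 \
  --policy-name keep-cpu-at-50 \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration '{
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ASGAverageCPUUtilization"
    },
    "TargetValue": 50.0
  }'
```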
And to show this to you, I'll again wait for a couple of minutes, then send all my HTTP requests again, and then see whether the Auto Scaling service does its job as it should. So after a while, the first instance started, and actually a second instance was already started by the Auto Scaling service, because during the startup process, also thanks to the Userdata script and because I sent a test request, the CPU utilization of that first instance already went above 50%. That's why the Auto Scaling group launched a second instance, as it should.

So now we've got this up and running. How do we now send requests to those instances, though? Not by using the automatically assigned IP addresses or DNS names of the individual instances, but by using the Load Balancer. That's why we created an internet-facing Load Balancer, and here it is. The Load Balancer also has an automatically assigned DNS name, and it's that DNS name, that domain, which should be used for sending requests now, because those requests will then reach the Load Balancer, and the Load Balancer will forward them to the different instances that are managed by it, or that are included in its target group, to be precise. So it's that DNS name which we can now use to send a request, and of course, it's that DNS name that I can use in my script to send multiple requests. If I do that, we should see how those requests are forwarded to different instances. We will be able to tell that that's the case if we take a look at the CPU utilizations of the different instances: they should all be more or less equal, or at least we should see that all instances are doing some work. And in addition, because we have Auto Scaling active, we should also see that more instances are launched if they should be needed because CPU utilization is too high.

Another brief word about the Load Balancer: you can of course use a Load Balancer without Auto Scaling. Combining them is very common, but you could also use load balancing across a group of manually launched instances. And you could also use Auto Scaling to launch multiple instances that are not managed by a Load Balancer; then you would have to ensure yourself that traffic is distributed across those instances, though. That's why it is a common combination after all.

So here, if we go to the instances, we should see that both instances have to do some work, and again, we'll have to wait a couple of minutes to see an effect. But I can already see that the second instance has a CPU utilization of 40%, and for the first instance, we'll have to wait a couple of minutes until this updates again. If it should be needed, the Auto Scaling service will of course also launch another instance. By the way, if we go to the Auto Scaling group and to Monitoring there, we can also see statistics for the entire group, not the CPU utilization, but the number of instances and so on. And under EC2 there, we also see the average CPU utilization and other instance metrics aggregated across all the EC2 instances that belong to the group. And it's these aggregated numbers that matter for the Auto Scaling service when it determines whether new instances are added or removed.
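If you want to grab that DNS name from the CLI and fire some test traffic at it, a minimal sketch could look like this. The Load Balancer name is a placeholder, and the actual load-test script attached to the video may differ:

```bash
# Look up the Load Balancer's automatically assigned DNS name (name is a placeholder)
LB_DNS=$(aws elbv2 describe-load-balancers \
  --names demo-alb \
  --query 'LoadBalancers[0].DNSName' --output text)

# Send a burst of requests so the target tracking policy has something to react to
for i in $(seq 1 1000); do
  curl -s "http://$LB_DNS/" > /dev/null &
done
wait
```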
So here we can now wait a couple of minutes to see the effect, to see whether more instances are added. And yes, they are: a third instance is coming up. And we can also see that all instances are doing work if we take a look at their monitoring stats. So that's the AWS Auto Scaling and load balancing services in action.

Here, I will now delete the Auto Scaling group. This will also terminate all instances that belong to the group, which is, of course, perfect, because I want to get rid of those as well. And I will also get rid of my Load Balancer. For that, we can select the Load Balancer and delete it here. This then also deletes the target groups that belong to it, at least it should; otherwise, we can delete those manually. With that, we get rid of all those resources which could cost us money. Especially with all those instances, you want to make sure that you don't have unnecessary instances up and running, and therefore you want to check whether all those instances really are shut down. They should be, of course, though you should wait a couple of minutes for that. But ultimately, if you see that things are not shutting down, you can of course always terminate those instances manually as well, just to make sure you're not paying for unnecessary instances.
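For completeness, the same cleanup could be done from the CLI. This is a sketch with placeholder ARNs; note that --force-delete also terminates the group's remaining instances:

```bash
# Delete the Auto Scaling group and terminate its instances in one go
aws autoscaling delete-auto-scaling-group \
  --auto-scaling-group-name as1 \
  --force-delete

# Delete the Load Balancer and, if it is still around, the target group
# (both ARNs are placeholders)
aws elbv2 delete-load-balancer \
  --load-balancer-arn arn:aws:elasticloadbalancing:us-east-1:111122223333:loadbalancer/app/demo-alb/0123456789abcdef
aws elbv2 delete-target-group \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/demo-tg/0123456789abcdef
```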