Terragrunt Tutorial: Create VPC, EKS from Scratch!

Captions
In this tutorial, I'll guide you step by step on  how to create a VPC using plain Terraform code.  Next, we'll refactor it into a well-structured  Terraform module by following best practices.   Finally, we'll make use of Terragrunt's  top features to set up your infrastructure. We'll not only build a VPC module, but also  create EKS and Kubernetes add-ons modules   from scratch. With the Kubernetes add-ons  module, you can easily enable managed add-ons   like the CSI storage driver, or self-managed  add-ons such as the cluster autoscaler,   load balancer controller, ArgoCD,  and others deployed as Helm charts. In this tutorial, you'll learn many valuable  techniques and best practices for creating   Terraform modules, even if you don't end up using  Terragrunt in your projects. Towards the end,   I'll demonstrate a production-ready setup that  includes using an S3 bucket and DynamoDB table to   lock the state, as well as creating an IAM role  that can be assumed by users or automation tools   like Jenkins. Additionally, we'll set up separate  Git repositories for the live infrastructure and   Terraform modules, to help maintain a  well-organized and efficient workflow. Terragrunt offers many useful features  that can enhance your workflows. One such   feature is the ability to execute and apply  Terraform on multiple modules simultaneously,   while also sharing output variables from  one module as input to others. For instance,   we'll create a VPC module and use the  private subnet IDs as input for the   EKS modules. Terragrunt can also generate  backend configurations and provider configs,   saving you from copy-pasting and keeping  your code DRY. Many people, including myself,   find Terragrunt extremely helpful and use  it in production environments all the time. Let’s get started. In this tutorial, we'll talk  about various ways to organize your Terraform   code. One method is to make an "environments"  folder containing separate folders for each   environment, like "dev" for development. This  environment is for testing new infrastructure   and app features, and it's okay if things break.  Next, you might have a "staging" environment where   you test your apps before going live. Sometimes  called "pre-prod," this more stable environment   should resemble production but might be smaller  in terms of resources and cost. And so forth. If we want to use Terragrunt recommended  approach, we'll refer to these environments   as "live environments" or "infrastructure live."  This naming sets this folder or Git repository   apart from other Terraform code and modules. The  "live" label means you can treat this repo as a   reliable source, and whatever is declared  in those folders should be up and running. For your reference, I'll label this  first approach as "v1" when we set up   our infrastructure using the basic Terraform code. Alright, let's begin by creating a VPC under the   "dev" folder. We'll use this VPC  later to deploy an EKS cluster. First, we need to declare the provider. There  are several ways to authenticate with AWS. For   this initial example, I'll stick to the default  method, which uses either environment variables   or your default AWS profile. Later, we'll  assume roles, and I'll explain the difference   it makes when creating an EKS cluster. Next, let's set some Terraform constraints.   You should use version 1 or higher. When Terraform provision infrastructure,   it needs a way to track what was created.  
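For reference, the provider declaration and version constraint described above might look roughly like this minimal sketch (the region is an assumption; authentication falls back to environment variables or the default AWS profile). The state discussion continues right after.

```hcl
terraform {
  # Any Terraform 1.x release satisfies this constraint.
  required_version = ">= 1.0"
}

provider "aws" {
  region = "us-east-1" # assumed region for this walkthrough
}
```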
This is called the Terraform state,   and we use backend configuration to decide  where to store it. For testing purposes,   we'll use the local backend, which is the default  option, but we can adjust a few parameters. Later,   we'll switch to using an S3 bucket and a DynamoDB  table to lock our state. This approach should be   used most of the time, especially when working  in a team, instead of relying on local state. Here, we can set a path for storing the state  file. Often, this path will be in an object   store like an S3 or Google Cloud Storage  (GS) bucket. Additionally, we can set version   constraints for the AWS provider itself.  That's all for the provider configuration. Next, let's create a VPC. First, we need  to assign it a CIDR block. At this point,   you need to make an important decision.  If you plan to peer multiple VPCs in the   future, you should come up with  unique CIDR ranges beforehand. Many third-party add-ons for  Kubernetes, like the EFS storage driver,   require DNS support. It's a good  idea to enable it from the start,   as it can save you a lot of time  when troubleshooting issues later on. Lastly, let's add the Name tag. You'll  notice that we use environment prefixes,   which is a common practice even if you have  separate accounts for different environments.  Next, we'll create an internet gateway  to provide internet access for public   subnets. We need to attach it  to the VPC and add a Name tag. Now, let's create subnets. We'll make  two public and two private subnets.   The CIDR block for each subnet should be  a subset of the VPC's CIDR block. Then,   choose the availability zone. For EKS, you  need at least two different availability zones. Subnet tags are crucial, especially  for EKS. First, add a Name tag. Then,   include a tag indicating that EKS can  use the subnet to create private load   balancers. Add another tag to  associate this subnet with EKS,   with a value of either "owned" or "shared." You  can create an EKS cluster without these tags,   but some components might not work as expected.  For example, the Cluster Autoscaler or Karpenter   will use these tags to auto-discover subnets for  creating additional Kubernetes workers. Next,   create another private subnet in a different  availability zone. Remember, the cluster tag   must match your EKS cluster name. For the dev  environment, we'll create the "dev-demo" cluster. Now, let's create the public subnets. You'll  want to enable assigning public IP addresses   when virtual machines launch. Additionally, tag  these subnets to allow EKS to create public load   balancers. Public load balancers get public IP  addresses and are used to expose your service   to the internet. For example, an Nginx ingress  controller can create a public load balancer.   In contrast, private load balancers only get  private IP addresses, allowing you to expose   your service within your VPC only. Finally, create the last public   subnet in a different availability zone. Next, we need to create a NAT gateway  to provide internet access to private   subnets. I recommend manually allocating  a static public IP address, as you might   need to whitelist it with your clients  in the future. It's better to allocate   multiple public IPs in case you need to perform  blue-green deployments. For the NAT gateway,   we must explicitly depend on the  internet gateway. Additionally,   the NAT gateway must be placed in one of  the public subnets with an internet gateway. Finally, we need to create  routing tables. 
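Before the routing tables, here is roughly what one of the private subnets with the EKS discovery tags described above might look like (resource names, CIDRs, and the availability zone are assumptions; the "dev-demo" cluster name follows the walkthrough):

```hcl
resource "aws_subnet" "private_us_east_1a" {
  vpc_id            = aws_vpc.main.id # assumes the VPC resource is named "main"
  cidr_block        = "10.0.0.0/19"   # assumed subset of the VPC CIDR
  availability_zone = "us-east-1a"

  tags = {
    "Name"                            = "dev-private-us-east-1a"
    "kubernetes.io/role/internal-elb" = "1"     # lets EKS place private load balancers here
    "kubernetes.io/cluster/dev-demo"  = "owned" # must match the EKS cluster name
  }
}
```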
For now,   both public and private subnets have the same  default route, which is limited to your VPC only.  First, we'll create a private routing table  and use the NAT gateway as the default route.   A route that uses all IP addresses is called the  default route. Then, the second public routing   table will have the default route set to the  internet gateway. Next, we need to associate all   four subnets (two private and two public) with the  corresponding routing tables. In the next example,   we'll dynamically generate subnets and associate  them with routes. That's pretty much it. Optionally, you can expose the VPC  ID. This can be useful if you use it   as input for another Terraform code or module. That's all for the VPC setup. The current folder   structure consists of a "live" folder, followed  by "environments" and component-specific Terraform   code, such as the VPC. Now, let's switch to the  VPC folder and initialize Terraform. This will   download all required Terraform providers  and initialize the Terraform state. Then,   run "apply" and enter "yes" to create your VPC.  It may take about 2 or 3 minutes to complete.  ​​Once Terraform finishes,  it should return the VPC ID.  Now, you can check the AWS console to confirm  that we have a newly created VPC with the proper   Name tag and all four subnets with the "dev"  prefix. This is a typical example of how to   use Terraform to create an AWS VPC. It's  very straightforward: define in the code   what you want to create and apply Terraform. The  challenge is reproducing this at scale and keeping   the Terraform code DRY (Don't Repeat Yourself). Additionally, based on your backend configuration,   Terraform will create a state  file under the "dev/vpc" folder. If we want to reproduce the same setup  in another environment, such as staging,   using just plain Terraform code, we'll need to  copy the VPC folder to the new environment and   replace all the environment-specific references.  For some, we can use variables, but for others,   such as the backend configuration, we cannot. First, let's remove the state folders and file. In the backend block, variables are not supported,  and you'll have to manually replace the path for   each new environment. For example, replace  "dev" with "staging." If you use a separate   bucket for each environment, you'll need  to replace the bucket name instead. Then,   we need to find all the references to the specific  environment and replace them with "staging."   You can definitely use variables here. Under  subnets, we have a lot of references to "dev"   that we need to replace. I'll use a Visual Studio  shortcut to replace all occurrences of "dev." Next, update the NAT gateway and routes.   It's possible that we may miss something, so  let's search for any remaining "dev" references.   Also, don't forget to update the internet gateway. So far, our directory structure looks like this: That's it! Let's go ahead and initialize  Terraform in the staging VPC folder.   You may encounter an error since we  copied the Terraform folder. To fix it,   just run "reconfigure," but be very careful and  first ensure that you're in the right place.   Then, apply the Terraform. Alright! Now, we have one VPC for the  dev environment and another for staging.   The same applies to the subnets; we  have four for dev and another four   for staging. Some companies may choose to  dedicate a separate AWS account for the   production environment with very limited access. 
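For reference, the routing tables and subnet associations described above might look roughly like this (resource names are assumptions):

```hcl
# Private subnets route 0.0.0.0/0 (the default route) through the NAT gateway.
resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.this.id
  }
}

# Public subnets route 0.0.0.0/0 through the internet gateway.
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.this.id
  }
}

# Each of the four subnets is then associated with its routing table, for example:
resource "aws_route_table_association" "private_us_east_1a" {
  subnet_id      = aws_subnet.private_us_east_1a.id
  route_table_id = aws_route_table.private.id
}
```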
Additionally, we now have another Terraform state   file for the staging environment under the  "staging" folder. In the following example,   we'll improve it. But before we proceed, make  sure to destroy the staging and dev VPCs.   Once completed, you should not have any VPCs in  that account besides the default VPC. By the way,   you should not use the default VPC in any  situation. It's only there for demonstration   purposes, so feel free to delete it.  Sometimes you may get an error that the   default VPC does not exist, but it's a good  sign that you forgot to update something.  In the next part of this tutorial, we will  transform our terraform code into a module.   This will greatly reduce the amount of  code we need to duplicate when working   with different environments. By doing this,  we are taking the first step towards making   our code more efficient and less repetitive. At  the moment, we will keep everything in the same   repository and create a folder specifically  for infrastructure modules. In the future,   we will explore how to structure terraform  modules across different git repositories. Now, let's copy the entire VPC folder  and place it under the modules folder.   We can remove the dev folder, which  contains the Terraform state and lock files.   Next, we'll begin refactoring. We don't need to  declare the provider within the module. Instead,   we can just set version constraints for Terraform  and the provider. Let's remove the provider and   backend configurations. The provider and backend  will be managed in the infrastructure-live folder. Next, create a variables file. We'll modify our  code by moving some parts into variables. One   commonly used variable is the environment  variable. This helps differentiate between   different environments and is often used as a  prefix for your infrastructure components. It's   a good idea to add a description and specify  a type, like "string," for this variable. The next variable is a CIDR block. In this case,   we can set a default value for the variable  and only override it when necessary. Now, let's begin converting this code into a  module, following best practices. If there's   no more descriptive and general name  available, or if the resource module   creates only one resource of this type,  the resource name should be called "this." Next, replace the hardcoded CIDR block for the  VPC with a variable. Whenever you create a module,   replace all possible configurations with variables  and provide default values if you don't want   to change them. This approach will be helpful  in the future if you receive new requirements   and need to update a parameter. It makes your  modules flexible and future-proof. For instance,   when working with DNS, you should also create  variables and set the defaults to true. Now, replace the "dev" prefix with  the environment variable. You can   also replace the entire name with  a variable, not just the prefix. Next, apply the same process to the  internet gateway. Replace the resource   name with the "this" keyword. Update the  vpc_id to reference "this" as well. Also,   update the name tag following the same  approach used for the VPC resource. For subnets, let's remove the  existing code and replace it with   logic that dynamically creates subnets  on demand. You can find similar logic   in the official AWS VPC module. But  first, let's add a few new variables. The first variable is for availability zones.  We'll pass a list containing the zones we want   to use. 
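The first module variables described above might look roughly like this (variable names and the default CIDR are assumptions); the subnet CIDR and tag variables that come next follow the same pattern:

```hcl
variable "environment" {
  description = "Environment name, used as a prefix (e.g. dev, staging)"
  type        = string
}

variable "vpc_cidr_block" {
  description = "CIDR block for the VPC"
  type        = string
  default     = "10.0.0.0/16"
}

variable "availability_zones" {
  description = "Availability zones to spread the subnets across"
  type        = list(string)
}
```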
Next, create a variable for CIDR  ranges for the private subnets. Similarly,   create another variable for CIDR  ranges for the public subnets. Next, create two more variables to pass  additional tags for private and public   subnets. This is particularly useful when using  a VPC for EKS clusters, as previously mentioned. Now that we have variables for subnets, we  can create Terraform code for them. Instead   of using a "for each" loop, we'll use the  "count" variable and create as many private   subnets as needed, based on the input provided  to the module. We'll then use the VPC reference.   For the CIDR block, we'll use the count index.  We'll apply the same logic for the availability   zone. For the tags, we'll use the built-in  merge function to combine the provided tags   with a name tag. Essentially, we'll use the same  logic to create public subnets for the module. Now, let's update the NAT Terraform code  to use the "this" keyword and modify the   Name tags to incorporate the environment variable.   For the subnet, let's use the  first generated public subnet. The same applies to the routes. We need to  refactor the code and use the "this" keyword. We should also update the logic  to associate these routing   tables with the generated subnets.  Instead of hardcoding each route,   let's use a count variable and the index of  the subnets. First, let's associate a private   routing table with all private subnets,  and then do the same for public subnets. Now, let's add a few output variables, such  as lists of private and public subnets.   These output variables can be used later when  passing information to the EKS module. We can   use the star (*) to return all created subnets.  These shortcuts are simple but very powerful. We have now completed the module. Next,  let's create another live environment   where we can call these Terraform  modules. We'll use the same structure:   a 'dev' folder for the development environment  and another folder for the staging environment. Now, create another VPC folder  to invoke the VPC module. One crucial lesson learned from writing hundreds  of thousands of lines of infrastructure code is   that large modules should be considered harmful.  In other words, it's not a good idea to define all   your environments (dev, stage, prod, etc.) or even  a significant amount of infrastructure (servers,   databases, load balancers, DNS, etc.) in a  single Terraform module. Large modules are slow,   insecure, hard to update, challenging  to code review, and difficult to test. Alright, create a main.tf file  to call the VPC module. First,   as with plain Terraform code, we  need to declare the AWS provider   and set up the backend. In this example,  we'll continue to use the local state. Next, declare the VPC module. In this  case, 'VPC' is an arbitrary variable. Then,   specify the source. A Terraform module is simply  a folder containing a bunch of Terraform code.   You can reference it using a relative path  or, later on, use a dedicated git repository. Declare the environment. Since  it's under the 'dev' folder,   it should be the development environment. Later,   I'll show you how to dynamically obtain  this information from the folder structure. Next, provide the availability zones, private  subnet CIDR ranges, and public ranges. Finally,   let's pass the same subnet tags for  both private and public subnets. As you can see, the code is now much more concise.  We've reduced our Terraform code to a single file.   
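That single file might look roughly like this (the relative path, variable names, CIDRs, and tag values are assumptions based on the walkthrough):

```hcl
module "vpc" {
  source = "../../../infrastructure-modules/vpc"

  environment        = "dev"
  availability_zones = ["us-east-1a", "us-east-1b"]

  private_subnet_cidrs = ["10.0.0.0/19", "10.0.32.0/19"]
  public_subnet_cidrs  = ["10.0.64.0/19", "10.0.96.0/19"]

  # EKS discovery tags, as described earlier.
  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = "1"
    "kubernetes.io/cluster/dev-demo"  = "owned"
  }

  public_subnet_tags = {
    "kubernetes.io/role/elb"         = "1"
    "kubernetes.io/cluster/dev-demo" = "owned"
  }
}
```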
Optionally, we can use output variables if we  want to display them in the console; otherwise,   you can still use them, but they won't be printed  to the console. To reference an output variable,   first use the 'module' keyword, followed by  the module name and the module output variable. So far, we have the VPC Terraform module and  a main.tf file under the 'dev' environment   to invoke it. Now, let's switch to the  environment and initialize Terraform.   After that, run 'terraform apply'  to create the VPC using this module.   In the terminal, you can see all output  variables, such as subnets and the VPC ID. Now, instead of copying the entire  Terraform folder with all its files,   we'll simply create another main.tf file under  the staging environment. Let me copy the content   of the main file from the 'dev' environment and  replace all references with the 'staging' keyword.   Also, let's copy the output file. Switch to the 'staging' VPC folder, initialize  Terraform, and then apply the changes. Now, we have the same setup  as in the first example:   a 'dev' and 'staging' VPC, along with 8 subnets. Before moving on to the next example, let's  destroy both the 'dev' and 'staging' VPCs.   Alright, we have successfully deleted both VPCs. In this section, we'll improve our current setup   by using Terragrunt. Terragrunt is a simple tool  that offers additional features for making your   configurations more efficient, working with many  Terraform modules, and handling remote state. Let's make another live environment  to use Terragrunt. To set it up,   we need to create a terragrunt.hcl file.  If you're using the same S3 bucket and   configuring a different path to store your state,   you can place this file above your environment  folders. You’ll see more examples later.  In this tutorial, we'll cover many features  of Terragrunt. While most of them are simple   shortcuts, when used together, they  can greatly enhance your workflow. First, let's reorganize the Terraform backend  configuration. Usually, you'll use remote state,   but for this initial example, we'll stick with  local state. Keep in mind that the backend   configuration doesn't support variables or  expressions, so you'll need to copy and paste it,   updating the parameters as needed. For  instance, even when using local state,   you must update the path to the state  file: for the development environment,   it's "dev/vpc/state," while for staging, it's  "staging/vpc/state." If you use different buckets,   you'll also need to update the  bucket name for each environment. Terragrunt helps you maintain efficient  backend configurations by letting you   define them just once in a root  location and then inheriting that   configuration in all child modules.  The "path_relative_to_include" will   be translated to "dev" for the development  environment and "staging" for the staging   environment. This way, you won't have  to repeat yourself in the configuration. With Terragrunt, you can now create your  backend configuration just once in the root   terragrunt.hcl file, and it will  be used across all environments and   modules. This simplifies your  setup and reduces repetition. Managing provider configurations across  all your modules can be challenging,   particularly when customizing authentication  credentials. If you need to update your provider,   you must do so in each environment separately,  which can be time-consuming and repetitive. 
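A root terragrunt.hcl along these lines might look roughly like this sketch: it declares the backend once, uses path_relative_to_include() to give each child its own state path, and generates a basic AWS provider. The local backend and us-east-1 region match this first example; generated file names are assumptions.

```hcl
remote_state {
  backend = "local"

  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }

  config = {
    # Becomes "dev/vpc/..." or "staging/vpc/..." depending on the child folder.
    path = "${path_relative_to_include()}/terraform.tfstate"
  }
}

generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region = "us-east-1"
}
EOF
}
```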
Let's say you want Terraform to assume  an IAM role before connecting to AWS;   you need to add a provider block with the  "assume_role" configuration. You would   then copy and paste this configuration into  every one of your Terraform modules. While   it's not a significant amount of code, it  can be difficult to maintain. For instance,   if you need to modify the configuration to  expose another parameter (e.g., "session_name"),   you would have to go through each of your modules  and make the change, which can be cumbersome. In addition, what if you wanted to  directly deploy a general purpose module,   such as that from the Terraform module  registry? These modules typically do   not expose provider configurations as  it is tedious to expose every single   provider configuration parameter imaginable  through the module interface. Terragrunt   allows you to refactor common Terraform  code to keep your Terraform modules DRY. I’ll show you more examples  later. For this basic example,   we'll use the AWS provider with the default  authentication method and the "us-east-1" region. That's all the setup required to begin using  Terragrunt. With this basic configuration,   you can start taking advantage of its  features to improve your Terraform workflows. Now, let's create a standard folder structure for  our environments, including "dev" and "staging"   folders as usual. We'll also create a "vpc" folder  to call the VPC module. To use Terragrunt, we need   to declare a single file. Start by defining the  source for the module, which can be a local path,   a Git repository, or the Terraform registry,  just like a regular module source attribute. Here's where things differ: we'll include the  root terragrunt file that we defined earlier. This   will generate backend configuration and set up the  provider for us. Then, under the "inputs" section,   you'll provide the same Terraform variables  that we used in the previous example,   such as environment, availability zones, etc.  This part is identical to a regular module,   except that you need to use the "input"  block to supply these variables. To start using Terragrunt, you first need to  install it. You can download it from the source,   but a preferred method is to use a  package manager. For instance, on a Mac,   you would use Homebrew to install Terragrunt. Next, navigate to the "vpc" folder within the  development environment. Instead of running   "terraform init," you just need to  run "terragrunt init" (you may want   to create an alias for this command). Then,  execute "terragrunt apply." Once Terragrunt   completes the deployment, you'll have  the same VPC and subnets as before. Now, let's examine the backend configuration.  Terragrunt generates backend and profile   configurations in its own working directory, which  you can find under the "terragrunt-cache" folder.   You'll notice that we still have the "dev" key  for the state. It's important not to use local   state with Terragrunt; later, we'll convert it  to S3. With the current setup, it's challenging   to share your state with other team members  since the entire folder is ignored by Git. Now, let's create a similar VPC in  the staging environment. Copy the   Terragrunt file from the "dev" environment  and replace all references with "staging."   That's pretty much it. Navigate to the  "vpc" folder under the staging environment,   initialize Terragrunt, and then apply the changes. 
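For reference, the per-environment file described above (dev/vpc/terragrunt.hcl; the staging copy just swaps "dev" for "staging") might look roughly like this, with paths, CIDRs, and variable names as assumptions:

```hcl
terraform {
  source = "../../../infrastructure-modules/vpc"
}

# Pulls in the root terragrunt.hcl, which generates the backend and provider.
include "root" {
  path = find_in_parent_folders()
}

inputs = {
  environment        = "dev"
  availability_zones = ["us-east-1a", "us-east-1b"]

  private_subnet_cidrs = ["10.0.0.0/19", "10.0.32.0/19"]
  public_subnet_cidrs  = ["10.0.64.0/19", "10.0.96.0/19"]
}
```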
Because we used the same backend configuration,   Terragrunt created a different path based on  the location, which starts with the "staging"   key. This approach will be very helpful when  working with remote backend configurations. As a result, we have the same VPC for both the  development and staging environments, along with   their respective subnets. While it may not seem  like a significant change, it can save you a lot   of time and help maintain DRY configurations  when you have large number of environments. Another useful Terragrunt feature is the  ability to run commands in multiple folders   simultaneously. This will be incredibly  valuable later when we define dependencies   between modules. For now, instead of changing  directories and running "destroy" to clean up,   we can simply run "terragrunt run-all  destroy" from the root folder. It will   show you where it's going to execute those  commands and ask for your confirmation. By running just one command,   we've successfully destroyed both VPCs in  the staging and development environments. In the following part of this tutorial,  we will make an EKS Terraform module and   add some extra features like cluster  autoscaling. Let's move forward by   making a new folder called 'eks' inside  the 'infrastructure modules' folder. We'll start by using the same version  constraints for the AWS Terraform provider.   Before we can set up the EKS control  plane, we need to create an IAM role   with EKS principal. After that, we  must attach the AmazonEKSClusterPolicy,   which allows EKS to create EC2  instances and load balancers. For the cluster, we will use an environment  variable and pass the EKS name variable. Like   in the previous module, it's best practice  to parameterize all possible options and set   defaults, rather than hardcoding them in the  module or relying on provider defaults. Then,   attach the IAM role to the cluster. We need  to parameterize these values too. For now,   I'll turn off the private endpoint since I don't  have a VPN in this cluster, and enable the public   endpoint. This way, I can access the EKS  from my laptop and deploy applications. Up next, we have to supply subnets for  EKS, which should be located in at least   two different availability zones.  Amazon EKS sets up cross-account   elastic network interfaces in these subnets to  enable communication between your worker nodes   and the Kubernetes control plane. We'll pass  this variable dynamically from the VPC module   using the Terragrunt dependency feature. That's  about everything we need for the control plane. Now, let's create an IAM policy and IAM role  for the Kubernetes nodes. We'll use a similar   prefix for the environment and cluster name. If  you want to set up multiple environments in the   same account, you'll need to do the same,  or else you'll face a conflict when trying   to create another environment. Next, we have to  attach multiple IAM policies to this role. We'll   use a 'for each' loop to iterate over all provided  policies and attach them to the nodes' IAM role.   The last policy is optional; it allows you to  use the session manager to SSH into the node. In the next file, let's create EKS-managed  instance groups. As in the previous example, we   want to iterate over all node groups provided as  a map variable. All node groups must be connected   to the EKS cluster we created earlier. We'll  use a key of the object for the node group, for   instance, general. We'll also share the same IAM  role among all node groups. 
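For reference, the control-plane pieces described above (the IAM role with the EKS principal, the AmazonEKSClusterPolicy attachment, and the cluster itself) might look roughly like this; resource and variable names are assumptions:

```hcl
resource "aws_iam_role" "eks" {
  name = "${var.environment}-${var.eks_name}-eks-cluster"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "eks.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "eks" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.eks.name
}

resource "aws_eks_cluster" "this" {
  name     = "${var.environment}-${var.eks_name}"
  version  = var.eks_version
  role_arn = aws_iam_role.eks.arn

  vpc_config {
    endpoint_private_access = false # no VPN in this walkthrough
    endpoint_public_access  = true  # reachable from the laptop
    subnet_ids              = var.subnet_ids
  }

  depends_on = [aws_iam_role_policy_attachment.eks]
}
```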
If you need to grant   additional access for applications running in EKS  to the AWS API, you would use an OpenID Connect   provider instead. You'll see an example of this  later on when we deploy the cluster autoscaler. Next, we'll set the capacity type of the node,  which can be either on-demand or spot type. Then,   we'll specify the list of instance types  associated with the EKS Node Group, such   as 't3a.xlarge'. After that, we'll configure  the scaling settings. Remember that these   settings only set up the initial autoscaling  group parameters, like minimum, maximum,   and desired size. To enable autoscaling, you must  deploy the cluster autoscaler or use Karpenter. Next, we'll set the desired maximum  number of unavailable worker nodes   during a node group update. I'll keep  the default of 1 node at a time. It's   also helpful to assign labels to  the Kubernetes workers. Later,   you can use nodeSelector or affinity to bind  pods with nodes. Additionally, we'll use a   similar 'depend_on' statement to ensure the IAM  role is ready before creating instance groups. The next step involves setting  up the OpenID Connect provider,   which is used to grant access to the AWS API. Most  of the time, you'll want this in your cluster,   but sometimes it's not necessary. Let's create  a boolean flag called 'enable_irsa' that we   can use to create this provider on demand. Then,  you'll need to point it to the EKS control plane. Once you've retrieved the EKS TLS certificate,  you can proceed to create the OpenID Connect   provider. Since we'll be creating additional  Kubernetes add-on modules, we want to expose   some variables that can be passed to another  module. For example, the full EKS name that   includes the environment prefix. To deploy the  cluster autoscaler, we'll need to use this OpenID   provider ARN to establish trust between  AWS IAM and the Kubernetes service account. Finally, let's declare the variables that  we want to provide for this module. First,   we'll use the same environment variable as a  prefix. Then, we'll specify the desired EKS   version and the name of the EKS cluster. Next,  we'll provide the list of subnet IDs that we   need to pass to EKS, followed by the default IAM  policies that we have to attach to the EKS nodes. Lastly, we'll define the node groups, specifying  all the parameters of the desired Kubernetes node   group. And finally, we'll add the 'enable_irsa'  flag to create the OpenID Connect provider. Now, let's create another live  environment—our fourth one. First,   let's create a Terragrunt file and define shared  objects between environments and modules. We'll   continue using the local state for now, but  this will be the last time, I promise. Also,   we'll set up the AWS Terraform  provider with the 'us-east-1' region. Next, create a 'dev' folder for the development  environment. Inside it, let's create another   Terragrunt file, but in this case, define common  variables only for this development environment.   Later, when we need to create another environment,  that's when you'll need to update most of them.   For example, I want to share the 'dev' environment  prefix across all modules in this environment. Before we can create the EKS cluster, we need  to provision a VPC in your AWS account. Let's   copy the VPC module from the previous example  and paste it under the development folder.   Make sure to delete the lock and state files  from the previous example, as we don't need them. First, let's refactor the environment  variable. 
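Stepping back for a moment, the managed node-group loop described above might look roughly like this (variable and resource names are assumptions); the environment-variable refactoring continues below.

```hcl
resource "aws_eks_node_group" "this" {
  # One node group per key in the map, e.g. "general".
  for_each = var.node_groups

  cluster_name    = aws_eks_cluster.this.name
  node_group_name = each.key
  node_role_arn   = aws_iam_role.nodes.arn # the IAM role shared by all node groups
  subnet_ids      = var.subnet_ids

  capacity_type  = each.value.capacity_type  # "ON_DEMAND" or "SPOT"
  instance_types = each.value.instance_types # e.g. ["t3a.xlarge"]

  # Initial autoscaling-group sizing only; the cluster autoscaler adjusts it later.
  scaling_config {
    min_size     = each.value.min_size
    max_size     = each.value.max_size
    desired_size = each.value.desired_size
  }

  update_config {
    max_unavailable = 1
  }

  labels = {
    role = each.key
  }

  depends_on = [aws_iam_role_policy_attachment.nodes]
}
```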
Like in the root,   we can include that environment variable  in the code and use it wherever we need it. The 'expose' attribute determines if the  included config should be parsed and made   available as a variable. This enables other  parts of the configuration to access and use it. Instead of hardcoding the environment variable,   we can dynamically pass it from the parent  folder. This is another step to make your   Terraform code DRY (Don't Repeat Yourself). That's  all for the VPC; we'll leave the other variables   as they are. If you decide to use a different  EKS name, you must update the tags accordingly. Now, create another folder for the EKS module and  make a new Terragrunt file. For the time being,   we'll keep using the relative path to specify  the module's source. Next, include the root   to generate backend configuration and  the AWS provider. Then, add a similar   environment variable. By the way, you can also  expose some variables in the root Terragrunt   file if you want to share them between different  environments, such as the AWS account or region. Next, we need to provide input variables  for the module. First, we want to use the   most recent EKS control plane version available  at the moment, which is currently 1.26. Then,   we'll use the same environment variable as in  the VPC module and set the EKS cluster name. This is where Terragrunt really shines—it allows  you to define dependencies between modules. For   example, the EKS module depends on the VPC module  and needs subnet IDs. With plain Terraform,   you would have to use the Terraform remote state  and execute those modules sequentially. Next,   we need to create a 'node_groups' variable with  the desired settings for the group. Finally,   define the dependency on  the VPC module. To do that,   you simply need to point to the VPC  folder where you invoke that module. It's also important to provide some mock outputs.  This is helpful when you want to run 'terraform   plan' on both modules simultaneously.  If you omit this mock output variable,   the plan will exit with an error stating  that the EKS module needs subnet IDs. You'll   see this in action soon. That's  all for the VPC and EKS module. Now, let's go ahead and initialize Terraform.  Terragrunt offers another useful feature that   allows you to run the same command in  multiple folders and, most importantly,   respect dependencies. Let's switch  to the development environment. From here, we can run 'init' for both  VPC and EKS modules. By the way, it's   optional—Terraform will automatically initialize  it when you run 'plan' or 'apply' anyway. You can see that Terragrunt will run 'init' in  the VPC first and then in the EKS module since   we've defined the dependency. Now, let's run  'plan'. It will execute in the same order—VPC   and then EKS. If you leave the mock variables  out, the plan will exit with an error on EKS. Alright, we can run 'apply' now.  Terragrunt will show you the order   again and ask you to confirm the action.  When you say yes, it will create the VPC   first and use subnet IDs output variables  as inputs in the EKS module. Typically,   we have many different modules in our  environment, and it becomes extremely helpful   to share output variables and run 'apply'  on the entire environment simultaneously. Alright. It will take maybe 10 minutes  to create the VPC and EKS cluster. To access the EKS cluster, we need to update  our local Kubernetes config using the 'aws   eks' command. 
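Before verifying access, here is roughly what the dependency wiring described above might look like in dev/eks/terragrunt.hcl (paths, names, node-group settings, and mock values are assumptions):

```hcl
terraform {
  source = "../../../infrastructure-modules/eks"
}

# Generates the backend and AWS provider; the shared environment variable is
# inherited from the parent configuration as described above.
include "root" {
  path = find_in_parent_folders()
}

dependency "vpc" {
  config_path = "../vpc"

  # Mock outputs let "run-all plan" succeed before the VPC actually exists.
  mock_outputs = {
    private_subnet_ids = ["subnet-1234", "subnet-5678"]
  }
}

inputs = {
  eks_version = "1.26"
  eks_name    = "demo"

  subnet_ids = dependency.vpc.outputs.private_subnet_ids

  node_groups = {
    general = {
      capacity_type  = "ON_DEMAND"
      instance_types = ["t3a.xlarge"]
      min_size       = 1
      max_size       = 5
      desired_size   = 1
    }
  }
}
```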
Let's run 'kubectl get nodes'  to verify that we can connect to the cluster. At this point, we have the VPC and EKS cluster  set up. Most of the time, you would want to deploy   additional components, such as cluster autoscaler,  CSI storage drivers, load balancer controllers,   etc. For that, let's create a separate  Terraform module called 'kubernetes-addons'.   We'll combine managed and self-managed  addons that are deployed as Helm charts. As always, let's begin by creating version  constraints. In the case of this module, we want   to use the Helm provider to deploy self-managed  Kubernetes addons as Helm charts. Next,   let's create an additional Terraform file for each  addon. The first one is the cluster autoscaler.   The cluster autoscaler needs access to the AWS  API to discover autoscaling groups and adjust   desired size setting on them. For that, we need to  use IAM for service accounts. We'll deploy it in   the 'kube-system' namespace, and the Kubernetes  service account name is 'cluster-autoscaler'.   We need to set it in the Helm chart later. All  Kubernetes addons will have a flag to enable them,   such as 'enable_cluster_autoscaler'. If  it's true, the count is 1, which means we   will create this type of resource. Then, use  the EKS name as a prefix for the IAM role. Next, let's create the IAM policy that allows  the cluster autoscaler to work properly,   as I described earlier. Attach this policy to the  trusted IAM role. And let's create a Helm release.   It will also use a boolean flag to enable it, the  name of the Helm release, the remote repository to   use, and the chart name. Specify the namespace,  which must match the namespace on the IAM   role. Then, provide the chart version. Now it's  important to match the service account name with   the IAM role. To establish trust, we must set the  Kubernetes service account annotation with the IAM   role ARN. Finally, provide the EKS name so that  the cluster autoscaler can auto-discover subnets   and autoscaling groups. This functionality is  based on the subnet tags that we provided earlier. Now, I intentionally limited the number of  parameters that we can customize so that everyone   can follow the same process and limit the drift  between environments. To deploy the load balancer   controller, just create another Terraform  file and follow the same logic. The same   goes for Karpenter, ArgoCD, and other components.  First, as always, the environment variable. Then,   the EKS cluster name that we'll get from  the EKS output variable, a flag to deploy   the cluster autoscaler, the Helm chart version of  the cluster autoscaler, and finally, we need to   pass the OpenID Connect provider ARN from the  EKS module. That's all for the addons module. Next, create an addons folder under the live  development environment and create a Terragrunt   file. As always, we need to define the source of  the module, include the backend and AWS provider   config, and then the same environment variable. Now, we need to pass the EKS name as a dependency   from the EKS module and the OpenID Connect  provider ARN, enable the cluster autoscaler,   specify the version of the chart that we  want to install, and define dependencies   on the EKS module. Specifically, we need the EKS  cluster name and the OpenID Connect provider ARN. The tricky part is to authenticate the Helm  provider, which we will generate using Terragrunt   as well. Keep in mind that you cannot pass  variables from the EKS module here. 
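A sketch of how that Helm provider might be generated from the addons terragrunt.hcl, authenticating with a temporary token; it can only reference variables passed to the module itself (here an assumed eks_name variable), not outputs of the EKS module:

```hcl
generate "helm_provider" {
  path      = "helm-provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
data "aws_eks_cluster" "this" {
  name = var.eks_name
}

data "aws_eks_cluster_auth" "this" {
  name = var.eks_name
}

provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.this.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data)
    token                  = data.aws_eks_cluster_auth.this.token
  }
}
EOF
}
```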
This provider   will be generated and can only use variables that  are provided to the module itself. Essentially,   you can generate anything you want. To initialize  the Helm provider, we need to get a temporary   token. You can pretty much initialize any  provider that needs to authenticate with   EKS using the same principle, such as the  Kubernetes and kubectl Terraform providers. That's all. Let's go back to the  terminal and run terraform apply on   the whole development environment.  Since we defined dependencies,   you can see the order in which Terraform will  apply the infrastructure. Let's confirm it. After deploying the infrastructure with  Terraform, it's important to verify that   the cluster autoscaler has been installed  correctly. To do this, you can check the   installed helm charts and confirm that  the autoscaler is installed. Additionally,   you should make sure that the autoscaler pod  is up and running. It's also recommended to   check the logs of the autoscaler for any  errors. If there are any misconfigurations,   you may see an error in the logs indicating  that the autoscaler is unable to access the AWS   API due to permission denied. If the  autoscaler is unable to access the AWS API,   it won't be able to adjust the desired  size property for autoscaling. Therefore,   it's important to confirm that the  autoscaler has been deployed successfully. To test if the autoscaler is working correctly,  we can create a simple deployment based on Nginx   with 4 replicas. After creating the deployment,  we can watch the pods in the default namespace   and see that one pod is in a pending state.  This is because there are not enough nodes   to schedule the pod. We can describe the pending  pod and confirm that the autoscaler is triggered   to add more nodes. We should see a message  indicating that the pod triggered a scale-up   from 1 to 2 instances. After a few seconds or  minutes, a new node should join the cluster,   and the pending pod should be scheduled. This test  confirms that the autoscaler is working correctly. To replicate the setup in the staging environment,   you can simply create a copy of the  dev folder and rename it to staging.   The only required change would be to update the  environment variable from "dev" to "staging".   While you can also update other local parameters  such as EKS node instance types, since autoscaling   is set up, this same setup can be used across  different environments including production. Switch to the staging environment, and initialize.   When we run the plan, we may receive  an error message, but we can run   the command again. However, Helm provider  requires an existing EKS cluster,   so we cannot fake it. If we still want  to run the plan command on some folders   and exclude addons, we can use a  specific command. Alternatively,   we can run the apply command, which will work  because the addons module is invoked only   after the EKS cluster is provisioned. Overall,  we have three groups in this execution plan. Great! We have successfully set up the VPC,   EKS cluster, and autoscaler in the staging  environment. To connect to the staging eks,   just update the name of the cluster  in your Kubernetes config file. You   can see that the cluster autoscaler is  running in the kube-system namespace. Now, let's check our clusters on the AWS console.  Since we used environment prefixes, we were able   to create two independent environments in a  single AWS account. 
We now have two clusters,   each with its own environment prefix. To destroy  both environments, we just need to run "destroy"   on the top level. Terragrunt will destroy both the  development and staging environments at the same   time. It will do so in reverse order, deleting  the helm charts, EKS clusters, and lastly,   VPCs. If you check the AWS console, you will see  that all the clusters and VPCs have been removed.  In this part of the tutorial, I'll teach  you how to use Terragrunt for real-world   projects. We'll separate our code using Git  modules, save our remote state in an S3 bucket,   and secure it with DynamoDB locking. Plus, we'll  use an IAM role to set up our infrastructure. First, we need to create an S3 bucket to save  the Terraform state. You can name it anything,   but remember it must be globally unique. It's also  a good idea to turn on bucket versioning, so if   something happens to your state, you can always  go back to an earlier version and recover it. Now, let's create a DynamoDB table  to lock the Terraform state. This   helps avoid conflicts when several  team members try to run Terraform   simultaneously. We'll name the table  "terraform-lock-table" and create a   "LockID" partition key. That's all we need  to manage the remote state in our S3 bucket. Instead of giving users direct access,  we can create a dedicated IAM role for   applying all infrastructure changes.  This role can be assumed by users or,   even better, by automation tools like Jenkins. For now, let's give the role admin access, as  the specific permissions needed will depend on   your infrastructure plans. We won't cover  IAM permissions in this tutorial. Choose   "AdministratorAccess" and add it to the role,  then name it "terraform." By default, any user   in the account can potentially assume this role.  We can limit this on the principal side if needed,   but it's not required, as we'll need to explicitly  grant users permission to assume the role. In the next step, we'll create an IAM policy that  allows users to use the "terraform" role. First,   copy the ARN of the role. Then, create  a new policy allowing users to use the   "terraform" IAM role and name it "AllowTerraform." Following best practices, let's avoid attaching  the policy directly to users. Instead,   create an IAM group called "devops" and add  the "AllowTerraform" policy to this group. For the demo, create a new IAM user  and add them to the "devops" group.   This will allow the user to assume the  "terraform" role. Keep in mind that any   user with Admin access in the account will  also be able to assume the "terraform" role. Next, generate security credentials for the user.   We'll use these credentials to  create an AWS local profile.   Download the credentials. Now you can use the "aws configure"  command to add a new profile for the user. Next, create a separate Git repository to store  your Terraform modules. You have two options:   store all modules in the same Git repo or  create a separate Git repository for each   Terraform module. With the last option,  you might end up with many repositories,   which can be difficult to maintain.  If you're just starting out or have a   relatively small DevOps team, begin with a single  repository. Name it "infrastructure-modules."   Make the repository private, add a README file,  and include a .gitignore file for Terraform. Now, clone the "infrastructure-modules" Git  repository and open it with a text editor. First, let's copy the VPC Terraform  module we created earlier. 
  Add this module to the Git repo and commit the  changes. When using a single Git repository for   multiple modules, create a Git tag specific  to this module so we can reference it later.   If you need to make changes to the VPC module,  commit again and create a new VPC tag. Finally,   push the Git tags to GitHub  or another remote Git server. So far, we have a single tag  in the GitHub repository. Next, let's copy the EKS module. Follow  the same workflow: add it to the repo,   commit the changes, and create a new  Git tag specific to this EKS module. Finally, let's move the "kubernetes-addons"  module to the "infrastructure-modules" Git repo.   Add it to the repo and create a new tag.  This is the last module we're going to add. Now we have three separate  Git tags related to specific   modules. That's how you manage multiple  Terraform modules in a single repo. Next, create a new Git repository to store  the live state of our infrastructure.   Make the repository private and select  "Terraform" for the .gitignore file. Clone this new repository as well and open it in  Visual Studio Code or your preferred text editor. First, create a root Terragrunt  file. In this case, we'll use a   remote S3 bucket to store our state. Since  we're using an IAM role to apply changes,   we need to provide the IAM role ARN and specify  the profile that can assume that role. Then,   provide the S3 bucket and, finally,  the DynamoDB table to lock the state. Next, let's generate the AWS Terraform  provider. We'll also use the same role and   AWS profile to run Terraform. Optionally,  you can give the provider a session name. Now, let's copy the development  environment from the previous   example and make a few adjustments,  as it won't work out-of-the-box.   First, let's clean up by deleting previous  local state and lock files from all modules. Next, we need to update the source of the modules  to point to the remote Git repository and use tags   to pin each module to a specific version. Let's  update the "kubernetes-addons" source first,   and then the VPC module. This example is  taken from the Terragrunt Quick Start. Additionally, we must update the Helm provider  to use a token; I'll explain why later.   We also need to ignore the Terragrunt cache. Now, switch to the development  environment and run "terragrunt apply."   It will show the order in which the modules  will be applied and ask for confirmation.   It seems that Terraform was updated and  no longer supports "github.com" as is.   We need to update this in the code. Let's go  through all three modules and add the "git"   schema before each module: the Kubernetes  addons module, and finally the VPC module.   Run the command again, confirm that you want  to update the state, and in a few minutes,   Terraform should create the VPC, EKS  cluster, and deploy the auto-scaler. Now, you can check the S3 bucket  and find that there's a "dev" key   and separate paths for each module:  VPC, EKS, and Kubernetes addons. Let's try connecting to the cluster  as we did in the previous example.   We now encounter an error; even when using  the default AWS profile with admin access,   we don't have permissions to access  Kubernetes. The problem is that only   the IAM user or role used to create the  cluster has access to it. I have a separate   video on how to add additional users and  IAM roles while following best practices. For now, let's create a new AWS profile for  the "terraform" IAM role. 
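A profile along these lines in ~/.aws/config ties the two together (a sketch; the account ID is a placeholder, and "anton" is the demo user's profile):

```ini
[profile terraform]
role_arn       = arn:aws:iam::111111111111:role/terraform
source_profile = anton
```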
This indicates that the "anton" user will be used to assume this "terraform" role. Next, update the Kubernetes context once again, but this time use the "terraform" profile. Now, we can access the cluster. Let's also check if the auto-scaler is running. To destroy all the infrastructure, just run "destroy" in the "dev" environment.

The next step is to add additional users and learn how to use ArgoCD to deploy applications to Kubernetes. Thank you for watching, and I'll see you in the next video.
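For reference, the production-style root terragrunt.hcl described in this final part might look roughly like this (the bucket name, account ID, and profile are placeholders; the DynamoDB table matches the one created earlier):

```hcl
remote_state {
  backend = "s3"

  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }

  config = {
    bucket         = "my-unique-terraform-state-bucket" # placeholder, must be globally unique
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-lock-table"
    role_arn       = "arn:aws:iam::111111111111:role/terraform" # placeholder account ID
    profile        = "anton"                                    # profile allowed to assume the role
  }
}

generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region  = "us-east-1"
  profile = "anton"

  assume_role {
    role_arn     = "arn:aws:iam::111111111111:role/terraform"
    session_name = "terraform"
  }
}
EOF
}
```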
Info
Channel: Anton Putra
Views: 40,551
Keywords: terragrunt, terragrunt tutorial, terragrunt vs terraform, terragrunt terraform, terragrunt aws, terragrunt explained, terragrunt demo, terragrunt multiple environments, terragrunt dependency, create EKS using terraform, EKS, AWS EKS, Kubernetes, create eks cluster aws using terraform, create eks cluster aws, eks aws, eks tutorial aws, terraform eks, terraform eks cluster creation, devops, anton putra, terraform, aws, aws cloud, aws tutorial, sre, gitops, terraform helm, helm, k8s
Id: yduHaOj3XMg
Length: 61min 9sec (3669 seconds)
Published: Sat Apr 15 2023