Terragrunt Tutorial: Create VPC, EKS from Scratch!

Captions
In this tutorial, I'll guide you step by step on  how to create a VPC using plain Terraform code.  Next, we'll refactor it into a well-structured  Terraform module by following best practices.   Finally, we'll make use of Terragrunt's  top features to set up your infrastructure. We'll not only build a VPC module, but also  create EKS and Kubernetes add-ons modules   from scratch. With the Kubernetes add-ons  module, you can easily enable managed add-ons   like the CSI storage driver, or self-managed  add-ons such as the cluster autoscaler,   load balancer controller, ArgoCD,  and others deployed as Helm charts. In this tutorial, you'll learn many valuable  techniques and best practices for creating   Terraform modules, even if you don't end up using  Terragrunt in your projects. Towards the end,   I'll demonstrate a production-ready setup that  includes using an S3 bucket and DynamoDB table to   lock the state, as well as creating an IAM role  that can be assumed by users or automation tools   like Jenkins. Additionally, we'll set up separate  Git repositories for the live infrastructure and   Terraform modules, to help maintain a  well-organized and efficient workflow. Terragrunt offers many useful features  that can enhance your workflows. One such   feature is the ability to execute and apply  Terraform on multiple modules simultaneously,   while also sharing output variables from  one module as input to others. For instance,   we'll create a VPC module and use the  private subnet IDs as input for the   EKS modules. Terragrunt can also generate  backend configurations and provider configs,   saving you from copy-pasting and keeping  your code DRY. Many people, including myself,   find Terragrunt extremely helpful and use  it in production environments all the time. Let’s get started. In this tutorial, we'll talk  about various ways to organize your Terraform   code. One method is to make an "environments"  folder containing separate folders for each   environment, like "dev" for development. This  environment is for testing new infrastructure   and app features, and it's okay if things break.  Next, you might have a "staging" environment where   you test your apps before going live. Sometimes  called "pre-prod," this more stable environment   should resemble production but might be smaller  in terms of resources and cost. And so forth. If we want to use Terragrunt recommended  approach, we'll refer to these environments   as "live environments" or "infrastructure live."  This naming sets this folder or Git repository   apart from other Terraform code and modules. The  "live" label means you can treat this repo as a   reliable source, and whatever is declared  in those folders should be up and running. For your reference, I'll label this  first approach as "v1" when we set up   our infrastructure using the basic Terraform code. Alright, let's begin by creating a VPC under the   "dev" folder. We'll use this VPC  later to deploy an EKS cluster. First, we need to declare the provider. There  are several ways to authenticate with AWS. For   this initial example, I'll stick to the default  method, which uses either environment variables   or your default AWS profile. Later, we'll  assume roles, and I'll explain the difference   it makes when creating an EKS cluster. Next, let's set some Terraform constraints.   You should use version 1 or higher. When Terraform provision infrastructure,   it needs a way to track what was created.  
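For reference, the provider declaration and version constraint described above might look roughly like this minimal sketch (the region is an assumption; authentication falls back to environment variables or the default AWS profile). The state discussion continues right after.

```hcl
terraform {
  # Any Terraform 1.x release satisfies this constraint.
  required_version = ">= 1.0"
}

provider "aws" {
  region = "us-east-1" # assumed region for this walkthrough
}
```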
This is called the Terraform state,   and we use backend configuration to decide  where to store it. For testing purposes,   we'll use the local backend, which is the default  option, but we can adjust a few parameters. Later,   we'll switch to using an S3 bucket and a DynamoDB  table to lock our state. This approach should be   used most of the time, especially when working  in a team, instead of relying on local state. Here, we can set a path for storing the state  file. Often, this path will be in an object   store like an S3 or Google Cloud Storage  (GS) bucket. Additionally, we can set version   constraints for the AWS provider itself.  That's all for the provider configuration. Next, let's create a VPC. First, we need  to assign it a CIDR block. At this point,   you need to make an important decision.  If you plan to peer multiple VPCs in the   future, you should come up with  unique CIDR ranges beforehand. Many third-party add-ons for  Kubernetes, like the EFS storage driver,   require DNS support. It's a good  idea to enable it from the start,   as it can save you a lot of time  when troubleshooting issues later on. Lastly, let's add the Name tag. You'll  notice that we use environment prefixes,   which is a common practice even if you have  separate accounts for different environments.  Next, we'll create an internet gateway  to provide internet access for public   subnets. We need to attach it  to the VPC and add a Name tag. Now, let's create subnets. We'll make  two public and two private subnets.   The CIDR block for each subnet should be  a subset of the VPC's CIDR block. Then,   choose the availability zone. For EKS, you  need at least two different availability zones. Subnet tags are crucial, especially  for EKS. First, add a Name tag. Then,   include a tag indicating that EKS can  use the subnet to create private load   balancers. Add another tag to  associate this subnet with EKS,   with a value of either "owned" or "shared." You  can create an EKS cluster without these tags,   but some components might not work as expected.  For example, the Cluster Autoscaler or Karpenter   will use these tags to auto-discover subnets for  creating additional Kubernetes workers. Next,   create another private subnet in a different  availability zone. Remember, the cluster tag   must match your EKS cluster name. For the dev  environment, we'll create the "dev-demo" cluster. Now, let's create the public subnets. You'll  want to enable assigning public IP addresses   when virtual machines launch. Additionally, tag  these subnets to allow EKS to create public load   balancers. Public load balancers get public IP  addresses and are used to expose your service   to the internet. For example, an Nginx ingress  controller can create a public load balancer.   In contrast, private load balancers only get  private IP addresses, allowing you to expose   your service within your VPC only. Finally, create the last public   subnet in a different availability zone. Next, we need to create a NAT gateway  to provide internet access to private   subnets. I recommend manually allocating  a static public IP address, as you might   need to whitelist it with your clients  in the future. It's better to allocate   multiple public IPs in case you need to perform  blue-green deployments. For the NAT gateway,   we must explicitly depend on the  internet gateway. Additionally,   the NAT gateway must be placed in one of  the public subnets with an internet gateway. Finally, we need to create  routing tables. 
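Before the routing tables, here is roughly what one of the private subnets with the EKS discovery tags described above might look like (resource names, CIDRs, and the availability zone are assumptions; the "dev-demo" cluster name follows the walkthrough):

```hcl
resource "aws_subnet" "private_us_east_1a" {
  vpc_id            = aws_vpc.main.id # assumes the VPC resource is named "main"
  cidr_block        = "10.0.0.0/19"   # assumed subset of the VPC CIDR
  availability_zone = "us-east-1a"

  tags = {
    "Name"                            = "dev-private-us-east-1a"
    "kubernetes.io/role/internal-elb" = "1"     # lets EKS place private load balancers here
    "kubernetes.io/cluster/dev-demo"  = "owned" # must match the EKS cluster name
  }
}
```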
For now,   both public and private subnets have the same  default route, which is limited to your VPC only.  First, we'll create a private routing table  and use the NAT gateway as the default route.   A route that uses all IP addresses is called the  default route. Then, the second public routing   table will have the default route set to the  internet gateway. Next, we need to associate all   four subnets (two private and two public) with the  corresponding routing tables. In the next example,   we'll dynamically generate subnets and associate  them with routes. That's pretty much it. Optionally, you can expose the VPC  ID. This can be useful if you use it   as input for another Terraform code or module. That's all for the VPC setup. The current folder   structure consists of a "live" folder, followed  by "environments" and component-specific Terraform   code, such as the VPC. Now, let's switch to the  VPC folder and initialize Terraform. This will   download all required Terraform providers  and initialize the Terraform state. Then,   run "apply" and enter "yes" to create your VPC.  It may take about 2 or 3 minutes to complete.  ​​Once Terraform finishes,  it should return the VPC ID.  Now, you can check the AWS console to confirm  that we have a newly created VPC with the proper   Name tag and all four subnets with the "dev"  prefix. This is a typical example of how to   use Terraform to create an AWS VPC. It's  very straightforward: define in the code   what you want to create and apply Terraform. The  challenge is reproducing this at scale and keeping   the Terraform code DRY (Don't Repeat Yourself). Additionally, based on your backend configuration,   Terraform will create a state  file under the "dev/vpc" folder. If we want to reproduce the same setup  in another environment, such as staging,   using just plain Terraform code, we'll need to  copy the VPC folder to the new environment and   replace all the environment-specific references.  For some, we can use variables, but for others,   such as the backend configuration, we cannot. First, let's remove the state folders and file. In the backend block, variables are not supported,  and you'll have to manually replace the path for   each new environment. For example, replace  "dev" with "staging." If you use a separate   bucket for each environment, you'll need  to replace the bucket name instead. Then,   we need to find all the references to the specific  environment and replace them with "staging."   You can definitely use variables here. Under  subnets, we have a lot of references to "dev"   that we need to replace. I'll use a Visual Studio  shortcut to replace all occurrences of "dev." Next, update the NAT gateway and routes.   It's possible that we may miss something, so  let's search for any remaining "dev" references.   Also, don't forget to update the internet gateway. So far, our directory structure looks like this: That's it! Let's go ahead and initialize  Terraform in the staging VPC folder.   You may encounter an error since we  copied the Terraform folder. To fix it,   just run "reconfigure," but be very careful and  first ensure that you're in the right place.   Then, apply the Terraform. Alright! Now, we have one VPC for the  dev environment and another for staging.   The same applies to the subnets; we  have four for dev and another four   for staging. Some companies may choose to  dedicate a separate AWS account for the   production environment with very limited access. 
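For reference, the routing tables and subnet associations described above might look roughly like this (resource names are assumptions):

```hcl
# Private subnets route 0.0.0.0/0 (the default route) through the NAT gateway.
resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.this.id
  }
}

# Public subnets route 0.0.0.0/0 through the internet gateway.
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.this.id
  }
}

# Each of the four subnets is then associated with its routing table, for example:
resource "aws_route_table_association" "private_us_east_1a" {
  subnet_id      = aws_subnet.private_us_east_1a.id
  route_table_id = aws_route_table.private.id
}
```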
Additionally, we now have another Terraform state   file for the staging environment under the  "staging" folder. In the following example,   we'll improve it. But before we proceed, make  sure to destroy the staging and dev VPCs.   Once completed, you should not have any VPCs in  that account besides the default VPC. By the way,   you should not use the default VPC in any  situation. It's only there for demonstration   purposes, so feel free to delete it.  Sometimes you may get an error that the   default VPC does not exist, but it's a good  sign that you forgot to update something.  In the next part of this tutorial, we will  transform our terraform code into a module.   This will greatly reduce the amount of  code we need to duplicate when working   with different environments. By doing this,  we are taking the first step towards making   our code more efficient and less repetitive. At  the moment, we will keep everything in the same   repository and create a folder specifically  for infrastructure modules. In the future,   we will explore how to structure terraform  modules across different git repositories. Now, let's copy the entire VPC folder  and place it under the modules folder.   We can remove the dev folder, which  contains the Terraform state and lock files.   Next, we'll begin refactoring. We don't need to  declare the provider within the module. Instead,   we can just set version constraints for Terraform  and the provider. Let's remove the provider and   backend configurations. The provider and backend  will be managed in the infrastructure-live folder. Next, create a variables file. We'll modify our  code by moving some parts into variables. One   commonly used variable is the environment  variable. This helps differentiate between   different environments and is often used as a  prefix for your infrastructure components. It's   a good idea to add a description and specify  a type, like "string," for this variable. The next variable is a CIDR block. In this case,   we can set a default value for the variable  and only override it when necessary. Now, let's begin converting this code into a  module, following best practices. If there's   no more descriptive and general name  available, or if the resource module   creates only one resource of this type,  the resource name should be called "this." Next, replace the hardcoded CIDR block for the  VPC with a variable. Whenever you create a module,   replace all possible configurations with variables  and provide default values if you don't want   to change them. This approach will be helpful  in the future if you receive new requirements   and need to update a parameter. It makes your  modules flexible and future-proof. For instance,   when working with DNS, you should also create  variables and set the defaults to true. Now, replace the "dev" prefix with  the environment variable. You can   also replace the entire name with  a variable, not just the prefix. Next, apply the same process to the  internet gateway. Replace the resource   name with the "this" keyword. Update the  vpc_id to reference "this" as well. Also,   update the name tag following the same  approach used for the VPC resource. For subnets, let's remove the  existing code and replace it with   logic that dynamically creates subnets  on demand. You can find similar logic   in the official AWS VPC module. But  first, let's add a few new variables. The first variable is for availability zones.  We'll pass a list containing the zones we want   to use. 
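The first module variables described above might look roughly like this (variable names and the default CIDR are assumptions); the subnet CIDR and tag variables that come next follow the same pattern:

```hcl
variable "environment" {
  description = "Environment name, used as a prefix (e.g. dev, staging)"
  type        = string
}

variable "vpc_cidr_block" {
  description = "CIDR block for the VPC"
  type        = string
  default     = "10.0.0.0/16"
}

variable "availability_zones" {
  description = "Availability zones to spread the subnets across"
  type        = list(string)
}
```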
Next, create a variable for CIDR  ranges for the private subnets. Similarly,   create another variable for CIDR  ranges for the public subnets. Next, create two more variables to pass  additional tags for private and public   subnets. This is particularly useful when using  a VPC for EKS clusters, as previously mentioned. Now that we have variables for subnets, we  can create Terraform code for them. Instead   of using a "for each" loop, we'll use the  "count" variable and create as many private   subnets as needed, based on the input provided  to the module. We'll then use the VPC reference.   For the CIDR block, we'll use the count index.  We'll apply the same logic for the availability   zone. For the tags, we'll use the built-in  merge function to combine the provided tags   with a name tag. Essentially, we'll use the same  logic to create public subnets for the module. Now, let's update the NAT Terraform code  to use the "this" keyword and modify the   Name tags to incorporate the environment variable.   For the subnet, let's use the  first generated public subnet. The same applies to the routes. We need to  refactor the code and use the "this" keyword. We should also update the logic  to associate these routing   tables with the generated subnets.  Instead of hardcoding each route,   let's use a count variable and the index of  the subnets. First, let's associate a private   routing table with all private subnets,  and then do the same for public subnets. Now, let's add a few output variables, such  as lists of private and public subnets.   These output variables can be used later when  passing information to the EKS module. We can   use the star (*) to return all created subnets.  These shortcuts are simple but very powerful. We have now completed the module. Next,  let's create another live environment   where we can call these Terraform  modules. We'll use the same structure:   a 'dev' folder for the development environment  and another folder for the staging environment. Now, create another VPC folder  to invoke the VPC module. One crucial lesson learned from writing hundreds  of thousands of lines of infrastructure code is   that large modules should be considered harmful.  In other words, it's not a good idea to define all   your environments (dev, stage, prod, etc.) or even  a significant amount of infrastructure (servers,   databases, load balancers, DNS, etc.) in a  single Terraform module. Large modules are slow,   insecure, hard to update, challenging  to code review, and difficult to test. Alright, create a main.tf file  to call the VPC module. First,   as with plain Terraform code, we  need to declare the AWS provider   and set up the backend. In this example,  we'll continue to use the local state. Next, declare the VPC module. In this  case, 'VPC' is an arbitrary variable. Then,   specify the source. A Terraform module is simply  a folder containing a bunch of Terraform code.   You can reference it using a relative path  or, later on, use a dedicated git repository. Declare the environment. Since  it's under the 'dev' folder,   it should be the development environment. Later,   I'll show you how to dynamically obtain  this information from the folder structure. Next, provide the availability zones, private  subnet CIDR ranges, and public ranges. Finally,   let's pass the same subnet tags for  both private and public subnets. As you can see, the code is now much more concise.  We've reduced our Terraform code to a single file.   
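That single file might look roughly like this (the relative path, variable names, CIDRs, and tag values are assumptions based on the walkthrough):

```hcl
module "vpc" {
  source = "../../../infrastructure-modules/vpc"

  environment        = "dev"
  availability_zones = ["us-east-1a", "us-east-1b"]

  private_subnet_cidrs = ["10.0.0.0/19", "10.0.32.0/19"]
  public_subnet_cidrs  = ["10.0.64.0/19", "10.0.96.0/19"]

  # EKS discovery tags, as described earlier.
  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = "1"
    "kubernetes.io/cluster/dev-demo"  = "owned"
  }

  public_subnet_tags = {
    "kubernetes.io/role/elb"         = "1"
    "kubernetes.io/cluster/dev-demo" = "owned"
  }
}
```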
Optionally, we can use output variables if we  want to display them in the console; otherwise,   you can still use them, but they won't be printed  to the console. To reference an output variable,   first use the 'module' keyword, followed by  the module name and the module output variable. So far, we have the VPC Terraform module and  a main.tf file under the 'dev' environment   to invoke it. Now, let's switch to the  environment and initialize Terraform.   After that, run 'terraform apply'  to create the VPC using this module.   In the terminal, you can see all output  variables, such as subnets and the VPC ID. Now, instead of copying the entire  Terraform folder with all its files,   we'll simply create another main.tf file under  the staging environment. Let me copy the content   of the main file from the 'dev' environment and  replace all references with the 'staging' keyword.   Also, let's copy the output file. Switch to the 'staging' VPC folder, initialize  Terraform, and then apply the changes. Now, we have the same setup  as in the first example:   a 'dev' and 'staging' VPC, along with 8 subnets. Before moving on to the next example, let's  destroy both the 'dev' and 'staging' VPCs.   Alright, we have successfully deleted both VPCs. In this section, we'll improve our current setup   by using Terragrunt. Terragrunt is a simple tool  that offers additional features for making your   configurations more efficient, working with many  Terraform modules, and handling remote state. Let's make another live environment  to use Terragrunt. To set it up,   we need to create a terragrunt.hcl file.  If you're using the same S3 bucket and   configuring a different path to store your state,   you can place this file above your environment  folders. You’ll see more examples later.  In this tutorial, we'll cover many features  of Terragrunt. While most of them are simple   shortcuts, when used together, they  can greatly enhance your workflow. First, let's reorganize the Terraform backend  configuration. Usually, you'll use remote state,   but for this initial example, we'll stick with  local state. Keep in mind that the backend   configuration doesn't support variables or  expressions, so you'll need to copy and paste it,   updating the parameters as needed. For  instance, even when using local state,   you must update the path to the state  file: for the development environment,   it's "dev/vpc/state," while for staging, it's  "staging/vpc/state." If you use different buckets,   you'll also need to update the  bucket name for each environment. Terragrunt helps you maintain efficient  backend configurations by letting you   define them just once in a root  location and then inheriting that   configuration in all child modules.  The "path_relative_to_include" will   be translated to "dev" for the development  environment and "staging" for the staging   environment. This way, you won't have  to repeat yourself in the configuration. With Terragrunt, you can now create your  backend configuration just once in the root   terragrunt.hcl file, and it will  be used across all environments and   modules. This simplifies your  setup and reduces repetition. Managing provider configurations across  all your modules can be challenging,   particularly when customizing authentication  credentials. If you need to update your provider,   you must do so in each environment separately,  which can be time-consuming and repetitive. 
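A root terragrunt.hcl along these lines might look roughly like this sketch: it declares the backend once, uses path_relative_to_include() to give each child its own state path, and generates a basic AWS provider. The local backend and us-east-1 region match this first example; generated file names are assumptions.

```hcl
remote_state {
  backend = "local"

  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }

  config = {
    # Becomes "dev/vpc/..." or "staging/vpc/..." depending on the child folder.
    path = "${path_relative_to_include()}/terraform.tfstate"
  }
}

generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region = "us-east-1"
}
EOF
}
```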
Let's say you want Terraform to assume  an IAM role before connecting to AWS;   you need to add a provider block with the  "assume_role" configuration. You would   then copy and paste this configuration into  every one of your Terraform modules. While   it's not a significant amount of code, it  can be difficult to maintain. For instance,   if you need to modify the configuration to  expose another parameter (e.g., "session_name"),   you would have to go through each of your modules  and make the change, which can be cumbersome. In addition, what if you wanted to  directly deploy a general purpose module,   such as that from the Terraform module  registry? These modules typically do   not expose provider configurations as  it is tedious to expose every single   provider configuration parameter imaginable  through the module interface. Terragrunt   allows you to refactor common Terraform  code to keep your Terraform modules DRY. I’ll show you more examples  later. For this basic example,   we'll use the AWS provider with the default  authentication method and the "us-east-1" region. That's all the setup required to begin using  Terragrunt. With this basic configuration,   you can start taking advantage of its  features to improve your Terraform workflows. Now, let's create a standard folder structure for  our environments, including "dev" and "staging"   folders as usual. We'll also create a "vpc" folder  to call the VPC module. To use Terragrunt, we need   to declare a single file. Start by defining the  source for the module, which can be a local path,   a Git repository, or the Terraform registry,  just like a regular module source attribute. Here's where things differ: we'll include the  root terragrunt file that we defined earlier. This   will generate backend configuration and set up the  provider for us. Then, under the "inputs" section,   you'll provide the same Terraform variables  that we used in the previous example,   such as environment, availability zones, etc.  This part is identical to a regular module,   except that you need to use the "input"  block to supply these variables. To start using Terragrunt, you first need to  install it. You can download it from the source,   but a preferred method is to use a  package manager. For instance, on a Mac,   you would use Homebrew to install Terragrunt. Next, navigate to the "vpc" folder within the  development environment. Instead of running   "terraform init," you just need to  run "terragrunt init" (you may want   to create an alias for this command). Then,  execute "terragrunt apply." Once Terragrunt   completes the deployment, you'll have  the same VPC and subnets as before. Now, let's examine the backend configuration.  Terragrunt generates backend and profile   configurations in its own working directory, which  you can find under the "terragrunt-cache" folder.   You'll notice that we still have the "dev" key  for the state. It's important not to use local   state with Terragrunt; later, we'll convert it  to S3. With the current setup, it's challenging   to share your state with other team members  since the entire folder is ignored by Git. Now, let's create a similar VPC in  the staging environment. Copy the   Terragrunt file from the "dev" environment  and replace all references with "staging."   That's pretty much it. Navigate to the  "vpc" folder under the staging environment,   initialize Terragrunt, and then apply the changes. 
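For reference, the per-environment file described above (dev/vpc/terragrunt.hcl; the staging copy just swaps "dev" for "staging") might look roughly like this, with paths, CIDRs, and variable names as assumptions:

```hcl
terraform {
  source = "../../../infrastructure-modules/vpc"
}

# Pulls in the root terragrunt.hcl, which generates the backend and provider.
include "root" {
  path = find_in_parent_folders()
}

inputs = {
  environment        = "dev"
  availability_zones = ["us-east-1a", "us-east-1b"]

  private_subnet_cidrs = ["10.0.0.0/19", "10.0.32.0/19"]
  public_subnet_cidrs  = ["10.0.64.0/19", "10.0.96.0/19"]
}
```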
Because we used the same backend configuration,   Terragrunt created a different path based on  the location, which starts with the "staging"   key. This approach will be very helpful when  working with remote backend configurations. As a result, we have the same VPC for both the  development and staging environments, along with   their respective subnets. While it may not seem  like a significant change, it can save you a lot   of time and help maintain DRY configurations  when you have large number of environments. Another useful Terragrunt feature is the  ability to run commands in multiple folders   simultaneously. This will be incredibly  valuable later when we define dependencies   between modules. For now, instead of changing  directories and running "destroy" to clean up,   we can simply run "terragrunt run-all  destroy" from the root folder. It will   show you where it's going to execute those  commands and ask for your confirmation. By running just one command,   we've successfully destroyed both VPCs in  the staging and development environments. In the following part of this tutorial,  we will make an EKS Terraform module and   add some extra features like cluster  autoscaling. Let's move forward by   making a new folder called 'eks' inside  the 'infrastructure modules' folder. We'll start by using the same version  constraints for the AWS Terraform provider.   Before we can set up the EKS control  plane, we need to create an IAM role   with EKS principal. After that, we  must attach the AmazonEKSClusterPolicy,   which allows EKS to create EC2  instances and load balancers. For the cluster, we will use an environment  variable and pass the EKS name variable. Like   in the previous module, it's best practice  to parameterize all possible options and set   defaults, rather than hardcoding them in the  module or relying on provider defaults. Then,   attach the IAM role to the cluster. We need  to parameterize these values too. For now,   I'll turn off the private endpoint since I don't  have a VPN in this cluster, and enable the public   endpoint. This way, I can access the EKS  from my laptop and deploy applications. Up next, we have to supply subnets for  EKS, which should be located in at least   two different availability zones.  Amazon EKS sets up cross-account   elastic network interfaces in these subnets to  enable communication between your worker nodes   and the Kubernetes control plane. We'll pass  this variable dynamically from the VPC module   using the Terragrunt dependency feature. That's  about everything we need for the control plane. Now, let's create an IAM policy and IAM role  for the Kubernetes nodes. We'll use a similar   prefix for the environment and cluster name. If  you want to set up multiple environments in the   same account, you'll need to do the same,  or else you'll face a conflict when trying   to create another environment. Next, we have to  attach multiple IAM policies to this role. We'll   use a 'for each' loop to iterate over all provided  policies and attach them to the nodes' IAM role.   The last policy is optional; it allows you to  use the session manager to SSH into the node. In the next file, let's create EKS-managed  instance groups. As in the previous example, we   want to iterate over all node groups provided as  a map variable. All node groups must be connected   to the EKS cluster we created earlier. We'll  use a key of the object for the node group, for   instance, general. We'll also share the same IAM  role among all node groups. 
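For reference, the control-plane pieces described above (the IAM role with the EKS principal, the AmazonEKSClusterPolicy attachment, and the cluster itself) might look roughly like this; resource and variable names are assumptions:

```hcl
resource "aws_iam_role" "eks" {
  name = "${var.environment}-${var.eks_name}-eks-cluster"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "eks.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "eks" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.eks.name
}

resource "aws_eks_cluster" "this" {
  name     = "${var.environment}-${var.eks_name}"
  version  = var.eks_version
  role_arn = aws_iam_role.eks.arn

  vpc_config {
    endpoint_private_access = false # no VPN in this walkthrough
    endpoint_public_access  = true  # reachable from the laptop
    subnet_ids              = var.subnet_ids
  }

  depends_on = [aws_iam_role_policy_attachment.eks]
}
```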
If you need to grant   additional access for applications running in EKS  to the AWS API, you would use an OpenID Connect   provider instead. You'll see an example of this  later on when we deploy the cluster autoscaler. Next, we'll set the capacity type of the node,  which can be either on-demand or spot type. Then,   we'll specify the list of instance types  associated with the EKS Node Group, such   as 't3a.xlarge'. After that, we'll configure  the scaling settings. Remember that these   settings only set up the initial autoscaling  group parameters, like minimum, maximum,   and desired size. To enable autoscaling, you must  deploy the cluster autoscaler or use Karpenter. Next, we'll set the desired maximum  number of unavailable worker nodes   during a node group update. I'll keep  the default of 1 node at a time. It's   also helpful to assign labels to  the Kubernetes workers. Later,   you can use nodeSelector or affinity to bind  pods with nodes. Additionally, we'll use a   similar 'depend_on' statement to ensure the IAM  role is ready before creating instance groups. The next step involves setting  up the OpenID Connect provider,   which is used to grant access to the AWS API. Most  of the time, you'll want this in your cluster,   but sometimes it's not necessary. Let's create  a boolean flag called 'enable_irsa' that we   can use to create this provider on demand. Then,  you'll need to point it to the EKS control plane. Once you've retrieved the EKS TLS certificate,  you can proceed to create the OpenID Connect   provider. Since we'll be creating additional  Kubernetes add-on modules, we want to expose   some variables that can be passed to another  module. For example, the full EKS name that   includes the environment prefix. To deploy the  cluster autoscaler, we'll need to use this OpenID   provider ARN to establish trust between  AWS IAM and the Kubernetes service account. Finally, let's declare the variables that  we want to provide for this module. First,   we'll use the same environment variable as a  prefix. Then, we'll specify the desired EKS   version and the name of the EKS cluster. Next,  we'll provide the list of subnet IDs that we   need to pass to EKS, followed by the default IAM  policies that we have to attach to the EKS nodes. Lastly, we'll define the node groups, specifying  all the parameters of the desired Kubernetes node   group. And finally, we'll add the 'enable_irsa'  flag to create the OpenID Connect provider. Now, let's create another live  environment—our fourth one. First,   let's create a Terragrunt file and define shared  objects between environments and modules. We'll   continue using the local state for now, but  this will be the last time, I promise. Also,   we'll set up the AWS Terraform  provider with the 'us-east-1' region. Next, create a 'dev' folder for the development  environment. Inside it, let's create another   Terragrunt file, but in this case, define common  variables only for this development environment.   Later, when we need to create another environment,  that's when you'll need to update most of them.   For example, I want to share the 'dev' environment  prefix across all modules in this environment. Before we can create the EKS cluster, we need  to provision a VPC in your AWS account. Let's   copy the VPC module from the previous example  and paste it under the development folder.   Make sure to delete the lock and state files  from the previous example, as we don't need them. First, let's refactor the environment  variable. 
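Stepping back for a moment, the managed node-group loop described above might look roughly like this (variable and resource names are assumptions); the environment-variable refactoring continues below.

```hcl
resource "aws_eks_node_group" "this" {
  # One node group per key in the map, e.g. "general".
  for_each = var.node_groups

  cluster_name    = aws_eks_cluster.this.name
  node_group_name = each.key
  node_role_arn   = aws_iam_role.nodes.arn # the IAM role shared by all node groups
  subnet_ids      = var.subnet_ids

  capacity_type  = each.value.capacity_type  # "ON_DEMAND" or "SPOT"
  instance_types = each.value.instance_types # e.g. ["t3a.xlarge"]

  # Initial autoscaling-group sizing only; the cluster autoscaler adjusts it later.
  scaling_config {
    min_size     = each.value.min_size
    max_size     = each.value.max_size
    desired_size = each.value.desired_size
  }

  update_config {
    max_unavailable = 1
  }

  labels = {
    role = each.key
  }

  depends_on = [aws_iam_role_policy_attachment.nodes]
}
```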
Like in the root,   we can include that environment variable  in the code and use it wherever we need it. The 'expose' attribute determines if the  included config should be parsed and made   available as a variable. This enables other  parts of the configuration to access and use it. Instead of hardcoding the environment variable,   we can dynamically pass it from the parent  folder. This is another step to make your   Terraform code DRY (Don't Repeat Yourself). That's  all for the VPC; we'll leave the other variables   as they are. If you decide to use a different  EKS name, you must update the tags accordingly. Now, create another folder for the EKS module and  make a new Terragrunt file. For the time being,   we'll keep using the relative path to specify  the module's source. Next, include the root   to generate backend configuration and  the AWS provider. Then, add a similar   environment variable. By the way, you can also  expose some variables in the root Terragrunt   file if you want to share them between different  environments, such as the AWS account or region. Next, we need to provide input variables  for the module. First, we want to use the   most recent EKS control plane version available  at the moment, which is currently 1.26. Then,   we'll use the same environment variable as in  the VPC module and set the EKS cluster name. This is where Terragrunt really shines—it allows  you to define dependencies between modules. For   example, the EKS module depends on the VPC module  and needs subnet IDs. With plain Terraform,   you would have to use the Terraform remote state  and execute those modules sequentially. Next,   we need to create a 'node_groups' variable with  the desired settings for the group. Finally,   define the dependency on  the VPC module. To do that,   you simply need to point to the VPC  folder where you invoke that module. It's also important to provide some mock outputs.  This is helpful when you want to run 'terraform   plan' on both modules simultaneously.  If you omit this mock output variable,   the plan will exit with an error stating  that the EKS module needs subnet IDs. You'll   see this in action soon. That's  all for the VPC and EKS module. Now, let's go ahead and initialize Terraform.  Terragrunt offers another useful feature that   allows you to run the same command in  multiple folders and, most importantly,   respect dependencies. Let's switch  to the development environment. From here, we can run 'init' for both  VPC and EKS modules. By the way, it's   optional—Terraform will automatically initialize  it when you run 'plan' or 'apply' anyway. You can see that Terragrunt will run 'init' in  the VPC first and then in the EKS module since   we've defined the dependency. Now, let's run  'plan'. It will execute in the same order—VPC   and then EKS. If you leave the mock variables  out, the plan will exit with an error on EKS. Alright, we can run 'apply' now.  Terragrunt will show you the order   again and ask you to confirm the action.  When you say yes, it will create the VPC   first and use subnet IDs output variables  as inputs in the EKS module. Typically,   we have many different modules in our  environment, and it becomes extremely helpful   to share output variables and run 'apply'  on the entire environment simultaneously. Alright. It will take maybe 10 minutes  to create the VPC and EKS cluster. To access the EKS cluster, we need to update  our local Kubernetes config using the 'aws   eks' command. 
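Before verifying access, here is roughly what the dependency wiring described above might look like in dev/eks/terragrunt.hcl (paths, names, node-group settings, and mock values are assumptions):

```hcl
terraform {
  source = "../../../infrastructure-modules/eks"
}

# Generates the backend and AWS provider; the shared environment variable is
# inherited from the parent configuration as described above.
include "root" {
  path = find_in_parent_folders()
}

dependency "vpc" {
  config_path = "../vpc"

  # Mock outputs let "run-all plan" succeed before the VPC actually exists.
  mock_outputs = {
    private_subnet_ids = ["subnet-1234", "subnet-5678"]
  }
}

inputs = {
  eks_version = "1.26"
  eks_name    = "demo"

  subnet_ids = dependency.vpc.outputs.private_subnet_ids

  node_groups = {
    general = {
      capacity_type  = "ON_DEMAND"
      instance_types = ["t3a.xlarge"]
      min_size       = 1
      max_size       = 5
      desired_size   = 1
    }
  }
}
```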
Let's run 'kubectl get nodes'  to verify that we can connect to the cluster. At this point, we have the VPC and EKS cluster  set up. Most of the time, you would want to deploy   additional components, such as cluster autoscaler,  CSI storage drivers, load balancer controllers,   etc. For that, let's create a separate  Terraform module called 'kubernetes-addons'.   We'll combine managed and self-managed  addons that are deployed as Helm charts. As always, let's begin by creating version  constraints. In the case of this module, we want   to use the Helm provider to deploy self-managed  Kubernetes addons as Helm charts. Next,   let's create an additional Terraform file for each  addon. The first one is the cluster autoscaler.   The cluster autoscaler needs access to the AWS  API to discover autoscaling groups and adjust   desired size setting on them. For that, we need to  use IAM for service accounts. We'll deploy it in   the 'kube-system' namespace, and the Kubernetes  service account name is 'cluster-autoscaler'.   We need to set it in the Helm chart later. All  Kubernetes addons will have a flag to enable them,   such as 'enable_cluster_autoscaler'. If  it's true, the count is 1, which means we   will create this type of resource. Then, use  the EKS name as a prefix for the IAM role. Next, let's create the IAM policy that allows  the cluster autoscaler to work properly,   as I described earlier. Attach this policy to the  trusted IAM role. And let's create a Helm release.   It will also use a boolean flag to enable it, the  name of the Helm release, the remote repository to   use, and the chart name. Specify the namespace,  which must match the namespace on the IAM   role. Then, provide the chart version. Now it's  important to match the service account name with   the IAM role. To establish trust, we must set the  Kubernetes service account annotation with the IAM   role ARN. Finally, provide the EKS name so that  the cluster autoscaler can auto-discover subnets   and autoscaling groups. This functionality is  based on the subnet tags that we provided earlier. Now, I intentionally limited the number of  parameters that we can customize so that everyone   can follow the same process and limit the drift  between environments. To deploy the load balancer   controller, just create another Terraform  file and follow the same logic. The same   goes for Karpenter, ArgoCD, and other components.  First, as always, the environment variable. Then,   the EKS cluster name that we'll get from  the EKS output variable, a flag to deploy   the cluster autoscaler, the Helm chart version of  the cluster autoscaler, and finally, we need to   pass the OpenID Connect provider ARN from the  EKS module. That's all for the addons module. Next, create an addons folder under the live  development environment and create a Terragrunt   file. As always, we need to define the source of  the module, include the backend and AWS provider   config, and then the same environment variable. Now, we need to pass the EKS name as a dependency   from the EKS module and the OpenID Connect  provider ARN, enable the cluster autoscaler,   specify the version of the chart that we  want to install, and define dependencies   on the EKS module. Specifically, we need the EKS  cluster name and the OpenID Connect provider ARN. The tricky part is to authenticate the Helm  provider, which we will generate using Terragrunt   as well. Keep in mind that you cannot pass  variables from the EKS module here. 
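A sketch of how that Helm provider might be generated from the addons terragrunt.hcl, authenticating with a temporary token; it can only reference variables passed to the module itself (here an assumed eks_name variable), not outputs of the EKS module:

```hcl
generate "helm_provider" {
  path      = "helm-provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
data "aws_eks_cluster" "this" {
  name = var.eks_name
}

data "aws_eks_cluster_auth" "this" {
  name = var.eks_name
}

provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.this.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data)
    token                  = data.aws_eks_cluster_auth.this.token
  }
}
EOF
}
```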
This provider   will be generated and can only use variables that  are provided to the module itself. Essentially,   you can generate anything you want. To initialize  the Helm provider, we need to get a temporary   token. You can pretty much initialize any  provider that needs to authenticate with   EKS using the same principle, such as the  Kubernetes and kubectl Terraform providers. That's all. Let's go back to the  terminal and run terraform apply on   the whole development environment.  Since we defined dependencies,   you can see the order in which Terraform will  apply the infrastructure. Let's confirm it. After deploying the infrastructure with  Terraform, it's important to verify that   the cluster autoscaler has been installed  correctly. To do this, you can check the   installed helm charts and confirm that  the autoscaler is installed. Additionally,   you should make sure that the autoscaler pod  is up and running. It's also recommended to   check the logs of the autoscaler for any  errors. If there are any misconfigurations,   you may see an error in the logs indicating  that the autoscaler is unable to access the AWS   API due to permission denied. If the  autoscaler is unable to access the AWS API,   it won't be able to adjust the desired  size property for autoscaling. Therefore,   it's important to confirm that the  autoscaler has been deployed successfully. To test if the autoscaler is working correctly,  we can create a simple deployment based on Nginx   with 4 replicas. After creating the deployment,  we can watch the pods in the default namespace   and see that one pod is in a pending state.  This is because there are not enough nodes   to schedule the pod. We can describe the pending  pod and confirm that the autoscaler is triggered   to add more nodes. We should see a message  indicating that the pod triggered a scale-up   from 1 to 2 instances. After a few seconds or  minutes, a new node should join the cluster,   and the pending pod should be scheduled. This test  confirms that the autoscaler is working correctly. To replicate the setup in the staging environment,   you can simply create a copy of the  dev folder and rename it to staging.   The only required change would be to update the  environment variable from "dev" to "staging".   While you can also update other local parameters  such as EKS node instance types, since autoscaling   is set up, this same setup can be used across  different environments including production. Switch to the staging environment, and initialize.   When we run the plan, we may receive  an error message, but we can run   the command again. However, Helm provider  requires an existing EKS cluster,   so we cannot fake it. If we still want  to run the plan command on some folders   and exclude addons, we can use a  specific command. Alternatively,   we can run the apply command, which will work  because the addons module is invoked only   after the EKS cluster is provisioned. Overall,  we have three groups in this execution plan. Great! We have successfully set up the VPC,   EKS cluster, and autoscaler in the staging  environment. To connect to the staging eks,   just update the name of the cluster  in your Kubernetes config file. You   can see that the cluster autoscaler is  running in the kube-system namespace. Now, let's check our clusters on the AWS console.  Since we used environment prefixes, we were able   to create two independent environments in a  single AWS account. 
We now have two clusters,   each with its own environment prefix. To destroy  both environments, we just need to run "destroy"   on the top level. Terragrunt will destroy both the  development and staging environments at the same   time. It will do so in reverse order, deleting  the helm charts, EKS clusters, and lastly,   VPCs. If you check the AWS console, you will see  that all the clusters and VPCs have been removed.  In this part of the tutorial, I'll teach  you how to use Terragrunt for real-world   projects. We'll separate our code using Git  modules, save our remote state in an S3 bucket,   and secure it with DynamoDB locking. Plus, we'll  use an IAM role to set up our infrastructure. First, we need to create an S3 bucket to save  the Terraform state. You can name it anything,   but remember it must be globally unique. It's also  a good idea to turn on bucket versioning, so if   something happens to your state, you can always  go back to an earlier version and recover it. Now, let's create a DynamoDB table  to lock the Terraform state. This   helps avoid conflicts when several  team members try to run Terraform   simultaneously. We'll name the table  "terraform-lock-table" and create a   "LockID" partition key. That's all we need  to manage the remote state in our S3 bucket. Instead of giving users direct access,  we can create a dedicated IAM role for   applying all infrastructure changes.  This role can be assumed by users or,   even better, by automation tools like Jenkins. For now, let's give the role admin access, as  the specific permissions needed will depend on   your infrastructure plans. We won't cover  IAM permissions in this tutorial. Choose   "AdministratorAccess" and add it to the role,  then name it "terraform." By default, any user   in the account can potentially assume this role.  We can limit this on the principal side if needed,   but it's not required, as we'll need to explicitly  grant users permission to assume the role. In the next step, we'll create an IAM policy that  allows users to use the "terraform" role. First,   copy the ARN of the role. Then, create  a new policy allowing users to use the   "terraform" IAM role and name it "AllowTerraform." Following best practices, let's avoid attaching  the policy directly to users. Instead,   create an IAM group called "devops" and add  the "AllowTerraform" policy to this group. For the demo, create a new IAM user  and add them to the "devops" group.   This will allow the user to assume the  "terraform" role. Keep in mind that any   user with Admin access in the account will  also be able to assume the "terraform" role. Next, generate security credentials for the user.   We'll use these credentials to  create an AWS local profile.   Download the credentials. Now you can use the "aws configure"  command to add a new profile for the user. Next, create a separate Git repository to store  your Terraform modules. You have two options:   store all modules in the same Git repo or  create a separate Git repository for each   Terraform module. With the last option,  you might end up with many repositories,   which can be difficult to maintain.  If you're just starting out or have a   relatively small DevOps team, begin with a single  repository. Name it "infrastructure-modules."   Make the repository private, add a README file,  and include a .gitignore file for Terraform. Now, clone the "infrastructure-modules" Git  repository and open it with a text editor. First, let's copy the VPC Terraform  module we created earlier. 
  Add this module to the Git repo and commit the  changes. When using a single Git repository for   multiple modules, create a Git tag specific  to this module so we can reference it later.   If you need to make changes to the VPC module,  commit again and create a new VPC tag. Finally,   push the Git tags to GitHub  or another remote Git server. So far, we have a single tag  in the GitHub repository. Next, let's copy the EKS module. Follow  the same workflow: add it to the repo,   commit the changes, and create a new  Git tag specific to this EKS module. Finally, let's move the "kubernetes-addons"  module to the "infrastructure-modules" Git repo.   Add it to the repo and create a new tag.  This is the last module we're going to add. Now we have three separate  Git tags related to specific   modules. That's how you manage multiple  Terraform modules in a single repo. Next, create a new Git repository to store  the live state of our infrastructure.   Make the repository private and select  "Terraform" for the .gitignore file. Clone this new repository as well and open it in  Visual Studio Code or your preferred text editor. First, create a root Terragrunt  file. In this case, we'll use a   remote S3 bucket to store our state. Since  we're using an IAM role to apply changes,   we need to provide the IAM role ARN and specify  the profile that can assume that role. Then,   provide the S3 bucket and, finally,  the DynamoDB table to lock the state. Next, let's generate the AWS Terraform  provider. We'll also use the same role and   AWS profile to run Terraform. Optionally,  you can give the provider a session name. Now, let's copy the development  environment from the previous   example and make a few adjustments,  as it won't work out-of-the-box.   First, let's clean up by deleting previous  local state and lock files from all modules. Next, we need to update the source of the modules  to point to the remote Git repository and use tags   to pin each module to a specific version. Let's  update the "kubernetes-addons" source first,   and then the VPC module. This example is  taken from the Terragrunt Quick Start. Additionally, we must update the Helm provider  to use a token; I'll explain why later.   We also need to ignore the Terragrunt cache. Now, switch to the development  environment and run "terragrunt apply."   It will show the order in which the modules  will be applied and ask for confirmation.   It seems that Terraform was updated and  no longer supports "github.com" as is.   We need to update this in the code. Let's go  through all three modules and add the "git"   schema before each module: the Kubernetes  addons module, and finally the VPC module.   Run the command again, confirm that you want  to update the state, and in a few minutes,   Terraform should create the VPC, EKS  cluster, and deploy the auto-scaler. Now, you can check the S3 bucket  and find that there's a "dev" key   and separate paths for each module:  VPC, EKS, and Kubernetes addons. Let's try connecting to the cluster  as we did in the previous example.   We now encounter an error; even when using  the default AWS profile with admin access,   we don't have permissions to access  Kubernetes. The problem is that only   the IAM user or role used to create the  cluster has access to it. I have a separate   video on how to add additional users and  IAM roles while following best practices. For now, let's create a new AWS profile for  the "terraform" IAM role. 
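A profile along these lines in ~/.aws/config ties the two together (a sketch; the account ID is a placeholder, and "anton" is the demo user's profile):

```ini
[profile terraform]
role_arn       = arn:aws:iam::111111111111:role/terraform
source_profile = anton
```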
This indicates that the "anton" user will be used to assume this "terraform" role. Next, update the Kubernetes context once again, but this time use the "terraform" profile. Now, we can access the cluster. Let's also check if the auto-scaler is running. To destroy all the infrastructure, just run "destroy" in the "dev" environment.

The next step is to add additional users and learn how to use ArgoCD to deploy applications to Kubernetes. Thank you for watching, and I'll see you in the next video.
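For reference, the production-style root terragrunt.hcl described in this final part might look roughly like this (the bucket name, account ID, and profile are placeholders; the DynamoDB table matches the one created earlier):

```hcl
remote_state {
  backend = "s3"

  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }

  config = {
    bucket         = "my-unique-terraform-state-bucket" # placeholder, must be globally unique
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-lock-table"
    role_arn       = "arn:aws:iam::111111111111:role/terraform" # placeholder account ID
    profile        = "anton"                                    # profile allowed to assume the role
  }
}

generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region  = "us-east-1"
  profile = "anton"

  assume_role {
    role_arn     = "arn:aws:iam::111111111111:role/terraform"
    session_name = "terraform"
  }
}
EOF
}
```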
Info
Channel: Anton Putra
Views: 40,551
Keywords: terragrunt, terragrunt tutorial, terragrunt vs terraform, terragrunt terraform, terragrunt aws, terragrunt explained, terragrunt demo, terragrunt multiple environments, terragrunt dependency, create EKS using terraform, EKS, AWS EKS, Kubernetes, create eks cluster aws using terraform, create eks cluster aws, eks aws, eks tutorial aws, terraform eks, terraform eks cluster creation, devops, anton putra, terraform, aws, aws cloud, aws tutorial, sre, gitops, terraform helm, helm, k8s
Id: yduHaOj3XMg
Length: 61min 9sec (3669 seconds)
Published: Sat Apr 15 2023