How to Create GKE Cluster Using TERRAFORM? (Google Kubernetes Engine & Workload Identity)

Captions
In this video, I'll show you how to create a GKE cluster using Terraform. You may follow along and create a VPC from scratch with Terraform, or you can plug in values from an existing network and subnets. We will create two instance groups: one for general services and one that uses spot instances and has the proper taints and labels. In the first demo, I'll show you how to configure autoscaling for the cluster. In the second one, we will use workload identity and grant a pod access to list GS buckets in our Google project. For the final example, I will deploy the nginx ingress controller with a public load balancer to expose services to the internet.

Let's create a terraform folder where we're going to place all the Terraform-related files. First of all, we need to declare a Terraform provider. You can think of it as a library with methods to create and manage infrastructure in a specific environment; in this case, it is the Google Cloud Platform. When it comes to file names, try to give them self-explanatory names. I include a link to the official docs for most of the new resources I use in the code. There you can find an example of how to use each resource, all possible input parameters, and the output variables exported when the resource is created. You need to include a project ID and a region for the google provider.

When you create resources in GCP such as a VPC, Terraform needs a way to keep track of them. If you simply apply Terraform right now, it will keep all the state locally on your computer, which makes it hard to collaborate with other team members and easy to accidentally destroy your infrastructure. Instead, you can declare a Terraform backend to use remote storage. Since we're creating infrastructure in GCP, the logical approach is to use a Google Storage bucket to store the Terraform state. You need to provide a bucket name and a prefix; we will create the bucket in a minute. You also have the option to set constraints on the version of the provider that you want to use.

Now, let's create a bucket before running Terraform. Go to the Google Cloud console and select Cloud Storage. You can create a GS bucket using the console or the gcloud CLI, but you can't use Terraform for it, since Terraform needs the bucket to exist beforehand. Click Create Bucket and pick a name; it must be globally unique. It's common to separate Terraform workspaces by environment to reduce risk, and this bucket will be used to manage infrastructure in the staging environment. For the location, I always use multi-region. Terraform only keeps the state in the form of JSON files, so it will not require a lot of storage and should be pretty cheap. You can also select different storage classes, but Standard is perfect for Terraform. The only parameter that you need to change is versioning; it will help you recover the state in case of an incident. That's all; now we can create the bucket.

Nothing stops you from using an existing VPC to create a Kubernetes cluster, but I will create all the infrastructure with Terraform for this lesson. If you want to reuse an existing network, you can use the data keyword instead of resource to import it into Terraform. Before creating a VPC in a new GCP project, you need to enable the compute API, and to create a GKE cluster, you also need to enable the container API.
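To make this concrete, here is a minimal sketch of the provider, backend, and API-enabling resources described in this section. The bucket name is a placeholder for the one you created above, devops-v4 is the project ID used in this video, and the provider version constraint is only an example.

```hcl
# provider.tf -- minimal sketch; replace the bucket and project ID with your own.
terraform {
  backend "gcs" {
    bucket = "your-terraform-state-bucket" # the GS bucket created above
    prefix = "terraform/state"
  }
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 4.0" # example constraint
    }
  }
}

provider "google" {
  project = "devops-v4" # replace with your project ID
  region  = "us-central1"
}

# Enable the APIs required later for the VPC and the GKE cluster.
resource "google_project_service" "compute" {
  service = "compute.googleapis.com"
}

resource "google_project_service" "container" {
  service = "container.googleapis.com"
}
```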
Now let's create the VPC itself. Give it a name, for example, main. Then select the routing mode. You have two options here: REGIONAL and GLOBAL. If set to REGIONAL, this network's Cloud Routers will only advertise routes with subnets of this network in the same region as the router. If set to GLOBAL, this network's Cloud Routers will advertise routes with all subnets of this network across regions. Then set auto_create_subnetworks to false. When set to false, the network is created in "custom subnet mode," which lets us define our own subnets. The MTU is the Maximum Transmission Unit in bytes; the minimum value for this field is 1460. There is also delete_default_routes_on_create: if you set it to true, it will delete the default route to the internet. Finally, we need to explicitly specify resources that must be created before the VPC; we need the compute API and can optionally add the container API.

The next step is to create a private subnet to place the Kubernetes nodes in. When you use a GKE cluster, the Kubernetes control plane is managed by Google, and you only need to worry about the placement of the Kubernetes workers. Give it the name private; if you have more than one private subnet, it's better to be more specific. Then the CIDR range of the subnet: 10.0.0.0/18 will give you around 16 thousand IP addresses to play with. A VPC in Google Cloud is a global concept, and you can create subnets in different regions; in AWS, on the other hand, a VPC belongs to a specific region. You need to provide a reference to the network that we created earlier, and enable private Google access: VMs in this subnetwork without external IP addresses can then access Google APIs and services, for example, managed Redis or Postgres. Then you need to provide secondary IP ranges. Kubernetes nodes will use IPs from the main CIDR range, but the Kubernetes pods will use IPs from the secondary ranges. In case you need to open a firewall to access other VMs in your VPC from Kubernetes, you would need to use this secondary IP range as a source, and optionally the service account of the Kubernetes nodes. Each secondary IP range has a name associated with it, which we will use in the GKE configuration. The second secondary range will be used to assign IP addresses for ClusterIPs in Kubernetes: when you create a regular Service in Kubernetes, its IP address will be taken from that range.

Next, we need to create a Cloud Router to advertise routes. It will be used with the NAT gateway to allow VMs without public IP addresses to access the internet; for example, Kubernetes nodes will be able to pull Docker images from Docker Hub. Give it the name router, then the region us-central1, the same region where we created the subnet, and a reference to the VPC where you want to place this router.

Now, let's create Cloud NAT. Give it a name and a reference to the Cloud Router, then the region us-central1. You can decide to advertise this Cloud NAT to all subnets in the VPC, or you can select specific ones; in this example, I will choose the private subnet only. The next option is very important, especially if you have external clients: you can let Google allocate and assign an IP address for your NAT, or you can choose to manage it yourself. If you have a webhook and a client that needs to whitelist your public IP address (allow your IP address to access their network by opening up a firewall), managing the address yourself is the only way to go. Then comes the list of subnetworks to advertise the NAT for; the first one is the private subnet. You can also choose to advertise only the main CIDR range, or both the main and secondary IP ranges. Since we will allocate the external IP addresses ourselves, we need to provide them in the nat_ips field; you can allocate more than one IP address for NAT.
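A sketch of the network pieces described above: the VPC, the private subnet with its two secondary ranges, the Cloud Router, and Cloud NAT. The secondary CIDR ranges are example values (a /14 for pods and a /20 for services, matching the ranges shown later in the console), and the NAT references an external address that is allocated in the next snippet.

```hcl
# vpc.tf -- network sketch; names and CIDR ranges follow the video, adjust to taste.
resource "google_compute_network" "main" {
  name                            = "main"
  routing_mode                    = "REGIONAL"
  auto_create_subnetworks         = false
  mtu                             = 1460
  delete_default_routes_on_create = false

  # Make sure the APIs are enabled first.
  depends_on = [
    google_project_service.compute,
    google_project_service.container,
  ]
}

resource "google_compute_subnetwork" "private" {
  name                     = "private"
  ip_cidr_range            = "10.0.0.0/18"
  region                   = "us-central1"
  network                  = google_compute_network.main.id
  private_ip_google_access = true

  # Pods take IPs from the first range, ClusterIP services from the second.
  secondary_ip_range {
    range_name    = "k8s-pod-range"
    ip_cidr_range = "10.48.0.0/14"
  }
  secondary_ip_range {
    range_name    = "k8s-service-range"
    ip_cidr_range = "10.52.0.0/20"
  }
}

resource "google_compute_router" "router" {
  name    = "router"
  region  = "us-central1"
  network = google_compute_network.main.id
}

resource "google_compute_router_nat" "nat" {
  name   = "nat"
  router = google_compute_router.router.name
  region = "us-central1"

  source_subnetwork_ip_ranges_to_nat = "LIST_OF_SUBNETWORKS"
  nat_ip_allocate_option             = "MANUAL_ONLY"
  nat_ips                            = [google_compute_address.nat.self_link] # allocated in the next snippet

  subnetwork {
    name                    = google_compute_subnetwork.private.id
    source_ip_ranges_to_nat = ["ALL_IP_RANGES"]
  }
}
```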
The following resource allocates the IP address itself. Give it a name and the type EXTERNAL. You also need to select the network_tier, which can be PREMIUM or STANDARD. Since we create the VPC from scratch, we need to make sure that the compute API is enabled before allocating the IP.

The next resource is a firewall. We don't need to create any firewalls manually for GKE; this one is just to give you an example. It will allow SSH to the compute instances within the VPC. The name is allow-ssh, with a reference to the main network, followed by the ports and protocols to allow: for SSH, we need the TCP protocol and the standard port 22. For the source, we can restrict access to certain service accounts or network tags, or we can use a CIDR; 0.0.0.0/0 will allow any IP to access port 22 on our VMs.
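The two resources just described, the static external address used by the NAT and the example allow-ssh firewall, might look like this:

```hcl
# Static external IP for Cloud NAT.
resource "google_compute_address" "nat" {
  name         = "nat"
  address_type = "EXTERNAL"
  network_tier = "PREMIUM"

  depends_on = [google_project_service.compute]
}

# Example firewall only; GKE does not require it.
resource "google_compute_firewall" "allow_ssh" {
  name    = "allow-ssh"
  network = google_compute_network.main.name

  allow {
    protocol = "tcp"
    ports    = ["22"]
  }

  source_ranges = ["0.0.0.0/0"] # open to the world; restrict this in real setups
}
```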
Finally, we get to the Kubernetes resource. First, we need to configure the control plane of the cluster itself. The cluster name will be primary. For the location, you can select either a region or an availability zone. If you choose a region, GKE will create a highly available cluster for you across multiple availability zones of that region. No doubt that is the preferred setup, but it will cost you more money. If you are budget sensitive, you may want to select a zonal cluster and spread your Kubernetes nodes across different availability zones. With that approach, which we will use in this video, if something happens to your control plane, all your applications will continue running without interruption; you just won't be able to reach the master itself, for example to deploy a new application or service. I would still highly recommend choosing at least two availability zones for the Kubernetes nodes; in my experience, availability zones go down more often than you would expect. This cluster will have a single, NOT highly available control plane in the us-central1-a zone.

Then choose to remove the default node pool, since we will create additional instance groups for the Kubernetes cluster. The initial node count does not matter, since that pool will be destroyed anyway. Provide a link to the main VPC and a subnet; in this case, it's the private subnet.

Now be very careful with the services that you enable for Kubernetes. Obviously, you want logging for your applications. This option will deploy a fluent bit agent on each node and scrape all the logs that your applications send to the console, but it will add cost to your infrastructure. At some point, for a short period of time in one of my environments, the cost of logging exceeded the cost of the infrastructure itself, because a developer enabled debug logs. Be very careful and constantly monitor the cost. Next is monitoring; the same thing applies here, it's not free. If you plan to deploy Prometheus, you may want to disable it. All cloud providers will try to sell you as many managed services as possible; they are easily scalable and convenient, but they can lead to a huge bill at the end of the month.

The networking mode is VPC_NATIVE; the available options are VPC_NATIVE and ROUTES. VPC-native clusters have several benefits; you can read about them in the linked documentation. As I mentioned before, since we create a zonal cluster, we want to add at least one more availability zone. We already have the us-central1-a zone; let's add the b zone. There are many different addons you can enable and disable. For example, you can deploy the Istio service mesh, or disable http_load_balancing if you're planning to use the nginx ingress or plain load balancers to expose your services from Kubernetes. Later I will deploy the nginx ingress controller anyway, so let's disable this addon. The second one is horizontal pod autoscaling; I want to keep this addon enabled.

The release channel will manage your Kubernetes cluster upgrades. Keep in mind that you will never be able to completely disable upgrades for the Kubernetes control plane; however, you can disable them for nodes. Then I want to enable workload identity. You can substitute this with variables and data objects; you need to replace devops-v4 with your project ID. Under the IP allocation policy, you need to provide the names of the secondary ranges: first for the pods and then for the cluster IPs. To make this cluster private, we need to enable private nodes. This will only use private IP addresses from our private subnet for the Kubernetes nodes. Next is the private endpoint. If you have a VPN set up or you use a bastion host to connect to the Kubernetes cluster, set this option to true; otherwise keep it false to be able to access GKE from your computer. You also need to provide a CIDR range for the control plane: since it's managed by Google, they will create the control plane in their network and establish a peering connection to your VPC. Optionally, you can specify the CIDR ranges which can access the Kubernetes cluster; the typical use case is to allow Jenkins to access your GKE. If you skip this, anyone can reach your control plane endpoint.
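Putting the control-plane options together, a sketch of the cluster resource might look like the following. It assumes version 4.x of the google provider (where workload_identity_config uses workload_pool), reuses the secondary range names from the subnet snippet, and uses an example CIDR for the control plane; the optional master authorized networks block is left out.

```hcl
resource "google_container_cluster" "primary" {
  name                     = "primary"
  location                 = "us-central1-a"        # zonal cluster
  node_locations           = ["us-central1-b"]      # second zone for the nodes
  remove_default_node_pool = true
  initial_node_count       = 1
  network                  = google_compute_network.main.self_link
  subnetwork               = google_compute_subnetwork.private.self_link
  networking_mode          = "VPC_NATIVE"

  # Both add cost; disable if you bring your own logging/monitoring stack.
  logging_service    = "logging.googleapis.com/kubernetes"
  monitoring_service = "monitoring.googleapis.com/kubernetes"

  addons_config {
    http_load_balancing {
      disabled = true # we will use the nginx ingress controller instead
    }
    horizontal_pod_autoscaling {
      disabled = false
    }
  }

  release_channel {
    channel = "REGULAR"
  }

  workload_identity_config {
    workload_pool = "devops-v4.svc.id.goog" # replace devops-v4 with your project ID
  }

  ip_allocation_policy {
    cluster_secondary_range_name  = "k8s-pod-range"
    services_secondary_range_name = "k8s-service-range"
  }

  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = false              # keep false to reach GKE from your laptop
    master_ipv4_cidr_block  = "172.16.0.0/28"    # example control-plane CIDR
  }
}
```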
Before we can create node groups for Kubernetes, if we want to follow best practices, we need to create a dedicated service account for the nodes. In this tutorial, we will create two node groups. The first one is general, without taints, so it can run cluster components such as DNS. Provide the cluster id. This node group will not have autoscaling enabled, so we need to specify how many nodes we want. For the management block, allow auto_repair and auto_upgrade. Under node config, we specify that this node group is not preemptible and choose a machine type, for example e2-small. I prefer to have larger instances and a smaller number of nodes, since there are a lot of system components that need to be deployed on each node, such as fluent bit, node exporter, and many others; if you have smaller instances, those components will eat a lot of your resources. You can give this node group a label. Provide the service account and the cloud-platform oauth scope. Google recommends custom service accounts that have the cloud-platform scope with permissions granted via IAM roles; later we will grant an IAM role to a service account to access GS buckets in our project.

Now the second instance group. It will have a few different parameters. Give it the name spot, then the same cluster id. The management config stays the same, but now we have autoscaling: you can define the minimum and the maximum number of nodes. Under node config, let's set preemptible to true. This will use much cheaper VMs for the Kubernetes nodes, but they can be taken away by Google at any time, and they last up to 24 hours. They are perfect for batch jobs and data pipelines; they can be used with regular applications too, but those applications have to be able to tolerate nodes going down. Give it the label team equal to devops. And most importantly, such nodes must have taints to avoid accidental scheduling; your deployment or pod object must then tolerate those taints. Use the same service account and scope for this node group.
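A sketch of the node service account and the two node pools. The taint key and value, the labels, and the autoscaling limits are example values; adjust them to your needs.

```hcl
resource "google_service_account" "kubernetes" {
  account_id = "kubernetes"
}

resource "google_container_node_pool" "general" {
  name       = "general"
  cluster    = google_container_cluster.primary.id
  node_count = 1 # fixed size, no autoscaling

  management {
    auto_repair  = true
    auto_upgrade = true
  }

  node_config {
    preemptible  = false
    machine_type = "e2-small"

    labels = {
      role = "general"
    }

    service_account = google_service_account.kubernetes.email
    oauth_scopes    = ["https://www.googleapis.com/auth/cloud-platform"]
  }
}

resource "google_container_node_pool" "spot" {
  name    = "spot"
  cluster = google_container_cluster.primary.id

  management {
    auto_repair  = true
    auto_upgrade = true
  }

  autoscaling {
    min_node_count = 0
    max_node_count = 10
  }

  node_config {
    preemptible  = true # cheap VMs that can be reclaimed at any time
    machine_type = "e2-small"

    labels = {
      team = "devops"
    }

    # Workloads must tolerate this taint to land on spot nodes.
    taint {
      key    = "instance_type"
      value  = "spot"
      effect = "NO_SCHEDULE"
    }

    service_account = google_service_account.kubernetes.email
    oauth_scopes    = ["https://www.googleapis.com/auth/cloud-platform"]
  }
}
```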
To run Terraform locally on your computer, you need to configure default application credentials. Run the gcloud auth application-default login command. It will open the default browser, where you need to complete authorization. When it's done, make sure that the Google project ID matches the one you used in Terraform. Change to the directory where we have the Terraform files. The first command that you need to run is terraform init. It will download the google provider and initialize the Terraform backend to use the GS bucket. To actually create all the resources that we defined in Terraform, run terraform apply. Terraform wants to create 12 resources and destroy 0. That looks like what we want; let's agree and type yes. It may take up to 20 minutes to create all those components, so be patient.

It's completed; let's go to the Google console to look at the VPC and other resources. When you enable the compute API, Google generates a default network for you. You can disable the creation of default networks by creating an organization policy with the compute.skipDefaultNetworkCreation constraint; projects that inherit this policy won't have a default network. Let's delete this VPC manually: click DELETE VPC NETWORK and confirm. This is the main network that we created with Terraform. We have a single private subnet in the us-central1 region, and you can see the main IP address range and the secondary ranges for Kubernetes. You will always have many more pods than services in Kubernetes; that's why we have different IP ranges, a /14 for pods and a /20 for services. We also have a few firewalls created by GKE, as well as our allow-ssh firewall. Now let's go to Kubernetes Engine. You may see some warnings, but I find that more often than not they are false alarms. We have two availability zones for the Kubernetes nodes, the regular channel, and a public endpoint. Under node pools, you will find two instance groups: general with 2 nodes, one per zone, and spot with autoscaling.

To connect to the cluster, click connect and copy the command, then just paste it into the terminal and execute it. To check the connection, run kubectl get svc; it should return the kubernetes service from the default namespace. We can also run kubectl get nodes to list all the nodes in the cluster; we have two from the general node pool.

Now let's deploy a few examples to Kubernetes. The first one is a Deployment object to demonstrate cluster autoscaling. Let's use the nginx image and set two replicas. We want to deploy it to the spot instance group, which does not have any nodes right now. First, we need to tolerate the taints set on those nodes. Then we want to restrict the deployment to only nodes with the label team equal to devops. PodAntiAffinity will force Kubernetes to spread the pods between different nodes. We can use the kubectl apply command and provide a path to the folder or file, in this case example one. Let's use the watch command to repeatedly run kubectl get pods. For now, the pods are in a Pending state, since they can only be scheduled on the spot instance group. Let's split the screen and also run the kubectl get nodes command. We can describe one of the pods to check the status of autoscaling. You can find the message from the cluster-autoscaler that the pod triggered scale-up. It's a good sign; we just need to wait a few minutes. Two additional nodes join the cluster from the spot group, and when they become ready, the two pods are scheduled. Now we have four nodes in total: two general and two spot.

In the following example, I'll show you how to use workload identity and grant a pod access to list GS buckets. First of all, we need to create a service account in the Google Cloud Platform. Let's give it the account id service-a. Then we need to grant that service account access to list buckets: specify the project, then the role, for example Storage Admin, and a member, which is the service account. I suggest using google_project_iam_member; it's non-authoritative, so other members for the same role on the project are preserved. Finally, we need to allow the Kubernetes service account to impersonate this GCP service account, establishing a link between the Kubernetes RBAC system and the GCP IAM system. The role is always the same, workload identity user, but you need to update the member. First comes the project ID of your GKE cluster, then the Kubernetes namespace where you are planning to deploy your application, in this case the staging namespace, and then the name of the Kubernetes service account, which is service-a. We will create the namespace and the service account in a minute. Now we need to reapply Terraform to create the GCP service account and the bindings.
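The three Terraform resources for this example, sketched out; devops-v4 is the project ID used in the video, and staging/service-a is the namespace and Kubernetes service account we are about to create:

```hcl
resource "google_service_account" "service_a" {
  account_id = "service-a"
}

# Non-authoritative: other members with this role on the project are preserved.
resource "google_project_iam_member" "service_a" {
  project = "devops-v4"
  role    = "roles/storage.admin"
  member  = "serviceAccount:${google_service_account.service_a.email}"
}

# Allow the Kubernetes service account staging/service-a to impersonate
# the GCP service account via Workload Identity.
resource "google_service_account_iam_member" "service_a" {
  service_account_id = google_service_account.service_a.id
  role               = "roles/iam.workloadIdentityUser"
  member             = "serviceAccount:devops-v4.svc.id.goog[staging/service-a]"
}
```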
Let's go back to the Google console and choose Service Accounts. Here is the service-a service account created by Terraform. Under the IAM tab, we can inspect the roles assigned to it; here is the account with the Storage Admin role. To check who can use this account, go to the permissions tab; under principals, you will find the Kubernetes workload identity associated with the Kubernetes service account.

Time to create the second example. The first object will be a staging namespace. Then the Deployment: give it the name gcloud and specify the same staging namespace. We will use the gcloud SDK image and define a command and arguments to prevent the pod from exiting right after it starts; this gives us time to exec into the pod and run gcloud commands inside it. Let's apply it and test if we can list GS buckets. We have a staging namespace created a few seconds ago. Since we created the deployment in the staging namespace, don't forget to provide it with get pods. To execute commands inside that pod, use kubectl exec and provide the name of the pod; you can specify the command to run after the pod name. Now run gcloud storage ls to get all the buckets in the project. We get an error: the caller does not have storage.buckets.list access. That's because when we omit the service account in the Deployment object, it uses the default service account in that namespace.

Let's fix it. First, create a Kubernetes service account with the name service-a, then bind it to the GCP service account using the annotation. Under the spec of the Deployment, you need to override the service account to use the one we just created. Optionally, you can use affinity to place this pod on a node with workload identity enabled; we have already enabled workload identity on each instance group. We updated the Deployment object with the service account, and when we reapply it, Kubernetes will redeploy the pod. We have a new service-a Kubernetes service account. Now we need to exec into the new pod and list the buckets again. If you run the gcloud storage ls command this time, it impersonates the GCP service account and gets access to the buckets. Alright, we have the single GS bucket that we created for the Terraform backend.

For the last example, let me deploy the nginx ingress controller using Helm. Add the ingress-nginx repository, update the Helm index, and search for the nginx ingress. We will use the 4.0.17 Helm chart version. To override some default variables, create an nginx-values.yaml file. As an example, you can provide variables such as compute-full-forwarded-for, use-forwarded-headers, proxy-body-size, and many others; those variables override the global default nginx settings. You can find all the possible options on the nginx ingress website. You can also override the same settings at the ingress level instead. The new ingress uses a new custom resource called IngressClass instead of the old annotation to specify which ingress controller to use for a service. Enable it, and optionally you can mark it as the default ingress class. I highly suggest using podAntiAffinity with all kinds of ingresses: it makes the controller highly available and spreads the pods between different nodes, so if one node fails or is simply upgraded, you always have another pod to handle requests. For this example I use a single replica, but for production, always use multiple instances. We will disable the admission webhook for this example, since in GCP with private GKE clusters it requires opening an additional port, which is out of the scope of this video. You also have the option to configure the load balancer for the ingress. By default, it will create a public load balancer, but if you want a private ingress, you can set the load balancer annotation to Internal. Metrics are out of scope for this video as well, but I have another tutorial that explains how to use ingress with Prometheus. It's time to deploy the ingress: provide the Helm release name, namespace, version, and the values to override. In a few minutes, you will get a fully functional ingress controller.
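In the video the chart is installed with the helm CLI; if you would rather keep everything in Terraform, the helm provider's helm_release resource can install the same chart. A rough equivalent sketch, assuming the provider is pointed at your kubeconfig and the overrides discussed above live in nginx-values.yaml:

```hcl
provider "helm" {
  kubernetes {
    config_path = "~/.kube/config"
  }
}

resource "helm_release" "ingress_nginx" {
  name             = "ingress-nginx"
  repository       = "https://kubernetes.github.io/ingress-nginx"
  chart            = "ingress-nginx"
  namespace        = "ingress"
  create_namespace = true
  version          = "4.0.17"

  # nginx-values.yaml holds the overrides discussed above
  # (ingressClass, podAntiAffinity, disabled admission webhook, etc.).
  values = [file("${path.module}/nginx-values.yaml")]
}
```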
Let's check that the controller is running. Also check the service: if it gets stuck in a pending state, describe the service and find the error message; sometimes it's because you exceeded the limit on the number of load balancers in the project. When you get the IP address, you can continue with the ingress.

Let's create the third example. We will reuse the Deployment object created earlier for the autoscaling demo. The Service will select those pods using the app: nginx label. Then comes the Ingress itself, in the default namespace. Specify the ingress class name external-nginx. This first example will be applied to all possible domain names; specify the path and a backend pointing to the Service created before. Now let's apply this ingress. You can verify that you set up the ingress correctly by getting the ingress class; you should see the external-nginx class name. When you get the ingress, you should see the host; in our case, the star represents all possible domain names, and you should see the address. It may take a few seconds or minutes, but this address should be equal to the nginx ingress load balancer IP.

The final step to make this ingress work is to create a DNS A record in your DNS provider; in my case, the domain is hosted in Google Domains. Give the record a name, change its type to A, and for the IP, use the one from the nginx ingress load balancer. You can test it in the browser: paste your domain, and it should return the default nginx page, served NOT by the ingress controller but by the Deployment. Looks like our ingress works as expected. To use this ingress with only a specific domain, update the host property. If you get the ingress now, you should see the HOST field filled with your domain, and if you refresh your browser, nothing should change.

If you want to learn more about ingress, I have a dedicated video with a bunch of examples of how to use nginx ingress, including TLS, cert-manager, and TCP proxies. Thank you for watching, and I'll see you in the next one.
Info
Channel: Anton Putra
Views: 30,725
Keywords: google cloud, kubernetes engine, google kubernetes engine, Create GKE Cluster Using TERRAFORM, google cloud platform tutorial, google cloud platform, cloud computing, how to use gke, gke cluster, google cloud tutorial, gke tutorial, terraform, terraform google cloud tutorial, google cloud tutorial for beginners, gke, gke terraform, gke terraform tutorial, gcp terraform tutorial, gcp terraform, gcp terraform setup, terraform backend gcs, devops, sre, anton putra, aws, gcp, cloud
Id: X_IK0GBbBTw
Length: 23min 21sec (1401 seconds)
Published: Wed Mar 02 2022