How to Monitor a Kubernetes Cluster in 2022 with Prometheus & Grafana

Captions

Kubernetes is constantly receiving updates, almost every single day. Being an engineer and looking after a Kubernetes cluster means you probably have to keep your cluster up to date. Now, with a lot of software tightly integrating with Kubernetes, it means that every upgrade will probably bring some deprecations. Today we'll be taking a look at what it takes to monitor a Kubernetes cluster in 2022 with Prometheus and Kubernetes 1.23. We'll take a look at all the components needed, we'll create a Kubernetes cluster, deploy all the components, and check out the dashboard. But more importantly, I'll show you the strategy that I use to make sure I can deploy the latest version of the monitoring components for the latest version of Kubernetes. So even if you're watching this sometime in the future, you should be able to use this as a strategy for future versions of Kubernetes.

Prometheus is a metrics aggregator. We can configure Prometheus to scrape the metrics endpoints of various systems. Grafana is a dashboarding tool which looks at Prometheus as a data source. We can visualize all our metrics in Grafana to see how our cluster is performing. Now, there are many endpoints for monitoring Kubernetes. The majority of these endpoints already come with the cluster; other components we'll add so that we can get extra observability.

Let's take a look at the source code. So if we take a look at my GitHub repo, I have a monitoring folder, and in the monitoring folder I have a prometheus folder with all my guides on Prometheus, including Kubernetes. In the kubernetes folder we have a README showing how to monitor Kubernetes using Prometheus. I have all my old guides here, and today we're going to be taking a look at Kubernetes 1.23. If we go into that, that guide will link us to the 1.23 folder, and as you can see there's a README in there, which is the Kubernetes 1.23 monitoring guide. So in this demo we're going to create a cluster with kind, we're going to install the kube-prometheus components, which use
the Prometheus Operator, deploy all the manifests, and then we're going to view the Grafana dashboards. So be sure to check out the links down below to the source code so you can follow along.

Now, the first thing we're going to do is create a Kubernetes 1.23 cluster using a product called kind. Kind is a very useful utility used by the Kubernetes community to test Kubernetes in Docker containers. It allows us to run lightweight, portable Kubernetes clusters in Docker containers that we can use for testing. So the first thing I'm going to do is change directory to my monitoring/prometheus/kubernetes/1.23 folder, and if I do ls in that folder you can see we have a kind.yaml. Taking a look at the kind.yaml, you can see this is our cluster definition: we're going to have one master and three workers, and this is to showcase how to monitor multiple worker nodes in Kubernetes as well. To spin up that cluster is very easy. I'm going to say kind create cluster, give it a name, call it monitoring, and pass in an image for 1.23.1, which is the version of Kubernetes I'm using, and the kind.yaml configuration. I'm going to paste that to the terminal, and that will go ahead and spin up a Kubernetes cluster in Docker that we can use for testing. When that command is finished, you can see it created our cluster. I can say kubectl get nodes, and you can see we have one master, which is part of the control plane, and three worker nodes.

Now, before you continue, it's important to understand what the Prometheus Operator is. If you're new to Prometheus, check out the link down below to my Prometheus guide. The Prometheus Operator is used to manage Prometheus instances so that you don't have to deploy them yourself. We're going to deploy one Prometheus Operator, and then we're going to use that operator to deploy two Prometheus instances. Now, writing and maintaining configuration for Prometheus is a pain; that's why there is this thing called service monitors. A service monitor tells Prometheus
which services in Kubernetes to monitor. So if you have an arbitrary deployment with some pods running behind it, and you're exposing a service to those pods, you can create a service monitor that uses a label selector to select the service. Then, in Prometheus, you can use label selectors to select the service monitors that Prometheus needs to consume. That'll tell Prometheus what service endpoints to start scraping to collect metrics, and that is roughly how things are stitched up.

Now, fortunately, there is a Prometheus Operator community on GitHub, and they maintain all the manifests under the kube-prometheus repository. There are raw YAML manifests as well as a Helm chart that you can use to deploy all of these components to your cluster. But more importantly, before deploying anything to Kubernetes, we'll want to make sure we check the compatibility matrix. So if we scroll down here, we can see there's a Kubernetes compatibility matrix. If we click into that, we can see which version of kube-prometheus supports which version of Kubernetes, so this helps you make important decisions when upgrading your cluster. As you can see, each release is compatible with two Kubernetes versions: for example, both 1.19 and 1.20 are supported if you are using release 0.7. This allows you to deploy the right release to use as a stepping stone for upgrading your cluster, as there's always some sort of backwards compatibility here. In this video we're going to be taking a look at release 0.10, which is for the latest version of Kubernetes, 1.23. So this is an important strategy to use when you want to build out your monitoring for future versions of Kubernetes.

Now, to get the manifests, it's really simple. What I'm going to do is run a lightweight Alpine container and mount all my source code into the container. So I'm going to say docker run -it, mount the current working directory into the container on a folder /work, set that as my working directory, and run alpine. So I copy and paste this, and this is
because I want to do some things in the container that I don't want on my machine. The first thing I'm going to do is just add git, so I'm going to say apk add git. This will install git inside of the container. Next up, I'm going to create a shallow clone of that kube-prometheus repository, but specifically release 0.10, which is the one that's used for Kubernetes 1.23. So I'm going to say git clone --depth 1 and the URL to that GitHub repo, and I'm going to pull in the branch, which is release-0.10, and clone it to the temp directory. So copy and paste that, and that'll clone it into /tmp. So now I have this GitHub repo inside of the container. To view the files we can say ls -l /tmp, and you can see we have all the files here.

Now, we're not interested in all of these files. What we're interested in is the manifests folder; that is where all the Kubernetes YAML is to deploy this entire stack. To view the manifests I can say ls -l /tmp/manifests. If I run that, you can see everything is in here: we have Alertmanager, blackbox exporter, Grafana, kube-state-metrics, a bunch of service monitors, node exporter, our Prometheus instance, the Prometheus adapter, and the Prometheus Operator, as well as a bunch of RBAC role bindings. We also have a setup directory. The setup directory contains all the CRDs that are used by the Prometheus Operator.

So what I want to do is go ahead and grab this manifests folder and put it into my GitHub repo so that we can take a closer look. From within this container I'm going to say cp -r, for a recursive copy, and copy the manifests folder into my current working directory. So I paste that, and because my current working directory is mounted in, you can see we now have a manifests folder on the left here, and these are all the files we've just copied in. So on the left here you can see all the manifests I just showed: Alertmanager, blackbox exporter, all the way down to node
exporter, the Prometheus Operator, as well as the Prometheus instance. We also have a setup directory, which has all the custom resource definitions as well as the namespace. So if you deploy the setup directory first, it will create this namespace, which is called monitoring, and it will create all the custom resource definitions. Then, when you apply the remaining bunch of YAML, it'll go ahead and deploy all the components needed to monitor your cluster. It's also important to note that the kube-prometheus repo has a quick start section which shows you how to go ahead and deploy all this stuff. You can see they say here that you need to apply the manifests setup directory first, and then wait for all these custom resource definitions to be applied before applying the remaining manifests folder.

So now we're done inside this container, and I can go ahead and exit out. To deploy this is extremely simple. All I'm going to do is say kubectl create -f and apply the setup folder first. Copy and paste that to my terminal, and you can see it's gone ahead and created the custom resource definitions, because Kubernetes doesn't know what an Alertmanager is. It also doesn't know what a pod monitor, a probe, a Prometheus, a Prometheus rule, a service monitor, or a Thanos ruler is. These are all new objects that the Prometheus Operator will use to monitor our cluster. Remember to check out my Prometheus guide down below, where I go into much more detail on all these components.

Now that we have all the CRDs in place, we can go ahead and deploy the manifests. That's very simple: we say kubectl create -f and apply the manifests directory. That's as simple as that. That'll go ahead and apply all those components, and if I immediately go kubectl in the monitoring namespace, get pods, you'll see all the components that are being applied. So go ahead and run that, and you can see there's a blackbox exporter, Grafana, kube-state-metrics, node exporter, Prometheus adapter, as well as the
Prometheus Operator. Now, once this Prometheus Operator is created, it'll start looking at those custom resource definitions, and because we've deployed a Prometheus instance and a bunch of service monitors, the Prometheus Operator will go ahead and deploy two Prometheus instances, take all those service monitors, and configure the Prometheus instances so that they can start scraping.

Now, the most important components here start with kube-state-metrics. kube-state-metrics is like a metrics server that gets all metrics from pods, such as CPU usage, network, disk I/O, memory usage, and so forth. There's also node exporter. Node exporter is one of my favorites because it supplies all the metrics of the entire node. It provides rich telemetry of what's happening on your node, so if you have a troublesome pod, you can use node exporter to find out if there are any performance issues on that node itself. Then we have the Prometheus Operator, which basically helps us maintain all the Prometheus instances, and then there's Grafana, which we'll take a look at in a second, which helps us visualize all of these metrics.

Now, if I wait a couple of minutes and do get pods again, you can see all the components are up and running, including two Prometheus instances and Alertmanager as well. These are all the components you need for the basic monitoring of Kubernetes. Now what we're going to do is open up a new window, and I'm going to port forward to Grafana to check out the dashboards. To do that is very simple: I say kubectl in the monitoring namespace, port-forward to a service called grafana on port 3000.
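
The Grafana port forward just described might look like the following in a shell. This is a minimal sketch, not the exact command shown on screen: the monitoring namespace, the grafana service name, and port 3000 come from the captions, while the run helper and the DRY_RUN variable are hypothetical additions that print the command instead of executing it unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Values taken from the captions.
NAMESPACE="monitoring"
GRAFANA_SERVICE="grafana"
GRAFANA_PORT="3000"

# Hypothetical helper (not from the video): prints the command by default
# and only executes it when DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Forward local port 3000 to port 3000 of the grafana Service.
forward_grafana() {
  run kubectl -n "$NAMESPACE" port-forward "svc/$GRAFANA_SERVICE" "$GRAFANA_PORT"
}

forward_grafana
```

When run with DRY_RUN=0 against a live cluster, the command stays in the foreground and Grafana should be reachable on localhost port 3000 until it is stopped with Ctrl+C.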
I use a port forward because I don't want to make my Grafana public, so I set up a private port-forward endpoint, and this allows me to access Grafana. The username and password are just the defaults, which is admin admin, and for this demo I'm just going to skip creating a new password. The first thing we'll want to do is come over to the settings and look at data sources, and we can see that we have a Prometheus data source here.

Now, there is one caveat. If you click this Prometheus data source, you can see it's pointing at a Kubernetes service, and for some reason, if you hit test, you'll find that it'll eventually time out and it doesn't work. I'm not sure why this is, but I have a fix for it. Ideally it should just work out of the box, but for some reason it doesn't. It's very simple to fix. If you run kubectl in the monitoring namespace, get service, you can see that there is this prometheus-k8s service. That is the one that's set up in Grafana as a data source, but that one is not working. There's also a service which points to the exact same pods, which has two endpoints for our Prometheus pods, so I'm going to use this service instead. It's very easy to change this: if we go to our manifests folder and look at the dashboardDatasources.yaml, it has the link to that service right here. What I'm going to do is change that to the prometheus-operated service. Go ahead and change that, and I'm going to save that file.

So to apply the fix is very simple: I say kubectl apply -f and apply that YAML file. That'll go ahead and update that secret, which has the endpoint fix. Then what I need to do is restart Grafana so that change can take effect. So I'm going to say kubectl get pods, and see, that's my Grafana instance over there. I'm going to say kubectl delete pod, take the name of the Grafana pod, and paste it there. That'll go ahead and delete the Grafana pod. Now if I do kubectl get pods, you can see I'm getting a new Grafana instance, which will have
that fix. Now that that's up and running and ready, we can run this port-forward command again. So I'm going to paste that to the terminal and go back into Grafana. Under data sources, you can see now my change has taken effect. I go down to the bottom, I say test, and the data source is working. Now I can go back to the home page and we can take a look at all the dashboards.

Now, before we take a look at the Grafana dashboards, it's more important to understand how these dashboards are populated. How does Prometheus get all this telemetry? To understand that, let's jump into the Prometheus instance. If I do kubectl in the monitoring namespace, get pods, we can see that there are two Prometheus instances, and they are fronted by a service. If I do kubectl get service, we have this prometheus-operated service, which exposes port 9090 that we can use to access the UI. So to check Prometheus is very simple: I say kubectl port-forward and port forward to that service on port 9090. Go ahead and copy and paste this to the terminal, and now we can access the Prometheus user interface.

The first thing we want to do is go to Status and then Targets, and here we can see all the service monitors that the Prometheus instance is configured with. So you can see here that the Prometheus instance scrapes Alertmanager, blackbox exporter, CoreDNS, Grafana, the kube API server, kube-state-metrics, the kubelets, as well as node exporter and Prometheus adapter, and Prometheus also provides metrics about itself, as well as the operator. Now, the most important ones here start with the API server: if you're managing your own Kubernetes cluster and you have a lot of webhooks running, you may want to monitor the API server. kube-state-metrics provides telemetry of all pods, such as CPU, memory, disk I/O, and network. Then the kubelet is what's running on every node; it has telemetry about containers running on every node. And then we have node exporter, which provides rich telemetry of the actual nodes that are running
our workloads. So if we take a look at the service monitors and use node exporter as an example, you can see if I click this one it has four instances that it's scraping. So there are four endpoints here, one for every node: we have three worker nodes and one master node.

Now, how does Prometheus find these endpoints? To understand that, what we need to do is check the service monitors. So we say kubectl in the monitoring namespace, get servicemonitors, and you can see we have a bunch of them. All of these should appear on that Prometheus targets page. You can see we have a service monitor called node-exporter, and if I say kubectl describe servicemonitor node-exporter, it'll tell me about that service monitor. You can see, if I do that, it has a selector matching the labels of a service, so it's looking specifically for Kubernetes services with these labels. And if I do kubectl in the monitoring namespace, get service, you'll see that we have a node-exporter service that has those labels on it. So that is basically how Prometheus is configured, and that provides you with all the basic components needed to monitor a Kubernetes cluster.

Now, it's important to note that Prometheus stores all this telemetry in memory. That means if Prometheus dies, you lose all your metrics. You can set up persistent storage to write this stuff to disk, so in case your Prometheus instance restarts you can still load all of this data from disk back into memory. But for most companies this is sufficient, because most companies use remote write to write these metrics into third-party applications such as Sumo Logic, Datadog, New Relic, and other third-party metrics providers. That means if Prometheus is restarted, you still have all your telemetry sitting outside of your cluster.

So hopefully this video helps you deploy the latest version of the monitoring components to the latest version of Kubernetes, and also gives you a clear indication of how this stuff is stitched together and how it works, and gives you a
strategy on how to keep your monitoring up to date. Remember to let me know down in the comments what sort of videos you'd like to see in the future, and as always, like and subscribe, hit the bell, and if you want to support the channel even further, be sure to check the Patreon link down below or become a YouTube member. Also check out the community link down below. As always, thanks for watching, and until next time, peace. [Music]
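
The steps walked through in the captions can be summarized as a shell sketch. It is an approximation rather than the exact commands from the video. The cluster name, the release-0.10 branch, the setup-then-manifests order, and the resource names are taken from the captions. The kindest/node:v1.23.1 image tag, the repository URL, the non-interactive docker run form, the /tmp/kube-prometheus clone path, and the kubectl wait step are assumptions made here. The run helper and DRY_RUN variable are hypothetical: by default the script only prints each command, and DRY_RUN=0 executes them.

```shell
#!/bin/sh
CLUSTER_NAME="monitoring"
NODE_IMAGE="kindest/node:v1.23.1"   # assumed image tag for Kubernetes 1.23.1
REPO_URL="https://github.com/prometheus-operator/kube-prometheus.git"
REPO_BRANCH="release-0.10"          # release matching Kubernetes 1.23

# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Create the kind cluster from kind.yaml and list its nodes.
create_cluster() {
  run kind create cluster --name "$CLUSTER_NAME" --image "$NODE_IMAGE" --config kind.yaml &&
  run kubectl get nodes
}

# Shallow-clone kube-prometheus inside a throwaway Alpine container and copy
# the manifests folder into the mounted working directory.
fetch_manifests() {
  run docker run --rm -v "$PWD:/work" -w /work alpine sh -c \
    "apk add --no-cache git && git clone --depth 1 -b $REPO_BRANCH $REPO_URL /tmp/kube-prometheus && cp -r /tmp/kube-prometheus/manifests ."
}

# Apply the CRDs and namespace first, wait for the CRDs, then apply the rest.
deploy_stack() {
  run kubectl create -f ./manifests/setup/ &&
  run kubectl wait --for=condition=Established --all crd --timeout=120s &&
  run kubectl create -f ./manifests/ &&
  run kubectl -n monitoring get pods
}

# Inspect how Prometheus discovers targets, then open the Prometheus UI.
inspect_scraping() {
  run kubectl -n monitoring get servicemonitors &&
  run kubectl -n monitoring describe servicemonitor node-exporter &&
  run kubectl -n monitoring port-forward svc/prometheus-operated 9090
}

create_cluster && fetch_manifests && deploy_stack && inspect_scraping
```

Keeping the release branch in a single variable reflects the strategy from the video: for a newer Kubernetes version, look up the matching release in the kube-prometheus compatibility matrix and change only that value and the node image.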

Info

Channel: That DevOps Guy

Views: 70,831

Keywords: k8s, kubernetes, prometheus, grafana, operator, monitoring, observability, course, tutorial, guide, devops, learning

Id: YDtuwlNTzRc

Length: 18min 9sec (1089 seconds)

Published: Sun Jan 23 2022