Beautiful Dashboards with Grafana and Prometheus - Monitoring Kubernetes Tutorial

Captions
Today we're going to set up Grafana and Prometheus to monitor our entire Kubernetes cluster, the easiest and most surefire way out there. The holy grail of every Kubernetes cluster is getting beautiful, meaningful charts filled with glorious data in Grafana. Grafana is a single pane of glass for visualizing your data wherever it lives, whether that's on some servers in your home lab, some Raspberry Pis in a cluster, or even a Kubernetes cluster in the cloud. It does all of this without ingesting data into some back-end store or locking it away in some database. It uses Prometheus, an open-source monitoring and alerting system, to create and store time-series data. This data is scraped from metrics endpoints using the pull model over HTTP, which makes it really easy to collect, and it's stored on the Prometheus server. From there you can visualize it with Grafana and even set up alerts using Alertmanager.

And that's where the challenges start. It's been kind of rough setting up Prometheus properly in Kubernetes with Grafana: there are so many ways to do it, not to mention how challenging it is to set up Grafana with charts that work, look good, and help you visualize meaningful data. Well, over the last week I set out to solve just that. I wanted a repeatable way to install and configure Grafana and Prometheus and gather metrics from all of my nodes and workloads, and that's when I found the open-source project kube-prometheus-stack. This Helm chart installs a collection of Kubernetes manifests, Grafana dashboards, and Prometheus rules, as well as Alertmanager rules, to completely monitor your Kubernetes cluster, and there's no guesswork involved: I've already created a configuration that should work on your cluster and give you the same data I'm able to collect.

But before we get started, a huge thank you to our sponsor, Datree, for making today's video possible. How many times have you applied a Kubernetes configuration only to realize later that it was misconfigured, not configured according to best practices, or just plain wrong? These types of misconfigurations can create engineering churn and possibly even downtime, and that's where Datree can help. Datree is an open-source tool that prevents Kubernetes misconfigurations from ever reaching your cluster. It does this by scanning objects against a centrally managed policy. This policy comes with Kubernetes best practices built in, but it's flexible enough that teams can customize it to your organization's needs. And Datree isn't just a simple YAML linter: along with YAML validation, it does schema validation as well as checking against your configured policy. Datree also comes with a fancy dashboard, backed by great documentation, to help you fix errors fast. It installs in seconds and can be run from a CLI, from kubectl, in your CI/CD pipeline, and even as a Kubernetes admission webhook that can intercept and test Kubernetes manifests in that last mile. To sum it all up, Datree can help prevent Kubernetes misconfigurations from happening in the first place, so help empower your engineers by installing Datree today. Be sure to see the link in the description to get started. (A quick sketch of the workflow follows.)
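As an aside, the Datree workflow described in the sponsor segment boils down to a couple of commands. This is a sketch based on Datree's public docs around the time of this video; the install URL and the manifest file name are assumptions, so check their site for current instructions:

```bash
# Install the Datree CLI (install-script URL as documented by Datree -- assumption)
curl https://get.datree.io | /bin/bash

# Scan a manifest: YAML validation, schema validation, and policy checks in one pass
# (my-deployment.yaml is a hypothetical file name)
datree test ./my-deployment.yaml
```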
To monitor our Kubernetes cluster we're going to need a few things. First of all, we need a Kubernetes cluster up and running. If you need help setting one up, I've created an Ansible playbook that does it automatically for you using Ansible and k3s. You'll want to run a command like kubectl get nodes to make sure you can communicate with your cluster; it should return all the nodes in your cluster. I'm doing this on a clean Kubernetes cluster I just spun up with Ansible, but you can do this on an existing cluster and it'll work just fine.

If you are going to install this with k3s, and possibly with my Ansible playbook, there are some additional flags you'll need to pass to k3s, and those additional arguments are listed right here. I'll have them in the documentation, which you can find in the description below, and there's a sketch of them after this section. Prometheus will still work without these arguments; you'll just have fewer metrics, which means fewer charts, so you'll want to add all of them to get the most data into your charts. With k3s you can re-run the bootstrap with these arguments, or you can update a YAML file and restart your nodes; I'll have that in the documentation too. And if you have no idea what I'm talking about, that's totally fine: you can take care of it later. You just won't have data in some of the charts until you enable these flags, so if you're getting hung up on this, don't worry too much about it.

And if you need help from a UI, that's totally fine, I totally get it: a lot of things are easier to visualize in a UI than in a terminal. I'll be providing the CLI way to do everything, but I'll also be doing some of it in Lens. Lens is a desktop app that makes it easy to visualize your cluster. I showed it off in another video, and I'll be using it in this one too; it makes checking things out, port forwarding, viewing logs, and so on a lot easier than the CLI. But as I mentioned, I'll be doing both.

The next tool we'll need, and this will be the last one, is Helm. Helm is a package manager for Kubernetes: it helps you install applications by setting some default values from the package maintainers, and it helps you version those applications too. So make sure you have Helm installed and that running helm version returns something; just make sure you have the latest version.

Once all of that's done, let's install kube-prometheus-stack. The first command we're going to run is helm repo add, which adds this repository to our local machine so we can install the chart on our Kubernetes cluster. Then we'll run helm repo update, which refreshes our local index with the latest versions. As you can see, I use Helm a lot, so you'll see a lot of different repositories, but right here we can see the prometheus-community chart repository has been added to our local machine.

Next we're going to create a namespace for all of the components included with this chart. This isn't strictly necessary; if you don't specify a namespace later on, everything just gets installed into the default namespace. Namespaces are kind of a big topic: some people use them to organize things, to segment things, or even for security. Here I'm just creating a namespace called monitoring, strictly for organizational purposes; anything in my cluster that has anything to do with monitoring goes in this namespace. So let's create it. We can verify it by running kubectl get namespaces, and we can see our monitoring namespace down there. (Both the k3s arguments and these setup commands are sketched below.)
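About those k3s arguments: the exact list is shown on screen rather than spoken, so treat the following as an assumption based on what's commonly documented for running kube-prometheus-stack on k3s, and defer to the docs linked in the description:

```bash
# Hypothetical k3s server install exposing the control-plane metrics endpoints
# for Prometheus to scrape. The flag list is an assumption -- verify against
# the documentation linked in the video description.
curl -sfL https://get.k3s.io | sh -s - server \
  --kube-controller-manager-arg bind-address=0.0.0.0 \
  --kube-proxy-arg metrics-bind-address=0.0.0.0 \
  --kube-scheduler-arg bind-address=0.0.0.0 \
  --etcd-expose-metrics=true
```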
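And consolidated, the repo and namespace setup from this step looks like this (repo URL taken from the prometheus-community project):

```bash
# Add the prometheus-community chart repo and refresh the local chart index
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Create a dedicated namespace for all monitoring components, then verify it
kubectl create namespace monitoring
kubectl get namespaces
```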
Next, what we'd typically do is run a helm install command and pass it a bunch of flags. Rather than passing a bunch of flags, though, we're going to provide a values.yaml file. This YAML file is kind of like an answers file, if you've ever done anything programmatically or scripted the installation of software before: most installation packages let you provide an answers file. Think of a Helm values file like an answers file, except it comes with some default values already. And boy, does it come with some values. The default values file is really long, like super duper long: I'm scrolling, page down, still going, still going. It's over 3,000 lines of YAML. Now, remember I mentioned that the answers file can provide default values for you? Well, these are the default values, so we don't actually have to set any of them if we want the defaults. But we're going to override a couple of things, and this is where I spent most of my time over the last week: configuring my own values file so it runs the way I want.

Here is my values file. I have a lot of values in here, as you can see, but I've trimmed it down to 181 lines. Still a lot of YAML. And I'll admit I do have some default values still defined in here. Doing that is totally up to you, but I typically do it for knobs and dials I want to turn later on: it's one thing to know they exist and another to actually change them, and this helps me solve both of those problems.

So let's talk about some of these; some may or may not be obvious, but we should talk before we just apply this YAML file. fullnameOverride is just saying that we're going to call this deployment prometheus. If you don't change it, you're going to get names like prometheus-prometheus-grafana-something-else, which is kind of confusing, so I override some of the deployment names so they're easier to find and look better in my cluster. I will have to remember in the future that they come from this Prometheus chart, which is actually a Prometheus operator. We didn't really talk about operators, but you can think of an operator as an orchestrator of other packages within Kubernetes: this operator will create and orchestrate all of these packages, so it's operating and orchestrating Grafana, Prometheus, Alertmanager, and more if you want it to. You can see some of those rules and settings here; these are all default values, really just there so I know they exist in case something goes wrong, but I figured I'd leave them in.

Next, for Alertmanager, we do a fullnameOverride and call it alertmanager, again so we don't have really weird long names. We enable it, and I disable the ingress. You might be asking: why are you disabling the ingress, don't we want to enable it? Yes, you do, at some point. But because people use different ingresses (ingresses? ingressi? whatever the plural is), I didn't want to enable this. If you use an NGINX ingress controller or a Traefik ingress, you can enable it, give it a few labels, and it will create the ingress for you so you can get to these services. I don't enable it because I use an IngressRoute with Traefik, and I'm going to take care of that later with some manifests of my own, so you won't see me enable any of these ingresses. If you're using Traefik with an IngressRoute as well, I'll include that in my documentation. (A sketch of these overrides follows.)
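Here's a minimal sketch of the naming and ingress overrides just described, using the kube-prometheus-stack chart's value names:

```yaml
# Shorten the generated resource names and keep the chart's ingress off
fullnameOverride: prometheus

alertmanager:
  fullnameOverride: alertmanager
  enabled: true
  ingress:
    enabled: false   # exposed later with a Traefik IngressRoute instead
```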
Now, I said "ingress" a lot. If you don't know what an ingress is, really simply put, it's just a route or a path to get into your service: a way to expose the service over HTTP so you can reach it via an IP or a DNS entry. Let's continue; now that we've got the hard stuff out of the way, I can gloss over the next ones a bit.

Next, we're saying Grafana is enabled, we're doing the fullnameOverride again, and then I have some additional properties, most of which are defaults. Then we get to this right here: admin. These are the credentials, how we're going to log into the system. If you don't set this, it has a default value that you can change later when you sign in, but I'd rather create a secret up front, store that secret in Kubernetes, and have Grafana use it, so it has my secure password the instant I deploy it. Just know that we're going to use an existing secret named grafana-admin-credentials; the user key on that secret is admin-user, and the password key is admin-password. This is really just mapping these values to the secret. It's not the best diagram, but that's kind of how it works; we'll create the secret here in a little bit.

Then we're turning on monitoring for a bunch of stuff, and this took me a lot of time to figure out. On kubeControllerManager you'll see I have it enabled, with endpoints, and these endpoints are the IPs of my k3s servers. Not my agents: my servers, formerly known as masters. You'll see I did this for quite a few things. For kubeEtcd, we want to monitor it, so I gave it the IP addresses of the endpoints, and I also gave it some ports. The defaults are fine, but I still specified them so I know what the ports are, and if the defaults ever change I don't have to worry about it. kubeScheduler, same thing: we monitor it and give it the IP addresses of our k3s servers. And the same with kubeProxy: IPs of our servers. That's basically telling it, hey, for all of these services (kube-proxy, kube-scheduler, etcd), here are the IP addresses to go scrape, so we can have those charts later on.

There are also a few places where I decided to provide my own relabelings to replace the values the Helm chart uses. Although it looks confusing, all I'm doing is replacing the IP address of those servers we talked about earlier with the node name in the charts, so it looks a little nicer; we'll see that later. For prometheus-node-exporter, almost the same thing: you'll see extra args specified here, but these are really the defaults. I specified them so I could play with their values and so I know what they are, but I haven't changed anything; if you want to delete the extra args, you're totally good, because the defaults are just this. Next, some more relabelings, some resources I set for the Prometheus operator, and then some additional flags; again, most of these are defaults. (A sketch of the credentials mapping and the endpoint scraping follows.)
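To make the secret mapping and the endpoint scraping concrete, here's a sketch of those values; the IP addresses are hypothetical stand-ins for your own k3s server nodes:

```yaml
grafana:
  enabled: true
  fullnameOverride: grafana
  admin:
    existingSecret: grafana-admin-credentials
    userKey: admin-user
    passwordKey: admin-password

# Point the scrape targets at the control-plane nodes (example IPs -- use your own)
kubeEtcd:
  enabled: true
  endpoints:
    - 192.168.1.10
    - 192.168.1.11
    - 192.168.1.12
```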
One thing you're not seeing here is storage. I have a default storage class set up, so this is going to use the default storage class with the default values and the default retention policies, except for this one right here: I set retention to six hours. But if you don't have a storage class set, or you have the weird problem of multiple default storage classes, you'll want to either specify a default in your cluster so it just works, or set storage keys on the services that need storage and specify which storage class to use. For example, in the default values you'll see a storageSpec set with an empty object (or a struct, depending on what language we're talking about); the default version is empty. If you want to supply a persistent volume claim, you can set a claim template, set the storage class, the access mode, how much you want to provision, and then selectors if you need them. I recommend doing this rather than taking the default values, which for some of these services are none. That means unless you're using local node storage, after a container restarts you're going to lose your metrics, so I highly recommend setting this up, and I'll have some examples in the documentation. The only reason I didn't set it here is that I don't know which storage class you're using. Rook Ceph? Longhorn? Local storage? NFS? All are great, all work, but I'm not sure which one you'll use. It's not super important the first time you launch it, but you'll want to get these in place shortly after. It's really going to be up to you to determine your storage class, your template, and how much you provision, and it's easy to update these values, as you'll see in a little bit, and roll the deployment out with new settings. (A sketch follows below.)

That was a lot of talk about settings, but I figured it was worth a few minutes because they're super important, and a lot of this context you can't get from looking at a default values.yaml file of over 3,000 lines. So let's deploy and check out our charts.
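Here's a minimal sketch of a Prometheus storage configuration of the kind described above; the storage class name and size are assumptions you'd swap for your own:

```yaml
prometheus:
  prometheusSpec:
    retention: 6h               # keep metrics for six hours, as set in my values file
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: longhorn   # assumption -- use your cluster's storage class
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 50Gi            # assumption -- size it for your retention needs
```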
I did mention that Grafana needs a username and password so you can sign in, and that we're going to use a secret for it, so let's create that secret really quick so that when we spin up our Grafana dashboard we can sign in securely. Now, there are lots of ways to create secrets. You could put them in the YAML, but then you don't want to commit them, so you'd probably encrypt them if you did. You could do it all from the terminal, but then it's in your history. Or you can do what I'm going to do and echo them out to the file system, as long as you make sure to delete the files afterwards. Anyway, this is how we're going to do it: we echo the admin username out to a file called admin-user, and then we echo our super secret password out to a file called admin-password (don't use this one, but I am). If we cat admin-user we should see our admin user, and if we cat admin-password we should see our password, so we're in a good state.

Next we create the secret using kubectl create secret generic grafana-admin-credentials (that's the name of our secret), from the admin-user file and from the admin-password file, and we store it in our monitoring namespace. The namespace is really important: the secret needs to be in the same namespace as our services, and remember, we're installing those into the monitoring namespace, so our secret needs to be there as well. We hit enter, we create the secret, and Kubernetes says, hey, we created a secret. Now, if we want to make sure Kubernetes has the right secret, we can do a kubectl get secret from the monitoring namespace, give it the name of the secret, and output it with jsonpath, asking for the data property admin-user and then base64-decoding it, because secret values are base64-encoded in Kubernetes. If we run that, we see our admin user, and if we run the same command for the password, we see our password there as well, so we're in a good state. (The whole sequence is consolidated below.)

So now we can install our kube-prometheus-stack, and that's as easy as running helm install with -n monitoring for our namespace, prometheus as the name of the release, then the name of the chart from the repository we added, and then -f values.yaml. The -f means file, and the file we're providing is that values.yaml we just reviewed a little bit ago; the full command is also below. If we run this, it will start to install.
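Consolidated, the secret workflow looks like this. The password here is a placeholder, and note the -n on echo so a trailing newline doesn't sneak into the secret:

```bash
# Write the credentials to files (delete these when you're done!)
echo -n 'admin-user' > ./admin-user        # placeholder username
echo -n 'sup3rsecret' > ./admin-password   # placeholder password -- use your own

# Create the secret in the same namespace the chart deploys into
kubectl create secret generic grafana-admin-credentials \
  --from-file=./admin-user \
  --from-file=./admin-password \
  --namespace monitoring

# Verify by pulling each key back out and base64-decoding it
kubectl get secret grafana-admin-credentials -n monitoring \
  -o jsonpath="{.data.admin-user}" | base64 --decode
kubectl get secret grafana-admin-credentials -n monitoring \
  -o jsonpath="{.data.admin-password}" | base64 --decode
```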
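And the install command itself, assuming the release name, namespace, and values file from this walkthrough:

```bash
# Install kube-prometheus-stack as release "prometheus" into the monitoring
# namespace, overriding the chart defaults with the values file we reviewed
helm install -n monitoring prometheus \
  prometheus-community/kube-prometheus-stack -f values.yaml
```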
While that's going, let's pull up Lens and see what it's doing. If we look at Lens, we can see things starting to happen. In the overview, things are getting pulled and installed: we have some pending pods, pending deployments, pending DaemonSets, pending ReplicaSets, pending jobs, pending everything. This is a good sign; things are getting installed. Let's take a look at the pods: Alertmanager is still coming up, Grafana is still coming up (it looks like it has three replicas), and Prometheus is still coming up, so you'll want to wait until these are fully installed. Here's the cool thing about Lens, too: we can go look at a specific deployment and see lots of stuff about it, CPU, memory, file system, and we can even see the pod logs right from here. This pod is up and going; it's getting a 500, but it'll settle down in a bit. And if we filter by the monitoring namespace I mentioned earlier, we can see all of our pods in that namespace: Alertmanager, Grafana, kube-state-metrics, node exporters on all of our nodes to scrape metrics, our Prometheus operator, and the prometheus-prometheus StatefulSet. It looks like we're in a good state and everything's good to go, so let's check out Grafana.

As I mentioned earlier, typically you'd install an ingress and expose this so you can actually get to it, but Kubernetes has a really nice way to get to deployments, or even individual pods, without exposing anything at all: port forwarding. If I want to port-forward to one of my services, say our Grafana service, I can go down to Networking and then Services in Lens, click on grafana, and right here I can click to port-forward to the service and get to it without even exposing it through an ingress. If I click it, it opens a browser, port-forwarded to my cluster, and I can get right in. So this isn't exposed anywhere; it's all going through localhost, using port forwarding to reach these services. Really awesome.

We can do that from the CLI, too; I wanted to show it in Lens first because that's really awesome if you don't want to run commands. So let's delete this port-forwarding rule within Lens (remove it under the Networking section) and do it the terminal way. The terminal way is kubectl port-forward, then -n monitoring for our namespace, and then the Grafana pod. You'll notice I used tab completion there: I enabled the kubernetes plugin for zsh, which goes out, queries Kubernetes, looks for the pod, and fills it in for me. Then I pick a port on the outside, which can be anything (I picked 52222), and the port on the inside is 3000. After we run this, it's port-forwarding to my local machine, and if I go to localhost:52222, I can see my Grafana dashboard. Really awesome. (The command is sketched below.)
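Here's the CLI port-forward just described; the pod name is a placeholder, since yours will have its own hash suffix (tab completion or kubectl get pods -n monitoring will find it):

```bash
# Forward local port 52222 to Grafana's port 3000 inside the cluster.
# "grafana-xxxxxxxxxx-xxxxx" is a placeholder pod name -- substitute your own.
kubectl port-forward -n monitoring grafana-xxxxxxxxxx-xxxxx 52222:3000

# Then browse to http://localhost:52222
```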
So now let's enter our username and password. I think it was admin-user, and then my super secret password... and I didn't get it right. It was so secret I couldn't even figure it out. There we go. I got the password wrong at first, but the nice thing is it actually used our secret: the cool thing that just happened is we signed in with our secret from Kubernetes, and that secret isn't exposed in our manifests at all; we only reference it there. I figured it was worth teaching secrets just so we aren't setting passwords in manifests or relying on the default password.

Once we get into Grafana, this is typical Grafana stuff. Or is it? When we go into Grafana, we automatically have dashboards set up for a lot of the things we care about. For example, if we check Alertmanager, we already have data: this is scraping the Alertmanager service. If we check etcd and see how it's doing, we have it right here: we can see that three etcd nodes are up, with different metrics already in these beautiful charts. If we want to see how much compute our cluster is using, it's right here. Our CPU utilization is around one percent, almost two; we've requested about nine percent and set a limit of around two percent (that probably needs to be adjusted), and we're sitting right around one percent, so we're doing pretty good. Same with memory: these nodes have low memory, and we're below the limit, which is good, which means our pods won't get evicted for the most part. But we don't have our requests set properly, because we're over, so as Kubernetes schedules pods it might not know where to put things, or it might try to put things where there isn't enough memory. Anyway, that's a pretty advanced topic; this looks pretty good, and we have data.

And you can see we have lots of data per namespace. I only have three namespaces right now (monitoring, metallb, and kube-system), and we can see how much memory each is using and how much they've requested. If we go back, we can see this per pod, too. Now, if you land on this and don't see any data, you probably need to change the namespace; like I said, this is a clean cluster, and I don't have anything deployed in default. So where did we deploy stuff? Monitoring. As soon as we switch to monitoring, we can see CPU utilization, memory utilization, and all of our individual pods. This is pretty cool, and as you deploy services you'll be able to see this data for your individual pods, too.

Since I don't have a ton of data here, I have one preheating in the oven at 350 degrees; it's already been running for about 365 days. No, it's my home production cluster, so let's take a look at that: a lot more data there than this test cluster. If we check out CoreDNS, lots of activity there: a lot of DNS queries within my cluster. And if we look at compute resources, oh, these charts look so good. You can see my compute resources per namespace: networking, monitoring, longhorn, kube-system, gitlab, flux, default, cert-manager. I have a lot of namespaces; I've tried to keep them somewhat organized, but you can see I have a ton of data here, and it's so awesome to look at. You can see bandwidth per container or per namespace, and if we drill into individual pods, say my monitoring stack, we can see those individual pods and how much CPU and compute resources they're using. It makes it really easy to identify when something's going wrong or using a lot more than it should, like this one right here: the prometheus-prometheus pod (part of that StatefulSet) is using a thousand percent more memory than we set in its request, so I should probably increase the request so it can be scheduled somewhere with the appropriate amount of memory. But really cool, really cool.

If we want to see networking for my whole entire cluster, we can, and it's divided up by namespace. So who's transferring the most data out of everyone? It looks like cattle-system (cattle-system is for Rancher) is transferring a lot of data so far, but kube-system is up there, monitoring is up there; they're all doing things. This doesn't look out of the ordinary: two megabytes per second is not a lot. You can see bandwidth history over time, the min, max, average, and current, and the average bandwidth if we want. I think there are some fancier gauges in here, too: if we look at networking at the namespace level, per pod, for the monitoring namespace, we can see how many overall bytes are received and transmitted. A couple of gauges, but pretty cool.

So you have charts already set up within Grafana for all kinds of metrics, everything you see here. But what happens if something goes wrong? That's where Alertmanager comes into play. You can set up alerts based on metrics and thresholds to let you know when something's wrong. Say, for instance, my CPU utilization stays above a certain threshold, say eighty percent, for more than five or ten minutes: I can set up an alert to notify myself, and that's all done within Alertmanager. If you want to know more about Alertmanager, we'll be covering that in a future video, so be sure you're subscribed to see how that turns out.

Grafana and Prometheus are pretty amazing, and it's pretty amazing that this Helm chart exists to fully configure monitoring for your Kubernetes cluster with very little configuration. And for all of you out there, you can just use my configuration and save yourself the hours of trying to figure this out. I hope you learned something today, and remember, if you found anything in this video helpful, don't forget to like and subscribe. Thanks for watching!

I know very little German. I think in middle school we got to take four languages, and I remember "Wie geht's heute? Wie ist das Wetter?" ("How's it going today? How's the weather?"). I think "Lest die Liste" means "read the list," because I think our German teacher would tell us that all the time. Oh, and "Es ist windig": "es ist windig" means "it is windy." That's about the only German I remember from middle school German, and that was a while ago, too.
Info
Channel: Techno Tim
Views: 104,430
Keywords: kubernetes, grafana, prometheus, alert manager, prometheus tutorial for beginners, grafana tutorial, charts, metrics, data, monitoring, k3s, kube-prometheus-stack, kubectl, helm, server monitoring, pod, pods, techno tim, lens, datree, beginner, visualization, observability, devops, operations, monitor cloud, cloud, how to install grafana, how to monitor kubernetes cluster, k8s, dashboards, alerts, operator, secrets, port forward, namespace, values, chart, networking, homelab, home lab, pi, cluster
Id: fzny5uUaAeY
Length: 27min 41sec (1661 seconds)
Published: Sat Jul 23 2022