Homelab Monitoring Made Easy - Part 1: Tools Overview - Grafana, Prometheus, InfluxDB, Telegraf

Captions
Hey everybody, and welcome back to Jim's Garage. This is the first of two videos that are going to focus on logging and monitoring. In this first video I'm going to walk you through all of the tools that you're going to require and how to configure and deploy them; in the second video we're going to look at how you actually configure and use those tools. By the end of this video we'll have things like Grafana, InfluxDB, Prometheus, Telegraf and a couple of other freebies all set up within our environment. In the second video I'll show you how to plug those in so that we can monitor things like Docker or CrowdSec, or pretty much anything that is able to spit out a log or has access to a metrics API. This is going to give you all of the fundamentals you need to then happily go off and monitor and alert on basically anything you want to.

We've touched a little bit on logging and monitoring already in this series, with things like our Traefik logs, which we set up so that we could use things like Authelia or CrowdSec, both of which require access to logs. The monitoring part, which we're covering today, kind of sits in the middle of the sandwich: at the bottom you've first got to enable your logging, then you want to do some monitoring of those logs, because there's little point in collecting logs if you're not going to monitor them. The top part, the next bit of bread in the sandwich, is your alerting. A bit like in my world, where we have security operations centers, you would parse your logs, you would monitor them, and then you would set up alerts based on thresholds. What you're going to learn in the next couple of videos is applicable to pretty much any IT tech job, and also other walks of life outside of technology. In a call center, for example, you'd want to monitor and record your call efficiency, how long it's taken to handle a call, all those sorts of things. In an IT operations world you'd be concerned with uptime, availability and the like, and importantly you want to be alerted when something goes outside of your threshold. Let's say your servers are using more than 80% CPU: that's either indicative of something going wrong, or maybe you need to expand and upscale your environment.
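As a taste of where that 80% CPU example ends up, here's roughly what a threshold alert looks like as a Prometheus alerting rule. This is a hedged sketch, not something we deploy in this video: it assumes node_exporter-style host CPU metrics, which this particular stack doesn't ship, so treat the metric name and thresholds as placeholders.

```yaml
# Hypothetical Prometheus alerting rule file (e.g. alert-rules.yml),
# assuming node_exporter-style metrics are being scraped.
groups:
  - name: homelab-alerts
    rules:
      - alert: HighCpuUsage
        # Fire when average CPU usage on a host stays above 80% for 10 minutes
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "CPU usage above 80% on {{ $labels.instance }}"
```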
As always in these videos I'm going to show you all of the configs and walk through them step by step. I'll explain what each of the tools does and why we've configured it in such a way, and I'll show you how to deploy it within Docker. Handily, all of these scripts are going to be available on my GitHub, and it's a single multi-container Docker Compose file, so it's really simple to deploy; we just need to do a few config tweaks and we'll be up and running. So let's do a feature overview of some of the tools, and then we'll dive into configuration.

The first tool, and probably the one that you're already familiar with even if you're not familiar with logging and monitoring in general, is Grafana. Grafana has exploded over the years; it's an open-source logging and monitoring tool, and it's pretty much the glue that's going to bind everything together. Grafana by itself isn't going to be doing the recording or the collection of our metrics. What it's going to be doing is bringing that all together, so that you can run queries against all of the data sets or metric scrapers, and it then gives you the ability to present this in really handy dashboards that are highly customizable. There are tons of native panels (those are the things that you put into a dashboard), things like graphs, pie charts, histograms and so on, and it's fully extensible, so you can even write your own if you wish. Now, because this is homelab, we're going to be setting this up on our own infrastructure, but do be aware that there is a Grafana Cloud, which is free. There are some constraints on that, usually around the amount of traffic and data retention, but it might be something that you want to consider, and you can see the link here.

Touching on a couple of points I just made: if we head over to something like the dashboards page, you get an example of what this looks like. Here we can see a number of dashboards that are logging and monitoring various things on a system. We can have a look at a demo here, for example; this one is looking at Jira tickets, something we all spend too much time monitoring, or it could be something like a MongoDB. And as a quick spoiler, here's the dashboard that we'll be setting up later in the video series. This, as you can tell, is monitoring my Docker containers, so you can see CPU usage, memory usage, disk usage and network usage at the per-container level. This is useful for keeping tabs on containers and understanding how your resources are being utilized; maybe you want to put some constraints on them, or revisit them if there are any problems, i.e. you're seeing excessive disk usage when you know that container shouldn't be doing anything.

Moving on from Grafana, which is our primary dashboard for graphing all of our data, we get onto what I call the back-end services, which do the metrics collection, and the first one of those is going to be InfluxDB. Influx database: the clue's in the title, it's all about time. This is probably the most performant time-series metric database, it's great for hammering loads of time-based data into a database really quickly, and it's used at scale by some of the biggest organizations you can think of. Handily, InfluxDB has its own web GUI, so you could actually use it to do some graphing, but we're going to leave that to Grafana. The Docker dashboard I showed you is actually plugged into this InfluxDB that you can see on screen, and you can see lots of handy things, like the Docker memory total for my Ubuntu machine. There's a value here and you could graph it, and that's exactly what Grafana is doing: I have it connected to this InfluxDB, which holds the metrics scraped from my Docker host, and it presents them in the nice Grafana dashboard. In this video we will be deploying InfluxDB, and in the subsequent video I'm going to show you how to configure that scraping so that we can pull it into Grafana.

Moving on to the next product in our stack: Telegraf. The eagle-eyed among you will notice that this is actually owned by InfluxData, so you'll be asking the question: what's the difference between InfluxDB and Telegraf? InfluxDB, the clue's in the title, is the actual database, and some applications can write directly to it. However, there are a number of applications that don't have that capability, and so Telegraf is a separate component that will collect those metrics for you and then import them into InfluxDB, or any other data store that you want to use; it doesn't have to be InfluxDB, it could be Prometheus, for example. I use Telegraf in my homelab to get metrics from Sophos XG into Grafana. Sophos XG doesn't have a native way of pushing that data into a database which Grafana can use; instead, it opens up a metrics API. Telegraf can connect to that API and store the data in Influx, and I can then use the data in Influx to create a graph on a dashboard. So I hope that spells it out for you: there are applications that can natively write straight into a database, but for all those that can't, or anything custom you're doing in your homelab, you might need something like Telegraf to do it for you.
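To make that collector role concrete, here's a minimal sketch of a Telegraf pipeline that polls a device's metrics API and forwards the results to InfluxDB. The endpoint URL, organization, bucket and token are all placeholders, not values from this stack; both plugins (`inputs.prometheus` and `outputs.influxdb_v2`) are standard Telegraf plugins.

```toml
# Hypothetical telegraf.conf fragment: scrape a Prometheus-format
# metrics API (e.g. one exposed by a firewall) on a schedule...
[[inputs.prometheus]]
  urls = ["http://192.168.1.1:9100/metrics"]   # assumed metrics endpoint

# ...and deliver the results to the InfluxDB container in this stack.
[[outputs.influxdb_v2]]
  urls = ["http://influxdb:8086"]   # compose service name and port
  token = "$INFLUX_TOKEN"           # generated later in the InfluxDB GUI
  organization = "homelab"          # placeholder
  bucket = "telegraf"               # placeholder
```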
One other application that I've included in the stack is Graphite. We're not actually going to be using this in the second video, but do note that it exists; it's one of the big players, and it might have some features over the others we're going to deploy that make it more attuned to what you're trying to accomplish. As you can see, Graphite is used by some of the biggest players in the game: Reddit, Salesforce, Booking.com, Etsy, GitHub, you get the idea. Its USP is storing numeric time-series data, a bit like InfluxDB, and rendering graphs of that data on demand. I'll provide a config to set this up; we're not actually going to use it, but it might be useful for you to have, and you can simply comment it out or delete it if you don't want to deploy it.

The final application that we're actually going to deploy and make use of is Prometheus, and this is a big player in the area of monitoring and alerting. That's where it excels: as it says, it's the leading open-source provider for alerting, and that's pretty much what we're going to be using it for. You can store data in here, but typically it's more for queries and setting up alerts. So we can use Telegraf to scrape our metrics, we can store them in a database, we can deploy Grafana to query and dashboard that data, and then we can set up alerts using something like Prometheus.

A couple of other extras that I'm throwing into the Compose stack are Promtail and Loki. These are part of the Grafana suite, and they're typically about scraping and storing log files. A bit like applications that might not have a mechanism to export directly to a database, Promtail (the clue being "tail"; we've tailed logs before) can take those log files, parse them, and ship them somewhere we can do Grafana dashboarding off some of those queries. I appreciate a lot of this is getting quite complicated, but it's just another way to pick up a log file, store it somewhere, do some queries on it and then create a dashboard out of it.

So congratulations if you've made it through my explanation, and hopefully your head hasn't exploded. There'll be a lot of new concepts in there if you're new to logging and monitoring, but hopefully I've broken it down into some simple steps, and even better, I've put this into a single Docker Compose file, which is available on my GitHub. Let's have a look through each of those applications now and see what's going on in terms of the config files.

Loki sits in front of Promtail, in a way. As I mentioned, we've got Promtail here, and that's for collecting the logs from our services; Loki, on the other hand, is for querying those logs and being able to pump that data out so that something like Grafana can use it. Loki is pretty straightforward: it needs a simple volume mount and a port to access it, and that's it. You'll want to make sure that any customization of the config file is done within that volume mount, because it's subsequently picked up here, where it stipulates where the config file is mounted.
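Here's a minimal sketch of what the Loki and Promtail services might look like in a compose file (Promtail is discussed next). Image tags, config paths and the network name are assumptions; check the compose file on my GitHub for the real thing.

```yaml
services:
  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"                 # Loki's API port, queried by Grafana
    volumes:
      - ./loki:/etc/loki            # keep config edits inside the bind mount
    command: -config.file=/etc/loki/loki-config.yaml
    networks:
      - monitoring

  promtail:
    image: grafana/promtail:latest
    volumes:
      - /var/log:/var/log:ro        # tail the host's system logs
      - ./promtail:/etc/promtail
    command: -config.file=/etc/promtail/promtail-config.yaml
    networks:
      - monitoring

networks:
  monitoring:                       # assumed shared network, e.g. "grafana-monitoring"
```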
Moving on to Promtail: the version in here will deploy Promtail, and because of the volume mount it will actually pick up the system logs on your Docker host, or whatever machine you're running it on. That means you can get all of the system logs that are being generated, things like access logins, all those sorts of things; you'd then be able to parse those with Loki and create a dashboard out of them using Grafana.

Next we get on to Grafana itself. This is the front end, and where we're going to be doing most of our monitoring, and again it's pretty straightforward: we're simply going to pull the container and put it on some internal and external networks. I've set up the grafana-monitoring network, which all of these services are connected to, so they can talk to each other using friendly DNS names, and they're not accessible outside of our network to things that we don't want to have access to them. The ones where I've added a port are obviously available externally, because we can put those through our reverse proxy and get the benefits of SSL encryption and so on; I've included the Traefik labels here.

Next up is InfluxDB, and again this is pretty straightforward: it just requires a couple of ports to be open and a volume mount to store all of the data within your Influx database.

Next is Telegraf, and there's really only one thing here we need to focus on, and that is the user. In older versions of Telegraf you could simply run it as your local or root user, but that's changed in an effort to improve security, so you need to find out the group ID of your host's docker group and specify it here. I've added some additional volume mounts, and you can remove those if they're not applicable. The one that you're probably most interested in is the Docker socket; we've used that before, namely in Portainer, which is able to access the whole Docker runtime environment and actually spin up and edit containers, dive into them and so on, because it has access to the Docker socket. Because we want to monitor the Docker containers on our host, we add the Docker socket here, and that enables Telegraf to go and scrape some of that data.

Next up is Graphite; that's pretty straightforward again. There are a number of ports, depending on what services you want to run on Graphite or what you want it to be used for, and there are a few volume mounts, but chiefly they're for your configuration files and your data storage.

Lastly, we have Prometheus, and again this is pretty straightforward. Prometheus is listening on port 9090, and we can access that as a web GUI to run queries and to double-check what it's actually scraping, but we simply need to mount a configuration file into the bind mount and then load that up. This gives us the benefit of being able to make configuration changes within a nice friendly text editor and then have them recognized within Prometheus itself; for example, we're going to be editing the Prometheus config file so that we can query our CrowdSec data.

Having gone through all of the services we're going to run, let's get on to deployment. Heading over to my Docker VM, I've copied that Compose file over. Obviously you can redact the services that you're not interested in; as I mentioned, in the second video we're only going to be focusing on Grafana, Influx, Prometheus and Telegraf, so you can get rid of the others if you don't want them, but I put them there primarily to make you aware that they exist, and they might be useful for the specific data collection that you have in mind.
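Pulling the sections above together, here's a skeleton of what those four core service definitions might look like. This is a hedged sketch, not the exact file from my GitHub: image tags, volume paths and the group ID are assumptions, and Graphite, Loki and Promtail are omitted for brevity.

```yaml
services:
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"                 # Grafana's default web port
    volumes:
      - grafana-data:/var/lib/grafana
    networks:
      - monitoring
    # Traefik labels for the reverse proxy would go here

  influxdb:
    image: influxdb:latest
    ports:
      - "8086:8086"                 # InfluxDB API and web GUI
    volumes:
      - influxdb-data:/var/lib/influxdb2
    networks:
      - monitoring

  telegraf:
    image: telegraf:latest
    # Replace 998 with your host's docker group ID: getent group docker
    user: "telegraf:998"
    volumes:
      - ./telegraf/telegraf.conf:/etc/telegraf/telegraf.conf:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro   # lets Telegraf scrape container stats
    networks:
      - monitoring

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"                 # Prometheus web GUI and API
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus-data:/prometheus
    networks:
      - monitoring

volumes:
  grafana-data:
  influxdb-data:
  prometheus-data:

networks:
  monitoring:
```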
Before we go ahead and deploy this, there are a couple of configuration files that we need to create first, otherwise it's either going to generate default configuration files that won't be of use, it's not going to collect some of the metrics that we want it to, or it may even just fail, saying that it couldn't find a configuration file. So let's go and rectify that now.

The key file here that we're going to want to change or create is the telegraf.conf file. I've put this in my docker/grafana-monitoring/telegraf folder, because that's where I specified it in the bind mount. If we open up that file, I've added the basics so that you can start querying the Docker socket; this is the first step towards getting our Docker metrics into the Grafana dashboard. What we've set here is that the endpoint is the Docker socket (you'll remember that we mounted it as a volume in our Compose file), and we've said that we want basically all containers: we're going to include them all and not exclude any specific ones. We've said that we want stats for each device, and that we want to include things like the CPU and a whole host of other things which you can see in here, things like the block I/O (the input/output) and networking stats. At the bottom you'll notice that I've actually set the output to influxdb_v2. This is because, as I mentioned, Telegraf scrapes data and sends it on; it's kind of like the postman. There's a factory that makes something, there's a postman that delivers it, and you receive it. In this example we've got Docker, from which Telegraf is scraping the metrics; it then sends them on, and this is where we specify the outputs. Just be aware that there are lots of different outputs for lots of popular data stores; this one happens to be InfluxDB because it's pretty much the best for what we want, this time-series data. You'll notice that I've got some URLs and some tokens here. You can't set this up at the moment, because we haven't deployed InfluxDB; what we're going to need to do is deploy without the output section and come back and add it in later once Influx is up and running, because you generate this token within the Influx GUI. So redact this section (I'll have it cleaned on my GitHub) and you should be good to get this up and running.

We'll also need to create our prometheus.yaml file. This is the Prometheus config file, and if I open it up, it looks like this. All we're doing here is setting the global parameters. If you didn't set this config file it would auto-generate one, and that's fine, but we need to be able to edit it; plus, when it comes to integrating with CrowdSec, we'll add a second section here, which we'll do in the next video. So just for this video, feel free to redact that second section and simply run it with the parts above, and feel free to tweak those; they cover things like how often it's going to scrape, which means how often it goes and fetches new values to put into your database.
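For reference, here's a minimal sketch of the telegraf.conf described above, using Telegraf's standard `inputs.docker` and `outputs.influxdb_v2` plugins. The organization, bucket and token are placeholders; as mentioned, leave the output section out until InfluxDB is running and you've generated a token.

```toml
# Scrape container stats from the Docker socket we bind-mounted in compose.
[[inputs.docker]]
  endpoint = "unix:///var/run/docker.sock"
  container_name_include = []   # empty = include all containers
  container_name_exclude = []   # don't exclude any
  perdevice = true              # per-device network/block-IO stats
  total = true                  # also report totals

# Ship the metrics to InfluxDB 2.x. Comment this out until InfluxDB
# is deployed and you've generated a token in its web GUI.
[[outputs.influxdb_v2]]
  urls = ["http://influxdb:8086"]
  token = "$INFLUX_TOKEN"       # generated in the InfluxDB GUI
  organization = "homelab"      # placeholder
  bucket = "telegraf"           # placeholder
```

And a minimal prometheus.yml along the lines described, with just the global parameters and a self-scrape job; the exact intervals are assumptions you can tweak.

```yaml
global:
  scrape_interval: 15s          # how often Prometheus fetches new values
  evaluation_interval: 15s      # how often alerting/recording rules are evaluated

scrape_configs:
  - job_name: prometheus        # Prometheus scraping its own metrics
    static_configs:
      - targets: ["localhost:9090"]
  # The CrowdSec scrape job gets added here in the next video.
```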
With all of those config files in place, let's close these down, head to our command line, navigate to where our Docker Compose file is, and simply run sudo docker compose up -d. Once that's run, the images will download, and fingers crossed you should be up and running. This will take a little while because there are quite a few containers, but once it's done you'll end up with something like this that says "Running" and "Started".

Let's go and validate that it's all working, in my case within Portainer; obviously you can do this with sudo docker logs and then the container name if you're not running Portainer. Hopping into Portainer, we can see that all of the containers are up and running, and if we want to look at the published ports we can see those over here, so we should be able to hit the VM IP and then the port to access each service. Let's have a quick look at some of the logs just to make sure things are as expected; let's take Grafana. Heading into the Grafana logs, hopefully we can see something like this: it's up and running, and it's listening on port 3000, which is the default port that Grafana runs on. So if you're not using a reverse proxy, you'll be able to go to your IP on port 3000 and you should be presented with the login page. If you're running this behind a reverse proxy like Traefik (I've included the labels in the Compose file), you should now be able to go to the URL that you've put in your DNS resolver and access this with a friendly URL over HTTPS.

Just checking that some of the other services are up: one of the key ones will be Telegraf, and we can see that Telegraf is running with no errors in the logs. If we head over to Prometheus, we'll also see that mine is up and running with no errors in the logs, and we can see that it's listening on port 9090, so again, if we go to our Docker VM IP on port 9090, we'll be able to access the web GUI for Prometheus.

Let's go ahead and test a couple of these containers to make sure they're working, and I'll show you quickly how to connect these services together within Grafana, so that in the next video we can start to scrape and parse some of that data and build out our dashboards. Here you can see that I've got InfluxDB up and running, reachable at the VM IP on port 8086, which is what we exposed within the Docker Compose file. If we log in with our username and password (you'll need to create these the first time you run it), we get into the dashboard and can have a look around. Once logged in, we should already start to see some of the Docker metrics, because, if you remember, in the Compose file we mounted the Docker socket and told Telegraf to start collecting this data. If we click on something like the Docker container CPU, we can see the container IDs. These aren't very user-friendly (the containers do have names assigned; these are just Docker's internal references), but if we click one and expand it all the way (unfortunately it's hidden behind my face), it tells me the name is the grafana-monitoring container, so this ID here is basically Grafana. You can go through and query all of these just to check that you get some data, though obviously the best way to do this is to use Grafana to pull these stats out and give you nice friendly names and nice friendly stats.

Speaking of Grafana, let's head over to that now. As with most of my services, I've run it through my reverse proxy, so in my case it's grafana.jimsgarage.co.uk, and if I now hit that, I'm taken into my Grafana. You might worry that I've got a "Not secure" warning in the top left; that's just because I spun up a new instance of Traefik (I have a separate VM for all of these machines) and it's using the default certificate at the moment, but in my previous videos I've shown you how to get a valid certificate, and I'll fix that for the next video.

Once you've created your user within Grafana, you'll be presented with the dashboard. I've already shown you a few dashboards that I've got configured, and we'll be going over those in the next video, but if you want to connect some things together, you simply need to click the menu in the top left and go to Connections, which is here. Once you click Connections, you'll want to search for the things that we've set up within the Docker Compose stack. For instance, Influx would obviously be one of them, so if we search for Influx, we click on it, add a new data source, and then fill in all of the details. In this case the URL would be http://influxdb:8086, which is what we set in the Docker Compose file, and we need to add in some API keys, which we'll do in the next video. That's very quickly how you connect these services, and you'll do the same thing for things like Prometheus and Telegraf.
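If you'd rather not click through the UI each time, Grafana can also load data sources from a provisioning file at start-up. A minimal sketch, assuming the compose service name influxdb and placeholder organization, bucket and token values (the token is the API key you generate in the InfluxDB GUI):

```yaml
# e.g. ./grafana/provisioning/datasources/influxdb.yaml
apiVersion: 1
datasources:
  - name: InfluxDB
    type: influxdb
    access: proxy
    url: http://influxdb:8086    # service name and port from the compose file
    jsonData:
      version: Flux
      organization: homelab      # placeholder
      defaultBucket: telegraf    # placeholder
    secureJsonData:
      token: $INFLUX_TOKEN       # the API key generated in the InfluxDB GUI
```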
With Prometheus there isn't any username and password; we simply log in, and as you can see I have it running on port 9090, and we can start to query some stats here already, provided we've configured the scrape targets in the config file. For mine (and we'll be doing this in the next video), if we go to Status and then Targets, you'll see that I've got my CrowdSec specified here, so it's going away and pulling in the data from my CrowdSec so that I can create a dashboard.

In the next video I'm going to show you how to plug all of these together and how to set them up to record that data, and ultimately you're going to end up with something that looks a little bit like this. This is monitoring my CrowdSec, and if you wanted to have a look at, say, something like Docker, I also have a dashboard for that as well: if you click on your selected host (you want to select "homelab" in this case), you can see that I can pick any of my containers and get some nice stats about their memory usage, disk usage and so on. Now, this is quite bare-bones, and you'll have seen some amazing-looking homelab dashboards out there; I'm giving you the tools and showing you how to do this so that you can go away and build something similar. Also (and we'll cover this in the next video) there's a whole host of dashboards that are publicly available on the Grafana website, provided that you feed them the data that they expect.

Thanks for sticking with me on this video. I know there's been a lot to take in, but hopefully by the end of it you're now able to go and deploy all of the tools that you need to start logging and monitoring. As I said, in the next video I'm going to show you how to actually configure these tools so that we can scrape many of the things that we're running in our homelab; that will include monitoring our Docker environment and things like CrowdSec, and I'll even show you how to plug it into things like your firewall so that you can monitor all of that traffic. It's really about giving you the building blocks, and I'd love to see what you have in mind, so drop me a note in the comments below. Please like and subscribe if you found this useful, and I'll see you in the next video. Take care, everybody.
Info
Channel: Jim's Garage
Views: 9,045
Keywords: grafana, influxdb, homelab monitoring, logging and monitoring, grafana dashboard, prometheus, telegraf, loki, promtail, docker, proxmox, crowdsec, linux, system monitoring, how to monitor computer, grafana tutorial, grafana tutorial for beginners, grafana loki, prometheus grafana, influxdb tutorial, alerting, prometheus alertmanager, monitoring and alerting, homelab, self hosted, server, technology
Id: LShvy9l3tzs
Length: 25min 28sec (1528 seconds)
Published: Thu Aug 31 2023