How to Monitor Jenkins With Grafana and Prometheus

Captions
In this video we're going to talk about how to monitor Jenkins with Grafana and Prometheus.

You've been asked to set up monitoring and alerting for your Jenkins infrastructure, and of course the first thing you think is, "there has to be a plugin for that." You would be both right and wrong. On the right side, you will need some metrics from your controller to monitor, but you don't want your Jenkins controller to be your monitoring system. In this video we're going to walk through one possible option: using Prometheus and Grafana to capture and visualize the metrics from your Jenkins infrastructure.

Here's today's starting point. There are three machines involved in today's demo. The first is our Jenkins controller, running Jenkins LTS 2.289.2, installed with the suggested plugins. The second is a Linux-based agent, but at this moment that agent is not connected to our controller. The final machine already has Grafana installed but not yet configured, and we're going to use Docker on that same machine to run Prometheus. In the description of this video is a link to a gist with all of the commands and documentation we use in this video.

As we get started, let's take a look at the Prometheus plugin we're going to install on our controller. It's named Prometheus Metrics, and its page goes through the basic details of what happens when the plugin is installed. One thing I want to call out is "collect disk usage": in order to collect disk statistics we need to install a separate plugin, the CloudBees Disk Usage Simple plugin. So when we go to our controller we're going to install two plugins.

Let's go over to the controller and install them. Log in, go to Manage Jenkins > Manage Plugins > Available, and search for "prometheus"; the second search is for "disk usage", and the one we want is CloudBees Disk Usage Simple. So that's two plugins, but when we click "Download now and install after restart", we also see Variant and Metrics, which are dependencies of the Prometheus Metrics plugin, and Extended Read Permission, which is a dependency of the CloudBees Disk Usage Simple plugin. Once they're all downloaded, let's go ahead and do a restart.
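As an aside, if you ever want to script that plugin installation instead of clicking through the UI, the Jenkins CLI can do it. This is a minimal sketch rather than what the video does: the USER:TOKEN placeholder stands in for a real username and API token, and the two plugin IDs are the short names from the plugin site.

# Grab the CLI jar from the controller, then install both plugins and restart.
# Replace USER:TOKEN with a real username and API token on your controller.
curl -O http://jenkins:8080/jnlpJars/jenkins-cli.jar
java -jar jenkins-cli.jar -s http://jenkins:8080/ -auth USER:TOKEN \
  install-plugin prometheus cloudbees-disk-usage-simple -restart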
Now that we're back, let's go to Manage Jenkins and look at the changes after installing the two plugins. The first is a Disk Usage section from the simple disk usage plugin. It will give us more information as time goes along, but since we don't have any jobs set up yet, only the basic directories are listed and there are no job folders to show sizes for.

Let's go back to Manage Jenkins, open Configure System, and review the Prometheus setup. Scroll down to the Prometheus section. The path is where the Prometheus server will scrape to get metrics, so in our case that's going to be jenkins:8080/prometheus. We'll leave the default namespace the way it is. By default, metrics are collected and exposed from the controller every two minutes, or every 120 seconds; for this demo I'm going to change that so we're collecting every 5 seconds. There's also a handful of options checked by default and a handful that are not; those are fine, and we're going to leave all of those defaults alone. So the only change we've made up to this point is the collection interval, taking it from 120 seconds down to 5.

I also want to call out one more thing: by default, "collect disk usage" is checked. If you had failed to install the CloudBees Disk Usage Simple plugin, you would see an error in your controller logs. So if you're watching your logs, or if you're not receiving disk usage data inside the Prometheus export we're about to look at, the reason is that the CloudBees Disk Usage Simple plugin was not installed. Let's click Save, since we did make that one change from 120 to 5.

Now let's go up to the browser URL and change it to /prometheus. Notice that I don't type a trailing slash, but once I hit Enter you'll see a trailing slash appear: the request is redirected to the trailing-slash URL. Keep that in mind for when we set up the Prometheus scraping in a few minutes. The output we see here is regenerated every five seconds, so if I refresh, some values move and some don't, because some data just doesn't change that often, but the page should be exposing the most recently collected data every five seconds.

Next up, let's set up the Prometheus scraping. We'll go over to the shell on the machine where we'll run Prometheus, and the first thing we need is a configuration file, which we'll call prometheus.yaml. I already have a configuration prepared here, and it's basically the defaults you would use for any normal Prometheus setup: the scrape interval is 15 seconds, the evaluation interval is 15 seconds, and the scrape timeout is 10 seconds. One thing to keep in mind is that your scrape timeout should be less than your scrape interval. If you're scraping every 15 seconds but you've set a timeout of 30 seconds, the timeout could never finish before the next scrape is due, so always keep the timeout below the interval.

Now take a look at the scrape_configs at the very end. The first one is fairly normal: we're running Prometheus locally on this machine, so we're scraping ourselves at localhost:9090. Then I've added one more scrape config for Jenkins. Notice that the metrics path is /prometheus/ with a trailing slash. If I had left the trailing slash off, Prometheus would handle the redirect by default, but to avoid that redirect entirely we'll keep the trailing slash so the response is always an HTTP 200. For the target, this is the controller's IP address; if I had DNS set up on this machine I could reference it by name, but to be explicit I'm using the IP address, plus port 8080, because that's the port Jenkins is listening on. Put together, that's the IP address of my Jenkins server, port 8080, path /prometheus, which is the same endpoint we had open in the browser a moment ago.
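Here's a sketch of what that configuration file looks like end to end, reconstructed from the narration. The controller's IP address below is a placeholder, so substitute your own; the gist linked in the description has the exact file used in the video.

# Create the Prometheus configuration described above.
cat > /home/vagrant/prometheus.yaml <<'EOF'
global:
  scrape_interval: 15s      # how often targets are scraped
  evaluation_interval: 15s  # how often rules are evaluated
  scrape_timeout: 10s       # keep this below scrape_interval

scrape_configs:
  # Prometheus scraping itself.
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']

  # The Jenkins controller. Keep the trailing slash on metrics_path;
  # without it Jenkins answers with a redirect to /prometheus/.
  - job_name: jenkins
    metrics_path: /prometheus/
    static_configs:
      - targets: ['192.168.1.10:8080']   # placeholder: your controller's IP
EOF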
Let's finish saving the file and start Prometheus. We're going to run Prometheus through Docker: we'll download a container image, turn it into a container, and it will run locally. Let's run it with our configuration; whoops, I want to run this in the background. You can see we're mounting a volume: /home/vagrant/prometheus.yaml, the file we just created, gets mapped onto the Prometheus configuration path under /etc/prometheus inside the container, and then the container just starts Prometheus. Since Docker couldn't find the image locally, it pulls it first; that takes a couple of seconds, and once it completes, Prometheus is running in the background.

But how do I know it's running in the background? Let's go back to the browser and load up a page; in my case the machine name is grafana, and Prometheus listens on port 9090. After a refresh we can see the Prometheus UI. To validate that our configuration is set up the way we expect, go to Status and then Configuration, and you'll see the configuration from our YAML file. You'll actually see all of the settings, not just the ones we added: we added the job name, the metrics path, and the static config, but you can also see defaults such as follow_redirects: true, which is why, if we had left off the trailing slash, Prometheus would have followed the redirect from /prometheus to /prometheus/.

Now that we've validated the configuration, let's make sure we're receiving data as expected. Click on Prometheus, then open the metrics explorer (the globe icon), which lists all of the metrics available to us. I already know one metric I want to check, the Jenkins executor count value, so I'll enter that and click Execute. From the table view we can see I have two executors, which at this point is true: if we go back and look at the controller, the only executors we have are the two that exist on the controller itself. So at this point Prometheus is set up and scraping our Jenkins Prometheus endpoint correctly.
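Pulled together, the Docker side of those steps looks roughly like this. It's a sketch rather than the exact command from the video (the gist in the description has that one); the container name and the mapping onto the image's default config path are assumptions.

# Run Prometheus in the background, mounting the config we just wrote onto
# the image's default config path.
docker run -d --name prometheus -p 9090:9090 \
  -v /home/vagrant/prometheus.yaml:/etc/prometheus/prometheus.yml \
  prom/prometheus

# Quick checks that it came up.
docker ps --filter name=prometheus
curl -s http://localhost:9090/-/ready   # should report that Prometheus is ready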
So now that we have our controller set up with the Prometheus plugin and the CloudBees Disk Usage Simple plugin, and we've confirmed the data being exported from the controller is showing up inside Prometheus, let's set up Grafana to use Prometheus as a data source. Let's go over to Grafana; the default for that is the grafana machine on port 3000. On the very first login the credentials are admin/admin, and you have to give it a new password; my password is just "password". We get the Welcome to Grafana screen, and I'm going to remove the top panel, just because that's what I do.

Now let's set up a data source. You do that under Configuration: click the configuration gear, choose Data sources, and click "Add data source". The first one we want is Prometheus, so click Select. Rather than leaving the name as Prometheus, let's call it Prometheus-Jenkins, in case we ever have multiple Prometheus data sources. For the URL, Prometheus is running on the same server as Grafana, so it's just localhost:9090. The access is going to be Server, because this is server-to-server communication. Finally, click "Save & test"; we can see the data source was updated and the data source is working. If you click "Save & test" and do not receive a green check mark, that data source is definitely not going to work for you. Looking at the data sources list again, we see Prometheus-Jenkins pointing at localhost:9090.

At this point our Prometheus data source is set up within Grafana, but if we go back to the top of the Grafana home page we just see starred dashboards and recently viewed dashboards. How do I create a dashboard? Fortunately, there are a few community Jenkins dashboards available through Grafana, and that's what we're going to look at today. These can be starting points for the dashboard you want to create; I'm not saying they're the be-all and end-all, but they're ones you can use as a basis for building out your own dashboards.

Let's see how to import a dashboard from grafana.com into our instance of Grafana. Go to the Grafana dashboards site at grafana.com/grafana/dashboards and scroll down; at the time of recording there's a filter on the left side, so type in "jenkins" and it filters the list down. The results first show up in alphabetical order, but I want to sort by downloads, and I'm just going to assume the most downloaded dashboard is the one I want to use. The top result is "Jenkins: Performance and Health Overview"; that's fine. The second one also has a very large number of downloads, and maybe you want to try that one out, but for today I'm testing the first one. Clicking into the dashboard, all I really need is its ID, which in this case is 9964, so I can copy the ID to the clipboard (and remember it, just in case: 9964).
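Before we import it, a quick aside: the data-source step we just clicked through can also be scripted against Grafana's HTTP API. This is a minimal sketch, assuming the admin password was changed to "password" as in the demo and that Grafana is reachable on localhost:3000; the video itself does everything through the UI.

# Create the Prometheus data source via Grafana's HTTP API.
# "access": "proxy" is the API's name for the "Server" access mode in the UI.
curl -s -u admin:password -H 'Content-Type: application/json' \
  -X POST http://localhost:3000/api/datasources \
  -d '{
        "name": "Prometheus-Jenkins",
        "type": "prometheus",
        "url": "http://localhost:9090",
        "access": "proxy",
        "basicAuth": false
      }'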
So let's go back over to our Grafana, click the plus, and go down to Import, because we're importing from grafana.com. I paste in 9964, and that's all I have to do, because Grafana knows what that ID is. Click Load, and I can confirm that it's the right author and the right name; it's going into our General folder, which is fine; and I need to select the Prometheus data source for this dashboard, because the way this dashboard is written, it expects us to tell it which data source to use. We'll select Prometheus-Jenkins, the data source we set up a few moments ago, and click Import.

Now the dashboard is fully loaded and running. I'll size the window down a little so we get all the panels in view. Taking a quick overview: CPU usage is a very low percentage, there are two executors free (which is good), there are no jobs defined, and the total jobs, successful jobs, aborted, and unstable panels show no data because we haven't run any jobs yet.

Before we leave this dashboard, I want to set it to auto-refresh every five seconds, so instead of having to refresh it by hand, it refreshes itself. The thing to keep in mind as we look at Grafana and Prometheus: we set the export from the Prometheus plugin to every five seconds, down from 120 seconds, so at best we're going to get new data roughly every five seconds, and the auto-refresh of this dashboard is roughly every five seconds. Don't think of this as a second-by-second dashboard; you'll see data change every 5, 10, or 15 seconds, so don't get frustrated if it isn't updating instantly. You might want to tune some of these values, but don't tune your controller too aggressively: exporting every second would be too much, while the default of every two minutes might be a little too long.

The first thing we're going to do is connect our agent to our controller, and what we expect is that the free executor count will move from two to three, because we're adding an agent with a single executor. Back on the controller, go to New Node and call it agent1. Because the number of executors on this agent is one, we're expecting the two to turn into a three. Give it a label, set it to launch over SSH against the agent machine, select the credential for it, and that's all we need to do for adding this agent. Click Save and give it a few seconds to come all the way up. Now the agent is connected, and we have three executors available to our controller.

Let's go back over to our dashboard and see what's happening. Again, remember this is not going to be real time: we have to wait for the data to be exported out of the controller, scraped by Prometheus, and then made available within Grafana. It's not real time; that's the thing to remember here.
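While we wait, one way to confirm that the new executor has made it into Prometheus is to hit its query API from the shell. This is a sketch only: the exact metric name depends on the plugin version and the namespace, so grep the controller's /prometheus/ output if the name below doesn't match.

# Ask Prometheus for the most recently scraped Jenkins executor count.
# Assumption: with the "default" namespace the metric is exposed as
# default_jenkins_executor_count_value; confirm the exact name in the
# controller's /prometheus/ output.
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=default_jenkins_executor_count_value'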
The data should show up within 5, 10, 15, or 20 seconds, depending on where you land in the cycle of the export, the scrape, and the view, and now you can see that we have three executors if you look at the bottom left-hand side: we've gone from two online executors to three.

Finally, I want to create a handful of jobs so you can see the difference as it shows up in our dashboard. Back on the controller, let's create three folders first: a Team A folder, a Team B folder, and a Team C folder. If you're going to model this yourself, do whatever you want; this is just an example. With Team A, B, and C in place, go into Team A and create a new item called "test"; it's a Pipeline. Let me grab my example pipeline; yep, that's the one I want. It's a really simple pipeline: it echoes "hello world team a", then sleeps for some amount of time (three seconds for this job), and it's set to run once a minute. Click Save, and then I need to run it once so that the cron trigger gets registered correctly. We can see the job is running, and if we look at the configuration we can confirm that the cron is there and correct. That's Team A; let's go through and do the other two real quick. As we add this last one, notice that the sleep time is 24 seconds: the Team A test job was 3 seconds, Team B's was 10 seconds, and this one is 24 seconds. You'll see why I created them this way in just a moment. Click Save and do a Build Now.

So now all of our jobs are set up and scheduled to run; in fact, this last one is going to sleep for 24 seconds, and if we take a look we can see that Team C is using one of the executors on the controller. Let's go back over to our Grafana dashboard and see whether anything is starting to show up. We're seeing CPU usage go up, from whatever low value it was to a little bit higher, and our executors are still showing as free at the moment. But think about how this works: it's not real time, it depends on when the data is exported from the controller and when it's scraped by Prometheus, and Prometheus is only scraping every 15 seconds while we are, in theory, creating new data every 5 seconds, so it takes time for data to show up.

Now, to force the free executor count down from three to zero, here's what I want to do. We can already see some of the job queue numbers going up, so data is starting to arrive. Let's go back into Team C, the job that sleeps for 24 seconds, and click Build Now over and over, basically creating a build storm of this 24-second sleep job. I'm just clicking away; you can probably hear the clicking as I talk, and I'm still clicking. What we end up with is roughly 72 or 81 builds, whatever the number is, queued up and ready to go. If we go back to the root of our controller, we see a very large build queue, and remember, each of those builds sleeps for 24 seconds.
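For reference, the test pipelines driving all of this are tiny. Here is a sketch of roughly what the Team A job looks like, reconstructed from the narration: the stage name is an assumption, and Team B and Team C only differ in the sleep duration (10 and 24 seconds).

pipeline {
    agent any
    triggers {
        // Run once a minute; as noted above, the job has to run once
        // after saving so the cron trigger gets registered.
        cron('* * * * *')
    }
    stages {
        stage('test') {
            steps {
                echo 'hello world team a'
                sleep 3   // Team B sleeps 10 seconds, Team C sleeps 24
            }
        }
    }
}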
So if you think about it, our data is exported every five seconds, the scrape comes through every 15 seconds, and depending on when the data actually gets visualized in Grafana, that's when we'll see it; but with this many builds queued, I fully expect the dashboard to start reflecting it. We can see the executor count now shows two free and one in use, and based on how I've queued up all of those jobs and on the scrape cycles, I expect the free count to go to zero within the next two or three passes and the in-use count to reach three. Let's sit here and watch. And now you can see all of our executors are in use and we have no executors free; the queue processing speed and the queue rate both read 0.222, the JVM free memory is going down, the CPU usage came back down but will probably go back up, and the number of successful jobs is 17. As the controller works through the queue, little by little, because each build is basically start and wait, start and wait, the executors stay fully loaded, and our Grafana dashboard reflects that data as it receives it.

So why would you want to use Prometheus and Grafana to monitor Jenkins? As you've already seen, there is a plugin to help us with Prometheus, and that plugin can also leverage the data from the CloudBees Disk Usage Simple plugin to help track disk space. Secondly, Prometheus became part of the CNCF in 2016 as the second hosted project, right behind Kubernetes, and it's now part of the graduated set of projects, which means it's probably going to be around and supported for quite a long time. And finally, although you can visualize metrics in Prometheus itself, Grafana is purpose-built to visualize metrics not only from Prometheus but from many other data sources; add in the fact that you can leverage dashboards created by others, and it can save you time and effort in getting your monitoring environment initially set up and running.

If you have any questions or comments, you can reach out to us on Twitter at @CloudBeesDevs. If this video was helpful, give us a thumbs up, and if you haven't subscribed to CloudBees TV yet, why not take a moment, click that subscribe button, and ring the bell so you'll be notified any time there's new content available on CloudBees TV. Thanks for watching, and we will see you in the next video.
Info
Channel: CloudBeesTV
Views: 2,651
Keywords: darin pope, jenkins, jenkins tutorial, jenkins agents, jenkins pipeline, grafana, prometheus, grafana dashboard, cloudbees disk usage simple, prometheus monitoring, grafana tutorial for beginners, jenkins monitoring, jenkins prometheus plugin, grafana tutorial, grafana prometheus, prometheus monitoring tool, prometheus monitoring demo, grafana dashboard creation, devops training, prometheus monitoring with grafana, jenkins prometheus grafana dashboard
Id: 3H9eNIf9KZs
Length: 25min 43sec (1543 seconds)
Published: Tue Jul 27 2021