Grafana Alerts with Prometheus and Node Exporter Metrics - Introduction

Captions
Hey, what's up everybody, my name is Moss, and in this video I'm going to show you how to create alerts in Grafana. I'm going to walk you through the creation of a very simple alert that monitors disk usage, and then that alert will be sent to a Slack channel. To follow along, you will need to be an administrator of a Grafana instance, and you will also need permission to create Slack apps inside a Slack workspace. Let me show you what the alert is going to look like in Slack when we're done, then we'll create a Slack app, and then we'll create the alert inside Grafana.

Okay, so I'm inside my Slack workspace, and I created a dedicated channel for alerts, called alerts. Inside the alerts channel you'll see there is a message from Grafana, and it's alerting me on the disk usage for two nodes. I have a Raspberry Pi cluster set up, and apparently two nodes have reached my alert threshold, so Grafana sent this alert. It gives me some information on which nodes hit that threshold, and then it gives me the actual value, which is the percentage of disk used. I set the threshold pretty low, at 40 percent, so when disk usage is above 40 percent the alert is triggered and a message is sent to this Slack channel. Both of these devices have disk usage over 40 percent.

Okay, so now that we know what our alert message will look like, we can go ahead and create a Slack app. This Slack app is going to give us a webhook URL that Grafana can use to publish messages to the alerts channel. I'm going to navigate to the workspace, then workspace settings and administration, and under administration I'm going to select Manage apps. Then here I'm going to select Build, and we will create a new app. I'm going to select Create New App, from scratch, and we'll give it a name. I'm going to zoom in here a little bit. We'll call it Grafana Alerts, and it's going to be in the Tech with Moss workspace, and I'll
select Create App. Okay, so once we've created the app, we have to provide some basic information. The only thing we have to concern ourselves with here is incoming webhooks. We want to enable incoming webhooks, so I'm going to select that, and from here I'm going to activate incoming webhooks. If I scroll down, it gives me the webhook URLs for my workspace. No webhooks have been created yet, so we're going to create a new one. I'll select Add New Webhook to Workspace, and then we select the channel that this webhook is going to have permission to publish messages to.

Okay, so now the webhook has been created, and this is the webhook URL that we need to copy into our Grafana instance so that Grafana can publish messages to our alerts channel. I'm just going to click Copy, and then I'm going to navigate to my Grafana instance.

In Grafana, I'm going to open the Alerting tab, navigate to the Notification channels tab, and add a notification channel. We'll call this Slack alerts channel. The type here is not going to be Email; we're going to select Slack. Then, under Optional Slack settings, the only thing we really need is the webhook URL. There are other options we can add here, like an icon URL, or we can mention people or groups in Slack, or an entire channel. So there are a lot of useful things we can do in the optional settings, but just to get our alerts working and send an alert, all we need is the webhook URL. We can test it out: I'll select Test, and it says the test notification was sent. Let's check our alerts channel, and we can see that an alerting message, a test notification, was sent to our alerts channel just now. So it looks like it's working correctly, and we've set up the new notification channel for Slack.

Okay, so the first step was creating the
notification channel, and now that we've done that we can create the actual alert. The alert is based on dashboard data, so I'll navigate home and open up this dashboard called Raspberry Pi's Node Exporter Metrics. This dashboard is monitoring my Raspberry Pi cluster using Node Exporter. I have Node Exporter set up on each of the Raspberry Pis, and Prometheus is scraping the Node Exporter endpoint. I've set up a few sections for different categories of metrics. I have a memory section where I'm monitoring things like memory usage, then CPU utilization, and also a section for disk monitoring. Under the disk section I only have a single panel, and it's monitoring the percentage of used disk space: of the overall available disk space, what percentage is being used right now. That's what this panel is telling me, and this is the panel we're going to use for the alert.

What I'm going to do is edit this panel. Inside the panel we can see the query for disk usage, and to create an alert we have the Alert tab. But when I go to the Alert tab, you'll notice I get what is basically an error message saying template variables are not supported in alert queries. What that means is that I have created template variables for this dashboard, host and instance, so the dashboard updates based on what I've selected from these drop-downs. Because I'm using those, I can't actually create an alert inside this dashboard.

There's one way to get around this. In the Query tab I can create a second query that essentially duplicates this query but doesn't use the template variables. I can disable that second query and still create an alert based on it. That's one workaround, but what I'm going to do is create a new dashboard dedicated to alerts. So we won't use this dashboard at all, but we
will use this query. I'm going to copy this query to my clipboard and exit out, and then we're going to create a new dashboard. From here I'm going to select Add an empty panel, and in the query field I'm going to paste in that query. It doesn't show me any data, because in this dashboard I haven't created the instance variable. So we're going to remove that variable reference from both of the metrics in this query. Okay, so now we get the disk usage for all of the Raspberry Pis. There are four Raspberry Pis, so we get disk usage for each one.

The legend is a little messy, so let's clean that up. I'll first reference instance and then the device, which should make the legend look a little cleaner. On top of that, we want to modify the units. Right now it's just showing integers on the y-axis, so from the Standard options drop-down, under Units, I'm going to select Percent (0-100). Now we get the percentage of used disk. I'm also going to update the panel title; we'll call it RPi Disk Usage.

I'd also like to quickly walk through this query so you understand what it's doing. The first metric here gets the Raspberry Pi's available bytes on the file system. We're filtering the data that's returned by device: we're saying the device should not be equal to mmcblk0p6, and the device should also not be equal to tmpfs, so this returns just the root device. Then we're dividing that available-bytes value by the overall size of the file system, and we're passing the same device filter to that metric as well.

Okay, so now that we've gone over the query, let's navigate to the Alert tab and create the alert. I'm going to select Create Alert, and for the name I'll just
capitalize Alert here. Immediately after the name we have the evaluation period for the alert. I'm going to modify this so the alert evaluates every 10 seconds for one minute. That evaluation period is the period in which the alert condition is pending. Let's say the threshold is 40 percent used disk space and that condition is true, so a Raspberry Pi's disk usage is above the threshold. The alert is going to evaluate that condition every 10 seconds for one minute, and if the condition remains true after that evaluation period, it goes from a pending state to an alerting state.

After defining the evaluation period, we can define one or more conditions. I'm going to define just a single condition: when, and then I select this operator, the max value of query A, from five minutes ago until now, is above 40. So I'm setting the threshold at 40 percent disk usage, and above that this alert will be triggered and send an alert to the Slack channel. If disk usage on one or more Raspberry Pis is above 40 percent, it is going to trigger this alert.

Under the conditions we can also set what happens when there is no data or there's an error. If there's no data or all values are null, we can set the state of the alert to No Data, or we can say it should be in an Alerting state. If there's an execution error or timeout, we can either keep the last state or set it to an Alerting state.

Then, in the notification section, we can send the alert to notification channels that we've defined. We've only defined one notification channel, so if I select the plus sign here we have the Slack alerts channel, and I'm going to select that. Then we can add a message to
the alert. If I go back to Slack and show you the alert up here, you'll notice we have the instance names, the IP address and the port of the Raspberry Pis. This is the message field. If I go back to the dashboard, I can use that same message just by typing a dollar sign and then the variable name, which was instance. This is the same message that I used for this alert, so this is the result of that message: it just printed out the instance variable value. You can keep that if you want, but because it's kind of redundant, since you can see the instance information anyway, I'm going to delete it. I think it'll look a little cleaner without it, but I just wanted to show you that you can reference variables in the message field. We can also add tags as well.

I think that's all we need to do for this alert. Before testing this alert rule I'm going to save the dashboard; you have to save the dashboard first. I'm going to call it RPi Alerts. If I go back into the panel and go to Alert, I can test this rule. If I select Test, you can see that the state is currently pending, because it's in the evaluation period. After a minute the condition should still be true and the alert should be sent. So I'm going to navigate over to the Alerting section in Grafana, and I can see the disk usage alert. It's in a pending state, and we'll wait for it to reach an alerting state.

Okay, so now it's in an alerting state, and if I navigate back to the alerts Slack channel we can see the alert show up, the same as the one up here but without the instance variable reference in the message field. We get that information anyway; this is pulling the information from the legend, which I think is kind of nice. I don't know exactly how to format the value, because it would be nice if this could be formatted so that it shows that
it's a percentage, and without so many decimal places. If you do know how to format this value, please comment below the video; I would definitely like to know how to do that.

So that's pretty much all there is to it. I hope you enjoyed this video and found it informative. If you did, please consider throwing a like on the video and subscribing to the channel for more videos like this.

There is a new system for alerting in Grafana 8, and real quickly I'll show you the documentation page for Grafana 8 alerts. It's currently in beta and it's an opt-in feature, so you can enable Grafana 8 alerts if you have Grafana 8. When you're installing Grafana, in the grafana.ini configuration file for your Grafana instance, you can enable beta features, and this beta feature is called ngalert. So you can enable that when you're installing a new instance of Grafana if you want to play around with this feature.

One downside of Grafana 8 alerts is that there isn't a provisioning feature for them. With legacy alerts you can provision dashboards automatically using Ansible: you can export the dashboard as JSON, and then using Ansible you can provision an entire Grafana instance along with the dashboards and the alerts defined within those dashboards, so you don't have to upload individual dashboards once you've provisioned the instance. That functionality is not yet available for Grafana 8 alerts, unfortunately, so you can't use provisioning with them yet.

If you'd like me to do a video on Grafana 8 alerts, let me know in the comments. I think it's a pretty cool system, a cool new way of doing alerting in Grafana. But anyway, thanks for watching, and I will see you in the next video.
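
The pending-to-alerting behaviour described in the video (evaluate every 10 seconds, for one minute, threshold of 40 percent) can be sketched roughly in Python as below. This is a simplified model, not Grafana's implementation. The names `used_percent` and `AlertRule` are made up for illustration. The exact dashboard query is not shown in the captions, so computing used space as 100 minus the available share is an assumption. The sketch also checks only the latest value per node instead of the max over a five-minute window.

```python
from dataclasses import dataclass


def used_percent(avail_bytes: float, size_bytes: float) -> float:
    """Percentage of a filesystem in use, from available and total bytes.

    Assumption: used = 100 - (available / size * 100); the video's exact
    query text is not visible in the captions.
    """
    return 100.0 * (1.0 - avail_bytes / size_bytes)


@dataclass
class AlertRule:
    """Hypothetical, simplified model of an 'evaluate every X for Y' rule."""

    threshold: float = 40.0  # percent of disk used, as in the video
    interval_s: int = 10     # "evaluate every 10 seconds"
    for_s: int = 60          # "for one minute"
    state: str = "ok"
    pending_s: int = 0       # how long the rule has been pending

    def evaluate(self, values: list[float]) -> str:
        """Run one evaluation over the latest per-node values."""
        # The rule fires if one or more nodes are above the threshold.
        breached = any(v > self.threshold for v in values)
        if not breached:
            self.state, self.pending_s = "ok", 0
        elif self.state == "ok":
            # First breach: start the pending period.
            self.state, self.pending_s = "pending", 0
        elif self.state == "pending":
            self.pending_s += self.interval_s
            if self.pending_s >= self.for_s:
                self.state = "alerting"
        return self.state
```

With these defaults, a condition that stays true moves the rule to pending on the first evaluation and to alerting once a further 60 seconds of evaluations have passed; any evaluation where no node is above the threshold resets it to ok.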
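
To check a Slack incoming webhook outside of Grafana, a minimal Python sketch like the one below posts a JSON body with a `text` field, which is the basic message shape Slack incoming webhooks accept. The helper names `build_slack_request` and `send_slack_message` are hypothetical, and the URL in the usage comment is a placeholder, not a real webhook; use the URL copied from your own Slack app.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_slack_message(webhook_url: str, text: str, timeout: float = 10.0) -> int:
    """Send the message and return the HTTP status code."""
    request = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Usage (placeholder URL, replace with the webhook URL from your Slack app):
# send_slack_message("https://example.invalid/webhook", "Test message for the alerts channel")
```

Keeping request construction separate from sending makes it possible to inspect the payload without making a network call.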
Info
Channel: Tech and Beyond With Moss
Views: 3,018
Keywords: grafana, alerting, monitoring, prometheus, node exporter, linux, information technology, devops, grafana tutorial, devops tutorial, devops engineer, grafana alerts, grafana alert, grafana alert configuration, devops tools 2021, monitoring and evaluation, devops tools, devops tools 2022, prometheus grafana
Id: yWzxbG3YDcM
Length: 18min 55sec (1135 seconds)
Published: Thu Sep 23 2021