Prometheus Tutorial | Prometheus Server Down Alert | Prometheus Alert Manager

Hey everyone, I hope all of you are doing great. So far in this course we have learned how to install the Prometheus server, and then how to install node exporter on a couple of servers. We have two servers, which we are calling web app and web app 2, on which we have installed node exporters. The Prometheus server connects to them and collects the metrics. We then use the Prometheus web user interface, which connects to the Prometheus server and shows us all the metrics, statistics and graphs.

Now we are going to introduce Alertmanager, and using Alertmanager we will be able to send an email. There are a couple of things we need to do. First, we need to define an alert in Prometheus and make sure that Prometheus is able to read that alert and act on it. Once the alert is generated, Prometheus pushes it to Alertmanager, and Alertmanager is responsible for sending the email. This is what the architecture of Prometheus alerting looks like. Let's do a hands-on, and things will become very clear.

Here you can see I have three servers. IT Panther 01 is our Prometheus server, and the remaining two act as application servers where we have installed the Prometheus node exporter. This is the web interface of the Prometheus server that we are using.

I'm going to type "up" and click Execute. In the result you can see we are getting a value of 1, which tells us that the nodes are up and running. So if I want to check whether all the servers are up and running, I can just run "up", click Execute, and get a list of all the servers with a value of either 1 or 0. A value of 1 means the server is running (its last scrape succeeded), and a value of 0 means the server is not running (the scrape failed).

Now, if I only want to see failed servers, the ones that are not running, I can write "up == 0", which means: show me all the servers where up has the value 0. I click Execute, and right now I'm not getting any result, because all the nodes are up and running. If I change it to "up == 1", which means list all the servers that are running, you can see I'm getting three results. We are going to use this expression in our alerting rules.

If you want to follow the alerting rules page of the Prometheus documentation, they give a very good example based on request latency in seconds, but I am not going to use it. We are just going to use the example I showed you here: we check whether each server is up or not and show that in alerts.

Before we move ahead, let's click on Alerts and open it in a new tab. In the alerts interface we have three states: inactive, pending and firing. Right now we have not defined any alerts, which is why you do not see anything. Once you define alerting rules in Prometheus, all the rules in the inactive state are listed here, and once a rule's conditions are met, its alert changes to either the pending or the firing state. I'll show you later when it changes to pending and when it changes to firing.

Let's move on and create an alert. I have already written the rule. As I showed you, the expression we are going to use is "up == 0", and you can see we also have a "for" clause, in which we say "for: 1m", one minute. Prometheus checks whether the server is up, and if the server stays down, with up remaining 0 for one minute, it creates the alert. The first time it finds up == 0, the alert shows here in the pending state, and once the server has remained down for one minute, it changes from pending to firing. That is how pending and firing work.

Now let's copy this and create the rule on the Prometheus server. It is similar to how we created the recording rules, so if you have been watching my previous video this will be very simple for you. Let me clear the screen first and run "ls -lrt". Here you can see the prometheus.rules.yml file we created before. Let me show you its content: this is how we defined the recording rules, and if you look at the syntax of alerting rules, it is quite similar. So I'm going to create another file, let's call it alerting.rules.yml, and paste in everything we copied. The important parts are the expression, up == 0, and "for: 1m": if a server is down for one minute, we want to receive an alert. I'll save it.

Now we need to tell Prometheus to start using this alerting configuration file, which means we need to modify the prometheus.yml file. I'm going to open prometheus.yml, and under rule_files we add another entry, alerting.rules.yml, which is the name of our rule file.

Now we need to restart Prometheus. At the moment I am not running Prometheus as a service; I'm just running it from the command line, as I showed you in previous videos. That means that to stop it we have to kill the process. To find the process we can use "ps -ef | grep prom", and you can see this is how we started Prometheus, and this is the process ID. I'm going to run "kill -9" with this process ID, and then start Prometheus again with the same command. I'll show you in a later part of this course how to run Prometheus as a service; let's keep that for another video. I run it, and now you can see it says the server is ready to receive web requests.

Let's go back to Prometheus and refresh the page. You can see the rule we created is shown here, currently in the inactive state, along with its name, service down.

This is one of the nodes where the Prometheus node exporter is running; its IP address is 10.128.0.3. I'm going to stop the node exporter service here. Now let's run the query again. As soon as the value is refreshed, let me just run "up", we should get a value of 0 here. As you can see, the server 10.128.0.3 now shows a value of 0, which means this node is down.

Now let's move on to Alerts and refresh. We can see the alert changing to pending, so Prometheus has detected the problem, and it is going to wait for one minute before moving the alert from pending to firing. Let's wait for one minute. You can also expand the alert to see all the details: the rule we created, the instance label with the IP address of the instance, and other details such as the group name and the alert name.

It has now been one minute, and you can see the alert has changed from pending to firing. This is the point at which Prometheus would send the alert to Alertmanager, and Alertmanager would then create an email alert. For now we have not set up Alertmanager; I have only shown you how an alert is set up in Prometheus. In the next tutorial we are going to install Alertmanager and tell Prometheus to send all these alerts to it, and Alertmanager will take care of creating the email. This is the first part of creating an alert. See you again in the next lesson.
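The "up" checks typed into the web UI in the video can also be run through the Prometheus HTTP API instant-query endpoint, /api/v1/query. The following minimal sketch assumes Prometheus listens on localhost:9090, which is its default port. The helper names are illustrative. Only the URL building and response parsing run offline; query() needs a live server.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus runs locally on its default port, 9090.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(base_url: str, expr: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def parse_vector(response: dict) -> dict:
    """Map each series' `instance` label to its sample value."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response}")
    # Each result looks like {"metric": {labels...}, "value": [timestamp, "value"]}.
    return {
        series["metric"].get("instance", "?"): float(series["value"][1])
        for series in response["data"]["result"]
    }


def query(expr: str, base_url: str = PROMETHEUS_URL) -> dict:
    """Run an instant query such as `up == 0` and return {instance: value}."""
    with urllib.request.urlopen(build_query_url(base_url, expr), timeout=10) as resp:
        return parse_vector(json.load(resp))


# Usage against a live server (not run here):
#   query("up")       -> every target with 1 (scrape succeeded) or 0 (failed)
#   query("up == 0")  -> only the targets that are down
```

Because "up == 0" is a filtering comparison, a healthy setup returns an empty dict. That matches the empty result shown in the web UI.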
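The following is a simplified model of how the "for: 1m" clause moves one alert between inactive, pending and firing. It is an illustration only and not the actual Prometheus rule-manager code. It assumes the rule is evaluated at the timestamps passed in (every 15 seconds in the hypothetical timeline) and ignores other details of the real implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertState:
    """Simplified model of one alert instance governed by a `for` clause."""
    for_seconds: float                    # 60 for `for: 1m`
    active_since: Optional[float] = None  # when the expression first matched

    def evaluate(self, now: float, condition_true: bool) -> str:
        """Record one rule evaluation and return 'inactive', 'pending' or 'firing'."""
        if not condition_true:
            # e.g. `up` went back to 1: the alert resolves and the timer resets.
            self.active_since = None
            return "inactive"
        if self.active_since is None:
            # First evaluation where `up == 0` matched: the alert becomes pending.
            self.active_since = now
        if now - self.active_since >= self.for_seconds:
            return "firing"
        return "pending"


# Hypothetical timeline: evaluations every 15 s, node exporter down from t=15 to t=75.
timeline = [(0, False), (15, True), (30, True), (45, True), (60, True), (75, True), (90, False)]
tracker = AlertState(for_seconds=60)
print([tracker.evaluate(t, down) for t, down in timeline])
# → ['inactive', 'pending', 'pending', 'pending', 'pending', 'firing', 'inactive']
```

The alert only fires once the condition has held for the whole window. A target that recovers before then goes straight back to inactive without ever firing.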
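The video pastes a ready-made rule into alerting.rules.yml without showing its full text. The following sketch renders a comparable one-rule file in the documented Prometheus rule-file layout (groups, then rules). The expression "up == 0" and "for: 1m" come from the video. The group name "alert.rules" and alert name "InstanceDown" are hypothetical placeholders, not the names used on screen.

```python
def render_alert_rule(alert: str, expr: str, for_duration: str,
                      group: str = "alert.rules") -> str:
    """Render a rule file holding one alerting rule, in Prometheus's YAML rule format."""
    return (
        "groups:\n"
        f"  - name: {group}\n"
        "    rules:\n"
        f"      - alert: {alert}\n"
        f"        expr: {expr}\n"
        f"        for: {for_duration}\n"
    )


# Hypothetical names: "InstanceDown" and "alert.rules" are placeholders.
RULE_FILE = render_alert_rule("InstanceDown", "up == 0", "1m")

# Usage (writes into the current directory, next to prometheus.yml):
#   from pathlib import Path
#   Path("alerting.rules.yml").write_text(RULE_FILE)
```

After writing the file, list it under rule_files in prometheus.yml next to the existing recording-rule file, for example rule_files: ["prometheus.rules.yml", "alerting.rules.yml"]. The promtool binary that ships with Prometheus can validate it before a restart with "promtool check rules alerting.rules.yml".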
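In the video, Prometheus is restarted by finding its PID with "ps -ef" and killing it with "kill -9". An alternative that the video does not use is to send SIGHUP. Prometheus re-reads prometheus.yml and its rule files when it receives this signal, so a running instance picks up alerting.rules.yml without a restart. The following sketch assumes the usual eight-column Linux "ps -ef" output. find_pids and reload_config are illustrative names.

```python
import os
import signal


def find_pids(ps_output: str, name: str) -> list:
    """Return PIDs from `ps -ef` output whose command is the binary `name`.

    Assumes the usual Linux columns: UID PID PPID C STIME TTY TIME CMD.
    Matching on the binary's basename skips the `grep prom` line itself.
    """
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 7)         # CMD (with its arguments) stays in one field
        if len(fields) < 8:
            continue
        if os.path.basename(fields[7].split()[0]) == name:
            pids.append(int(fields[1]))
    return pids


def reload_config(pid: int) -> None:
    """Ask a running Prometheus to re-read prometheus.yml and its rule files."""
    os.kill(pid, signal.SIGHUP)


# Usage on the Prometheus host (not run here):
#   import subprocess
#   out = subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout
#   for pid in find_pids(out, "prometheus"):
#       reload_config(pid)
```

If a full restart is still wanted, SIGTERM (plain "kill") lets Prometheus shut down cleanly. "kill -9" sends SIGKILL, which gives the process no chance to do so.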
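The pending and firing states on the Alerts page, with labels such as alertname and instance, can also be read from the /api/v1/alerts endpoint. That endpoint returns only active (pending or firing) alerts, so a rule that is still inactive does not appear there. The following sketch assumes Prometheus on localhost:9090. summarize_alerts and fetch_alerts are hypothetical helper names.

```python
import json
import urllib.request


def summarize_alerts(response: dict) -> list:
    """Return sorted (alertname, instance, state) tuples from an /api/v1/alerts reply."""
    return sorted(
        (a["labels"].get("alertname", ""), a["labels"].get("instance", ""), a["state"])
        for a in response["data"]["alerts"]
    )


def fetch_alerts(base_url: str = "http://localhost:9090") -> list:
    """Fetch the currently active alerts from a running Prometheus."""
    with urllib.request.urlopen(f"{base_url}/api/v1/alerts", timeout=10) as resp:
        return summarize_alerts(json.load(resp))


# Usage (not run here): poll after stopping node exporter and watch the state
# go from "pending" to "firing" once the `for: 1m` window has passed.
#   print(fetch_alerts())
```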
Info
Channel: Vikas Jha
Views: 13,575
Id: kY6tXBFdf9c
Length: 9min 14sec (554 seconds)
Published: Sun Sep 05 2021