Grafana Loki querying basics, log based metrics and setting alerts on logs

Captions
Hey folks, this video is about Grafana Loki. Grafana Loki is an aggregated logging solution, and in this video you will learn three things. First, how to do basic queries. Second, how to do log-based metrics, which basically means counting log lines. Third, how to make an alert based on those counted log lines, so if you get too many errors or too much activity, you get an alert. So let's get started.

I have this Grafana installation. Just to verify that I have a working Loki connection, under Data sources I pick Loki and push the Test button. It came back green, so Loki is good to go.

When you first start out with Loki, I recommend that you go to the Explore view in Grafana and pick Loki as the data source, so you can start constructing your queries. Now, with every logging solution there is a catch: there is a query language, and learning query languages is nobody's strong side. Therefore Grafana's Explore view has a little wizard that helps you construct those queries. We are going to use this wizard first, and then I'm going to talk about the basic structure of these queries.

In step one you can pick a bunch of labels, Kubernetes labels, and based on these labels you set label selectors, which basically pick a log stream. There's not much in my cluster, but it's a GitOps cluster, so I'm going to pick the flux-system namespace, and among the app labels I pick source-controller. These two selected a log stream for me. If I click Show logs, the logs from the pods in the flux-system namespace that are labeled app=source-controller show up in this view. At the top there is the volume of the logs, and at the bottom you can see the actual logs, so no need to run kubectl logs anymore.

If you are familiar with GitOps: this source-controller component basically goes out to GitHub, polls every 15 seconds (I think that's the interval for me), checks if there is a change in GitHub, and
if there is, the change gets applied to the cluster. So in this view, what you see are actually these heartbeats. It goes out to my GitOps production apps and GitOps production infra repositories and just checks for changes. That's basically eight log entries per minute, and at the top you can see the volume roughly averaging at eight log entries per minute.

So this is the first part of how to do queries with Loki: first, you have to select a stream. This is very similar to Prometheus labels. If you are familiar with those, you can assume that the way you do label selectors in Prometheus works quite similarly in Loki.

Now, what else can we do with these selected log streams? Well, you can pipe them into further filters. For example, if I pipe the stream into a regular expression filter where I wildcard-match a string, for example GitOps production infra, then I see only the log entries for this particular Git source. As you can see, this is half of my log traffic, so we really narrowed it down further.

So this is basically the structure of queries in Grafana Loki: first you select the stream, and then you pipe it through a series of filters. If you go to the Grafana LogQL documentation page, you can find various other options for the query language. We also summarized it at Gimlet, in our e-book on building a developer platform, which covers what I just told you about querying basics. You first pick a log stream, then you pipe it into various further filters. You have seen how to do a regular-expression-based filter, but you can also do exact string matching or filter lines out.

If you have installed Grafana Loki with Gimlet Stack through the Gimlet dashboard, then you will have this dashboard available. It's basically similar to the Explore view, but if you are not familiar with the Loki querying syntax, it makes it
one step more convenient than the Explore view. This is my flux-system namespace, and if I type a random string here, on top of my previous filter, you can see the logs get filtered down.

Let's stick with this dashboard a little longer. By looking at its queries, you can further familiarize yourself with the Loki query language, or just confirm what you already know about it. In this dashboard I used a variable-based stream selector, then piped it through a regex-based wildcard filter. There is also a field to enter a raw query, and I just concatenate those query filters at the end of the query.

So that was the querying part. The other interesting part about Loki queries is that, just like in Prometheus, you can apply a rate function on selected log streams. If I use the rate function with a five-minute lookback, I get the per-second rate of log entries averaged over five minutes. If I change this to one minute, the granularity of this chart changes slightly. So this is how you count log lines.

One very cool thing about Grafana is that you can set alerts on basically anything you can chart on a panel or a dashboard. So I'm building a dashboard from scratch and creating a panel on it that counts log lines. At the end I'm going to set an alert on that value.

I start editing this panel, call it Flux source controller activity, and pick Loki as the data source. Then I replicate the query that we created before: the namespace is flux-system and the app label is source-controller. I'm looking at logs here, so the visualization type is Logs, just to verify that there is actually some output. I was filtering this down to anything containing GitOps production infra, in a wildcard fashion. This is still a log query. To make it a log-based metric, I'm applying the rate function on
this one, and I have to provide a lookback time range as well. When you apply functions to a log stream, you have to set a time range that says how far back to look for data points. Now the log rendering is gone, because I'm no longer displaying logs. It's actually a time series.

I'm hovering over this 0.07 value. What does it really mean? The rate is per second, so I have to multiply it by 60 seconds: 0.07 × 60 is about four log entries per minute. And that's correct, because every 15 seconds Flux goes out to GitHub and makes a log entry, which means four entries per minute.

So I have this dashboard. I'm going to save it and call it Flux source controller activity. I promised you that we would make an alert out of this. To make an alert, you go back to Edit panel, and in the Alert section you start creating your alert. You have to name it and put it into a folder; I created this alerts folder previously. This is the query, the one we have made, and there is a condition that you can set on it. I'm setting a classic condition: when last() of query A is above 3, fire an alert. Actually, I'm not using last but average, just to smooth out irregularities. Because my numbers are very low, I set the threshold to something like 0.04. If I run the query, I get this nice view: this is the query result, and this is my alerting threshold. If I set the threshold to 0.06 and run the query again, or maybe 0.09, you can see that the red line is above the green line, so in this case there will be no alert.

Now, to make this video useful: what threshold would be meaningful here? With Flux and GitOps, I want to be notified if there are not four polls in a single minute. So I'm not setting an "is above" condition, I'm
setting an "is below" condition, with the threshold at 0.02. So if Flux's activity drops below this line, I get an alert. I run the query and everything is fine. I can preview the alert, and it's in the Normal state. To see a different state, I tweak the thresholds and preview again. Now it tells me that with these thresholds it's alerting, so it's behaving correctly. I preview it again and it's back in the Normal state. Cool, so this is my alert. I save it and exit.

And that was actually it. Let me navigate back to the Alert rules section. There you can look at your current alerts and define notification settings: you can set contact points and notification policies, and you can even silence alerts. Grafana's alerting has come a long way. Since version 8 the alerting experience is brand new, and I recommend taking a look.

So in this video we first covered how to do basic queries. Then I showed you how to apply functions like rate on the query volume, and we created an alert as well. If you liked the video, I recommend hopping over to bookkin.io as well, to read the Loki section and maybe share it with your team. I hope you will stick around next time too. Thank you!
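The query structure described in the video is: select a stream with label matchers, then pipe it through line filters, then optionally wrap it in rate() to get a metric. It can be sketched as a few string-building helpers. This is a minimal illustration, not a Loki client. The helper names (quote, stream_selector, pipe, rate) are hypothetical, and the pattern "gitops-production-infra" is an assumed stand-in for the repository name used in the video. The four operators are LogQL's line filters: contains (|=), does not contain (!=), regex match (|~), and regex non-match (!~). The regex filter matches anywhere in the line, so a wildcard like .*gitops-production-infra.* should not need the leading and trailing .*.

```python
import re

# LogQL line-filter operators: contains, not contains, regex match, regex non-match.
LINE_FILTER_OPS = {"|=", "!=", "|~", "!~"}


def quote(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def stream_selector(labels: dict[str, str]) -> str:
    """Build an exact-match stream selector such as {namespace="flux-system"}."""
    if not labels:
        raise ValueError("a LogQL query needs at least one label matcher")
    matchers = ", ".join(f"{name}={quote(value)}" for name, value in labels.items())
    return "{" + matchers + "}"


def pipe(query: str, op: str, pattern: str) -> str:
    """Append one line-filter stage to a log query."""
    if op not in LINE_FILTER_OPS:
        raise ValueError(f"unknown line filter operator: {op}")
    return f"{query} {op} {quote(pattern)}"


def rate(query: str, window: str) -> str:
    """Turn a log query into a metric query: entries per second over the lookback window."""
    if not re.fullmatch(r"(\d+(ms|s|m|h|d|w|y))+", window):
        raise ValueError(f"not a duration: {window}")
    return f"rate({query} [{window}])"


selector = stream_selector({"namespace": "flux-system", "app": "source-controller"})
logs = pipe(selector, "|~", "gitops-production-infra")  # hypothetical repo name
print(rate(logs, "5m"))
# rate({namespace="flux-system", app="source-controller"} |~ "gitops-production-infra" [5m])
```

Note that the lookback range goes after the whole log pipeline, inside rate(...), not directly after the stream selector.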
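The arithmetic behind the 0.07 value in the video is simple. rate() reports entries per second over the lookback window. A heartbeat every 15 seconds therefore gives 4 entries per minute, about 0.067 per second, and multiplying by 60 converts back to per minute. The sketch below models this by counting timestamps in the window (at - window, at] and dividing by the window length. It is a simplified model, not Loki's implementation. The exact boundary handling and the helper names log_rate and per_minute are assumptions for illustration.

```python
def log_rate(timestamps: list[float], at: float, window_s: float) -> float:
    """Per-second rate of log entries inside the lookback window (at - window_s, at]."""
    # Assumption: left-open, right-closed window; Loki's exact boundaries may differ.
    count = sum(1 for ts in timestamps if at - window_s < ts <= at)
    return count / window_s


def per_minute(rate_per_second: float) -> float:
    """Convert a per-second rate into entries per minute."""
    return rate_per_second * 60


# One heartbeat every 15 seconds for ten minutes (timestamps in seconds): 0, 15, ..., 600.
heartbeats = [15.0 * i for i in range(41)]

five_min = log_rate(heartbeats, at=600, window_s=300)  # 20 entries / 300 s
one_min = log_rate(heartbeats, at=600, window_s=60)    # 4 entries / 60 s
print(round(five_min, 3), round(per_minute(five_min), 1))  # 0.067 4.0
```

With perfectly regular heartbeats, the five-minute and one-minute windows give the same answer. A shorter window only reacts faster when the traffic is uneven, which is why the chart's granularity changes when you switch from 5m to 1m.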
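The classic condition set up in the video reduces the query result to one number with a reducer such as last() or avg(). It then compares that number against a threshold with "is above" or "is below". The sketch below models that logic and shows why averaging smooths out a single missed poll. It is a hypothetical simplification, not Grafana's code: the function name evaluate is made up, Grafana's no-data handling is configurable and is only returned as a label here, and the sample rates are illustrative values near the 0.07 seen in the video.

```python
from statistics import mean
from typing import Callable

# Reducers named after the options in Grafana's classic condition editor.
REDUCERS: dict[str, Callable[[list[float]], float]] = {
    "avg": mean,
    "last": lambda values: values[-1],
    "min": min,
    "max": max,
    "sum": sum,
}


def evaluate(values: list[float], reducer: str, evaluator: str, threshold: float) -> str:
    """Model of 'WHEN <reducer>() OF A IS <ABOVE|BELOW> <threshold>'."""
    if not values:
        return "NoData"  # simplification: Grafana lets you configure this case
    reduced = REDUCERS[reducer](values)
    if evaluator == "above":
        firing = reduced > threshold
    elif evaluator == "below":
        firing = reduced < threshold
    else:
        raise ValueError(f"unknown evaluator: {evaluator}")
    return "Alerting" if firing else "Normal"


rates = [0.067, 0.07, 0.063, 0.067]            # illustrative per-second rates
print(evaluate(rates, "avg", "above", 0.06))   # Alerting
print(evaluate(rates, "avg", "above", 0.09))   # Normal
print(evaluate(rates, "avg", "below", 0.02))   # Normal

blip = [0.067, 0.067, 0.067, 0.0]              # one empty sample at the end
print(evaluate(blip, "last", "below", 0.02))   # Alerting
print(evaluate(blip, "avg", "below", 0.02))    # Normal
```

The last two calls show the design choice from the video: with last(), one empty sample trips the "is below" alert, while avg() only fires when Flux's activity stays low across the evaluated range.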
Info
Channel: Laszlo Fogas
Views: 62,164
Id: VEGYgPiAazk
Length: 13min 22sec (802 seconds)
Published: Thu Apr 28 2022