How to build a PromQL query (Prometheus Query Language)

Captions
Welcome to Is It Observable. The main objective of Is It Observable is to provide tutorials on how to observe a given technology. Today's episode is part of the Kubernetes series; we have already done a few episodes on it, where we saw how to collect metrics and how to collect logs, and we also covered service mesh and Istio. Now, this one is related to Kubernetes, but in fact it is beneficial for any environment using Prometheus; we could almost say this episode is part of the Prometheus series, because we will focus on the Prometheus Query Language, PromQL. So if you enjoy today's content, don't forget to like and subscribe to the channel. What are we going to learn in this episode? First, we will introduce the various data types handled by Prometheus. This is very important, because when you write PromQL it is essential to understand those data types. Then we will jump into the various formats of data available in Prometheus to store our metrics; if you are planning to build an exporter, that part will be very useful for you. Then we will look at the various ways of filtering the data, and last we will see the various operators. And of course, like usual, we will do a small tutorial building our first few queries using the kube-state-metrics and node exporter that we installed in a previous episode. PromQL is essential when using Prometheus, because it is the main language that helps you visualize the data in Grafana or Prometheus, or even build alerting rules. Without knowing anything about PromQL, it is very difficult to take proper advantage of Prometheus. So let's see the Prometheus data types. When you run a PromQL expression, Prometheus will evaluate it to one of four types: string, scalar, range vector, and instant vector. We already had an episode on Prometheus, where we saw the architecture, how to deploy it, and the various components
involved, and we mentioned in that episode that Prometheus stores the metrics scraped from the various collectors in a time-series database. Technically, what happens is that you have an identifier for the metric you are planning to store, and each value is stored against the timestamp at which it was collected. So at the end of the day, in the Prometheus storage we have timestamp one, value one for metric A, then timestamp two, value two; timestamp three, value three; and so on. Regarding the structure of the identifier: you have a metric name, of course, but in Prometheus you also have the notion of labels. Here we have an example of a metric, http_requests_total, and you can see that after the name there are a couple of labels that come along with it. You usually have the job: the job tells you where the metric was scraped from; basically, it is the name you defined in the scrape configuration file of Prometheus. Then, in this example, you see a second label called method: with HTTP requests we know there are different types of requests, with the various methods, so GET, PUT, POST, and so on. And then you have the path, which is basically the endpoint of the request. With those labels, the advantage is that we will be able to filter the data stored in our time-series database; you will see that later on. Technically, the metric is stored in a sort of JSON object; this is the other representation of the metric. You see the various labels we just described, and there is one more called __name__ (underscore underscore name underscore underscore), which is basically the name of your indicator. You will see in the filtering section that we can, for example, collect all the metrics whose names start with http_request; that is possible thanks to this structure.
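The identifier structure just described can be sketched in PromQL notation; the label values below are illustrative, not taken from a real setup:

```promql
# A time series is identified by its metric name plus its set of labels
http_requests_total{job="api-server", method="GET", path="/cart"}

# Internally the name is just another label, the reserved __name__,
# so the selector above is equivalent to:
{__name__="http_requests_total", job="api-server", method="GET", path="/cart"}
```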
So, when we run PromQL, Prometheus will evaluate the data and present it in the formats we mentioned. The string format: no surprise, it is a string, a text; depending on the function you use, Prometheus will transform the result and return a string. Simple. Then you have the scalar: a scalar is basically a numeric value, also very simple to understand, because we use those everywhere in our IT world. But then there are the two other types, which are a bit more complicated: instant vectors and range vectors. Let's start with the first one, the instant vector. The data collected by Prometheus will never arrive and be stored at the same moment. Remember, Prometheus reaches out to the various exporters to collect the data, which means it can start with one exporter, then a second one, then a third, and so on; all the metrics arrive in Prometheus at different times. So when you run a query on a specific metric, you evaluate the data at a given time: the current time, or, let's say, a specific time. If I take, for example, the rate of a CPU usage counter with various labels (the host, the type, the namespace, and so on), and at the end I add [30s], that 30 seconds means: evaluate the rate from now back over the last 30 seconds. So that query returns the per-second rate of CPU usage over the last 30 seconds. An instant vector basically gives you one value per series, and if you don't specify a time, it will be the last value reported. So: instant vector, one value at a specific time. At the opposite, you have the range vector. It is a range, which means it is not going to be one value; it is going to be a set of values measured between two timestamps, maybe between now and last week, or now and last month.
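To make the difference concrete, here are the three shapes side by side, using the node exporter's node_cpu_seconds_total counter as an assumed example:

```promql
# Instant vector: one sample per series, the most recent value
node_cpu_seconds_total

# Range vector: all samples per series over the last 30 seconds
node_cpu_seconds_total[30s]

# rate() consumes the range vector and returns an instant vector (per-second rate)
rate(node_cpu_seconds_total[30s])
```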
So keep in mind: an instant vector is one value; a range vector is a set of values that you will be able to collect. Now let's see the various ways of storing data in Prometheus. If you are planning to write your own Prometheus exporter, it is crucial to understand these metric types, because you will have to select the right one depending on the nature of your metric. But even if you don't plan to write an exporter, it is still very important to understand them, so that you can use the right functions depending on the type. So what are the metric types available? You have counters, you have gauges, you have histograms, and you have summaries, and you will have to select the right one depending on your use case. For example, if you build your own exporter, there are plenty of Prometheus client libraries to help you: the Go client, the Java client, the Python client, the Ruby client, and so on. Those clients are usually very easy to use, with a set of functions for each metric type. When you write your exporter, you say "I want a counter", you give that counter a name, and then you increment it or set values on it. There are a couple of helper methods to let you build your exporter, and it is very simple, to be honest. So let's see the first type: the counter. The counter type is aimed at storing metrics that only increase over time. For example, a request count is going to increase over time; an error count will also increase over time; a total, say, if I am counting the number of bytes written to disk, could be a counter as well. A counter should not be used for metrics that decrease; it is not designed for that.
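For reference, this is roughly what a counter looks like in the text exposition format that an exporter serves to Prometheus; the metric name and values here are illustrative:

```text
# HELP http_requests_total The total number of HTTP requests served.
# TYPE http_requests_total counter
http_requests_total{method="get"} 1027
http_requests_total{method="post"} 3
```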
So what is the purpose of a counter in Prometheus? Well, like I said, you usually use counters for metrics that always go up, and for totals; and there are plenty of metrics like that, by the way. Now, what can you achieve in terms of queries? Counters are very specific, and it is very hard to correlate or understand a pattern from a raw counter. That's why there is one function that only supports counters, and this function is used basically everywhere in most PromQL: rate. Rate takes a counter as input; you need the metric name, of course, and a period of time, and rate returns the per-second rate of that counter. In our earlier example we had the HTTP request count: the rate of the HTTP request count gives us the number of HTTP requests per second over a given period of time. And rate returns an instant vector, remember that. The biggest limitation of counters, something you have to know, is that if for some reason your exporter crashes and Kubernetes restarts it, the counter restarts from zero and keeps incrementing again from there. So you have to consider this challenge when you are using counters or when you are planning to build PromQL with counters. The second type is the gauge. A gauge can be used for metrics that can go up and down: response times, memory usage, CPU usage; there are so many indicators that go up and down, so gauges are used quite often. How can you query gauges in Prometheus? There are also functions and operators that take a gauge as input, and those are usually aggregations over time, like avg_over_time: if I take the average over time of an HTTP response time during a specific period, that returns the average response time over, say, the last five minutes.
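As a sketch of the two query styles just described, one for a counter and one for a gauge; the HTTP metric name is illustrative, and node_memory_Active_bytes is a gauge from the node exporter:

```promql
# rate() only accepts counters: per-second rate of requests over 5 minutes
rate(http_requests_total[5m])

# avg_over_time() suits gauges: average memory usage over the last 5 minutes
avg_over_time(node_memory_Active_bytes[5m])
```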
Rate, as we mentioned, is only made for counters, so don't use rate on a gauge metric. The third type is the histogram. A histogram lets you record values into predefined buckets. Let's say I want to report the HTTP response time and the response size, for example. Every time I report a measurement with a histogram, say a response took two seconds, Prometheus takes that response time and counts the request in the bucket it fits into: if there is a bucket for values up to two seconds, the two-second response lands in that bucket. So a histogram counts the number of requests, or indicators, whose value falls within each bucket. There is a default set of buckets, with a clear structure, and the default buckets only cover values that go up to 10; 10 seconds, if the metric is a request response time. If you need to report a metric with values higher than that, you will have to define custom buckets, which means you need to know the potential minimum and maximum values of your metric, so you can structure the buckets correctly for your needs. Histograms are usually used to report measurements from which you want to calculate averages, percentiles, and so on. So if you are not interested in the raw data of your measurements, a histogram does a great job: a gauge is like raw data, and a histogram is like aggregated metrics. Also, when you use a histogram, Prometheus automatically stores the sum of the observed values and the total number of observations reported, so we can then easily calculate averages. Let's take an example: I take the rate of the duration _sum metric over five minutes (the sum is calculated automatically by Prometheus as part of the histogram), and then I take the rate of the other default metric
that is calculated: the count. So I take the sum divided by the total number of measurements, and, surprise, that gives me an average. Simple as that. And of course, every time, you have to specify the timing: over the last five minutes, or at a specific timestamp. With histograms there is also a predefined function to calculate percentiles, and this function is histogram_quantile. So I can take histogram_quantile of 0.95, and then I specify the expression: say, the sum of the rate of the request duration buckets over five minutes. We will see it later on; you can use the by operator there, which will basically split the metrics across several dimensions. Last is the summary. A summary is very, very close to a histogram. With a histogram we have the histogram_quantile function to calculate percentiles, which means the percentiles are calculated on the Prometheus server. With a summary it is different: you have to calculate them in your exporter, which means summary data cannot be aggregated across a number of application instances. A histogram requires you to use the default buckets, as we mentioned, or to define your own. If you are working with a metric whose values you don't necessarily know, where it is very difficult to predict those values in advance in order to design the right buckets, then a summary is perfect for your case: we don't know the values up front, so it doesn't make sense to define buckets, and we use a summary instead. We usually use summaries when we need to collect many measurements and later compute averages or percentiles. So if you are not interested in the raw data and precise values of your metric, a summary will do a great job for you. We can use summaries for response times, response sizes, or the number of bytes written to disk, without having to design those buckets.
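The two derived queries just described can be sketched like this; the duration metric name is illustrative, but the _sum, _count, and _bucket suffixes are the ones Prometheus generates for every histogram:

```promql
# Average duration over 5m: rate of the _sum divided by rate of the _count
rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m])

# 95th percentile aggregated across instances; grouping by le keeps the bucket label
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
```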
Let's see now the ways of filtering the data, because in the end we are going to write some PromQL, so we need to filter. We saw that our Prometheus metrics have labels, so of course there are label-matching operators that allow us to filter the metrics stored in our time-series database. We have the equality operator, which selects series whose label is exactly equal to the provided string; then we have not-equal, simply not equal to that string; and then we have the regular-expression operators, either a positive regular-expression match or a negative one. One important concept you need to keep in mind is that your label filters must not match only the empty string. You cannot, for example, match against nothing; it doesn't make sense, and such a query is forbidden. Matching equal-to-nothing is something you would never write on purpose, of course, but I am telling you this because, if you are planning to use regular expressions, make sure your regular expression always requires a value: an expression of that nature has a big chance of matching the empty string, which makes it a bad query, and you should replace it with an expression that requires at least one character. The second type of filtering is the range selector, where you specify a time duration that selects samples between now and a given time back; we had a couple of examples previously, and those were basically range selectors. Time durations are specified as a number, say 30, followed by a unit, and several units are available: milliseconds, seconds, minutes, hours, days, weeks, and years. Then you have offset, which allows you to request the value from x seconds, minutes, or days ago; similar to the range selector, you need to specify the unit as well.
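Here is a sketch of those matchers and selectors, reusing metric names from this episode; the label values are illustrative:

```promql
kube_pod_status_phase{namespace="hipster-shop"}    # equality
kube_pod_status_phase{namespace!="kube-system"}    # negative equality
kube_pod_status_phase{namespace=~"hipster.*"}      # positive regular expression
kube_pod_status_phase{namespace!~"kube.*"}         # negative regular expression

# A selector must contain at least one matcher that cannot match "":
{job=~".+"}    # fine: requires at least one character
# {job=~".*"}  # rejected on its own: every matcher could match the empty string

# Range selector and offset, each with an explicit duration unit
node_disk_io_time_seconds_total[30s]
http_requests_total offset 5m
```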
As an example, I can write http_requests_total offset 5m, and that returns the value of http_requests_total five minutes in the past. Then you have the @ modifier. The @ modifier allows you to change the evaluation time of an expression to a specific timestamp; if you don't use a modifier, by default it is the current timestamp. You can combine range selectors and offset with the @ modifier. Take the example from the beginning, the rate of http_requests_total over a five-minute range: if I add a modifier at the end, I can specify which time I am referring to. Say I have the timestamp of last week and I run the query: it returns the number of HTTP requests per second over the last five minutes at that time and date. Now let's jump into the operators. There are two types of operators in PromQL. First, the aggregation operators, which only take an instant vector as input. Remember: if you want to aggregate, you cannot use range vectors as input, only instant vectors, and they return an instant vector as output. Input instant vector, output instant vector. What aggregation operators do we have? You have sum, you have min, you have max, standard deviation, averages; there are plenty of them. And then you have the binary operators: arithmetic operations like plus, minus, divide, multiply, and so on; comparison operators like equal, not equal, less than, greater than; and you also have the set operators and, or, and unless. There are also a couple of functions. Rate, we mentioned, takes a range vector as input and measures the per-second rate of a counter (remember, a counter) over a given period of time. The rate function is smart. Why am I telling you this? Because, remember, counters can be restarted if the exporter crashes, and rate detects those resets.
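The combinations just described look like this; the Unix timestamp is made up for illustration, and the @ modifier assumes a reasonably recent Prometheus version (it is enabled by default from 2.33):

```promql
# Aggregation: instant vector in, instant vector out
sum(rate(http_requests_total[5m]))

# @ modifier: evaluate at a fixed Unix timestamp instead of "now"
rate(http_requests_total[5m] @ 1631772000)

# offset and @ can be combined on the same selector
rate(http_requests_total[5m] offset 1w @ 1631772000)
```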
For example, if the value goes up one, two, three, four, five, ten, and then drops back to zero, rate understands that the counter has been reset: it carries on from ten and keeps going up. It is very smart, so you don't have to worry about it; rate will manage counters that have been restarted. All right, I think we have covered most of the meaningful operators and filters that allow us to build queries, so let's jump into the tutorial and work through a few examples. All right. So, like usual, every tutorial on Is It Observable has its own GitHub repository; today's episode is about PromQL, as we mentioned, and all the steps required to deploy this tutorial in your own environment are there. As we mentioned, with the previous episodes, especially the first one on Kubernetes and Prometheus, we have already deployed a Google Cloud Kubernetes cluster on GKE, we have deployed Prometheus and Grafana, and that's it; we are not going to redo those steps, we are going to focus directly on the PromQL. So let's jump into Grafana. In Grafana, like we had in the previous episode, the Prometheus data source has already been configured, so we can either run our queries in Explore, or run our queries by building a dashboard and looking at the metric there. Let's use Explore in our case. For this tutorial I'd like to go through various things. We have the node exporter and kube-state-metrics, so I want to look at a few metrics, and there is one metric exposed by kube-state-metrics called kube_pod_status_phase; we are just going to type it. A great thing with Grafana is that you can see the metrics suggested as you type, so you don't have to know all the metric names exactly; it helps you search for metrics. So let's type pod status, and the one we were looking for is the phase, so we are going
to simply run that query. And why? Because we want to see the labels. The first thing we want to do is look at the various labels we have and try out some label filtering. All right, we can see here, on this particular metric kube_pod_status_phase, that we have labels for the endpoint, instance, job, namespace, phase, pod, and so on. So let's try a query where, for example, I only want to look at the phase field for the namespace hipster-shop. Similar to what we did before, we put the curly braces here to filter on labels, and with code completion Grafana also suggests the various namespaces. So we set phase equal to Failed, and the second condition is namespace equal to hipster-shop. Let's run that query. It returns instant vectors for us; we don't have any counter here, but we can now play around with a few functions. From this query, Failed doesn't make much sense; let's say I am interested in counting the number of running pods per namespace. So let's first remove this namespace filter and put the right phase filter, which is Running; here it is. And now we want to do a count. All right, count is an aggregation function, and what is great with all those aggregation functions is that you have the by operator: in the by operator you can split this count per, in my case, namespace. If I run that query, I now have a count, and you can see that I have 19, 15, 12.
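The query built up in this step has this shape; the namespace value comes from the demo cluster in the video:

```promql
# Running pods in one namespace
kube_pod_status_phase{phase="Running", namespace="hipster-shop"}

# Count split per namespace with the by operator
count by (namespace) (kube_pod_status_phase{phase="Running"})
```

One hedged side note: kube-state-metrics exposes one 0/1 series per pod and per phase, so count counts the series themselves, while sum by (namespace) of the same selector would add up only the pods currently in the Running phase.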
So here it is showing me the individual values for each of those namespaces. Here we are in Explore, which means you are not able to do any visualization, so if you want to transform this into a graph in a dashboard, we jump into Dashboards and run the same query there. I copy-paste it, run the query again, and here we have the values. Now we could, for example, pick a bar gauge. Why? Because it allows me to see the exact number of running pods per namespace: default, hipster-shop, istio-system, and kube-system. Let's jump into a fresh new dashboard; I'll add a new panel, and now we want to play a bit more with the operations. I'd like to pick a metric which is a counter. Either you know the metric, or you have no clue whether it is a counter value or not; but there is a rule, to be honest: when you build your metric names, if you put total at the end, it means it is a counter. So let's search for total. If I search with total like this, I can see basically all the metrics that are potentially counters; they will increase over time. The one I am interested in comes from the node exporter: there is a metric called node_disk_io_time_seconds_total. This is the one I am looking for. If I run that query, you will see it is a counter with increasing values. Now let's use rate. We have a counter, we know it, and we want to evaluate the disk I/O time per second. All right, I add rate at the beginning here, and I need a range selector. Let's put, say, ten minutes or five minutes, just for the sake of the example, and if we run the query now, we can see the per-second disk I/O time displayed in this graph.
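Written out, the two stages of this step are:

```promql
# Raw counter: cumulative seconds spent on disk I/O, from the node exporter
node_disk_io_time_seconds_total

# Per-second disk I/O time, averaged over a 5-minute window
rate(node_disk_io_time_seconds_total[5m])
```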
So you can do quite a lot of things here; that was an example with a counter, and remember, we mentioned that you have to use rate to be able to make use of counters. Now let's have a look at metrics that use histograms. Similarly, if you want to search for histograms, you can search for bucket, because that suffix is added automatically by Prometheus, and here you have plenty of them. There is one in particular I am interested in, in etcd: there is a metric for object counts. So now we have this one here, with the bucket suffix; you can see the buckets, and if we run this query quickly, we can see that there are different types of buckets. They are customized buckets, by the way, which you can see here. Now I'd like to report, for example, the histogram quantile, the 95th percentile, of this query. As you can see, I could type it, but Grafana has automatically detected that it is a histogram, so it suggests adding histogram_quantile at the beginning, which I just click; there, I didn't have to type it, and it is exactly the metric I was looking for. You can see that every couple of hours a number of objects get released in etcd. Now, the last thing I want to show you: let's say I'd like to count the number of objects created in etcd in our cluster. For that, similar to the previous value, there is an object-count metric; if I run the query, you will see it is the count that is provided alongside the bucket. To consume it, we have to do a rate, similar to the previous time. So rate first, so we count the number of objects we have in etcd, and we can say over five minutes, for example.
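The exact etcd metric name used in the video is hard to make out from the captions; as a stand-in with the same shape, etcd_request_duration_seconds is a histogram exposed by the Kubernetes API server, and the 95th-percentile query looks like this:

```promql
histogram_quantile(0.95, sum by (le) (rate(etcd_request_duration_seconds_bucket[5m])))
```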
Then I close this one and run the query, and now we have the rate, the number of objects per second, that we have in etcd. But now I want to take advantage of sum. I have the number of objects per second and I want to do a sum; or I could do a count, in fact it would work the same way; and I want to sum by a dimension, to split it. You can see here that the resource label carries the various types of objects we can find in etcd, so if I sum by resource I will be able to split this. If I run the query, here it is: I see all those various resources, and you can see there is mainly one that goes up quite often; it is the events being created. Again, those are just examples. What is also interesting here in Grafana is that you have Transform, and Transform is basically a way of helping you build a query; it helps you design the query. And there is another solution which makes complete sense to show you today, a solution called PromLens. If you go to promlens.com, you can either access the live demo, which I am going to show you, or deploy it on your cluster; it is basically an extended version of the Prometheus UI. What I really like about it is that you can start a query from a form: you have a couple of wizards asking which query type you want, which metric name, which label filters; all the things that we were playing around with and explaining during today's session are guided by PromLens. So if you struggle to build complex queries, PromLens is a good solution to help you draft those queries. That's it for today's episode; I hope you enjoyed this content. If you did, don't forget to like the video and subscribe to the channel, and don't forget to put your comment and let me know if
there is any specific technology that you would like to have a tutorial on. Please reach out to me; you have all the contact details on the YouTube channel. All right, see you soon for another episode. Bye!
Info
Channel: Is it Observable
Views: 1,021
Keywords: Prometheus, cloud, observability, promql, kubernetes, k8s, Gauge, counter, histogram, Observability, grafana
Id: hvACEDjHQZE
Length: 38min 34sec (2314 seconds)
Published: Thu Sep 16 2021