PromQL Data Selection Explained | Selectors, Lookback Delta, Offsets, and Absolute "@" Timestamps

Captions
Hey everyone! Before you can do anything useful with your data in PromQL, like transformations or computations, you first have to understand how to select just the data you care about. There are two overall types of data selectors in PromQL: instant vector selectors and range vector selectors. They help you select the time series you want and the data points from them. They also come with subtle behaviors, as well as modifiers that you can add to change that behavior. These include the five-minute lookback delta, staleness handling, offsets, and even absolute evaluation timestamps. These nuances can be confusing if you're new to Prometheus, so let's explain and demystify all of them in this video. Let's go!

The first type of data selection we can do in PromQL is to select the latest value for a set of time series, relative to an overall query evaluation timestamp. In the simplest form, we can just write a metric name as the entire query. That selects the latest sample value for all time series that have that metric name, demo_memory_usage_bytes in this example. You can see a list of time series in the output here, with one sample for each series, and all of those output samples are internally aligned to the same timestamp. That's why this query construct is formally called an instant vector selector. "Vector" refers to a list of time series, and "instant" refers to the fact that there's only one sample for each series, all aligned at the same instant in time.

That instant is the evaluation timestamp of the overall query. By default, in the table view of the Prometheus UI, this is the current time from the point of view of your browser. You can also set any arbitrary custom evaluation timestamp at which to run your query, for example a couple of minutes in the past. All data selection then happens relative to that custom timestamp. The only thing that changes when you switch to
the graph mode is that the same expression is evaluated at many successive timestamps along the range of the graph, not just at a single evaluation timestamp. The distance between those successive timestamps depends on the query resolution. By default, Prometheus chooses a reasonable resolution depending on your zoom level. We can make the individual resolution steps more obviously visible by setting a really low resolution, like five minutes between output points. Now you can clearly see each individual step. You can imagine that the same instant vector selector runs independently at each of these resolution steps to produce an output point for exactly that step.

Let's switch back to the table view for now. Besides the metric name, you can also filter the returned series based on their label values. You do that by adding a set of label matchers in curly braces after the metric name, and there are four different types of label matchers:

- An equality matcher (=) returns only time series where a label has a certain value, for example only memory usage of a certain type.
- An inequality matcher (!=) inverts that.
- A positive regex matcher (=~) returns only the series whose label value matches a regular expression.
- A negative regex matcher (!~) returns only the ones that don't.

For example, we can select any series where the type is either buffers or cached like this, and we can select the opposite set of series by using a negative regex matcher. Keep in mind that all regex matchers are anchored to do a full string match. If you want to match only part of a label value, you have to write the regular expression to allow for that explicitly. For example, you can add .* at the beginning and at the end of the regex to allow any arbitrary number of
prefix and suffix characters as well. You can combine multiple label matchers by separating them with commas, and in that case all of them have to match for a series to be returned. In this case, we are only selecting non-free memory usage for a single particular instance.

OK, but let's talk about the details of the selection behavior a bit more. Imagine that we have five time series in our TSDB, and our instant vector selector matches three of them. What does it even mean to select the latest value for each matched series, relative to an evaluation timestamp that we can freely choose and that is often just "now"? We probably don't want to show values for series that ended a day ago, for example because the process that exposed them got turned down. On the other hand, we also don't want to require updates to series every couple of seconds, since that would force us to use super short scrape intervals. The compromise Prometheus strikes between those two extremes is to look back a maximum of five minutes for the most recent data points. This is the so-called five-minute lookback delta, and it's a server-wide setting for instant vector selectors. You can theoretically change it using the --query.lookback-delta command-line flag, but unless you really know what you're doing, that's usually not a good idea.

Within that five-minute window, the instant vector selector chooses the most recent sample and returns it as the output value for that series, with its timestamp aligned to the query evaluation timestamp. If one of the series does not have a sample within that window, it simply gets omitted from the output. In a graph query, you would then see a gap for that series. The five-minute lookback delta is a reasonable timeout for removing stale series that haven't received recent updates. It turns out that in some situations we can do even better: Prometheus can explicitly notice that a series has terminated at a
certain point in time. It does this in three situations: a series that a target returned in one scrape is missing from the next scrape, a target scrape fails, or a target disappears completely along with all of its series. The same is true for individual series from rules, as well as for entire rule groups. In all of these cases, Prometheus writes an explicit staleness marker for the affected time series into the TSDB. That marker basically means "this series terminates here." An instant vector selector can then immediately stop returning that series if its latest sample was a staleness marker.

In this example, one of the series has a staleness marker, drawn as a red circle with an X inside. If we evaluate an instant vector selector right after that marker, the output for that series will be empty. This is true even though there are other samples in the window before the staleness marker. This way, we don't need to wait five minutes to remove stale series from our results. If a stale series reappears later and we evaluate the same instant vector selector after that time, it will appear in the results again.

OK, let's go back to our Prometheus UI. Instant vector selectors are great if you want to see a single sample for each series, either at one timestamp or at each resolution step in a graph. Often, though, you want to select multiple samples at each evaluation step and then aggregate them into a single output point for that step, for example using the rate() or deriv() functions. To select a range of samples, we can specify a duration within square brackets at the end of any instant vector selector to turn it into a range vector selector. The selector then returns all raw samples within that past period, relative to the evaluation timestamp. In this example, we can see four samples being returned for each series for a one-minute range vector selector, since Prometheus scrapes the underlying time series every 15 seconds. Note that you can't directly
graph a range vector result. It would produce multiple output points for each series at each resolution step in the graph, which would be hard to display in a reasonable way. You first have to feed the range vector into a function that condenses it back into an instant vector, like the rate() function in this case. You then get the per-second rate of increase, as measured over the provided time window, at each step.

If we visualize the range vector selector behavior for the same time series as before, there are multiple differences. First, we can freely choose the lookback time range, for example one minute, five minutes, or one hour. The selector then returns all samples under that window, not just the last one. Staleness markers are also only relevant for instant vector selectors, by the way. If they appear in the window of a range vector selector, they are simply ignored.

For both types of selectors, you can also time-shift the data selection window into the past by adding an offset modifier with a duration to the end of the selector. In this example, we are selecting the latest data point for the metric demo_memory_usage_bytes relative to two minutes ago instead of right now. In the graph view, each resolution step now selects a sample that is two minutes older than it otherwise would have been. This effectively time-shifts the graph data to the right, as you can see here as I iteratively increase the offset duration.

Let's visualize this to understand it better. First, we match our series again. The five-minute lookback delta no longer applies relative to the query evaluation timestamp. Instead, it gets time-shifted into the past by the specified amount. We still select the latest sample within that shifted window and make it the output sample, again aligned to the overall evaluation timestamp. You might be asking why you would time-shift the selector like that when you could just as well move the entire query evaluation
timestamp to the past instead. There are two main use cases. One is when you always want to show a time-delayed result relative to the current time. The more common one is when you want multiple selectors in the same PromQL query with different relative offsets. For example, maybe you'd want to compare the memory usage from a week ago to the current memory usage. You could write one selector that is time-shifted by one week and another one that isn't, and subtract one from the other to get the difference. Offsetting works the same way for range vector selectors, as you can see in this visualization, except that we then select all the samples in the time-shifted window.

Sometimes you also need to select data that is not relative to the current query evaluation timestamp at all, but always anchored to an absolute timestamp. The most prominent use case for this is graphing a stable set of topk or bottomk series over the range of a graph. If you want to understand this use case in detail, check out the Prometheus blog post linked below, which I'll also mention in the video description. We can anchor a selector to an absolute timestamp by adding an @ modifier to the end and specifying the timestamp at which it should be evaluated. You can provide a raw Unix timestamp in seconds. Better in most cases, use the start() or end() pseudo-functions to refer to the beginning or the end of the graph range. For most use cases of the @ modifier, like stabilizing the topk or bottomk series, you will want to use end() to anchor the selector to the end of the graph range.

Let's visualize how the @ modifier works with a custom timestamp that is in the past relative to the overall query evaluation timestamp. You can imagine this working just like the offsetting we did earlier, except that the time shift is no longer relative but absolute. Other than that, the selector still works the same way. You can even combine offsets and absolute @ timestamps, in which case the
selector window first moves to the @ timestamp and then moves backwards by the relative offset from there.

If you want to combine all the selector features and modifiers from this video, you need to know in which syntactic order they go. Here's an example of writing the pieces in order:

- Start with an instant vector selector that has just a metric name.
- Add a label matcher.
- Turn it into a one-minute range vector selector.
- Add both a relative offset and an absolute @ timestamp at which to anchor the selector.

This selects one minute's worth of memory usage for the first of three instances, relative to the provided Unix timestamp and time-shifted into the past by one hour.

Alright, hopefully you now understand everything there is to know about data selection in Prometheus. If you want to learn Prometheus properly and from the ground up, please check out my trainings at training.promlabs.com. Also, please like the video if you found it helpful, and subscribe if you want to see more. I hope to see you in the next one!
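
To make the four label matcher types concrete, here is a minimal Python sketch of how they filter series. It assumes a hypothetical in-memory list of label dictionaries (not the Prometheus TSDB or any Prometheus API). The instance names are made up, and Python's `re` stands in for Prometheus' Go regex engine, which behaves the same for simple patterns like these. It shows that a comma-separated list of matchers is an AND, and that regex matchers must match the whole label value.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical in-memory model: a series is a dict of label names to values,
# with the metric name stored under "__name__". Illustration only.
Labels = Dict[str, str]
Matcher = Tuple[str, str, str]  # (label name, operator, value)


def matches(labels: Labels, matcher: Matcher) -> bool:
    """Apply one label matcher (=, !=, =~ or !~) to a series' labels."""
    name, op, value = matcher
    # A missing label behaves like a label with an empty value.
    actual = labels.get(name, "")
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    if op == "=~":
        # fullmatch anchors the pattern at both ends, like PromQL regexes.
        return re.fullmatch(value, actual) is not None
    if op == "!~":
        return re.fullmatch(value, actual) is None
    raise ValueError(f"unknown matcher operator: {op!r}")


def select(series: List[Labels], matchers: List[Matcher]) -> List[Labels]:
    """Return the series for which *all* comma-separated matchers hold."""
    return [s for s in series if all(matches(s, m) for m in matchers)]


# Hypothetical example data: one memory-usage series per type and instance.
SERIES: List[Labels] = [
    {"__name__": "demo_memory_usage_bytes", "instance": inst, "type": typ}
    for inst in ("host-a:9100", "host-b:9100")
    for typ in ("buffers", "cached", "free", "used")
]

if __name__ == "__main__":
    name = ("__name__", "=", "demo_memory_usage_bytes")
    # {type=~"buffers|cached"}: each alternative matches a full value.
    print([s["type"] for s in select(SERIES, [name, ("type", "=~", "buffers|cached")])])
    # {type=~"buf"}: empty, because the regex must match the whole value.
    print([s["type"] for s in select(SERIES, [name, ("type", "=~", "buf")])])
    # {type=~".*buf.*"}: explicitly allowing a prefix and suffix fixes that.
    print([s["type"] for s in select(SERIES, [name, ("type", "=~", ".*buf.*")])])
```

The empty-string fallback for absent labels mirrors how PromQL treats a missing label. For example, `{job=""}` also selects series that have no `job` label at all.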
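
The instant vector selector behavior described in the transcript can be summarized as a short Python sketch for a single series. It covers the lookback delta, staleness markers, output timestamps aligned to the evaluation time, and independent evaluation at each graph step. It is a simplified model, not Prometheus code. Prometheus encodes staleness markers as a special NaN value, and a plain sentinel object stands in for that here. The exact handling of the window's left edge has also differed between Prometheus versions, so the sketch simply treats the window as left-open.

```python
from typing import List, Optional, Tuple, Union

# Stand-in for Prometheus' special staleness-marker NaN (a simplification).
STALE = object()
# Default lookback delta of 5 minutes (configurable via --query.lookback-delta).
LOOKBACK_DELTA = 300.0

Value = Union[float, object]
Sample = Tuple[float, Value]  # (timestamp in seconds, value or STALE)


def instant_select(samples: List[Sample], eval_ts: float,
                   lookback: float = LOOKBACK_DELTA) -> Optional[Tuple[float, Value]]:
    """Sketch of an instant vector selector for one series.

    Returns (eval_ts, value), or None if the series is omitted. The window is
    treated as (eval_ts - lookback, eval_ts].
    """
    window = [(ts, v) for ts, v in samples if eval_ts - lookback < ts <= eval_ts]
    if not window:
        return None  # nothing within the lookback delta: series is omitted
    _, latest = max(window, key=lambda s: s[0])
    if latest is STALE:
        return None  # newest sample is a staleness marker: series has ended
    # The output sample carries the evaluation timestamp, not the raw one.
    return (eval_ts, latest)


def graph(samples: List[Sample], start: float, end: float,
          step: float) -> List[Tuple[float, Value]]:
    """Graph mode: run the same selector independently at every step."""
    points = []
    for i in range(int((end - start) // step) + 1):
        point = instant_select(samples, start + i * step)
        if point is not None:  # a missing point shows up as a gap
            points.append(point)
    return points


if __name__ == "__main__":
    # Hypothetical series scraped every 15s, marked stale at 45s, and
    # reappearing with a new sample at 120s.
    series: List[Sample] = [(0.0, 1.0), (15.0, 2.0), (30.0, 3.0),
                            (45.0, STALE), (120.0, 7.0)]
    print(instant_select(series, 40.0))  # (40.0, 3.0)
    print(instant_select(series, 50.0))  # None: latest sample is stale
    print(graph(series, 0.0, 150.0, 30.0))
```

In this model, the graph output has a gap at 60s and 90s. The reason is the staleness marker, not the five-minute timeout, which would otherwise have kept returning the value 3.0 until 330s.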
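
A range vector selector can be sketched in Python the same way. The sketch returns every raw sample in the window, keeps each sample's original timestamp, and drops staleness markers. A function then condenses the range back into one point per evaluation step. To keep the example exact, it uses an average over the window, modeled on PromQL's `avg_over_time()`, rather than `rate()`. The real `rate()` additionally extrapolates to the window edges and handles counter resets. The sentinel for staleness markers and the left-open window are simplifying assumptions, as before.

```python
from typing import List, Optional, Tuple, Union

# Stand-in for Prometheus' special staleness-marker NaN (a simplification).
STALE = object()

Sample = Tuple[float, Union[float, object]]  # (timestamp in seconds, value or STALE)


def range_select(samples: List[Sample], eval_ts: float,
                 range_s: float) -> List[Tuple[float, float]]:
    """Sketch of a range vector selector such as metric[1m] for one series.

    Returns every raw sample in (eval_ts - range_s, eval_ts] with its original
    timestamp. Staleness markers are dropped, since only instant vector
    selectors act on them.
    """
    return [(ts, v) for ts, v in samples
            if eval_ts - range_s < ts <= eval_ts and v is not STALE]


def avg_over_time(samples: List[Sample], eval_ts: float,
                  range_s: float) -> Optional[Tuple[float, float]]:
    """Condense the range into one output point aligned to eval_ts."""
    window = range_select(samples, eval_ts, range_s)
    if not window:
        return None
    return (eval_ts, sum(v for _, v in window) / len(window))


if __name__ == "__main__":
    # Hypothetical series scraped every 15s, with a staleness marker at 55s.
    series: List[Sample] = [(1.0, 10.0), (16.0, 20.0), (31.0, 30.0),
                            (46.0, 40.0), (55.0, STALE)]
    print(range_select(series, 60.0, 60.0))   # four raw samples, marker ignored
    print(avg_over_time(series, 60.0, 60.0))  # (60.0, 25.0)
```

This matches the transcript's observation that a one-minute range over a 15-second scrape interval typically yields four samples per series.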
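
The offset and @ modifiers only change where a selector's data window ends. The output sample is still stamped with the query evaluation timestamp. The following Python sketch models that for an instant selection, leaving out staleness handling for brevity. It first picks the absolute @ base (a Unix timestamp, or the graph range's start or end) and then subtracts the relative offset. The `anchor_ts` helper and its string handling of `"start()"`/`"end()"` are hypothetical and do not mirror any Prometheus API.

```python
from typing import List, Optional, Tuple, Union

LOOKBACK_DELTA = 300.0  # default 5-minute lookback delta, in seconds

Sample = Tuple[float, float]  # (timestamp in seconds, value)


def anchor_ts(eval_ts: float, offset: float = 0.0,
              at: Union[float, str, None] = None,
              query_start: Optional[float] = None,
              query_end: Optional[float] = None) -> float:
    """Hypothetical helper: the timestamp at which a selector's window ends.

    `at` mimics the @ modifier: a Unix timestamp in seconds, or "start()" /
    "end()" for the graph range. For an instant query, start() and end() both
    equal the evaluation timestamp.
    """
    if at is None:
        base = eval_ts
    elif at == "start()":
        base = eval_ts if query_start is None else query_start
    elif at == "end()":
        base = eval_ts if query_end is None else query_end
    else:
        base = float(at)
    # The offset is applied relative to the (possibly absolute) base time.
    return base - offset


def instant_select(samples: List[Sample], eval_ts: float,
                   **modifiers) -> Optional[Tuple[float, float]]:
    """Instant selection with offset/@: shifted window, output at eval_ts."""
    end = anchor_ts(eval_ts, **modifiers)
    window = [(ts, v) for ts, v in samples if end - LOOKBACK_DELTA < ts <= end]
    if not window:
        return None
    return (eval_ts, max(window, key=lambda s: s[0])[1])


if __name__ == "__main__":
    # Hypothetical series: one sample per minute, value = minutes elapsed.
    samples = [(float(t), t / 60.0) for t in range(0, 3601, 60)]
    print(instant_select(samples, 3600.0))                 # (3600.0, 60.0)
    print(instant_select(samples, 3600.0, offset=120.0))   # (3600.0, 58.0)
    print(instant_select(samples, 3600.0, at=1800, offset=600.0))  # (3600.0, 20.0)
    steps = [1200.0, 2400.0, 3600.0]
    # Without @, each graph step sees different data; with @ end(), every
    # step is anchored to the same instant, which keeps e.g. topk stable.
    print([instant_select(samples, t) for t in steps])
    print([instant_select(samples, t, at="end()", query_end=3600.0) for t in steps])
```

Anchoring every step to `end()` is what lets a `topk` over an @-anchored selector pick the same set of series across the whole graph range.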
Info
Channel: Prometheus Monitoring with Julius | PromLabs
Views: 6,927
Keywords: prometheus, monitoring, selectors, selecting data, instant vector selectors, range vector selectors, offset, at modifier, lookback delta, label matchers
Id: xIAEEQwUBXQ
Length: 13min 57sec (837 seconds)
Published: Mon May 22 2023