Is Nginx Ingress Controller Observable - Part 2 using Loki with Cyril Tovena

Captions
If you have always wanted to observe your ingress controller, then this episode is for you. We are going to study how to observe the NGINX ingress controller using logs, and especially LogQL from Loki.

Welcome to Is It Observable. The main objective of Is It Observable is to provide tutorials on how to observe a given technology. Today's episode is part of a series related to ingress controllers, and this one is more specifically about logs. We will try to answer one question: is the NGINX ingress controller observable? This episode covers three solutions in three distinct parts. Part one explained what an ingress controller is and how to consume the metrics exposed by its Prometheus exporter. Part two focuses on how to extend observability by utilizing logs, and especially how to retrieve relevant KPIs using LogQL. In part three we will build a log stream pipeline that exposes metrics from the ingress controller's logs and builds metrics out of them. If you enjoy today's content, don't forget to like and subscribe to the channel.

So let's see what we are going to learn in this part two of the episode on the ingress controller. We will start with an interview with one of the core contributors of the Loki project, Cyril Tovena from Grafana. We will then explain how to build LogQL queries, briefly reviewing the various concepts and aspects of LogQL, and with that knowledge in hand we can start the tutorial: how to collect logs, probably with a log agent, collector and forwarder, and how to transform those logs into metrics. To do that I would like to call a friend, an expert who is one of the core contributors of the Loki project: Cyril Tovena. So let's have a call with Cyril.

Hi Cyril, how are you?

Hi, I'm great, thank you. Thank you for having me, Henrik.

I'm doing an episode on how to observe an ingress controller, and we naturally looked at various solutions. Of course we wanted to use the Prometheus exporter, but we quickly saw that it only provides a couple of metrics, and there are dimensions that are obviously missing to build a deep dashboard, so we are not able to get the right level of detail. We would like to look at how we can enhance our observability by utilizing the logs. So I'm calling you because I would like your advice on how to extend our visibility. But before sharing any best practices or recommendations, could you briefly introduce yourself to the community?

Yeah, sure. I have been working at Grafana Labs for almost three years now, and most of my time and most of my focus has been on the Loki project, which is the log aggregation system we have at Grafana Labs.

I know that Promtail is the default agent running with Loki. I'm very interested in how that started: did you build Promtail in parallel with Loki, or did you introduce Promtail later on?

No, actually it was there from the start, because the main idea for Loki was to be a bit like Prometheus, but for logs, and the only way to really reach that parity was to use the same service discovery as Prometheus. That's what Promtail does. From the start we created Promtail; it uses the same service discovery as Prometheus, which allows you to get the same labels for your metrics and your logs, so you can switch seamlessly between the two. So from the start we added Promtail, but we keep adding new targets to it so that you can scrape more and different types of logs. At the beginning, though, it was mostly focused on Kubernetes.
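To make that point concrete, here is a minimal sketch of what a Promtail scrape configuration can look like when it reuses Prometheus-style Kubernetes service discovery. The target labels chosen here are illustrative assumptions, not the exact configuration discussed in the interview.

    scrape_configs:
      - job_name: kubernetes-pods
        kubernetes_sd_configs:
          - role: pod                  # the same service-discovery block Prometheus uses
        relabel_configs:               # the same relabeling syntax as a Prometheus scrape job
          - source_labels: [__meta_kubernetes_namespace]
            target_label: namespace
          - source_labels: [__meta_kubernetes_pod_name]
            target_label: pod
          - source_labels: [__meta_kubernetes_pod_label_app]
            target_label: app

Because the relabeling rules are the same ones you would write for Prometheus, labels such as namespace, pod and app end up identical on both metrics and logs, which is what makes switching between the two data sources in Grafana seamless.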
This is really important information for the community, so let me rephrase: it seems that if you use Promtail with Loki and Grafana, all your Grafana filters will work smoothly across the Prometheus and Loki data sources.

Yeah. If you use another agent, it doesn't mean it's not possible; it's just maybe a bit more difficult, because Promtail and Prometheus share the same service discovery and configuration. Part of your Prometheus configuration can be used exactly the same way in Promtail, so you can copy and paste it and end up with the same labels. That's the idea. And because you may already know the Prometheus configuration and how it works, it's going to be easier to understand the Promtail configuration, since it basically reuses bits of it. So it's not that you can only achieve this with Promtail, but it is easier with Promtail.

Before we jump to the actual solution, maybe it's the perfect moment to talk about the roadmap. Is there any feature announcement you can share with us?

Yeah, we just released 2.4, which has a lot of features. One of them was probably the most awaited: being able to send logs out of order. That was very complex for some people who were sending logs from ephemeral workloads, because it's difficult to synchronize logs and send them in order. Now it's no longer necessary to send them in order, and that was released this week. The team is also growing a lot: at the beginning we were just three or four, now we are almost twelve, so there is a lot that we plan to do. We want to solve top-k at scale. I don't know if you see what that means, but the idea is that we now have a lot of data and we want to be able to answer questions like: out of all those logs, for a given IP or a given customer, give me the top k. This is already possible in the language today, but at scale it doesn't really work very well, because you can have terabytes of data, so we want to move towards some sort of estimation. We are also always going to try to improve the language, LogQL, adding new features that make it easier to parse your logs. And in Promtail the idea is to keep adding new targets so that you can scrape your logs from different sources. We recently added Kafka, for instance, which was requested by a lot of large enterprise organizations; they really like to push their logs into Kafka and then redirect Kafka to a different destination.

I'm very interested in top-k. The idea, I guess, is to provide a dashboard in Grafana that highlights the main log patterns collected over the last week or last month. Is that the main objective behind this feature?

The main idea is more an API to answer those types of questions. After that, obviously, you can use it in Grafana to show, for instance, the top customers querying your database, or querying your NGINX or your ingress controller.
But the end goal is to provide an API and to be able to answer those questions. There is currently a way to do it, but I don't think it scales very well; at least, it depends on your scale.

Perfect, that's awesome. All right, so everyone wants an answer on how we could solve our observability problem with the ingress controller. What would the journey be if we want to use logs to extend the visibility on our ingress controller?

I think the NGINX use case is very interesting, because you have probably already realized that it doesn't provide a lot of metrics, and it's not really your application, so there is no way for you to create metrics out of the blue: you would need to change the code, and it's not your code. That is a very common situation: as developers, there are a lot of applications we use that are not our own, and maybe they lack instrumentation in terms of metrics, and all you have is the logs. That's a great use case for Loki, because Loki can use those logs, and logs usually contain quite a lot of detail compared to metrics, so you can parse them and try to create your own metrics out of them. That's the idea of LogQL and how to use it on logs.

So I guess your solution is to get the logs coming out of the NGINX controller, use Promtail to collect them, and build a log stream pipeline to transform the logs and expose the new metrics directly from Promtail. Would you do it that way, or differently?

No, I wouldn't. I mean, it depends, but I think the rule of thumb when it comes to processing your logs is: if your logs are already human readable, you shouldn't try to make them machine-parsable at the source, because you are going to lose that human readability, and after all, logs are made to be read by humans. When you are looking at log lines, or at the trend of your log lines, JSON is difficult: when you have a big blob of JSON and thousands of them on screen, it's very hard to scan with your eyes. So I wouldn't recommend reshaping your logs to make them easier for a machine to parse; keep them human readable. I see this mistake being made very often, and it's because most systems don't really allow you to ingest anything other than JSON, so people transform their logs into JSON. They may also log in JSON because they think it will be easier. But I think the best logs are the human-readable ones. NGINX logs are already human readable: a single line, no complicated structure, made to be easy to look at. The logfmt format, which is a very popular logging format in Go, is also very human readable. So I wouldn't recommend changing anything in your logs at the source. It obviously depends on the use case; there are cases where you do want to process the logs, for instance if they are JSON and you have had enough of JSON, you could use a pipeline stage in Promtail to unwrap the JSON and keep only the log line. So there are cases where you want to parse at the source, but most of the time I would say try to keep your logs as they are, especially if they are human readable.
Okay, so I guess in your case the transformation and the creation of the metric itself are done through LogQL: you keep the original source and transform it directly with LogQL. Just to remind the community, LogQL stands for Log Query Language. As we briefly saw in the episode on Loki, we have a lot of labels on our logs, so LogQL filters the log stream based on labels and then transforms the logs slightly to expose metrics such as response times or bytes exchanged. Looking at the Loki documentation on LogQL, there are several types of operators, or pipeline steps: the parser, the line filter, the label filter and more. When building a LogQL query, is there any recommendation on when to use those filters? We of course want to retrieve the data, but we want to make sure the way we retrieve it is performant and reliable. Are there steps that need to happen in a specific order to be more efficient, more robust, and even more performant?

Yeah. LogQL is actually heavily inspired by Prometheus; I wanted to point that out, because if you are already used to PromQL, then using LogQL will be a bit easier, since it reuses a lot of concepts from PromQL. As for advice and tips to make queries faster: I wrote a blog post that gives an idea of how to get the most out of your queries, but you should be aware that Loki actually scans the logs, because it does not use a big index. The first step uses the index to find all the logs, and that is very similar to label matchers in PromQL; it is what decides how much log data you are going to process. So make sure you match exactly the logs you need: if you want to query a full namespace, use the namespace, but if it's a specific application, use that application plus the namespace. Try to reduce, at the source, the amount of logs you are going to parse and scan as much as possible.

After that there is a succession of pipeline stages. The first one is usually a filter, meaning some sort of pre-filtering of the logs based on words. There are two types of filters: one uses a regex, the other checks whether the line contains a string. The string one is the fastest, so try to chain multiple "contains" filters together, and if you really do need a regex at the end of the day, put it at the end. You can chain them all, and you should put the most selective ones at the beginning and the ones more specific to the log line at the end, because evaluation goes from left to right: the more you filter early on, the faster the later operations become. The same applies to operations like parsing, where you parse the log line and add new labels that you can use afterwards for more aggregation or filtering: filter first, then parse. If you are looking at a namespace and want to parse specific logs, and some of them are not in the right format, try to filter them out before throwing them into the parser. So always filter, then parse, and only after that filter on what you extracted; but usually the line filters are the best lever for query performance.
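As an illustration of that ordering advice, here is a hedged sketch of a query shaped the way Cyril describes: label matchers first, cheap "contains" line filters next, a regex filter last among the line filters, then the parser, and finally a filter on the extracted labels. The label names (namespace, app) and the filtered strings are assumptions for the example, not values taken from the episode.

    # 1. narrow with label matchers (uses the index), 2. cheap contains filters, 3. regex last
    {namespace="ingress-nginx", app="ingress-nginx"}
      |= "GET"
      != "healthz"
      |~ "HTTP/[12]"
      # 4. parse only the lines that survived, 5. filter on extracted labels at the very end
      | json
      | status >= 500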
I remember that when I played with LogQL in the episode about Loki, I used the JSON parser, and it seems that sometimes your log sources are not fully JSON-friendly, so the JSON parser can produce errors due to conversion issues. How do you manage those exceptions? I know there are specific fields to catch the error and handle the query properly. Do you have any recommendation when using a parser like the JSON parser?

Yeah, I guess there are two ways to deal with that. The first one is simply discarding the logs that are not in JSON format, because I guess you were expecting all the logs to be correct JSON and some were not. There is a specific label for that, __error__, and you can tell Loki to discard anything that has an error. That's one way. The other way is pretty simple: just make sure you select a log stream that only contains the specific format you are looking for. That's the idea.

So now I get the idea: we will ingest our logs using Promtail, build a LogQL query where we filter on the label related to the NGINX ingress controller, and parse the log stream to match the NGINX log structure. The parser will help us extract new labels, and then we will use unwrap on those new labels to expose the right metrics. And of course we can use functions to plot the rate, the average, or the percentiles of a given indicator.

After playing with Loki I was pretty impressed, but if I build a query that filters, transforms and exposes metrics in a dashboard, there is essentially an API call made by Grafana to Loki every time those dashboards refresh. So I guess in a very large environment Loki could be blasted by a lot of calls coming from different Grafana dashboards. Is there any sizing guidance or recommendation on how to use Loki in a very large-scale environment?

This specific case, where a dashboard refreshes every ten seconds, shouldn't really be a problem, because if you think about it, only one point has been added to the dashboard since the last refresh. What we do with Loki on Grafana Cloud is cache those responses; they are very cheap to cache because at that point they are just metrics. When you ask the same query again, but for, say, five more minutes of data, we only compute the five missing minutes. Only the first query is the expensive one, because it may have to crunch through maybe twelve hours of data, but the subsequent ones only crunch the missing five minutes. So we use caching in Loki, and that is also available to you in open source; usually we use memcached, but you can also use Redis if you prefer.

Prometheus has issues scaling: you can only scale it vertically, and after a certain number of metrics Prometheus starts to consume a significant amount of memory in our environment. What about Loki — could we scale it horizontally?
It is possible, and there is actually a path I usually recommend. I recommend people start with the single-binary version of Loki, which for now scales only vertically, and just get used to it, play with it. It's usually good enough for test and development clusters, and it might be fine for a small Kubernetes cluster. At some point, if you want to ingest more than one cluster, or one cluster is very big, then you need to scale, and in that case there are multiple paths. In 2.4 we released a new way to scale Loki: you basically have two types of deployment, one for the read path and one for the write path, and you can scale them independently. Usually you want to scale the write path more, because you are constantly receiving logs, and then you scale the read path according to the type of queries you run. Then there is another way of running it, with all the components Loki contains: Loki is actually designed to run as a microservices architecture, and there are a lot of components that we bundle into a single binary, or into the two read and write deployments. That is how we run it for Grafana Cloud, where we ingest something like 50 terabytes of data per day, or even more; we receive a lot of data and need to scale for all the customers in a single cluster for a single region. To do that we run Grafana Loki as a microservices architecture, where everything is distributed: the read path and the write path are separate, and inside the read path there are many different components. That is a bit more difficult, so I would only recommend going down that road if you are at a scale where the two previous types of deployment no longer work. There is a Helm chart called loki-distributed, I think, which is good enough for that; otherwise we provide a Jsonnet library in the Loki repository that does the distributed deployment. But to sum up: make sure you start with the simple deployment at the beginning, and then increase the complexity of the deployment as you scale. That's the idea.

I was not aware of a "read" Loki and a "write" Loki: so you have two distinct components, one in charge of ingesting and the other in charge of delivering the data back to Grafana.
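As a hedged sketch of what that simple scalable mode looks like in practice, the same Loki binary is simply started with different targets and each deployment is scaled independently. The flags assume Loki 2.4 or later, and the config path is illustrative.

    # write path: receives and stores incoming logs; scale this with ingestion volume
    loki -config.file=/etc/loki/config.yaml -target=write

    # read path: serves queries back to Grafana; scale this with query load
    loki -config.file=/etc/loki/config.yaml -target=read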
Do you have any resources to share that could help us deploy Loki in a large environment?

Yeah, there is actually a Grafana ObservabilityCON talk from this week that goes deep into the details of this new type of deployment, which makes it simpler to scale Loki, so you don't need the distributed microservices version anymore; especially if you are not yet very used to Loki, that one can be a pain.

I know that Grafana provides lots of standard dashboards that you can import from the community. From a log perspective we depend on the source and on the labels exposed in it. Are there any standard dashboards using Loki as a data source, or, because of its structure, does that not really make sense?

There is a dashboard-sharing feature on the grafana.com website, and I am seeing more and more dashboards that contain logs now. I think there is one for Kubernetes that is based only on logs, specifically the Kubernetes event logs, so you can look at grafana.com and see whether any dashboard is useful for you. We also have cloud integrations: when you use Grafana Cloud, you get integrations depending on what you are running as a workload. If you are running Cassandra or Kafka or a database server, you can select in the process what type of workload you are using and it will automatically install dashboards, and those dashboards are slowly starting to get logs added to them as well.

We are going to try out your suggested solution using LogQL. Do you have any last recommendation for the community when starting out with Loki and LogQL?

Yeah, I think the last recommendation I have is this: we said you shouldn't try to parse your logs too much at the source, and should do that in Loki instead. In Loki — I didn't mention it — there are four parsers. Two of them are easy to choose between, because the choice depends heavily on the format: if it's a JSON log you use the JSON parser, if it's the logfmt format you use the logfmt parser. But when it's neither of those, you have to choose between the pattern and the regex parser, and you should be aware that the pattern parser is a bit easier to use than the regex one, and it is also faster. So favor the pattern parser with human-readable logs, and only if that really doesn't fit your use case fall back to the regex parser, which will consume more resources, be a bit slower, and require writing a regex — and who enjoys writing regexes nowadays?

Thanks for your time and your recommendations.

Thank you for having me, Henrik, it was a pleasure.

A big thank you to Grafana, because Loki is really an amazing project and it helps the community take advantage of their logs.

So let's look at LogQL. LogQL stands for Log Query Language; it is essentially a PromQL for logs — PromQL being the Prometheus query language. By the way, we did a dedicated episode on PromQL, so if you want to learn more about it, don't forget to watch that episode. If you are using an observability backend that supports LogQL, it becomes crucial to understand what you can achieve with it, and how to build your own queries. As you have probably understood, everything starts with a log stream pipeline that collects logs from various sources and stores them in a log storage solution like Loki. There are several log collectors and forwarders on the market: we saw Promtail, provided with Grafana Loki, and there are also Fluentd and Fluent Bit. Once the log streams are stored, we obviously want to consume those logs, filtering and transforming them to build dashboards, alerts and probably more.

In the episode dedicated to Loki we briefly introduced LogQL. LogQL allows us to filter, to transform, and to extract data as metrics. Once you have a metric, you can then use the functions you know from traditional PromQL to aggregate it. A LogQL query can return two types of results: either a log stream, or a metric. Usually you build the metric from a log stream: you write a query that transforms your logs, and afterwards you expose only the metric. A LogQL query has a specific structure: it starts with a log stream selector.
Then comes what we call the log pipeline — not the log stream pipeline, but the log pipeline — where each step is separated by a pipe character. When collecting our logs, the agent, collector and forwarder adds context to them: the pod name, the service name, anything that brings more context. The log stream selector allows us to filter our logs based on the labels available on the log stream. Similar to PromQL, we can use label matching operators: equals, for an exact match on a label value; not equals; and regular expression matchers that keep streams matching, or not matching, a given expression.

Once we have selected and filtered the log stream, we can apply a log pipeline. Having filtered the logs based on labels, we usually want to process them, filter them further, and format them, and there are several ways of doing this. A log pipeline can be composed of a line filter, a parser, a label filter, a line format expression, a label format expression, and finally unwrap, which, as you will see, has a dedicated role for metrics. Let's go through these one by one.

The line filter is similar to a grep applied over the log stream: it searches for content within the log line. With the "contains" operator you keep only lines containing a given string; with its negation you keep lines that do not contain the string — if you don't want errors, for example, you can filter them out this way. You can also use regular expressions, keeping lines that match or do not match a given expression. Here is an example: first I filter my log stream by applying a selector — I have a label called container and I am only interested in the container named frontend — and then, within that stream, I only want the lines that contain "error". Another example: I only want log streams with cluster="us-central", and within them the lines containing "error" but not "timeout".

Next is the parser expression. A parser processes the log stream and transforms it to add extra labels; once we have those labels, we can filter further or apply various functions. There are a couple of predefined parsers. The json parser is very useful if your log lines are in JSON format. The logfmt parser handles the standard logfmt logging format. Then you have pattern, which is very powerful, and regexp. Let's reuse the previous example with container="frontend", where we are only interested in log lines containing errors. The original log line is in JSON format: it has the pod name, the id, the namespace, and so on. If I apply the json parser, all those JSON attributes are exposed as labels, so we end up with new labels such as pod_name and pod_id, plus another one called namespace. You can also pass parameters to the json parser to specify which labels you would like to extract, which avoids extracting everything. The logfmt parser extracts all keys and values from a logfmt-formatted line.
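Here is a small sketch of those two stages put together — a stream selector, line filters, and the json parser with explicit parameters. The label names and JSON field paths are illustrative assumptions, not the exact example shown in the video.

    # keep only frontend errors (excluding timeouts), then lift two JSON fields into labels
    {container="frontend", cluster="us-central"} |= "error" != "timeout"
      | json pod_name="pod.name", pod_namespace="kubernetes.namespace"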
The pattern parser is very powerful: you can explicitly extract fields from a log line. For example, take a log line that, by the way, looks quite familiar — very close to what the NGINX controller produces: the IP address, the date, then a string containing the HTTP method, the path, the status code, the number of bytes, and the user agent. A lot of useful information. This line can be parsed with a pattern expression: you essentially declare that here is the IP, here is the date, here is the HTTP method, and so on. And if there are fields you are not interested in and don't want to keep as labels, there is a placeholder syntax that simply discards that field. The regexp parser is basically similar to pattern, but you have to specify the expected format using a regular expression.

Next, the label filter. Once you have parsed the log stream and extracted new labels, you can apply further filtering on those labels. For instance, if we transformed the logs using the logfmt parser and now have two new labels, duration and bytes_consumed, then after the parser I can apply a label filter saying I am only interested in durations above one second and in bytes_consumed over 20 megabytes.

Then you have the line format expression. line_format lets you rewrite the log content, displaying only a few labels. In the example — container="frontend" again, parsed with logfmt — we apply line_format to structure the line differently: first the IP, then the status, and we can even apply calculations: the duration is in milliseconds and we want to expose it in seconds, so we divide the duration by 1000. Then there is the label format expression: label_format can rename, modify, and even add new labels to our modified log stream, because at this point we started from a log stream, parsed it, transformed it, added things, and we may well want to add more labels.

Finally, there are the metric queries, which take log streams and build metrics out of them. A metric query applies a function to the result of a log query. The log query returns a range vector — if you don't know what a range vector is, watch the PromQL episode, which explains that format. There are two types of aggregations: log range aggregations and unwrapped range aggregations. Let's look at the first one. Similar to Prometheus, a range aggregation is a query followed by a duration: you specify the time range over which you want to aggregate. You can apply functions such as rate, which calculates the number of entries per second; count_over_time, which counts the entries of each log stream within the given range; bytes_rate, which calculates the number of bytes per second for each log stream; bytes_over_time, which counts the number of bytes used by each log stream in the given range; and absent_over_time, which reports whether a stream went missing. The value of absent_over_time is for alerting: a missing indicator or a missing log stream can be a sign that something is wrong.
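As a hedged sketch of the pattern parser, label filter and line_format described above, applied to an access-log style line: the capture names are illustrative, and <_> discards a field you don't want to keep as a label.

    # parse an access-log style line, keep only client/server errors, and reshape the output line
    {container="frontend"}
      | pattern `<ip> - - [<_>] "<method> <path> <_>" <status> <bytes> "<_>" "<agent>"`
      | status >= 400
      | line_format "{{.ip}} {{.status}} {{.path}}"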
Let's take an example. Here we have a LogQL query, and you can see that we are summing by host, so we are already splitting the result. We are filtering the logs related to the job "mysql"; we are only interested in the lines containing "error" but not "timeout"; then we parse with json to add new labels; and last, we apply a label filter where we only keep durations above 10 seconds. Applied over a range, this calculates the number of log lines per second whose duration was above 10 seconds, summed by host.

Then we have the unwrapped range aggregations. unwrap specifies which label will be used as the metric value. There are several functions you can apply: rate, to calculate the rate per second; sum_over_time, to sum all the values within a given interval; avg_over_time, for the average of all points in the interval; max_over_time for the maximum; min_over_time; first_over_time; last_over_time; then statistical operators like stddev_over_time and stdvar_over_time; and last, quantile_over_time, to get a percentile of the data. Let's take an example with this LogQL query: quantile_over_time with 0.99 means the 99th percentile. Looking at the query: we have cluster="ops-tools1", so we are filtering on that specific label, and we add another filter, container="ingress-nginx". Then we parse using json, and we add __error__="" to drop potential parsing errors — because we are using json, there could be some. And last, we unwrap request_time over one minute. Overall, it means we are calculating the 99th percentile of the request time, split by path, from our NGINX log lines.

In this tutorial we will reuse all the steps from the part-one tutorial, where we installed Prometheus, the ingress controller and Grafana, and configured the ingress to expose Grafana and the demo application, the Hipster Shop. On top of that, we will install Loki and Promtail. We will use Promtail's default log stream pipeline to collect the logs from the pods of our cluster. We will need to update the logging format of NGINX to add extra information to our logs, like the response times and information about the service, the pod and more. Once everything is deployed, we will connect Grafana and build a LogQL query to extend our part-one Grafana dashboards with metrics extracted with the help of LogQL. All right, let's start the tutorial.
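The exact log-format line used in the tutorial lives in the episode's GitHub repository. As a rough, hedged sketch of what such a ConfigMap change can look like for ingress-nginx, with the variable list being an assumption rather than the repository's exact line:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: ingress-nginx-controller     # the ConfigMap name defined when installing the controller in part one
      namespace: ingress-nginx
    data:
      # ingress-nginx reads its access-log layout from the log-format-upstream key
      log-format-upstream: >-
        $remote_addr - [$time_local] "$request" $status $body_bytes_sent $request_time
        [$proxy_upstream_name] $upstream_addr $upstream_response_time $upstream_status
        $ingress_name $namespace $service_name

Note the advice from the video: don't wrap the value in quotes, and restart the controller pod afterwards so the new format is picked up.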
All right. As usual, like every tutorial I deliver, there is a GitHub repository related to this episode, and here we are in the ingress controller episode. We already covered part one in the previous episode, so now let's jump into part two. As explained, we won't walk through all the deployments — the ingress controller, Prometheus, the configuration of the ingress — because that was done in part one. We will mainly focus on what is left: installing Loki, and configuring the ingress controller a bit.

The first piece we need, because we want to use the logs properly, is to extend the logging — the logs generated by NGINX. As briefly mentioned in part one, when we installed the ingress controller we defined a name for the ingress controller's ConfigMap. Let me show you: if I bring up my terminal and do a quick kubectl get configmap, I can see all the ConfigMaps, including the ones for Prometheus, and the one I'm referring to is the NGINX one. If I edit this ConfigMap, you can see the details: it is currently completely empty. This is essentially the equivalent of the NGINX configuration file. Make sure to copy the line that I have added in the GitHub repository, and don't put any quotes around it — that's very important if you want it to work. Oh, I see I have a small change to make: there is a stray double quote, let me fix that.

To take advantage of the new settings, with the new log format, we need to kill the pod that runs our ingress controller. So let's get the pods — here is the ingress pod — and delete it, which forces Kubernetes to reapply the deployment: a new pod will come up and pick up the new ConfigMap. Let's get the new pod id, check it, and generate some traffic just to see whether the new log format is being taken into account. Let's open the NGINX endpoint and the Online Boutique, generate some traffic, and then check the logs.

Let's have a look at the logs generated. We have our IP address — perfect — we have the path, and now you can see the 200 status, the bytes, the response time, and then several extra pieces of information: the upstream time, so I can see the difference between the upstream response time and the overall response time; the name of the host that served the request; the name of the service; the name of the ingress; the namespace; and the service that handled it. So we have all the right dimensions. Keep in mind that Loki will collect these logs, and Promtail, as the log collector and forwarder, will also add some labels of its own.

The next step is to deploy Loki. For that, we add the Helm repo — here it is, but I already have it, so it won't change much — then run a helm repo update to refresh the repositories, and then a helm upgrade of Loki. Here it is. Let me check what we have: our three Promtail pods are running, because Promtail runs as a DaemonSet, one per node, and our Loki server is just here. Now I need to configure Grafana, so let's also get the service, because I need the name of the Loki service and its port — 3100 for Loki. That is what's required to configure Loki in Grafana, so let's grab it and set it up. It's very simple: I click on data sources, add a new data source — in our case Loki — and the URL is http, the Loki service name, and port 3100.
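A hedged sketch of those deployment steps, assuming the grafana/loki-stack Helm chart (which bundles Loki and Promtail); the release name and values are illustrative, not necessarily the exact chart used in the video:

    helm repo add grafana https://grafana.github.io/helm-charts
    helm repo update
    helm upgrade --install loki grafana/loki-stack --set promtail.enabled=true

    kubectl get pods          # expect one promtail pod per node (DaemonSet) plus the loki server
    kubectl get svc loki      # note the service name and port 3100 for the Grafana data source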
All right, so now we have our Loki data source connected, and we can jump into Explore. Keep in mind that the logs may take a few seconds to a minute to come in, so if you don't see the NGINX logs right away, don't worry, they will come; just be patient. I select the Loki data source and look at the various labels I have. I'm looking for, let's say, the pod — it should be something with nginx, and that will be our first filter. Here it is: the nginx ingress pod, and now we can see the logs coming in with their format. First, you can see there is a timestamp here, which I can show or hide — this is the default. Keep in mind that the actual log starts further in: everything in front of that is just the structure added by the runtime — it says stdout, and the F marks the start of the log line itself.

So we have already added one label filter, as mentioned, but we could also add, say, the container or the app label, so now I have two stream selectors applied. After that, I can apply a log pipeline. Remember, I need the equivalent of a grep: the first thing we do is put a line filter on "ingress", to make sure we only keep the log lines that contain "ingress" — that is, traffic that actually goes through the ingress — so that we have all the information about the service and so on; otherwise it would cause problems later.

The second thing we want to do is parse, using a parser; pattern is obviously the right one here. You put everything between backticks and then simply describe the pattern. Let me copy and paste the structure of the log format we just configured in the ConfigMap: here are the NGINX variables, and we are going to replace each of them with an angle-bracket capture. So here is the remote address, then the time, then the request, the duration, the status... and if you look, just after the URL you have the HTTP version — we don't care about that one, so let's discard it with the placeholder syntax we saw. Then we capture the body bytes sent — an interesting label — the request time, to get response times, the upstream address, and the upstream response time; the proxy host we don't care about, so we don't collect a label for it. Then we take the status — that could be interesting — the resource name, the resource type, the namespace, and the service.

All right, now we have everything; let me run it. Nothing much changes yet, because we haven't used the new labels for anything. But now let's say we want to add a label filter, since we have new labels. We have a label called... oh, the request capture grabbed more than I wanted: I need to put method first, because the method comes before the request — my mistake — so method, and then request. Let me run the query. Oh, another small mistake; all right. And now I can say I only want to look at the lines that have the new label called method, and I want it to equal GET.
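Put together, the query assembled in Explore looks roughly like this. The stream-selector labels and the capture names follow the illustrative log format sketched earlier, so treat them as assumptions rather than the exact query from the video.

    {app="ingress-nginx"}
      |= "ingress"
      | pattern `<remote_addr> - [<time>] "<method> <request> <_>" <status> <body_bytes_sent> <request_time> [<proxy_upstream_name>] <upstream_addr> <upstream_response_time> <upstream_status> <ingress_name> <namespace> <service>`
      | method = "GET"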
All right, let's have a look — it worked: the result is filtered down to only the GET requests. And if I switch to POST — I'm not sure there are any POST requests, to be honest, but let's see — oh, we have one. See how easy it is: because we did this parsing, we have this new label that lets us query even further. Let's keep GET.

So now we have our query with the ingress filter and so on. One thing you can do to check whether your pattern works correctly is expand one of the lines: that shows all the labels we have, so if there is an issue somewhere, you will see it directly there. We can see that we have a namespace here, that works fine; but this one seems to have a problem — it's a warning line. Let's say we don't want the warnings, only stdout and not stderr, so let's filter those out: add another line filter saying "does not contain stderr". Run the query — what do we have now? Only stdout lines. Let's expand one of those lines again — why am I doing this? Because I want to make sure all the labels are properly defined — and as you can see, the resource name here still has a problem. Anyway, let me grab this query, and we are going to build a new dashboard.

All right, now we have plenty of material. I'm going to do a sum by service, because we now have a service label, and combine it with rate, the per-second rate of the query results. Here it is: you can see the query. It counts the number of incoming log lines matching that pattern — we are not using unwrap here, so we are simply counting incoming log lines. And since we are matching every incoming GET request, this effectively gives us the requests per second, here for the requests with a status under 400.
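Written out, that first panel's query looks roughly like this, again assuming the illustrative labels and pattern from above:

    # requests per second per service, derived purely from the ingress access logs
    sum by (service) (
      rate(
        {app="ingress-nginx"} |= "ingress" != "stderr"
          | pattern `<remote_addr> - [<time>] "<method> <request> <_>" <status> <body_bytes_sent> <request_time> [<proxy_upstream_name>] <upstream_addr> <upstream_response_time> <upstream_status> <ingress_name> <namespace> <service>`
          | method = "GET"
        [1m]
      )
    )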
That could be interesting, so let's name this panel "number of HTTP requests per second per ingress". Let me also generate a bit of traffic... All right, now we can see new incoming traffic, counted by service. We can move this panel over there, and that is now a metric that is one hundred percent derived from logs.

Let's add another one; let's say we want response times. We already have the response times from two different fields, so we can do something similar. I'm going to copy and paste, then add the unwrap — and make sure to select the Loki data source, that was my mistake — quantile_over_time with 0.99, and then I add the unwrap at the end, because otherwise it won't work: unwrap request_time, over, let's say, one minute, grouped by service. All right, now we have it: this is basically the response time. It's very interesting that we get this purely through logs.

We can also play around with, say, the number of bytes coming in: we sum by service, because we still want to split by service, and we do a sum_over_time — summing the number of bytes — and at the end we unwrap the bytes sent. So now I am summing the number of bytes, and we have about a thousand bytes per second on this service; that can become a panel showing the amount of traffic.

So that's it for today's episode, part two of the ingress controller series. Today we discovered how to transform logs into metrics using LogQL. As you can see, there are lots of great operators, and you can filter much further by adding more labels to your log stream. We did a tutorial, and to me it's a great solution. But let's see how we can do the same thing using another log agent, collector and forwarder, like Fluentd or Fluent Bit: that will be the purpose of part three, on how to retrieve metrics from your logs using a log stream pipeline. All right, see you soon for part three.
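To recap, the two unwrap-based panels built in this last section look roughly like the following; the label and capture names again assume the illustrative pattern above rather than the exact queries from the video.

    # 99th percentile of the request time per service, over one minute
    quantile_over_time(0.99,
      {app="ingress-nginx"} |= "ingress" != "stderr"
        | pattern `<remote_addr> - [<time>] "<method> <request> <_>" <status> <body_bytes_sent> <request_time> [<proxy_upstream_name>] <upstream_addr> <upstream_response_time> <upstream_status> <ingress_name> <namespace> <service>`
        | unwrap request_time
      [1m]
    ) by (service)

    # bytes sent per service, summed over one minute
    sum by (service) (
      sum_over_time(
        {app="ingress-nginx"} |= "ingress" != "stderr"
          | pattern `<remote_addr> - [<time>] "<method> <request> <_>" <status> <body_bytes_sent> <request_time> [<proxy_upstream_name>] <upstream_addr> <upstream_response_time> <upstream_status> <ingress_name> <namespace> <service>`
          | unwrap body_bytes_sent
        [1m]
      )
    )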
Info
Channel: Is it Observable
Views: 82
Keywords: cloud, grafana, ingress, k8s, kubernetes, logql, loki, nginx, observability, prometheus, promql
Id: rrPP6ITOXt8
Length: 61min 46sec (3706 seconds)
Published: Thu Dec 09 2021