How to collect logs in k8s with Loki and Promtail

Captions
Welcome to Is it Observable, the YouTube channel providing tutorials on how to observe a given technology. In today's episode we will talk about one of the most popular technologies in the cloud-native space, one we already started to cover in previous episodes: Kubernetes, of course. Last episode we covered how to collect Kubernetes metrics with the help of Prometheus; in this episode we will cover how to collect logs from your Kubernetes cluster, extending our observability.

So let's see what you are going to learn from this episode. First, the importance of logging: we need to define what standardized logging and centralized logging are. Because this episode uses one specific solution, Loki, we also need to define a few concepts: Loki's architecture, of course, and the fact that Loki can utilize different log collectors and forwarders. The default one is Promtail. There are plenty of other agents out there, and the ones you probably think of are Fluentd and Fluent Bit, but I think they deserve a dedicated episode. Last, we will finish with a tutorial where we install Loki on the existing Kubernetes cluster that we deployed in the previous episode with Prometheus and Grafana, connect Loki to Grafana, and build dashboards with the help of LogQL.

So let's see the importance of logging. Every developer relies on reading logs to get debug, info, and error information. Logs are crucial: they carry a lot of meaningful information, like the time spent in a method or the stack trace in case of an error. If you look at the source of the logs, there is so much information we can use to understand how our system is behaving and to be more efficient when we need to troubleshoot.

To take advantage of logs, there are a few things to consider. First, standardized logging. What is it? For ages we have been writing applications, and those applications wrote logs into files, but there was no standard: my application could write to its own folder somewhere on my disk or network, and if you don't pay attention and let the application keep writing logs, after a while you can run into disk issues. How can we avoid this? There are several options.

The first one: let's say I don't want to write to a log file, I want to take advantage of those log collectors and forwarders. That makes sense, but remember, we didn't put any standard in place, which means the log collector and forwarder will be in charge of taking the logs from wherever they are stored, processing them, and forwarding them to a central solution that stores them. Because there is no standard, you will have to configure all those endpoints, all those targets the agent has to scrape the logs from, and it's going to be annoying to go through all those configurations. If we don't put standards in place, it will simply be time consuming for us.

What are the other options? Instead of writing into a file, we could code it directly in the application: I don't want to write to a file anymore, I want to send the logs straight to the centralized solution that stores and indexes them.
That makes complete sense; I'm actually going one step further. So let's do it: we take a couple of days, we add a few lines of code in our application, and ba-bam, we get our logs in the right location. Perfect. But there is a disadvantage. There's always someone in your organization who comes in one morning and says: we don't want to use solution A anymore, it's too expensive or it doesn't make sense, we're going to use solution B. The consequence is that you have to go through all the code you've already written to change the destination of your logs: instead of sending them to A, you now have to send them to B. That doesn't make sense; it's very expensive just to manage your logs.

So how can we make it smarter? The best way is to let our application write logs in a standard way. What is that standard? If I write a bash script in a Linux or Unix environment, put an echo in it, and run the script, you will see the logs arrive in your terminal: the echo sends the output to stdout, and stdout prints it in the console. In the container world it's the same thing: if I run an application that simply writes its logs to standard output and I type the command docker logs, the logs come up in my terminal, just like with the bash example. In fact, the Docker runtime takes the logs sent to stdout and manages them for us: it writes them into a log file, and in the Docker world that file is stored under /var/lib/docker, inside a folder called containers, with one individual folder per container named after the container id. This means that if we run in containers, all the logs we write to stdout are pushed to that file. So we have a standard: pushing logs to stdout clearly gives us a standard (a small command-line sketch of this follows below).

Now, we know Kubernetes uses containers, which means we can take advantage of that standardization: if you write your logs to stdout, boom, the files end up in the right locations, and we don't have to deal with dozens or thousands of locations to collect those logs. So now it's clear what standardized logging is; let's look at centralized logging.

We know where the logs are located, so we could build a process that runs on each of our nodes and collects them. That makes sense, but actually it doesn't: there are so many great solutions out there that it doesn't make sense to reinvent the wheel. Why not take advantage of the components named log collectors and forwarders? You can hear it in the name: you have a collector and you have a forwarder. The collector obviously collects, but it doesn't only collect, it also transforms. Remember, we could write our logs in our own format, so we probably want to transform them into a standard shape, and since we are running in a Kubernetes environment we probably also want to add some context details, like the pod name, the namespace, the node name and so on. Then, once we have the desired information, we forward the log streams to a solution in charge of storing the logs and probably indexing them.
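Coming back to the stdout standard for a moment, here is a minimal command-line sketch of the idea; the container name hello-logger is just an example, and the on-disk path assumes Docker's default json-file logging driver, so it may differ on your runtime:

    # Run a container that only writes to stdout
    docker run -d --name hello-logger alpine sh -c 'while true; do echo "hello from stdout"; sleep 5; done'

    # The Docker runtime captures stdout for us
    docker logs hello-logger

    # With the default json-file logging driver, the same lines land under
    # /var/lib/docker/containers/<container-id>/
    CID=$(docker inspect --format '{{.Id}}' hello-logger)
    sudo tail /var/lib/docker/containers/$CID/$CID-json.log

Kubernetes nodes follow the same idea: the container runtime writes each container's stdout and stderr to files on the node, which is exactly what an agent like Promtail can tail.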
There are a lot of solutions out there, and most of them, like Elasticsearch and Splunk, rely on indexes. Indexes are great because they let you run a lot of queries on the logs, but they have a consequence: they simply consume more memory. Today I wanted to use this episode to present one of the promising solutions in the logging area, called Loki.

Loki is a project built by Grafana Labs. Basically, they wanted to provide the market with a logging solution that is easy to use, reliable, and performant. We can say that Prometheus is used for metrics and Loki is the Prometheus counterpart for logs; a lot of people in the community compare Loki to Prometheus, and you'll see there are a lot of similarities between the two.

When you deploy Loki, several components are deployed on your cluster. First, naturally, you have the Loki server. The Loki server acts as the storage: it stores the logs in a time-series-like store, very similar to Prometheus, but it does not index the log content itself. Loki is clearly a data source to store logs, so you're not going to use Loki itself to visualize them; Loki acts as a data source for Grafana, and you will use Grafana to create dashboards with the help of LogQL. We will briefly introduce LogQL, but it clearly deserves an episode of its own.

Then, alongside the server, you have agents deployed as a DaemonSet, which means on every one of your nodes there is a pod collecting your logs. In terms of agents, Loki is pretty flexible: it supports various collectors and forwarders, but the default one is named Promtail. Promtail, as expected, collects the logs and forwards them to Loki. There are various forwarders supported by Loki; the most popular ones are Fluentd and Fluent Bit, but today let's focus mainly on Promtail.

So let's look at Promtail. Promtail performs several actions when collecting logs: it discovers the targets producing logs, attaches labels to the log streams, and pushes them to Loki, which stores the logs. Promtail has a configuration file, typically config.yaml or promtail.yaml; when you deploy it with the Helm chart, this configuration is stored in a ConfigMap. In the configuration file there are several things you specify. First the server settings: which ports Promtail listens on. Promtail also exposes HTTP endpoints: one where you can push logs to another Promtail or directly to the Loki server, a health endpoint, and, last and very useful, a /metrics endpoint. Promtail naturally exposes metrics in the Prometheus format, so you can track the number of bytes exchanged, the streams ingested, the number of active and failed targets, and so on.

Then you have the client configuration, which specifies how Promtail connects to Loki. There is another important setting, the positions. What does that mean? Positions are required to make the pipeline reliable: while Promtail collects the various log streams, it records how far it has read in a positions file. Why? Because if Promtail crashes for some reason, when Kubernetes relaunches it, it reads the positions file to figure out the last thing it collected and resumes the work from there, so you won't end up with duplicated log streams stored in Loki. Then you have the scrape config; scrape config should sound very familiar as well.
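Before going further, here is a minimal sketch of what a Promtail configuration can look like. The section and field names follow the Promtail documentation, but the values (ports, paths, label names) are illustrative, not the ones the Helm chart actually generates:

    server:
      http_listen_port: 3101                       # serves the /metrics and readiness endpoints
    clients:
      - url: http://loki:3100/loki/api/v1/push     # where the log streams are forwarded
    positions:
      filename: /run/promtail/positions.yaml       # lets Promtail resume where it left off
    scrape_configs:
      - job_name: kubernetes-pods
        kubernetes_sd_configs:
          - role: pod                              # discover the pods running on the node
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace]
            target_label: namespace
          - source_labels: [__meta_kubernetes_pod_name]
            target_label: pod

The Helm chart ships a more complete version of this in the Promtail ConfigMap, including the pipeline stages that parse the container runtime's log format.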
The scrape config defines the jobs that are in charge of collecting the logs. Then you have the relabel config, which controls what to ingest, what to drop, and what kind of metadata you want to add to the log stream. You can also automatically extract data from your logs: for example, if your logs expose the time spent in a method, the number of bytes exchanged, or any similar statistic, you may want to extract that and expose it as a metric, like Prometheus does.

Let's now look at the configuration of Loki. Loki also has a configuration file, stored, like Promtail's, in a ConfigMap. The configuration lets you specify where to store the data (directly on a volume, or on a cloud provider if you prefer) and how to configure queries, for example the timeout of a query, the max tail duration, and so on. When deploying Loki with the Helm chart, all the configuration expected to collect logs from your pods is there automatically. Of course, if you need to change how your logs are transformed, or if you want a filter to avoid collecting everything, you will definitely touch the Promtail configuration and a few settings in Loki. But in our example we don't need that; we clearly want all the logs, to then visualize them in Grafana.

Now let's jump into the tutorial. Like every tutorial we deliver with Is it Observable, you will find it on github.com, in an organization called Is It Observable, and within that organization there is one repository for each individual episode. Here, obviously, it's the episode about Kubernetes and logging with Loki. Everything is described step by step, so you can do it on your own, at your own pace. The only detail is that we are not going to install a fresh new cluster and redeploy Prometheus and Grafana; we already did that in the previous episode, so let's take advantage of it and start the tutorial from the step where we install Loki on our Kubernetes cluster.

To install Loki on our Kubernetes cluster, I would recommend having a look at grafana.com, in the Loki documentation, where the details of Loki's Helm chart are described. There are various types of deployment with the Helm charts: you can deploy Loki on its own, or Loki with Promtail, Grafana, and Prometheus. Basically, pick the installation that makes sense for you. For us, we're just interested in installing Loki with Promtail, which is almost the default one.

So let me bring up the terminal. I'm just going to run a kubectl command to show you that we have Prometheus and Grafana installed, like we did in the previous episode. Now let's jump to the step where we deploy Loki. The first thing I'm going to do, and I've already done it, is add the Grafana Loki Helm chart repository to my repos; since it's already there, I don't have to do it again. Next I would recommend doing a helm repo update to make sure you're always referring to the latest version. And last, we obviously install Loki; it takes a few seconds, and that's it, we have it. Let's do a kubectl get pods: we can see that we have a Loki server pod, which is the pod related to the Loki server, the storage.
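For reference, the commands in this step look roughly like the following; the chart name loki-stack and the release name loki are the ones the Grafana Helm repository used around the time of this episode, so double-check the current Loki documentation before copying them:

    # Add the Grafana chart repository and refresh it
    helm repo add grafana https://grafana.github.io/helm-charts
    helm repo update

    # loki-stack deploys the Loki server plus Promtail as a DaemonSet
    helm install loki grafana/loki-stack

    # Check the result: a Loki server pod plus one Promtail pod per node,
    # and note the Loki service name for the Grafana data source
    kubectl get pods
    kubectl get svc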
Besides that, we have one Promtail pod per node; in my case I have three nodes, which is why there are three pods for Loki Promtail. Now I'm just going to display the services. Why? Because I need the exact name of the service. Why do I need that? Because we're going to configure Grafana to connect Loki to Grafana as a data source, and since they are in the same namespace I don't need anything else, I can just use the name of the service. That's great, it simplifies things a lot.

So let's jump into Grafana. Here is Grafana; I'm already in the configuration section, and as you can see there is the data sources section, where I'm going to add a new data source. There are various categories of data sources, and the interesting one here is on the logging side: we're going to select Loki. The default port is 3100, and the endpoint for our Loki server is the name of the service in our cluster because, like I said, Grafana and Loki are located in the same namespace and the same cluster, so there is no issue connecting the two. So that's it, the Loki service on port 3100, and I'll hit Save & Test to check that the data source is properly connected, which it is.

Now that we have connected Loki to Grafana, let's go to Explore just to look at what we have. In Explore you can see that we have both Prometheus and Loki, so make sure you select the Loki data source. There are two modes, Metrics and Logs: if you're just querying the logs you can stay in Logs, and from the moment you add functions with LogQL to build a graph out of your logs, you have to switch to Metrics in Grafana's Explore.

The first thing I'm going to do here: there are a couple of helpers that let you figure out how to filter on the various jobs, apps, and labels. What I want to do is collect the logs related to the application we installed on the cluster, which is Hipster Shop. Now you can see all the log streams arriving on our screen; there are lots of lines from stdout and stderr, and both are collected by default by Loki. Let's say we first want better filtering based on the labels. Let me select one of the log streams that makes sense for me, this one; it seems this log stream comes from the frontend service, so I can also add a filter, for example on the app label frontend, which I did. I didn't change the settings of Loki, but I could have configured it to add a new label that differentiates stdout and stderr, so I could easily filter on that; I didn't do it here, but we could change those settings to get better filtering.

Now, if you don't have the labels, you can still filter. Here I'm going to add a pipe, like a grep: if I type the |= operator it means I'm looking for log lines containing a given string. Here I'm interested in collecting stdout, so I'll type stdout and filter on that. If I rerun, you can see it already highlights the lines that contain what I'm looking for, and if you look at a line, it has the information I'm after: there are a few fields about the status code, the response times, the bytes, and you can see the severity here is debug.
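The queries built so far in Explore look roughly like the LogQL sketches below; the label names (namespace, app) depend on how Promtail relabels your pods, so treat them as illustrations rather than exact queries.

Every log stream from the demo application:

    {namespace="hipster-shop"}

Narrowed down with an extra label:

    {namespace="hipster-shop", app="frontend"}

Plus a line filter, similar to a grep:

    {namespace="hipster-shop", app="frontend"} |= "stdout"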
So let me add another filter here: another pipe, and we add "debug". That helps me filter further, so now I'm only keeping the stdout lines at the debug severity level. Then, if I want, I can also parse the log stream as JSON, which means all those properties you see here, for example http.request, are now treated like labels, and I can apply extra filtering on top of them. If I just add the json parser like this you won't see much, to be honest, but if I add a label filter I can say, for example: let's take http_resp_status, and let's look for one of the values in the lines, say 500; equals 500, right. If I run that query, it filters down to exactly those lines. So you can see that with these functions I was able to turn a field into a label and add extra filtering on it.

There are plenty of functions, and LogQL is a big topic, so we're probably going to do an episode dedicated to it, but I can still use, for example, a rate function. For those familiar with Prometheus, you probably know this function; it needs a sampling range, so I'll use 30 seconds. Once I have this, I'm after a metric: rate is going to give me a metric, so a graph, and I can run that query. Oh, there is an issue. Oh yes: because I used the JSON parser, the conversion can also introduce errors, so you have to filter those out as well. Just for the sake of showing how to plug a metric onto this, I'm going to remove the filtering on the status field, and now it counts the number of log lines per second arriving with the filters I've applied. The filters are stdout, debug, and Hipster Shop. You can see that I didn't have much at first, but at one point I had around 95 log streams per second arriving in Grafana. That could be one rate; you could be very creative and extract metrics that can be used in a dashboard. (The queries used in this part are sketched just after this section.)

Now that we have that query in Explore, I want to take advantage of it, so I'm going to jump into the dashboard we built in the previous episode and add a new panel. It will use Loki again as the data source, and I'll paste in that query. Then I can use a graph visualization and, if I want, customize the display a bit, add bars or points or more precision; let me put some bars to have more colors. You can do a lot of things there, but that's pure Grafana, not necessarily related to our goal of bringing the logs into Grafana. Let's give it a name: "Hipster Shop debug streams per second". It's not a super relevant KPI, but it shows the idea. As you can see, there was nothing at the beginning because we had just freshly deployed Loki, but now we can see a decent amount of logs and I can follow it over time.
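To recap this part, the parser and metric-style queries from the demo look roughly like the sketches below; the label names and the http_resp_status field come from the Hipster Shop frontend logs shown in the video, so adapt them to your own log format.

Parse the line as JSON and keep only 500 responses:

    {namespace="hipster-shop", app="frontend"} |= "debug" | json | http_resp_status = 500

Drop lines the JSON parser could not handle:

    {namespace="hipster-shop", app="frontend"} |= "debug" | json | __error__ = ""

Count log lines per second over a 30-second window, which turns the logs into a graphable metric:

    sum(rate({namespace="hipster-shop"} |= "stdout" |= "debug" [30s]))

Once a query returns a metric like this, you can paste it into a Grafana panel just as you would a Prometheus query.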
Again, this was just an example, but by looking at the logs there is a lot of meaningful information you can collect; you can also figure out how many stack traces, errors, or exceptions you're having just by querying them. All right, that's it for today. We learned a couple of things: the essentials of logging, what Loki is, and moreover how to connect Loki with Grafana. We've seen that there are a lot of interesting things we can achieve with LogQL, so we are clearly going to do a dedicated episode to cover those aspects. If you enjoyed this video, don't forget to subscribe, like the video, and stay tuned; there will soon be more episodes on Is it Observable. See you soon, bye.
Info
Channel: Is it Observable
Views: 1,357
Keywords: kubernetes, k8s, logs, loki, grafana, cloud, observability
Id: XHexyDqa_S0
Length: 28min 45sec (1725 seconds)
Published: Thu Aug 05 2021