Kubernetes events

Captions
Welcome to Is it Observable, the YouTube channel providing tutorials on how to observe a given technology. Today's episode is part of an existing series related to a cloud technology named Kubernetes. We already have several episodes where we explain how to retrieve metrics with the help of Prometheus, how to retrieve logs with Loki, and more. Today I will cover a very useful angle of Kubernetes observability: events. If you enjoy today's content, don't forget to like and subscribe to the channel.

Let's see what you're going to learn out of this episode. We'll do a short introduction about events and the lifecycle of the various Kubernetes objects. Then we will go over the structure of the event object when using the API — and you'll see that it's not only the API. We will touch on a few types of events that we should pay attention to, and I'd like to present a couple of solutions available for retrieving events. I will focus mainly on two of those solutions, because at the end, as usual, we'll do a tutorial using them — I'm referring to the event exporter and kspan.

Let's start with the introduction. As you may all know, Kubernetes generates lots of events related to deployments, our workloads, the scheduling of our pods, and more. Kubernetes events are a really rich source of information for understanding what is currently happening in our cluster: if a pod has been killed or restarted, just by looking at the events we get a pretty good understanding of what's going on. To investigate an issue you can describe your pod — kubectl describe pod followed by the name of your pod — and you will see all the various events. You can also use another command, kubectl get events, which lists the events for a specific resource. Usually, events are the first thing to look at to determine whether you have an infrastructure problem or an application problem.

Every Kubernetes object goes through several states until it reaches the desired state. As we explained in the first episode on Kubernetes metrics, the master node and the worker nodes have several core components that allow Kubernetes to orchestrate our workloads on our servers. If you remember well, the scheduler schedules pods on the nodes, the controller manager detects state changes and reschedules a pod in case of a crash, and etcd stores the status of the various Kubernetes resources — note that events are only retained for one hour by default. All those core components orchestrate our workload, and every restart, update and so on is driven by events, so events are really important for understanding a given situation.

Let's take a really short example. If I deploy a pod, everything starts with the scheduler. The scheduler will try to identify the node where it has to start the pod, based on the resources, the memory, the taints — there are a lot of factors that determine which node is selected. At that particular moment the pod is in the Pending state. Once the scheduler has identified the right node, the pod moves to a creating state. To start this pod — which is, of course, based on containers — we need to pull the image from a Docker registry; in fact it's the node that pulls the image from the external registry. The scheduler also has a preference for scheduling pods on nodes that already have the image, because it's faster. Once the image has been pulled, the pod changes state to Running.
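As a quick illustration of the commands mentioned above — a minimal sketch; substitute your own pod name and namespace:

    # Events recorded for a specific pod (shown at the bottom of the output)
    kubectl describe pod <pod-name>

    # All recent events in the current namespace, oldest first
    kubectl get events --sort-by=.metadata.creationTimestamp

    # Only the events involving one object, and keep watching
    kubectl get events --field-selector involvedObject.name=<pod-name> -w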
If for some reason your pod crashes — due to application errors, a memory issue, whatever it is — the controller manager will reschedule the pod. But if the pod crashes again and again, and Kubernetes restarts it several times only to hit the same error, the pod will go to a specific state: CrashLoopBackOff. I'm pretty sure you know this one. You can also have pods stuck in Pending. What does that mean? It could mean that there are no resources available on our nodes, or that there is an infrastructure problem on them, so the scheduler is not able to find a node to host our pod.

Pods also have, as you may know, liveness probes and readiness probes. Those probes are there to help Kubernetes determine the state and the health of our pod. You can use an HTTP probe: you define an endpoint on your container, and this endpoint returns a response that Kubernetes can check — /health, for example, or /ready. The component in charge of checking health and readiness is in fact the kubelet — yes, the kubelet reaches out to all those endpoints. A probe could also simply be a program to launch. You can also define an init container with a specific image, so Kubernetes will first run the init container and then start the other containers, and you can likewise have a hook that is executed just before the pod is killed or stopped.
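Here is a minimal sketch of what such probes look like in a pod spec — the endpoint paths, port, and image name are assumptions for illustration:

    apiVersion: v1
    kind: Pod
    metadata:
      name: probed-app          # hypothetical name
    spec:
      containers:
      - name: app
        image: my-app:1.0       # hypothetical image
        ports:
        - containerPort: 8080
        livenessProbe:          # kubelet restarts the container when this fails
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
        readinessProbe:         # kubelet stops routing traffic until this succeeds
          httpGet:
            path: /ready
            port: 8080
          periodSeconds: 5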
If you put a wrong image in your deployment file, the node tries to pull an image that doesn't exist, so it will just get errors. The same happens if your Docker registry requires authentication and you didn't pass the right password, or the password has changed: the node won't be able to pull the image, so your pod will never reach the Running state. If you describe your pod, you will see the ImagePullBackOff event there.

Let's jump into the structure of the event from an API perspective. As mentioned, events can be retrieved with the help of the Kubernetes API — either with a plain HTTP call or with kubectl. In fact kubectl just interacts with the API, so when you describe a pod or get events, the structure of the event object you're receiving is exactly the one you would get over HTTP. There are a few attributes, or properties, on the event: you have the message, the details about what's going on; the reason — there are clearly some hard-coded reasons in the Kubernetes code, ImagePullBackOff for example; the type — warning, information, error; the object involved in the event and its kind — it could be a pod, it could be a StatefulSet, things like that. And a nice detail: Kubernetes is not pushing the same event again and again — once it has detected an event, it just counts the number of occurrences. There is also the source: the component that detected the event. It could be the kubelet, it could be others — the source is a very interesting piece of information as well.

What are the various types of events? You have information events: a pod has been scheduled, an image has been pulled, a node is healthy, a deployment has been updated, a replica set has been scaled, a container has been killed — pure information. Then you have warnings: a pod that has errors, a persistent volume that is not bound yet — potential future errors. Last is the error type: the node is down, the persistent volume is not found or we're not able to connect to it, we cannot create a load balancer on the cloud provider, for example — all the errors that make the entire workload not ready. And if I build my own app and I know it will be running on Kubernetes, I can also push my own custom events through the Kubernetes API.
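A sketch of what a single event object looks like when retrieved through the API — the names, timestamps, and counts below are made up for illustration, but the fields are the ones just described:

    apiVersion: v1
    kind: Event
    metadata:
      name: my-pod.16c4f9b2d3a1    # hypothetical
      namespace: default
    type: Warning
    reason: BackOff
    message: Back-off restarting failed container
    involvedObject:
      kind: Pod
      name: my-pod
      namespace: default
    count: 12                       # occurrences, not one event per occurrence
    firstTimestamp: "2021-09-30T09:00:00Z"
    lastTimestamp: "2021-09-30T09:12:00Z"
    source:
      component: kubelet
      host: node-1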
So what are the Kubernetes events that we should probably pay attention to? I'm not going to list all of them, because there are tons, it would be very boring to listen to, and it wouldn't bring any value. Let's focus on the events that are important, or that could be considered a problem. Of course, if you think there are more relevant events we should pay attention to, put a comment on the video or send me feedback — I would love to update this content.

The first one is CrashLoopBackOff. We have all experienced that type of event, and I've already explained it in the introduction: our pod is crashing and we need to know why, so we describe it and figure it out. The second type, which is usually either a configuration problem or a pure human mistake in the deployment file, is ImagePullBackOff: the node is unable to retrieve the image. Then you have the events related to evictions. An eviction event happens when the node determines that pods need to be evicted, or terminated, to free up resources — CPU, memory, and so on; you can also put taints on your nodes, which can likewise cause pods to be evicted. Note that when Kubernetes evicts a pod, it's not supposed to kill it and never restart it — it will try to reschedule the pod on another node. Then you have the events related to persistent volumes: FailedMount and FailedAttachVolume. Some pods require persistent volumes — persistent storage, for a database for example — and this event will prevent the pod from starting, because if there's no volume for the database, there will of course be major errors at the application level. Basically Kubernetes checks that the desired volume is accessible, and only if it is will it let the pod run. Then you have FailedScheduling events, which happen when the scheduler is not able to find any node to run your pods. And there are events that are clearly dedicated to nodes: NodeNotReady, Rebooted, HostPortConflict, and so on.
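Field selectors are a handy way to zoom in on exactly these events — a minimal sketch:

    # All warning-type events in the namespace
    kubectl get events --field-selector type=Warning

    # Only scheduling failures
    kubectl get events --field-selector reason=FailedScheduling

    # Only events involving nodes, cluster-wide
    kubectl get events --all-namespaces --field-selector involvedObject.kind=Node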
Let's jump into the various solutions available to retrieve events. You have eventrouter. You have kubewatch — kubewatch is interesting: it's a Kubernetes event-watching tool that tracks resource changes in your cluster, and it supports notifications, so it can publish to Slack, HipChat, webhooks, and more. That's pretty useful, because it notifies a third-party product that something is going on in your Kubernetes cluster. You also have the Kubernetes event exporter, Sloop, and kspan.

Today I would like to highlight two solutions specifically. The first is the event exporter. It exposes event counts in a Prometheus format: you deploy a pod that, of course, requires the right cluster role to be able to interact with the API; it retrieves the events, counts them, and uses a Prometheus client — we mentioned that in our PromQL episode. It basically creates two counters that count the number of events per type, with different labels, so you can build a lot of interesting queries. The event exporter is a much simpler concept, but it's pretty powerful. A lot of observability solutions out there already support this type of feature — Dynatrace, for example, has it — but we will still deploy it in our tutorial; the rest of the tooling will be based on Dynatrace.
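With those counters scraped into Prometheus, you could chart event rates per type and reason — a sketch, assuming the metric and label names mentioned in this episode (kube_event_count and kube_event_unique_events_total; check your exporter's /metrics output for the exact names):

    # Rate of unique events over the last 5 minutes, split by type and reason
    sum by (type, reason) (rate(kube_event_unique_events_total[5m]))

    # Warning events per involved object kind
    sum by (involved_object_kind) (kube_event_count{type="Warning"})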
Let's continue with the other solution of the tutorial: kspan. kspan is a project built by Weaveworks. The concept is simple: kspan runs as a pod, and that pod of course needs the right cluster role — similar to the event exporter — to interact with the API and collect events from the various resources of your cluster. If you have any custom resource definitions, don't forget to also add those new resources to your cluster role, so kspan can pick up the events of those objects too. kspan collects events — no surprise — and turns them into OpenTelemetry spans, then tries to join them and group them into an OpenTelemetry trace, which is really cool, to be honest: you can understand precisely the list and the history of the events that happened. kspan creates those spans and then exports them to a collector. At the moment it uses the OpenTelemetry gRPC protocol, so if your backend doesn't currently support OTLP over gRPC, it makes sense to export to an OpenTelemetry collector that receives the traces and then forwards them to your solution. Dynatrace currently supports the OTLP HTTP protocol, so in this tutorial we will deploy an OpenTelemetry collector: kspan sends the traces to the collector, and the collector sends them on to the Dynatrace API that ingests OpenTelemetry traces.

Like every tutorial we do on Is it Observable, you will have a dedicated GitHub repository with the various instructions, so you can follow along at home at your own pace. Today's tutorial is about events, and as explained, I want to use Dynatrace to send over the kspan traces — that will be a good exercise — and I also want to showcase the event exporter. The event exporter could of course be connected to Prometheus, but here we use Dynatrace, which also has a similar built-in feature; the point is more about showing you how to ingest Prometheus metrics with Dynatrace as well.

First things first: if you are not a Dynatrace user, no problem — you can go to Dynatrace and start a free trial, and you will get a fresh Dynatrace tenant for this tutorial. In my case I already have my own Dynatrace tenant available. What I have to do here is connect a new Kubernetes cluster — the one we use for this tutorial. I'm going to click on Kubernetes — I already have one cluster available — and I can click on "Add cluster" to add a new one. Let's give it a name — isitobservable, something like that. It's a Kubernetes platform, and you need two tokens; I can let Dynatrace generate those tokens for me, so I don't have to worry about it. In my case I'm using GKE, so I'm going to enable the volume storage option as well. Here you can see there is a command available, so the next thing I do is copy it, bring up my terminal, and simply paste it. Now Kubernetes monitoring is successfully set up, and I can click on "Show deployment" to check on it — you need to generate some traffic, and once traffic has been generated you will see your new cluster on this page.

So the deployment of the cluster monitoring with Dynatrace is done — you saw it was very easy, not a big deal. Two things I want to do now. First, I want to enable a few features, so I jump into the Kubernetes settings — there are a few things I'd like to adjust for our isitobservable tenant. I want to enable "Monitor annotated Prometheus exporters", because that is what we are going to use today. Another thing I like to do is monitor the events: we are going to collect them with Prometheus, but we also get events natively in Dynatrace. I also want to use those events in analysis and alerting, and include all the events for Davis — Davis is the Dynatrace AI — so in case of a major issue or problem related to the cluster, I will be notified by the Dynatrace AI as well. I'm going to save this. Oh yes, one thing I need to change: you see "requires a valid certificate" here — in my case I will just disable it, because this is not a production cluster, just a tutorial cluster, so no big deal.

First step done: we have a cluster and we have deployed the Dynatrace operator. Now I want to deploy the event exporter. Let me bring up the source code briefly — the one you will find in the GitHub repo. Before we do that, let me quickly walk through the project: here is the URL of the event exporter's GitHub repository, and it explains everything, including the metrics it exposes in Prometheus format — the two major ones being kube_event_count and kube_event_unique_events_total, the two counters I described before. In my repo I have already added the deployment files, and everything matches what you can find in the project's GitHub. The only thing I have added: by default you get the Prometheus annotations on the deployment of this component, and just underneath I have added a few annotations to let Dynatrace automatically scrape the metrics from this exporter — Dynatrace won't reach out to Prometheus directly; it will reach out to the Prometheus exporter that we are going to deploy.
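A sketch of the relevant fragment of such a Deployment manifest — the prometheus.io keys are the de-facto convention, while the metrics.dynatrace.com keys are from memory, so double-check them against the Dynatrace documentation:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: event-exporter           # hypothetical name
    spec:
      selector:
        matchLabels:
          app: event-exporter
      template:
        metadata:
          labels:
            app: event-exporter
          annotations:
            # Standard Prometheus discovery annotations
            prometheus.io/scrape: "true"
            prometheus.io/port: "9102"
            prometheus.io/path: "/metrics"
            # Dynatrace annotation-based scraping (verify exact keys in the docs)
            metrics.dynatrace.com/scrape: "true"
            metrics.dynatrace.com/port: "9102"
            metrics.dynatrace.com/path: "/metrics"
        spec:
          containers:
          - name: event-exporter
            image: <event-exporter-image>   # see the project's GitHub repo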
All right, that was what I wanted to show you, so the only thing left is to apply this fresh new deployment. In our GitHub repository there is a folder for the event exporter with the deployment file in it, so you just run kubectl apply on that file. It's been created, so let's look at the pods: I can see it's in ContainerCreating... and now it's Running, so I've got the event exporter. I'm going to do a very quick check that the metrics are exposed and that I'm able to collect things, to make sure everything works as expected. To do that I'm going to port-forward. Why? Simply because I want to verify that on port 9102 there really are Prometheus metrics exposed. So I expose port 9102 from the cluster on my localhost, then go to the browser and open localhost:9102/metrics — and here it is: I've got metrics, and I can see it's currently reporting a couple of events from my cluster. Everything is working as expected, wonderful. I can stop the port-forward.
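The quick smoke test from the terminal — a sketch, assuming the deployment is named event-exporter and exposes port 9102 as above:

    # Forward the exporter's port to localhost (deployment name is an assumption)
    kubectl port-forward deploy/event-exporter 9102:9102

    # In another terminal: the counters should appear in plain Prometheus format
    curl -s http://localhost:9102/metrics | grep kube_event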
Now I'm going to jump into Dynatrace to see whether some traffic has been generated. Here it is: isitobservable — I can see my workload, and the events have already been reported. Now that our cluster is monitored by Dynatrace, the quick question is: where do I see those Prometheus metrics? Good question. I recommend going to Metrics, which lists all the metrics available in Dynatrace, and searching for one of the metrics from our fresh exporter — kube_event something. You can see I have kube_event_unique_events_total, the one we were looking for, and there is a notion of dimensions: the labels that we would find in Prometheus when writing PromQL are transformed into dimensions, in Dynatrace terminology. That will help me filter, split, and do a lot of different operations; it also reminds me which aggregations I can apply. You can see we already have a couple of metrics coming in, which is positive. I could create a chart from here, but I want to show you the way I usually prefer: the metric explorer, under "Explore data". Similar to before, I just type kube_event, select the metric, and if I run the query I get the same chart we saw before. What's interesting here is that I can split on the dimensions that have been collected through this Prometheus ingest — for example the involved object name: if I rerun the query, you can see it has split by object name, one of the exposed labels.

You can do pretty much everything. Let's say I want to look at the reason and the type — the type being warning and so on; here you can see a split like "carts / Unhealthy / Warning", all three splits at once. You can then apply various aggregation operators — count, and so on. This is the wizard that helps you build the query, but you can also do it as code — there is a query language for this — and if you are interested in a tutorial on how to query metrics in Dynatrace, let me know: I could build a dedicated tutorial for that query language as well. Of course, I could also have sent the metrics to Prometheus and done a similar thing in Grafana. One more split that could be interesting: the object kind. Here you can see Scheduled on a ReplicaSet, Pulling, and so on, with the pod kind at the end. I can reorder if I want, of course, and instead of a plain graph I can render a pie chart: here it basically says there are 19 Unhealthy/Warning events on pods. The ordering is pretty wrong and I should fix it, but this just gives you an example of what you can do.

All right, first piece done. Let's move on to the other one: kspan. kspan is, as mentioned, a project by Weaveworks, and it's still experimental — that's why you won't find a getting-started guide in the GitHub repo. They did publish a blog post recently explaining it, but again, keep in mind that it's experimental; it will probably evolve and improve in the future. I think it's a pretty interesting project, because I really like the fact that you can see those traces in OpenTelemetry format. So let's look at it by deploying it on our cluster.

To do the deployment, the first thing I need to do in my Dynatrace tenant is generate an API token for this exercise. I have already done it, so I don't have to do it twice, but keep in mind: follow the instructions to get your API token, and it needs at minimum the "Ingest OpenTelemetry traces" scope — which makes sense. That's why we deploy the OpenTelemetry collector, and in the end the flow is very simple: kspan collects the events, transforms them into spans, sends them to the OpenTelemetry collector, and the collector exports those traces directly to Dynatrace. Let's go through it one by one.

In the OpenTelemetry collector deployment file there is a section I'm going to show you directly — it will be easier. The config is basically a sort of pipeline describing how the traces are received and how they are exported. The pipeline has a receiver; I don't have any processors, since I have no intention of changing anything on my traces — I just want to send them over; and I have two exporters. One is just for logging purposes — you could remove it if you wanted, but for debugging I think it makes sense. Don't worry, I will do a dedicated episode on OpenTelemetry explaining everything about collectors, so I'm not going to explain it all here. What I need is an exporter that interacts with the Dynatrace OpenTelemetry trace-ingest API, and for that, of course, you need your API token. That's why we have to adapt this deployment file: you can edit it manually, but I suggest running the sed commands you can see on the screen. Let me do it: I've exported my two variables, so the only thing left is to run the two sed commands — the first one, then the second one. Now our files are updated.
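For reference, a minimal sketch of such a collector config — the pipeline shape matches what was just described, while the exact Dynatrace endpoint URL and token placeholder are assumptions to verify against your tenant:

    receivers:
      otlp:
        protocols:
          grpc:                # kspan speaks OTLP over gRPC

    exporters:
      logging:                 # debug exporter, safe to remove later
        loglevel: debug
      otlphttp:                # Dynatrace ingests OTLP over HTTP
        endpoint: https://<your-tenant>.live.dynatrace.com/api/v2/otlp
        headers:
          Authorization: "Api-Token <your-api-token>"

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: []       # nothing to transform, pass spans through
          exporters: [logging, otlphttp]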
I could apply the collector deployment as it is, but there are a few things we need to do in Dynatrace first to properly ingest those traces. It would work without them — that's not the problem — but some of the information sent over OpenTelemetry won't be stored by default: for security purposes, Dynatrace does not store the span and resource attributes of your traces unless you explicitly declare them. kspan sends details related to our events — the event, the kind, similar to what we described at the API level — so if you don't define those span attributes, telling Dynatrace "please store these", it won't store them automatically; and the same goes for the resource attributes. Where do you do that? It's very simple: in the Dynatrace settings screen there is a section called "Server-side service monitoring", and in there, "Span attributes" — spans being everything related to OpenTelemetry. You can see I have already created them; you simply add the names of the attributes you'd like to store. I do the same with the resource attributes: service name, service instance. If you don't do this, the consequence, as explained, is that Dynatrace won't store the information — because it could in fact be sensitive — and that would make the exercise not very useful: we wouldn't understand which object is involved in the event, the kind, or the reason, and it would be unproductive.

Now that the settings are done, we have a few things to deploy. First, the roles for kspan: what's used is basically a ClusterRole that allows listing and getting the events from the API on the various objects — that's what you'll find in this file. I'm not going to walk through it; I'm just going to apply it, and now I have the ClusterRole and the ClusterRoleBinding defined. Second, I deploy our OpenTelemetry collector — that also deploys the ConfigMap, the Service, the Deployment, and everything. And last, kspan itself, which will interact with the OpenTelemetry collector. Now that kspan is deployed and connected to Dynatrace, we need to generate some workload. For that we're going to deploy Sock Shop — the instructions are in the repo, so I simply copy and paste that part and deploy it on my cluster. Let's have a look at the pods of this new application to see that it's running — normally I should now be collecting some traces in Dynatrace.
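To recap the apply sequence — a sketch; the file names are hypothetical stand-ins for the manifests in the tutorial repo, and the Sock Shop manifest URL is the project's public demo deployment:

    # RBAC for kspan (ClusterRole + ClusterRoleBinding)
    kubectl apply -f kspan-rbac.yaml

    # OpenTelemetry collector (ConfigMap, Service, Deployment)
    kubectl apply -f otel-collector.yaml

    # kspan itself
    kubectl apply -f kspan.yaml

    # Sample workload to generate events
    kubectl create namespace sock-shop
    kubectl apply -f https://raw.githubusercontent.com/microservices-demo/microservices-demo/master/deploy/kubernetes/complete-demo.yaml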
Let's move to Dynatrace and see what we have. All the distributed traces are exposed in the menu called "Applications & Microservices", in a section called "Distributed traces", and you can see there are several things coming in. Here is a deployment update from a few seconds ago; if I expand it, it's in the sock-shop namespace, and in fact, yes, I used kubectl, so it makes sense. Because we didn't do one big deployment but several deployments, each of those deployments has its own trace. Let's pick one of them: here I have a deployment update, there's a ReplicaSet, and you can see the pod has been Scheduled, then Pulled, Created, Started, Pulled, Created, and so on — and then Unhealthy, because we have a health probe. Looking at the timing, I'm able to see exactly what happened across the various phases. Because I did an update, the trace covers the whole deployment — so if you make a change to a deployment you did two weeks ago, the overall timing will span two weeks, which would be quite impressive. Let's pick another one — this eight-second one, say. Here you can see there is a persistent volume update, so I can see all those various events coming in, which is pretty interesting, to be honest. This trace is much easier to read: the update took about eight seconds, and you can see all the phases — pod Scheduled, pod Failed, then pod rescheduled, and then it was really able to start. It's very interesting, because in the end you really get a feel for the process of how the pods come up.

So it's an interesting project. In terms of usage, to be honest, I would prefer the ability to build graphs and alerting, so the metrics approach with the exporter makes sense. But keep in mind that if you're using Dynatrace, those events are transformed into metrics natively, because they are collected natively — so you don't necessarily need the event exporter if you use Dynatrace.

That's it for today's episode on Kubernetes events. I hope you learned quite a few things from today's tutorial — the advantages of kspan, eventrouter, and the event exporter. Try it yourself, let me know what you think of those solutions, give me your feedback, and if you like the video, don't forget to like and subscribe to the channel. See you soon for another episode. Bye!
Info
Channel: Is it Observable
Views: 364
Keywords: kubernetes, k8s, dynatrace, kspan, events, prometheus, observability, Cloud, Kubernetes, opentelemetry
Id: K-F-p-ekSsM
Length: 44min 25sec (2665 seconds)
Published: Thu Sep 30 2021