Kubernetes Troubleshooting And Management With Komodor

Captions
A while ago I explored Komodor and concluded that it is probably one of the best, if not the best, tool for Kubernetes troubleshooting. Over a year has passed since then, and Komodor has evolved a lot, so today I'm giving it a second try. Is it still a recommended tool for Kubernetes troubleshooting? What are the new features? Should you use it in your organization? And a few more things. Let's find out.

For those of you not familiar with Komodor, it is, at least from my perspective, first and foremost a Kubernetes troubleshooting tool, or at least that's how it started. Today it allows us to do much more. We can use it to monitor our clusters and send alerts triggered by specific events. It shows us the differences between the current state and previous states, it guides us toward applying best practices, it shows us the events that happened in a cluster, it allows us to edit and scale our applications, and so on and so forth. Komodor was a great tool for troubleshooting; let's see whether all the new features make it even better.

We'll start by exploring services. Let me begin by deploying a silly demo application that we'll use to see how Komodor works with the applications inside my cluster. I'm going to execute kubectl --namespace production apply on whatever is defined in the k8s directory. Simple.

This is the Komodor web UI, and I'm in the Services section. I would like to see what's happening with the application I just deployed. It is one of the, in my case, 37 services I have over there, and to simplify things I'm going to filter the services to show only those in the production namespace. There is only one application there, or what Komodor calls a "service", which is misleading because Services in Kubernetes have a very different meaning. Nevertheless, we have services, and I can see there is only one: silly-demo.

Let's see what's happening with that service. There were some availability issues early on. That's normal, because it takes a couple of moments until the application is up and running, healthy, and scaled up. That was resolved; you can see it says "closed" in the right-hand column, and now my application is up and running. You know it's up and running because it's green in the dashboard. If I click the deployment itself, I can see the status, what's going on right now, or, more precisely, the status at that moment in time, not necessarily the current one. I'll get back to the details later.

For now, I want to make an intentional mistake, do something silly, as a way to simulate what would happen if there were an issue with one of the applications in my cluster. Let me do a diff of the deployment manifest in the k8s directory, the one I'm using right now, and deployment-bad.yaml, the one I will use in a moment or two. You can see that the tag differs, and the new tag does not exist; the name of the file tells you everything. I'm going to deploy an application based on an image tag that does not exist, and if that does not trigger some warnings, nothing will.

So let's see what happens when I make a mistake. Now it's intentional, but in real life it won't be. I'm going to execute kubectl --namespace production apply on deployment-bad.yaml, and then I'm going to go back to Komodor and see what's going on. What will it tell me?

We can see that there are some new events and that I can click a link to update. That's kind of silly; it should just update, because of course I want to see the latest events. Maybe it's demanding on CPU, memory, or something like that. Anyway, let's see what that new event that just came in is. So far it looks good: the deployment of my new release just started, and there are no red flags, nothing wrong, for now. Then, looking at it a few moments later, it failed. It takes a bit of time until Kubernetes realizes that it cannot pull an image that doesn't exist. It tried.
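The exact file contents aren't shown on screen, so the manifest below is an illustrative reconstruction (names and registry are assumptions); the failing change boils down to pointing the Deployment at an image tag that doesn't exist:

```yaml
# deployment-bad.yaml - identical to the working manifest except for the tag.
# The image name and registry are assumptions; the point is that the tag
# does not exist, so the Pods will end up in ImagePullBackOff.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: silly-demo
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: silly-demo
  template:
    metadata:
      labels:
        app: silly-demo
    spec:
      containers:
        - name: silly-demo
          image: ghcr.io/vfarcic/silly-demo:this-tag-does-not-exist
```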
It failed, and now I can see in Komodor that there is something wrong with my deployment. It says "failed", and I can go and see the details of the deployment to figure out what's going on and what the potential issue is. Imagine that I do not know what I did, even though I did it. Let's see the details and find out.

This is where troubleshooting starts. There's something wrong; I go to the details, and I can see quite a few pieces of information that will hopefully help me deduce what's wrong. There is the error message: obviously, the system cannot pull the image because it doesn't exist. There is an explanation with possible causes of the issue and how we might be able to fix it, which could be very, very helpful. We could roll back to the previous release straight from the UI, which could be helpful or dangerous depending on what you're doing. If you have some kind of automation, if you use GitOps, let's say, please do not roll back from this UI. And if you use some other type of automation, like CI/CD pipelines (GitHub Actions or whatever you're using), again, do not do it this way. If you're keeping your manifests in Git, do not roll back like this; that's not a good idea. You should push changes to Git and then allow your processes to do the rollback. Some people do not have any sort of automation, and those people can either roll back from here or, better, stop deploying without automation. Anyway, you can roll back, but you shouldn't.

Further on, we can see the events that happened to that Pod, and through those events we can deduce the order of things that happened, why what happened happened, and so on and so forth. Very useful. We can see the differences between what was there before and what is there right now. What you see on the details page, without clicking any links, is actually very useful: we can see that this was the previous image and this is the current image, and that's probably what's wrong, somehow.

Then we can see the full diff. Now, the diff is great and not so great at the same time. It shows the differences between the previous state of the resource and the current one, which is great; I can see there are differences in the image, and that could be the cause of the issue. But it also shows me differences in runtime information injected by Kubernetes itself, which is noise. That shouldn't be there. Komodor shouldn't blindly diff two resources fetched from Kubernetes, because they are enriched with Kubernetes-injected information that is, most of the time, not relevant. It should show me, by default, the differences between what I applied before and what I applied now, without Kubernetes-specific runtime information, because that noise distracts me from figuring out what's wrong.

Then we have the GitHub section, which is empty right now. If I click that link, I get instructions for enriching my manifests so that Komodor knows which Git repository, which commit, and which file or directory the manifests come from. I'll comment on that later; I think this feature is silly for most users, so I'll get back to it.

Finally, there is the Links section, in case you want to add additional links. Everything I've shown so far is related to the events of a specific service. We also have a couple of other sections, like Pods: I can see all the Pods related to this application, which is great, and I can see the nodes where the Pods of the application are running. That's also great. And we have Info, additional information about a specific service (don't call it a service; it's an application). In that Info section there is an interesting, maybe slightly hidden, feature: best practices. I can go there and see that I have one warning. So let's see what all the best practices are and what the warning is, the thing I might want to improve.
Everything seems to be fine except that the pull policy is not set to Always. That's the only warning, and it makes sense: I should always set the pull policy to Always, at least by default, unless I have a special need not to. Let's see the description of the warning. I get information about what is checked and why, and I should now go and fix it. I won't do that in this video. What I will say is that I would like to see some kind of snippet: okay, this is the problem, this is the best practice you're not implementing, how about injecting this into the Deployment? Komodor already knows I have a Kubernetes Deployment and what's in it, so why not give me a suggestion of what to do? That would be awesome. Don't just tell me about the problem; tell me how to solve it, in an intelligent way, by showing me the snippet within my manifest.

Now let me apply the previous, good version of the application, and then we'll move on. But first, let me execute kubectl --namespace production apply against the k8s directory to put my application back into the correct state. Effectively, I'm rolling back, but in a good way, not through the UI.

Now let's talk about monitors. In the Monitors section we can see the monitors currently applied to my cluster. I'm interested in the cluster called dot, and right now I have only the monitors that come by default. There aren't many of them, 10, maybe 15, something like that. But what matters is not what comes out of the box, which is awesome, but that I can create my own monitors, my own triggers that will surface information when something goes wrong, based on conditions I specify myself. So I'm going to click "add rule" within the availability monitor.
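For reference, the fix for that pull-policy warning is a one-line addition to the container spec; the field is standard Kubernetes, while the surrounding names are taken from the demo:

```yaml
# Fragment of the Deployment's Pod template; without an explicit value,
# imagePullPolicy defaults to IfNotPresent for images with a fixed tag.
spec:
  template:
    spec:
      containers:
        - name: silly-demo
          imagePullPolicy: Always
```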
Then I'm going to create my own rule; I like creating my own stuff. As trigger conditions I will say... actually, let me first explain what I want to accomplish. I want a rule that triggers an alert if I have fewer than two replicas of an application in a specific namespace, because I care about the production namespace right now. How do I do that? I say the trigger condition is a number, not a percentage, and the condition fires when the number of replicas is smaller than two; fewer than two replicas effectively means one replica, or maybe zero. The trigger goes into effect only if the condition is active for more than 30 seconds, because I do not want false positives like "hey, something was bad for a second or two". What I really care about here is finding out whether something failed to scale an application in the production namespace for more than 30 seconds. Since we're talking about production, I'm going to change from cluster scope to custom scope and select the production namespace, and I want the alerts sent to my Slack workspace, into the channel named, I think, tests. That's it. I save the monitor, and from now on I have an additional monitor, an additional alerting rule.

So let's create the conditions that will generate an alert under those rules. As before, I'm going to show you the differences between the manifest I'm using right now and the one I'm about to deploy, and the difference is in the number of replicas: right now I have three replicas, and I want to deploy the application with one replica. Intentionally. Never do this in production. I want one replica so I can show you how the trigger works. I'm going to apply it by executing kubectl --namespace production apply with the file deployment-one-replica.yaml, and then, if I go back to Services and silly-demo, I should not see anything right now except that the deployment was updated,
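The file name comes from the video; its contents are assumed, but the change being applied is just the replica count dropping below the monitor's threshold:

```yaml
# deployment-one-replica.yaml - the same Deployment as before, except:
spec:
  replicas: 1   # was 3; fewer than 2 replicas should fire the availability monitor
```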
because I need to wait for approximately 30 seconds. This might be a great time, while waiting, for you to... yes, like the video, subscribe to the channel, join the channel if you can afford it, and tell your boss, if you work in a software company, to sponsor the channel. Thank you so much for that.

Now we can see the effect of the monitor: there is an availability issue over there. The trigger I created fired approximately 30 seconds, no later, after it discovered that there isn't a sufficient number of replicas. If I go there, I have checks and findings with information. Sometimes Komodor gets confused: right now it is telling me that it found zero Pods and it wants one, which is not correct; there is one Pod, and there should be more than one. But this is probably a temporary bug that will be fixed soon.

The monitors view is potentially interesting. There we see a view of all the monitors currently set up, which ones were actually executed, which ones are passing, and which ones failed. It's a graphical representation of all the monitors and all the triggers, which is potentially nice.

Now, if I go back to the Info section, I should have two best-practice violations. Before, I had only one, and now I should also see the best practice that says to run more than one Pod, since I have one replica. Let's see whether that's really the case. I'm going to Info, and indeed there are two best-practice warnings now. In the list of best practices, I can see that the deployment has only one replica, and that's bad; it should be fixed. Again, I'm not getting a snippet, not much help on how to fix it or what I should do, but what I do get is the violation of one of the best practices. I should fix that. I mean, I intentionally unfixed it, but I'm simulating failures here. I'm not pretending to simulate; I am simulating failures.
Now, how would I fix the issue of not having enough Pods? A newbie would say "hey, just change the number of replicas in your deployment", which would be correct, but a silly thing to do. I'm going to fix it the right way, or at least a better way: I'm going to scale my application by deploying a HorizontalPodAutoscaler. What I want to check is how Komodor behaves on two fronts: first, whether it will detect that my application scaled to multiple replicas; second, whether it will discover that the number of replicas in the Kubernetes Deployment changed not because I changed the Deployment, but because I applied a HorizontalPodAutoscaler, which changed it for me. A very simple, yet slightly more involved, scenario, and I want to see how Komodor deals with it.

Let's take a look at hpa.yaml. There is nothing special here; I'm just saying I want a minimum of two replicas and a maximum of six, applied to the Deployment called silly-demo, with a target CPU utilization percentage of 80. This is a silly demo, so the CPU-based scaling itself will not kick in (I would need to simulate CPU usage), but as a result of this manifest the application will scale to the minimum of two replicas, so it should immediately fix the issue. Let me apply it by executing kubectl --namespace production apply with hpa.yaml. Now, if I go to Services and silly-demo and wait for a couple of moments, I can see that the deployment is green again. The number of replicas changed, but in this case Komodor keeps freaking out: it says we now have "plus 13".
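The numbers (two to six replicas, 80% target CPU) are stated in the video; the manifest below is a sketch of how that looks with the autoscaling/v1 API, which matches the targetCPUUtilizationPercentage field mentioned:

```yaml
# hpa.yaml - scales the silly-demo Deployment between 2 and 6 replicas.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: silly-demo
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: silly-demo
  minReplicas: 2
  maxReplicas: 6
  targetCPUUtilizationPercentage: 80
```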
We don't have "plus 13" replicas, but the issue was fixed. I can go to the list of Pods and see that, yes, there are two Pods now, so the issue should be fixed. If I go to Info and the best practices, I can see that only the warning about the pull policy remained, and the one complaining about the number of replicas is fixed; it's green. But Komodor does not know how it got fixed. It thinks I modified the number of replicas of my Deployment, which is not the case: I applied a HorizontalPodAutoscaler, which did that for me. It shows me the manifest of the Deployment, which is fine, but it should show me the HPA as well, because that's what fixed this best-practice issue and, at the same time, resolved the monitor trigger I created myself.

Now let's go back to the details of the silly-demo service. Over there, there is the Scale button. You might be tempted to use that button; you might click it and change the number of replicas, in this case from two to something else. But please, please, please don't do this. This is a dangerous feature. It is good only for people or teams using the most basic Kubernetes setup ever, because Komodor thinks I specified two replicas in my Deployment, but I did not do that; the HPA did. If you changed the number of replicas here, it would modify the Deployment, that would work for a couple of seconds, and then your change would be undone by the HPA. It is very, very dangerous to modify resources through Komodor, or any similar tool, without really understanding what you have in your cluster. So beware before you use this feature.

Another useful feature of Komodor is the ability to see all the events happening in your cluster, and you can do that by, surprise surprise, going to the Events link in the left-hand menu. On the Events page we have the events of everything happening in the cluster, which is very useful, and we can filter them in different ways, which is also great. You can see that my cluster started with a lot of issues, but they were resolved; that's normal, since at the beginning a cluster needs a bit of time to settle down. We can open any of those events and see the differences between the changes performed before and after the event, and so on and so forth. Very, very useful. You have to go through Kubernetes events one way or another, so why not go through them in Komodor?

Now, one thing many similar applications struggle with, and that I think is very important, is how they handle custom resources. You might be saying "custom resources are not something I use; I'm not creating my own operators and controllers". That might be true: you're not creating your own, but your cluster is almost certainly running a bunch of custom resources, and I want to know how Komodor deals with those. To demonstrate that, or to double-check it, I'm going to use one of the projects you're likely using yourself: Argo CD. Or, if you're not using Argo CD, you're probably using Flux. Anyway, I'm going to delete the application I deployed manually, and then I'm going to deploy the same application with Argo CD and see whether Komodor can deal with that. Everything old goes out, something new comes in. I'm deleting my application by executing kubectl --namespace production delete on whatever is defined in the k8s directory, and I'm deleting the HPA resource as well; both of those I deployed manually. Then we'll take a look at the Argo CD manifest that will automate the deployment of my application, which is how I should have been doing it in the first place. So, let's cat the Argo CD application manifest. This is a typical, standard Argo CD Application; there is nothing special about it.
It says: whenever there is something in the k8s directory of the komodor-demo-2 repository, synchronize it. Easy, right? I'm going to apply that manifest with kubectl apply, and from now on I will not be deploying my application directly; I'll be pushing manifests to the Git repository, and Argo CD will synchronize them. Great. But Argo CD is not the main subject of this video; I'm interested in Komodor, and in finding out whether it can deal with what I just did. Before that, let me log into Argo CD and double-check that it's working, and yes, you can see that Argo CD deployed my application with all its resources. Okay, I'm going back to Komodor, back to silly-demo, and we can see that I got a new deployment. Great: Komodor detected a deployment; it doesn't really care whether I did it with Argo CD or by executing kubectl apply. I can go to the details and see that Komodor detected quite a few differences, namely related to annotations and labels; that's what Argo CD added to my Deployment. Great. And here comes the interesting part: it is still telling me "hey, you should add Komodor-specific annotations so that Komodor can discover the relation between your manifests and your Git repository". You might say "great, I'm going to do that", but I say don't, and let me explain why. Komodor should do it for you, or figure out how to do it, and it can do it right now. Let me demonstrate why by executing kubectl --namespace argocd get application app --output yaml. What's inside this Application resource running in the Kubernetes cluster is critical for what I'm about to show, say, or prove. Let's look at the status section, then the sync section within it; inside that there is comparedTo, within it there is source, and inside source there is repoURL.
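A minimal sketch of such an Application manifest; the repository URL and metadata are assumptions based on the description (sync whatever is in the k8s directory of the komodor-demo-2 repo):

```yaml
# Argo CD Application that keeps the cluster in sync with a Git directory.
# The repoURL and names are illustrative.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/vfarcic/komodor-demo-2
    targetRevision: HEAD
    path: k8s
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated: {}
```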
So the Git repository related to this Argo CD Application is discoverable, because the information is in that manifest. And even though targetRevision is HEAD, meaning whatever the last commit on the main branch is, the revision field gives me the specific commit that was used to apply the manifests. That means that if Argo CD has this information, Komodor can fetch it from there. You might say "hey, but this is the Argo CD Application, not my application". But my application consists of child resources of that Argo CD Application, so through the parent-child relationship you could discover all the resources related to it, and this Application, right now, is related to this specific commit. What I'm trying to say is that the Git information is discoverable, and if it is discoverable, there is no good reason for me to add annotations for something that already exists.

So I'm making quite a few points here. The first is that Komodor's Git integration demands too much work. There is no point in me adding Komodor-specific annotations if I'm using GitOps tools like Argo CD or Flux, since at least part of that info, like the Git repository, is already available at runtime through the resources of those tools. Even if I ignore GitOps and say I'm not using it, which would be silly, those annotations would make sense only if there were some sort of standard. If I put annotations specific to Komodor into my manifests, I would probably have to add similar annotations and labels for other tools as well, and I do not want that; I do not want 5,000 annotations and labels in my manifests because a bunch of tools might need them, each wanting a different format. So Komodor should do one of two things.
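To illustrate the point, here is roughly what that part of the output looks like; the field paths are real Argo CD status fields, while the values are hypothetical:

```yaml
# Fragment of an Argo CD Application's status: both the repository URL and
# the exact commit that was synced are recorded here, so the Git origin of
# the child resources is discoverable without any extra annotations.
status:
  sync:
    status: Synced
    revision: 0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b   # hypothetical commit SHA
    comparedTo:
      source:
        repoURL: https://github.com/vfarcic/komodor-demo-2   # illustrative
        targetRevision: HEAD
        path: k8s
```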
First, discover whether any existing resources in my cluster have the information it needs to identify the Git repository associated with those resources. Or, if that's not the case, start a Kubernetes SIG (special interest group), if one doesn't exist already, to create standard annotations. I do not like the idea of putting Komodor-specific annotations in my manifests, knowing that Komodor is one of many tools that might force me to do that, especially when the information is already discoverable.

Let's go back to the Komodor web UI and see what else we can find. I already commented on how the Scale button is dangerous at best, so I urge you not to use it unless you're really doing everything manually, or in break-glass situations: danger, danger, I need to fix it immediately and I do not know how to use kubectl. The same can be said for the other operations in Komodor that modify resources at runtime, like Edit YAML. Don't use them if your manifests are in Git and you have some sort of automation, GitOps or pipelines, to apply them. Those features, new at least compared to my previous review of Komodor, are dangerous at best. The good news is that we can set up users in Komodor and say "you can be an admin" or "you can be a read-only user", and that can mitigate those issues somewhat.

What else do we have? Well, there is what I'm going to call a resource browser, because that's not what the section in the menu is named, which shows me the nodes and the workloads. Within Workloads I have Pods, ReplicaSets, Deployments, and so on: the standard things, the things that come out of the box in Kubernetes. Again, this is great if you're using a very simple, out-of-the-box, non-extended version of Kubernetes. The moment you start using something like Argo CD, you will not see it over there. If you start using something like Knative, you will not see that either. You will not see anything that is not a basic, out-of-the-box Kubernetes type of manifest, because Komodor does not yet know how to deal with custom resources, which is disappointing.

Nevertheless, within the workloads we have different actions: we can fetch the logs or delete the workload. Fetching the logs: great. Delete: not so great, unless you're deleting a Pod. That is a type of action you should be able to do, because sometimes you might have a Pod that is hanging, and Pods are usually not managed directly but through a Deployment, so it's okay to delete them, as long as you don't modify Deployments or other higher-level resources. Then we have Configuration, Network, the standard stuff, and CRDs. Now, this is a mystery to me. I do not understand what CRDs mean in Komodor, because I do not care much about custom resource definitions, at least not from the troubleshooting perspective. I care about custom resources, not custom resource definitions. This is just a list of CRDs, not very useful. Komodor, please give me the ability to do stuff with custom resources, because everybody has them.

The last part of Komodor I want to explore is integrations. In the Integrations section, we can see that it integrates with commonly used tools: it can integrate with GitHub (we saw that, a bit unfortunate), with GitLab as well (probably the same story), with PagerDuty, we can add Kubernetes clusters, of course, we can integrate with Slack (I already did that), and so on and so forth. There are quite a few interesting integrations; the one I want to check right now is the integration with the Prometheus metrics server. I'm going to install that integration and see what we get. I need to give it the address of the server, so I'm going to do that; it's the internal address, through the Service of Prometheus itself, which is already installed in my cluster. And I'm going to install the integration.

Let's see what we get with that integration. If I go back to Services, and to the silly-demo application, you will see two new columns, one for CPU usage and one for memory usage. This is Komodor getting information from Prometheus and showing it in its own web UI, which is awesome, absolutely awesome, because it shows that Komodor knows how to work with other tools, one way or another. And we can extend this view with additional fields, or remove fields from it, as well.

The last thing, I promise, is to talk for a few seconds about Komodor's pricing. It's not free; it's a SaaS offering that you need to pay for, which is okay, because every project needs some kind of financing. The only question is whether the price is fair for you; that's up to you to decide. There are Business and Enterprise tiers; check out the pricing page. All I want to say is that you should know it's not free, and you should check whether the price is right for you. After I recorded the material for this video, something new happened: Komodor released a free-forever plan. So if you want to try it out, there is no financial obstacle; you can get the free plan, try it, and let me know how it works.

Now let's jump into the most interesting part: the pros and cons, and whether you should use Komodor or not. Let's start with the negatives, the things I don't like about Komodor. The first is that it does not replace alerting tools like Robusta, for example, nor does it replace Kubernetes explorers like Lens, even though it has features from both types of tools. Nevertheless, it could be useful to have those features alongside the primary goal of Komodor, which is troubleshooting. Komodor first and foremost is about troubleshooting.
But it does give us some form of alerting mechanism and some way to explore Kubernetes resources. Neither of them is amazingly good, so you still need additional tools for alerting and exploring, but those extra features are nice; if you don't need something special, you can stick with Komodor until things start becoming really complicated.

There is no support for custom resources, which is my biggest complaint, because there is no cluster without custom resources, whether you know it or not.

Next, the idea to add links as labels or annotations to your manifests so that Komodor can detect where those manifests come from, like which Git repository or which tool applied them, is great, but only as a last resort. Many tools already have that information, and Komodor should be intelligent enough to discover it instead of forcing us to write yet another set of labels and annotations.

I don't see the point of having CRDs in the menu of Komodor's explorer, or resources section. It's just a list; what was the point of it? It's a waste of space.

Next, there is no logical grouping of resources. I would like to see something similar to what the Argo CD UI shows: this is your Deployment; excellent, I know that; but that Deployment created those ReplicaSets, those ReplicaSets created those Pods, and those Pods created those containers. In Kubernetes, everything is based on some resources being parents of other resources, their children; there is a tree of resources in Kubernetes, and Komodor is not really showing it to me in any meaningful way.

Next, the Edit button should exclude the entries added by Kubernetes itself. If I'm going to edit a manifest at runtime, directly in a cluster, I should not be presented with the entries Kubernetes injected at runtime, because that's very dangerous; those are internal to Kubernetes, or at least we shouldn't modify them.

What else? Oh yes: if I do give Komodor links or information about the Git repo, which I don't want to do, but if I do, then I do not think the Edit or Scale buttons should work directly against the cluster; they should work against that Git repo. If you ask me for a Git repo, I assume you will push the changes you let me make back to that repo, but that's not what it does either.

Then there is the fact that Komodor does not really understand the relations between resources. I already complained about that, but it means that when it lets me edit something without storing it in Git, and that something is a Deployment, and it changes the number of replicas in that Deployment without knowing about the HPA, it creates a whole mess, and whoever is the sysadmin, SRE, or DevOps person in your company is going to be very, very mad at you. Those buttons to edit resources are dangerous, and when I say dangerous, I mean dangerous; especially given that Komodor wants to know about your Git repository but then doesn't do much with it when editing resources.

And the final complaint is that there are too many predefined things in Komodor. Having a predefined, out-of-the-box experience is always great because it's easy, but there is no ability to extend that predefined look and feel, and the information available, with something custom. I was complaining about custom resources, but the same applies to other things: hey, I integrated with Prometheus; wouldn't I like to bring some additional custom fields from Prometheus into Komodor? I would, but I can't.

Enough complaining; let's talk about the good things. To begin with, Komodor is potentially the best Kubernetes troubleshooting tool available today. It was a year ago, when I reviewed it last time, and it still is. The additional features are a bit shaky, but the core functionality, which is troubleshooting, is absolutely amazing. Next, we have integrations; that's a really nice addition. We had integrations before; now we have more of them, which is even better. We have an improved, or better said simplified, installation process, which is great; I didn't show the installation itself, but trust me, it is much better and easier than before. And finally, we have a new UI that looks and feels a bit better than before. Those are nice improvements over what we had in the past.

All in all, Komodor might be the best, or one of the best, Kubernetes troubleshooting tools available today. I've been very, very harsh toward Komodor in this video, but that's not because I don't like it. I do. It's because I think it's a great tool, and I want great tools to become even better, so I did my best to find things that could be improved rather than to merely complain. Komodor wants to expand its scope, which is great, to go beyond relatively simple and straightforward troubleshooting, which is also great, and now it needs to polish that expansion. All in all, give Komodor a try. Try it out and let me know what you think of it, or, if you're already using it, let me know what you think of the improvements it has made, or of the improvements I suggested. Should the Komodor folks implement them? Let me know in the comments. Thank you.
Info
Channel: DevOps Toolkit
Views: 4,669
Keywords: devops, devops toolkit, review, tutorial, viktor farcic, k8s, kubernetes, komodor, troubleshoot, troubleshooting, troubleshoot kubernetes, troubleshooting kubernetes, troubleshoot k8s, troubleshooting k8s, observability, observability kubernetes, observability k8s, alert, alerting, alerting kubernetes, alert kubernetes, alerting k8s, alert k8s, management, manage kubernetes, manage k8s, kubernetes management, k8s management
Id: kZPOtz85_zI
Length: 39min 18sec (2358 seconds)
Published: Mon Jan 23 2023