AI for Kubernetes with ChatGPT and k8sgpt

Captions

I did my best to avoid making a video like the one you are watching right now, or are about to see. I fought hard against entering the artificial intelligence hype cycle. This is the first time I'm even uttering the word AI in any of my videos. Nevertheless, artificial intelligence is coming. No, heck, it's already here. There is no denying it. It will change the way we live and the way we work. The only question is whether the technology is ready or we, the consumers, should wait a while longer before adopting it. Long story short, I was avoiding AI until now, and now I want to see whether I should continue avoiding it or it's time to embrace it.

Now, let me be clear from the start. I'm not interested in AI for general usage. Actually, I am, but not today. Today I want to explore whether AI can help us develop, manage, or troubleshoot applications. Actually, even that is too broad, so let me narrow it down a bit more. Today I want to explore whether AI can help us with Kubernetes workloads. So, Kubernetes and AI: can we combine those and get a tangible benefit?

[Music]

The most common usage of AI in the software industry today is as a replacement for Stack Overflow. In the past, when we had to write a piece of code, find the cause of an issue, or do anything that we did not know how to do, we would go to Stack Overflow, search for a similar question posted by somebody else or, if there was nothing similar to what we were experiencing, post a new question. From there on, we would go through the answers and pick the one that suited us best. A significant amount of today's code was created by copying and pasting answers from Stack Overflow and similar services. Many issues were solved through Stack Overflow, and many more were created by blindly applying the recommendations from there. Stack Overflow and similar services are probably the biggest originator of code, configuration, and fixes today, second only to GitHub.

Then came AI and ChatGPT. All of a sudden, people started asking AI questions, getting answers
and applying the solutions. If someone asked me who is the main competitor to Stack Overflow, I would say ChatGPT.

But there is a problem, though. ChatGPT is not where we are. We work in terminals and Visual Studio Code. We observe the system in Grafana. We talk to each other in Slack. Now, we can get a lot of information from ChatGPT, but that means a change of context. We would need to switch from wherever we are to ChatGPT in a browser and ask questions. More often than not, we need to switch back and forth a couple of times: go to the terminal, find events of the workloads that are misbehaving, copy them, go back to ChatGPT, paste them, get answers that are not what we are really looking for, go back to the terminal, look for more events, copy some more, go back to ChatGPT, and so on and so forth. That's not what I want. That does not work for me. What I want is AI being where I am, and that is, to begin with, a terminal.

So I set out to find a solution that would allow me to interact with AI from a terminal. I wanted a solution that would scan my Kubernetes cluster, find potential issues, explain them to me and, ideally, fix those issues or, if that is not an option, propose solutions. So: scan, explain, fix or propose. Now, to be clear, I do not expect AI to be good enough to fix the issues by itself, but one can always dream, so I'll keep that requirement. After all, that should be the end goal of AI. It should work for us. It should do the work we do not want to do, or the work it is better than us at doing.

With those requirements in mind, I set out to find a solution, and I found one candidate. It's called k8sgpt, and we are about to see whether it is worth using or is yet another gimmick.

I already have a cluster with an application that might be working perfectly or might be broken. It's a mystery that I would like to solve with k8sgpt. So let's start by analyzing the namespace of the application and see what we get. We can see from the output that it detected two distinct issues. There is
something wrong with the service. It claims that the endpoints are not ready, so it cannot communicate with the pods. The second issue is with the pods themselves, or pod in this case. A replica of my application cannot pull the image.

Now, this is both good and bad news. The good news is that it did identify two issues. I might have more, but it's a good start. The bad news is that it is not very intelligent. I expected it to triage, to filter issues. I hoped that it would push me in the right direction. Since the pods are not running because they cannot pull the image, the issue with the service is irrelevant. The problem is not with the service but with the pods in this specific case. What I'm trying to say is that it is doing the right thing in detecting issues, but it is not helping me narrow down the scope.

Now, before we proceed: I do not think that there is AI, artificial intelligence, involved in this phase of the process. AI is hopefully coming later. For now, we are dealing with a simple analyzer that does not seem to be very smart and is nothing I haven't seen before. Nevertheless, even though the results are not exceptional, I can say that the first of my requirements is met. Scanning: checked.

Now let's see whether k8sgpt can explain those issues to me. It takes a few moments for k8sgpt to contact the API of the AI model of choice. We can use many models, and today I chose to use everyone's favorite, GPT. Now, the outcome is really nice. I got the explanation of the issues and I got proposed solutions. This is the error, and those are the steps that might fix it. This is another error, and those are the steps that might fix it as well. That's cool. I like that. I would have gotten the same results from ChatGPT, but this is much more convenient. I did not have to copy events and logs from the terminal to the browser and back. That's really useful, especially for those who already use ChatGPT or some other similar service.

Now, if you happen to have trouble with English, you can switch to a different language. I'm sure
that that's not the case with you, since you're listening to me speaking in English with a very strange accent. Nevertheless, I can, for example, instruct it to respond in, let's say, Spanish, and then waiting, and waiting, and there we go. For those of you who are having difficulties with English, you can now get explanations in Spanish or any of the other supported languages.

We can also filter the analysis by the type of the resource and, for example, say that we want to see only the issues related to pods. There are quite a few other things we can do, and I will leave it to you to explore all the arguments.

Instead, let me modify the deployment YAML in the hope that I will fix the issue. So I will open the deployment YAML, change the application image to the correct one, and apply the changes with kubectl, Kustomize, and so on and so forth. I expect that the analyzer will now stop complaining about the issues it detected previously. Those should be fixed. So let me execute k8sgpt analyze again. It still complains about the service and the pods, but now it also detected issues with the deployment, and that sounds strange. I was sure, I was completely sure, that I fixed the issue. I do not know what to do now but to just drop it, forget about k8sgpt, and take a look at what's going on the old-fashioned way. No more AI for me for now. I will get back to it.

So I will start by retrieving all the pods in the namespace. Now I can see that the old pods are still crashing. That's okay, that was to be expected. But I can also see that a new one was created, that's the one from after I modified the image, and that its status is CreateContainerConfigError. It seems that there is a new issue at hand, so let me describe that pod. Now it is pretty clear what's going on. The pods expect the secret with the key postgres pass, which is not there. So let me take a look at the secret itself. And there we go, I found the cause of the issue. The secret has the key postgres password, and I
configured pods to look for postgres pass, without "word". That should be easy to fix. I'll just open the deployment patch YAML, replace postgres pass with postgres password, and rerun k8sgpt analyze. This time there were no problems detected.

Nevertheless, that was a disappointing experience. It correctly detected the first issue, but once I fixed it, it completely failed to find the second one, even though it was staring at it. It was looking at the issue but it couldn't find it, and it was a simple, very obvious issue. And that changes things. Now I can safely say that it does, so far, help with explaining issues and proposing fixes. That's the AI part, and in my case I'm using GPT, but you can use any other model. However, I was wrong to say initially that it correctly scans for the issues. It does not. To be more precise, it does technically scan for issues, but it does not detect all of them. As a matter of fact, it failed to detect the one that matters, the one about the secret key, and it detected one that does not matter, the one about the service. So explanations are checked, proposing fixes is checked, but scanning: not checked anymore.

Now let me undo the changes I made, the fixes, and try it all again, but with a twist. Instead of executing commands to analyze Kubernetes resources, explain issues, and propose fixes, it would be great to have an operator that would do that for us. As a matter of fact, I already installed it, and all that's missing is to configure it. So let's take a look at the configuration file k8sgpt.yaml. The spec is a reflection of what we do with the CLI, so there's probably no need to explain it. Instead, I'll just apply it with kubectl, etc., etc., etc. Now, instead of executing k8sgpt analyze, we can list results and we can describe them to get the same information as we would get with the k8sgpt analyze CLI command. And that's cool. That's really cool. I prefer clusters doing the work, so having an operator is a great addition.

There is a dark side to it, though. k8sgpt is badly documented. The
documentation does not explain all the options we can use in the values.yaml file, and the project repo is not much better. For example, there is the option to send notifications to, for example, Slack through webhooks, but it is unclear how that works or what the options are. As a matter of fact, the example in the docs has that part, sending notifications, but it has it commented out, and the same thing is not present in the values.yaml file of the repo. So it's a bit confusing how to send notifications, and sending notifications is a must, right? I don't want to look at the screen all the time. I want to get a notification when something is wrong.

k8sgpt analysis is based on filters, and we can see them through the filters list command. Some of them are activated by default, while others can be activated if we need them. So if you would like to analyze network policies and horizontal pod autoscalers, we can activate them with the command filters add and then network policy, horizontal pod autoscaler, and so on and so forth. Whatever you want to activate can be activated as long as it is available in the list of filters.

Now, you might say that the list of filters is not that big and that you might be missing some additional filters, and you would be right. You would be a hundred percent right. Kubernetes is much more than what comes packaged by default, and you would certainly want to analyze custom resources installed through third-party tools or applications like, for example, Istio or Argo CD, as well as those you developed. So the bad news is that the list of filters is very limited. The good news is that this is a project in very, very, very early stages, and I expect that list to grow over time. Moreover, it is open source, so you can contribute to it as well.

And that brings me to integrations. Right now there is only one integration, with Trivy. If you activate it, it will add additional filters. So if you think that you might need additional filters, as you probably do, you can create your
own integrations, which will add additional filters. It's open source, so do not wait for others to do it for you. That's about it.

Now comes the interesting part. Let's talk about k8sgpt based on the experience we had with it so far. Before I set off to find a solution to troubleshoot and manage my Kubernetes clusters with AI, I had a few requirements, as you heard at the very beginning. I wanted it to scan for issues, to explain them, and to either fix them automatically or to propose fixes. So let's go through those requirements. Are they met?

To begin with, k8sgpt is doing scanning, but not, at least not yet, very well. So even if it is technically solving my first requirement, it does not do it well enough for me to consider that requirement fulfilled. So scanning is a no-go, which is a pity, since everything else is based on it.

Explaining the issues found through the scanner works well. To be more precise, it works or it does not work depending on how good the AI model you're using is. ChatGPT, the one I'm using, is not perfect, but it's good enough. k8sgpt is sending the information found through analysis to the AI model and spitting out both the explanation and proposed fixes. The results from k8sgpt will not solve all your problems, not even close, but they are a good starting point. So I'm proclaiming the second (explain) and the fourth (propose) requirements fulfilled.

Finally, there is my dream that AI will one day not only propose fixes when I ask it to, but also fix issues without waiting for me. We are not there yet. We are not even close to it. So that requirement, that dream of mine, is not fulfilled. That's not even in the scope of what k8sgpt is trying to do. Still, that's what I want, one day, sooner or later, so the requirement still stands.

So what do I think of k8sgpt? I think it's cool. For those using ChatGPT or similar services, it is certainly a better solution than copying and pasting outputs into browser-based ChatGPT. So if you're a ChatGPT user, I would recommend you give it a try. On the other
hand, k8sgpt is still at the stage where it is more of a gimmick than a serious solution, and that's okay as well. It is a project in very, very, very early stages, and you should keep that in mind when we go through pros and cons. Think of cons not as complaints but more like suggestions for improvement. We should encourage early-stage projects to grow and improve rather than judge them. Judging should come later. So let me repeat it one more time: today's cons are not complaints but suggestions for improvements. So here we go.

The first con is that sometimes, actually more often than I would like, results disappear. The CLI sometimes reports no results even though the previous execution might have shown them. The same is happening with the operator. You will see the results appear and then disappear, only to appear again a few moments later. So if you're not persistent, you might think that everything is okay because k8sgpt is reporting that everything is working perfectly, only to find out that it is not, and then see it report that all is well again, and so on and so forth. So it's a bit inconsistent; output disappears every once in a while.

Next complaint, next con, next room for improvement, is analysis. Analysis is currently the weakest point of k8sgpt. In my example, it did not report the issue with the secret key, even though it's obvious from the events and the status of the pods that Kubernetes tried to use the secret incorrectly. Also, it does not have weights for the issues it reports. Even though the issue reported for the service was correct, it was not very relevant, given that the real issue was with the container image. I'd love it if analysis were more intelligent. Not much, just slightly more intelligent than spitting out some, but not all, of the issues it finds. So the problem with analysis is twofold: it has no intelligence or weights, and the filters it uses are very, very limited.

The next one, the third con, is that it is not fixing issues. I know that k8sgpt is supposed only to
propose fixes and not to be an operator working inside the cluster and applying artificial intelligence to fix the issues. I know that we are far from that. Still, that must be the end goal of AI where Kubernetes is concerned. I have no idea when that will come, but I'm sure it will. So I will keep this requirement, not only for k8sgpt but for any other AI solution for Kubernetes. I want it to start solving problems, fixing them itself, not telling me what I could or couldn't do.

Next, there are no events and no statuses in the operator resources. The operator stores results in spec.details, which is okay when describing the resources but not when trying to integrate with other tools like kube-state-metrics. I expect any Kubernetes custom resource to have events and statuses so that I can treat it as any other resource. Otherwise, we are not benefiting from the standards imposed by Kubernetes and followed by most of the other tools in the ecosystem. We are losing one of the main benefits of Kubernetes, which is the promise that everything should work with everything else without the need for special integrations.

Now let's move on to the good things, the pros. The first pro is analysis. Now, I listed analysis in cons, yet I'm listing it here as well. It is actually useful. Analysis is the base of everything else k8sgpt does. My complaint is that it is not as good as it should be. Still, it is useful as is and is expected to become much, much, much better in the future. So it's useful, and that is a pro. It's just that it's not as good as it should be, so it's a con as well.

Next we have the CLI. This is an easy win. It is so much easier for the CLI that already discovered issues to make API calls to ChatGPT, or whichever model you're using, than to copy and paste the outputs into browser-based ChatGPT. The CLI is a clear win.

Then we have various AI models. Even though the name k8sgpt might suggest that it works only with ChatGPT, it can actually use quite a few AI models. That gives it
a lot of flexibility and does not tie the project to a single model that might or might not be the best one in the future. Heck, you can even use it with models running locally and avoid paying for all those API requests.

All in all, the project is in early stages. It's cool, it's useful as is, but it's not yet reliable enough to depend on. I would love to see it grow and improve.

Here's what I would like to see next. The ideal situation would be to be able to connect k8sgpt with observability tools. It might make sense to export analysis results as metrics or logs, hopefully through OpenTelemetry. From there on, outputs from the operator could be stored in your observability storage of choice, like Loki, Elastic, Prometheus, and so on and so forth, and observed together with the rest of the observability data, you know, through Grafana dashboards, for example. At the end of the day, I do not look for issues with CLIs but either through observability tools or by receiving notifications. Even if notifications are sent to, let's say, Slack, I will still need k8sgpt to be integrated with observability tools, because that's where I'm going when I get notified that something is wrong.

The future is, without doubt, in AI. Today we might be limited to detection of issues and explanations, but that is not the end game. In the future, I expect AI to fix the issues, not only to help me understand them. Until that day comes, you should try k8sgpt. It is useful even though it is more of a toy at this stage than something you can depend on. Just be aware that you might need to pay for all those API calls unless you're still in the trial period in, let's say, ChatGPT, or you're okay running models yourself.

Thank you for watching. See you next time. Cheers.
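
The video asks for analysis that assigns weights to issues, so that a pod failing to pull its image is surfaced ahead of the resulting complaint about the service. A minimal sketch of that idea follows. It is only an illustration and not part of k8sgpt: the `Issue` type, the `WEIGHTS` table, the `triage` function, and the resource names are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical weights (an assumption for illustration, not a k8sgpt feature).
# A higher weight means the resource kind is more likely to be the root cause.
WEIGHTS = {"Pod": 3, "Deployment": 2, "Service": 1}


@dataclass
class Issue:
    kind: str     # Kubernetes resource kind, e.g. "Pod"
    name: str     # hypothetical resource name
    message: str  # short description of what the analyzer reported


def triage(issues):
    """Return the issues ordered so that likely root causes come first."""
    # Unknown kinds get weight 0 and therefore end up last.
    return sorted(issues, key=lambda i: WEIGHTS.get(i.kind, 0), reverse=True)


# The two findings from the demo, in the order the analyzer reported them.
issues = [
    Issue("Service", "my-app", "endpoints are not ready"),
    Issue("Pod", "my-app-1234", "cannot pull the image"),
]

ranked = triage(issues)
for issue in ranked:
    print(f"{issue.kind}/{issue.name}: {issue.message}")
```

With weights like these, the image pull failure is listed before the service issue, which is the narrowing of scope the analysis step did not provide in the demo.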

Info

Channel: DevOps Toolkit

Views: 12,676

Keywords: devops, devops toolkit, review, tutorial, viktor farcic, k8s, kubernetes, chatgpt, gpt, ai, artificial intelligence, k8sgpt, kubernetes ai, ai in kubernetes, kubernetes with ai, kubernetes artificial intelligence, artificial intelligence in kubernetes, kubernetes with artificial intelligence, kubernetes and artificial intelligence

Id: 3Mmw2PyY9j0

Length: 23min 32sec (1412 seconds)

Published: Mon Aug 07 2023