Troubleshooting Kubernetes Pod Errors | How to find them and fix them

Captions
What's going on inside those pods? [Music] [Music]

That's right, this week we're going to be looking at troubleshooting pods. We're just going to have a little dig in: look at logs, look at events. I'll create some broken things and we'll have a look at how we can find the information we need to fix those broken things. Obviously, the problem is that your issues won't reflect what I'm demonstrating here. I'm pre-creating issues, so, you know, pinch of salt and all that.

And speaking of troubleshooting: last Sunday I had some pretty bad chest pains, and by Tuesday I was in hospital being troubleshot. They were doing ECGs and chest scans and all sorts, and about two hours ago I got back from a CT scan. So hopefully everything's okay, and I feel fine now, but I think as a matter of precaution they're just checking everything over. There might be a bit of a gap between a couple of videos if things go a little bit downhill, but hopefully everything's okay. So, that aside, let's take our mind off it by troubleshooting some pods.

This will be a fairly quick video, because there's not a massive amount I can really show you in terms of simulating things going wrong. There are a few common things that might happen, but to be honest, it's beyond those common things that you're going to hit issues you just won't be able to find an answer to online. You'll just have to do some log diving, essentially, to figure out what's going wrong. Hopefully this will give you the tools to start that search.

So with that in mind, I've got my dnsutils pod here. I've deleted the original one that was in the cluster because it hasn't been relevant since, I don't know, episode 15 apparently, so that's gone now. What I'm going to do is replace this command here with apk update. The reason I'm going to do that is because apk is for Alpine Linux and we've got Ubuntu 22.04 being run, so when I deploy this we should see an error for the pod starting up. It should be a CrashLoopBackOff or something similar to that. This is just one of those things where the pod's trying to start up and it can't, because something has gone wrong within it.

So we'll go ahead and apply this. I'll do kubectl apply -f and the 23 dnsutils file. I'll jump back over to my terminal and we'll take a look at the pods in the namespace learning and see what's happening. We can see here, look, we've got a RunContainerError going on. Now let's take a look at this pod. We'll describe it first: kubectl describe pod dnsutils, namespace learning. Let's see what that says. We can see it was successfully assigned, it pulled the image, it created the container, and then, oh, it failed: unable to start container process, exec apk update, executable file not found in $PATH.

Even if you're new to Linux, this will be a pretty simple one for you to fix. You can see straight away that it's trying to run this command but the file hasn't been found in the path, so it can't run it. It just doesn't know where it is; it doesn't know anything about it. So that's a pretty easy one really. We can go, okay, well, does it have apk installed? Well, it's Ubuntu, so the answer is no. I'll go ahead and delete that now, we'll just get rid of that, and we'll take a look at another error that might occur.

Another one that might happen (I'll just quickly revert that change) is that you may pull an image with a tag that doesn't exist, or something like that. So I'll just stick another number on the end of the tag there and we'll apply again: kubectl apply -f 23, and yeah, that's fine. Now if we do kubectl get pod, namespace learning, and see what's going on, we'll see this ErrImagePull coming up in the status. So we'll get dnsutils, describe that, and see what's happening. It's failed to pull the image, saying it's not found: failed to pull and unpack the image from this location. Well, that's useful to know. We can have a look to see if that exists. We know that docker.io is Docker Hub, or we should know that, and all we have to do is take a look at Docker Hub at the official Ubuntu image and see what exists. And yeah, it's right: there's a 22.04 but there isn't a 22.042. My fingers have hit an extra number, and unfortunately it has not been able to pull the image down. You might be thinking, yeah, but you've just put that number in, of course I'm not going to do that. Well, you might be surprised. Sometimes things happen and you can hit an extra number accidentally. So that is one example.

I'm going to show you another one now. What I'll do is quickly create an image, put it into my private Docker Hub, and then try to pull it down and see what happens. So I can just do a docker pull of ubuntu 22.04 (didn't need an "at" there, I wanted a colon). I'll pull this down and then I'll tag that image. I'll do docker images and grep for ubuntu (I'll spell ubuntu right), and we'll get that image ID. Then I'll do docker tag with the source image and then the destination, which is just going to be my Docker Hub username, slash ubuntu, colon 22.04. All I need to do after that is docker push, so I'll bring that command up and push it.

While that's pushing, I'll quickly log into my account. In this account I can click on this image, and all I need to do is manage the repository and make it private. Let's have a quick look; it might be under settings. Make private, there we go. I type the name of my repository, which is this. Oh, is it just "ubuntu" it needs? Yeah, that's right. So we've made it private.

So now I've got this private repository, and I'm going to try to reference it and see if I can pull it down. I'll set the image to my username, forward slash, ubuntu:22.04. Brilliant. In theory it would work if it was public, but it isn't public. To deploy this I'm going to delete the current one and then reapply it. We'll jump back over to the terminal, kubectl get pod, namespace learning again, and we've got this ErrImagePull. Let's describe it. Ah, okay, a slightly different error. We still get that ErrImagePull, but this time it's saying failed to pull and unpack image, failed to resolve, and then: pull access denied, repository does not exist or may require authorization.

Well, that's an easy fix. We've not done it yet in this series, but actually it's a really easy fix. All we need to do is add an image pull secret. So we add imagePullSecrets, and then I can add a name of something like docker-regcred, and then go ahead and create that secret. Now, I could just type this in, but for the sake of showing you how this works, if we have a quick look over here we can see that we have this "Pull an Image from a Private Registry" page. I'm not sure if I've done this before, but just in case, I'll quickly show you. All we need to do is create this secret here. I'll grab that command, we'll dump that in, and let's take a look. It's from-file, and the path for mine is /home/drew/.docker/config.json. I'll press enter, and then we can check that the secret exists, just to confirm it actually created it. Okay, so we've got that secret. It's actually called regcred, so I'll just change that in the manifest; I'd spelt it wrong anyway, by the looks of it. So it's regcred, and it's of type kubernetes.io/dockerconfigjson. That's great, that's come up right, and I know that I have this file, so it would have created it from there. If you want to have a look at it, you can. It's just a base64 encoding of your registry credentials from the config.json, and that's it.

So we should just be able to apply this now, but I'm going to delete the pod first just to clear everything out, and then we'll apply. Jump back over to the terminal, let's get that pod. Ah, we've got an ErrImagePull again. Why might that be? Have I done docker login? Yeah, that works. So maybe something else has gone wrong. Let's describe that pod and take a look again. It's still telling us that it may require authorization. Interesting. Let's have a think about this.

I know what I've done wrong already, by the way. I'm just seeing if maybe you spotted it yourself, and if you didn't, don't worry, I'm going to show you now. Let's take a look at that create statement again. What's missing from here? We've got kubectl create secret generic regcred, from this file, type kubernetes.io/dockerconfigjson. Yeah, it looks good, right? Well, it kind of does. Let's have a look through our description here. We've got a pod in the namespace learning and it's looking for this secret. Ah, well, you may have just spotted it; I'm hoping you did. Like I say, I already knew that I'd done this; it was more about showing you some troubleshooting. Namespace learning: this is important. The secret must exist in the same namespace as the thing it's being attached to. So what I actually need to do is create that secret in the namespace learning.

Another good step is to make sure we delete the pod first, so we're clearing everything out, and then apply it again. Let's jump back over and take a look. It's always a good thing to just take a look by default; you never know. We can describe that pod, and it's still giving us access denied. So maybe we need to check that secret and make sure everything in there is okay. I'll do kubectl get secrets, namespace learning, and it's called regcred. Well, that exists. So I'm going to output this as YAML now. I'm not going to show you the results of this, because it will have some sensitive information in it from my side, so just bear with me a second. Right, there's a base64-encoded value here, so I'm going to grab that, echo the base64-encoded bit, pipe that into base64 decode, and see what's going on in there. Again, you won't see a lot of this, but hopefully it'll give you an idea of what to look for.

Now, I can see in here that under index.docker.io we've got this blank section, and we're also using this credsStore of desktop.exe. Well, that's no good, because that means the credentials are blank, so actually the regcred is not working. I need to wipe this out, recreate the secret with the correct bits in, and then see what's going on. So I'll open up the Docker config.json, and in here (obviously I'm blocking a lot of it out) what I'm going to do down here on the credsStore is delete that bit, which means that when I log in it will store the credentials in that file. I'll do a docker login again now and enter my username. Then over on the Docker website I can create a personal access token. I go to account settings, then security, and then we've got the access tokens here. I appear to already have had one at some point, but that's clearly not working for me for some reason, so I'm going to delete that. We'll create a new token, I'll call this docker-cli-wsl, and yeah, I want it to have all that access. I'll copy that token and paste it in here, and that's great. It's going to give me the warning because it's telling me the password is stored unencrypted in this file, but that's kind of good for me, because it means that now I can create that secret again and the credentials will exist in it.

So let's go ahead and delete the pod. Then I'm going to delete the secret, and I might as well delete it out of the other namespace too, because that won't work for me either. Now I'll do that create secret command again in the namespace learning, apply the pod, and jump back over here. I'll do kubectl get pod again, namespace learning, and now we have that running: as you can see, dnsutils here is deployed. Let's describe that and take a look at it again. Yep, it's pulled the image from there, it successfully pulled it, and it's gone ahead and created the container. So great, we've now got our own private image being pulled using imagePullSecrets, and you can see there how things can go wrong easily if you're not looking in the right place. I was using a private image here, but all I needed to do was add some image credentials to be able to pull it, and that was it.

So that's a couple of bits there. Now I'm going to grab this external-dns one here and pop that into this folder. All I'm going to do in here is change this namespace to learning, this namespace to learning, this namespace to learning, and down here I'm going to change cloudflare-proxied to cloudflare-proxy. That is it. So I'll do kubectl apply -f 23 external-dns, and let's see what happens. We'll jump over here and do kubectl get pod, namespace learning, and we should see that it's got a problem. That's fine. I'll grab that pod name and we'll do kubectl describe pod, namespace learning, and I'll actually just tail this, because we don't need all of it; the last few lines will do. And to be honest, I know the Cloudflare token is going to be in there somewhere, so I don't want that exposed.

Let's take a look. We've got successfully assigned, pulled image, successfully created, started container, image already present on the machine. That's fine, we knew that; we've deployed external-dns already. But then we've just got this back-off, and it doesn't give us anything, does it? It just says: back-off restarting failed container external-dns in the pod. Great. Okay, well, what use is that to us? Let's go and take a look at something else.

So kubectl get pod, namespace learning, and this one here has now switched to an Error state. I'm going to do kubectl logs -f, f for follow, so this will just continue tailing the log, and then put the pod name in and the namespace at the end. If I press enter, straight away there we go. One line: level fatal, flag parsing error, unknown long flag. Okay. Well, I obviously purposely errored that, but you might come to your app and go, oh, I know what that is, that's not a flag I'm supposed to be using, so I'll go and fix that. That's an easy fix. We jump into here, change this back to proxied, redeploy, and this would be up and running.

So that's a log there. Sometimes you'll have masses and masses of logs, an almost endless amount, but they will make sense once you hit the error. For example, let's jump into kubectl get pod, namespace cert-manager; I just want to show you the mass of logs I'm talking about. There won't be an error in here, this is working. We'll do kubectl logs for cert-manager, namespace cert-manager, and you can see there's a lot of logging information going on. Sometimes it's a case of going through and reading each line. For example, this "I" is information, this "W" is a warning, and sometimes you'll see an "E" for an error. So sometimes it's just a case of digging through those logs and seeing what's going on, and eventually you'll find it.

All logs are different as well, for what it's worth. Not all of them will have this bit at the start. Some of them will look like what we saw earlier, where it's actually "level fatal". It depends on the app, it depends on the way they log and what logging system they use. They can all be a bit different, so sometimes it can be quite painful going through. It's a case of reading through each line carefully, and eventually one of them will give you the answer. And it's not always the last line, by the way. Sometimes the last line will give you an error, but actually the cause is further up, because the problem started there and only later produced the error you see at the end. So it's always worth reading up and having a look through.

On top of that, you can do kubectl logs and then help, and you'll see that there's a lot more you can do. You can do tail, you can do since a certain time, you can do the follow flag. Like I say, there's a bunch of extra flags you can pass to it, so it's worth having a look through the help to see if you can refine your logs a bit for what you need.

Finally, the last thing I'm going to show you is kubectl events. The event system is pretty much what you see at the bottom of describe pod. If I do kubectl events, namespace learning, you'll see we've got all the information from all the different pods. We can see that we had some ErrImagePull, we can see the ImagePullBackOff; we can see everything from good to bad. And again we can refine that further. If I do help on this, we can choose which types of events we want. For example, if we look at some of the examples, we can say I only want the Warning and Normal events. Okay, that's really useful. And we can refine that even further: I don't want normal ones, I just want warnings. Here are some warnings that I'm getting now. We can also order them by time and things like that. You'll notice that they're not really in time order all the time; sometimes they are, sometimes they're not. So it's worth having a look through that a little bit further with the help command.

And that's just some of the ways we can troubleshoot pods. That's everything I'm going to show you on troubleshooting pods. As I say, there is much more you can do, because you need to know the apps that you're deploying and understand how things are being deployed. It's important to know those things, because otherwise you're going to struggle looking through logs. But CrashLoopBackOffs, image issues, generally checking logs, and looking at events are good starting points. Sometimes you can be digging through for a while, and there are tools to help you collate those logs, such as Loki, and we will look at those later.

It's a good starting point. Like I say, I can't show you everything, because I don't know every app and I don't know how to troubleshoot every app. It's a case of using a little bit of logic, basically: if you can't find it in a CrashLoopBackOff, in the events, or in any image issues, you need to look at the logs. That's kind of the last place to be looking, because if there is an issue it should show up in there. If the application is written well, you'll get decent logs out of it too.

In the next video we're going to be looking at troubleshooting nodes, which involves checking the kubelet logs, which are pretty good. So I guess I'll see you over there. [Music] [Music]
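The step in the video where the decoded secret turns out to have a blank entry and a credential store can be sketched in code. This is a minimal illustration, not the presenter's method: the function name `find_unusable_registries` and the sample payload are hypothetical, and treating an entry as usable only when it carries `auth` or a `username`/`password` pair is a simplifying assumption.

```python
import base64
import json


def find_unusable_registries(encoded_config: str) -> list:
    """Return registries in a base64-encoded Docker config that carry no inline credentials."""
    config = json.loads(base64.b64decode(encoded_config))
    unusable = []
    for registry, entry in config.get("auths", {}).items():
        # Assumption: an entry is usable if it has an "auth" value
        # or both "username" and "password".
        has_auth = bool(entry.get("auth"))
        has_user_pass = bool(entry.get("username")) and bool(entry.get("password"))
        if not (has_auth or has_user_pass):
            unusable.append(registry)
    return unusable


# Hypothetical payload mirroring the situation described in the video:
# the registry entry is empty because the credentials live in a credential store.
sample = {
    "auths": {"https://index.docker.io/v1/": {}},
    "credsStore": "desktop.exe",
}
encoded = base64.b64encode(json.dumps(sample).encode()).decode()

print(find_unusable_registries(encoded))
```

In practice the encoded string would be the `.dockerconfigjson` value copied out of the secret. An empty result suggests the credentials are at least present, although it says nothing about whether they are still valid.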
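The advice that the real error is not always on the last line can also be sketched as a small scanner that finds the earliest error-level line. It assumes only the two log shapes mentioned in the video: a leading severity letter (I, W, E, plus F for fatal) followed by a date, and a `level=...` key. The name `first_severe_line` and the sample lines are hypothetical and were not captured from the video.

```python
import re

# Severity-letter prefix followed by MMDD, e.g. "E0729 10:15:03.000000 ..."
SEVERITY_PREFIX = re.compile(r"^([IWEF])\d{4} ")
# key=value style, e.g. level=fatal
LEVEL_KEY = re.compile(r"\blevel=(\w+)")

SEVERE_LETTERS = {"E", "F"}
SEVERE_LEVELS = {"error", "fatal", "panic"}


def first_severe_line(lines):
    """Return (index, line) for the earliest error-level line, or None if there is none."""
    for index, line in enumerate(lines):
        prefix = SEVERITY_PREFIX.match(line)
        if prefix and prefix.group(1) in SEVERE_LETTERS:
            return index, line
        level = LEVEL_KEY.search(line)
        if level and level.group(1).lower() in SEVERE_LEVELS:
            return index, line
    return None


# Hypothetical log lines for illustration only
sample_logs = [
    'I0729 10:15:01.000000       1 controller.go:42] "starting controller"',
    'W0729 10:15:02.000000       1 controller.go:57] "retrying request"',
    'E0729 10:15:03.000000       1 controller.go:63] "request failed"',
    'time="2023-07-29T10:15:04Z" level=fatal msg="giving up"',
]

print(first_severe_line(sample_logs))
```

Here the scanner stops at the `E` line rather than the final `level=fatal` line, which is the point made in the video: read upwards from the last error to where the trouble started. Applications that log in other formats would need their own patterns.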
Info
Channel: Drewbernetes
Views: 92
Keywords: Linux, Kubernetes, CKA, CKAD, CKS, Logs, Events, Troubleshooting
Id: uIYIreo2Pu4
Length: 17min 9sec (1029 seconds)
Published: Sat Jul 29 2023