Klustered Teams: Raft & RX-M | Rawkode Live

Captions
[Music] Hello, and welcome back to the Rawkode Academy. This is a new episode of Klustered. I have been away for almost three months now, so I don't remember how to do any of this livestream stuff. I don't even know if my little video was going to pop up in the bottom; it's just been far too long. But for today's episode of Klustered we have two great teams and two very broken clusters, and we're going to do our best to work through all of those fixes and get things jamming today.

Now, before I bring on our first team, I want to thank our sponsors for today's episode of Klustered. Teleport: we've been using Teleport from the very beginning, and they recently started sponsoring us, so mad props, Teleport, it has been great working with you. If you like what you see as we use Teleport to debug and fix these clusters today, you should check out rawkode.live/teleport to show your support. It really helps, and I will thank you muchly. Also, the hardware has been provided by my former employer, Equinix Metal. If you want to try bare metal cloud, go to metal.equinix.com, and when you have an account use the code rawkode. This will get you $200 of credits to start kicking the tires. These are some pretty nice and beefy machines, so have some fun with them. I do.

All right, now we're going to meet our first team for today. Here we go. You are all now live on air, so mind what you say, be polite, all that jazz. Can we start at the top right? We'll start with you, Christian: just say hello, tell us who you are, and then we'll move around clockwise.

All right, hello. My name is Christian Luxena and I'm with RX-M LLC. We are a cloud native training, consulting and engineering company. That's me.

Hi, I'm Christopher Hanson. I'm one of the instructors for RX-M, and I'm also a cloud native engineer. Happy to be here, thanks for having us.

Good evening and good morning. My name is Ron Petty. I am also an employee of RX-M; I'm a consultant. We
work on books and trainings and we love Kubernetes, and we're glad to be here. Thanks for the opportunity.

I think I made a mistake. I just went to my YouTube page and I'm streaming to the wrong video. How wonderful is that for a start?

Right, the first thing's been hacked.

Yeah, but the comments are coming from the right video. Is this a bug with my streaming software? Good question, because I'm seeing the comments from my video, but I'm very clearly streaming to part seven of the Teleport course. All right, I'm going to hit finish, change over, and see what happens. Alrighty, YouTube. [inaudible] All right, so we get to do our things all over again. Well, I've just pushed the live button again. I'm going to keep us here and we're going to see what the hell is happening. I don't know. Here we go, some comments rolling in now. Russell, hello. Hello, Russell. Yeah, I'm going to wait till I confirm that it's working, but it's off to a fantastic start, I've got to say. I'm really, really happy with that. [Music]

Yeah, thanks for that. It's still not coming through.

Let's do it live.

All right, I'm just going to post a comment with the other video, because my streaming software and YouTube are being weird.

Yeah, we got a couple of top chats already.

Yeah, so the people are just tuning in. I have no idea what's going on: the chat from the real video is coming to my software, but the stream is going to a different video. So I'm going to assume this is a YouTube bug. I have no idea, but we're just going to roll with it now and get started. I posted the video link and it's come through on the same chat, which is even weirder, but we're just going to get started. There's 21 people waiting in the other video and I hope they'll wander over here. All right, so I'm going to be really horrible, I'm sorry: could you all introduce yourselves again, and then we'll get started on today's cluster?

No worries. I guess I'll go ahead and get started. My name is Christian Luxena. I'm
with RX-M, and we are a cloud native training and consulting company. Happy to be here.

Yeah, thanks for having us. My name is Christopher Hanson. I'm also a team member at RX-M; I'm a cloud native engineer and instructor. Super excited.

Right on. And the third wheel: my name is Ron Petty, I'm a consultant at RX-M, and I am also very excited to be here.

All right, thank you. Well, I think we just dive straight into this cluster; let's see if we can get this thing working. So this first cluster is from our second team, who are at Raft. I'm going to pop my screen up here, and we have our Teleport session. I'm going to click connect and open a root terminal on our control plane node. Now, if you can all go to Activity and Active Sessions and join this, and once you're in, just type echo hello or whatever to let me know that you're there. I can see we've got one more person connected... two more. You all in? Yep, four, there we go. Perfect, so we're all in the same session. This would be a good time for you to configure your kubeconfig and test to see if we have an API server. I'll let you all take it from here, but have fun and best of luck.

Alrighty, thanks. Thank you, David. So let's go ahead and get ourselves going. I believe there should be an export statement... nope, we don't have an export statement, so let's go ahead and do config view. So basically what's happening here right now is, it looks like my kubectl is being pointed to the correct... well, yeah, it's being pointed to the correct file, but I don't seem to actually have the ability to access it. So let's see here. Okay, looks like the config is belonging to root. I just happen to be root as well.

It's read-write. It's not in the path, it's a little weird. which kubectl... all right, and it's not an alias, unless I'm missing it. Hmm. [Music] All right, so it's not an alias, so it would normally be like /usr/bin/kubectl. Just being pedantic here. It is there, but no permission. No questions. Interesting.

which kubectl... we're
almost back to life. Maybe. We'll see.

All right, now, Christian: export, go.

All right, so we'll then do the export of KUBECONFIG. Usually, I believe these are kubeadm clusters, right? So let's see, if I do ls -l... well, I am root, actually. /etc... let's see, kubernetes... no. Hey, /etc kubeadm, is that correct? Ron and Chris, go ahead.

Yeah, here, let me. It should be /etc/kubernetes/admin.conf, so there's that. So then if you do export equals... let's see, enter, and then kubectl get pods. Oh, it's export KUBECONFIG. There we go.

Oh yeah. So for the audience, what we're doing is we're actually pointing our kubectl to use the admin kubeconfig. This is actually something you can only do as root. So let's see here: it doesn't look like the admin.conf is working at this point. Well, interesting. So let's see if we actually have the correct environment variable name. What do you think, Ron or Chris?

Yeah, I'm pretty sure that was right.

Yeah, it seems that way. I mean, we could be a little more pedantic. [Laughter] /etc/kubernetes... oh my god. Yeah, Kuber... I can spell Kuber. Okay. And, well, I mean, it's clearly reading it. I guess I was being dumb there. It is reading it, because it wouldn't have the IP address otherwise, right? Oh no, I turned off my plugin, so this Teleport here should work. So it's connection refused. Is that the IP of this machine?

Yeah, that's what I was going to check too. 144 49 28 6, 6443. So let's just try telnet... 136.

So the IP address is the BGP-advertised address to this machine, and I think, yeah, that is correct.

Yep. So HTTP or... yep, and then TLS. So that would imply... oh, let's see. Sorry. So ps waxa, grep kube-api.

I always love to see which parameters people use with ps. Nobody seems to use the same ones twice.

So look at the advertise address there.
Yeah, that's not correct. Well, it looks like that's the local address, so let's just take a look there. So 10.25 to 121.33, so that should be okay. Then... [Music] the HTTPS, the privileged stuff. All right, so many settings. Allow privileged, auth... I really have a hard time reading these multi-line things. Where's the setting to enable TLS or disable it? Let's see. Oh yeah, the secure port is actually there, so --secure-port is in there. Ideally, the insecure port should be absent here, for you security-minded folks, and I don't think I see that. I want to move that, if you guys are reading, but...

Yeah, go for it.

Oh, I just want to look at the YAML file. It's a little bit easier to read the flags.

It's best to put it through a pager, just so that we can follow your line of sight as well, rather than us having to scroll to where we think you are.

Fair enough. I'm looking at the flags right now.

Yeah, perfect.

Alrighty, so our advertise address is different from what's in the kubeconfig, so it seems. At least it should be... oh yes. So maybe just switch this, you're saying?

Yeah, give it a shot.

Yeah, okay. [Music] So what was it, ten dot... I copied it from [inaudible]. Yeah. Ah, this is the 136 49 28 number, so here... that would be interesting. Another stop right here. Read-write... oh, what's going on here? I'm going to sledgehammer this, because I'm simple. Oh, come on. Is it AppArmor? No. Did they disable our vi? Yes. No vi. Oh, come on, those dirty, dirty...

Who would hurt vi?

Just choose Emacs. We'll never get out of it.

All right, /usr/bin... I'm going to guess it's alternatives, /etc/alternatives. Let me guess, this is going to go to... [Music] oh my god, all these aliases. There we go, chmod 700, /usr... I thought I liked Raft as a team, but I don't know. You mess with vim, you mess with my family. All right, did that work? Hey, we're alive. All right, kubernetes... I copied something
into my buffer, so I think I'm going to bork this. Uh oh, no. [Music] My plugin, hold on, it's this plugin. Let me add that rule, save changes. The irony: my plugin is a vi plugin for Chrome. What was the IP again? Oh man. Oh yeah, I do have it. All right, 6443. Oh wait, can't we use comments in YAML? No? That's lovely. All right, so let's see: 10 25 62, and make sure nothing else has changed in the meantime. 6443, okay, that's that. [Music]

All right, so what happened there? I think we need the more observational view here. A port, 6443. Where'd you say you saw the insecure...?

Oh, I don't think it was there, truly.

Yeah, it's not there. I think he was just talking for education purposes, you know: it shouldn't be on.

That's the teacher in Christian. [Laughter]

All right, so we are on secure port 6443, so it should be... now, usually, doesn't the kube-apiserver run on the host network? So can we check if this has hostNetwork true?

Yes, one second. That's true.

All right, so it should definitely be listening on that address and that port. Let's take a look at the logs. Wait a minute, we can't connect to it, so we have to do tail dash whatever. Let's just tail it to start. /var/log/containers... Kubernetes... [Music] everybody's favorite component. [Music] What's going on here? Let's check it, but let's see if it's actually valid, or read-only, or empty. Okay, so... cat /etc/kubernetes... all right. And it's not advertising on the localhost; it appears to be advertising on the node's IP. See that first flag up there, Ron? The advertise-client... oh, that's client URLs, excuse me.

Yeah, so hold on. If it's not starting, there should be a log somewhere, right? So etcd is logged. It should be a pod, but we can't get to the API server, or can we? So this is a static pod, so where do that pod's logs go? [Music] Do they always go to... do we
just not think about this? Is it there? There's etcd's... see, no such directory. Oh, let me guess, we've [inaudible] certs. Okay, so let's take a look at this guy. [Music] What's in there? Okay, so it'd be peer.crt... well, let's see if it's just hidden or something. [Music] Or wait a minute, client is... sorry, let me use my brain here. So this is etcd that's failing to start, because it can't find its cert for a client. I guess we could just use a peer cert?

As long as they all have the same CA it shouldn't matter, right?

Right. Well, yeah, we don't really care about the clients.

Right, we don't care about the clients, but let's check that out. The API server, it's going to use the client cert though. [Music]

Yeah, it's died probably because it can't talk to etcd.

So this guy, let's see what he is using: apiserver-etcd-client, client cert... [Music] my god, having a brain fart here.

Yeah, no worries. So it looks like those are all pretty standard for the API server, but it seems like etcd itself... can we check the spec for it? It seems like etcd itself is trying to refer to that client.crt, which clearly doesn't exist. So let's see if we have a client.crt somewhere. Go ahead and type it in.

Well, it's not... yeah, that's what we're looking at: the /etc/kubernetes etcd, that's this directory. It's not there. I guess the peer cert is what we're supposed to use, maybe? And cert-file is client.crt, so I think I'll point it to... Kubernetes, yes... I don't remember what was in the directory. Oh, what was in the directory was... I don't know if you can see highlights here, folks, but... oh wait, I got it, I found it. It's /etc/kubernetes/pki, apiserver, right? I mean, it should be the thing we were ignoring before. It looks like this error here is referring to that, so I believe this should be the server.crt. Oh yeah, cert-file, duh. So that's the server cert. And this is a one-node kubeadm master, correct? [inaudible] All right, so that means that it will eventually come up, or it
crashed so fast. Yeah, so let's go see anything etcd, I think. Yeah, but I can never remember if it... so we just want to see that the timestamp is updating, right? Ah, it doesn't align, I can't read it. All right, it was 23, so let's just watch it. Oh, it did a 24 now, so it should be there.

Your tail command is using the API server instead of the etcd log. That's not the etcd log.

There it is. So what's this now? Ah, yes sir. Are you trying to increase the SLA of our certificate? [Laughter] All right, so CA... that's cert... all right, this is... oh, that's sneaky. All right, let's go. Before we leave... oh wait, was that there? That was them. Let's see, so I'll take it back to [inaudible] name. All right, so let's go ahead and try it. I do recall that the API server is also trying to connect on localhost, but we'll see how that... oh wait, no, one problem at a time. I feel like this watch default setting of two seconds is more than two seconds, right? There should be like a little spinner or something to prove that it did something.

I guess on the right it is. I think that's the clock.

Yeah, there it is. All right, I guess I'm impatient. I should read the whole thing every two seconds. [Music] So we're just waiting for it to come back up. Wait, wait, wait, we were editing that in place, weren't we?

Yes, we were.

So remember that kubelet bug?

Yeah, it came up though.

Okay, all right, we were being positive here. It came up. Oh, there it is. Kubelet has that bug where it sometimes won't read a file you edit in place.

So that's true. That looks better. A little rare. All right, let's see if the API server came to life. Nope. Yeah, so let's check the API server log. Yep, all right. Oops, let's just do this: tail, reverse, follow. Context deadline... come on. That's what you get. You were right: 127.
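Editor's note: the etcd failure traced above came down to a static pod manifest that pointed at a certificate file which was not on disk. The sketch below shows one way such a check could be scripted. It is not what the team ran on stream; the function name `missing_flag_files` and the sample manifest text are hypothetical, and it assumes flags are written in the `--name=/absolute/path` style.

```python
import os
import re
import tempfile

# Matches flags of the form --some-flag=/absolute/path (an assumed manifest style).
FLAG_RE = re.compile(r"--([a-z0-9-]+)=(/[^\s\"']+)")


def missing_flag_files(manifest_text, exists=os.path.exists):
    """Return (flag, path) pairs for *-file flags whose path is not on disk."""
    missing = []
    for flag, path in FLAG_RE.findall(manifest_text):
        if flag.endswith("-file") and not exists(path):
            missing.append((flag, path))
    return missing


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as pki:
        # Only server.crt and server.key exist in this pretend PKI directory.
        for name in ("server.crt", "server.key"):
            open(os.path.join(pki, name), "w").close()
        # Hypothetical manifest fragment that references a file that is absent.
        manifest = (
            f"    - --cert-file={pki}/client.crt\n"
            f"    - --key-file={pki}/server.key\n"
        )
        for flag, path in missing_flag_files(manifest):
            print(f"--{flag} points at a missing file: {path}")
```

Passing `exists` as a parameter keeps the function easy to exercise without touching a real control plane node.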
Go for it. The localhost was the...

All right, so what was the IP again? Ron, do you still have that in your buffer?

Okay... oh, sorry, no, I just killed it. No, that's the etcd-servers flag, right? So yeah, I think that's normal, guys. Yeah, this one is normal.

What's normal?

The API server, that setting. Connecting to etcd via localhost is normal.

So, right, resolving a problem that's not a problem. Oh, come on.

So what I'm going to do is I'll grab that, because there is something that caught my eye inside the YAML, and it's this one here: the client URLs, the client connections it's listening for, are on this IP address, not localhost. So let's :wq out of that.

A local one should work too. I mean, it's the same, right? If it's listening to 0.0.0.0...

That's true. I mean, so now we have to wait for etcd to come back up. Hey, oh, there it is. Okay, I guess you're right. All right, I'll shut up now.

What's Kubernetes? I've never heard of it.

I know. Okay, so let's see, that should have been fine. Oh, come on. Get pods, don't be cruel. This is like killing hello world. Why did they kill hello? What did hello do to you?

I would check the API server logs. I'm not convinced you have the API server and etcd talking to each other.

Yeah, that's right. Well, that log is the wrong log file, because that's four or five minutes old, but...

Oh sorry, you're right. All right, thank you, thank you. Delete, faster, delete, delete. Oh yeah, now I don't know which one is the latest. [Music] Number one, right? Or something like that, because it rotates. All right, what am I waiting on here for? 1727.
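Editor's note: the confusion above about which rotated log file is the latest can be settled by modification time rather than by the number in the file name. A minimal sketch, assuming the logs sit in one directory and end in `.log`; the helper name `newest_log` and the file names in the demo are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def newest_log(log_dir, pattern="*.log"):
    """Return the most recently modified file matching pattern, or None."""
    logs = [p for p in Path(log_dir).glob(pattern) if p.is_file()]
    if not logs:
        return None
    return max(logs, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Two pretend rotated logs with explicit modification times.
        for name, mtime in (("0.log", 1_000), ("1.log", 2_000)):
            path = Path(d) / name
            path.write_text("example\n")
            os.utime(path, (mtime, mtime))  # set access and modification times
        print(newest_log(d).name)
```

Sorting by modification time avoids guessing whether a higher or lower suffix means a newer file.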
About a couple minutes ago, so nine zero d. Yeah, I had a comma one before, but I don't see that now. However... oh man, somebody hurt our etcd. Storage key not found, registering master leases... all right.

Type this, Ron: kubeadm reset.

Read those two error messages slowly, the ones above each other. I think you're missing something silly.

So we got this... oh, right. This is coming from our... wait, explain it, Ron, explain what you saw.

So for the audience: this kube-system is misspelled.

Yeah, system, or system with no e.

Where am I looking? Am I in here? Nope, don't think it's in there. Is this not where you set it? It's in here, right? Hold on, let me set that. I'm looking for the namespace. Isn't it in our kubeconfig file? Am I blind?

No, not unless you set it explicitly, so that would mean it would be going to default. Well, that's from the API server log. You're looking at our admin config.

Oh, wait a second. Yeah, this is for the... hold on, please... manifests, kube-api... [Music] there we go, there it is. Someone drop me a whomp, please.

All right, how do you have the... there's a lot of typos. Somebody should really... was this Ron or me typing? Is that what happened? Did we make it happen? This wasn't a hack job, this was just my bad skills. How many API servers have we got going here? We got two of them now. Hey, we're in stereo now.

Hey, it worked. One of the API servers worked.

It's horizontally scaled.

That's right. That's it, just update the pod. All right, someone else take over, I'm too nervous.

All right, I'll go ahead. First off, let's get all our pods in all of our namespaces. Wait a second, where's the app? It should be showing up in the get. It should be in the default namespace. Where's our default namespace?

We are in the default namespace.

Oh wait, where's our app, and where's Postgres? Let's see. Trickery. Make it explicit: do -n default and check. All right, maybe
there's some crafty thing where they've eBPF'd us.

I don't think you've been eBPF'd. I mean, if you look at your kube-system namespace, you've got one important thing in an error state.

True, yeah, that's true. So that means the deployments... the controller manager is kind of important, right?

Oh yeah. Well, we don't need the controller manager if we bypass it.

Yeah, that, and Klustered has also been scaled down all the way too. Same with the STS, I believe, the StatefulSet I mean.

All right, well, the controller manager needs to get fixed, that's why these are zero. So yeah, let's go after that. Ron, I'll leave it to you, go ahead.

I heard it works if you let it restart 200 times. Maybe we should just wait. The 200th time it's like, yes, I've overcome my difficulties. Oh really, come on. I can't live like this, like a caveman, anymore. All right, here I go.

I've got you, Ron, I've got that one: completion bash.

All right, now I'm no longer a caveman. All right, describe... why can't it just read my mind? Come on. Of course, my mind's empty, so, you know. All right, images here, boom, crash. Why are you crashing?

Yeah, let's check our logs on it.

Yeah, why can't you be friendly?

You can use kubectl logs now, Ron.

Savage. Right, all right, we got two logs here. Who's the newer log? The top guy's the newer log, right? 1733, 1738. All right, now let's go open this guy, at the bottom. All right, I'll start walking backwards. [Music] Sorry, tell me to go slower if you need me going slower. I'm just kind of seeing if there's like a big hack. I'm looking for ASCII art: where's the big thing saying "you got hacked"? I don't see it. How far back in time? We're still at 1733. Oh, this is at the top: port has been deprecated, this flag has no effect. Wow, okay. So maybe some kind of setting here. What is happening here? Starting, waiting... let's see. We should maybe check to see if it crashed, because it is...

Oh wait a minute, can't we do like dash p?

Oh yeah. [inaudible] Yep, dash p is the previous
container, the one from the previous crash. That's what we're looking at, right? Look at all this Go code.

You want to pipe it through a pager so we can follow along?

Oh, sorry, my bad. I thought I had it in vi, that was my bad. Okay, here we go, let's check. Failed... failed to wait for API server being... oh wait, but it was down back then, I think.

Well, that's why you've had to look at the previous logs, because it's restarted since we were talking about it.

Yeah, maybe now it's healthy.

Yeah, maybe. Don't you like it when things fix themselves? Right, there we go, restart. I guess you just have to restart until it starts again, you know. Automation works. Wait, we still don't have a default anything, right? Where's our scale-up?

Just scale up the deployment. All right, Christian can go.

Yep, get deploy... Klustered, though. All right, so then I'll do kubectl...

Do it. Show it no mercy.

Deploy, one... deploy klustered. Was it one?

Yeah, let's just do one for now.

Oh my gosh. [Music] Getting close. Check if we have our pod first.

All right, yeah, let's just grab this pod here. It's been... no... wow, wait a minute. Yeah, who cares. So they're currently unschedulable, which means they're not accepting new loads from the scheduler. So we'll do kubectl uncordon... raft... uncordon node raft worker one.

I don't think you need "node", just uncordon.

One, I think... yeah, there you go. And two.

All right. Oh, we just need one, man. Come on. It may have an affinity for the master, right? Because it's zero of one nodes, so I'm guessing the deployment has an affinity. So that's what I'm going to check for, by looking at the YAML. All right, let's do that. There we go. So wait, klustered... that's a lot of these right there. So we're going to have to fix that.

That's the name of the show. [Laughter]

All right, got to fill that buffer. I
don't see any affinities. Do you, Chris or Ron?

You've got to search, man. You're in vi or less; hit forward slash.

Forward slash, okay.

And then backslash c, lowercase c, and then... you said taint, you just look for taint, right? Yeah, you don't need to do capital T though; the backslash c was case-insensitive.

No, you're not looking for a taint. You're looking for a nodeSelector or an affinity, so it would be under the pod spec.

So go... well, so we're on containers. I don't see node, I don't see affinity. All right, we'll fix the... all right, so... that is the name. [Laughter] There we go. We'll edit it. Minimum replicas unavailable... and then I say forget it, just untaint the master node.

Yeah, that's true, who cares. Isn't that what we did?

No, he did the workers.

Oh man, come on, man. Oh no, it's still replicating out of control.

Yeah, just remove the master taint.

All right, kubectl... what was the taint? I'm sorry, guys.

No, no, and then you can do... oh yeah. [Music] Right, so it was... no, no, it's 20 21.
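Editor's note: the back-and-forth above mixes three separate reasons a pod can stay Pending: a cordoned node, a taint the pod does not tolerate, and a nodeSelector or affinity. The sketch below models only the first two, as a deliberately simplified illustration rather than the real scheduler logic. The dictionary layout and the function names `tolerates` and `can_schedule` are assumptions made for this example.

```python
def tolerates(toleration, taint):
    """Simplified check of whether one toleration matches one taint."""
    # An effect on the toleration must match the taint's effect; empty matches any.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        # Exists with no key tolerates every taint; otherwise keys must match.
        return "key" not in toleration or toleration["key"] == taint["key"]
    # Equal: both key and value must match.
    return (
        toleration.get("key") == taint["key"]
        and toleration.get("value") == taint.get("value")
    )


def can_schedule(node, tolerations=()):
    """Return True if this simplified model would let a pod land on the node."""
    if node.get("unschedulable"):  # a cordoned node takes no new pods
        return False
    for taint in node.get("taints", []):
        blocking = taint["effect"] in ("NoSchedule", "NoExecute")
        if blocking and not any(tolerates(t, taint) for t in tolerations):
            return False
    return True


if __name__ == "__main__":
    # Hypothetical nodes: a cordoned worker and a tainted control plane node.
    worker = {"unschedulable": True}
    master = {
        "taints": [{"key": "node-role.kubernetes.io/master", "effect": "NoSchedule"}]
    }
    print(can_schedule(worker), can_schedule(master))
    # After "uncordon" and "untaint", both nodes would accept the pod.
    print(can_schedule({"unschedulable": False}), can_schedule({"taints": []}))
```

In this model, uncordoning the workers and removing the master taint are independent fixes, which is why doing only one of them can still leave a pod unscheduled.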
Come on. All right, there you go. Yep, there we go. No... okay, ErrImagePull. Okay, so the thing is still not going through. Klustered... it's "d", it's not "ed", it's "d". ContainerCreating. It's creating, we're good. Oh wait, did I... oh, I can't spell. It is easy... I can't either, right. [inaudible]

All right, we got like four minutes left, so kubectl get pods... it's running.

All right, I forgot to start the timer on time, so you've actually got a bit longer, so you're good.

Okay. All right, wait, go, go, go, we've got to do a set image now, to version two.

Right, so kubectl set... we might trip on step one... deploy klustered, and then... what was the name? I don't know the name of the container, dude. Scroll up. Didn't you just show that?

Yes, containers is klustered. It's just called klustered.

Oh yeah, if it's just a deployment I think it's the same name, right? Whatever, we'll find out. Oh wait, don't get cocky yet. We're so close.

Yeah, dude, rollout history. Rollout history, see if it's... oh, you didn't record the rollout history. [Laughter] Oh wow, there's been a lot of changes made to this one. But no, I did not record the rollout history. So let's see what's... right, just describe the pod and see if it says v2, and then let's open up the browser and see if it worked.

Oh yeah, you've got a few more things to fix before it works. You've just not seen them yet.

That was the angel of death talking.

Oh, it is not ready though, unfortunately. So it looks like we do have a couple more things. It seems to be failing a readiness probe.

If there is, delete the readiness probe. Go for it.

Like, you think? Yeah, all right. Go faster. Ready? Yeah, no... don't do this, kids. Don't do this. Just forget the readiness probe, you know, it's fine. There's also [Music] a rogue replica set that seems to be trying to do something;
not sure what's up with that fella. But it's ready now.

All right, wait, wait, we're not done. There could be like a service blocker, you know, or something like that. We've got to make sure you're getting...

Okay, so this is where I will go to the Teleport page, I click on Applications, and we have a nice little helper to launch our Klustered app. And what you want to see is me dancing. Are we going to see me dancing?

No.

I want to see you dancing, come on.

Because our Postgres is not up. Yep, it's actually gone. Get deploy... what says connection... all right, is it scaled, or... well, there is no StatefulSet. So yeah, was Postgres on its own? Yeah, so there's a ClusterIP, but there's no database, unfortunately.

That's such a shame.

Oh no, such a shame. But we got the pod up and running, so what's next? Yeah, who needs data, right? What's up with Ambassador being down?

Hey, that's what that guy needs to fix, don't worry.

Okay, yeah, that's fine. So, slow down. So there's no Postgres pod. Do we have the Postgres YAML? Is that in this...?

Let's grab the YAML from github.com/rawkode/klustered.

Okay, so there's a workload directory. Slash rawkode, slash klustered... that's right. And inside the workload directory you'll find "opt kubernetes". Yeah, there you go.

Oops. I'm not blaming... I want to blame. It seems dirty, deleting the [inaudible].

Yeah, right, we could have just deleted the Kubernetes repo. That's not going to work. What happened?

Hey, I was going to say, it's a public repo, you could just kubectl apply it from...

Hey, he wanted to do the caveman art, where you blow paint on your hand.

Yeah, that's true. All right, so let's do kubectl apply -f from that guy.

Oh, look at you being all fancy pants.

All right, that's right there. Unchanged, unchanged, created. Get some pods, PVs and our PVCs, just make sure that it's not, you know, backed by
anything. Or should be backed by something.

Yeah, or if it should be backed by something. Let's go check that repo there.

Boot you up, man. It probably has a probe or something, so it's taking a second, likely.

Yep. So go ahead. Sorry.

All right, let me refresh our Klustered application. Oh yeah, hey, there you go. Nice. All right, good job. There's a lot to fix there, right?

Yeah. Did we beat the time?

You did. I mean, I started the timer early, so I'm not actually sure how long it was, but I think you were under the 40 minutes. There was 12 minutes left on the timer and I started 10 minutes late, so I think you had three minutes left.

All right, that's just about a minute.

You mean there was one second left and we defused the bomb?

Exactly, yeah, that's how it went. We'll edit the video to reflect that, don't you worry. But no, good job, that was good teamwork. You communicated well, you worked through lots and lots of typos and death by a thousand cuts there, almost, but you got through them. Yeah, deleting the StatefulSet I thought was particularly sneaky, but there you go.

All right, awesome. That was a lot of fun.

That was fun, yeah. That went really well. So we should go to the YouTube now and watch exactly the reverse?

It's a different video because of the small hiccup that we experienced at the start, but if you go to rawkode.live you'll be able to see that we have a session live in action. So feel free to jump out of this call, jump over there, and then we'll have the Raft team join us in just a moment. So thanks again, well done. I'll speak to you all soon.

All right, good luck. Go Raft. Thank you.

All right, so the Raft team will be coming to join us in just a moment, once I... I'll just boot out all these. There we go. So, Raft team, come and say hello. While we wait, just because of the awkward start, I'm going to thank the sponsors again. Teleport, which we've been using since the start of Klustered, is an amazing product. Check it out at Teleport,
I really appreciate that. And Equinix Metal, my former employer, has been very gracious and continues to sponsor the hardware that we use on Klustered as well. So if you want to check out a bare metal cloud, you can use the code rawkode for 200 USD in credits. So go spin up some bare metal.

All right, we have the Raft team, so let's bring them in. Hello, hello, hello. How's it going? How was that? Was that fun to watch?

It was a blast.

Sure was. Screaming at the television going, "it's right there!"

No, it was hard.

All right, well, thank you again. Can we start with a little bit of introductions? We'll just start with you on the left, Hitach, and then we'll move over, and then we'll get started with the next cluster.

Sure. So hey everybody, I'm Hitach Sharma. I'm a senior engineer at Raft, and I've been playing with Kubernetes for about a couple of years now, so really excited to see what we have to fix here.

And I'm Barack Stout. I'm the lead R&D engineer at Raft. I've been working with Kubernetes for a little bit over three years now. Really excited to see what they broke in this cluster.

All right, awesome, thank you very much. So let's get my screen shared. There we go. I'm going to open a session on the control plane node of this cluster. If you could please go to Activity and Active Sessions and join this session, just give me an echo hello to let me know that you're there, and then we'll kick things off. There we go, one. All right, let me refresh. Remember, it's under Activity and Active Sessions. Got it. All right, I can see we also have the RX-M team in the chat saying "let's go", so they're prepared for this too. There we go, and we've got you both here. Sweet. So export your kubeconfig, test for an API server. Best of luck, take it away.

All right, go for it.

I'll wait 10 minutes. That's what I did with the first cluster. See what we got.

All right... okay, everything is running.

I hope they remembered to break it.

Want to do edit the deployment? Let's
see. I was going to check if we can get to the application. Yep, you want me to pull it up? Sure. We'll go to applications, we'll spin up Klustered. Adding some of the shortcuts. Let's see what we have there. Me too. Yep, let's try. Interesting, a lot of... All right, so let's see, check the scheduler. Go for it, you have access. Okay, that looks good. Unexpected end of file, yeah. Did they put some tabs in the YAML? There's a connection refused too. I mean, these error messages do appear to be from yesterday. Let's see what they did. Oh wow, look at that. I think that's just me. Okay, yeah, there's nothing there. We try that. That's unhealthy. Nice try, though.

So let's talk about what happened there, right? You did a kubectl edit of the deployment and you believe you've set the version to be v2, and nothing's happened, so maybe you want to confirm that. Yeah, do describe on the deployment. Did they mess around with RBAC? Okay, that's v2, but zero replicas created for the new replica set. What's responsible for the replica sets? So this one is not ready, this is the new one. Where is my mouse? Not accessible. Yeah, why isn't that clicking on the window? What are you trying to do? Describe the new RS. Let's do the... did we do the source completion? Yep. User forbidden, service account. All right, let's take a look at the... I wonder if... What are you wondering? Talk to me. I'm [Music] trying to think if the service account should have had labels to access that.

So the user we're seeing in the describe isn't the default service account, because you've described the default service account in the default namespace, but this is the replica set controller, as part of the controller manager. Yeah, this is from kube-system. Can you do describe service accounts in kube-system? Now let's look at these. Get all of them. Okay, let's do describe, same one. But what I'm trying to look for, isn't it going to be for the replica set, though? And then
the error around the replica set, the replica controller. Yeah. Okay, so a service account doesn't have any permissions, right, in the Kubernetes cluster. What do we have to look at next if the service account has no permissions? Yeah, exactly. Let's look at the role bindings and see if we can work out what's going wrong here. Okay, go for it, Brock. [Music] Yeah, you know what, I'm stopping. Let's stop being a caveman. His name's very good. Yeah. All right, I've got to do the export thing. Yep. Okay, let's look at the role. Is there a role binding missing, then? Is this all for the... Oh, sounds about right. Is there a role? Okay, let's go back to the error, because I feel like we've maybe gone too deep into one place. Go to the deployment and see your replica sets. That's good. Cannot create, user system:serviceaccount:kube-system:replicaset-controller. So we just need to give access to the replica set controller. Yeah, we need to give it a role binding. Yeah, so for that we can do... That's tying this one, right? Yeah. Okay, do you have that up? Checking real quick. That is evil. Well, I guess service account is fair.

It's making me question some of my knowledge here, because I can't remember if the controller manager role bindings are hard-coded or not. I actually expected to see more from a kubectl get rolebinding. Ah, but it might be a cluster role binding, and we didn't look at those. So I would do a get cluster role binding. And they could, or just flatten the namespace. There you go, that's looking much better. Now we can see that we have the replica set controller role binding there, binding to the cluster role of replica set controller, so you're going to want to start introspecting those two. [Music] Binding doesn't... so we need to give... Well, in the default namespace, I'm assuming that's correct. This role binding here, I'm going to assume, binds the replica set controller service account to this cluster role, so maybe start by just taking a look at that cluster role and
see what permissions it has. You want to do a describe on that? I'm just going to check something. Go for it, go for it. While I'm looking something up, you can go for it. Replica set controller? Yeah, so cluster role, space, and then... I think... All right. Yeah, that'd work too. Oh yeah, it's okay. Create, list, delete, update. I think we're missing a verb. Pods? Yeah, create, aren't we? Do edit. Is it over here? We're in the right place. You'll need to search for pods. There we go. Oh, it still doesn't seem like it's coming up right after. Kill it again. All right, I might have to trigger it again. Or not. Sorry, deployment range. So I'm downgrading it back to v1, okay, and then putting it on v2. I can't create, because it did something, or thought it did something, but it doesn't look like it is picking up the changes yet. So let's take a look at the logs of the deployment again. I mean, you could just delete that replica set and it should recreate it. There you go. I mean, it's not creating a pod, but let's see what this is.

Oh, different error: exceeded quota. Oh, that's fine, we can just delete that other pod and it should be fine. Yeah, I mean, sneaky, but no. This v2? No, it's still v1, though, because that is still going to deploy v1. Yeah, the deployment is set to v2 right now, right? The deployment is set to v2. So replica failure, failed to create. What is... what does it say? Minimum replicas unavailable. I think this needs to be edited. Yeah. Gotchas here. I like... threshold, period, success. That's status, that's not going to make a difference. So path is good, which we know because that was one of the things we had broken. We need to edit the... we shouldn't need to edit the image from the replica set, right? That should be picked up from the deployment. So if you see another message about quotas, what primitive does Kubernetes expose for quotas, do you know? Oh, I don't remember off my head. Unless they tainted all the... no, no, but then it would complain about a taint if they did it with taints. There's this subcommand, kubectl api-resources, which will list all the custom resources, or all the resource definitions within a cluster, not necessarily custom. I had to correct myself there. I don't know if you can do this in k9s, so you may have to use vanilla kubectl. Yeah, you found a quota, there you go. But it's 72, so it could be okay still. But I guess we could increase these to three, or you could delete. I prefer the shotgun approach, really. I'm not going to say how, but that might have been how some stuff disappeared from the previous cluster that the previous team tried to debug.

But do we need to trigger this again? No, it thinks it's on v2. I might have to delete the replica set, though, again. It'll just speed up the reconciliation. Something is pending. Yeah, let's see. It seems to be v2. I'm half expecting an ingress mess or something. And why is it pending forever? Look at the logs. Do they have any tolerations on the thing? I don't know. Can you look at the GitHub and see that they messed up this deployment? Yeah, there's no logs to it. There's nothing in here in the deployment, necessarily. The replica set shows that it successfully created. Still in pending mode. Normally for something that's stuck in pending mode, we'd see an event for where it was scheduled, an event for the image being pulled. We're not really seeing that. No, it's not pulling images. Do a describe on the nodes. I want to see if there's anything blocking. Up and running. Unschedulable: false. Ah, very... Kubelet has sufficient memory available. Disk pressure. They filled up the disk, didn't they? I think you're okay. Oh, okay. Oh no, no, status is false. Hold on, hold on, I'm reading this wrong. That appears to be healthy. Oh, that's sufficient memory available. So two blood was wrenching, I think this is scared, I think scuba did come out. Yeah, it doesn't seem like there's anything here that's being overused. If you go back to your pod list in the default namespace, right, so the one thing that right now is
pending with no nodes. So you can either bypass the scheduler or maybe look at why the scheduler has not assigned it. You've got two different options. Yeah, that was my next thing, to look at the scheduler and see if something is messed up there, because it's putting this on the control plane, right? Can we put this under the control plane too? Let's see.

So this is one of my favorite hacks for Kubernetes, but the scheduler is really, really simple. I mean, you can just modify the deployment and put the node into the spec and then bypass the scheduler altogether. The cool hack that nobody should ever do in production, ever. I don't think anybody should do anything. Just don't run Kubernetes in production, there's the answer to everything. Scheduler... Right after that spec line, you can literally just put nodeName and then the name of a node. Really? Okay, yeah, I did not know that today. Go grab that name of my node. Okay, that should be fairly easy to remember. Wait, connection refused. I don't like that. Was it camel case? It is, yeah, if I remembered it correctly. We'll find out. Oh, they did not want to say... the name is not correct. Look at other pods, you'll be able to see it there, because the scheduler adds it. Look at the pod, you mean? Or not... look at the YAML, yeah, because it gets added to the pod, but you can set it. Oh yeah, you have to set it on the pod spec within the deployment. Yeah, I'm just being silly. You have to go to the deployment, go down to spec, template, spec, and then set it. Oh, I see what you mean, we were on the, yeah, wrong spec. It won't be the container, it'll be the... oh, it would be at the same level as containers. Yeah. YAML was such a good idea. There you go. There we go.

Well, something got scheduled. That's v2. Do we want to see if the URL is exploding or not? I really hope that horrible hack is not just what ended us up... Yeah, it has side effects. It could just be my browser cache. I've had to refresh 40,000 times before it comes back, I think. Oh, is that it? Oh,
damn it, we fixed it. It's up my end. Yep, there we go. All right, I didn't know that was going to be the end of the carnage. No, I feel bad. Oh well, it's fake. I wonder if we're missing stuff. But those are some pretty pro downfalls. I mean, we still haven't fixed, technically, the node, right? We kind of hacked our way... just, yeah, something, right? Do you want to attempt to fix the scheduler, or do you want to just call it? That's it, you've done it, it's good. I don't know, I'm okay with saying it's good. Good enough, right? It's deployed to production, the site's live, let's not touch it. Yeah. All right, perfect. Well, good job. We've listened to that one. Fun things to fix there. The scheduler is always a fun one, and I mean, the scheduler does a lot, but it's really easy just to bypass it and move on to something else. So yeah.

All right, well, that was fun, two good clusters. What did you prefer, watching someone fix yours or fixing someone else's? I preferred the fixing part, because sitting at the screen, just throwing popcorn at it... Yeah, I mean, when we were watching, the whole time I just wanted to type in the chat, like, "No, no, look at that namespace, look at the namespace, there's a typo there." But I mean, they did a great job too. We also had a lot of small cuts on the worker nodes as well, but then the workaround for them was also just to spin up the application on the control plane, so that worked out as well, because there was no way the worker nodes were coming up. Ah, right, okay. Yeah, I don't think I noticed that. Yeah, we had a lot of small little cuts. We'll put it in the PR. But that was sort of our MO, just to fat-finger a bunch of stuff that nobody ever should touch themselves, right? Like the spirits. Yeah, especially the search on the worker nodes. Yeah. All right, sweet. Well, thank you for taking time out of your day and week. You know, breaking those clusters is not easy, it takes time,
and thank you for joining me live and, you know, typing in front of people, which is also impending doom every time we do it. So thanks for sharing your knowledge, and good job. I'll let you both get back to your day and I'll say goodbye to everyone else. Thanks. All right.

All right, two great teams, two great clusters. That was a whole lot of fun. Apologies for the weirdness with the YouTube video at the start, I do not know what happened there, I'm going to have to dig into that. Thank you to our sponsors again, Teleport and Equinix Metal. We use Teleport to debug these clusters, to fix them, to share terminals, to get access. It's a really neat open source product that everybody, literally everybody, should check out and have running in their production infrastructure. If you want to know more, go to rawkode.live/teleport, I would really appreciate that. And Equinix Metal provided the hardware. If you want to check out bare metal cloud for yourself, it is a whole lot of fun. Use the code rawkode for 200 dollars in credit. All right, we will be back soon with more Klustered and even more at the Rawkode Academy. Things will be getting back to normal now, almost, that I have finished my paternity leave. Lots more coming soon. Thank you for enjoying and watching with us. I'll see you... Awesome, thank you. [Music] [Applause] Thank you for watching. [Music]
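The scheduler bypass discussed in the session depends on where nodeName is placed: on the pod spec under spec, template, spec, at the same level as containers, not on the Deployment's own spec. The following is a minimal sketch of that placement using a plain Python dictionary in place of the YAML manifest. The Deployment name, labels, image, and node name are all hypothetical and are not taken from the cluster in the video.

```python
import copy


def pin_to_node(deployment: dict, node_name: str) -> dict:
    """Return a copy of a Deployment manifest with its pods pinned to one node."""
    pinned = copy.deepcopy(deployment)
    # nodeName goes on the pod spec (spec.template.spec), next to `containers`.
    # A pod that already has nodeName set is not placed by the scheduler.
    pod_spec = pinned["spec"]["template"]["spec"]
    pod_spec["nodeName"] = node_name
    return pinned


# Hypothetical manifest, for illustration only.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "example-app"},
    "spec": {
        "replicas": 1,
        "selector": {"matchLabels": {"app": "example-app"}},
        "template": {
            "metadata": {"labels": {"app": "example-app"}},
            "spec": {
                "containers": [
                    {"name": "example-app", "image": "example.invalid/app:v2"}
                ]
            },
        },
    },
}

# "control-plane-1" is an assumed node name.
pinned = pin_to_node(deployment, "control-plane-1")
print(sorted(pinned["spec"]["template"]["spec"].keys()))
```

As noted in the session, this is a workaround that leaves the scheduler itself broken, so it is not something to rely on in production.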
Info
Channel: Rawkode Academy
Views: 201
Id: B-EhbctQBDs
Length: 80min 33sec (4833 seconds)
Published: Thu Dec 09 2021