Kubernetes 101 - Episode 6 - DNS, TLS, Cron, Logging

Captions
And we are back! In this first episode of the new year of Kubernetes 101, we're going to make Drupal run a lot better in Kubernetes. We'll pick up right where we left off last episode, back in 2020 (that crazy year): we're going to give Drupal a friendly domain name using Ingress, give it a TLS certificate so you can access it over an encrypted connection using cert-manager, set up a cron job to keep Drupal happy, and figure out what we can do with Drupal's logs to make it easier to inspect what's going on inside Drupal.

As with all these live streams, please introduce yourself in the live chat, or in the comments after the fact, and let us know where you're from; it's always great to see the global audience this series and the Ansible 101 series have had on YouTube. And thank you so much to everybody who subscribed: just a week or so ago I passed the 100,000 subscriber milestone, which is way beyond any expectations I had coming into all this. During the Ansible 101 live streams I mentioned that my goal for last year was something like 5,000 subscribers, and I blew past that within a few days of starting that series; now I'm past 100,000, and who knows where we'll go next. You might also notice a few little differences in my office; I'll talk about some of that in a future video. But it's 2021, so that is cause for celebration right out of the gate.

Somebody asked a few minutes ago whether these episodes will be weekly again, and yes, they will, with one caveat: there's a baby due sometime in the next couple of weeks in the Geerling household, so that baby will probably keep me from streaming for at least one week. I don't know when it will happen; follow me on Twitter at @geerlingguy if you want to find out precisely when this milestone occurs in my life. A lot of you might be saying, "congratulations, this is amazing, a new baby!" This is baby number four, and I know a couple of people watching are thinking (but not saying in live chat, because that would be rude), "he's one of those people." Yes, I do have some Irish heritage; no, I don't plan on having 20 or 30 kids. It might be next week, it might be the week after; I'll keep doing this weekly until the baby comes, then plan to start back up a week or two later, depending on how the baby, my wife, and the rest of the family are doing. Stay tuned on Twitter.

Anyway, let me go over to Safari and show you where I am from last session. If you watched episode five, you'll know exactly where we ended up; if you didn't, here's a quick rundown. I created an NFS server so that Drupal can scale up and have its file system be persistent and shared across all the running Drupal instances; without that, Drupal could only scale to one pod, which is not very scalable. I also recreated the Kubernetes cluster. I deleted it between episodes, because leaving a cluster running unused for a month would just be a waste of my money, and I wouldn't be able to afford the lighting that lets you actually see me in these episodes.
I also redeployed the NFS client provisioner, so it can provision volumes for those Drupal pods, and the metrics server, so I could have an HPA (horizontal pod autoscaling) configuration. We covered all of that in the last episode, so I won't cover it again today, but everything is back up and running: we have our Kubernetes cluster, and we have one PV for MySQL. As I mentioned last episode, we're not going to scale the database inside Kubernetes in this series; that's more of a Kubernetes 202 (or 204, or 408) topic. And yeah, debugging the baby at 3 a.m.: usually the baby just wants milk or a diaper change, and we'll definitely be doing a lot of both.

All the notes for these episodes will always live at kube101.jeffgeerling.com (note: it is not kube101.com). If you ever want to revisit an old episode, everything is listed on that site, including an episode list and links to all the examples from each episode, plus, of course, a beautiful subscribe button. It always helps if you subscribe to my YouTube channel; it helps me keep making these videos, invest in making them even better, and maybe do other series like this in the future.

So that's where we ended up last week. This week, the main thing I want to solve first (hold on, I just opened this in the wrong window) is that it's not very user friendly to tell people to go to an IP address like 69.164.207.70.
You're not going to go to your CEO and say, "Hey, I have a new website for our marketing department, and here's the address; you just have to type in all these numbers." We want a domain name for it, and there are a few different ways to do that.

Right now we're using a NodePort, and this little architecture diagram shows how that works. Internet connections come into your cluster, and here are the different nodes: the hardware running your cluster and Kubernetes. The Service we configured for Drupal uses a NodePort, which told Kubernetes: on every node in the cluster, take one high port and open it up to the world; that port routes to Drupal's Service, which then routes to one of the Drupal backends. That's our current setup, but it's not very friendly (you have to know a server's IP address or DNS name, plus the node port), and it's not very scalable: the node routes internally to one of the backends, but you're funneling all your traffic through a single node, which is kind of crazy. You could set up a load balancer in front of all the nodes yourself, but Kubernetes does this stuff for you, and that's what we'll use today.

What I want is to type in ep6.kube101.jeffgeerling.com, but right now that gets me nowhere, which kind of stinks. The easiest change would be to switch the Service's type to LoadBalancer and deploy that change. Inside Linode, that would set up a NodeBalancer in front of the cluster for just that Drupal service. That's great if you only have one or two services, but NodeBalancers cost money, and the same goes on Amazon with their load balancers, ALBs and ELBs (not EBS; that's block storage): they cost money every month. Imagine you had 25 web applications and each needed its own balancer: that's 25 times roughly $17 a month, over $400 a month, just for getting data into your cluster, which is kind of ridiculous. Kubernetes has a much better way to get traffic into your cluster, letting you map domain names to backend services, and the best way to do that is called Ingress.
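Before we get to Ingress, here's a minimal sketch of the kind of Service we've been talking about. The selector label is an assumption (it would need to match whatever the Drupal Deployment actually uses), but the NodePort-versus-LoadBalancer toggle is the one-line change described above:

    apiVersion: v1
    kind: Service
    metadata:
      name: drupal
      namespace: drupal
    spec:
      type: NodePort     # changing this to LoadBalancer provisions one cloud balancer for this service alone
      selector:
        app: drupal      # assumed label
      ports:
        - port: 80
          targetPort: 80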
(Goodness, I'm totally out of practice here. I have a new monitor on my desk that I got from an old radio station, which is also where these new, slightly higher-quality fiberglass sound panels came from; they don't do anything to stop the kids running around upstairs, though, and my mouse keeps going to the wrong place because I haven't set the monitor up the right way.)

So, this diagram is the NodeBalancer approach I was talking about. If you use Ingress instead, Kubernetes sets up one NodeBalancer in front of your cluster that routes to an ingress controller, and that controller can be many different kinds of software: NGINX (which I'll be using here), HAProxy, Traefik, Envoy, and five or ten others. Remember when I showed you the CNCF landscape? There are like 300 applications for every single purpose, all competitors, and they're not all just going to go away, so you have a lot of variety to choose from. I typically stick with an open source solution that I'm also familiar with outside of Kubernetes: Traefik, NGINX, HAProxy. I really like HAProxy, but it's sometimes a little harder to find use cases and documentation for it, since it seems a little less used here. Anyway, NGINX is one of the most popular ingress controllers. It sets itself up between the NodeBalancer and all your services, so for one service this isn't necessarily better than a single load balancer (it actually adds a little more complexity), but as you add more backends, more services, and more applications to your cluster, the ingress controller can route the requests for all of them and save you a lot of money by not needing a bunch of load balancers.

We can't get this just from the Service, though; the first thing we need to do is set up an NGINX ingress controller, and there are a few different ways to do that. For my own production clusters, I often take all of the manifests (the YAML files that describe it) and maintain them in my cluster's definition; for this example I'm going to use Helm, because it's a little easier and quicker to get started that way. So I'll say helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx (it tells me I've already added it, of course), then helm install ingress-nginx. (If anybody works at NGINX and wants to correct my pronunciation, that would be great.) This installs an ingress controller in my cluster. It ran quicker before the live stream, but once it's in, as with many other Helm charts, it gives you a lot of good information about how to use it and how to connect to it. Now I can say kubectl get service ingress-nginx-controller (svc is shorthand for service), and that gives me the IP address of the load balancer, in this case the Linode NodeBalancer, associated with this NGINX controller. That's now the front-door IP address that will route to the backends for all my services.
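Pieced together, the commands from this step look roughly like the following (the chart and repo names are straight from the ingress-nginx project):

    helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
    helm repo update
    helm install ingress-nginx ingress-nginx/ingress-nginx
    kubectl get service ingress-nginx-controller   # note the external IP of the cloud balancer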
The next thing I'm going to do, because what I want, as I said, is ep6.kube101.jeffgeerling.com (I'm using that domain because I own jeffgeerling.com and didn't want to buy a new domain name just for this demo; yeah, it's 12 bucks or something, but I already have enough extra domain names), is set up DNS. I use a service called name.com, and I'll talk a little more about other options later. I'll log in (it needs my security code; I probably should have logged in beforehand, but hold on), find jeffgeerling.com, go to my DNS records, and add an A record for ep6.kube101.jeffgeerling.com pointing at the NodeBalancer's IP address.

And just to confirm how that NodeBalancer appeared: the cool thing about these Kubernetes backends is that managed hosting providers like Linode, AWS, Google Cloud, and DigitalOcean all have integrations that tie in, and all of this happened automatically when I said I wanted an ingress controller. The ingress controller told Kubernetes, "hey, give me a cloud load balancer"; in Linode that's a NodeBalancer, and in AWS it will give you an ELB (or I think you can specify an ALB now). So you have one definition for an ingress controller, but it gives you the right thing no matter what cloud you're in. It created this for us, here's that IP address, and you can even inspect it and see where it's sending traffic: three backends, over ports 80 and 443, with Kubernetes routing internally from there.

It's always a bad idea to do a live demo, and throwing DNS into the mix on a live demo is probably an even worse idea, but: here's the domain name, that's the NodeBalancer's IP address, let's add that record and make sure it shows up. ep6.kube101... and if I refresh this without HTTPS, okay, it works. But you're saying, "that's not working, that's a 404." Right: I set up the ingress controller, but I haven't told Kubernetes to tie any Ingress records back to Drupal yet. Until we do that, this domain (and we could point any domain we want at this ingress: I could point hostedapachesolr.com at it, whatever I want, and it would handle it) just returns 404 Not Found, because there's no record inside the cluster mapping that DNS name to a service.

So the next step is to tell NGINX to route requests for this domain to the Drupal service, and we do that using an Ingress record, which I have right here. (You can still access the Drupal site through the IP address and node port unless we switch that at some point; I could go down here and switch the Service from NodePort to ClusterIP, and then it would only be accessible inside the cluster, in its own namespace, with this domain name as the only way in. For now, I'll leave it accessible by the node port too.) This Ingress resource is named drupal, uses the nginx ingress class we just created with that Helm chart, and uses a rule to map a backend (in this case, Drupal running on port 80) to the host ep6.kube101.jeffgeerling.com. So let's create it: over here, in the episode 6 K8s manifests, there's drupal-ingress.yaml, and I'll say kubectl apply -f drupal-ingress.yaml, which adds that record. Then I can say kubectl get ingress (I can spell out ingress or just say ing) -n drupal, and we can see the drupal Ingress that's routing to port 80 on the backend, created 12 seconds ago. It might take 5 or 10 seconds, but now if I go back here, this should direct me to the Drupal site, and it does. So now we have DNS working through an ingress controller.
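For reference, a minimal sketch of what that drupal-ingress.yaml might contain, based on what's described above (the networking.k8s.io/v1 API version is an assumption; clusters of this era often still used networking.k8s.io/v1beta1, with a slightly different backend syntax):

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: drupal
      namespace: drupal
      annotations:
        kubernetes.io/ingress.class: nginx
    spec:
      rules:
        - host: ep6.kube101.jeffgeerling.com
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: drupal
                    port:
                      number: 80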
And I can set up as many sites as I want through this ingress controller. Another thing that's important to note: you can actually have multiple ingress controllers. Sometimes you want to send certain types of traffic through one controller and other types through another, since NGINX might handle certain kinds of data and web traffic better or worse in some situations than, say, Traefik. You control which ingress controller a backend service uses through these annotations, and we'll talk a bit more about annotations in a little bit.

Someone is asking what editor I'm using: Sublime Text 3. On the Mac I used to use the editor with the flower icon (not LimeWire!), which went kind of abandoned for a while (it isn't abandoned anymore), but Sublime Text 3 is nice because I can keep the same configuration across Linux, macOS, and Windows, and I sometimes use those other platforms to do my editing.

Getting back to this: everything seems to be working now, and I can reach the website through this URL, which is awesome. But one thing I also want to fix is that it currently says Not Secure; if I open a private browsing window, go here, and try logging in, it gives me a red "not secure" warning. Browsers are starting to warn a lot more, and in a lot of cases it's a bad user experience not to have HTTPS on your website, so we want to set that up too.

Actually, before I get to that, one other thing. I mentioned that I use name.com for my domains. They're good; they've been around almost forever (forever being like 20 years), and I trust them, and you need a lot of trust in whoever manages your domains. But they don't have a lot of API integrations with other cloud providers, so for some things I actually use Route 53 (pronounce "route" however your accent prefers). There's also a Kubernetes project called ExternalDNS, and there are a couple of ways you can approach this: I could point an entire domain at Kubernetes via name.com and then split that domain up internally using Ingress, or, if the provider is supported, I could tie Kubernetes straight to my DNS provider through ExternalDNS. I'm not going to get into it in this episode, but all kinds of DNS services are supported, and if you use one of them for your DNS records, Kubernetes can tie into it and update your records for you when you do things like add a new load balancer or change a CNAME or an A record. That's a cool way to make everything super fluid and not have to do any manual DNS management. For most of my smaller personal clusters it's not a big deal, but I do run a couple of services like Hosted Apache Solr, which I think has 1,200 DNS records now, for all the various services it integrates with and all the different little client interactions that happen. There would be no way I could do that manually, because it's creating and deleting DNS records all the time, so that one is tied into a Kubernetes ExternalDNS integration.
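As a small taste of how ExternalDNS works (it isn't covered further in this episode), you typically annotate a Service or Ingress, and the ExternalDNS controller creates the matching record at your provider. A hedged sketch, using ExternalDNS's standard hostname annotation:

    apiVersion: v1
    kind: Service
    metadata:
      name: drupal
      namespace: drupal
      annotations:
        # ExternalDNS watches this annotation and manages the matching
        # A record at the configured provider (Route 53, etc.).
        external-dns.alpha.kubernetes.io/hostname: ep6.kube101.jeffgeerling.com
    spec:
      type: LoadBalancer
      selector:
        app: drupal   # assumed label
      ports:
        - port: 80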
So as you get to a certain scale, something like that may become necessary; I'm just pointing it out, and I'm not going to go deeper into it in this episode. (Again, my mouse is getting stuck between my screens.) I have a note here: "I use name.com, sad trombone." Like I said, you have to trust your DNS provider, and I've trusted name.com for a long time; they're not the flashiest or the most amazing, but I do enjoy their service, and they don't have as many dark patterns when you try doing anything with your DNS. And just checking up on live chat: everything looks good.

So, using name.com can be a bit of a chore with something like Kubernetes because there's more manual process, but something even worse is managing certificates by hand to get TLS and HTTPS on your site, especially if you want to use something like Let's Encrypt. Luckily, in Kubernetes land there's a thing called cert-manager, and cert-manager manages certificates for Kubernetes; that's its big thing. There are a lot of different ways you could use cert-manager, but the way it works is that it ties into your ingress controller, and as long as you give a certain annotation to your Ingress records, you can tell cert-manager, "get me a certificate through Let's Encrypt," or through HashiCorp Vault, or through an internal CA provider in my company; cert-manager integrates with all of those. All you have to do is set up a cert issuer, and then in your backends and services, when you set up an Ingress, you just tell it where you want the certificate from and where to store it. cert-manager does all the work of obtaining certificates, updating them, and deleting old ones. It's a really nice, automated way to handle certificates in your cluster, and the website is cert-manager.io if you want to read more of the documentation. I'm going to cover the use case that, I'd guess, 90 to 98 percent of us have for cert-manager in a Kubernetes cluster running web applications like Drupal: using it with Let's Encrypt.

To install it (again, with Kubernetes there are always like five different ways to install anything): cert-manager in particular needs certain CRDs, custom resource definitions, in your cluster, and there are a couple of different ways to install them; they just describe, for your cluster, all the different things cert-manager is going to manage. The easiest way is to follow the documentation, which lists a few different options; you can install it with manifests, which you could download and track in your own version control, but I'm going to do it with Helm, just because I'm using Helm for a lot of other things in this course and it's nice to be consistent. So, following their guide: I create the cert-manager namespace like it wants me to, add the Jetstack chart repository (I'm guessing it's already on my computer; yeah, it already exists), run helm repo update, grab the CRD YAML out of their repository for version 1.1.0 and apply it (which pulls those definitions from that URL and installs them into my cluster), and then install cert-manager with Helm into the cert-manager namespace, at the same version, using everything in their example command except one line. One important thing: if you're installing the CRDs separately while using the Helm chart, it's a good idea to make sure your versions are in sync, because some things could be different between releases.
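Assembled from the steps above (and the cert-manager v1.1 documentation they follow), the install sequence looks roughly like this:

    kubectl create namespace cert-manager
    helm repo add jetstack https://charts.jetstack.io
    helm repo update
    # Install the CRDs, pinned to the same version as the chart:
    kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.1.0/cert-manager.crds.yaml
    helm install cert-manager jetstack/cert-manager --namespace cert-manager --version v1.1.0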
As with many Helm charts, it gives you a nice message about the next steps for actually using it. But before we start trying to use it or do anything with it, I want to make sure it's actually working, so I'll say kubectl get pods -n cert-manager to see what's going on, and it looks like everything is running, so that's good.

Now cert-manager is in the cluster and ready to manage certs, but there are two things we'll need to do before it can actually give us a certificate through Let's Encrypt for Drupal. The first is to create a cluster issuer: something that will issue certificates in response to certificate requests, tying cert-manager into a service like Let's Encrypt. (And let me get my mouse back; there it is.) The cluster issuer we're going to add is basically straight out of their documentation, but one interesting thing to note is that Let's Encrypt has rate limits on its production servers, and for non-production environments you can switch the server (and, of course, the name, to something like letsencrypt-staging). If you're testing in a staging, test, or dev environment, you'd probably want the non-prod Let's Encrypt endpoint, so you can stay within your rate limits and test things without actually affecting production certificates.

The way this works: you give it a name, and a spec; this is an ACME-type cluster issuer, and ACME is the standard Let's Encrypt uses to generate certificates for you. You give it a reference to a private key secret, which is where it will store the private key for your ACME account (you can actually reuse an existing private key if you have multiple clusters and want one key between them, or something like that; there's a way to do that, but I'm not going to cover it here). You give it a server, and an email address to tie into Let's Encrypt. That's helpful because Let's Encrypt will actually send you an email if one of your certificates is due to expire and hasn't been renewed recently, and it ties your certificates to an address in case you switch clusters and need to rework things. Then you configure what kind of solver it uses; some of these keys are different depending on what kind of certificate provider you're using (an internal CA provider or HashiCorp Vault would look different), but this is how you connect Let's Encrypt back to your cluster for its test: it has to make sure you actually own that domain name and that it points to your cluster. We're going to use the HTTP solver, and cert-manager will do all this stuff behind the scenes for you; you don't have to configure NGINX yourself to perform the HTTP test. You can also use the DNS solver if you have an ExternalDNS integration in your cluster. Anyway, this is basically straight from their documentation; I just edited my email address.
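A sketch of that cluster-issuer.yaml, modeled on the cert-manager documentation described above (the email is a placeholder you'd replace with your own):

    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: letsencrypt-prod
    spec:
      acme:
        # For non-production testing, swap in the staging endpoint
        # (https://acme-staging-v02.api.letsencrypt.org/directory)
        # and a name like letsencrypt-staging.
        server: https://acme-v02.api.letsencrypt.org/directory
        email: you@example.com   # placeholder; expiry warnings go here
        privateKeySecretRef:
          name: letsencrypt-prod
        solvers:
          - http01:
              ingress:
                class: nginx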
Now I'll deploy it: kubectl apply -f cluster-issuer.yaml. That created it, and if I say kubectl get clusterissuer, I think, we can see that it added it, and the cluster is finally ready to actually give me a certificate if I request one.

The next step is to update our Ingress record for Drupal. This is the one we just put in: it had one annotation, plus the rule that tied this domain name to the Drupal backend. We're going to make a couple of small modifications so it also grabs us a certificate. We add one more annotation that says, "hey cert-manager, I want to use letsencrypt-prod"; that helps our cluster issuer see that this Ingress record needs a certificate. Then, to get that certificate, we also give cert-manager our TLS configuration: there's one hostname we want on this certificate, and I want the certificate stored in a Secret called ep6-tls, which cert-manager will create for me in this namespace, so that NGINX can get the certificate and tie it to my traffic, giving me HTTPS. Right now we're not serving any secure traffic, so I'm going to update this record; since these two Ingress resources are the same resource, just with a couple of additions, I can apply the newer one over the old one, and it will be updated in place with the cert-manager configuration.
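The drupal-ingress-tls.yaml described here is the earlier Ingress plus, roughly, these additions (a sketch; the annotation and the tls block are the parts cert-manager looks for):

    metadata:
      annotations:
        kubernetes.io/ingress.class: nginx
        cert-manager.io/cluster-issuer: letsencrypt-prod
    spec:
      tls:
        - hosts:
            - ep6.kube101.jeffgeerling.com
          secretName: ep6-tls   # cert-manager creates this Secret with the issued certificate
      # ...followed by the same rules: section as before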
So I'll say kubectl apply -f drupal-ingress-tls.yaml. You can monitor cert-manager's logs while it does its magic, if you want; the logs require a little parsing to figure out exactly what's going on, but basically it does its ACME challenge to find out whether you actually own this domain name, using the HTTP challenge we configured in here. cert-manager does all that magic for us, with nothing special needed on the backend, and it looks like it worked: it says the ep6-tls certificate is ready. If I go to the site now and refresh, it automatically redirects me to the HTTPS version, and if I click on the little lock icon, I can see that I now have an R3 certificate, which I believe is Let's Encrypt's newest intermediate; they're switching everything over to R3, and the one they used before is going away in the next few months. So that's awesome: another problem solved. We have HTTPS support on our site, and it's accessible over a friendly domain name; if you want to visit the site, just go to ep6.kube101.jeffgeerling.com.

This week I actually changed the admin password, so hopefully people won't be hacking into it and changing things on me; but it is Drupal, it is PHP, so there might be some zero-day exploit I don't know about that somebody's exploiting right now. If I go to the logs, I can see whether people are trying: yeah, somebody's trying to log in as admin, and flood control is blocking them. (That's an internal cluster IP; with Drupal you can configure it to capture the IP address of the actual person coming through, although somebody trying to log in as admin, if they're good at what they do, is probably behind a VPN, so it would be harder to figure that out.) Hopefully nobody gets in and changes things before the end of this episode; although, looking at it, that account might have been created as part of Drupal's defaults. The other thing I should note is that I installed Drupal's Umami food magazine demo profile instead of the standard install that looks like Drupal from 2006; it's still Drupal, just a setup that demonstrates Drupal's capabilities, with multiple languages and so on. (I don't know why I'm in the Spanish version; I'm usually in the English one. Maybe I clicked a button somewhere.)

All right, so we have HTTPS. One other thing Drupal is going to need to keep it happy is a cron job. If I go to Drupal's status report, it shows me the last time cron ran, and cron is running for this site. Drupal used to have a problem where people would install a site and never know about this cron thing it had to do: you'd have a brand new site, it would work for a while, and all of a sudden it would have big issues, and the problem was nobody had configured cron for the site. So, I think five or six years ago (maybe even a little longer), a new module called Automated Cron was added to Drupal, and it solves that problem: cron is now run at some interval at the end of a server response. But it has some downsides. The configuration out of the box only gives you intervals of one, three, six, or twelve hours, a day, or a week, which is better than nothing (having no cron at all is a really bad thing for a Drupal site, because cron cleans up your file system, checks for updates, all those kinds of things), but there are two problems with it. One: you can't run it more often, and there are a lot of sites where you'd want to run cron every five or ten minutes, or even every minute, because you have a process that cleans things up, gets new content, or checks for something. The other: it runs on web requests, so there's a little extra code checking whether it's time to run cron; it's nice to clean up that code path and not have Drupal track it and check it. And it runs as part of that Apache request: it doesn't make the end user's experience any worse, but it does tie up a web process for however long cron runs, attached to somebody's front-end request.
So there can be other concerns with that, and we want a real cron job instead. In the old days, back when we had servers (which we still do; the cloud is just someone else's computer, and they're still servers), you could technically find a way to log into one of these Linode nodes and add a crontab entry that ties into Drupal, but that's the old way of doing things. If you're using Kubernetes, Kubernetes has a CronJob resource we can use to tell it to kick off new Jobs, and those Jobs can do whatever you want: you give it a container and a definition of what to do, it kicks it off, waits until it completes, and tracks the result in Kubernetes for you. A CronJob just takes a Job definition and gives it a schedule: run it every minute, every 10 minutes, every hour, whatever you want. That's what we're going to do with Drupal, too, to make sure cron runs on a schedule we want, and we'll let Kubernetes take care of scheduling it for us.

A lot of people still use external systems for things like cron tasks; you could use Jenkins, or a lot of different systems out there, and you can still integrate those with Kubernetes, no big deal. Kubernetes CronJobs are a convenience, and in a little bit I'm going to talk about some of the caveats to using them that I've encountered. I actually wrote an entire blog post on this; I'll paste it in the description later, and also into live chat, so you can read it if you want. If you're using Drupal, there are some concerns about Kubernetes CronJobs that I mention in there, along with the approach I'm going to use today. There are a lot of different ways to architect cron jobs in Kubernetes, and the way I'm going to show is probably not the most amazing way, but it works, it scales, and it's easy to integrate with any Drupal site; in fact, a lot of CMSes that are like Drupal can do it the same way.

Drupal gives you this cron key: it's basically a URL with a hash at the end that you can use to run cron on your site through a web request. (I'm going to delete the site soon, so if you see this key and want to sit there copying out every single one of these characters and run cron on it all the time, be my guest.) It still has the downside of running cron through a web request, but one advantage of using this key with an external system or a Kubernetes CronJob is that the web request is tied to an automated process, and you could technically move that request out to a specific container. Another way you could run this: Drupal has a command-line utility called Drush, and you could have a Drush container that runs cron with Drush, but that makes your cron job setup a little more complex, and I'm not going to do it that way today. I'm just going to set up a CronJob that calls out to this URL on a schedule I define in Kubernetes.

You can see that CronJob right here. I'm going to call it drupal-cron and put it in the same namespace as Drupal, although technically speaking, since this is just an HTTP request over the public internet, I could put it anywhere.
I could even put it in another cluster if I wanted to. I'm giving it a schedule of every minute; does this site need cron every minute? No, but for demonstration purposes it's nicer than deploying it and then waiting five minutes to see a result.

There are a few things to keep in mind when you create CronJobs. One important one is the concurrency policy. Most of the time you don't want more than one cron run going at the same time, especially with Drupal, but with some other systems too: you don't want one cron run doing something while another one starts on top of it. There are some systems where you do want that (if one run takes a long time, you might want the next ones to start anyway), but that can lead to a cascade of thrashing on your database, so you really have to know your application to decide whether to allow concurrency or forbid it. There's also a third option: if a run is still going when the next one is due, kill the first one and start the next. That can be kind of a nuclear option and cause even more problems, so again, you have to know your application; it's usually safest to set Forbid and wait for a run to finish before starting a new one.

Another thing to keep in mind with Kubernetes CronJobs: the schedule is more of a suggestion. You might be used to Linux crontab, where a job set to run every minute starts at zero seconds of that minute. With Kubernetes, it might start around the time that minute starts, or halfway through it; the Kubernetes scheduler is not quite so precise. If you say "run it Sunday at 5:21 a.m.," sometimes it might run at 5:22, or at 5:21:30. Keep that in mind; this is one reason some people choose not to use Kubernetes CronJobs. If you have a job that truly depends on running at an exact time, you'll have to use a different tool, or your application will need logic that determines what the closest minute was, something along those lines.

The way a CronJob works is that you give it a job template; you can read more about Jobs in the Kubernetes documentation, but a Job basically runs a container until it finishes, and that's it. You tell Kubernetes to run the Job, it runs it, stores the log from the run, and records success or failure when it completes. With a CronJob, you wrap that job template with a schedule. What I'm going to use is a container called drupal-cron, which is just a curl container: it runs curl with the -s option against that cron key URL from the site (let me make sure the key didn't change; yeah, same hash at the end). If it has a problem, it will retry, as many times as Kubernetes is configured to by default. There are a lot more options we could configure, and I might mention a couple more in a few minutes, but let me go ahead and deploy this CronJob to the cluster.
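Here's a minimal sketch of what that drupal-cronjob.yaml might look like, assuming the batch/v1beta1 API that clusters of this era used, a public curl image (curlimages/curl), and a placeholder for the site's cron key; all three are assumptions, not the episode's exact file:

    apiVersion: batch/v1beta1
    kind: CronJob
    metadata:
      name: drupal-cron
      namespace: drupal
    spec:
      schedule: '*/1 * * * *'      # every minute, for demo purposes
      concurrencyPolicy: Forbid    # don't start a new run while one is still going
      jobTemplate:
        spec:
          template:
            spec:
              containers:
                - name: drupal-cron
                  image: curlimages/curl:latest   # assumed curl image
                  args:
                    - '-s'
                    - 'https://ep6.kube101.jeffgeerling.com/cron/CRON_KEY'   # placeholder key
              restartPolicy: OnFailure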
So: kubectl apply -f drupal-cronjob.yaml, and it created it. I can say kubectl get cronjob -n drupal and see it's here. It looks like it hasn't run yet, probably because... what time is it, 10:43? I don't actually have a clock with seconds in my office, so I don't know when it will tick over to 10:44, but sometime around then Kubernetes will run the job. Oh look, it's active right now; it started three seconds ago, and if we check in Drupal, the last-run time should now show a few seconds ago. There we go. And if I go into my logs in Drupal, I can see a cron run completed at 10:44, this exact minute, with a lot of detailed information, so that's working. Now, if I want, I can turn off that Automated Cron module: I'll go to Uninstall and turn it off, because I don't need it anymore; otherwise it would keep checking, all the time, whether it's time to run cron, and that's one code path I can take out of my Drupal site.

I can also check things from the Kubernetes side. Besides getting the CronJob and its information, I can say kubectl get jobs -n drupal, which shows me the drupal-cron Job that ran here; and it looks like it's just about time for a new one. Yeah, here's a new one, and it already completed; that was very fast, because this particular site doesn't do a whole lot on cron runs. It'll keep doing this, and by default Kubernetes keeps three jobs in its history: after the next job starts, and the one after that, this job will be deleted from Kubernetes' history, and so on and so forth.

There are a few things that can be a little weird about Kubernetes CronJobs, and again, this is one reason some people don't rely on them for everything. One of those things is what happens in failure modes. By default, the failed-jobs history limit is 1, I believe, and you can configure those history limits in here. If I say kubectl describe cronjob drupal-cron -n drupal and look in here: yeah, the failed jobs history limit is one. So if you have a cron job that failed, say, twice over the past week and you go in to debug it, it only saved the history of the most recent failure; if the failure two runs back was where the big problem was, you've lost it. So you might want to increase that failed history limit.

I've also found another weird issue, which I suspect comes from Kubernetes' backoff system not being tuned perfectly for cron jobs that run at very high frequency: if you have a lot of failures in a short amount of time, sometimes CronJobs just stop working at all. I think it's related to the way the backoff logic works with Jobs that fail a lot, and I'm pretty sure that's a bug; I don't think it's intended behavior. I've seen it in Kubernetes 1.16 and 1.17 (I haven't run a 1.18 cluster long enough to know whether it's still the case), but after months of running, sometimes a CronJob will start hitting failures and then just stop running forever, as far as I can tell, and the only way I've found to fix it is to delete the CronJob and create it again.
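For reference, the inspection commands used in this segment, cleaned up (the two history fields are the standard CronJob spec fields):

    kubectl get cronjob -n drupal
    kubectl get jobs -n drupal
    kubectl describe cronjob drupal-cron -n drupal
    # In the CronJob spec, the relevant history knobs are:
    #   successfulJobsHistoryLimit: 3   (the default)
    #   failedJobsHistoryLimit: 1       (the default; consider raising it for debugging)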
So, something to keep in mind: you do want to monitor your cron jobs and make sure they're actually running; maybe have a daily check that asks, "did this cron job run in the past day? If not, alert me."

Another thing you might want to keep tabs on is how many Jobs you're creating in your cluster; this is also one way to find out how much memory your masters have available, because I did some testing locally on my computer, on Linode, on Amazon, and on my Raspberry Pis. This is what happens when you have an idea that seems like it would take 10 minutes and then takes two weeks: for 50,000 subscribers, I thought, "I'm going to make 50,000 Kubernetes Jobs; that's easy, right?" It wasn't. I made a whole video about it, which you can watch on YouTube; I'll put a card up above me that links to it. I started it off with a silly little intro, just having fun (it's a subscriber milestone video, and I'll have one for 100,000 soon). My goal was to create 50,000 Kubernetes Jobs in one cluster, and I found out it's a lot harder than I thought. Even though Kubernetes is only tracking the Job output and what happened, something between the scheduler, the job controller, and the API itself starts having issues once you pass 10, 20, 30 thousand jobs; exactly where the inflection point lands depends on how much RAM your cluster's control plane has. The one thing I learned from this: if you have, say, 100 Drupal sites all running cron once a minute, be careful with how much history you keep for those jobs, because the more history you keep, the sooner the scheduler starts having issues as you reach five, ten, twenty thousand jobs, depending on your hosting provider and the control plane they set up. On Linode, I think it was around ten or twenty thousand jobs where things started falling apart and the whole cluster got a lot slower. Most people probably aren't going to be that crazy, but I do like testing the limits; that's a lot of what I do for my YouTube videos, so if you like that kind of stuff, subscribe, I guess.

So those are a few caveats to watch out for with cron jobs and Kubernetes. And thank you for the congrats on the 100k subs; it is a cool milestone, and I'm trying to figure out some really fun way to celebrate it, so stay tuned. Also, we disabled Drupal's Automated Cron, and it's nice to take out any module you don't rely on: that's one less path for security problems to come into your site, and one less code path to run on every single web request, ever.

The last thing I wanted to talk about today (and, seeing that we only have about 10 minutes left in the hour, I think it was a wise choice): I was going to have a demo where I took Drupal's watchdog logs. Nowhere in the interface does it say "watchdog," but that's what this log is referred to internally in a lot of places. The thing is, Drupal's log is not exposed in the container output.
If I say kubectl logs -n drupal -l app=drupal, what I get are Apache logs; Drupal's log entries don't actually make it into the Apache logs, the PHP logs, or any logs anywhere except Drupal's internal database. Drupal does have a module called Syslog, way down here in the module list, which will log out to syslog (rsyslogd) on a server, but that requires the daemon to be running in the background on your server, and in this case our container is just running Apache. I'd have to change my container architecture to run two services in one container, which is not really the best way to do things in Kubernetes.

Another option for Drupal's logs, still using the built-in Syslog module, is a sidecar container. A sidecar runs alongside your main container in the same Pod, so you can have Drupal write syslog entries to a shared location and have the sidecar pick them up; the sidecar's output becomes what you monitor to see Drupal's logs, or the sidecar can deliver those logs somewhere else. (There's a sketch of this pattern just below.) But that starts to get a little complicated, and I didn't want to talk about sidecar containers too much in this episode, because in my opinion that's a more advanced topic than a Kubernetes 101 course. So I looked for other options for logging, and there are like 20 of them; I'm going to run through a few, including things I've used before and things I do and don't recommend in different cases. The bottom line, though, is that I didn't find a way to demonstrate, in under 20 minutes, how to get Drupal's logs into something that's easy for you to work with, at least not for Drupal 9.1.0, which is what this site is running.
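For the curious, here's a minimal, hypothetical sketch of the sidecar idea (this is not the approach used in the episode, and every name in it is illustrative): the app writes a log file into a shared emptyDir volume, and a second container tails that file so the entries show up as normal container output:

    apiVersion: v1
    kind: Pod
    metadata:
      name: drupal-with-log-sidecar
    spec:
      containers:
        - name: drupal
          image: drupal:9-apache
          volumeMounts:
            - name: logs
              mountPath: /var/log/drupal
        - name: log-tailer
          image: busybox
          # Stream the shared log file to this container's stdout,
          # where kubectl logs and log aggregators can pick it up.
          args: ['/bin/sh', '-c', 'tail -n+1 -F /var/log/drupal/drupal.log']
          volumeMounts:
            - name: logs
              mountPath: /var/log/drupal
      volumes:
        - name: logs
          emptyDir: {}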
There are actually a couple of modules that would have made this a lot easier had I been running Drupal 8, but I'm running the latest version of Drupal, and some of those modules haven't been updated yet, unfortunately. So instead, let me run through a few different approaches.

One of the easiest ways to do logging in Kubernetes is to use one of the service providers. I've used Sumo Logic in the past (their site looks odd right now, probably because my browser's privacy plugin is blocking their CSS from loading). Datadog is another one; let's just search for Datadog here. And there are a few other services too that let you send logs to a cloud service; another very popular one is Elastic, the Elastic Stack, and here I'm talking about their paid hosted service. There are a couple of good things and a couple of bad things about all of these. One of the best things is that they're easy to integrate, and they usually have plugins for any kind of system: you saw a few seconds ago that there's a Drupal module for Datadog, and you just install the module, give it an API key or some other way of authenticating, and it starts sending your logs to Datadog. For most of these systems there's a plugin or some way to get running really quickly, and some even have a Kubernetes integration where you click a button and all your Kubernetes data flows in. The annoying thing about a lot of them, probably the biggest downside, is that they're expensive, and they get more expensive as you expand. If you aren't picky about which logs you send them, you can end up spending a lot of money on a monthly basis if you have big applications that emit a lot of log data. So you end up doing a calculus of how important the log data is, and the trouble is that a lot of log data turns out to be very important exactly when things hit the fan, and then you really regret having pruned logs just to save some money. There are benefits and downsides either way.

Another option is running your own ELK stack, which I've done for a few clusters, and up to a certain scale it works okay inside your Kubernetes cluster. One downside of having it in the cluster is that if your cluster is having problems, ELK is probably having problems too, and so is your log data; that's why some people set up a separate ELK cluster. ELK is Elasticsearch, Logstash, and Kibana: a backend that stores log data (Elasticsearch), a frontend for viewing and searching that log data (Kibana), and a transport layer (Logstash) that receives log data over HTTP and puts it into Elasticsearch. That's definitely an option, and there are Helm charts that help you build it into your cluster. The biggest downside is scalability. There's a reason Sumo Logic, Datadog, and Elastic are expensive: it takes a pretty good amount of effort to make those systems resilient, make them scale well, and keep them from falling apart when something crazy happens, like being sent ten gigabytes of data in a day.
And it's always when something bad happens that your monitoring system goes down, especially if you're running it on your own. That's one reason some companies are willing to pay a lot of money to these external services: they typically don't go down on those bad days (though sometimes they do). I have a little animation here of what happens on a bad day: you're basically fighting a fire with a fire hose, and when you're running your own hosting system, your water usually runs out. That has happened to me a few times, and it makes you fight the fire blind while it starts consuming everything around it. So if you have the budget for it, I'd recommend using one of the cloud services.

Another option: if you're using cloud hosting, consider the provider's own logging system. If you're using Amazon, there's CloudWatch; Google has a logging system built in too, and a lot of different cloud hosting providers have their own logging solutions. Some of them don't have the bells and whistles of a dedicated solution like Datadog, Sumo Logic, or Elastic (they don't have the nicest dashboards, and they don't have open source alternatives), but they are built in, and a lot of times they can scale with your needs pretty well.

And speaking of cloud providers with integrated logging: one of the best things I found while researching all of this comes from amazee.io, who is sponsoring this series (thanks so much to them!). They maintain a project called Lagoon Logs for Drupal, and this is one area where it's really nice, if you're a Drupal user, to host your site on a provider that has first-class support for Drupal: they'll have something like this, where you just install the module and you have a full logging solution. I believe they use Elasticsearch, Logstash, and Kibana on their backend, and this sends your logs straight into their system with no problem, built into the hosting platform you already have with them. People sometimes harp on hosting providers for their pricing, but if you want to get a complex, enterprise-scale site done, or something that needs to scale up a lot, with minimal backend effort and minimal pain, solutions like what amazee.io has can save you so much time and let you focus more on actually building the site, on marketing, and on designing new things, while spending less time on the backend parts of Kubernetes. Honestly, in my experience, trying to get Elasticsearch to scale and to recover from a split partition isn't very fun. I do think building those clusters is fun at smaller scales, but at bigger scales I typically prefer a managed provider, because they have a full team of people taking care of those things.

So, sorry for the letdown that I'm not going to magically show you how to log all the things in this episode.
But I have a note here (where was it?) that logging ain't sunshine and rainbows and unicorns; it never has been. If you've set up hosting for a multi-server solution before, you'll have found the same thing has always been the case: having multiple servers log to a central resource isn't impossible, but there are always going to be weird issues, and then exposing those logs in a friendly user interface, in a way that multiple people can get to, is another layer of complexity. Kubernetes takes that and makes it easier in some ways and harder in others: easier in that Kubernetes encourages your applications to have one logging output, the standard output and standard error of the containers, so you can aggregate your logs more uniformly; harder when you have applications like Drupal, whose internal logs don't flow into the container logs so easily.

So that's going to be it for today's episode. In the next episode, we're going to go from managing our single Drupal instance to talking about something in Kubernetes called an operator. I'll get more into it next week, but it's a way of taking your application knowledge and putting it into one little bundle, so you can manage things like a Drupal site, a WordPress site, or a custom Go application; you can manage a fleet of them, or even just one or two, uniformly, with a good process for managing them. So next week we'll talk about that. As always, until next time: I'm Jeff Geerling.
Info
Channel: Jeff Geerling
Views: 13,117
Rating: 4.9526067 out of 5
Keywords: kubernetes, devops, introduction, intro, beginners, guide, k8s, geerlingguy, kube101, kube, kubectl, docker, apps, applications, linode, development, cluster, create, deploy, deployments, drupal, ingress, cert-manager, ssl, tls, certificate, https, secure, nginx, cronjob, cron, log, logging, logs
Id: E1_uINjq2As
Length: 63min 14sec (3794 seconds)
Published: Wed Jan 06 2021