GitHub Actions: Dive into actions-runner-controller (ARC) || Advanced installation & configuration

Captions
Actions Runner Controller (ARC) is a Kubernetes operator designed to manage and scale self-hosted runners for GitHub Actions. This is a project that was started by the community and later adopted by GitHub. I'm quite passionate about it, because this is what my team and I have been working on for the past year. Actions Runner Controller is now generally available, which means anybody can use it, and of course, by making it generally available we have published documentation. But who likes to read documentation anyway? So in the next hour and a half I'm going to dissect Actions Runner Controller for you. I'm going to go over the architecture, the design decisions, and the different components, and then I'm going to show you all of the installation steps you need to go through, the different configurations you have, and the limitations of the solution. This is going to be a long video, divided into chapters, so you can jump to whatever is relevant for you. So go grab your preferred drink or snack, get comfortable, and we're going to start right about now. Before we start our deep dive, it's very important for us to acknowledge a few things. This is first and foremost a community-based project. It was started by summerwind, and then mumoshu and toast-gear joined the team of main maintainers and carried that effort forward. They have obviously done a fantastic job, because this project rose in popularity and even got the attention of a lot of people within GitHub, which led to the decision to adopt the project and migrate it to the GitHub actions organization so that we could sponsor it and continue developing it. Then, in December of 2022, I announced this migration on behalf of my team. Now, don't be fooled: this project is anything but simple. It takes a small village to build it, maintain it, release it, and publish it, and all of these awesome people at GitHub have definitely contributed to its success so far. And of course, Actions Runner Controller would not exist without the generosity of the entire community and its contributions. So first of all, let's start with the basics. I have spent a ton of time writing this documentation, so you'd better read it. I'm going to show you where you can find all the information you need in the documentation, and then we're going to talk a little bit about the architecture of Actions Runner Controller. Getting to the documentation is pretty easy: there's a short link, gh.io/arc-docs, and when you go to this URL it will take you straight to the quickstart page for Actions Runner Controller. The purpose of this page is not to describe everything about ARC; it's to get you up and running as fast as possible without going too much into the details. It walks you through the prerequisites for this project, what you need to have installed in your environment. Obviously you need a Kubernetes cluster, or you can use minikube or kind, whatever you need. OpenShift clusters are currently not supported, and a lot of people have asked why. The reason is that we only support vanilla Kubernetes, because we don't have the capacity right now to test on all of the different flavors and distributions of Kubernetes. So Kubernetes in its vanilla form, as well as EKS and AKS, are what we're going to be supporting for now; if that changes, you'll obviously be the first to know.
You're also going to need Helm installed. We decided to use Helm charts to facilitate the installation and setup of Actions Runner Controller, simply because Helm seems to be a community standard that is used quite widely. Now, you might ask, why not Kustomize or anything like that? We made a conscious decision to stick with what the community had already chosen in the previous versions of Actions Runner Controller, prior to the adoption, and that's what we're going to be supporting moving forward. If that changes, you'll also be the first to know. Installing Actions Runner Controller is quite simple. There are two Helm charts; I'm going to describe what they are and why we have two in a bit. You go through the Helm installation, you supply the configuration (don't worry, I'm going to walk you through all of this shortly), and then you configure your runner scale set (I'm also going to talk about what that is in a bit), and that gets ARC, Actions Runner Controller, up and running. There's really not much to it: if you already have a personal access token or a GitHub App that you can use for authentication, things become quite simple. Then you create your workflow, reference the runner scale set in your workflow file, and the jobs will flow to ARC, which will create the necessary runners for the jobs to be executed.
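To make that concrete, here is roughly what the quickstart boils down to. The chart locations match the packages I describe later in this video, but treat the specific values (installation names, namespaces, org URL, PAT) as placeholders you would swap for your own:

    # Chart 1: install the controller
    helm install arc \
        --namespace arc-systems \
        --create-namespace \
        oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller

    # Chart 2: configure a runner scale set (the URL and token are yours)
    helm install arc-runner-set \
        --namespace arc-runners \
        --create-namespace \
        --set githubConfigUrl="https://github.com/<your-org>/<your-repo>" \
        --set githubConfigSecret.github_token="<your-PAT>" \
        oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set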
The next part of the documentation is "About ARC", and this is the detailed page where you're going to find pretty much everything. I'm going to spend some time describing and discussing the architectural diagram on that page in the next chapter, once we finish the walkthrough of the documentation, but it's pretty much self-explanatory: I spent a lot of time describing, step by step, what actually happens under the hood. I've written all of this for your information, and I would really hope that you actually read it. Then we talk about the different components of Actions Runner Controller; we talk about the container image and about creating your own runner image, and what is contained in the default, supported runner image that we provide. We advise you to create your own runner images, because everybody has different needs, and no matter what we do, we're not going to be able to create an image that satisfies everybody. Then we talk about how workflows are executed, what happens when you want to scale these self-hosted runners, the software that is already installed on the runner image, and so on. There's also a section describing the different authentication options: you can use a GitHub App for authentication, or you can use a personal access token. I'm not going to go through those details here. I personally prefer GitHub Apps, although the one problem is that if you're going to register these runners at the enterprise level, you're not going to be able to use a GitHub App; you have to fall back on personal access tokens, because we don't support GitHub Apps at the enterprise level just yet. Then "Deploying runner scale sets" describes everything you need to know about runner scale sets: what they are, why we introduced them, and the different deployment options you have. After that come the different advanced configurations you have with Actions Runner Controller: how you choose the runner destination, how you do the authentication, how you create the secrets, managing runner groups, configuring a proxy, setting the maximum and minimum runners, custom TLS certificates, using Docker-in-Docker or Kubernetes mode (these are for jobs or workflow runs that have containers in them, whether you're building a container or running one as a service), using private container registries, configuring your runner image, updating the pod spec, using ARC across organizations, and so on. The last two sections are the simpler ones. One of them is about how you can use ARC in a workflow, and there we explain why labels are not supported with runner scale sets. This is a pretty big distinction between this version and the previous version of Actions Runner Controller: labels are no longer supported; the runner scale set name is what you use in your runs-on key to address ARC. And lastly, there's a small troubleshooting section, so if you see some of these problems you can come here and try to resolve them yourself. However, ARC in this version, this specific runner scale set version, is supported by GitHub, which means you can create support tickets with our support team and we will be more than happy to help you get up and running. Obviously we cannot support Kubernetes itself, and we cannot support your specific environment, but we will help you get up and running; that's the level of support we're able to provide. And if something happens on the backend with the services, that is something we are responsible for. All right, that's enough about documentation; let's actually talk about the architecture. So, what are the different elements? First of all, on the left-hand side, we have GitHub. This is github.com or GitHub Enterprise Server, whichever flavor you're using, and inside this element you have your enterprise; you could also have an organization or a repository, because a runner scale set can be registered at any of these levels: repository, organization, or enterprise. Inside your repository you're obviously going to have your workflow YAML files, your workflow definitions, whether for CI, CD, or whatever else you use them for. By the simple fact that you're watching this video, you're already familiar with the concept of self-hosted runners: GitHub Actions uses machines to run these workflows and execute the work that needs to happen across the different steps and jobs inside a particular workflow. I've talked about that extensively in my GitHub Actions course; go check it out. Previously, runners were installed individually, one by one. You could install a single runner, but you needed to configure every runner yourself, and there were not many great options for managing the automatic scaling, the expansion and shrinkage, of the number of runners you have running. Big organizations obviously want solutions that allow this automatic scaling up and down of their self-hosted runners. Some options had been created by the community, but there was nothing really provided by GitHub. Actions Runner Controller, we adopted it so that we can provide this option for some of the larger organizations that have dynamic needs for self-hosted runners.
So why runner scale sets, and what are runner scale sets? This is a concept you should be familiar with before we dive into the architecture of ARC. In previous implementations of Actions Runner Controller, the community relied on two options: you either had the webhooks option or you had the polling option, because these were the only two ways that integration with GitHub was possible. Neither of these is ideal for automatic scaling. Polling has the problem of hitting API rate limits; it doesn't really scale well, there are a lot of API requests being made all the time, it's not really designed for that purpose, and at the same time you don't always get real-time information about workflow runs. The problem with webhooks is that webhooks are not always reliable: we don't guarantee the order of webhook deliveries, webhooks are delivered on a best-effort basis with no guarantees, and if something happens in between, it's very difficult to troubleshoot what happened or to retry the delivery of a particular webhook, and your workflows might end up in some weird state. Runner scale sets are a new concept that we introduced. You need to think about a runner scale set as a sort of grouping of self-hosted runners: an abstraction we created on our backend which represents a group of runners that are homogeneous in nature, meaning these runners all have the same configuration, the same setup. If you want a heterogeneous setup, you're going to need multiple runner scale sets installed or configured; that's how you should approach it. So a runner scale set will spin up runners that are homogeneous. Runner scale sets also have boundaries, an upper and a lower bound on the number of runners. You can keep a minimum number of runners up at all times, or let it scale down to zero, and you can set a maximum number of runners so that you don't overshoot, limiting how far the scale set can scale upwards. Now, the nice thing about runner scale sets is that they are addressable by name. Whenever you have a workflow, you specify the name of the runner scale set in runs-on, and whenever a job spins up that addresses that particular runner scale set, the job is queued, it is assigned to that runner scale set, and the runner scale set acquires it. Once it acquires that job, it talks to Actions Runner Controller (I'll describe how in a bit) to make an attempt to create a new runner, because it scales up whenever a new job is assigned to it. When that runner comes up, it registers with GitHub, acquires the job, and starts executing it. That's the gist, the short version of what is really going on.
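As a concrete (and hedged) sketch: if your runner scale set was installed under the name arc-runner-set, the workflow side is nothing more than the runs-on key; everything else in this example is arbitrary:

    # .github/workflows/arc-demo.yml (illustrative)
    name: ARC demo
    on: workflow_dispatch
    jobs:
      build:
        # the runner scale set's name, not a label
        runs-on: arc-runner-set
        steps:
          - run: echo "Hello from an ephemeral ARC runner"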
So let's go a little more in depth and describe what happens in the flow, end to end. We said that on the left-hand side we have GitHub and our workflows, and in between we obviously have the internet and the GitHub APIs. But we also talk to the Actions service directly. There are two elements here: we talk through the APIs exposed via api.github.com, but we also talk directly to the Actions service through the endpoint pipelines.actions.githubusercontent.com. Whenever you have a firewall, a reverse proxy, or whatever gateway you're using to access the internet, you need to make sure that both of these endpoints are accessible for Actions Runner Controller. Let's say the first step is that we create the workflow, and as part of the workflow we address a run to our runner scale set, which is called arc-test-runner in this case. Actions Runner Controller is already configured and installed on our Kubernetes cluster, and it is composed of two main pods, two main elements. The first pod is called the controller manager: this is the pod that contains the manager, which has the different controllers that manage the different resources we're going to be creating in the cluster. Actions Runner Controller relies on multiple components that come off the shelf with Kubernetes, but we have also introduced a number of custom resources that we need to manage ourselves. If you want a list of these custom resources, it's quite easy: just go to our documentation and you will see all of the different resources we've introduced. You can see the different types there: we have the AutoscalingListener, the AutoscalingListener list, the spec, the status; we have the EphemeralRunner, the EphemeralRunner list, the EphemeralRunnerSet, the EphemeralRunner spec, and so on. To manage these resources we have introduced four different controllers, which are as follows: the AutoscalingListener controller, the AutoscalingRunnerSet controller, the EphemeralRunner controller, and the EphemeralRunnerSet controller. You don't need to worry about exactly what these controllers are and how they work; they should operate seamlessly for you. What you need to know is that the AutoscalingListener controller manages the listener pod, and the listener pod is something we introduced to handle the decision-making about scale. Whenever you install Actions Runner Controller, the first thing that comes up is the controller manager pod, because it has the controllers responsible for managing the state of the different resources we need. The controller manager is responsible for creating the resources and making sure we always have resources matching the desired count. As you know, Kubernetes works by you specifying a desired configuration, and the controllers, through the control loop, always make sure you have the desired count; if one of these pods goes down, the controllers are responsible for spinning it back up. However, the controllers in our design are not responsible for deciding how many resources to create. The listener is responsible for deciding how many runners we need to have. Whenever you install Actions Runner Controller and configure a runner scale set, you will have one listener per runner scale set; each runner scale set has its own distinct listener, and you can install multiple runner scale sets in the same cluster, either in the same namespace or in different namespaces. This is very important. When the listener comes up (the listener is also an application, a Go app that runs in a container inside this pod), the listener is going to try to connect to the Actions backend.
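If you want to poke at these custom resources on a cluster where ARC is installed, something like the following should surface them; the resource names follow the actions.github.com API group described above, but the exact output will vary by version:

    # List the CRDs that the new ARC mode introduces
    kubectl get crds | grep actions.github.com
    # Typically: autoscalinglisteners, autoscalingrunnersets,
    #            ephemeralrunners, ephemeralrunnersets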
Let's actually put these side by side so that we can read along and understand what is happening. The first step is that Actions Runner Controller is installed using the Helm charts; the controller manager pod is deployed to a specified namespace, and then a new autoscaling runner set resource is deployed. The AutoscalingRunnerSet controller then calls GitHub's API to fetch the runner group ID that the runner scale set needs to be registered against. So the first call is from the AutoscalingRunnerSet controller to api.github.com to fetch the group ID. We have already specified the authentication mechanism, either via a GitHub App or a personal access token, and the controller uses that for authenticating these calls to api.github.com. The second step is for the AutoscalingRunnerSet controller to call the API one more time to fetch and create the runner scale set. The runner scale set doesn't exist yet, we've only just configured our controller; the controller is responsible for calling our backend and making sure the runner scale set is created. At that point, if this call is successful, you should be able to see your runner scale set through the API, at the repository, organization, or enterprise level. After that happens, the runner scale set listener — not necessarily the pod, but the application running in the pod — is deployed, and the AutoscalingListener controller is responsible for making sure this listener is always up and running based on our configuration. So when you do the Helm install for the runner scale set, the AutoscalingListener controller makes sure the listener pod is spun up and always running. Now, when this pod comes up, it's going to establish a long-poll HTTPS connection to the Actions backend. This is step number three: the listener makes a call to pipelines.actions.githubusercontent.com, using the same authentication mechanism, and creates a long-poll session that waits for jobs assigned to this particular runner scale set. Step number four: when a workflow run is triggered on the GitHub side, the Actions backend sends a message through the long-poll connection we created with the listener, saying "I have this job that is directed towards you — can you accept it?" There's a handshake mechanism going on: can you accept this job, do you have the capacity to handle it? The listener makes a decision based on the configuration we supplied, the maximum and minimum runners and so on, and says "yes, I accept this job". When the listener accepts the job, it sends a message back to the backend saying "this job is mine, please assign it to me", and the backend assigns the job to the listener. When the job is assigned to the listener, the listener makes an attempt to create a new runner, and by creating a new runner I mean creating a new ephemeral runner pod. Again, these runners are not static; they are ephemeral, so they go up and down, and once they finish their job they're destroyed. What happens is that the listener goes and tries to patch the EphemeralRunnerSet resource, which is a custom resource we introduced.
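On a healthy installation you can watch this sequence from the outside. The pod names below are purely illustrative, since the listener pod is named after your runner scale set installation and the hashes will differ:

    # The controller-manager comes up first, then one listener per runner scale set
    kubectl get pods -n arc-systems
    # NAME                                   READY   STATUS
    # arc-gha-rs-controller-<hash>           1/1     Running
    # arc-runner-set-<hash>-listener         1/1     Running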
Don't confuse the EphemeralRunnerSet with the runner scale set — I know the names are very confusing — but now we're talking about the architecture of Actions Runner Controller. So we have an EphemeralRunnerSet resource, and you can think of it as a grouping of self-hosted runners, also with a lower bound and an upper bound. The listener can patch this resource: it can make a call to the Kubernetes API saying "I want to change the configuration of this particular resource". In this case, the listener comes and patches the EphemeralRunnerSet and says "bump it up by one, I want a new desired runner to be created". Once this patch happens, the controller — in this case the EphemeralRunner controller — through its reconciliation loop, which runs indefinitely to make sure we match the desired configuration, sees that we changed the desired configuration: "okay, I have a new desired count, and my actual runner count is less than that desired count, because we added one to it". In that case the controller goes and creates a new runner pod, where inside the pod you have the actual runner image running as a container. When this runner is created and the application inside the container starts running, it tries to authenticate to the Actions service backend, and it uses a just-in-time token to do that authentication. This is one of the advantages of the new mode: it's a bit more secure. We don't inject the personal access token or the app installation token from the GitHub App, as was done in the previous version of ARC; instead, we create a new ephemeral token for the specific runner, and it is sent to that runner. That runner can use the token to register itself against the backend. The token has very limited authority and cannot really do much; it can only register the runner. Once the runner is registered and its status is up, the Actions backend assigns the job to that particular runner, and once the runner receives the job, it runs and executes it, uploads the logs, and reports back the status continuously, and so on. At this point, if anything happens to your runner — the job is not completed, the connection breaks, whatever it is — this is where you need to start troubleshooting, because the connection between the runner and the Actions backend might have been broken. This will definitely happen if you're using a reverse proxy or a NAT gateway, or you have some odd firewall rules — whatever sits between the runner and the Actions backend is where things can go wrong. And of course this applies to pretty much everything in this cluster: anything you have between the controller manager pod, the listener pod, and GitHub is where things could start breaking, and we've seen all sorts of things. It's also very important, whenever you're doing your setup, to make sure the runner scale set listener has access to the internal Kubernetes API so that it can patch the EphemeralRunnerSet. Extremely important. Now, when the runner completes the job, it reports back the status, and when everything is done, the runner, once it finishes, simply removes itself.
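You can observe this scale-up happening from the outside as well. Assuming your runner scale set lives in a namespace called arc-runners (that name is just my assumption for this sketch), something like this shows the replicas changing as the listener patches the set:

    # Watch the desired/actual counts move as jobs arrive and finish
    kubectl get ephemeralrunnersets -n arc-runners
    kubectl get ephemeralrunners -n arc-runners
    kubectl get pods -n arc-runners --watch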
On the next reconciliation loop, the desired count will match the number of runners actually running, and then there's pretty much nothing to do unless we have a new workflow run coming in. If you have multiple workflow runs coming in simultaneously, the listener pod should deal with that pretty easily, as long as we're creating runners within the boundaries of the minimum and maximum configuration we specified. We've talked a lot about the architecture and the internals, so let me now show you how everything actually works. We've covered the theory; now let's get into practice, and let me show you how you can set up Actions Runner Controller yourself, along with the different configuration options. We're going to start with the actions-runner-controller repository, because I want to clarify a few things that confuse a lot of people. Adopting this project was a bit tricky: we already had a working solution in the repository for a lot of people, and we obviously didn't want to break it while still introducing the new and improved way of dealing with self-hosted runner scaling. That is why, in a lot of sections of this repository, you're going to see this distinction between actions.summerwind.dev and actions.github.com. actions.github.com is everything new that we created when Actions Runner Controller was adopted; this is the stuff that is actually supported by GitHub. actions.summerwind.dev is everything that the community still maintains and supports itself. This is a collaboration with the community — we didn't eliminate the community from the equation. We're still working quite closely with the main maintainers; Yusuke (mumoshu) has been doing a fantastic job maintaining this and responding to community queries, the community is quite active, and we definitely want to keep that going. However, we introduced the new APIs, the new custom resources, the new everything, under this new namespace. So instead of creating another repository, or forking the project, or any other mechanism which is not necessarily great, we decided to create a new namespace within the same operator and put all of the new stuff in it. Here you will see all of the new custom resources we created — AutoscalingListener, AutoscalingRunnerSet, EphemeralRunner, EphemeralRunnerSet — as well as the different tests and so on. You can also see this namespacing at the level of the controllers. This is where the controller logic is defined, actions.github.com versus actions.summerwind.dev. If you go there you will see the different controllers for the different resources, and these controllers are responsible for making sure we spin up the necessary pods to maintain the desired configuration; this is where pretty much all of the controller logic resides. And if you go back to the root, you will also see, in the cmd folder, a couple of things. This is where the runner scale set listener application is located — the listener's code base, the listener that connects via long polling to our Actions backend. If you want to come here and read the source code to see what's happening, this is pretty much it, and it includes all of the different tests and so on.
So when we deploy this controller, we are running different applications. In the first part, we mentioned that we're running the controller manager — that's the controllers folder you've been seeing — and in the second part, the listener, we are running that particular package, that particular application. The rest is just what's needed for proper functioning, as well as different packages required by those components. Now, one thing that is very important: you will see there is a folder called runner, and in it different Dockerfiles for different runner configurations. We are not maintaining those. GitHub is not responsible for maintaining these runner images; they are part of the old, community-managed version of Actions Runner Controller. Our runner image is hosted in another repository. Anything you see in the runner folder is part of the old Actions Runner Controller; GitHub is not maintaining that anymore, and we don't have plans to take over the maintenance of those runner images, so let's be very clear about that. Our runner image is actually maintained in the self-hosted runner repository hosted in the actions organization. If you go to github.com/actions/runner, this is where you will see the code for the self-hosted runner. This is the same runner that runs in your self-hosted environments, the same runner binaries we use in GitHub-hosted runners. If you go to the images folder, you will see a Dockerfile, and this is the Dockerfile we are using and officially supporting as part of the new Actions Runner Controller offering. Whenever we create a new runner release — you can see, for example, that yesterday we released version 2.306.0 — we automatically publish a new runner image. If you go to the corresponding package, you will see that yesterday the latest tag was applied to version 2.306.0; whenever you pull the latest image, you get the latest binary, and our Helm charts point to this runner image. What we advise you to do is one of two options: either use this image as your base image and build on top of it, or, if you don't like what we're doing, create your own runner image. You can use whatever base image you like. In this case we are using an image published by Microsoft, with Ubuntu as the base; if you don't like Ubuntu and want to use something else, feel free — you can use whichever distribution you want, as long as it is supported by our self-hosted runner binaries. Then you configure the same working directories and install the same binaries: one set for the runner binaries, and one for the runner container hooks, which are used in Kubernetes mode to spin up containers that run container jobs. We also install Docker, because it's needed for the Docker-in-Docker mode; if you don't need it, you can eliminate some of this. We're adopting a slightly different way of building runner images: the first image builds the runner, and the second, somewhat stripped-down one is what actually runs the binaries. And as you can see, we are using user ID 1001 and group ID 123; feel free to change those as needed.
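The simpler of the two options — using our image as a base and layering your own tooling on top — might look like this; the extra packages here are just examples of what you might add, and the tag is whatever runner release you want to pin:

    # Illustrative custom runner image built on the supported base image
    FROM ghcr.io/actions/actions-runner:2.306.0

    # Install extra tools as root, then drop back to the unprivileged runner user
    USER root
    RUN apt-get update \
        && apt-get install -y --no-install-recommends git jq zip \
        && rm -rf /var/lib/apt/lists/*
    USER runner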
Now, you will notice that nothing is really installed on this runner besides the runner binaries and the container hook binaries, and we don't plan to install anything more. We want to provide a barebones, stripped-down version of the runner, because first of all we don't want to maintain all of these third-party distributions for different applications, and we definitely don't want to create a two-gigabyte runner image. We want to keep it as simple as possible, and we advise you to create your own, customize it whichever way you like, and add whatever you want on top of it. That's the philosophy we've adopted for now and are going to move forward with. Coming back to the actions-runner-controller repository, you will notice that it references different packages. These are the packages that were previously released — the runner images, the different variations provided by the community before. Let's discuss what these packages are, because they are also the source of a lot of confusion. I understand it's a bit messy, but I'll try to clarify it as much as possible. When we migrated this repository to the actions organization, we didn't really want to break anything, so there were some leftovers and different problems we needed to tackle. First of all, I want you to ignore these packages: actions-runner-dind and actions-runner — ignore them, I'll explain why in a bit; actions-runner-dind-rootless — ignore it as well; actions-runner-controller-2 — ignore it, we need to clean these up; and actions-runner-controller-charts/controller-2 — also ignore it. The things you want to focus on are actions-runner-controller-charts/gha-runner-scale-set-controller, the first relevant package; actions-runner-controller-charts/gha-runner-scale-set, the second relevant package; and gha-runner-scale-set-controller. These are the three relevant packages from this list, and I will explain them. First of all, the charts: the Helm charts we are publishing are not published the traditional way, using GitHub Pages. We don't think that's the appropriate way to publish Helm charts; it's been used as a hack or a workaround, and we don't really agree with the approach. Helm supports OCI images and packages, and that is why we bundle these charts as OCI packages and publish them that way. Whenever you install the Helm chart, you actually download this image from our packages and install it, as opposed to downloading a YAML file from GitHub hosted via GitHub Pages. So this is the first Helm chart: the chart that installs the controllers. And then there's the second Helm chart, which is responsible for configuring the runner scale set, which will enable your listener. So there are two charts: one to install and configure the controller — which namespace it belongs to, what its name is, and so on — and the second to configure your runner scale set, and by configuring the runner scale set, the listener comes up.
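Because they are OCI packages, you interact with them through Helm's OCI support rather than a chart repository URL. For example, to pull a chart locally and inspect it (substitute whichever released chart version you're targeting):

    # Download the chart as an OCI artifact instead of from a GitHub Pages repo
    helm pull \
        oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller \
        --version <chart-version>
    tar -tzf gha-runner-scale-set-controller-<chart-version>.tgz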
Now, you need to be aware of a couple of things. This Helm chart is not going to install your listener; it allows you to configure the runner scale set. The listener uses the same image as the controller: the application for the listener and the controller manager are bundled in the same container image. Now, why do we have two Helm charts? Because from the beginning we wanted to provide a very quick and easy mechanism for people to get started with Actions Runner Controller, and there's really nothing easier than one Helm chart for the controllers — install, configure whatever you need — and then another Helm chart to configure runner scale sets. Why a separate one? Because you might want to configure different runner scale sets. As I said at the beginning, each runner scale set is homogeneous: it contains the same type of runners, same name, same everything. You cannot have different types of runners running at the same time under the same runner scale set — you cannot have, say, Debian runners and Ubuntu runners, or runners with one set of applications and runners with some other variation of applications, in the same runner scale set. That's why you might want to manage your entire configuration via Helm charts: you can do multiple installations of this particular Helm chart so that you can have multiple runner scale sets running in parallel, for different organizations, different repositories, different enterprises, under different namespaces, you name it. I'm going to demonstrate this in a little bit.
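Concretely, "multiple installations of this particular Helm chart" just means repeating the install with different names, namespaces, and values. A hedged sketch — the org URL, token, and container mode are placeholders:

    # Two heterogeneous runner scale sets: one plain, one Docker-in-Docker
    helm install ubuntu-runners \
        --namespace arc-runners-ubuntu --create-namespace \
        --set githubConfigUrl="https://github.com/<your-org>" \
        --set githubConfigSecret.github_token="<your-PAT>" \
        oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set

    helm install dind-runners \
        --namespace arc-runners-dind --create-namespace \
        --set githubConfigUrl="https://github.com/<your-org>" \
        --set githubConfigSecret.github_token="<your-PAT>" \
        --set containerMode.type="dind" \
        oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set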
And then the final package we have is the actual container image, which contains pretty much the code I've just shown you in this repository: the code for the controllers (the controller manager), the code for the listener, as well as the code for exposing the metrics and all of that other fun stuff. These are all the elements you need: the Helm charts, the controller image, and the runner image — three main components, or more precisely two Helm charts and two images. The other packages you can ignore; they are either not public or not accurate, and they came along as artifacts of the migration. Now, you might be asking where the runner images published for the old ARC are, and it's quite simple: if you go to github.com/actions-runner-controller, this is an organization which contains all of the legacy stuff. It contains the previous Helm charts for the previous versions — those are still being published through GitHub Pages, and this is the repository they're published from; you can see the index.yaml file for the previous Helm charts. All of the new legacy releases are still being published there, but nothing regarding the new runner scale set mode has any relationship with this. The second thing you will find there is the packages, which are the old runner images. This is where you will find the images that map to those Dockerfiles I showed you: we said there is this runner folder, and inside it different Dockerfiles, and whenever we build and publish, this is where the images go — inside the actions-runner-controller organization, while the repository itself now lives in the actions organization. When we moved things, we moved the repository, and the old packages remained under the old org. If you click on either of these, you will see they also have the latest 2.306.0, but in different variations: one with Docker-in-Docker, one that is quite vanilla, the image for the old Actions Runner Controller, and the actions-runner-dind-rootless image. You will see that the last publish was 24 hours ago — yes, we are still helping the community maintain them, but GitHub is not directly maintaining them. You can see mumoshu (Yusuke) leading this effort, and we are supporting the community in it, but we don't have plans to maintain these ourselves. What are the other elements that are important for our setup? I would recommend you ignore the documentation available in this repository, at least for the new mode, because all of it has been migrated to docs.github.com, and what's here is no longer relevant or maintained; I have a pull request that is going to clean these docs up moving forward. Obviously we have a lot of different workflows for testing, end-to-end testing, and so on, but all of the documentation here is for the old mode; everything new is on docs.github.com. Okay, now let's create a new setup and move on with the actual installation. I'm going to create a new codespace. Why a codespace? It's the easiest way to create a new environment where I can install whatever dependencies and tools I need without worrying about polluting my local environment. I'm going to spin up an 8-core machine with 16 gigabytes of RAM. You don't need that much, obviously, but to run minikube and everything else I think it makes the most sense. This spins up Visual Studio Code; it takes a little while for the codespace to start, so we'll wait for it. Awesome. You don't really need the source code to set up Actions Runner Controller, so you don't need to clone the repository or anything like that, because all of the packages have already been published. I personally created a codespace because I want to use it as my compute, not for any other reason, and because it comes pre-bundled with a lot of the tools I need, so I don't have to worry about setup. What's also nice about codespaces is that I can SSH into them, so I can do everything directly from the terminal. First things first, we need to make sure we have all the tools we need, and maybe a better idea is to SSH into the codespace instead of using VS Code. So here we are; I just SSHed into it. By the way, this will not work for you locally as-is: cs-ssh is just an alias I created to make my work easier, where you provide the codespace name, and it uses the GitHub CLI under the hood. I'm inside the codespace right now; let's make sure we have all the tools. First of all, do we have minikube? We do. That's what we're going to use as our Kubernetes distribution; it will create a single-node cluster for us. I'm not going to create a full Kubernetes cluster just for the purposes of this demo; that would defeat the purpose.
The second thing to check is that we have Helm. Helm is already installed on my codespace; let's see which version we have — all right, 3, great. I also need kubectl — do I have it? Yes, perfect. I think we're ready. The first thing we're going to do is start a new minikube cluster. We can do that with minikube start, then -p for the profile, and we're going to call this one, I don't know, arc — you can call it whatever you want. Then we specify the number of CPUs we want, let's say four, then the amount of memory, let's say eight gigabytes, and then I'm going to add a couple of things you don't necessarily need yourself; it really depends on what type of configuration you want. This particular bit is not necessarily needed in your environment, but in my case I'm going to be using OpenEBS for dynamic provisioning of persistent volumes, and OpenEBS has a requirement to mount /run/udev onto the cluster, or the node itself; that's what it's used for. The rest is pretty much vanilla; we don't need anything else to get started. With this, I can just execute it, and now it's going to create a new cluster for us using Ubuntu 20.04. Obviously your Kubernetes setup is going to be much more complex than this; maybe you're even using some managed Kubernetes flavor like AKS or EKS. Whatever it is, the setup shouldn't vary that much. Of course, a lot of enterprises have different requirements, policies, and restrictions on what can and cannot be installed. I'm not going to be able to cover all of that today; I'm going to show you a vanilla setup, and for the specifics of your environment you're going to have to either work with GitHub directly through Expert Services or do some research yourself on these requirements. I'll show you the different elements, and then you need to decide for yourself how they fit your particular environment. What minikube normally does is create a VM, a virtual machine, in your environment where it runs that particular Kubernetes flavor. In this case, because I'm running in a codespace, which is already a container, it uses the Docker driver, and the node is an actual Docker container rather than a full-fledged VM. As you can see, minikube has finished its work, and if I type minikube status -p arc — you always need to specify the profile now, because we used one — you can see that the host is running, the control plane is running, and everything else I need is up and running.
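Put together, the command I ran looks roughly like this; the /run/udev mount is only there for my OpenEBS setup, so drop it if you don't need dynamic volume provisioning:

    minikube start -p arc \
        --cpus 4 \
        --memory 8g \
        --mount --mount-string="/run/udev:/run/udev"

    # The profile has to be passed to every subsequent command
    minikube status -p arc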
The next step for us is to configure the values.yaml files for the Helm charts, and we're going to do that quite simply. Because I have the repository checked out here, I can grab the files from it; you can also download them from the repository or recreate them yourself — whatever works, just grab yourself a copy of the values.yaml file. This is the root folder of the repository, and you can see there's a folder called charts. We're going to go into the charts folder, and if I do a listing, you can see I have the first chart here for the controller, and then the second chart for configuring the runner scale sets. We're going to first take the values.yaml file from the controller chart. I'll go into that directory and do another listing, and you can see there's a values.yaml file. This values.yaml file is a template, so what I recommend is that you copy it to some other location. Let's say I'm going to create a new directory in my home folder and call it arc-configuration. Inside it, let's make a couple of folders: the first one we'll call controller, then we'll make another folder called, I don't know, runner-scale-set-1, and maybe you want another runner scale set, so you'll need another configuration file — let's call that one runner-scale-set-2. You can have as many as you need, and obviously apply whatever naming convention you want here. So we copy the first values.yaml from our checked-out repository — charts, then gha-runner-scale-set-controller, then values.yaml — into home/arc-configuration/controller. That's the first step. Then we make two copies of the other values.yaml file. Be careful here: we're not copying the same file; we're copying the file from the gha-runner-scale-set chart now, whereas the previous file was copied from the gha-runner-scale-set-controller chart. These are two different Helm charts, so always keep this in mind. We copy this file into home/arc-configuration/runner-scale-set-1, and then we repeat that and make another copy in the second folder. Now if I run tree, you can see I have a values.yaml inside my controller folder, another values.yaml inside runner-scale-set-1, and another values.yaml inside runner-scale-set-2.
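Here is the same sequence as plain commands, assuming the repository is checked out in the current directory:

    mkdir -p ~/arc-configuration/{controller,runner-scale-set-1,runner-scale-set-2}

    # Chart 1: the controller chart's values template
    cp charts/gha-runner-scale-set-controller/values.yaml ~/arc-configuration/controller/

    # Chart 2: the runner scale set chart's values template, copied twice
    cp charts/gha-runner-scale-set/values.yaml ~/arc-configuration/runner-scale-set-1/
    cp charts/gha-runner-scale-set/values.yaml ~/arc-configuration/runner-scale-set-2/

    tree ~/arc-configuration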
Let's start working inside the controller folder. We have our values.yaml file here; open it in any editor you want, it doesn't really matter, so we can start configuring our controller. As you can see, this is pretty much a regular YAML file — you can run it through any YAML validator, there's nothing unusual going on — so let's talk about the different values and configuration options you have. The first item you can configure is the labels that are applied to all of the resources created by this chart. You can understand this as a very simple key-value dictionary structure, and you can see where the labels are applied by going to our repository, into the charts folder, then gha-runner-scale-set-controller, then templates: pretty much everywhere it says include "gha-runner-scale-set-controller.labels" is where the labels get applied — they're applied to the deployment, to the service account, and in a bunch of other places. So you can configure these labels to be whatever you want, and labels help you identify the different resources: you can apply a certain naming convention to them, or mark that these resources belong to a certain organization or team. There are many reasons why you would want to use labels to tag your resources. The second option is the replicaCount. You can specify that you want multiple replicas of the controller manager; however, at any point in time only one of these replicas is responsible for the reconciliation, otherwise things won't work because you'd have a split-brain problem. There's a leader election algorithm that selects one leader out of the replicas to always be the one responsible for reconciliation. So if you want some form of high availability, this is an option: if one of them goes down, a leader election will pick another controller to take over the reconciliation work. Next is where you specify the image used for the controller manager as well as the listener pods. You don't really need to change this unless you're mirroring the image into your own private registry, because you don't want to pull from the public registry for whatever reason — maybe you want to scan images before deploying them to your cluster. The idea is that you can specify whatever repository you want; the default is the package I showed you earlier. The second option is the pullPolicy, and the third is to override the image tag instead of pulling the latest version. This is where you can pin a specific version of Actions Runner Controller, and the reason you might want to do that is that there may be breaking changes in future versions and you want to be prepared: you don't want to always pull the latest image and run it, you want to be in control of what you're deploying and running. And, right below, imagePullSecrets can also be configured: if you're pulling this image from a private registry that needs authentication, this is where you do it. It takes a reference to a secret you configure at the Kubernetes level, so you don't have to provide the secret as a hard-coded plain-text value here. Now, when we create the controller pod and all of the different resources, we have default names for them that are compliant with the Kubernetes API limitations and storage limitations. If for whatever reason you need to override this and specify a different application name, you can do it with the two override parameters, and the way to understand them is to open the _helpers.tpl file in our repository, where you can see all of the places the name overrides are used. These names get truncated, because there are very strict naming limits we need to abide by. I wouldn't recommend changing any of these unless you really know what you're trying to achieve.
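For orientation, the top of the controller chart's values.yaml covers exactly these knobs; this is a paraphrase of the defaults, so check the chart version you actually pulled for the precise shape:

    labels: {}            # applied to every resource the chart creates
    replicaCount: 1       # >1 gives HA via leader election; one leader reconciles
    image:
      repository: "ghcr.io/actions/gha-runner-scale-set-controller"
      pullPolicy: IfNotPresent
      tag: ""             # pin a version here instead of floating on latest
    imagePullSecrets: []  # secret references for private/mirrored registries
    nameOverride: ""
    fullnameOverride: ""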
The default values are reasonable, and there's really no particular reason to change them unless you are an advanced user. Next is where you can define environment variables for the controller pod. We create our own environment variables for these pods, but if for whatever reason you want to provide additional information to the controller, or inject environment variables — maybe ones coming from a secret — you can specify them in your values.yaml file; you can give a secret reference from which the template will pull the value and expose it as an environment variable. Next is where you can create your own service accounts. The controller needs elevated permissions on the cluster, and we have our own role definitions: you can see in the Helm chart that we create the cluster role with all of the API groups and resources we need access to, along with the actions we want to apply to those resources. This is very important if you're using RBAC, role-based access control. If you don't want to rely on the defaults, you need to create these roles and service accounts on your own and then reference them — maybe you don't like our permission structure, or you have your own custom configuration. However, these permissions are the bare minimum the controller needs to function properly. If you create your own roles and service accounts and miss any of these permissions, we have no guarantee the controller is going to work. So if you start seeing authorization problems — pods not spinning up properly, authorization issues in the logs, odd-looking responses from the Kubernetes API — come back and revisit the permissions you applied to the service account used by the controller and the resources. These are very important. We have multiple roles and multiple role bindings; you can read them inside the charts, they are very well defined, and you can see everything we require permissions for. Then there's where you specify pod annotations, and where you specify the pod security context: the controller pod can have a security context of its own — maybe you want to set an fsGroup, or the user and group you want the pod to run as. There's a small caveat: at the time of making this video, the security context did not apply to the listener pod — the option to configure a security context for the listener pod was not available yet. We are working on a fix that will probably land in a future version, but I'm not covering it in this video because it's not out yet. Then there's where you specify the resources — the resource limits and quotas for the controller pod — and again this doesn't apply to the listener, so be careful and make sure you understand that point clearly. Then node selectors: if you have a particular node you want to bind this pod to, so that the pod always spins up on that node for whatever reason — even though we don't recommend it — this is where you specify it.
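The middle of the file covers those pod-level knobs; again, a paraphrase of the defaults rather than a verbatim copy:

    env: []                  # extra environment variables for the controller pod
    serviceAccount:
      create: true           # let the chart create it with the roles it needs
      annotations: {}
      name: ""               # or bring your own, with equivalent permissions
    podAnnotations: {}
    podSecurityContext: {}   # note: did not apply to the listener pod at recording time
    resources: {}
    nodeSelector: {}
    tolerations: []
    affinity: {}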
Tolerations, affinity, priority class name: all of these can be set from the Helm chart as well. And if we are missing anything, we recommend you create your own charts and customize them however you want; you don't have to pull our charts as-is, you can modify the resources yourself, and you can also apply some of these configurations post-setup. It is entirely up to you how you manage it; we think we did an okay job defining the defaults, and they are quite reasonable. Next are some feature and configuration flags, and we have a couple of them. First, the log level. All of the logs are streamed to standard output, so you can pull them and push them to whatever logging storage solution you have; it is quite easy. If you want verbose logs, set the debug level, which means all of the exceptions and errors show up in your logs. Be careful: I don't think we do this, but it is possible that at the debug level some confidential information gets exposed, so if you are pushing these logs somewhere else or copying them, vet them and make sure they don't contain anything confidential. Next is the log format: text by default, but you can also use JSON. The next thing you can configure is watching a single namespace. The controller is designed to observe changes across namespaces, which is valid when it modifies resources in different namespaces, but if you want to limit what it observes to a particular namespace, this is where you specify it. Next is a somewhat convoluted concept called the update strategy, with two modes: immediate or eventual. The tricky situation it addresses is this: suppose your controller is running and you have, say, 100 jobs in a pending or running state, and then you apply an upgrade, meaning you modify the Helm values and upgrade the release. The big question is when that upgrade is applied: immediately, or once all of the jobs have finished? If we apply the patch immediately, the controller destroys the existing resources while applying the change. It will not shut down the ephemeral runners that are executing jobs, those continue behaving as normal, but the ephemeral runner set, the listener, and the other custom resources we created are removed and terminated so the new configuration from the upgrade can be applied. Then, whenever the listener and the ephemeral runner set come back up, they will try to provision the same number of runners as there are jobs already pending or running. Those new runners will not do anything, because the jobs have already been assigned to other runners, but this can still leave you with a degree of over-provisioning.
If your cluster is being hammered by workflows and you already have, say, 500 runners, it might not be able to absorb that over-provisioning of resources. Obviously you want to avoid such a situation, and the way to avoid it is to change the update strategy from immediate to eventual. What eventual does is remove the listener and the ephemeral runner set first, so workflow runs targeting this runner scale set will not be assigned to it, because the listener is no longer there to acknowledge those jobs; it has been removed. Then it waits until all of the existing pending or running jobs have drained, meaning they finish. Once that is done, the controller applies the upgrade, the listener and the ephemeral runner set come back up, and they pick up and execute the newly queued jobs. Obviously this introduces a delay of unknown duration, because it depends on how many jobs you have running and how long they need to finish; it is bounded by the draining time of your existing jobs. That is the risky aspect, but it also solves the over-provisioning problem: when these components come back up, they do not recreate the old resources, they start fresh and only deal with the new jobs that have been queued. Be careful about this: do not change the update strategy unless you really understand the implications and what is actually happening. We are still trialing this mode, so while we know how it should behave, there may be scenarios we did not anticipate; if you face issues with it, please let us know and we will try our best to fix them. And that's it, that is everything about the controller configuration.
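For reference, the flags we just covered sit under a flags block in the controller's values.yaml; this is a sketch with the options spelled out, so double-check the names against your chart version:

```yaml
flags:
  logLevel: "debug"            # "debug", "info", "warn", or "error"
  logFormat: "text"            # or "json"
  watchSingleNamespace: ""     # set a namespace name to limit what the controller observes
  updateStrategy: "immediate"  # or "eventual" -- see the draining caveats above
```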
Now, you don't strictly need to put all of this in a values.yaml file; you can also apply any of these settings through the CLI, and I can show you how that is done. I am going to keep everything at the defaults, because in my opinion they are quite sensible; the only thing I will change is the image tag, and I will do that from the CLI rather than from the values file. So let's exit this values.yaml file and proceed with the Helm installation of the controller. We can do this very easily with helm install followed by an installation name; I will use arc, and I will use backslashes because I think it is easier to read if I go down line by line. Second, I specify the namespace where I want the controller installed; I will call it arc-systems, but you can call it whatever you want, or supply it as an environment variable. Then --create-namespace, so Helm creates the namespace if it does not already exist (you can also create it separately beforehand). Then you can override individual values from values.yaml without changing the file itself, for example with --set image.tag, because I want a specific image version of the controller. You don't usually need to specify this: it will either pull the latest, which I recommend, or the image tag appropriate to the chart version you are installing; but if you ever need to override it, this is how. Then -f with the path to the values.yaml file, which in my case lives in my home directory under the arc-configuration folder. Next, we specify the Helm chart to install. You can point at the chart available locally, since we cloned the repository, but that uses the master branch, which I don't recommend because it could be unstable; or you can use the OCI package we publish to GitHub Packages, which pulls this particular chart from the actions organization's package registry. And the last thing is the chart version; in this case I want version 0.4.0. Always make sure the Helm chart version matches the image tag. We try as much as possible to guarantee there is no mismatch between the two, they go hand in hand, but if for whatever reason they drift, be careful with what you specify. Latest is also fine, but here I recommend pinning the chart version and leaving the image tag alone, since it is pulled from the chart's default values anyway. When you run this command, you can see that it pulled the chart and that the SHA matches; it does that verification for you, but you can check yourself if you want to be extra careful. You can see it has been deployed, and this is the first revision. Now, if we do helm list -a, you can see that arc is deployed, when it was deployed, and which chart version is currently running. What does this actually mean for us? If we do kubectl get pods -n arc-systems, the namespace where we installed the controller, you will see the controller pod up and running. If it is not in a Running state, you have a problem: revisit your values.yaml configuration or the settings you passed on the CLI and make sure they are correct. You will also notice the pod name is quite lengthy; we are working on shortening it so that you don't hit any API limitations, which will probably be fixed in future releases of actions-runner-controller. Now that the controller is up and running, we can do one extra thing, which is to inspect the logs: kubectl logs with the pod name and -n arc-systems. Inspecting the logs, we see some info entries but no errors, which means we are in an okay state; there really are no problems. The controller is not going to do much right now; it just stands idle, because nothing is configured yet and there is nothing to reconcile.
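Pulled together, the installation and the checks from this demo look roughly like this; the values file path is the one used here, so adjust it to wherever yours lives:

```bash
helm install arc \
  --namespace arc-systems \
  --create-namespace \
  --set image.tag=0.4.0 \
  -f ~/arc-configuration/controller-values.yaml \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller \
  --version 0.4.0

helm list -a                                       # the release should show as deployed
kubectl get pods -n arc-systems                    # the controller pod should be Running
kubectl logs <controller-pod-name> -n arc-systems  # info entries only, no errors
```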
In order to actually install and configure the runner scale set, we need to use the second Helm chart, so let me show you how. Let's go back to the previous folder and into the runner scale set one, where we copied the first values.yaml file; this one is for the second Helm chart, the gha-runner-scale-set chart. Let's open this values.yaml in vim, and this is where we start configuring. First things first: githubConfigUrl. This is where you want to register your runner scale set, and subsequently the runners. It can be a repository, an organization, or an enterprise, but it is a single value, not multiple. In our case we will install it on a test organization I created for the purposes of this demo; if I go to my browser, this is my test organization, so let me copy its URL. You can see I already have a GitHub App configured and installed. I am not going to walk through configuring and installing a GitHub App; I have done that many times in other videos, which you can go check out. We will use this application, and you can see it doesn't require a lot of permissions. Looking at permissions and events: on the organization side, you probably only need read and write access to organization self-hosted runners, and nothing else. If you want to install runners at the repository level, unfortunately you still need the repository Administration permission. You can read more about the required permissions in our documentation. Going back to the values.yaml file, let's paste the link to my organization, and that's it; just copy the org URL, put it here, and you are good to go. The second thing is to configure the GitHub secret we are going to use. A lot of people will complain about this: should you really put the value of the secret directly inside values.yaml? You don't have to. You can specify it inline like that, or you can create the secret in Kubernetes and just add a reference to it here, and that is what we will do now. We will create a secret containing our GitHub App values and then put the reference to that secret in our values.yaml. To create it, I will paste the command into this editor for clarity's sake. We are going to create a secret, call it nabo-github-app, and put it in the namespace where the runner scale set will be installed and where the runners will be created, which is not the namespace where we installed the controller. This is very important; otherwise you will get authentication failures, because the controller will not be able to pull the secret. Then we provide the GitHub App ID, which you can get from the app's configuration page, and the installation ID, which is a bit trickier because it is not on the configuration page.
The way you find the installation ID is by going to the settings page, scrolling down to GitHub Apps, and selecting the application you own or just installed; the installation ID should be right in the URL bar, as you can see here. Just copy that number and drop it in. There are also tools that will fetch the installation ID for you, but we won't go through those now; I have explained them in other videos. Next, the final part, which is the GitHub App private key. I have moved my private key, the PEM file, to a secure location, and I am simply reading that file and feeding it into a from-literal argument. I copy the whole command, drop it in, run it, and as you can see, the secret has been created. Now let's verify the secret has the correct shape and format: kubectl describe secret nabo-github-app with the namespace arc-runners. You should see that the data is correct: we have the GitHub App ID at 6 bytes, the installation ID at 8 bytes, and the private key at 1600 bytes. If that last number is much smaller than what you are seeing here, you potentially have a problem and the content of the private key is not correct; double-check the command before you continue, otherwise you are going to have a lot of problems and painful troubleshooting ahead. So let's go back to our values.yaml file, embed this newly created secret, and change the value to nabo-github-app; that is how it works. We also want to comment out the github_token line, because we are obviously not using a token, only the configured App secret; with both the token and the inline config secret values commented out, the secret reference is the configuration that takes effect.
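Pulled together, the secret creation and the sanity check look roughly like this; the secret name is the one from this demo, the IDs are placeholders, and the data keys (github_app_id, github_app_installation_id, github_app_private_key) are the ones the chart expects:

```bash
kubectl create secret generic nabo-github-app \
  --namespace=arc-runners \
  --from-literal=github_app_id=123456 \
  --from-literal=github_app_installation_id=654321 \
  --from-literal=github_app_private_key="$(cat /path/to/private-key.pem)"

kubectl describe secret nabo-github-app -n arc-runners   # check the byte counts per key
```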
Next is the proxy. A lot of enterprises want a proxy between the listener or controller and the GitHub API endpoints, as well as the Actions back-end endpoints, and this is where you configure your proxy settings. The configuration is pretty simple: you specify the URL and port of your proxy, and then you can also specify credentials for it. These come from a secret with username and password keys, created from literals just like the other secret we defined; we pull that secret and append the credentials to the proxy URL so we can authenticate. If you are using an anonymous proxy, like we are here, you don't need the credential secret reference at all and can comment it out; and there are noProxy parameters for hosts that should bypass the proxy. A couple of things to keep in mind: whenever you specify this proxy, the settings are applied to the controller, the listener, and the runners. For the controller and listener apps, they change the behavior of the HTTP client within the app, so that all of the traffic and requests we make go through the proxy transport configured here. On the runners, however, they are specified as environment variables. So if you have an application on the runner that does not recognize or honor proxy configuration from environment variables, that application's traffic will not go through the proxy, and you need to figure out other mechanisms to define the proxy configuration on the runners. We are not responsible for making the proxy configuration work for every app you might run on your runner; we can only guarantee that our runner binaries push their traffic through the configured proxy. That is tested, and we have end-to-end tests to prove it; if something is not working, it is probably because you changed the runner base image or you are using some configuration that ignores or removes the environment variables we inject into the runner. A note on proxy problems, because this is a nightmare: most of the issues you will see are probably proxy-related. Every time a proxy team creates a proxy configuration they claim it works perfectly, and then after days and weeks of troubleshooting we end up realizing it is actually the proxy. If it's not DNS, it's the proxy. Verify your proxy configuration very carefully, make sure it has access to the required endpoints, and above all make sure this setup actually runs properly through your proxy and keep troubleshooting it; we know the proxy configuration works, but your environment could be completely different from any other. To activate it, just uncomment these lines and supply your proxy configuration, and you should be good to go. The next step is to configure the maximum and minimum runners. You can uncomment maxRunners and specify a maximum of 10, or 100, or a thousand, whatever; there is no upper boundary, and if you leave it commented out, the controller will scale until it hits resource limits. For minRunners: if you don't specify it, the scale set scales down to zero by default; if you set it to zero, it scales down to zero explicitly; and if you set it to any non-zero number, like five, it always scales down to that number, making sure you have five runners available and idle, ready to be picked up. This is useful because spinning up a runner, especially if the runner image is substantially big, might take thirty seconds to a minute, maybe two; a lot of variables come into play. If you use our image it shouldn't take more than a few seconds, but with a bulky image the startup time can take quite a while, and a remedy is to keep a minimum number of runners always idle and ready to be assigned jobs. The minimum number of runners cannot exceed the configured maximum. The next item is runner groups, which is how you control which repositories can access this runner scale set. For example, at the organization level I can go and create my own runner group; let's actually do that. If you specify a group here, it is not created on your behalf: you need to create the runner group yourself first, and then you can specify it in the values.yaml file.
So I go to Actions, then Runner groups, and I create a new runner group. I will call it custom-group, specify that all repositories in my organization can access it, with the exception of public repositories, and that all workflows can access this group. Once the custom group is created, I go back to my configuration, uncomment the runnerGroup line, and change its value to custom-group. That's it; the runner scale set will be created in that group. The next step is the runner scale set name, and this is very important, because this name is how your workflows target a particular runner scale set in runs-on. Labels do not work anymore: you cannot have a runner scale set with multiple labels like self-hosted, ubuntu, 16-gigs, and target it that way. That is why it is very important to give your runner scale sets names that are quite indicative. For example, if you are using an Ubuntu-based container image, you could call it something like ubuntu-dind-16gb, or ubuntu-20-04 without Docker-in-Docker and 16 gigabytes; any variation works as a name for the runner scale set. If you don't specify a runner scale set name, the Helm installation name is used instead, just like when we ran helm install for the controller and called the installation arc: specifying a value here overrides the installation name, and leaving it out means we default to the installation name. The snippet below pulls together the values we have touched so far.
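This is a sketch, with the URL, secret, and names being the demo's; the proxy block stays commented out here because we are not using one:

```yaml
githubConfigUrl: "https://github.com/my-test-org"   # repo, org, or enterprise URL -- one value only
githubConfigSecret: nabo-github-app                 # the Kubernetes secret created earlier

# proxy:
#   https:
#     url: "http://proxy.example.com:8080"          # assumed endpoint
#     credentialSecretRef: proxy-credentials        # secret with "username" and "password" keys
#   noProxy:
#     - internal.example.com

maxRunners: 10          # omit for no upper bound (scale until cluster limits)
minRunners: 5           # 0 or unset scales to zero; non-zero keeps warm, idle runners
runnerGroup: "custom-group"                 # must already exist -- ARC won't create it
runnerScaleSetName: "ubuntu-20-04-16gb"     # illustrative name; this is what goes in runs-on
```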
Now, moving on to the second nightmare all enterprise administrators have to deal with: certificates signed by a custom certificate authority, not necessarily self-signed. This is quite common within larger organizations, where a root CA signs all of the intermediate certificates in use, and it is very relevant when using GitHub Enterprise Server, for example, where you can upload your custom certificates and the key signed with them. That means the TLS handshake between the listener and GitHub Enterprise Server cannot happen unless those certificates are also trusted by the controller, the listener, and the runner; otherwise all of the TLS API calls will fail, claiming this is a self-signed certificate and there is a security problem with it. This is why we also provide the option to configure your own certificates for ARC. It is quite a complicated topic, so first of all you need to understand the following: the certificate cannot be trusted automatically on the runner. For the listener pod and the controller we have full control, so we will deal with those; that is not a problem. For the runners, if you are using our runner image, what we do when you provide the proper configuration is create a GitHub server TLS certificate volume in your cluster and mount that volume onto your runner at the appropriate location, the mount path you specify; and then whenever the runner starts, we run update-ca-certificates, which takes that certificate and adds it to the runner's trust store. If you change the runner base image, this will no longer work as-is, because it works on Ubuntu-based or Debian images; if you use something like CentOS or some Red Hat distribution, it is a completely different process. You can change the mount path and put the certificate somewhere else: if the operating system has a location from which it trusts, or automatically adds at startup, certificates to the trust store, that is how you do it; if that is not available, you need to configure your runner image to read from this mount path and add the certificates there to the trust store yourself. You have to manage that; we cannot predict how it should be done for every image. The second thing the configuration does is set the NODE_EXTRA_CA_CERTS environment variable to that mount path, which allows Node to also add these certificates to its trust store, because Node is quite special and needs its own mechanism to trust certificates. And the last thing is it sets the RUNNER_UPDATE_CA_CERTS environment variable to 1, which instructs the runner to reload the certificates on the host; reload here means adding them to the trust store for that container. For this to work, you need to create your own config map and specify its name and key. So what does that actually mean, and how do you create this config map? Let me show you an example. Let's assume I create a dummy certificate file and call it my-custom-ca. It is very important that the certificate is PEM-encoded; any other format, PKCS and all of the other variations, will not work. The next thing to make sure of is that it not only is PEM-encoded but also has the .crt extension. This is not our requirement, it is Ubuntu's, the operating system's: it will not recognize certificates with a .pem extension, they have to end in .crt. So once you have the certificate, rename it to have this extension. Then you can create a config map: kubectl create configmap, then a name for the config map, which can be whatever you want, and then --from-file pointing at where we put the certificate, keyed as my-custom-ca.crt. That's it; that is all you need to do, and creating this produces a config map with the content of that certificate. But because the file I created is empty, let me first add some placeholder content, a BEGIN CERTIFICATE stub, to my-custom-ca.crt, just so I can show you the content once the config map is created.
So let's run this one more time: kubectl create configmap ca-cert; oh, we need to specify the namespace, I forgot, sorry: -n arc-runners. You will notice the namespace I am using here is different from the one I used previously. That is because the runners are created in another namespace; the listener pod and the controller manager always live in the same namespace, but you can split where the runners are created and have them in a separate namespace, which is what we are doing here. That is also why this config map needs to be in that other namespace: we will use it to install the certificate on the runners themselves, and the runner pods need access to it, otherwise we are going to have a problem. Now, this might fail because the namespace has not been created yet, which is fine; we can create it. Correct, it failed because the namespace doesn't exist, so let me clear the screen and run kubectl create namespace arc-runners. The new namespace has been created, and if we make another attempt to run the command, it works. If we want to verify the content of the config map, we can say kubectl get configmap ca-cert, that is what we called it, then -n arc-runners -o yaml. Printing it out, you can see the data, which is the content of the certificate, exactly what we had in that file, along with the key and the name of the config map. Great, we have what we need, so we can use this in our values.yaml file. We go there and uncomment the githubServerTLS section; here we specify the config map name, ca-cert, and then the key, which is the name of the certificate, my-custom-ca.crt. The mount path you can keep at the default, unless you have changed the base image or for whatever reason want it mounted somewhere else; if you keep this path, the runner will make sure it loads the certificate and runs the update-ca-certificates tool to add it to the trust store. And that is pretty much how you configure your TLS certificates. Again, if you see TLS certificate problems in the logs, it means you either have the wrong format for the certificate, the wrong file name, or it is mounted to the wrong directory, so inspect and make sure the runner is actually updating the trust store correctly. These are things we cannot guarantee ourselves; this is done on a best-effort basis, many things can go wrong here, your configuration can affect this tremendously, and there is really no way for us to identify all the possible configurations and deal with them.
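As a recap, the certificate steps and the matching values look roughly like this; file names are the demo's, and the mount path shown is the Ubuntu/Debian default:

```bash
kubectl create namespace arc-runners        # the runners' namespace, if it doesn't exist yet
kubectl create configmap ca-cert \
  --namespace arc-runners \
  --from-file=my-custom-ca.crt=./my-custom-ca.crt     # PEM-encoded, .crt extension
kubectl get configmap ca-cert -n arc-runners -o yaml  # verify the embedded content
```

```yaml
githubServerTLS:
  certificateFrom:
    configMapKeyRef:
      name: ca-cert
      key: my-custom-ca.crt
  runnerMountPath: /usr/local/share/ca-certificates/
```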
Now, moving on to yet another complication, but we still have to talk about it: container mode. This is a point of confusion for a lot of people, and honestly this is where Kubernetes is not really ideal for these scenarios; it is why I hesitate a bit about this pattern, because whether Kubernetes is really the proper place to run CI workflows and manage self-hosted runners is yet to be determined. A lot of people are finding success with it, but you will see. So, container mode: what is the limitation and why do we need it? In GitHub you can have workflows that build container images, run containers, run containers as services, or run container-based workflows, where a bunch of steps run inside a container image you specify. When we are running inside a Kubernetes cluster, we are already using containers for our self-hosted runners, and that means you cannot really run Docker inside these runners, at least not by default. That is why we rely on something called Docker-in-Docker, where Docker is installed inside these images, or rather we use the docker:dind image, which comes with a bunch of its own customizations and configuration. Unfortunately, due to how Docker is designed, and this is a limitation we cannot work around, these containers need to run in privileged mode, because they require high privileges to access the Docker socket; that is why Docker-in-Docker exists. What Docker-in-Docker allows you to do is spin up a container within another container: we have the container running our self-hosted runner image, and one of the jobs needs to run inside a container on that runner, so Docker-in-Docker creates another container inside the original one and runs that workflow, or whatever it needs to do. The problem with this is also that you cannot build Docker images using this approach, let's be very clear about that; if you want to build images, you need to rely on other solutions, and Kaniko, for example, would be one option for building container images when running in this mode. Now, the alternative, because in some clusters you cannot run privileged mode at all, is Kubernetes mode. For this we created a solution called runner container hooks: a simple Node.js utility that is installed alongside the runner binaries inside the runner image, so if you use our image it already ships with the runner. Whenever you have a job that uses a job container or another container as a service, the container hooks connect to the Kubernetes API and tell Kubernetes to create another pod, running your container job next to the pod that hosts our runner. So there is our runner pod, and next to it another pod which is the container job or service. Of course, for that to work, the runner needs access to the Kubernetes API with the permissions to create, fetch, and delete pods in that particular namespace. Because we have our runners isolated in a namespace, that should be okay: the container hooks can spin up new pods only within that particular namespace.
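For orientation, the role granted to the runner's service account in Kubernetes mode ends up looking roughly like this; treat it as a sketch and take the authoritative resource and verb list from the ARC documentation:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: container-hooks-role      # illustrative name
  namespace: arc-runners
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "create", "delete"]
  - apiGroups: [""]
    resources: ["pods/exec"]
    verbs: ["get", "create"]
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["get", "list", "create", "delete"]
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list", "create", "delete"]
```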
So this is another viable solution. It does not solve the image-building problem either, but it does let you avoid Docker-in-Docker and running containers in privileged mode. You have to choose one of these options; there is no third alternative. Let's start by configuring Docker-in-Docker, and I'll show you how it works; in the second runner scale set I spin up later, I will configure Kubernetes mode so we can see how that works too. For Docker-in-Docker, all you have to do is uncomment the containerMode lines, and that's it; once that is specified, that is all there is to it. The next step is the template, the pod spec for the runner pod. We don't have to change much here unless you are running your own custom runner image or something, but let's go through it anyway. Because we changed the container mode to Docker-in-Docker, we inject some extra pieces into this template for you: we create an init container, and that init container executes a command that copies all of the externals to a mounted volume, the dind externals. Why? Because these binaries in that directory need to be shared between Docker-in-Docker and our runner; that is how the behavior works. So we create a volume mount that is shareable between the two containers, and the way to populate it is via an init container, which spins up, does this work for us, and exits. Then we have the runner container itself; this is its configuration, and if you want to change something, this is the place to do it. You can specify a different image, whichever tag you want, add more environment variables, or just rely on the defaults we specify; we surface all of this so that you can override any of it here, and if you keep it as-is, this default configuration applies. So that is the second container, alongside the init container; and then we have the third, which is Docker-in-Docker: it uses the docker:dind image, runs in privileged mode, and mounts all of those volumes, which also get created, at the expected paths. I am going to keep it vanilla for now; if you are configuring these, you pretty much know what you need to do, so I won't uncomment any of them. One exception: the runner container spec uses the latest tag, and we don't recommend keeping it like that, because we could upgrade at any point in time, and I think you should manage when you upgrade your runners. So I am switching this to version 2.306.0, the latest at the time of recording, and the executing command, the entry point, remains the run.sh script. That is pretty much it for the runner scale set template.
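Condensed, the Docker-in-Docker pieces of the values file look roughly like this, with the runner image pinned as in the demo (the init and dind containers are injected for you when the type is dind):

```yaml
containerMode:
  type: "dind"

template:
  spec:
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:2.306.0   # pin a version instead of "latest"
        command: ["/home/runner/run.sh"]
```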
And just like we said with the controller, if you create your own service account, roles, and role bindings, this is the place to specify them: you uncomment these lines and provide the namespace and the name of the service account you would like to use. But then you are responsible for all of the permissions and privileges required for the listener and the controllers to operate properly; you can read our Helm charts to understand what is needed, and I already showed you in a previous section where to find that information. Now that all of our configuration is ready, we can proceed with the installation. Before we move on, though, I need to comment out the TLS section, because my certificate is obviously not a real one and I don't want it to ruin the installation and send us on a troubleshooting hunt. Just like before, let's craft the command to install this runner scale set. It is quite simple and very similar to the previous one: helm install, then the installation name. As you will recall, and we talked about this, the installation name is only for managing our Helm release; it is not the name used to target the runner scale set, because we overrode that name in values.yaml. So it is not what goes into the runs-on value in your workflow YAML, and we can call it whatever we want; I will call it arc-runner-set. On the next line, we specify the namespace, which I will call arc-runners; it is very important to use the same namespace we used when we created the secret. So we have arc-systems for the controller, and arc-runners here. Note that the listener pod is created in the same namespace as the controller manager pod, don't confuse these things; the runners, however, are created in the new namespace, which gives us that security separation and isolation, which is good. Then --create-namespace; we know this namespace already exists, but it doesn't hurt. Then -f with the path to the values file, which is in my home directory under the arc-configuration folder, in the first runner scale set directory. Then the link to the published chart package; make sure you use the gha-runner-scale-set chart here and not the gha-runner-scale-set-controller chart. That is the distinction: you can configure as many gha-runner-scale-set chart installations as you want, but you can only have one controller. The next step is the chart version we would like to use, which you can also skip if you want the latest. Now we are ready: hit enter, and you will see that the installation has been successful.
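For reference, the full command from this demo; the values path is the demo's, and note the chart name is gha-runner-scale-set, not the controller chart:

```bash
helm install arc-runner-set \
  --namespace arc-runners \
  --create-namespace \
  -f ~/arc-configuration/runner-scale-set-1/values.yaml \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set \
  --version 0.4.0
```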
Now a quick check: helm list -a shows that arc-runner-set has been deployed at chart version 0.4.0. Perfect, but this is not the end of it. Let's first make sure our pods are up and running: kubectl get pods -n arc-systems. We need to see two pods here, the controller and a pod for the listener; that means a very successful installation. If you don't see the listener pod, you have a problem, and there could be many: the proxy could be misconfigured, in which case go figure out the proxy settings; the TLS certificate could be misconfigured, in which case go figure out the TLS configuration; or the secret we created, the personal access token or the GitHub App, is incorrect, whether in the format or structure of the secret's data or in the reference to it. Many things can go wrong in that particular area, so if the listener doesn't come up, you have a big problem. If the listener does come up, we are probably in a good state, because if things have worked well so far, it is highly unlikely they will take a wrong turn; but still, things can go wrong, everything is quite complex here. If they do, inspect the logs of the controller: if you don't see the listener, start by checking the controller logs, and if you see errors, address them one by one until all the problems are resolved. So we run kubectl logs and fetch them, and as you can see here there are no errors, only info entries, which means everything is running smoothly and as expected. A couple of things to note: we specified a minimum number of runners, so we should see five runners spun up in the cluster, in the second namespace. So we run kubectl get pods -n arc-runners, and we can see five runners currently in an idle state. You can also see two containers inside each pod; that is because we configured Docker-in-Docker container mode, so if you only see one container running inside the pod, you have a problem and Docker-in-Docker is not running as it should. What you have just seen is all thumbs up: our setup and configuration have been spot on, we have no issues whatsoever, and we can start using these runners in our workflows. Let me grab the runner scale set name from values.yaml, then go back to our organization, to the Actions tab, then Runners. Here you should be able to see the runners, all five of them, and the runner scale set itself, which belongs to the custom group. Clicking on it shows a small view with the number of available jobs, assigned jobs, and busy and idle runners for this scale set. This is obviously going to change, we are working on improving the UI, but it hasn't landed yet. Before we move on with the demo, you will notice a couple of small changes. First, my t-shirt color has changed, because I am recording this demo on a different day; I bumped into a small technical hiccup on the original day and couldn't solve it in time. The second change is that I am using a different organization. Don't worry, these changes will not affect your setup in any way, shape, or form: I have the exact same repository, workflows, and configuration on the new organization as the old one; I just wanted to clarify this to avoid any confusion.
You will notice in this repository a simple workflow that I will explain now: the short sleepy matrix. This workflow takes two inputs and is executed on demand with a workflow_dispatch. The first input is the Actions runner scale set name; I wanted it configurable because I will be doing multiple demos and didn't want to hard-code the runner scale set name into the workflow. The second input is a delay, a number of seconds, because this workflow runs a job matrix of three concurrent jobs, and each of them uses the runner scale set name input as the value of the runs-on property. That means these jobs run on my runner scale set, and ARC spins up the runners for these jobs to be assigned to and executed. There is only one step, which echoes the matrix version and then sleeps for the duration I specified; this gives us a bit more time to see what is going on at the other end. Now let's get a sample workflow run happening. I will put the browser and the cluster side by side so you can see what happens, click Run workflow, paste the runner scale set name (the ubuntu one we configured earlier), specify a sleep duration of 20 seconds, and run the workflow. There we go: you can see a run with three jobs, and clicking into one of them shows that it has been assigned to this runner scale set, specifically to this particular runner, and that the job has started executing. Over here is my cluster: the listener and the controller are up and running, and I already have five running runners, so the job executes on one of them, this one in our case, and inspecting that runner's logs shows the job actually executing. When the runner is done with its work, it simply exits, and when it exits, the controller cleans up the pod. And because we defined a minimum number of runners, it creates three new ones: three of the runners we set up executed their jobs and exited, so the controller spins up three other runners to make sure we meet the desired minimum count. If the minimum count were zero, you would see no replacement runners spun up for the ones that executed jobs, and you would be able to scale down to zero. That's it.
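A sketch of what the short sleepy matrix workflow described above could look like; the input names are illustrative, since the file itself isn't shown in full:

```yaml
name: short-sleepy-matrix
on:
  workflow_dispatch:
    inputs:
      runner-scale-set-name:
        description: "Runner scale set to target"
        required: true
      delay-seconds:
        description: "How long each job sleeps"
        required: true
jobs:
  sleepy:
    strategy:
      matrix:
        version: [1, 2, 3]            # three concurrent jobs
    runs-on: ${{ inputs.runner-scale-set-name }}
    steps:
      - run: |
          echo "Matrix version ${{ matrix.version }}"
          sleep ${{ inputs.delay-seconds }}
```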
Now let's see a more complicated workflow that demonstrates the Docker-in-Docker feature. To test it, I have another workflow called test-service-container. It is pretty much the same: it gets dispatched with workflow_dispatch and runs a job on my runner scale set. This time, however, it sets up a service container running Redis, uses the busybox image as the main container where the job runs, and has a step that simply runs a hello world and then sleeps for 360 seconds. It also has a couple of environment variables, the Redis host and the Redis port, plus the service redis, which uses the redis image and is created alongside my runner container in the same pod, so I have access to Redis from my runner container. This is useful if you want to test, for example, a scenario where you need to connect to another service like Redis and execute some behavioral testing from your runner; this is the way to go about it, and Docker-in-Docker facilitates it for you. So let's see how it works in action. I jump to my Actions tab, go to the test-service-container workflow, click Run workflow, and since I already filled in the runner scale set name, I just start another run and go inspect it. (You can see one already running; ignore that one.) Checking which runner it was assigned to, I can see it is this particular one, which is already in my list over here. Looking at the pod, I have the Docker-in-Docker container, the init container that was supposed to do the setup, copying the externals to the working directory, and then my runner. The Docker-in-Docker side is running my Redis service, and the runner is running my job, which already has access to Redis; let me demonstrate. I exec into the container and start by installing netcat, just so I can show the connection to Redis. We start with apt-get update, that being Ubuntu's package manager, and then install netcat, a nice small utility that lets us connect to a host and port. A docker ps shows the redis container running and listening on port 6379, so if I do nc 127.0.0.1 6379, I am now connected to Redis, and if I type something like hello, you will notice from the response that this is the Redis server, running Redis 7.0.
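For reference, a sketch of the test-service-container workflow along the lines described above; again, the input and job names are illustrative:

```yaml
name: test-service-container
on:
  workflow_dispatch:
    inputs:
      runner-scale-set-name:
        required: true
jobs:
  redis-test:
    runs-on: ${{ inputs.runner-scale-set-name }}
    container: busybox              # main container the job runs in
    services:
      redis:
        image: redis                # service container created alongside the runner
    env:
      REDIS_HOST: redis
      REDIS_PORT: "6379"
    steps:
      - run: |
          echo "hello world"
          sleep 360
```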
If I go into the Docker-in-Docker container and list the running processes, you will see that the redis-server is already up and running on port 6379. It is as simple as that. All right, now what we want to do is configure our second runner scale set and enable Kubernetes mode, where the container type is kubernetes as opposed to Docker-in-Docker. The first thing I do is remove the values.yaml file from this directory and copy over the one from my previous configuration, because we are only going to change it a little and I don't want to start from scratch. Let's jump to the top and verify quickly: the GitHub config URL is pretty much the same, nothing changes here; the secret is also the same; we are not going to use a proxy; maximum and minimum runners stay the same; and the runner group stays the same. We change the name to debian instead of ubuntu, just to create some variation, and then we change the type here: instead of dind, we specify kubernetes. Then we remove the comments from the lines that are needed and fix the indentation, otherwise we will have a problem; these need to be on the same level: storage class, resources. I think that should do it; if the YAML complains, we will come back and fix it. For the template, everything remains the same, but I need to draw your attention to what gets injected when the container mode is kubernetes: the template spec changes, and this becomes the spec for the runner, with a bunch of additional environment variables added. One of them is the path of the container hooks; another is the Actions runner pod name, for the new pod that will be created; and another one makes the job fail if the workflow does not define a job container. It is important for you to disable that one, because otherwise you will see different types of failures. If you deliberately want a runner scale set that only runs container jobs, that is totally fine, but this is a pitfall many people fall into: they don't know they can disable this behavior with this setting, and then they assume Kubernetes mode only works for that type of workflow, which is not true. We are going to disable it. The rest we keep the same: the volumes and the claims I am not changing; you can obviously configure those as needed, but I will skip them for now. So let's add the env segment here, paste it in, and switch the value to false; that is the only configuration change I am making, condensed in the snippet below. Something caught my attention for a second, but the volumes look fine, so I save the file.
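Condensed, the Kubernetes-mode pieces of the second values file look roughly like this; the storage class has to match your provisioner, which is exactly what trips us up in a moment:

```yaml
containerMode:
  type: "kubernetes"
  kubernetesModeWorkVolumeClaim:
    accessModes: ["ReadWriteOnce"]
    storageClassName: "openebs-hostpath"   # must match the provisioner -- see the fix later on
    resources:
      requests:
        storage: 1Gi

template:
  spec:
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        env:
          - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
            value: "false"    # allow plain (non-container) jobs on this scale set
```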
Before we install this, remember that I mentioned we are going to be using persistent volumes. You can create these volumes yourself and make a number of them available, but that is problematic, because it is a manual effort you always need to repeat: every new runner that spins up needs to claim a new persistent volume, so if you create ten volumes and ten runners, that's it, the volumes are gone and now you have a problem. That is why I am personally going to use a dynamic local persistent volume provisioner in this setup; it is called OpenEBS. You can use pretty much anything else you want, but for me this is the easiest setup. So first of all I need to install OpenEBS before we can proceed further with Kubernetes mode; be aware that you will need a solution for this particular problem. Installing OpenEBS is just a couple of extra Helm steps we need to make sure are in place. The first is to add the OpenEBS repository with helm repo add openebs and its URL; the second is helm repo update; done; and then helm install openebs (making sure I don't have typos), into the openebs namespace, creating the namespace if it doesn't already exist. All of this is configurable, so feel free to change it however you want. Then let's make sure we are up and running; as usual, I jump into my favorite tool, k9s. You can see my controller, my previous listener which we configured, and my five runners up and running already; scrolling down a little, the OpenEBS provisioner, NDM, and operator are all up and running as they should be. Everything should fall into place now; no further configuration is required in this state, so we are in good shape. The next step is to install the new runner scale set, just like we did before. Let me find that command, the helm install for arc-runner-set; we use the same one, but instead of that name we call it debian-runner-set, keep the same namespace, and change the directory so we point at the new values.yaml file, otherwise we would install the same old configuration, so please make sure you don't fall into that trap. Then we install, and hopefully everything goes smoothly. Perfect, it says deployed, but again we verify ourselves and make sure the second listener is up; if it is, our configuration was successful. Let's have a look, and yes, the listener is up, and as you can see, the five minimum required runners are preparing themselves.
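Pulled together, the OpenEBS install and the second runner scale set install from this part look roughly like this; the repo URL is OpenEBS's public chart repository, and the values path is the demo's:

```bash
helm repo add openebs https://openebs.github.io/charts
helm repo update
helm install openebs openebs/openebs \
  --namespace openebs \
  --create-namespace

helm install debian-runner-set \
  --namespace arc-runners \
  --create-namespace \
  -f ~/arc-configuration/runner-scale-set-2/values.yaml \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set \
  --version 0.4.0
```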
Back to the describe output: if, by contrast, you describe the controller, you will notice that the command is actually manager. This is what creates the distinction, from the same controller image, of which application we're running: the controller or the listener. You can also inspect the arguments that have been passed to the manager; if you change the log level, for example, another argument is passed, and if you change the log format, another argument is passed. So there's a lot of useful information here.

Now, I'm not really sure why my runners are not up and running, so let's describe them and see. Most likely it's something to do with... there we go: the provisioner and the persistent volume claim. I think there's a mismatch between what I configured and what OpenEBS is provisioning, so let me double-check this; give me a second. Okay, I know the issue. Let's go back to our values.yaml file and modify a couple of things: the storage class name should actually be openebs-hostpath, and I think the rest should stay the same.

So let's deploy this and see if it works. How do we upgrade the configuration? Same command as before, but instead of helm install we do helm upgrade and pass --install. What this does is upgrade the configuration in place, as opposed to installing a fresh one. If you then inspect the namespace, you will see that the listener is recreated, and then the runners as well, and hopefully this time everything works as expected. Let's see... I just described it, and there we go, they started coming back online. The slight delay is because OpenEBS needs to provision the different persistent volumes. Let me see if I can find them: PV, PVC... there we go. These are the persistent volume claims, and you can also see the PVs, the persistent volumes: one per runner, one gigabyte each, with read-write access, and they get deleted once the runners are gone.

So let's go back to the pods, and yes, I think this is it; this is our setup. These runners, just like the others, are ready to be used in our workflows, and we can easily target them and have different types of workflows running on them.

The way we're going to test kubernetes mode is by using the same workflow we used in the Docker-in-Docker scenario with the first runner scale set. Why? Because I want to show you the same workflow, with the same service container, and how that looks in the cluster itself, so that you can understand the differences without many changes to the initial setup and variables. So let's do that right now. As a reminder, this is my workflow for testing a service container, and because the runner scale set name is configurable in this workflow, we don't need to change anything else. We jump right to the execution: click on the Actions tab, go to the test service container workflow, change the runner scale set name to debian, and run the workflow. You can see here, on the right-hand side, that I have my cluster with the second listener for the second runner scale set up and running, and I also have five runner pods. Immediately, you can observe a difference in these pods: each has only one container, as opposed to the Docker-in-Docker ones, which have two running containers.
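For reference, the workflow being run here would look something along these lines. This is a sketch rather than the exact file from the video: the workflow name, the busybox and redis images, and the runs-on label (which must match the name of the runner scale set installed above) are assumptions to match the walkthrough.

    name: Test service container
    on: workflow_dispatch
    jobs:
      test:
        # target the kubernetes-mode runner scale set by its installed name
        runs-on: debian-runner-set
        # in kubernetes mode the job container and its services are created by
        # the container hooks in a separate pod, rather than through Docker
        container: busybox
        services:
          redis:
            image: redis
        steps:
          # the job container and the redis service share the pod network,
          # so the service should be reachable directly on its port
          - run: nc -z -w 2 redis 6379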
What happened when I ran this workflow is that something new came up: this pod here, the one I mentioned earlier that gets created using the container hooks. The pod that actually hosts the runner is the one referenced here, the one whose name ends with 7lgh4, and then a new pod is created with two containers in it; that second pod is the one that actually contains our service. These two pods are, of course, connected networking-wise, so let's have a quick look at what's inside. You can see that in the second pod we have busybox and we have redis. Both of these run in the same pod. Why? Because they need to share the same network; if they were spread across different pods, things would become a lot more complex. So what happens is that we have the initial pod for the main workflow run, and then a new pod is created with the container job as well as redis.

If we repeat the same test as before, we get the exact same outcome. If I go into my busybox container and start a shell, you'll notice first of all that Docker doesn't exist here. Now you'll be asking: how can I connect to the second container, then? Well, they share the same IP address, so if I run nc against port 6379, I can connect directly to redis from this container, and if I type in hello, you'll notice that the redis server responds. And mind you, if I list the processes running here, you'll see that the redis server is not running in this container; it's actually running in the second container. They share the same pod network space, and that's why there's direct connectivity between them. That's how you need to think about this, and honestly, it's not that complicated. Once the job completes and the timeout, the delay that I injected, is over, this pod over here, plus this pod over here with all of the containers in them, are going to be removed and recreated.

That's pretty much it for kubernetes mode, and that's all I had to share with you. I hope you enjoyed this video as much as I enjoyed making it, and I hope you found it useful and beneficial. If you did, please consider supporting my work. There are many ways to go about it: you can become a member of this channel, you can subscribe on Patreon, or you can make a one-off donation either via YouTube or via Buy Me a Coffee. Another way is to buy a copy of the knowledge graph, and at the very least, just like, share, and comment on these videos and everything I post. This really helps me keep going and keep producing content that is hopefully useful to you. Thank you very much for watching, and I will catch you next time.