GitLab Runners

Hello everyone, and welcome to another exciting installment of the Customer Success skills exchange. My name is Chris Reynolds, I lead Customer Success enablement here at GitLab, and we are joined today by Brendan O'Leary, Senior Developer Evangelist, who's going to talk to us a little bit about runners. I put the agenda into the chat. Today we're going to cover a foundational understanding of what runners are and how they work, the life of a GitLab job, some best practices for a runners rollout, and some artifact management — using runners to speed up jobs — and then open it up for questions and answers. Take it away, Brendan.

Great, thanks Chris, I really appreciate it. Happy to be here presenting on one of my favorite talk topics, the GitLab Runner, and really excited for the questions everyone's going to have at the end. I will try not to speak too quickly, but I'm also not going to spend most of the time on the slides, hopefully — because hopefully there are a lot of great questions. To talk about the runner, we'll cover what it is and why it is, how it's architected, what platforms it runs on, what an executor is, a little bit about auto-scaling, and then some of the other advanced features Chris was alluding to earlier.

First, a little overview. The GitLab Runner is basically the agent responsible for running — sorry for defining a word with itself — the jobs that you define in your CI/CD pipeline. "Runner" is a nice generic term for it because it can actually be run in many different ways. The reason we have this concept of a runner is to make the jobs you run on GitLab CI/CD multi-platform: they can run on any platform. The runner is written in Go, and it's multi-language — any language you're building with GitLab, you can build with the runner. It's also built from the ground up for parallelizing builds and for building with Docker. And of course, having this extra agent outside of GitLab means you can have one GitLab installation but many, many runners for your different needs. That may be runners people bring themselves — we'll talk about that — or it may be a pooled model where you have job execution at the pooled level. The example of that would be GitLab.com: it's a single install of GitLab, but it has pooled compute — the compute you get as a GitLab.com user, 2,000 minutes for free and then a bump up in minutes for each tier. That's the pooled, shared-runner compute we allow folks to use.

And so, the life of a GitLab job. I feel like I have the wrong slides, because this slide says "needs work," but this is fantastic, so that's okay, we're going to make it work — sorry, I think I might have the wrong slide deck link, but we'll talk through it. The general outline of a GitLab job is: the runner will poll for the occurrence of new jobs — this is a critical part of the way runners are architected, and we'll talk about it in more detail later. When a job is received, the runner clones the repository to itself and then runs the script. In more detail, pulling those steps apart, there are actually a number of steps that happen: the polling is still first and running the script is still last, but a lot happens in between. There are pre-build and pre-clone scripts that can run, which can include the ability for the administrator of the runner to say something runs before every build, or for the writer of the job to say "I want to run this code before you clone the repository." You can also have jobs that don't clone the repository at all. Then there are post-clone scripts that can run, and then of course the actual user script. We can talk more about that in a little bit.
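The job lifecycle described above (pre-steps, the clone, the user script, post-steps) maps onto keywords you can see in an ordinary `.gitlab-ci.yml`. A minimal sketch — the job name, image, and script contents here are illustrative, not from the talk:

```yaml
# Illustrative .gitlab-ci.yml showing where each lifecycle phase runs.
# The runner polls GitLab, receives this job, clones the repo, then executes:
build-job:
  image: alpine:latest        # with the docker executor, this image is pulled first
  variables:
    GIT_STRATEGY: clone       # set to "none" for jobs that skip the clone entirely
  before_script:
    - echo "runs after the clone, before the main script"
  script:
    - echo "the actual user script"
  after_script:
    - echo "runs even if the script above fails"
```

The runner administrator's hook mentioned above — "something runs before every build" — lives on the other side, in the runner's own configuration, as `pre_clone_script` / `pre_build_script` entries in config.toml.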
Because — again, sorry about the slides; as Chloe said, everything's a work in progress, clearly, at GitLab, so I don't even know why it was labeled as such. But first, let's talk a little more about the runner architecture. As we were saying earlier, the concept of the runner is a one-to-many relationship of GitLab installation to runners. Those runners can be virtual machines, they may be Kubernetes pods spun up on demand, they could even be the developer's laptop running the job so they can see exactly how it runs. An important part of the architecture is the polling method we talked about before: GitLab has a queue of jobs that need to be executed, and it knows which runners are able to execute each of them. As a runner, I poll GitLab and say, "Hey, do you have any jobs for me? This is who I am, this is my configuration." GitLab then decides, "Yes, I have one that matches your configuration — here it is." That's an important distinction about the way the GitLab Runner is architected, because it allows you to have your runner in a completely separate network from the GitLab server. As long as the runner has access in that direction, the GitLab server doesn't need to know anything about the network the runner is in, or have the ability to reach into that network — the runner is reaching out. What this allows for is customers that have multiple network segments, or customers that want to use GitLab.com to store their code but run their runners within their own private networks, to do that without having to open ports on the firewall.

A simple example: here in my house I have a Raspberry Pi that runs my DNS for the house — it's called Pi-hole — and when I update its configuration, I actually store that configuration on GitLab.com, but the Raspberry Pi itself is a runner. GitLab.com doesn't have a direct connection into my house, but the runner is always going out and polling for new information, and if it finds there's a new commit, it pulls the latest, builds, and reconfigures itself. That means I didn't have to open up some port on my router — I don't even have a static IP address at my house. None of that was required for me to get going.

This is Darren — I think, too, that this is not a small point. A lot of customers assume GitLab has to push to the runner at an API-call level, so I really emphasize this with customers, because I think they don't realize that they don't have a lot of security requirements to run private runners wherever they want.

Yeah, it's really critical — especially in SMB and mid-market, where we have traditionally seen a lot of demand for GitLab.com, but just as critical as we've seen growing demand for .com in the enterprise. What seems like a basic architecture point was a really important way we architected our CI system, and one that other systems are either trying to model themselves after, or failing because they can't model themselves that way. With other CI systems you end up with a lot more patchwork to achieve the same level of security, rather than it just being native to the way the runners are architected, as it is with GitLab.

Now, I said the runner is polling GitLab with its configuration, asking for new jobs — so how does GitLab decide what a runner is going to run? There are a number of properties of the runner that help make that determination, and I'm going to go through each of them. Within each pair they're mutually exclusive: a runner can be shared or specific, it can be tagged or untagged, and it can be protected or not
protected. You can then have any combination of these, of course — a shared, tagged, protected runner; a shared, untagged, protected runner; a specific, untagged, unprotected runner — any combination exists, and we'll go through each next.

Shared versus specific: a shared runner is in a general pool and can be used by any project in the entire GitLab instance it's installed and configured for. Shared runners are managed by a GitLab admin on the administration side of the instance — that's at your GitLab instance's /admin/runners — and typically they have some sort of auto-scaling associated with them. Again, the key example is GitLab.com, which has a number of runner scaling managers available to any project on all of GitLab.com. They auto-scale, creating and destroying a virtual machine for every single job — so even if you have one pipeline with three or four jobs, you get a new virtual machine that exists only for the life of each job.

Specific runners are tied to a project or a group — they can actually be tied to more than one project or group — and they're then only in the pool for those specific projects. They can be managed by the project or group owners: in the namespace, under CI/CD settings, you have the ability to add specific runners. Typically this is for specialized builds, or for when an organization doesn't want to provide shared compute across their GitLab instance, so people "bring their own compute" — you might hear that phrase sometimes. And of course, this is how I, as a GitLab.com user, would add my own specific runner inside my network, without opening that compute up to everyone who uses GitLab.com.

Next, a runner can be tagged or untagged. Tagged runners look only for jobs with the same tag, and untagged runners run jobs with no tags. A use case here might be: I'm doing a Windows build, or an iOS build, that requires a specific operating system to build on, so I tag my Windows runner with a tag like windows and then tag my job — here you can see I'm building a C# project with windows. Or I may say I'm just pulling a Maven/JDK image, so I can run on any untagged runner that has Docker. Those are the two examples here.

Finally, a runner can be protected or not protected. Protected runners only run jobs from protected branches or protected tags. This is typically for runners that might be doing deploys — they have the deploy keys on them, or deploys run specifically on them — and it allows administrators of those production environments to have more direct control, and not allow any developer on the project to execute arbitrary code on those runners. Not protected is the obvious inverse: it runs on any branch and can be used for any build.

And then — I said "finally," but there are actually some more options. You can deactivate a runner so it doesn't accept new jobs. You can set whether it will run untagged jobs or not — a tagged runner can actually also pick up jobs without tags. And you can lock a runner to a project, which means no one will be able to assign it to another project: as I said, a specific runner can be tied to one or many projects, but a locked specific runner is locked to that particular project. There's an example project with a lot of these different kinds of tags and runners — which I have not done a great job of maintaining, but hopefully I'll do a better job in the future — at gitlab.com, "all the things," where I'm trying to do GitLab CI for all of the things.
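The tag matching described above looks like this in a pipeline file — the job names, image, and tag values are illustrative, not from the talk:

```yaml
# A job pinned to a tagged Windows runner (e.g. for a C# build):
windows-build:
  tags:
    - windows              # only runners registered with this tag will pick it up
  script:
    - dotnet build

# A job with no tags: any untagged runner with Docker can take it.
maven-build:
  image: maven:3-jdk-11    # illustrative image name
  script:
    - mvn package
```

Protected status, by contrast, isn't set in `.gitlab-ci.yml` at all — it's a property of the runner itself (together with protected branches and tags), configured in the runner settings.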
Speaking of all the things, let's talk about platforms and executors. As I mentioned earlier, the runner is written in Go, and Go is a language that can run on almost any platform in the world — so the GitLab Runner is able to run on Linux, on macOS, on Windows, in Docker, in Kubernetes. There are lots of platforms on which you can run the GitLab Runner. Then, for each install of the runner, you choose what's called an executor. The executor is how the runner is going to execute the script, surprisingly enough. The most common ones are: the shell executor, which is exactly what it sounds like — it runs commands directly, as if you were typing them into your Bash or PowerShell terminal; Docker, which uses a Docker image and executes the build inside that image — obviously our most common use case; Docker Machine auto-scaling, another very common use case today, where a runner manager scales up runners — basically a kind of bastion host that creates new runners on demand — which is how auto-scaling works on GitLab.com; and Kubernetes, where the runner runs as a pod in a cluster and can also auto-scale, since Kubernetes enables that natively.

There are a number of other executors, though. Some less common ones include VirtualBox, Parallels, and SSH — just different methods of running your jobs. And there's now the concept of a custom executor, where we provide gRPC endpoints to allow you to customize the build environment and set up your own executor. That has enabled a number of executors beyond what we'd be able to test ourselves — for instance, in high-performance computing environments and in proprietary virtualization software we may not have access to. In fact, it's how our infrastructure team created the Windows shared runners on GitLab.com: using the custom executor method to build a custom autoscaler. I expect that method will also be used to replace Docker Machine in the future, since Docker Machine is in maintenance mode and we're going to need a better way to auto-scale.

Let's talk about auto-scaling right now. There are a lot of ways to auto-scale compute in general — folks have written, for example, ways to have auto-scaling groups in AWS stand up runners the way they'd stand up other servers. Again, the most common usage today is Docker+Machine. Docker Machine is a project that tries to act as middleware between the APIs of many, many cloud and virtualization providers, presenting a common interface on the other side. The fact that we have Docker Machine in the GitLab Runner means you can auto-scale on all of the providers listed here. That's obviously a lot of work that was done, and it would be a lot of work to maintain — which I think is why Docker chose to stop maintaining it — but it works pretty well right now, it's getting security patches, and we're working on plans to bring native auto-scaling for all the cloud providers directly to the runner. And with Kubernetes, you can use a ConfigMap and spin up a pod per job to auto-scale the runner.

To talk a little more about these: Docker+Machine allows you to create a new virtual machine — for instance an EC2 instance, which we'll talk about next — and it has a lot of great features: how many machines to keep around idle, how long to keep them around, when your peak usage is. There are a lot of really great knobs in that version of auto-scaling that let you manage what I normally describe as your compute-spend versus time-to-wait-for-a-job ratio. In AWS, you can use EC2 — there's the ability to have these across networks, so of course you have to consider all the AWS networking considerations like security groups and your VPC as you build it out — and there's also the ability to use spot instances: unused compute in the EC2 environment that you can bid for and get at a super discounted rate. The trade-off is that it may go away if compute demand increases, but the fact that a job going away is not the end of the world means you can save a lot of money on compute costs. The runner also has the ability to use S3-compatible storage for its caching, which is a really great solution for sharing the cache among runners that may be on different machines. And with Kubernetes, the easiest way to deploy is with Helm — we have that one-button deploy in the GitLab Managed Apps section for Kubernetes — but then again you've got to consider caching, whether you need persistent volumes, and a number of other things to make Kubernetes work.

Let's talk about a few security considerations. We talked a little about how the architecture helps from a security perspective, but there are still a number of concerns to think about depending on which executor you use and how you're going to use the runner. This slide covers it in more detail, and I don't need to go through it exhaustively, but things you'll hear come up are: the ability to use self-signed certificates (there's a method to configure that); privileged versus unprivileged mode with Docker — basically, how much do we allow the user to execute outside of the Docker container; and, if you need authentication through a proxy, you have to configure both Docker and the GitLab Runner to use it. (Go ahead — sorry, I thought there was a question. Just a breath? Fantastic.)

Awesome — so let's talk about how you install the runner. Again, it can be installed on any platform that runs Go, which is almost every platform you can name. You basically grab the binary that's right for your system, then register the runner — registering is where you give your token and set up the executor and how builds will run for that runner — and then there's the configuration. For advanced configuration — things like the Docker+Machine setup we talked about, how you configure the runner to use a cache external to itself, how you configure it to scale — those are typically contained in the config.toml. That's often something folks turn into a template: if you're, for instance, auto-scaling runners with an auto-scaling group in AWS, you might have this templatized so you can put it in place yourself rather than doing it interactively through the interface. And there are a lot of options here.
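As a rough illustration of the kind of config.toml that gets templatized for a Docker+Machine autoscaling setup — the names, bucket, and values below are invented for the example, not taken from the talk:

```toml
# config.toml sketch: one runner entry using the docker+machine executor
concurrent = 20

[[runners]]
  name = "autoscale-manager"          # illustrative name
  url = "https://gitlab.example.com/" # the instance this runner polls
  token = "RUNNER-TOKEN"              # produced by `gitlab-runner register`
  executor = "docker+machine"

  [runners.docker]
    image = "alpine:latest"           # default job image

  # S3-compatible shared cache so ephemeral machines can reuse each other's cache
  [runners.cache]
    Type = "s3"
    Shared = true
    [runners.cache.s3]
      BucketName = "runner-cache"     # illustrative bucket

  [runners.machine]
    IdleCount = 2      # machines kept warm: the compute-spend vs. wait-time knob
    IdleTime = 1800    # seconds before an idle machine is torn down
    MachineDriver = "amazonec2"
    MachineName = "runner-%s"
```

Spot instances, security groups, and VPC settings would go into the `amazonec2-*` driver flags under `MachineOptions`.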
I could spend another half an hour going through everything available in the TOML — there's obviously fantastic reference documentation for it, as there is with everything at GitLab — but I'd mostly be interested in what questions you might have about that, or other runner topics.

We have a lot of really good questions in the document. Awesome. Do you want me to share the doc, or stop sharing my screen so we can just talk — how would you like to do it? If you want to send me the link, I can put it in the agenda. Well, it's the doc from the agenda, right? Yeah, okay, cool. I'm saying, do you want me to share my screen while we do that, or should I just stop? Oh yeah, stop sharing. Okay, great, cool.

Hey Brendan, I think I've got the first one here. It's really just around runner best practices — could you riff for a bit on how we should be positioning them? I've had multiple customers ask me, "What's the best practice for creating an auto-scaling pool of runners today?" and among my five customers I've seen it managed five different ways. I linked a couple of epics here that talk about what I believe the path forward to be, but I'm curious for your hot take on where we're going with that, and how we can credibly stand behind it without being able to do it with .com today.

Yeah, so I would say there are two things here. There's the consideration of how the customer views shared compute, which I think greatly impacts the answer, and then there's the issue that Docker+Machine is in maintenance mode — which kind of stinks, because if it weren't for that, that would be the answer. I think it still is the answer for a lot of customers: the fact that it's in maintenance mode, especially if I'm running it in my own environment, I don't think is a huge concern — it will not be the only maintenance-mode open-source library in your production environment. But I understand some customers have that concern. It's my understanding that one of the epics you listed we opened when I was PM of Verify, and the other looks like it's a little newer.

Maybe let me reframe: to my knowledge, we've got reference architectures today — if you've got 2,000, 5,000, 10,000, or 25,000 users, you can go look at a docs page that says "here's what we've tested very thoroughly and would recommend" in terms of GitLab infrastructure. I think that directly speaks to runners, and I'm curious whether we've got thoughts or goals around, you know, "hey, if you're an AWS customer, here's how to build a pool of ephemeral EC2 runners using Docker+Machine workers versus the recommended EKS configuration, and here are the determining characteristics that would suggest using one or the other."

Yeah — for AWS there is a great article on that; I just linked it in. For auto-scaling in general it's much more generalized, but for AWS, what's listed in that article is current best practice. It goes through all the details of how to set up a runner manager and configure it, and again, if I were at a large company today installing GitLab runners, this is how I would do it.

To add to Jaime's point — I'm actually working on a template; I had already worked out a template before that's auto-patching for Linux. When I'm done it should do Windows and Linux, both the Docker executor and shell, and you can either put both on the same set of machines or separate them by type, and as I have time I'll improve it. We had something similar where I came from before that would actually patch — you just rerun the CloudFormation and it patches all the runners — and you'll be able to set the IAM profile as well, so you can give specific permissions to each runner cluster. Also, in the Docker Machine patterns we've done, and the ones I've seen the community do, they never make the Docker Machine manager itself HA — it's always sitting there by itself, not in an ASG. What this will do is put everything in an ASG so it's also HA: if the runner dies, it just builds itself back up and it's ready to go, as well as scaling. So that's something I'm playing with on the side, if anybody needs it.

Yeah, I would say that's the way the Professional Services team installs shared compute, if they're doing it: an ASG for the runner bastion hosts and then Docker Machine after that. And it's all right. I would say everybody has a different enough view of what they want to do with shared compute that that can often be a limiting factor.

What about Kubernetes — sorry, Brendan, I was just curious. I want to sanity-check my base response to most customers who ask, "Hey, can I do this in Kubernetes?" I usually say we recommend Docker Machine — is that still the right answer? It depends on the use case. The thing I would say is, we don't auto-scale the runner in Kubernetes today on GitLab.com because of the requirement to run in privileged mode — we don't trust the folks putting stuff into shared compute there. And I would say most large enterprises also don't trust it. But they may have a Kubernetes scheme today where they've got multiple clusters for production versus test, and then it's not a big deal in a test cluster to have stuff going willy-nilly, while production is locked down in a different way. The reason we end up not doing it is that people want trusted shared compute, and that's not ideal through the Kubernetes executor today.

Brendan, there was also a customer who gave a presentation in Brooklyn where they were using kiam for permissioning the pods, so they could have separated permissions but still use Kubernetes. I've never done it myself, but I remembered it because that was one of the reasons we were holding off. Yes — I've never done it either, but it was a great talk. That's kiam — Kubernetes IAM — so you can look it up. Did that answer your question? Yeah, yeah — by all means, please move on; appreciate the time.

Can I add one thing? Sorry, Chris. I would say this is the most important question in my mind, because I think our Technical Account Managers can have a huge impact in their jobs if they're able to get customers over this hump of having shared compute for runners. That's the thing that will let GitLab take off at your accounts. Fantastic first question.

Awesome. DT, you have the next question. Yeah — good morning from sunny Santa Cruz, California. Time for an example of what we deal with in the field: I had a call this morning — this is the kind of level of detail we have to handle — and I thought you might know the answer. Our customer has been running their server in HTTP mode forever, and now they want to wire it up to Okta, which requires them to set up certs — pretty straightforward. But their concern is that they have an entire fleet of runners out there, and they're wondering if they have to change the registration back to the mothership — the protocol. That's a good question; I don't know, it depends on exactly what they're doing. My assumption is they're probably going to have to change it, because their corporate policy is probably going to say "turn off HTTP access to GitLab," right? Why would they leave HTTP on? Obviously, if they leave HTTP on, it should work, but if they're forcing everyone through HTTPS, then no. Does that make sense? It does, yeah. Their concern is having to go change hundreds of runners that are out in the field.
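For context on that HTTP-to-HTTPS question: each runner stores the server address in its own config.toml, so one option worth testing (as suggested in the discussion — verify with Support before rolling it out to a fleet) is to edit that field in place rather than fully re-registering. The hostname below is invented:

```toml
# Illustrative fragment of an existing runner's config.toml;
# only the scheme in `url` changes when the server moves to HTTPS.
[[runners]]
  name = "fleet-runner-042"            # invented name
  url = "https://gitlab.example.com/"  # was http://gitlab.example.com/
  token = "EXISTING-RUNNER-TOKEN"      # unchanged
```

If a self-signed or internal-CA certificate is involved, the runner would also need `tls-ca-file` set (or the CA in its system trust store) to verify the new endpoint.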
yes that would be my concern too I mean we have to follow up supporters yeah it's mostly like a policy like thing like if they're gonna leave HTTP on then then yeah the runners won't even know us but I I would assume they're gonna turn it off once they're done and so then I have to go they would have to go to re-register everything right I would test it if I were them in fact if they've already got HCP set up I would testify with them I wonder what happens if you just changed the URL to have an accident and then want to configure it I've seen so much counts this is Andy I've seen a couple of counts use rewrites if they have a load balancer in place rewrite yeah it depends on what they have in between you know the runners and good lab but that's one option and it can address that yeah that's a good point anybody thank you all right did you do that answer your question yes a good thing beautiful all right let's see I think you have the next one uh good morning afternoon from semi sunny Morro Bay California yeah I guess let me be a little bit more clear about resources I'm so this would be things that we need to persist between jobs but maybe we want to hook into it could be even like you know a text file so maybe not so much dependencies which are you know our documentation clearly states is better reserved for caching but I still feel like there's a little bit of you know fogginess between those two and and or is there other approaches that we should be talking about or thinking about there is fogginess okay so this is a great question so cache and artifacts just for anybody that might not know what they are there's there's two different ways of persisting data out of a job on the gate lab runner one is cache and one is artifacts you can read more about them in the documentation but the important point here for the question is what's better for forgiving use cases so the number one thing to keep in mind first off is that cash is only guarantee well it's not 
guaranteed at all we'll get to that but the only time you should rely on cash at all is in the context of the same pipeline all right so if it's possible that you're going to need something in another pipeline you're already on artifacts period the next thing to remember is that cash is a best-case level job in sidekick and so that means that it there are going to be times where it fails if get lab is overloaded with a bunch of other stuff cash is one of the things that falls off the wagon first so if again it's so if it's mission-critical you might want to consider an artifact but if it's not mission-critical used cash so example would be it's not super mission-critical for my test results text file to get between jobs I might use cash or maybe my node modules if I'm doing a node.js project versus if it's mission critical that the build artifact from this job is updated before my downstream job runs right like that consumes it well now it's an artifact for sure or if there are mission critical tests that I have to parse every times and it's an artifact so that doesn't give you one answer but that's the answer I think to the question okay so for pipeline to pipeline then you should consider your artifacts already yep yep now having said all that there is plans around a concept of workspaces someone maybe can find the issue I'm looking if you google get labs CI workspaces you might find it where so one of the big advantages get lab has is that we don't have workspaces that's something that if you've ever administered Jenkins you know it can be a massive pain but it's also a disadvantage when I understand those pain points but I still want to workspace and I want to share a workspace between jobs and so that we've got plans around how we're going to sorry not shoot workspaces as an overloaded term yeah it probably will change names because now we have a concept in managed called workspaces simply be called isolation yeah naming things is fun right there are only two 
hard problems in computer science naming things cache invalidation and off-by-one errors so the anyway the concept would be you'd have a shared workspace which then would kind of almost be in the middle of caching and artifacts right like it's like not super like forever feeling like an artifact it's not an artifact but it's also not and it'd be nice if like a cache it's like no this is how we build something we do this in the workspace and then we do this other thing in the workspace and we'll separate the jobs and sorry for using the word speak workspace over and over again because who knows what are we called by the time it's out but that was what we used to call it all right I think I found the epic punch that created it shared workspaces for CICE jobs I just liked in chat I'll get into the doc eventually arises thank you we post images in chat because I have something to hear around that workspace thing like if I drag and drop something does that work I don't think so I might put in the dock then thanks - okay MC I think you're next whoever MC happens to be Brendan just want to know if there's any key differentiators yep can hear me mark yes are you so is there any key dip different running the Jenkins all the time right I'm sure everybody does any key differentiator between arms runners and agents without calling your baby really ugly yeah I mean their baby is really ugly but the so there's a video that I did on this that I will find a link to where I it's entitled what makes Jenkins better than get lab see I like it was kind of like a theoretically it was tie from product marketing interview me about it I spoiler alert I end up still talking about what get loves better but I think the biggest thing when talking about Jenkins agents versus runners which I'm glad they changed the name to agents used to be master slave so you might hear that terminology sometime is that you you have to configure oftentimes those agents to have the right tools in them so again 
The disadvantage I talked about that GitLab has is not necessarily us being smarter than Jenkins, but our time to market: we came to market with CI when Docker had already kind of won the day, and so our CI is very Docker-first and Docker-friendly, whereas Jenkins has kind of bolted that on. The traditional Jenkins install has these agents where you have to go put the tools on — I have to go install Node.js or install Java — and so feeding and watering and managing those agents becomes very... if you've ever heard the term cattle versus pets, you know, those agents are oftentimes pets, whereas GitLab is kind of designed from the ground up for them to be cattle. Now, a lot of folks have spent so long with Jenkins that they've created their own systems of cattle on top of it, but they're still maintaining that, right, whereas we maintain the ability to do that with GitLab CI. That, I would say, is a big difference. Excellent, thanks. Anybody that's run Jenkins has had that problem at least once — and maybe they've solved it really well in their massive enterprise, oh my gosh, look at us — but again, that's not an easy thing to do. I'll take the next one here. The context for mine is: I've got a large GPU-manufacturer customer doing crazy things, where they're hand-rolling Git replication to NVMe-based servers elsewhere, in a different physical location, to try to speed up a CI job with a custom CI tool they've written. So it's kind of insane, and maybe too niche to be relevant, but I was curious — and I think DT's question relates — when it comes to, you know, partial clone or shallow clone in the runner itself: if you've got some monster, multi-gigabyte repo, in what ways can we point at the runner and say this is what's defensibly better about our approach versus, like, a Jenkins approach? I mean, you kind of just talked about it with the cattle-versus-pets story. Oh yeah — you're
probably going to have more pet-like servers in this case, right? And so, again, I would say our advantage is that you can still do everything the way Jenkins does it. They would say, oh well, with Jenkins it's the agent and the repo is on there — well, you can do the same exact thing with GitLab CI; you just have the option to also not do that. So I would say if I had these kinds of repos, I probably would have something closer to pets, right — I'd have runners that were longer-lasting rather than completely ephemeral, because it's just a use case where you don't necessarily want them to be completely ephemeral. It's also probably a customer that has access to a good amount of compute, so that might not be a big deal for them. The other thing I would say, though, is the beauty of GitLab Geo — and I know there aren't many folks that do this — but the fact that GitLab Geo is a read-only copy of your entire GitLab repository means that if you had some test bed in some other country, or you just had a lot of runners that were going to be hitting the heck out of your GitLab server, Geo might be a solution for latency you wouldn't otherwise be able to overcome. Then, if you do want ephemeral servers, you might be able to put a GitLab Geo server right next to them so that they have quick access to clone — and they clone from the Geo node rather than from GitLab itself. You can also do a lot of stuff like that around traffic shaping, right? You could traffic-shape the runner API calls to go to specific compute nodes; that's a lot of what we do on GitLab.com. That's maybe a little bit crazy or advanced, but there's even some cool stuff coming — or cool stuff that's been discussed — with Gitaly, where you can actually tag storage types, so Praefect could be aware that a certain node is on SSD storage for high-performance repos. You'd say, all right, repos tagged as high-importance or high-speed could be redirected to be stored on this specific — yep, some pet, high-speed node, right? Exactly, exactly. Yeah, so that kind of stuff is going to be really helpful in these kinds of cases. So, DT's next question relates pretty closely — I don't know if you want to add anything in terms of shallow clone or GIT_DEPTH. Yeah — GIT_DEPTH and shallow clone have existed for a while on the runner, and I think they're probably some of the most underused, underutilized features of the runner to help you speed things up. I remember when we moved the default, in 12.0, to 50 — there was a whole uproar about it — but 50 is still a crazy GIT_DEPTH to clone if you're thinking, hey, I'm building HEAD, right? Like, why do I even need that? So definitely, if you have a repository that has a lot of large objects that change often enough, or a repository where you're implementing shallow clones for your developers, then GIT_DEPTH is going to be a very good friend of yours and a quick win. Thanks. Thank you. Ricardo, go ahead and ask your question. My question is related to recommendations to reduce or optimize build time for the runners. Yeah, so I think again it's very use-case specific. For instance, for the use case Jamie just brought up, probably the right solution is to have runners that have their repositories on them and are just pulling diffs. But the answer may also be — and I think for a lot of customers, at least, I've found this myself — we introduced the concept of what's called a DAG pretty recently. DAG is a directed acyclic graph, and the more plain-English version is: you can have jobs in different stages depend on specific jobs, and not just wait around for an entire stage to finish. I've cut some build times of mine in half just by refactoring that correctly.
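Picking up the GIT_DEPTH point above: it's just a CI/CD variable the runner honors, so a shallow clone is a one-line change — the depth of 10 here is an arbitrary example, not a recommendation from the talk:

```yaml
# Shallow-clone the repo instead of fetching full history.
# Building HEAD often needs far less than the old default of 50.
variables:
  GIT_DEPTH: "10"

build:
  script:
    - make build
```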
Say there are multiple components, right — there are seven jobs in build, and one relies on job one in test, and two of them rely on job two. Actually using the dependency stuff that came out with DAG really helps you architect that pipeline to be a lot faster. And then the other thing — again, I think this is a balance of how much you want to pay for compute versus how fast you want your jobs to run. You've got to make that trade-off, right? If you threw a massive VM at all your jobs, well, they'd run really fast, but you'd pay a lot for that compute. So you can also tweak how much extra memory or extra CPU actually impacts the job, and then decide whether the cost difference is worth it. Yep. So, TC, you asked a question about the forked version of docker+machine in the chat, but Brian answered it, and in the interest of time I think we'll move on to the next question. Oh gosh — we can take it offline; I'll take a pass. All right — I would say yes, but yeah, we can talk more about it; my short answer is, if it's not, then we've got bigger problems. Vladimir, I think you're next. Yeah, correct. My question is about the best recommended practices for customers to manage large and diverse fleets of runners. I have cases where the question comes up: hey, we have four different development teams creating runners — scoped to projects, scoped to groups — and in the end we have no visibility into what's out there, what their state is, how busy they are, and how we can reuse them. What are the best recommendations on this? I would say that's somewhere we're lacking. The answer today is basically: don't let people install their own runners, and have a process they come to you through where you manage this — or let people install their own runners and live with the runner sprawl, which is a pain. So I don't know if there's a right answer, unfortunately.
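As an aside, the DAG refactoring Brendan described a moment ago is driven by the `needs:` keyword; a hypothetical two-component pipeline might look like this (component names and scripts are made up for illustration):

```yaml
# Without `needs`, every test job would wait for the entire build stage,
# and every deploy job for the entire test stage. With `needs`, each chain
# (build-a -> test-a -> deploy-a) runs as soon as its own dependency is done.
stages: [build, test, deploy]

build-a:
  stage: build
  script: ./build.sh a

build-b:
  stage: build
  script: ./build.sh b

test-a:
  stage: test
  needs: [build-a]
  script: ./test.sh a

test-b:
  stage: test
  needs: [build-b]
  script: ./test.sh b

deploy-a:
  stage: deploy
  needs: [test-a]
  script: ./deploy.sh a
```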
I do know it's something we're concerned about, and the runner team has a number of open epics around runner administration in general. I think there are a lot of quick and easy wins there — I don't know where they sit priority-wise — like just filtering the list of runners; a couple of little things like that could go a long way. I mean, I think we could do a lot better than filtering a list of runners, but even just doing that would have a big impact. So if you've got customers that want that, I would encourage you to share their use cases on the issue, because it's something I've heard a lot, and something I considered pretty important when I was running the Verify stage — but I don't know where we are with it today. All right, next up — Hugo, is that your question? That is mine. So it kind of goes along with the first question, right, but for GCP: you know, suggesting docker+machine to spin up VMs — I felt like at the end of the day we don't really have an answer there. Does it differ much from AWS? Yeah, it's very similar — I don't know GCP preemptible machines as well as I know EC2. I'd recommend, if you have a customer with a lot of questions here, that you talk with our infrastructure team, because they are running a massive-scale GitLab Runner operation on GCP today and would have more experience and information here, since we run on it. So there are a couple of engineers — Steve, who's one of the lead engineers on the runner, and Tomasz, who's also one of the lead engineers on the runner and worked on the runner service for a long time — and then Alex on the infrastructure team. I highly recommend you talk with those folks; they're really friendly and would probably be happy to spend 15-20 minutes talking to you about it. Awesome — thank
you. There's also that retry setting, Brendan, in the .gitlab-ci.yml that detects a catastrophic failure specifically — I think that helps with these ephemeral instance types. Whenever I've been telling customers about running their own runners, a lot of times they hesitate, but of course I tell them about spot instances, and we also tell them about that setting, so they know it's smart enough to go, oh, it died — restart it. Yeah, that's a good point; it links everything together. Thanks. Follow-up question: is it different from AWS? Yeah, that's again where I don't have a lot of great experience — I think those other folks would. I've never run a production system on GCP, to be honest, but I've run many on AWS. All right, I think you've got the last question. I do have the last question — it's similar to Vladimir's. It's based on a specific use case: I've got a customer with 2,500 runners that just got virally created by teams, and now they're trying to roll out shared runners, but they want to do it in a controlled fashion, so they're not just exposed to everybody — but they also don't want to expose them to only a single team or a single group. So they're struggling right now with figuring out how to roll out an army of shared runners in a controlled fashion, for maybe a hundred to two hundred teams that they would like to use the same pool. I don't know if you have any ideas around that. Yeah, there's not a really great answer, but I have ideas. One is, if they've got high-level groups that are big enough to represent some of these folks, make it a group runner to start off with — you can always re-register it as a shared runner later; as a GitLab admin I can do both. That would be the easiest way, if their GitLab group architecture matches. The other thing they're experimenting with right now is tagging it through the API.
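For reference, the retry setting mentioned above is the `retry` keyword in `.gitlab-ci.yml`, which can be scoped to runner failures rather than test failures — the job name and script here are placeholders:

```yaml
# Retry up to twice, but only when the runner itself fails
# (e.g. a spot/preemptible instance being reclaimed), not when tests fail.
flaky-infra-job:
  script: ./run-tests.sh
  retry:
    max: 2
    when: runner_system_failure
```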
I think in the GUI you can only tag it with one item, right, but via the API, when you register, you can give it multiple tags. Yes — that was going to be my next suggestion: you could just make them shared and then tag them, which means people could maybe self-discover them, but in reality most jobs won't run on them without the tag anyway, right? So add some tags to them, and then those teams know to use that tag in their jobs and they get to use your compute, and you just monitor it and make sure it's not blowing up. So that was my second idea, tagging — yeah, on-prem physical servers. One caveat, I guess, with tagging is that for every tag it'll ping the server once, per tag, to see if there's anything to do. So if you scale this up to, say, a thousand runners, that's going to be a lot of chatter just to see if there's anything to do. That's a good question that I would post in the runner channel — I didn't know that; you just taught me something. I did not know it was one API call per tag; that seems silly. So yeah, I would ask in the runner channel if there's a different way to handle that. I've actually got a chat set up with Darren Eastman this afternoon; I'll bring it up with him. Yeah, he'll know — or maybe he won't, because as the product manager he may not have realized it was one API call per tag either. Well, aren't our runners pinging the server all the time anyway? Why would tags be any different? Well, what I was saying is that tags increase the number of pings by the number of tags I have, which is then painful. I didn't know — I thought it just pinged and said, hey, these are my tags, these are my things, what's up? But if it's pinging once per tag — API call for tag A, API call for tag B, API call for tag C — then yeah, that doesn't scale very well. I would ask Darren about it.
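A sketch of the multi-tag registration just discussed, from the command line (which drives the same registration API) — the URL, token, executor, and tag names below are placeholders, not values from the talk:

```shell
# Register a runner non-interactively with several tags at once.
gitlab-runner register \
  --non-interactive \
  --url "https://gitlab.example.com/" \
  --registration-token "REGISTRATION_TOKEN" \
  --executor "docker" \
  --docker-image "alpine:latest" \
  --tag-list "team-a,linux,docker"
```

Jobs then opt in by listing one of those tags under their own `tags:` keyword.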
The other thing I would add: you can set the ping interval on the runner, and so then — how does that relate to this? I don't know; maybe you can tune it, because it's something like five seconds by default, and that's pretty quick. Maybe if it were 15 seconds, it would save you this crazy number of pings while also not leaving jobs waiting around too long. That's just another thing to consider. All right, thank you. Cool — and with that, we are out of time. Thank you all so much for joining us today, thank you to Brendan for coming to share this great information with us, and thanks to everyone who had questions and to those of you who helped me keep track of the notes in the doc. I'm going to go ahead and stop the recording.
Info
Channel: GitLab Unfiltered
Views: 6,457
Id: JFMXe1nMopo
Length: 55min 4sec (3304 seconds)
Published: Thu Mar 05 2020