Dagger: The Missing Ingredient for Your Disastrous CI/CD Pipeline

Captions
Not long ago, I claimed that your CI/CD pipelines are wrong. Back then, I was complaining about treating all tasks as one-shot actions, about pipelines not being designed to react to events other than a push to a Git repo, and about them being too monolithic. Watch that video if you missed it, or don't, since this is something completely different. Today I'm going to continue complaining about CI/CD and pipelines, but I will focus on a different set of problems. Today I will make some outrageous claims, like: CI pipelines should not run only remotely; pipelines should not do orchestration of tasks; pipelines should not be defined in a declarative format (remember declarative formats? no); and pipelines should not depend on specific tools. Today's session aims to explain why you should ditch pipelines almost entirely, no matter whether they are done in Jenkins, GitHub Actions, GitLab CI, CircleCI, or any other tool you might currently be using. I will tell you that you should use pipelines almost exclusively to trigger execution of tasks defined somewhere else. Most importantly, I will blame your laptop for all those angry claims. There you go: whatever you hear from now on, it's your laptop's fault. Today's conversation will be dark, but hopefully there will be light at the end of the tunnel. I will not only tell you that you're doing things the wrong way and that your laptop is to blame, but I will also provide the solution.

Let's jump into the first claim. There are quite a few problems with CI pipelines that are typically not addressed by the tools we use today, and the first, and potentially the most important one, is that they're designed almost exclusively to run remotely. You might be using Jenkins, GitHub Actions, GitLab CI, Tekton, Argo Workflows, or any other tool that runs on a remote server, and you're triggering builds through Git webhooks. That's clear, right? You might be thinking that's the way to go, but I'm here to tell you that it is not. Pipelines are not supposed to run only remotely, simply because most of the work that requires execution of pipelines is done both remotely and locally. You're running tests while you write code, right? Or, if you're not a TDD practitioner, you're running tests before you commit the code. Is that correct? It must be. If you say that it is not, I think you should be ashamed of yourself: you're pushing code to a Git repo without even having an idea whether it works or not. You're probably building binaries or container images so that you can run your application, and you're probably doing that locally as well. You're likely performing many other tasks locally: you might be linting code, performing security scanning, and so on and so forth. The point is that you're not only writing code and pushing it to a Git repo. If that's all you're doing, and I hope it's not, your feedback loop is too long, since that means you're waiting for pipelines running remotely to tell you whether you made silly mistakes that could have been detected in a matter of seconds if you ran some of those tasks locally.

We could solve those problems by creating scripts instead of defining everything in pipelines. We could have a script called build.sh that performs all the tasks related to building binaries or container images, a script called unit-test.sh that runs unit tests, and a script called deploy.sh that deploys the application somewhere. If we convert groups of tasks into scripts, we solve a few of the issues we have with pipelines.
We could run those scripts locally when needed, and we could rewrite our pipelines to use those scripts so that we do not repeat ourselves too much. That's something I've been advocating for a long time. I've always believed that pipelines should be dumb: instead of writing all the logic inside pipelines, we should have scripts that pipelines call. Those scripts can run anywhere, including locally, and the job of the pipelines executed remotely would be to orchestrate the execution of those scripts.

But that is not a good idea either. If you end up with a bunch of shell scripts, one for each group of tasks, pipelines do indeed become orchestrators of the automation defined in those scripts, but local execution becomes a problem. Just as pipelines orchestrate those scripts remotely, we need something to orchestrate them locally. I do not want to run one script after another every time I make a change to my source code; that might result in more time spent orchestrating scripts than actually writing code. Write a few lines of code, execute build.sh, wait to see the outcome, and depending on whether it was successful or not, execute unit-test.sh, wait to see the outcome, and depending on whether it was successful or not, execute deploy.sh, wait to see the outcome... it's ridiculous. You get the point, right? I do not want to waste time orchestrating scripts while developing. So what do we do? I could create an über-script that does the orchestration, but that is problematic as well.

Now that we've established that some, if not all, of the tasks executed through pipelines should run locally as well, we can turn our attention to orchestration. Apart from executing specific tasks, pipelines decide what to run, when to run, how to run, and where to run tasks. If task A fails, stop the execution of the pipeline; if task B is successful, execute tasks C and D in parallel; take the output of task D and generate a list of tasks that should be executed if both C and D were successful. That is what we call orchestration. Now, I could say that pipelines do orchestration well, but that would be only partially true. Assuming that by pipelines we mean tools running remotely and typically executed as a result of an event, like a push to a Git repo, they do orchestration well, but only as a result of specific events. We need that orchestration to happen not only when we push changes to a Git repo but also when working locally. I need something to execute and orchestrate tasks on my laptop, or to trigger that execution from my laptop. The important thing to note is that I'm lazy: I do not want to define that orchestration twice, once for remote pipelines and once for local ones. Hence I need a way to define the orchestration of tasks so that it works both locally and remotely. That means I cannot use typical pipeline tools like Jenkins or Tekton, since setting them up locally would be too much hassle, nor can I use GitHub Actions or CircleCI, since they work only remotely.

To make things more complicated, not all tasks are always executed, and there are variations depending on where they are running. For example, when working locally, I probably want to deploy my application into a Kubernetes cluster, local or remote, using kubectl or Helm. On the other hand, when running the pipeline remotely, that same application should probably be synced into a permanent environment using Argo CD or Flux, which means the pipeline should probably push some changes to the app's manifests to a Git repo. The sketch below illustrates what such context-aware orchestration could look like as code.
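To make that concrete, here is a minimal sketch, in Go, of what such context-aware orchestration could look like. The script names, the CI environment variable, and the kubectl command are assumptions for illustration, not the setup used in the video.

```go
// ci.go - a minimal, hypothetical orchestrator: build first, then run two
// checks in parallel, then either deploy locally or push manifests for GitOps.
package main

import (
	"fmt"
	"os"
	"os/exec"
	"sync"
)

// run executes a task as a command and streams its output to the terminal.
func run(name string, args ...string) error {
	cmd := exec.Command(name, args...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}

func main() {
	// Task A: build. If it fails, stop the whole run.
	if err := run("sh", "build.sh"); err != nil {
		fmt.Println("build failed:", err)
		os.Exit(1)
	}

	// Task A succeeded, so run tasks B and C (unit tests and linting) in parallel.
	var wg sync.WaitGroup
	var unitErr, lintErr error
	wg.Add(2)
	go func() { defer wg.Done(); unitErr = run("sh", "unit-test.sh") }()
	go func() { defer wg.Done(); lintErr = run("sh", "lint.sh") }()
	wg.Wait()
	if unitErr != nil || lintErr != nil {
		os.Exit(1)
	}

	// Context-aware step: deploy directly when local, push manifests when remote.
	if os.Getenv("CI") == "" {
		// Local: apply manifests to a local cluster.
		_ = run("kubectl", "apply", "--filename", "k8s.yaml")
	} else {
		// Remote: update manifests in Git so Argo CD or Flux can sync them.
		_ = run("sh", "push-manifests.sh")
	}
}
```

The point of the sketch is only that the same binary can be invoked from a laptop or from a single step in a remote pipeline, so the orchestration logic lives in one place.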
I could solve all those problems by creating a huge shell script that contains the conditionals, loops, and whatever else I might need to execute the tasks while respecting the variations between ephemeral and permanent environments. If I did that, I could run that script locally and also execute it through whichever pipeline tool I'm using, which would remove the need to define the orchestration twice. But I do not like that idea. To be more precise, I do like writing shell scripts, as long as they're relatively simple and short. The pipelines I'm describing might be too long and too complex for a shell script. To be clear, when I say complex, I don't mean that they contain complex logic, but that they are complex enough to be a burden to define and manage as a shell script. I should probably write all that in Go, or whichever other language I'm comfortable with.

But wait, why not use a declarative format instead of Go or any other language? You might be freaking out, because I just said that we should define pipelines, or to be more precise, the orchestration of tasks, using a programming language, while everyone knows that pipelines are typically, in most cases, defined in a declarative format. Jenkins might be one of the exceptions with its DSL based on Groovy, but others like GitHub Actions, Tekton, Argo Workflows, and GitLab CI, to name just a few, are all declarative: YAML, JSON, HCL, and so on and so forth. They're all declarative formats, at least where pipelines are concerned. Today, however, I will tell you that's wrong. We got confused.

Declarative formats like YAML, JSON, HCL, and others are meant to describe the state of something. This is the VM, or the DB, or the cluster I want; this is the exact specification of what something should look like, and I'm telling you that in a declarative format. Why, you might ask? Well, declarative formats are easier to understand for both humans and machines. If I go to a Git repo and open a YAML file, it is relatively easy for me to understand the intention behind the desired state of something: it's a Kubernetes cluster that should run in AWS, it should have medium-sized nodes (whatever that means), and it should start with three nodes before the cluster autoscaler kicks in. If I got that file in a pull request, it would be easy for me to review it and approve it; I would know exactly what the intention behind that change is. I would also have an easy time deducing the actual state at runtime: this is the cluster, this is the node group, this is the cluster autoscaler, and so on and so forth. If I want to deduce the exact desired or actual state, I can simply take a look at it. If, on the other hand, I looked at an imperative definition of the state of something written in, let's say, Go, it would take me much more time to deduce the same. There would be conditionals and loops and other imperative constructs that make it much harder for me to understand the intention behind the code, since I would need to translate those constructs into a mental model of what that code is trying to achieve. The same is equally valid for machines; it's not a coincidence that the Kubernetes API, for example, expects YAML. It's declarative, with clear intentions of what the state of something should be.

Now, you might think that I'm advocating for a declarative format to define everything, but that is not the case, quite the opposite. Today I want to talk about the cases where declarative formats are not a good idea. Today I want to state that there are cases where declarative formats are the norm, yet we should abandon them in favor of imperative languages like Go, Python, JavaScript, or whichever other language you're comfortable with.
Now, I already talked about how CUE is a great language for defining Kubernetes resources that are later converted into declarative YAML; you can check those videos if you're interested in knowing more. Today I want to attack CI pipelines by saying that they should not be defined in declarative formats. Most of the tools will tell you that they should, but I'm here to tell you that they should not. Pipelines are not about declaring the state of something; they're about executing tasks. They're glorified cron jobs orchestrated through conditionals, loops, and other imperative constructs. We already went through those constructs: build something, wait to see the outcome, and depending on whether it was successful or not, run unit tests; wait to see the outcome, and depending on whether it was successful or not, deploy the app; run functional and integration tests in parallel; take the output, parse it, and generate the list of other tasks depending on what's in it; and so on and so forth. That is not a state, and it is much easier to define as code. There is no point in trying to figure out what the intended state is, since it is not the state of something, and even if it were, it might vary from one build to another, from one environment to another, from one... you get the point. It's not state, hence there is no good reason why it should be declarative; hence it should be a shell script, or Go code, or Python, or whichever language you're comfortable with. So all we have to do is write all the tasks, with all their variations, in a programming language of choice, compile that into a binary, and execute that binary locally or remotely, from inside a pipeline or from anywhere else; it doesn't matter where you're executing it from. That should solve the problem, right?

Well, not really. There's still one more issue we need to address: tooling. Different tasks require different tools. I might build container images with Docker locally and Kaniko remotely; I might need to sign images using Cosign; I might need to package manifests using Helm or Timoni; I might... well, you get the point. Different tasks require different tools, and we cannot assume that all those tools are always installed. The server where pipelines are running might or might not have Helm or Cosign or whichever other tool we might need, and the same can be said for laptops. Now, that's not a problem when running remotely through pipelines: Jenkins adapted to use containers, Tekton and Argo Workflows were designed exclusively around containers, and so on and so forth. Most pipeline tools support containers one way or another, and all we have to do is say "run those tasks in this container", then switch to a different one for other tasks, and so on. However, that poses a major issue. To begin with, if we want to switch from one container to another, we need to define the execution of tasks inside the pipeline, and we already established that that is not a good idea, due to the need to run them locally as well. Most, if not all, tasks will be executed from inside a large script or, more likely, through Go or whichever programming language you choose, so we cannot rely on pipelines to provide the environments with the tools we need. That needs to be done from inside the code we are writing, and it needs to work no matter whether that code is executed locally or from a pipeline. Given that the automation is written in your programming language of choice, we can solve that by spinning up containers from inside that code.
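As an illustration of that do-it-yourself approach, here is a rough Go sketch that shells out to the docker CLI to run a tool inside a container, so the tool itself does not have to be installed on the host. The image name and the Helm invocation are placeholders, and the only assumption is that Docker (or a compatible CLI) is available.

```go
// Run a tool inside a container from Go, mounting the current directory,
// so the tool (here, hypothetically, helm) is not required on the host.
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// runInContainer mounts the working directory into an image and runs a command in it.
func runInContainer(image string, args ...string) error {
	workdir, err := os.Getwd()
	if err != nil {
		return err
	}
	dockerArgs := append([]string{
		"run", "--rm",
		"--volume", workdir + ":/workspace",
		"--workdir", "/workspace",
		image,
	}, args...)
	cmd := exec.Command("docker", dockerArgs...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}

func main() {
	// Render Kubernetes manifests with Helm without installing Helm on the host.
	// The image name and chart path are placeholders.
	if err := runInContainer("alpine/helm", "template", "./chart"); err != nil {
		fmt.Println("helm template failed:", err)
		os.Exit(1)
	}
}
```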
However, it would be nice if there were a set of libraries that could help us out. What I'm asking for seems to be something many would benefit from, so it would be silly if each of us started from scratch. All in all, here's what we need: a tool that enables us to define CI tasks in a way that allows us to run them both locally, from a terminal or VS Code or whatever you're using when developing, and remotely, from inside pipelines. The orchestration of those tasks should be context-aware, so that the behavior is appropriate for the environment. Finally, the solution should make minimal assumptions about which tools are pre-installed and use containers to run all, or most, of the tasks. Does that sound like a reasonable set of requirements? If it does, the only thing left is to find a solution. That was the task I set in front of myself a while ago, and after experimenting with a few options, I think I have a good candidate, so let's take a look at it.

Actually, before we dive into Dagger, let me show the pipeline for an application I used in the past. That is the pipeline I'm currently using. Can I run it locally? I can't. Should I rewrite all that into something that can run locally? Maybe. Should I find a tool based on a declarative format, like Skaffold? Probably not. How will I deal with differences between local and remote executions? Through conditionals? Definitely not in YAML. What happens when I start needing conditionals, loops, and other imperative constructs? Shell scripts? Maybe. When that shell script becomes too complicated, should I start using a programming language like Python or Go? I probably should. Should I instruct people to install all the dependencies? No, I shouldn't. Should I rewrite the code to use containers so that there are no mandatory dependencies? I definitely should. Now the question is: how? That's when I started looking for something completely different. That's when I rediscovered Dagger.

To be honest, my past relationship with Dagger was not a very good one. I tried it out a while ago and discarded it right away. I did not see the point in using it. My head was full of whys and whats: why does it exist, why would anyone use it, what is its purpose, and so on and so forth. I could not justify its existence, so I abandoned it. Then, a few months later, after being bombarded by people asking me about it, I gave it another try, and the outcome was the same: there were whys and whats, and I abandoned it. However, at one moment I started asking the questions we just discussed. I started looking for a solution that can run both locally and remotely; I needed something to both execute and orchestrate tasks; I needed a solution that leverages containers. And then it clicked. At that moment my brain remembered Dagger. I got it. The whys and whats were answered, partially at least.

So what is Dagger? It claims to be, and I quote, "CI/CD with code that runs everywhere". Now, that is misleading, because that claim is either not true or just plain silly; I will let you decide which one it is by the end of this video. For now, let's take a quick look at it. I can execute something like dagger run and pass it the command I want it to run. Since this is the first time I'm running Dagger on this machine, it will take a while until it pulls the images, downloads code dependencies, and does whatever else it's doing, so let's fast-forward to the end of the process. And there we go: it took a minute and a half to run everything.
If you scroll to the top of the output, you'll see that Dagger published a container image to ttl.sh, which is an ephemeral container registry. The code is written so that when running locally, images are pushed to ttl.sh, and when running from a pipeline, to a real registry. Further on, it updated the Timoni manifests and produced YAML that can be used to deploy the application to a local Kubernetes cluster. There was no need to have the Timoni CLI installed, since Dagger used a container to run it. From the end user's perspective, Timoni is an implementation detail; it just happens to be the tool I use to define my Kubernetes manifests using CUE. If you're unfamiliar with Timoni, check out that video. Finally, it deployed the application to a local Kubernetes cluster. I could have added tests and security scanning, or anything else I might need while developing the application, and all of that would be executed locally, through containers, and without the need to have any specific tools installed on my laptop, if that's how I wrote the code executed with Dagger. Now, as proof that all that happened, I can retrieve Kubernetes resources with kubectl get all, and we can see that the application is indeed running.

One of the great features of Dagger is caching. It took around a minute and a half to run everything locally; let's see what happens if I run it again. This time it took only 15 seconds. Dagger used the cache for almost all the tasks, since I did not change the source code. Even if my code had changed, it would still apply intelligent caching, and I should expect all subsequent executions to be considerably faster than the first one.

Now let's take a quick look at the code itself. As you can see, I wrote it in Go, and you can choose any of the supported languages, which I believe include Node.js, Python, and Elixir besides Go; there might be others, so please check the documentation if your favorite language is not one of those I mentioned. I won't go through the whole code, since you might not be using Go, and even if you do, I don't think the code itself is important; you know how to write code, right? What I will point out is that, apart from the normal code, I use the Dagger library in quite a few places. For example, here I'm saying: use the Dockerfile to build an image. Now, you might say: what is the point of using Dagger to build images when I could just as well write a few lines of code that do the same by executing docker image build? If that's what you're saying, you would be right. However, Dagger has its own mechanism to build images without a Dockerfile, and if we use it, caching of those images is even more efficient, and we get some additional features, like exporting files from the image into the workspace, building multiple images, and so on and so forth. Nevertheless, I was not ready to give up on my Dockerfile, and that was one of the reasons I discarded Dagger in the past. But then I realized that I can do so much more with it. Here, for example, I'm using Timoni to build YAML manifests from CUE definitions; then I'm taking that output and using it to apply the manifests to a Kubernetes cluster using kubectl. Further down, I'm using yq to modify CUE definitions, then retrieving the modified file and storing it locally so that it can be pushed to Git. I'm also pushing the Timoni package to a container registry. The details are not important. What matters is that most of those operations are executed inside containers, so I do not have to guess what is and what isn't installed on the laptop from which Dagger is running, and I can enhance the flow by importing or exporting files, injecting secrets, capturing outputs, and so on and so forth.
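For readers who want a feel for what such code looks like, here is a minimal sketch using the Dagger Go SDK. It is not the code shown in the video: it builds an image from the project's Dockerfile, publishes it to ttl.sh, and runs a CLI inside a container, but the image references, tags, and the timoni command are assumptions.

```go
// A sketch of a Dagger-based flow: build from a Dockerfile, publish to an
// ephemeral registry, and run a tool inside a container instead of on the host.
package main

import (
	"context"
	"fmt"
	"os"

	"dagger.io/dagger"
)

func main() {
	ctx := context.Background()

	// Connect to the Dagger engine; it executes everything in containers.
	client, err := dagger.Connect(ctx, dagger.WithLogOutput(os.Stderr))
	if err != nil {
		panic(err)
	}
	defer client.Close()

	// The project source on the host.
	src := client.Host().Directory(".")

	// Build the image from the project's Dockerfile and push it to ttl.sh,
	// the ephemeral registry used for local development in this example.
	imageRef, err := src.DockerBuild().Publish(ctx, "ttl.sh/my-app:1h")
	if err != nil {
		panic(err)
	}
	fmt.Println("published:", imageRef)

	// Run a tool inside a container instead of requiring it on the laptop.
	// The image and command are placeholders for tools like timoni or yq.
	manifests, err := client.Container().
		From("ghcr.io/stefanprodan/timoni:latest").
		WithDirectory("/src", src).
		WithWorkdir("/src").
		WithExec([]string{"timoni", "build", "my-app", "./timoni"}).
		Stdout(ctx)
	if err != nil {
		panic(err)
	}
	fmt.Println(manifests)
}
```

Such a program can be started directly with go run, or through the dagger CLI, and the same code path serves both the laptop and the remote pipeline.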
Everything runs in containers, as long as that's what I choose to do. Another important thing to note is that some things are executed only locally, while others are meant to be executed only when running in a real pipeline, which in my case means GitHub Actions. That is not something specific to Dagger, but rather the result of defining the behavior through code, in this case Go, instead of trying to figure out how to do conditionals, loops, and other imperative constructs in a declarative format.

Now, as I already mentioned, this specific Dagger setup is meant to be used by me while developing, but also by GitHub Actions. I want to run tasks when I push changes to Git; some of them will be the same as those running locally, some will be variations of those, and some will be completely different. All that logic is now defined in code, and all I have to do is modify my GitHub Actions workflow to use Dagger. So, while the previous version of my workflow looked like this, now it looks like this. I chose to keep code checkout in the Actions workflow, since when working locally I already have the code checked out. Similarly, I chose to push changes at the end of the CI flow also through the workflow, since I already had that defined and I have no intention of replacing my local git commit and git push commands with Dagger. Everything else is now replaced with a single step that installs Dagger and executes it. Excluding Git operations, the whole workflow is now defined in one place and can be executed locally, from GitHub Actions, or from anywhere else. There is no unnecessary duplication, and I can run the tasks related to this project from anywhere and without any dependency except the Dagger CLI itself and Docker or some other container runtime. The Dagger library offers quite a few features that I will let you discover yourself. There's also a SaaS (software as a service), or cloud, option, which I'm not sure is worth paying for, but you can check it out yourself and decide.

Instead, let's finish this discussion with Dagger's pros and cons. I started the journey that led me to Dagger with the following set of requirements. I wanted to be able to run my CI/CD pipelines both locally and remotely. I wanted to define pipelines through code instead of declarative formats; I like YAML, but only for defining the state of something, and I prefer imperative code for everything else; I need my conditionals and loops and functions and other constructs, right? I did not want to depend on specific tools, or, to put it in other words, I wanted the solution to run in containers. Finally, I wanted a solution that covers the whole SDLC, or what others would call CI and CD. And here's the question: do you think Dagger met all those requirements? I will answer that question soon. I will also elaborate on my earlier statement that the claim that Dagger is "CI/CD with code that runs everywhere" is not true, or is misleading. But before we get there, let's go through the pros and the cons, starting with the things I did not like.

Actually, there are only two negative things. First, there is no option to initiate Dagger execution every time I change my source code. It would be awesome if I could just run Dagger in the background and let it run whichever tasks I set it to run every time I change my code, instead of repeating the same command over and over again. That is not necessarily a bad thing, but rather something I would like to see, since many of the tools used to perform local tasks do that. I know perfectly well that I can combine Dagger with other tools or write that loop inside my code; after all, it's code, so I can do anything I want. Yet the first option, combining it with other tools, defies my requirement not to assume any tools being installed apart from Dagger and Docker. The second option sounds better, but I'm too lazy to do it and, more importantly, that's such a common use case that I would expect Dagger to have it out of the box.
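If you did want to write that loop yourself, a rough sketch could look like the following. It assumes a runPipeline function wrapping the Dagger tasks and uses the third-party github.com/fsnotify/fsnotify package; neither comes from the video.

```go
// Re-run the pipeline tasks whenever a file in the project changes,
// approximating the "watch" mode that Dagger does not provide out of the box.
package main

import (
	"log"
	"os"
	"path/filepath"

	"github.com/fsnotify/fsnotify"
)

func main() {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		log.Fatal(err)
	}
	defer watcher.Close()

	// Watch every directory under the project root (fsnotify is not recursive).
	err = filepath.Walk(".", func(path string, info os.FileInfo, err error) error {
		if err == nil && info.IsDir() {
			return watcher.Add(path)
		}
		return err
	})
	if err != nil {
		log.Fatal(err)
	}

	runPipeline() // initial run

	for {
		select {
		case event := <-watcher.Events:
			if event.Op&(fsnotify.Write|fsnotify.Create) != 0 {
				log.Println("change detected:", event.Name)
				runPipeline()
			}
		case err := <-watcher.Errors:
			log.Println("watch error:", err)
		}
	}
}

// runPipeline is a placeholder for the Dagger tasks defined elsewhere.
func runPipeline() { log.Println("running tasks...") }
```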
The second, and last, downside is that it cannot replace pipelines. There is no option to run Dagger on a server and wait for webhooks from Git, or other events, to initiate the execution of tasks. As a result, it cannot replace pipelines like Jenkins, GitHub Actions, and others. What it can do is run as a task in those pipelines. That's not necessarily a bad thing; I'm guessing that its intention is not to replace pipelines, but if that's the case, I think that's a missed opportunity.

Everything else is truly great, as long as we understand the scope of Dagger and ignore its false claims. It is open source, with an Apache license. That's great. It would be even better if it were in a foundation, so that the Dagger folks could not change the license, as some other projects, which I'm not going to name, did in the past or recently. Still, the Apache license should be good enough for now. Next, it works everywhere, as long as Dagger is installed and there is a container engine; that can be Docker, but also any other Docker-compatible engine, like containerd with nerdctl, or Podman. There is also an experimental option to run it in Kubernetes. Everything can, but does not have to, run in containers. I don't need to second-guess which tools are installed, nor force anyone to install anything besides Docker (or some other container engine) and the Dagger CLI. What else? Oh yes, caching is amazing; that might be the strongest point of Dagger. It speeds up the execution of tasks considerably, as long as we know how to enable it in remote pipelines. Next, I love that it chose to be based on real code instead of the declarative formats used by the majority of other similar tools. As I already said, I love YAML, but only for defining the state of something; declarative formats are not meant to be used for defining things like CI pipelines. Finally, the documentation is great. I had no trouble finding the information I needed, and it is full of well-written examples.

Now let's get back to the claim that Dagger is "CI/CD with code that runs everywhere". You will likely not use it for the whole of continuous delivery or deployment, simply because it does not work well with events, just as almost all other pipeline solutions don't. It's enough to have, let's say, syncing done by Argo CD with the need to execute tests after deployments are finished to realize how that does not work well without events; watch that video if you would like to know why I said that. At best, I would say that Dagger is a "CI without the CD part" tool, as long as we consider CI to be the set of tasks executed before we start dealing with events other than Git webhooks. All in all, it's a great tool and I strongly recommend it. I'm switching the CI portions of my pipelines to Dagger while still keeping the rest of the SDLC, what you might call CD, with Argo CD, Argo Events, and other tools. It is a tool that can fully replace your local workflows, as well as equivalent, but not all, tasks in your pipelines. That's what I was looking for, after all: I did not expect it to replace all my tasks, but only those that run both locally and remotely.
That's what it does, no matter the false claims, and it does it well. Big thumbs up from me. There we go, that's a thumbs up. Thank you for watching. See you in the next one. Cheers.
Info
Channel: DevOps Toolkit
Views: 14,623
Keywords: Argo Workflows, CI mistakes, CI/CD, CI/CD best practices, CircleCI, Container, Dagger, Dagger CI, Dagger pipeline, DevOps, Docker, GitHub Actions, Jenkins, Pipeline, Tekton
Id: oosQ3z_9UEM
Length: 33min 29sec (2009 seconds)
Published: Mon Dec 18 2023