Argo CD Synchronization is BROKEN! It Should Switch to Eventual Consistency!

Captions
[Music] I'm going to make some of you angry by claiming that Argo CD synchronization logic is wrong. The way Argo CD works is great only if you have very simple scenarios and, more importantly, if you have a lot, and I repeat, a lot of patience to define orchestration. It tries its best, yet it was designed the wrong way from the start, so many of the attempts to remedy issues feel like workarounds that force us, the users, to put more work into it than necessary. Argo CD is broken. As a matter of fact, I will even claim that the way kubectl and Helm work is also wrong but, unlike Argo CD, there is a good reason why those should stay as they are.

Today's session will be a practical, hands-on exploration of the order and dependencies related to the deployment of Kubernetes resources. We will see that both kubectl and Argo CD are doing it the wrong way. I will even claim that the way Argo CD operates is against some of the Kubernetes core ideas. I will also show a few examples of the tools that do take the right approach and are more in line with Kubernetes core principles, especially eventual consistency. All in all, kubectl is doing it the wrong way, but that's okay. Argo CD designed the synchronization mechanism badly, and that's not okay. Fortunately, there are tools that do deployments and synchronization the right way, and we will see an example, maybe two, later.

Hence, today's TL;DR is that one of Kubernetes' favorite tools, Argo CD, messed up from the start. I have a love-and-hate type of relationship with it. I love it because it's probably the best GitOps tool we have today, yet I hate it as well because it causes me to do more work than I should do. Let's get going. Let me show you what I mean by those outrageous claims.

We'll take a quick break to introduce you to Twingate, the sponsor of this video. So here's a question: are you tired of using VPNs but you still want to work securely? If you are, this might be the time to ditch, remove, get rid of VPNs, and then you can switch to Twingate. It allows us to access office networks and cloud or private resources securely without the headache of a VPN. It's easy to set up and it's fast, like really, really fast. There is a free tier. Try it out, get rid of VPNs, and thank you, Twingate, for sponsoring this video. And now let's go back to the main subject.

Let's start with a simple example. Let's say that I want to install an application packaged as a Helm chart. All I have to do is execute helm upgrade --install and pass a few additional arguments. That worked well, and I got Atlas Operator up and running. By the way, if you're not familiar with Atlas Operator, you might want to check out that video.

The problem with that approach, as well as with using kubectl directly, is that we need to do things in a certain order. We need to wait for some things to happen before we can do some other things. That's, in a way, silly since that means that we humans are acting as orchestrators even though Kubernetes is supposed to do the orchestration. Was that confusing? Let me show you an example.

This is an External Secrets store. I won't go into details on what External Secrets is used for. If you're not familiar with the project, please check out that video. It actually does not matter what that manifest is used for. What matters within the context of what we're exploring today is that we need to install the operator first. That operator will create a few CRDs, one of them being ClusterSecretStore, used in that example. Once the CRDs are created, we should be able to create the store. Hence, the logical thing to do would be to install the operator and apply the manifest, right? We can do that by executing helm upgrade followed by kubectl apply, and that failed miserably. Kubernetes complained that it failed calling the webhook. The reason for that failure is simple. By the time we told Kubernetes to apply the ClusterSecretStore, the operator was not yet fully operational. So first we need to wait until the deployment of the External Secrets webhook is up and
running, and wait, and wait, and wait, and once it's finished, apply the ClusterSecretStore.

We are so used to dealing with dependencies and their orchestration that we often forget that Kubernetes is the orchestrator. Kubernetes should figure out what to do and when to do it. There should be a process inside Kubernetes that does what I just did. If something cannot be applied because there is no endpoint, or there is no CRD, or because a namespace is missing, or because something else is not just right, Kubernetes should move on to some other resources and, once the right conditions are met, get back to the one that could not be applied.

Some of those issues can be mitigated with Helm hooks, by creating an uber chart, by adding --wait, and some other tricks. Nevertheless, those are workarounds rather than real solutions. They're meant to replace some of the functionalities we get through server-side execution of drift detection and reconciliation.

However, I must stress that everything I said so far does not mean that kubectl, Helm, and similar client-side solutions are bad. They're not. They're doing what should be done. Since they are executed on the client side, they're supposed to perform specific operations once. They're not server-side processes that are continuously monitoring the state of resources and performing actions necessary to converge them into the desired state. Helm is first and foremost a templating tool and, as such, its primary goal is to convert templates into YAML before sending them to the Kubernetes API. Once that conversion is done, it acts in the same way as kubectl does. Both Helm and kubectl are ultimately sending a number of manifests to the Kube API, which might accept or reject them. If we would like to get a semblance of what should be the correct behavior, we would need to run those commands in a loop. However, that's not what we do today. Those are some of the many reasons why we moved synchronization processes inside
clusters. That's one of the reasons why we adopted GitOps tools like Argo CD.

GitOps tools changed the game. Instead of relying on CLI tools like kubectl and Helm to apply resources to Kubernetes clusters, they introduced server-side processes that pull manifests from Git repos and apply them inside clusters. All we have to do is push manifests to a Git repo and that's it. That's all we have to do. Argo CD, Flux, and others take care of synchronization. They will figure out the order in which the resources should be sent to the Kubernetes API, and they will ensure that the actual state, the one that is in the cluster or clusters, matches the desired state, or what is in Git. There's only one problem, though. That does not work without putting in a lot of effort or manually specifying the orchestration.

Here's one example. That directory contains Crossplane resources like providers, configurations, and a few others. Naturally, I would like Argo CD to do its magic and synchronize that to my cluster, which already runs Crossplane itself. Now, I will not go into details about Argo CD. If you're not familiar with Argo CD, you might want to watch those videos. Instead, I will show you the manifest that will ensure that whatever is in the directory is applied to the cluster. This is a simple one. Whatever is in the directory infra inside the Kubernetes deployment demo repo will be synchronized into the cluster, and all we have to do, besides writing that manifest, is to apply it.

Normally, this would be the moment I would move on to doing something else. I can trust Argo CD to do whatever needs to be done, right? The reality is quite different, and we can observe it through the Argo CD UI. Let me port-forward the service, open it in my browser, log in, and enter the infra application. It seems that Argo CD is incapable of doing anything. It got confused. It does not know what to do with those manifests. You see, some of those resources depend on others. To apply a provider configuration, the provider needs to be
installed, and for the provider to be installed, Crossplane itself must be running in a namespace that was created in advance. All those are in that directory, but Argo CD does not know what to do with them. It's confused.

Now, to be clear, Argo CD does try to order resources. It will try to create namespaces and CRDs first, custom resources second, and normal resources at the end. However, that does not always work, as is evident from the example we are observing right now. The problem is not that Argo CD is often incapable of figuring out the order in which resources should be applied. Quite the opposite. The problem is that it tries to order them. By doing that, it assumes that the order will always be correct, and that's just silly. What Argo CD should do is try to apply resources in any order and, if any of them fails to be applied, try again later. That's how Kubernetes works. That's how we accomplish eventual consistency. Yet Argo CD does its best once, and if that fails, it gives up, blaming me for not giving it more information. It blames me for not specifying waves and hooks. That's the silly part. That's the part I should not be doing since that's the job of the orchestrator. That's what Kubernetes does. It will try to run something, and if that fails, it will try again and again and again until it eventually works.

Argo CD is a quitter. It behaves like my daughter when she was younger. She would try something once, and if that failed, she would declare that it cannot be done. Argo CD, however, is not a child. It's a mature project that should know better than to quit at the first sign of trouble. Actually, it's even worse than quitting. Argo CD continues trying, but without trying again. It's locked in the first iteration instead of trying again.

Fortunately, I know how to make Argo CD order resources. Even though that's a silly thing to do, I had to do it. It's unavoidable if you're using Argo CD. Maybe. Probably. We'll see. Let me terminate the current process and show you yet
another directory. I copied the infra directory and made a few modifications to the manifests. Here's the modified version of Crossplane. I had to add the sync-wave annotation set to -1. That's a wave, and Argo CD goes from the lowest towards the highest wave. In this case, Crossplane will be applied before those specified as wave zero or higher. Here's another one. Without the sync-options annotation set to SkipDryRunOnMissingResource, Argo CD would stop the process because the ProviderConfig CRD is not present in the cluster, even though that CRD will be created by one of the other manifests in that same directory. I won't go deeper into Argo CD sync waves. If you're not familiar with them, please watch that video. Instead, we will simply apply a modified version of the Argo CD application that will sync from that new directory.

Now we can go back to the Argo CD UI and observe that this time it seems, seems, I repeat, to be able to synchronize the manifests from Git to the cluster. Or, at least, that's what it looks like. The problem, however, is that I had to specify sync waves, sync options, hooks, and a few other things to make it work. That's too much unnecessary work, and all I wanted from Argo CD is to synchronize what it can and try again and again and again until all resources are synchronized. That's how Kubernetes controllers work anyway, and all I want is for Argo CD to do the same. Just as Kubernetes does not ask me where a Pod should run, Argo CD should not ask me which resources should be applied first, which ones should go after that one, and so on and so forth. If I wanted to define orchestration myself, I would not be pushing files to Git but doing it the old-fashioned way, by executing commands in a specific order from pipelines like GitHub Actions, Jenkins, Tekton, or whichever other I might be using, or was using in the past.

Now, it might take a while until all the Crossplane packages are available, so let's wait for them. Let's wait for a
while until it all settles in. After a while, all the packages should be available and we can continue.

What I'm trying to convey is that we can make Argo CD do the right thing, but we need to put in unnecessary effort, and even if we do that, there is no guarantee that we did it right. I would expect Argo CD to be based on eventual consistency. It should do what it can at the moment, then try again with those that it could not apply previously, and repeat the process until everything is synchronized. Instead, Argo CD thinks that we should tell it what should go first, what should go second, and so on and so forth. That's silly.

Now, you might say that you do not have a problem with doing all that, and you might be right, for now at least. Your situation might be simple. Still, even if that is the case, I have a challenge for you. Destroy one of your clusters that is managed by Argo CD, create a new one, and try to synchronize everything, and I repeat, everything you had in the old cluster into the new cluster. Think of it as a disaster recovery exercise. Do you think that would work? Do you think that Argo CD will be able to synchronize dozens, hundreds, or even thousands of resources accumulated over time into the new cluster? More likely than not, it will fail. It will be stuck because a namespace, or a CRD, or something else is missing, even though that something is also stored in a Git repo. It will try to do things in a certain order and fail.

Now that I made you feel depressed, I can show you how I believe synchronization should work. I will use Crossplane to demonstrate how deployments to Kubernetes should be done. Nevertheless, today's video is not about Crossplane, and there are other tools that are based on the same principles, at least when management of resources is concerned. Hence, Crossplane is here to demonstrate how to do synchronization rather than to convince you to use it. If you're not familiar with Crossplane, you can always check those videos over there.

Here's a Crossplane claim that
creates and manages a Kubernetes cluster. Those 30 to 40 lines of YAML will create AWS resources like the cluster itself, a node group, a few subnets, a VPC, and others. As you can imagine, some are dependencies of others. A VPC needs subnets, and the node group needs a cluster. From Argo CD's perspective, we would need to specify the order in which those should be created. There's more, though. Some resources, like Secrets, will be created in the control plane cluster, while others will be created in the new cluster. Some of those resources depend on others. Like in the example we explored before, there will be an External Secrets store that can be created only after the External Secrets operator is up and running. There are quite a few things that will happen and resources that will be created. If you're interested to know more, I can explain what's behind that claim in a separate video. It's based on one of the most complicated and, at the same time, very useful compositions I created so far, so please let me know through the comments whether you would like me to explore it in more detail.

What matters within the scope of this video is that the resources behind that claim are more complicated and have more dependencies than the relatively simple example that failed to work with Argo CD without adding sync options and other tricks. We could, and we should, push that manifest to Git and let Argo CD synchronize it but, in an attempt to show you in as easy a way as possible what will happen, I will simply run kubectl apply.

While Crossplane is working towards creating everything required to run a fully operational cluster, I will show you yet another example. This claim will create an RDS database server together with subnets, VPCs, and other AWS resources RDS needs. It will also create a database inside that server, a secret in the control plane cluster, push that same secret into AWS Secrets Manager, pull it into the cluster that does not yet exist, even apply a schema, and quite a few other things. Just as
with the cluster, there are many dependencies, and if we would like to accomplish something similar without Crossplane, like with kubectl, or Helm, or Argo CD, we would need to ensure that resources are created in a specific order, that we wait for specific statuses, and so on and so forth. Still, we don't need to do any of those things. Instead, I will just apply that manifest. That's it.

Let's take a look at what's going on with the database by executing the Crossplane trace command. As we can see, most of the resources are failing. Most of the resources cannot be created because they depend on something that does not yet exist. That can be a CRD, information from some other resource, or anything else. None of those errors matter. Those are all temporary and can be interpreted as "I cannot do what you want me to do right now, but I will try again later."

Let's see what's going on with the cluster. The current state is similar to what we observed with the database. Some resources were created, some are in the process of being created, while others are failing, and that's okay. We're looking at the process leading towards eventual consistency. Crossplane does not care about the order. It manages all the resources all the time, continuously. If some cannot be created right now, do you know what it will do? Well, simple. It will try again later.

Let's fast forward a few minutes and take another look at those resources. Look at that. That's the magic of eventual consistency. Crossplane did not give up on resources that could not be created for one reason or another. Instead, it tried again and again and again until they were all available and all matching their desired states. We can observe a similar outcome with the database which, among other things, had to push some secrets to that new cluster, which did not even exist at the time. As we can see, all those resources eventually became consistent. [Music]

This video is not meant to discourage you from using Argo CD nor
to convince you to use Crossplane. I use Argo CD myself, and I do believe that it shares the best GitOps tool spot, probably with Flux. We don't have anything better, at least not right now. What I tried to convey is that, even though it's great, it's not doing everything right. Every tool has features that can be improved, and the way Argo CD synchronizes resources is on top of my list. In simple scenarios, that does not matter, yet there are cases when it does, and when that happens, the only option we have is to define sync options and hooks carefully, very carefully. So we can do it well with Argo CD, but it's not always easy, sometimes being even painful.

What I'm suggesting is to change the default behavior so Argo CD works with individual resources and simply retries those that cannot be applied. The important word in that sentence is "default". I expect tools to have sensible behavior by default, and in the case of Argo CD, the sensible default is to point it to a Git repo and forget about it. It's silly to expect people to define what goes first and what should be done last, when it should ignore CRDs during dry run, and so on and so forth. What I'm suggesting is for Argo CD to take a similar approach to what Crossplane and a few other tools are doing. Crossplane is not the only one.

Another important note is that Crossplane does not replace Argo CD. Don't think that me showing Crossplane after I complained about Argo CD means that one replaces the other. It doesn't. We need a tool that syncs manifests from Git into a cluster, and we need a tool that extends Kubernetes so that we can convert it into a control plane. Those two work great together, and there are ideas they can borrow from each other while still maintaining focus on their own primary objectives. As a matter of fact, the winning combination that avoids all the issues I mentioned in this video is to create Crossplane compositions and manage resources by pushing claims to a Git repo and letting Argo CD synchronize them into the control plane
cluster.

There's one important note, though. There is no guarantee that eventual consistency will always work. Sometimes things fail, and when that happens, Kubernetes will be failing to create resources indefinitely unless we implement some kind of a rollback mechanism. That's true no matter whether we are working with Kubernetes Deployments, or with Crossplane, or with anything else. That's the nature of Kubernetes. It tries to converge the actual into the desired state indefinitely. We still need a mechanism that will notify us when something is not working as expected. The difference, when compared with other types of systems, is that we do not fire alerts the moment an error appears. Instead, we fire alerts and act on them when errors persist over a period of time.

Thank you for watching. See you in the next one. Cheers.
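
The "apply in any order and retry whatever fails" behavior described in the captions can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how Argo CD, Crossplane, or Kubernetes controllers are actually implemented: Resource, try_apply, and reconcile are hypothetical names, a plain set of names stands in for the cluster, and a round limit stands in for "errors that persist over a period of time".

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical stand-in for a manifest and the things it depends on."""
    name: str
    requires: set = field(default_factory=set)  # names that must exist first


def try_apply(resource, cluster):
    """Apply one resource; fail if a dependency is not in the cluster yet."""
    if resource.requires <= cluster:
        cluster.add(resource.name)
        return True
    return False


def reconcile(resources, max_rounds=10):
    """Retry every resource that could not be applied, round after round.

    No ordering is computed up front. Whatever fails in one round is simply
    attempted again in the next one.
    """
    cluster = set()
    pending = list(resources)
    rounds = 0
    while pending and rounds < max_rounds:
        rounds += 1
        pending = [r for r in pending if not try_apply(r, cluster)]
    # Anything still pending failed persistently and deserves an alert.
    return cluster, [r.name for r in pending], rounds


# Manifests listed in the worst possible order, plus one that can never succeed.
manifests = [
    Resource("provider-config", {"provider"}),
    Resource("provider", {"crossplane"}),
    Resource("crossplane", {"crossplane-namespace"}),
    Resource("crossplane-namespace"),
    Resource("orphan", {"missing-crd"}),
]

applied, stuck, rounds = reconcile(manifests)
print(sorted(applied), stuck, rounds)
```

Nothing in the sketch sorts the manifests or assigns waves. The first four resources converge purely through retries, while the one whose dependency never appears is reported only after the retries are exhausted, which mirrors the idea of alerting on persistent errors rather than on the first failure.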
Info
Channel: DevOps Toolkit
Views: 9,931
Keywords: Argo CD, Argo CD issues, Argo CD problems, Argo CD tutorial, ArgoCD, DevOps, Kubernetes deployments, cluster management, configuration management, container orchestration, continuous deployment, deployment automation, deployment best practices, deployment workflows, eventual consistency, gitops, gitops tools, managing Kubernetes, software engineering, strong consistency, sync operations, synchronization strategy, system design flaws
Id: t1Fdse-F9Jw
Length: 22min 17sec (1337 seconds)
Published: Mon Feb 26 2024