Introducing Flux Helm Operator, a GitOps Approach to Helm Operations - Stefan Prodan, DX Engineer

Captions
Alright. I'm from Weaveworks, I'm a Flux CD maintainer, and today I want to give you an introduction to GitOps, Flux, and the Helm Operator, which is a subproject of Flux CD, the project we have now brought from Weaveworks into the CNCF sandbox, and we're planning to give it away even further.

So what is GitOps, in my opinion? You are probably using all these tools, right, all these commands, to change things in your cluster: to do deployments, to do rollbacks, and so on. That means you have to share your cluster credentials with everybody who has access to the git repo. It also means your CI pipelines probably shell out to these CLIs; this is how most people do deployments on Kubernetes. With the GitOps model, we ignore all these tools and only use git push to interact with our clusters. What that means is that someone else is running all these commands for us. What we do is describe our desired state: we have a git repo and we say, this is how I want my cluster to look. It's up to the tooling, some automation that runs inside your cluster, to make sure that what you have described in your git repo is applied and reconciled on your cluster.

So what does GitOps bring to the table? Some obvious advantages: from a DevOps perspective, both developers and operations people can work together on how the infrastructure looks and how the applications are deployed on it. If you have a git repository that's shared between your teams, you can work with pull request reviews, you can sign your commits so you know who did what, and you also get an audit log for free. Many companies out there must be SOC 2 compliant, or meet other compliance requirements, and git gives you that without any kind of effort. If you switch from a CI-driven model to a GitOps model, every single change is in the git log, and if you sign your commits with GPG, let's say, you can ensure that whatever happened is recorded there and you can trace it very easily. That's one of the advantages.

Another advantage is recovery time. Let's say something happens really badly in your infrastructure that you don't control, on some cloud; something really happens to, let's say, your etcd cluster, where everything is stored, all your configuration, and you want to start fresh with a new cluster running the same stuff as the previous one. If you use CI pipelines and that's how you drive deployments, it means you have to go into the CI system and trigger all the jobs that make up your cluster; you have to redeploy all your applications and so on. And that's a problem, because you don't really know what's running right now on your cluster. You have to look at your CI system, get all those logs out of there, see what was last applied, and rerun every single job. This can take a lot of time if you have many, many apps. You also have cluster objects, maybe users and so on, and maybe those objects are not deployed by CI every time you do an app change. With GitOps you just say, okay, I have a new cluster, I install the GitOps operator there, and you point that operator at the git repository that the previous cluster was bootstrapped from, and on the first run everything should be the same as on the previous one, because the desired state of the previous cluster is now on the new cluster. At Weaveworks we had a major incident some years ago, and this is how we managed to recover the whole thing in 15 minutes. That's when we realized that maybe GitOps is not something we should keep only for ourselves; we should make it public and try to advocate for this way of driving operations on Kubernetes.
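To make the desired-state idea concrete, here is a minimal sketch of the kind of manifest such a GitOps repository might hold; the app name, image, namespace, and file path are hypothetical, not something shown in the talk. Changing the image tag in this file and running git push would be the only deployment action a human performs.

```yaml
# deploy/podinfo.yaml - a hypothetical workload kept in the GitOps repo.
# The in-cluster automation (Flux) clones the repo and applies this file;
# nobody runs kubectl by hand.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: podinfo
  namespace: dev
spec:
  replicas: 2
  selector:
    matchLabels:
      app: podinfo
  template:
    metadata:
      labels:
        app: podinfo
    spec:
      containers:
        - name: podinfo
          image: stefanprodan/podinfo:3.0.0  # immutable semver tag, never :latest
          ports:
            - containerPort: 9898
```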
So how do you implement GitOps? There are a few prerequisites, of course. You need a git repository, so you need a git provider. Now, choosing a git provider means you have to trust that provider for uptime, right? If your cluster goes down and the git provider goes down at the same time, you have a problem: how can you restore something when your source of truth is gone? So be careful when you choose this one, because it's a single point of failure for the GitOps side of things.

Of course you also have to have a container registry, and as I specified here, you should push only immutable images. I will get back to that, but the idea is: don't use latest. Latest is not something you can describe as the state of your app; it's the latest of what, right? So don't do it. And the same thing applies to Helm charts. You can use a Helm repository if you don't want to keep your charts in git; I will show you both options. A Helm repository is actually better than a container registry in this respect, because Helm enforces semantic versions for your charts. So use semantic versions, and don't push the same version twice; this is something you should not do, because then you're going back to the latest problem.

Okay, so when you have all this infrastructure set up, a git provider, a container registry, maybe a Helm repository service, what you need next is a continuous deployment process that has certain properties. It can watch for changes in git, so it doesn't rely only on the git provider calling the operator; it can do this kind of watching on its own. Let's say a git webhook doesn't succeed: that doesn't mean you shouldn't change the cluster state, because if something changed in git, the operator should react to it. That's why we rely on watches here and not only on git webhooks. It can also watch your container registry, detect if there are new images, and take decisions based on that; for example, Flux can upgrade your containers based on expressions, rules that you define for upgrading the in-cluster workloads. Another thing: you should be able to detect drift. Drift is something that happens, right? Even if, let's say, your SRE team says, okay, we only use the git repo to drive changes, someone has access, someone definitely has access to that cluster, and can make modifications inside the cluster without going through the git process. So your CI/CD system should detect that the desired state is no longer there, because someone changed something inside the cluster, and the continuous deployment system should try to correct the drift. And of course it should alert on misconfigurations. A misconfiguration for me here is, let's say, malformed YAML. Maybe you don't have something like kubeval or other tools validating your YAML on each pull request to your git repo; then the continuous deployment system should say, I cannot accept this malformed YAML, and it shouldn't apply anything else. For GitOps to work, everything must be all or nothing: if you make a commit in your git repo changing, say, a dozen YAML files, and a single YAML file is malformed, then nothing else should be applied. We shouldn't apply half a commit; we apply 100% of the commit or nothing. And this is how Flux does it, the right way.
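As one way to get the alert-on-misconfiguration property described above, you could validate every pull request with kubeval before it can be merged. A minimal sketch, assuming a GitHub Actions setup; the talk only names kubeval, so the workflow wiring and download URL here are my assumptions:

```yaml
# .github/workflows/validate.yaml - hypothetical PR gate for the GitOps repo.
name: validate-manifests
on: pull_request
jobs:
  kubeval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Install kubeval
        run: |
          curl -sSL https://github.com/instrumenta/kubeval/releases/latest/download/kubeval-linux-amd64.tar.gz \
            | tar xz
          sudo mv kubeval /usr/local/bin/
      - name: Validate all manifests
        # Fail the PR if any YAML file is malformed, so nothing half-broken
        # ever reaches the branch that the operator reconciles.
        run: kubeval --strict $(find . -name '*.yaml' -not -path './.github/*')
```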
Okay, so how do we implement this continuous deployment process? Weaveworks started the Flux project some years ago; this is how we implemented GitOps for ourselves, and afterwards we made it open source. Now a lot of companies, freelancers, and other people are using it for business needs or, you know, just at home, maintaining a cluster to play with.

The whole idea is very simple: you put all these YAML files in a git repo, and Flux watches the git repo. How does it do that? For example, on GitHub or GitLab: when Flux starts up, it generates a deploy key, an SSH key. You give the public key to GitHub or GitLab, and afterwards Flux is able to connect to the git repo, clone it, and apply the changes to the cluster. So this is how it works; it's not magical at all. Flux monitors whatever runs in your cluster, and it uses annotations to keep track of which objects are in git and which objects got applied on the cluster. This also makes Flux ideal for doing garbage collection. What that means is: you add a deployment to the git repo, that deployment gets deployed on the cluster; then with another commit you remove that deployment from git, and Flux will detect that, okay, on the previous commit I had this object and now it's gone, so it will do a kubectl delete on that object. You don't have to clean up things in the cluster by hand. This was a feature request for about a year and a half, and we managed to pull it off this year, so we are very happy about garbage collection in Flux.

Let's move on to the Helm Operator; this is Helm Summit, right? So what does the Helm Operator do? The Helm Operator exposes a custom resource of kind HelmRelease. You keep these HelmRelease definitions inside your git repo; Flux synchronizes them, so those HelmRelease files get applied and the objects get created; the Helm Operator detects that and transforms the HelmRelease object into an actual Helm release. Simple. It does install, upgrade, and delete, the last one using the Flux garbage collection. You can also run the Helm Operator on its own; we made big efforts there. The Helm Operator was developed inside the Flux repository and was part of the Flux source code itself; now we've pulled the Helm Operator into its own dedicated git repository, and it has its own dedicated Helm chart. So if you don't use Flux and just want to play with the Helm Operator, you can deploy it like any other operator, and when you create HelmReleases, it will just work. But what I'm showcasing here is the full GitOps pipeline, which I'll try to demo if I have time.

Some features the Helm Operator has: it makes all the Helm operations declarative. It can pull charts from Helm repositories, like the stable chart repository; it can also pull charts from private Helm repositories, if you have basic auth or other mechanisms; and you can also pull charts from git repositories. So if you don't want to run and maintain a Helm repository server, you can just keep your charts in git and point the Helm Operator to those git repositories. What are those repositories? Usually those are app repositories where you have your source code and you also have your chart source code. You can tell the Helm Operator, there is my chart definition and here is the HelmRelease, and with those two files, the two specifications, it will create the Helm release and run the install.
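Here is a minimal sketch of the chart-from-a-git-repository setup just described, using the helm.fluxcd.io/v1 API of the 1.0 release candidate mentioned later in the talk (earlier versions used flux.weave.works/v1beta1); the repository URL and chart path are hypothetical:

```yaml
apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: podinfo
  namespace: dev
spec:
  releaseName: podinfo
  chart:
    # The chart source lives next to the app source; no Helm repo server needed.
    git: git@github.com:example-org/podinfo.git
    ref: master            # a branch for dev clusters; a semver tag for production
    path: charts/podinfo
```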
Now for the values: how can we provide values to the Helm release? There are a couple of options here. You can point to secrets in your cluster, to config maps, or even to external URLs, like an S3 bucket or something, where maybe you want to keep your values files, and the Helm Operator will pull them from there. If you use the Helm Operator together with Flux, you can do automatic upgrades of Helm charts based on image tag policies.

Then, this is more of a Helm v2 feature, and the Helm Operator works with Helm v2 and, since today, v3: with Helm v2 there was a problem where, if you first install a chart and that install fails, you cannot install it again, because Tiller will just reserve that name as a failed release. What the Helm Operator does: when it sees a HelmRelease for the first time, it will try to install it; if that fails, it does a delete and purge, so at the next iteration, the next time it does a reconciliation, it will not error out, and if you've fixed your chart in the meantime, it will just get installed. This is something we did to make things easier with Helm v2 and Tiller. It also does automated rollback on chart upgrade failures. Let's say you describe something in your git repo, you want to upgrade your chart, and that fails; maybe you don't want to leave it in a failed state, but you do want to know that it failed. The Helm Operator will emit events saying, okay, the upgrade failed, and it does a rollback, so your state is always the last good version of your chart, installed.

This is how a HelmRelease looks; nothing complicated. We have this chart section here where we say, okay, my chart comes from the stable repository, this is the URL for the repository, that's the name of the chart, and that's the version of the chart. I want to stop here: if you apply this on your cluster, it will install sealed-secrets in the default namespace, because we didn't specify any namespace here, at that version, with the default values from the chart. This is the equivalent of helm install sealed-secrets. Why do I have Sealed Secrets here as the example? Sealed Secrets is a controller made by Bitnami, which is now VMware, and Sealed Secrets allows you to store secrets in your git repo: those are encrypted with a public key, so anyone on your team can encrypt secrets and put them in the git repository, and no one can decrypt them except the Sealed Secrets controller, which runs in your cluster. This is how you can build GitOps pipelines without sharing your API keys, database passwords, and so on: you will not put them in clear text in the git repo, you'll put them in encrypted, and they only get decrypted on the cluster side.

So here we've seen how you can tell the Helm Operator to install a chart from a Helm repository, and here is how you can tell it to install a chart from a git repository. That's a git URL; it's not the web page address of the repository, it's the actual git repo address. And you can use the ref key to point to a release tag in git or to a branch: instead of ref 2.1 you can do ref master, and whenever you push a change to master, the Helm Operator will detect it and upgrade your chart. While you are doing development on your dev cluster, it's very easy to just tell the Helm Operator, please synchronize all the time with this branch; but when you are doing production releases, you should point to a git release, a semver tag that doesn't change. And yes, ref is a branch or a tag.
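A sketch of the stable-repository example the slide describes; the chart version is a placeholder, and the repository URL is the stable chart repo address as it was around the time of this talk:

```yaml
apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: sealed-secrets
  namespace: default
spec:
  releaseName: sealed-secrets
  chart:
    repository: https://kubernetes-charts.storage.googleapis.com/
    name: sealed-secrets
    version: 1.5.2   # placeholder; pin a real chart version, never reuse one
```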
The values part down below: you can override the default values from the chart's values.yaml here, inline, in the HelmRelease. So instead of modifying charts all the time to fine-tune them for, I don't know, different clusters, different namespaces, different environments, you can specify your values inside the HelmRelease custom resource. Instead of copy-pasting charts and things like that, you just create more HelmReleases based on cluster, namespace, environment, and so on.

So you can specify values inline, but you can also specify values from config maps, secrets, and also files that are inside your charts. It's very common, if you look at the stable chart repo, to have a values.yaml and a production values file, for example. So if you decide not to put all your settings inside the HelmRelease, and you want to keep your settings like you used to, for example in your actual chart, you can reference here a path inside your chart directory where the values you want to apply live. And you can mix them all together: you can put some values inline here, your API keys should come from a secret, and maybe you have something like generic values for the clusters in a region.

To the question: yes, we don't use the Helm command line in the Helm Operator; the Helm Operator uses the Helm Go packages, but we try really hard to be compliant with everything Helm does, and if we do something differently, we'll get an issue instantly, someone will notice. This actually works pretty well, and we've had it for some time, I think more than half a year; the Helm Operator started about a year ago, maybe more. We launched Helm Operator alpha 1 at the first Helm Summit, and hopefully today we can launch 1.0.0, the first release candidate, with Helm v3 support.

Okay, so this is how you can provide values from all kinds of sources. Missing from here, because I ran out of space, is the example where you reference a values file from something like an S3 bucket. Why would you do that? Let's say you have a bucket for each region, and you have some values for all your charts that must change based on the region. Instead of putting those in a secret or something, you can just use an external URL that points to your bucket. We hadn't thought about this use case; it was a Helm Operator user who really insisted that his use case was valid and that we should deal with the S3 bucket.

One feature that works together with Flux, so for this you have to have Flux on your cluster, not only the Helm Operator: with Flux, using annotations, you have a way to define how you want to upgrade your values spec. Here in my values I have an image repository and a tag, and I'm telling Flux, please upgrade my HelmRelease when a new 3.0.x version is published. What I'm saying is: I'm not going to change my chart for patch releases of my application, right? These are, let's say, small bug fixes that don't require an infrastructure change in any way. So instead of going into git every time I do a 3.0.1, 3.0.2, 3.0.3 release or whatever, I'm trusting myself that I will respect semver and not make breaking changes in patch releases, and I can tell Flux to automate this stuff for me. Every time I push a new version to the Docker registry, Docker Hub in my case, Flux will apply the semver filter, detect that, okay, 3.0.1 is there on Docker Hub, modify the HelmRelease file, commit the HelmRelease file to my git repository, and afterwards apply it, just as if I had made the modification manually
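Putting the pieces from this section together, here is a hedged sketch of a HelmRelease combining inline values, values from other sources, and the Flux automation annotations with a semver filter. The annotation names follow the fluxcd.io scheme that was current around the time of this talk (older Flux versions used flux.weave.works equivalents; check the docs for your version), and the secret name, bucket URL, and chart paths are invented for illustration:

```yaml
apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: podinfo
  namespace: dev
  annotations:
    fluxcd.io/automated: "true"
    # Upgrade the release for any new 3.0.x image tag, i.e. patch releases only.
    filter.fluxcd.io/chart-image: semver:~3.0
spec:
  releaseName: podinfo
  chart:
    git: git@github.com:example-org/podinfo.git
    ref: master
    path: charts/podinfo
  values:                      # inline overrides of the chart defaults
    image:
      repository: stefanprodan/podinfo
      tag: 3.0.0               # Flux rewrites this line, commits, then applies
  valuesFrom:
    - secretKeyRef:
        name: podinfo-api-keys           # hypothetical: sensitive settings
    - externalSourceRef:
        url: https://example-bucket.s3.amazonaws.com/eu-west-1/values.yaml
    - chartFileRef:
        path: values-prod.yaml           # a values file shipped inside the chart
```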
with git commit and git push; Flux does the same thing. So everything that Flux changes in your cluster, it doesn't change directly in your cluster: it changes it first in git, and afterwards it applies it, so it doesn't break the GitOps pattern in any way. Flux can also deal with GPG signing and things like that, so it can use a signing key per cluster if you are using commit signing for your git repos.

And this is how it looks. What Flux does: it scans your cluster for all the containers that are running, then it uses the image pull secrets from the Kubernetes API; it gets the secret and connects to the Docker registry. It works with Docker Hub, with Quay, with all sorts of registries out there, because it uses the pull secret from Kubernetes. It connects there, and it doesn't pull the image itself, it only pulls the metadata of the image: how many tags are out there, from those tags how many are semver, and from all those semver tags how many apply to my filter. Only then will it do a git push. To the question about ECR, Azure, Google Cloud: yes, we use the AWS and Azure SDKs inside Flux, so we refresh the keys and so on, so it doesn't lose the credentials. And we use this memcached here just to store the image metadata, the container names and all the tags that are in the registry. We don't store real images there; this memcached is very light, you can kill it and Flux will just refresh it.

Okay, another feature that we shipped recently is namespace delegation. You can create a HelmRelease, so this file gets created in the admin namespace, but you want to install MongoDB in the dev namespace, right? In the specification, next to the release name, you can specify a target namespace; if you don't specify one, it's the same namespace as the HelmRelease. Why is this useful? Maybe you don't want to allow your dev teams to install Helm releases; maybe you want to control the release process only from an admin perspective. Because if you put the HelmRelease file inside, let's say, the dev namespace, then someone could get to that HelmRelease file and modify it, or delete it, or whatever. Also, if you use something like multi-tenancy, you can delegate things from the cluster level to the namespace level. How we do multi-tenancy with Flux and the Helm Operator is like this: we have a cluster git repo that's controlled by the cluster admins, and what's in the cluster git repo are the definitions of global cluster resources, stuff like namespaces, cluster role bindings, custom resource definitions. Then, from the cluster git repository, we can provision a Flux instance and a Helm Operator instance per namespace, and we can connect each instance to a team git repository. So each of your dev teams can have its own git repo; you don't have to merge everything under a single git repo that all your teams share, if you don't want to. You can still use GitOps and have a dedicated repo per team with this kind of setup. Inside the fluxcd organization on GitHub there is a dedicated repository where I showcase how you can bootstrap all these things with Kustomize; give it a try, it's very easy, it's not complicated. What this means is that the cluster admin could use the namespace delegation to install Helm charts in the team-1 namespace without putting those HelmRelease definitions inside team-1's git repo; that's why we did it.
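A minimal sketch of the namespace delegation just described, again assuming the helm.fluxcd.io/v1 API; the chart version is a placeholder:

```yaml
apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: mongodb
  namespace: admin          # the HelmRelease object lives in the admin namespace
spec:
  releaseName: mongodb
  targetNamespace: dev      # ...but the chart is installed into dev
  chart:
    repository: https://kubernetes-charts.storage.googleapis.com/
    name: mongodb
    version: 7.8.0          # placeholder version
```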
So how does the pipeline look? Very easy: you keep a namespace definition in your git repo, because Helm v3 will not create a namespace for you anymore. This was a requirement before, even with Helm v2, because we want to have all our objects in git. So you put a namespace there; the Helm Operator is not able to create namespaces, it will not do helm install --namespace for you, it expects the namespace to be pre-provisioned. And how we do that: when Flux syncs your git repo, it orders all the objects inside your git repo in a certain way, for example it creates a namespace before, let's say, a custom resource definition, a custom resource definition before a custom resource, a namespace before a deployment, and so on. It applies things in a certain order so all these pieces can work together.

Do I have time? No? Okay, so I don't have more time. Tomorrow we'll run a workshop, a tutorial, and we'll showcase this whole pipeline model. If you can come tomorrow, we'll see first-hand how we can build GitOps pipelines with Flux and the Helm Operator, and tomorrow we'll also use Flagger and Linkerd to do the Helm release while measuring things like error rates, latency, and so on; if something fails, we'll roll back the Helm release based on other signals than just health checks and liveness checks. Please give it a try; these are the repositories, in the fluxcd organization. We have also published on the Weaveworks website a GitOps ebook that has a nice example with a lot of microservices, showing how you can build GitOps pipelines.

And here is the pull request that we worked on last night and today; it's ready. We have here the install command for the Helm v3 support: you install the custom resource definition and you deploy the Helm Operator with Helm v3. If you set this environment variable, the Helm Operator will only transform HelmReleases into v3 Helm releases; if you don't set this environment variable, then you have to specify which Helm version you want inside the custom resource definition (see the sketch below). So the Helm Operator is right now able to manage both Tiller deployments and Tillerless deployments with Helm v3, and on our roadmap, hopefully soon, we'll have a way for you to upgrade: when you change your git repo from v2 to v3, it will try to upgrade the Helm chart in place, so you don't have downtime or something like that when you want to switch from one version to another. So I think this is it. If you use the Helm Operator, we'll make this smooth for you; you won't have to care about it, hopefully. Thank you. [Applause]
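For reference, a sketch of the per-release Helm version selection mentioned at the end of the talk, assuming the helm.fluxcd.io/v1 API of the 1.0 release candidate; the chart source is hypothetical:

```yaml
apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: podinfo
  namespace: dev
spec:
  helmVersion: v3   # target Helm v3 (Tillerless); omit or set v2 for Tiller
  releaseName: podinfo
  chart:
    git: git@github.com:example-org/podinfo.git
    ref: 3.0.0
    path: charts/podinfo
```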
Info
Channel: CNCF [Cloud Native Computing Foundation]
Views: 2,812
Rating: 5 out of 5
Id: 273vXoDR3sw
Length: 31min 47sec (1907 seconds)
Published: Tue Oct 01 2019