A Hands-on Walk-Through of GitOps – Kelsey Hightower (Google)

Video Statistics and Information

Captions
So hopefully you're all as excited as I am. Of course we are very, very lucky to have Kelsey Hightower as our closing keynote, because he's famous for his demos, so we're excited to see what he'll be playing around with. We had a chat about what he's excited about, what thoughts he has about Git in general, about the term GitOps, and what the developer experience might be. He loves to explain and show that through a demo, so what we're doing today is a demo first, and then once he's done, Cornelia will join and I'll moderate your questions for a bit of a conversation. So please, if you've joined the Slack channel, post your questions there and we'll get to as many as possible. Kelsey, are you ready?

Yeah, I'm ready. There you go. So now you are on the live stage and I'll get off stage. Thank you so much for joining. Of course we know that you're at Google and have been there for a while, and you've been such a great face in the Kubernetes community, so we really appreciate your thoughts here, and we'll go into them after the demo. Is that how you want to kick off?

Yeah. My goal really is this: there are probably a lot of people watching who are like me, fairly new to this concept of GitOps, maybe not necessarily using Git to get things done, or even the style of automation behind GitOps. So what I want to do today is share how I reason about learning these things. I'll start with a little bit of hands-on, show some prototypes I've been working on, then get into the fundamentals, and then hopefully have a great conversation to dive more into the details.

Excellent, I love that.

I'll start by sharing my screen, how about that? So I can take over... all right, I'm sharing my screen, give it one second.

I know when we were practicing you were using a Chromebook, so you'll have to turn on your camera again, because I noticed that's what happened there. And once you've done that, we'll check with Stacey, our producer: does it look good, is the camera on, is the video on the side? Thumbs up from our producer. There you go, excellent. I'm going to slip away.

All right, thank you so much. I'll try to slow down a little bit with the transitions between my browser and my terminal to show things. So, GitOps: I just want to go into the details of how to think about some of this stuff. I used to work at a company called Puppet Labs, and I worked on a tool called the Puppet Forge. That was my first experience with this idea of infrastructure as code. Back then we used to write Puppet code. If you're not familiar with Puppet, it's a configuration management tool where you write these modules, and those modules essentially have a declarative API. Basically what happens is we take that module code and throw it over to a Puppet agent that then attempts to reconcile the server or network device being managed. We got to a point where we started to share those modules with other people, and a lot of people started to say, hey, this kind of experience is this whole idea of infrastructure as code. The other thing that became super interesting is that we decided you could actually package up these snippets of Puppet code together with the providers underneath them, and that was a really key thing. It's one thing to describe your automation; it's another to have a provider that knows how to configure SSH or install a package from an RPM or yum repository.
When you package all of those things together, you end up with a module, and those modules can then be pulled from anywhere. I remember some people used to keep all of their modules in a GitHub repository, so that as you iterated on your Puppet modules, written in the Puppet DSL, and their providers, they could all be in one place and you could just pull them down. So the fundamentals are pretty clear: take all your code and throw it up there. But there's something drastically different about this era of automation. It feels less about scripting things: it's not about writing Bash or Ruby or even the Puppet DSL. We're fundamentally changing the way we think about the front end: less about code, and more about adopting a declarative config. We'll dive into the details there in a second, but the idea is that we're going to say no to full programming languages on the front end and push all the smarts into a reconciler.

The tool I'll be using, of course, is Flux from the Weaveworks team, if you haven't seen it. The idea is that once you have a really nice reconciler, something that can change the state of the world, and you have a resource definition, or what some people would just call a config or a YAML file (if you're in the Kubernetes community, these are the things we keep in the YAML file, like a Deployment, a ConfigMap, or a Secret), those things can be checked in, and then we can have a little loop watching our Git repository that applies them. For all of this GitOps stuff to make sense, the Git part is very obvious: put stuff in version control, something pulls it out and does something with it. But the other part of the equation, which I think is super important and most people may not quite understand, is that you really need smart reconcilers to make all of this flow nicely. That's what we're going to dive into.

The other misconception I've seen a lot is that GitOps is only for Kubernetes, or that tools like Flux only work for Kubernetes. Sometimes you'll even see it in the docs that these tools are a perfect fit for Kubernetes, and you start to ask yourself: does this work for any other kind of infrastructure? I think the answer should be yes, but we have to dig into the details. So let's have an example. First, I don't want to deploy to Kubernetes; I'm actually using a serverless platform called Cloud Run. The idea with Cloud Run is that I'm personally, right now, not quite interested in managing a bunch of Kubernetes clusters just to run a container. I'll still use Kubernetes for a lot of other things, but I'm starting to leverage a lot more serverless platforms. Here's the thing, though: I still want that GitOps workflow, where I can describe what I want and just have some automation tool keeping the outside world reconciled with that state.

Here's where the "ops" part comes in, just as a quick reminder. When I was a system administrator, this is the way we used to think about the world. If I have an app that I want to deploy, number one, you figure out the command line or the script that you need to write, mainly because you want some form of automation. That could be Puppet, Chef, or Ansible for people using config management. For me, I'm using a tool called Cloud Run, so this is going to be a little bit easier in terms of deployment. I'll make my screen a little bit bigger here in a second.
I have this simple app called httpbin. It basically just gives me a little web app where I can poke around in my browser, nothing fancy. But when I look at the deployment script, it's pretty straightforward: make a Bash script. I'm using the gcloud command-line tool, and I'm just going to deploy a service called httpbin. This part is super critical, because everyone thinks GitOps is this radical rethink of everything we've been doing, but what you've got to do is rewind a little bit and look at the fundamentals. One: know what the inputs are to your deployment target, whether that's Kubernetes or Cloud Run. This is a bit harder when you're dealing with a raw Linux server: you're SSHing in, you're copying things around, you have systemd unit files, the API isn't clear. But I think adding something like Docker to a VM makes this a little more straightforward. If I run this script, what I expect is to have a container deployed to Cloud Run. Again, pay attention, because this is the fundamentals. I'm just going to run this script really quickly. I know some people are thinking, wow, this is so basic. It is, but for some reason people forget the basics when they're trying to understand new stuff. What I want you to pay attention to is the pragmatism here: learn how to automate things at the very lowest level, learn what the inputs are, because those are going to be key to describing all the stuff you'll need when you start thinking about GitOps as a whole. So we ran that command, and we see that this thing is deployed. Hey, the power of serverless. Let's click on it just to make sure it actually works; you know how people do sleight of hand in live demos, so let's see if we can make it real. We'll click on this link, and ideally, if this is working, we should see our application pop up and give us the standard httpbin front page. And it's working. Sweet.

So now we're reminded of the "ops" part of this whole GitOps thing. What do we need to take advantage of this modern GitOps style? Number one, Kubernetes has taught us a lot about declarative config; maybe we don't need a programming language at the core of it. Of course it's OK to use tools like Helm and Kustomize, or even Pulumi, where you can use full programming languages, to generate the inputs, and we'll see what those look like in a second. So what we'll do now is I'm just going to delete this. This is the ops part: you can write all kinds of tools to automate this process; you can even use Terraform to do it. But remember the Git part: we need to check in some artifacts to make this automated. I'm using Flux to pull and watch a Git repository, so I have a very simple Git repository here; we'll get into what's in there in a second. I need a tool that will actually do the heavy lifting, and again, that's where Flux comes in: it's the thing that goes and watches that repository. But what do we put in that repository? One way to think about this is: maybe I just check in all my scripts. I could take this Bash deploy script, push it to Git, version it, and then check out the script and just run it. But then you look at the script and you're like, well, what happens when it's time to delete? I could write another Bash script called delete that undoes all of this, and another thing that knows how to handle updates.
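For reference, here is a minimal sketch of the kind of imperative deploy script being described. The exact flags, project, and region used in the demo aren't shown in the transcript, so those values are assumptions.

```bash
#!/usr/bin/env bash
# deploy.sh -- imperative Cloud Run deployment, roughly as described in the demo.
# The image path, region, and options below are placeholders, not the demo's values.
set -euo pipefail

gcloud run deploy httpbin \
  --image gcr.io/example-project/httpbin:latest \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated
```

The matching teardown would be something like `gcloud run services delete httpbin --platform managed --region us-central1`, which is exactly the second script the talk warns you end up writing.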
But by the time you start building out a framework in Bash, you're going to be right back to where we were with Puppet, Chef, and Ansible. There's nothing wrong with those tools, but you start building out all this scripting and you may not have the reusable components that you want. How do you authenticate? Who gets to run these scripts? None of that lives here, clearly. It's also hard to patch this: if I checked in the script and I wanted my CI/CD system to come here and change the image out, that's not going to be easy to do, because I would have to parse it and make sure I preserve the syntax while swapping things out. And for all my people out there reaching for sed and awk: put sed and awk down, we have a different solution to this.

The first thing to know, if you're unaware, is that Kubernetes has this thing called custom resource definitions. Remember, the operational part of this is that we want to deploy containers to Cloud Run. So one thing we can do is model the input. Look at all these flags: these flags are the foundation for a new resource type. Kubernetes has taught us a bunch of things, and one thing we can do is make a custom resource definition. The nice thing about custom resource definitions is that they give us the ability to leverage Kubernetes as a control plane. This is not about using Kubernetes to run Docker images; it's literally saying, let Kubernetes be our API server, even for the tools that we're building, so we don't have to build those from scratch. My custom resource is going to be called CloudRunService, and it lives under my own API group. The next thing you do is define the schema down here. I know this may be hard to see for some of you, but the goal is that I'm defining a lot of those flags; I'm just starting from my Bash script, that's how I think about this. I have my CPU property, I have concurrency, I have image, and I tell Kubernetes the image is a string. You can do some other neat stuff down here as well: I can set default values, like for the memory setting, so if someone omits their memory setting or flag, I can default it, and that usually contributes to the usability of your particular tool. And down here we can do some interesting things with how things get printed out by the command-line tool we're all familiar with, kubectl: this is how I want the fields to be printed out when I go and get these resources. The thing to remember here is that I can take my command-line flags and turn them into a data model that Kubernetes serves up at a RESTful endpoint. Kubernetes won't be doing the heavy lifting per se, but it will give me a place to store these objects. The thing you do with CRDs is kubectl apply: you just say, hey Kubernetes, I want you to be aware of this new thing I made called CloudRunService. It doesn't exist in Kubernetes out of the box, but I can submit it to Kubernetes and it does all the magic to make it available.

All right, so what's next? Now we have a data model in place. So instead of running Bash scripts, what we can do now is something like this. With that CRD in place, Kubernetes understands a new kind; the kind here is CloudRunService, I just made that up. Now I can give it a name, we'll call it httpbin, and the image we want is the same one from the flags. If you look at this, I essentially took the flags and converted them into a data model. Nothing revolutionary here, but look at this: there's no scripting.
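To make the shape of this concrete, here is a rough sketch of what such a CRD and a matching custom resource might look like. The API group, field names, defaults, and printer columns are assumptions pieced together from the talk, not the demo's actual files.

```yaml
# cloudrunservice-crd.yaml -- sketch of a CRD modeled on the deploy script's flags.
# The group "gcp.example.com" and the exact fields are placeholders.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: cloudrunservices.gcp.example.com
spec:
  group: gcp.example.com
  scope: Namespaced
  names:
    kind: CloudRunService
    plural: cloudrunservices
    singular: cloudrunservice
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                image:
                  type: string        # container image, as in the --image flag
                cpu:
                  type: string
                memory:
                  type: string
                  default: "256Mi"    # defaulting omitted flags improves usability
                concurrency:
                  type: integer
      additionalPrinterColumns:       # shapes the `kubectl get cloudrunservices` output
        - name: Image
          type: string
          jsonPath: .spec.image
---
# httpbin.yaml -- the bash flags turned into a data model instance.
apiVersion: gcp.example.com/v1
kind: CloudRunService
metadata:
  name: httpbin
spec:
  image: gcr.io/example-project/httpbin:latest
  memory: "512Mi"
  concurrency: 80
```

Registering the CRD and then the instance is just two kubectl apply calls, after which `kubectl get cloudrunservices` starts returning these objects.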
These things can now be patched. There are tools like Kustomize that work really nicely: I could give this as input and say, hey Kustomize, I want you to take spec.image and set a different image, and that makes it easy to integrate into any CI/CD system. Now we're dealing with data. I like to use the term "infrastructure as data", because now we operate on a data model that has validations, and it's easy to substitute various tools: you can use Helm, you can use Kustomize, you can use whatever you want to manipulate this data model before you check it in.

OK, so here's the next step. One thing I can do with that CRD in place is say kubectl get cloudrunservices; if I hit enter, you'll see there's nothing there. Good. So the next thing to do, since we have this data model, is to submit our resource definition. I'll show it to you again before I push it up: this says I want a Cloud Run instance with these parameters, just like I did with the Bash script. We'll say kubectl apply. Keep in mind, all we did at this point was tell Kubernetes that this thing should exist. Now, if you're scratching your head at this point: OK, now what? Kubernetes doesn't know anything about Cloud Run, and neither does Flux, and we're not quite at the Flux part yet. All we have now is a data model, and I submitted the data by hand; you'll notice I didn't check it in, it's not version controlled, it's just here on my laptop. If I go over to Cloud Run, you'll also see there's nothing running. This is the missing link: we have a data model on the front end, and Flux solves the problem of taking that data, checked in as YAML, and applying it to the cluster, telling Kubernetes it should exist. But who actually does the work? That's where the reconciliation comes in.

I'll share this code on GitHub, but basically what I did was write a simple implementation of a controller. There are great projects out there to help you write production-ready Kubernetes controllers; I just did something quick and dirty for this talk. I don't think it should be replicated in production. I'm pretty sure if I put it on GitHub somebody's going to run it in production tomorrow; don't do that, but I'm pretty sure you will. I'll show you a little bit of how the high-level code works; don't worry about the details, you can walk through those on your own time. At a very high level, I just start my app, and then I have what I call a reconciliation loop. What I'm doing here is saying: in the background, every 10 seconds, I want to get all the CloudRunService resources, meaning go to Kubernetes and say, Kubernetes, give me all of the CloudRunService objects you know about. It's basically the equivalent of running this command, and to be honest, what I'm doing in the background, just for this quick and dirty thing, is actually running this command from my Go code, spitting the output out as YAML or JSON, and using that output inside my actual code. I'm literally taking the command-line equivalent and using it in my controller; this is a good way to develop. Once I get all the CloudRunService resources, I process them by iterating through them in a list. Now remember, I have a bit of responsibility in this controller: I'm responsible for paying attention to when the custom resources are created and when those custom resources are deleted. That's the responsibility of my control loop. And look at the logic: it's pretty straightforward, get all the services and then process them.
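The controller source itself isn't shown in the transcript, so here is a minimal, hedged sketch of the kind of loop being described: a ticker that shells out to kubectl every ten seconds, parses the custom resources as JSON, and pushes each one to Cloud Run by shelling out to gcloud. The names, flags, and structure are illustrative assumptions, not the demo's actual code.

```go
// controller.go -- sketch of the quick-and-dirty controller described in the talk.
// It shells out to kubectl and gcloud instead of using client libraries, which is
// how the speaker says his prototype works. Not production code.
package main

import (
	"encoding/json"
	"log"
	"os/exec"
	"time"
)

// CloudRunService carries only the fields this sketch cares about.
type CloudRunService struct {
	Metadata struct {
		Name string `json:"name"`
	} `json:"metadata"`
	Spec struct {
		Image string `json:"image"`
	} `json:"spec"`
}

// getCloudRunServices is the programmatic equivalent of
// `kubectl get cloudrunservices -o json`.
func getCloudRunServices() ([]CloudRunService, error) {
	out, err := exec.Command("kubectl", "get", "cloudrunservices", "-o", "json").Output()
	if err != nil {
		return nil, err
	}
	var list struct {
		Items []CloudRunService `json:"items"`
	}
	if err := json.Unmarshal(out, &list); err != nil {
		return nil, err
	}
	return list.Items, nil
}

// reconcile pushes one desired CloudRunService to Cloud Run. A fuller controller
// would first fetch the live service and skip the deploy when nothing has changed.
func reconcile(svc CloudRunService) error {
	log.Printf("updating %s cloud run service", svc.Metadata.Name)
	cmd := exec.Command("gcloud", "run", "deploy", svc.Metadata.Name,
		"--image", svc.Spec.Image,
		"--platform", "managed",
		"--region", "us-central1") // region is a placeholder
	return cmd.Run()
}

func main() {
	log.Println("starting cloud run services controller")
	for {
		time.Sleep(10 * time.Second) // the ten-second interval mentioned in the talk
		services, err := getCloudRunServices()
		if err != nil {
			log.Printf("listing cloudrunservices: %v", err)
			continue
		}
		for _, svc := range services {
			if err := reconcile(svc); err != nil {
				log.Printf("reconciling %s: %v", svc.Metadata.Name, err)
			}
		}
		// A complete controller would also list the Cloud Run services it owns and
		// delete any whose custom resource has been removed from Kubernetes.
	}
}
```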
Now, there are a bunch of details I'm not showing you here that live in another source file, but this is the high-level way to think about it: this is a reconciliation loop. I'm going to run that reconciliation loop right here on my laptop. This is another misconception: lots of people believe these controllers need to run inside Kubernetes. They don't. All they need to be able to do is get data from Kubernetes, and then modify that data if they have status updates, like the URL for a particular Cloud Run instance. When I build controllers these days I tend to run them either from my laptop, or in some serverless platform like Cloud Functions or Cloud Run. So we're just going to run this on my local laptop. The way this works, I'm using kubectl inside my controller, so I'm authenticating using my kubeconfig, and it tends to just work. Now we're going to start the controller. Remember, this is the key part that makes all of this GitOps stuff work: you have a data model on the front end, you version control it, something like Flux pulls it in and applies it to Kubernetes, but this is where all of the heavy lifting happens. So I'm going to run the controller. You'll see it starting; it waits ten seconds, then calls out to the Kubernetes API, fetches all of those custom resources, and processes them. And you notice this statement here: "updating httpbin cloud run service". Go check out what's happening behind the scenes: if I hit refresh here, you can see this thing is now spinning up, just as if I had run it on the command line. With a controller like this, I can either install it in a Kubernetes cluster or host it somewhere else, but just know this is the thing that's responsible for acting on those custom objects. If I click on this, you'll see it's the same thing I was doing earlier with the command line. Let's click on it just to make sure it's all working, and as it comes up we see everything is fine: httpbin showing nicely in our browser.

Now we can start to complete the whole end-to-end solution. Remember what your reconciler is responsible for: you also need to make sure that if something is already in sync, meaning the user's desired state defined in the YAML file matches the live state, you don't keep updating the service unnecessarily. So what I'm doing here is fetching data from the hosted Cloud Run API, comparing the results with the desired state, and if there's nothing to do, I just skip it. Here's the other thing: what happens if you delete the resource? If I can kubectl get these resources, I can also delete them. So if I delete the thing called httpbin, your controller's responsibility is to detect that that happened and delete it upstream, because there is no longer a declaration that says it needs to be around. GitOps only works when these controllers handle the full lifecycle of these resources. So now, if I go back to Cloud Run, we'll see that it's gone. The data model is the fundamental piece of this GitOps workflow, and operations get encapsulated, in many ways encoded, into these control loops.
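In the shell-out style described above, the in-sync check and the delete path can reduce to commands like these; the service name, platform, and region are assumptions.

```bash
# "Compare desired vs actual": read back the live service and diff it against the spec.
gcloud run services describe httpbin --platform managed --region us-central1 --format json

# "Delete upstream": remove the service when its custom resource no longer exists.
gcloud run services delete httpbin --platform managed --region us-central1 --quiet
```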
Now let's complete the puzzle. We'll keep our controller running, and we'll take a different approach: instead of me creating these things manually (you can see there's nothing in here now), I have Flux already installed. So: kubectl get pods -n flux, and you'll see we have a Flux controller. That's the thing that's been told to watch a specific Git repository, and it's storing some state in memcached, but we won't be touching the part that patches container images; we're basically using this as a proxy to control Cloud Run. This may seem foreign to some people, but it's how a lot of the cloud load balancers work as well: most of those controllers are just proxies for something that runs outside of the cluster. This isn't that different.

All right, the last thing here is we need to host this config somewhere. Instead of doing this manually, we're about to go all-in on this GitOps idea: no more hitting Kubernetes directly. We're going to adopt the workflow most people know from source code development, using something like Git; GitHub is my favorite Git hosting site. I have a very simple Git repo here, and Flux has access to this repository. Underneath here you'll see a couple of things: there's nothing but a README, and there are no configs in here. So the next thing we can do is fix that. What I'll do is copy that same config we were using earlier and put it in this cloudrunservices directory; ahead of time, I've already told Flux that it should look for Kubernetes configurations inside these two directories. Inside this directory we can do a git status, and you'll see there's a new file here. So if we say kubectl apply... not apply, we don't want to do that, we want to check it in. So we'll say git add, then git status to make sure that's the only file, then a commit message, "add httpbin cloud run service", and now we just push it. Your team may have a different workflow: ideally you might want to do a pull request for someone to review and then have it merged. But once it's merged into master, or whatever branch you're telling Flux to look at, it's going to take action. And remember, Flux isn't going to do the heavy lifting; Flux's only job is to pull from version control and apply it to Kubernetes, just like we did on the command line. So: git push origin master. Once we push that, we can look at our upstream repository, and I'm also going to throw a watch over here, so we'll say kubectl get cloudrunservices, and we won't see anything there yet. Now we're waiting for Flux to detect that there's a change upstream. Let's look at our repository: you'll see we just did this commit, "add httpbin service"; we come over here and we see the YAML file in place. So now we have everything inside of Git; this becomes our workflow: any changes we want to the infrastructure, we go through the repository. And the dope thing about this is that if you install Flux in, let's say, a hundred clusters, you now have all one hundred of those clusters looking at the same repository. Flux has some really advanced features for rolling things out in a progressive style, or canary-type patterns; I'm just using something simple here, like "make this config run everywhere". We come back to our window, and we can see that about 40 seconds ago Flux picked up that change from version control. If I go back to Cloud Run and hit refresh, the service is running again. So if you look at all the steps here, you probably already know the fundamentals.
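To recap the Git side of the demo in commands: the directory and file names below are assumptions based on what's described, but the flow is just ordinary Git plus a watch on the cluster.

```bash
# Copy the same custom resource we applied by hand into a directory Flux watches,
# then let Flux, not kubectl, deliver it to the cluster.
cp httpbin.yaml cloudrunservices/httpbin.yaml
git status
git add cloudrunservices/httpbin.yaml
git commit -m "Add httpbin cloud run service"
git push origin master

# Watch the custom resources appear once Flux syncs the repository.
kubectl get cloudrunservices --watch
```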
Make sure you know what the inputs are to your deployment target. It helps to write a Bash script first, to help you really conceptualize what the inputs need to be. Then you have the opportunity to define your own data model in a Kubernetes custom resource definition; if you think about it that way, you get a REST server for free, and all you have to do is define the data model. And now people can actually use a GitOps workflow, because if your controller is smart and knows how to add, remove, and reconcile things, then everyone else in your organization can use a GitOps workflow to manage just about anything, even things that don't run inside of Kubernetes. I hope that was helpful. That's the end of my live demo, and maybe we'll jump into a discussion to dig into the details. Thank you so much. Stacey, am I on? Three, two, one...

Thank you, I really appreciate that, and it's been exciting to see what's going on in the Slack channel with all the comments and questions; hopefully we'll bring in more of those. With that, I'll hand it over to Cornelia, and I'll be here monitoring, so just holler at me.

Super, thank you Tamao, and thank you Kelsey for that fantastic demo. I know you had other events today, I know you're speaking at other events, so again thank you for making time for us. You probably haven't been able to tune in for a lot of our talks over the last couple of days, but the people who have been with us just saw you touch and program every one of the four elements we've talked about as being critical to GitOps: you checked things in to Git, you had a declarative configuration, you had something reconciling that into an environment, Cloud Run, and Kubernetes was the place you were storing those things even though you weren't deploying to it, with a reconciler at the end. So, awesome: I think you just made it real for every single person. The first question I have, though, is: does that mean I need to build reconcilers to do GitOps?

For the most part I would say yes. The good news is there are lots of reconcilers. But let me back up for a second: I think there's still a lot of value even if you have no reconciler today. If you just get into the habit of saying, as a company culture, that we want to review what people are doing, that's a teaching opportunity for everyone on the team; it helps people learn the actual automation tool and what goes in and out of it; and it helps you roll back when something doesn't work, and I'm pretty sure a lot of people know what life is like when it doesn't work. So just having that cultural idea that we should checkpoint, share, and collaborate should be the foundation of all operational workflows. But if you really want what we just talked about, where people can assume that if they define something in a very lightweight data model, everything just happens automatically, I don't know if you can pull that off without a reconciler. Otherwise you'll just be doing things manually: checking these out, calling kubectl apply by hand, and touching the stuff over here. So I think that reconciliation loop is critical to this working.

Yeah. Now, some of those reconciliation loops are baked in, or might have been installed by somebody, if your target is Kubernetes. If you're deploying applications, for example, the reconcilers are there: the Deployment reconciler, Services, the ReplicaSet. So some of those reconcilers are already available to you.
And we've been talking a little bit about Cluster API being a reconciler that's available for cluster management as well.

That's exactly right. The ones I think most people are familiar with are a Deployment, the thing that reconciles the containers running under Docker, and the Ingress controller for whatever cloud provider you pick, or, if you're using something like Envoy or NGINX inside your cluster, there's a reconciler making sure the NGINX or load balancer config matches your Kubernetes definition. So we're all used to those reconcilers, and to your point, they set the bar pretty high for people building their own, because the expectation is that you're managing the entire lifecycle.

Yeah. One of the ways I express this whole notion of reconciliation is that the way we used to program, in the scripts you talked about, carried this misconception that we could reach a state of "done": once I've run the script, I'm done. But we know that in the cloud we're never done, because there's always something changing, whether it's an infrastructure change or a change that you made, committed, and pushed to Git.

Yeah. I think a lot of the job is to constrain the imperative nature of what we're doing. Everything I just showed you is fairly imperative, to be truthful, but what we want to do is encapsulate all of that imperative workflow inside of that control loop. When most people think about running scripts, they beam a script to a server, or they use SSH or some other tool, and they hope the script runs perfectly. If it doesn't, maybe they add a few more if statements and checks over time, but we know what we all do: you log into the server and start fixing stuff by hand and hope it doesn't happen next time.

Yeah, you touched on something we can geek out on a little as well, which is: OK, if the script doesn't run perfectly, can I run it again?

Yeah, the whole idempotency thing. It's honestly a North Star for a lot of the things we want to do, but the truth is it's fairly impossible to guarantee that if I run one thing once, at this time, it will always work again. Think about the cloud case: let's say someone changed the IAM permissions on the service account I was using to push to Cloud Run; it might not work ten minutes from now. So it's always a point in time, but having that as our North Star, and holding that bar as where we want to get to, makes us do defensive programming, to make sure we're actually cleaning up after ourselves and tracking the resources that we create. One thing I didn't show in the demo is that when I'm creating things inside of Cloud Run, I'm leaving a label on the Cloud Run resource in GCP, so that I can come back and find all the resources I created previously. That way I can run in more of a stateless mode: if I don't find a resource inside of Kubernetes, I know I can go and clean up everything I created without touching the things I didn't create.

Very good. So you're actually storing the state as part of the target resource that you put in Cloud Run.

Yeah, and this is why cloud providers, a lot of people will tell you, for billing purposes, for automation and security, those labels become attributes that really help with lifecycle management.
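As a sketch of that breadcrumb idea: a controller can stamp everything it creates with a label and later query by it. The label key, region, and filter expression here are assumptions, and the exact filter syntax may differ.

```bash
# Tag the Cloud Run service with an ownership label at deploy time ...
gcloud run deploy httpbin \
  --image gcr.io/example-project/httpbin:latest \
  --platform managed --region us-central1 \
  --labels managed-by=cloudrunservice-controller

# ... then find everything this controller owns, without keeping any local state.
gcloud run services list --platform managed --region us-central1 \
  --filter "metadata.labels.managed-by=cloudrunservice-controller"
```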
It's a question of ownership: who owns this resource? Sometimes that's hard to tell if you don't leave a breadcrumb trail to remind yourself that you put it there.

So, you mentioned "imperative", and I wanted to go back to your notion of data. You were talking about how it's gone from being programming to being data, and the term I like to use for that is programming via data. I'm a functional programmer at heart, and in languages like Scheme and Lisp, programs are data, so there's a really interesting notion there. I don't know your background, whether you come from that...

I actually used to write a little Haskell back in the day, and I remember seeing list comprehensions for the first time. Haskell has this concept of "grab three values from one to infinity", and since it's a functional programming language it can handle that just fine. But it all fell apart for me when they started talking about monads, because at some point you've got to touch the network, and when you touch the network all of that starts to fall apart, because there are side effects in the real world. And this is where we jump off: you're right that when you can contain the side effects and get predictable outcomes, the world can be defined and it's very functional, and as you put it, incorporating data into that is a beautiful thing.

Yeah, really fantastic. So I was noodling on an idea the other day that I'd love to get your thoughts on. We've been on this cloud native journey for some time, and cloud native patterns for software design, I'd argue, are a little bit ahead of what I would call cloud native patterns for operations. I went back and looked at an article from Martin Fowler; it turns out it was a while ago, 2014, the article where he said you have to be "this tall to ride the ride", this tall to be able to use microservices. In my memory it was all about the software architectural patterns, and there were companies like Netflix that got really good at those with things like Eureka and Hystrix and Ribbon, which, interestingly enough, are now already seen a little bit as legacy; people are moving beyond them. But Martin in 2014 wasn't talking about those software design patterns; he was talking about things like rapid provisioning and basic monitoring, because monitoring is so critically important. He was talking about it more from the human perspective, but the reconciliation loops have to have data to be able to do things. And rapid application deployment. So, as usual with Martin Fowler, maybe a little bit ahead of the rest of us, he was already thinking about some of these operational concerns when he was talking about the microservice patterns.

Yeah, I think I also saw this back in the day, in the early 2010s, with the application servers; I believe they did a lot of experimentation in this area. This was back when people had just a handful of servers, not the distributed systems most people embark on today. Take JBoss, for example: it took a lot of operational concerns and put them right there in the application server, so you could just focus on building your application, which would then get deployed, or hot-deployed.
You'd drop these WAR files in a particular directory and it would unpack them for you and replace the old one, ideally without dropping any of the requests bound for your particular piece of code. When I look at that, a lot of the stuff we were doing there (dynamic binding, clustering technology, interfaces you could swap out for storage and networking), there was also a clean API to debug what the app was doing, using JMX MBeans to poke and prod at it. I think what we're doing now is expanding that to the infrastructure level, so it's no longer language-specific; we're saying we can probably provide all of these facilities for any application by baking that same logic into something like Kubernetes and Prometheus and so on.

Very cool. So, with what we're doing here with GitOps, I'd actually love to come back to something you said at the very beginning of your talk: going from "ops" to "GitOps". Just for our audience, could you re-summarize what you mean, what the essential elements are in going from ops to GitOps? Then I have a follow-on question.

Yeah, I'm just using my personal experience here. When I started out in operations, the thing I first had to learn was: what is the thing I'm operating? If it's Linux, what are the commands I can run, how do I get information? If I'm going to start an application, back then I had to learn init scripts; this was before systemd unit files. You learn all of these interfaces, and they all come together to hopefully deploy your application, and then you write scripts to automate that: a config for this part, an init script for this part, apt-get for this part. These become your interfaces, but they're so loosely coupled, they're not well defined, it's all over the map. Once you get a handle on the interfaces, the inputs and the outputs they produce, you're at the foundational stage to say, OK, now I can actually automate this; you can't really automate a thing you don't understand or don't have a good process for. Once you get that, I think that's the first step towards GitOps. What Kubernetes has taught us is that we can put a data model in front of that, and I think that has been the missing link. We just started writing scripts and using Bash interpreters to do this stuff imperatively and live, and it never gave us a way to back out or dry-run; we never had a data model to do any policy enforcement on. It was all about "can you run this script or not", and that's just too heavy-handed for what we need to do. So the next phase is: if you understand how to automate a thing, because you understand the thing you're automating, now you have to define that data model, and if you can define that data model, you have the foundation to create that reconciliation loop. And then we just adopt the developer workflows we use for software development, branches, tags, code reviews, the discipline we use to drive our automation. I think that's where you really start to get the benefits of this thing we call GitOps.

Awesome, and I love that soundbite about the data model now giving you the surface area against which you can apply policies. It's impossible to apply policies to a series of steps that are non-deterministic; how do you apply policy against that? You always have to have a surface area.
Policies are applied to tenants, or they're applied to a control, or to something that's happening out in the Cloud Run environment; you have to have that surface to apply policies to.

And they give us a contract, a promise to the people who use our platforms. When I run a Bash script that says "do a thing": do a thing where? Do a thing with what? It just loses the transparency. Maybe command-line flags help, but now we get to flip that around and say: here is the thing you're allowed to ask for, and who cares if I change how I do it later, as long as I give you back what you asked for. That contract is really powerful, because these systems evolve over time.

Yeah, I like how you pulled all those things together: the data model, the delivery mechanism, and the reconciler at the end. What's starting to form in my head around that is a notion you had a word for when you were talking about Puppet: modules. There were Puppet modules, and now we can start to get this vague feeling that maybe there are GitOps modules, where all of those pieces are put together into a module we can deliver that gives you that modern operational experience.

Yeah, a lot of Googlers are working on this. We have a project called Kubernetes Config Connector, with a bunch of reconcilers for GCP resources along with their data models, and when you install it, it almost feels like installing a module: KCC gives you all this functionality. And Microsoft has been thinking about this idea as well, even packaging applications as whole modules. Some of those applications not only define the Kubernetes configurations but also the reconciler interfaces and even the sidecars that go with them. So a lot of organizations, big and small, are doing good work around this idea, and I think we might get to the point where people just start installing these packs, aka operators, if you will.

Yeah, very cool. One of the other projects coming out of Google that we've been poking around with and experimenting with a little: when you start thinking about having this declarative configuration sitting in a repository somewhere — in the simple example you showed, it's a single repository with a simple YAML — but when you start deploying very complex systems, you're going to have things in multiple repositories, and you're potentially going to have fleets of deployments. Are the deployments exactly the same across the fleet, or are there differences, and so on? So one of the projects we've been playing with is kpt, which lets you essentially have a hierarchy and inheritance from one Git repository to the next, with provenance in place, so you can actually start programming against that inheritance hierarchy.

Yeah, that's a lot of Googlers; I work with a guy named Brian Grant at Google, and the project is called kpt, which is a play on words, you know, apt-get, kpt get. That's getting closer to this packaging of these things, because you're right: sometimes we need to assemble a package of resources from other places, and that's a little bit separate from how we manage those things. Once you have that package, you may check it in somewhere else, and these tools may be able to pull from those packages instead of a raw Git repository.
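For reference, the kpt workflow being described boils down to fetching a package of configuration out of a Git repo, with its provenance recorded, and then treating it as data. The repository URL, package path, and version below are made up for illustration.

```bash
# Fetch a package of configuration (not raw source) from an upstream repo into ./httpbin.
kpt pkg get https://github.com/example-org/cloudrun-packages.git/httpbin@v0.1.0 httpbin
```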
Think about software development: we don't pull raw software from version control. Usually I look for the binary; I don't build from your source. You want a binary, or a package, or, in the world we're talking about, a container. So I think GitOps is cool, but maybe we have to look at what we learned from the software industry: pulling directly from Git may or may not be the right thing, because if I want to share something with someone, I may not want to give them access to my Git repository; I may want to give them a subset, or the built, compiled version of my repository. So I think that's going to be an area for thinking and improvement.

Yeah, that's fantastic. So I have one last question, then a final comment at the end, and I'll give you a chance to close it out. I've used the term cloud native operations several times, and with your background in operations you've seen this transformation. We talked about this GitOps pattern today, which really brings a lot to cloud native operations. What other patterns, off the top of your head, would you say are essential to realize this next generation of cloud native operations?

There are so many good patterns, to be honest, whether you're talking about sampling data for metrics or tracing; that's a big one. Everyone at the beginning thinks they're going to take all of the data they get from their logs and all the data they get from tracing and store it somewhere, until they figure out that they have petabytes of noise, so there are patterns for thinking about how you filter that. But honestly, there's a different pattern I hope people pick up in this space: you need to learn how to make a decision, ride that decision out until it's no longer working, and then make another decision and go from there. There's a pattern there: you've just got to make a decision, and a lot of people in this cloud native space right now are paralyzed. There are so many tools coming out, there are so many patterns, people are skeptical. There isn't one pattern for all of these things. The idea of Zipkin and passing around an HTTP header for tracing data is only one pattern for tracing; there might be others. But at some point you've got to make a decision: hey, I want to adopt something like Weave Flux, I'm going to learn it, be patient with learning it, and make sure I get all the value that thing has to offer before I start hunting for something else. That's the part we haven't mastered, because most people who are really good at this, all the projects you talked about, Netflix and Eureka and the tools they shared, they made a decision to do something, worked on it, and were able to share the results of that decision. Imagine just hunting for the right tool week after week, month after month. You've got to learn how to make some decisions, and I think that will help with any other pattern you decide to adopt, because you'll know how to evaluate it and then let it go when it's no longer necessary.

That is great advice. So the last thing I'll say, before I give you a chance to share some parting wisdom, is that it wasn't lost on me that we got a "dope" from you, when you said that the dope thing about this is that you can have Flux running in a hundred different clusters,
all pointing to the same repository, and you get that repeatable deployment. So when you get a "dope" from Kelsey during a tech session, that's a pretty good day. Thank you for that.

Yeah, my parting words are: if you're a system administrator and you're looking at this GitOps stuff and you're a little bit intimidated, I think you should find comfort in the fact that you probably know 90% of what you need to know to approach this problem. Start with some of the tools, go through the examples, but pause for a second and look at what you already know and see how it applies, because I bet, and hopefully I was able to demonstrate it today, that the fundamentals are almost exactly the same as what you were doing before. It's about how we capture those fundamentals, encapsulate them, and let the tools do the heavy lifting for us; that's what's new. So it's about leveraging your existing skill set: don't forget the fundamentals and you'll always be able to safely evaluate this stuff. Yes, there'll be a learning curve, but don't believe that it's the fundamentals you're learning; you're just learning a particular take on the fundamentals.

That's great. Kelsey, thank you so much for joining us today. We were so delighted to have you here, and we look forward to having more in the future.

Oh, thank you, thank you.
Info
Channel: Weaveworks, Inc.
Views: 3,233
Rating: 5 out of 5
Keywords: linux, containers, docker, networking, weave, sdn
Id: jbDidLauGtQ
Length: 49min 56sec (2996 seconds)
Published: Wed Sep 09 2020