One Chart to Rule Them All: Continuous Deployment with Helm at Ticketmaster - Michael Goodness

Video Statistics and Information

Captions
My name is Mike Goodness. I'm a systems engineer on the Kubernetes team at Ticketmaster, joined by my colleague and co-worker Raphael, who is also a systems engineer on the Kubernetes team at Ticketmaster. Before we get started I have a couple of notes. First of all, I want to say hi to my family and friends who are watching this on YouTube despite not having any idea what I do for a living. Secondly, I want to apologize to everyone who was hoping to see some sweet Lord of the Rings memes given the title of the talk; you're going to be sorely disappointed, although I could say that at Ticketmaster, one does not simply kubectl create.

A little background: I have a couple of years of production experience with Kubernetes, both at Ticketmaster and at my previous company. I'm a Helm charts contributor and co-maintainer, I recently became a CNCF ambassador, and I co-organize the DevOps Days Madison conference.

My name is Raphael Deem. I'm an open source enthusiast; I maintain Sanic, which is a Python web framework. I'm also pretty new to Kubernetes, having started using it about six months ago, and this is my first time speaking in front of so many people, so be gentle.

This is what we're going to talk about, and I'm going to keep it nice and tight, hopefully. We're going to assume some familiarity with Helm. I will cover a few of the basics, but I'm also going to be dissecting a few sections from our web service chart, and that's going to assume some knowledge of how Helm works and of Kubernetes manifests. This seems like a good crowd to have that baseline.

Anybody who's worked with Kubernetes knows that you're working with quite a bit of YAML. Lots of YAML. You want to deploy some pods, so at the very minimum you need a Deployment: YAML. You probably want to expose those pods behind a load balancer, so you have a Service: YAML. And assuming you want to actually do anything with those pods from outside the cluster, you want an Ingress. Say it with me: YAML. I'm not going to show every possible resource, because as we all know there are a lot of APIs in Kubernetes now, and each one involves YAML. So you take all of the YAML that goes into deploying an application to Kubernetes and you multiply it by the number of clusters you may have in your environment, each of which has configuration points that differ from the others. You need to deploy different versions of the same application, and at the very least your Docker image tag is going to be different, so that's another configuration point.

The problem with Kubernetes as it is today is that there's no real way to bundle all of these resources, all of these manifests, together into a single unit, into what we really think of as a deployable application. We have labels, and that's the native mechanism, but there's no real enforcement: other than manually making sure that each manifest has a label to create that association, there's no other real mechanism for it.
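To make the pile of YAML concrete, here is a hypothetical, minimal set of the manifests a single web service already needs before Helm enters the picture. The names, image, API versions, and hostname are placeholders, not anything shown in the talk.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: hello-web
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: hello-web
      template:
        metadata:
          labels:
            app: hello-web
        spec:
          containers:
            - name: hello-web
              image: example.com/hello-web:1.0.0   # the image tag alone differs per environment
              ports:
                - containerPort: 8080
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: hello-web
    spec:
      selector:
        app: hello-web
      ports:
        - port: 80
          targetPort: 8080
    ---
    apiVersion: extensions/v1beta1               # the Ingress API group in common use at the time of the talk
    kind: Ingress
    metadata:
      name: hello-web
    spec:
      rules:
        - host: hello-web.example.com
          http:
            paths:
              - backend:
                  serviceName: hello-web
                  servicePort: 80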
And that's where Helm comes into play. At a high level, Helm allows you to treat your Kubernetes application as a single unit. It provides a rendering engine, mostly focused on Go templates, though that is pluggable, with a great deal of difficulty. It provides a package manager: it can act as your application package manager, akin to the way apt, yum, or apk work for your Linux distribution. And it provides a release manager, so that if you're deploying the same application multiple times you can track each release; you can track those related manifests, those related resources, and alter them as a single unit. Some terminology (and I should have put this bullet point last): obviously when we refer to Helm we're going to be referring to the complete application, but it's also specifically the client side, while Tiller is the server-side application that actually lives in your cluster and communicates with the API server.

One of the reasons we did this is that we have a lot of clusters; last we counted there were 15 total. We have hybrid cloud, so we have AWS and on-prem, we have different regions for each of those, and we also have multiple environments, so as you can see that adds up to quite a few clusters. We also have one namespace per team: every team gets their own namespace. This is facilitated through a tool we have internally called namespace-creator, kind of like what GitHub has; I heard in the keynote that they have something similar. A product team that wants to use a Kubernetes cluster forks a repository that has a list of the enabled namespaces and adds their product code to it, and that triggers some validation. For instance, one thing we enforce is that they have a technical contact filled in in another system, so that if their stuff goes down we know who to reach out to. The namespace-creator also provides role-based access control via Active Directory groups, so if you want to give somebody access to a namespace, you add them to an Active Directory group. It also enforces resource quotas on the entire namespace, and it deploys Tiller, the server side of Helm, for each namespace to isolate things, so the resources and the creation of them are all separate. Finally, another detail is that, as a result of having the hybrid cloud, we have multiple ingress controllers: on AWS we use the ALB ingress controller, which uses Application Load Balancers and which we developed jointly with CoreOS, and on-prem we just use the standard nginx ingress controller. As a result of all these complications, all these different details, that's essentially why we developed the web service chart.
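As an aside, the per-namespace resource quota mentioned above is the standard Kubernetes ResourceQuota object. A minimal sketch of what such a quota might look like follows; the numbers and the namespace name are made up for illustration, not Ticketmaster's actual limits.

    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: team-quota
      namespace: team-a            # hypothetical team namespace
    spec:
      hard:
        requests.cpu: "20"         # total CPU the namespace may request
        requests.memory: 40Gi
        limits.cpu: "40"
        limits.memory: 80Gi
        pods: "100"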
So yeah, as Raphael said, we refer to our common Helm chart as the web service chart; you really couldn't get any more generic a title than that. Our original approach was that it would be just a template for teams to fork and customize as needed. That's a pattern we've seen, and used, in the community charts repo: each application has its own chart, it is tuned to that application's needs, and as it needs to be extended, PRs are filed against it and we integrate those changes. There are pros and cons to that approach, as there are to everything. The single pro, really the biggest one, is that the chart is tuned specifically to that application: there are no extra configuration points to confuse a team, or to conflict, or to just cause problems. It's very purpose-built. The cons are that there's no commonality between those charts, so when team A discovers a new way to do something, that's not easily communicated or shared with team B. There are no shared best practices, or at least not easily; it requires some manual intervention, people actually talking to each other. And then the inverse: when there are cluster changes, whether it's something that we, the cluster ops team, have done to add or remove functionality, we then need to communicate those changes back to the teams. Again, communication. Or when there's an upstream bug that is discovered or fixed and requires a change in chart functionality, that needs to be shared too. So it's all about sharing, and the difficulty therein.

A few months ago we decided to flip that, and as you can see from the bullet points, really flip it. Now the Kubernetes team maintains that one web service chart and we share it across all the teams. When a team deploys an application using the chart, they point to that one chart in our one Helm repository, rather than having a team A chart and a team B chart. The pros, as you can see, are flipped as well. There's no need to work at commonality because there is only one chart: team A deploys the chart using the same values, the same options, as are available to team B. If a team discovers a new, better way to do something, they can submit those improvements via PR. Likewise, when we change something in the cluster, or there's an upstream change, we can integrate that into the chart, cut a new version, and send it out to the teams, who then update their pipelines so that they're deploying with the new version of the chart. This has worked out really pretty well so far. I don't know how long it will work before we have some really customized applications that need special care and attention, but we're hoping this can get us pretty far.

What I'd like to do now, kind of a la Vic Iglesias's demo yesterday of the community charts best practices, is dive into some of the things we've done in the web service chart to support deployment to hybrid cloud and account for the differences between those environments. The basic structure has the usual components of a Helm chart: the Chart.yaml metadata, the templates folder, and then our values files. I'm going to dig into the values file, and because I'm using a large terminal this is going to involve lots of scrolling. As you can see, this values file is 251 lines long, so to say we provide a lot of knobs and dials is putting it somewhat mildly. I'll go through a few of these options. We have things like a configurable AWS region, an IAM role, and a revision history limit. Let's get to some more interesting ones: anti-affinity. Most teams that have adopted the chart do want some anti-affinity so that their pods get spread across availability zones, so we're probably going to flip this bit. Here we have it set to false just because that's not the out-of-the-box behavior, but again, we're probably going to flip that default pretty soon.
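To give a flavor of those knobs and dials, here is a hypothetical excerpt of such a values file. The key names and defaults are reconstructed from the options described in the talk, not copied from the actual web service chart.

    awsRegion: us-east-1
    iamRole: ""                   # added to the pods as an annotation when set
    revisionHistoryLimit: 10
    antiAffinity: false           # spread pods across availability zones when true
    replicasCount: 2              # static / minimum number of replicas
    maxReplicasCount: 0           # when greater than replicasCount, an HPA is created
    maxUnavailablePods: 1         # consumed by the PodDisruptionBudget
    image:
      repository: ""
      tag: latest
    service:
      port: 80
      targetPort: 8080
    metrics:
      enabled: false              # adds Prometheus scrape annotations to the Service
    jaeger:
      enabled: false              # injects the Jaeger agent as a sidecar
      image: jaegertracing/jaeger-agent
      tag: latest
    resources: {}                 # per-team CPU/memory requests and limits
    ingress:
      enabled: false
    platform: aws                 # aws or on-prem; drives ALB vs nginx ingress behavior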
A couple of things that I will come back to in just a few minutes are the replicas count and the max replicas count. These come into play when creating the Deployment: the replicas count is the static, minimum number of replicas that you want, and the max replicas count is used by our horizontal pod autoscaler component, which I'll cover in just a second. Then we have an option for max unavailable pods, which is used by a PodDisruptionBudget, which again I will show in a couple of minutes. I'm really just going to scroll through the rest of these; it's some standard stuff. If you've used a community chart before you're going to recognize some of these values, because we've identified them as best patterns: being able to specify the service account name, being able to add custom pod annotations and labels. We recognize that we don't want to provide values for everything, so some of these are relatively freeform, where they're just expecting lists, whereas others are expecting actual objects. Ingress settings... and at 62% of the way through the file, I've scrolled enough.

The first resource I'd like to show is our Deployment. It's a pretty standard manifest, with the addition of lots of curly braces, which really improves readability, I know; but it is a standard manifest that we've just made very configurable. One particular item of interest is what we've done with this conditional around our rolling update strategy. We've said that if we're only deploying one replica, we want to set the max unavailable to zero, so that when we do a rolling update we're sure we still have a pod running during that update. It will bring a new pod up first, rather than killing the old pod and then waiting for the new one to become ready; it makes sure that at least one pod is running during the rolling update. That's actually something one of our teams discovered: they were having outages any time they deployed. Best practice would be not to run a single replica, but we try to accommodate.

We've also added the IAM role. This is a pattern used in the community charts repo for AWS-specific applications: being able to add the IAM role as an annotation. Init containers: there's a bug in versions of Kubernetes older than 1.8 that ignores the standard initContainers field, so the fix until 1.8 was to revert to the pod.beta.kubernetes.io/init-containers annotation, and we've accounted for that. Here's the anti-affinity rule that I mentioned in the values file. And what is of particular interest, what we actually called out in the summary of the talk, is the ability to add sidecars to your applications. We've recently started pursuing Jaeger for distributed tracing, so we added a jaeger enabled flag to our values file. If you set it to true, the Jaeger agent gets added as a sidecar, and we provide other configuration points so that if a team is testing a newer or different version of the image, they can plug that in.
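As a rough illustration, here is a hypothetical sketch of how those pieces of the Deployment template might be written. The value names (replicasCount, iamRole, jaeger, and so on), the kube2iam-style annotation, and the else branch are assumptions on my part, and most of the manifest is omitted; the real chart almost certainly differs in the details.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: {{ .Release.Name }}
    spec:
      replicas: {{ .Values.replicasCount }}
      revisionHistoryLimit: {{ .Values.revisionHistoryLimit }}
      selector:
        matchLabels:
          app: {{ .Release.Name }}
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxSurge: 1
    {{- if eq (int .Values.replicasCount) 1 }}
          # a single replica: bring the new pod up first and never drop to zero
          maxUnavailable: 0
    {{- else }}
          maxUnavailable: 25%
    {{- end }}
      template:
        metadata:
          labels:
            app: {{ .Release.Name }}
    {{- if .Values.iamRole }}
          annotations:
            iam.amazonaws.com/role: {{ .Values.iamRole }}   # assumed kube2iam-style annotation
    {{- end }}
        spec:
          containers:
            - name: {{ .Release.Name }}
              image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
              resources:
    {{ toYaml .Values.resources | indent 12 }}
    {{- if .Values.jaeger.enabled }}
            # optional Jaeger agent sidecar for distributed tracing
            - name: jaeger-agent
              image: "{{ .Values.jaeger.image }}:{{ .Values.jaeger.tag }}"
    {{- end }}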
Resources are also a pattern that Vic mentioned yesterday, and just as an FYI, when that video is available you should definitely check it out, because if you contribute to the community repo we will be very grateful if you follow those patterns from day one. In this case I'm referring to the resources section, where you plug in your own resource requests and limits, because different teams are going to discover different resource needs; this is how we accommodate that. Fluentd: some of our teams are using fluentd for log collection and forwarding, so we've provided a very simple, one-off configuration point for enabling a fluentd sidecar container, and I've done the same for Splunk. What's really interesting, though, is that we acknowledge we can't cover all the bases, and we have no interest in covering all the bases. 251 lines in our values file is pretty good; we don't need to add options for every possible sidecar somebody might want to deploy. So we also have a plain passthrough: if you want a custom sidecar, just give us the YAML, we inject it right into the manifest, and it gets added to your pod. There's plenty more in this Deployment file that I would love to go into, but I also want to give Raphael a chance to talk at some point.

What I'll show next is our Service. Just as we want to be able to enable sidecar containers, we want to enable one-touch Prometheus metrics scraping. We have a metrics enabled value; you set that to true, Helm adds these annotations to the Service, and our standard Prometheus configuration starts scraping the metrics off those services automatically. The other thing I wanted to point out in the Service is that we wrap the service type in a conditional. When we deploy to AWS we set the platform value to aws, and if we're requesting an ingress we set ingress enabled to true. The way the ALB ingress controller works, it requires a NodePort to be exposed, because it attaches to the node's port. So we have a conditional that says if those two values are set, this Service is going to be a NodePort; if they're not, the type doesn't get set explicitly, and the default Kubernetes behavior is to provision a ClusterIP Service. We've since discovered that it's not quite a binary thing, so this is probably something we need to revisit soon: we have services that are not deployed to AWS that do need to be NodePorts, and they're not always going to be ClusterIP services even when those conditions aren't met. Again, that's something we learn, and as we do, we make changes and everybody benefits.
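A hypothetical sketch of that Service template follows. The prometheus.io annotations are the common community convention the talk alludes to; the value names and structure are assumptions.

    apiVersion: v1
    kind: Service
    metadata:
      name: {{ .Release.Name }}
    {{- if .Values.metrics.enabled }}
      annotations:
        # standard Prometheus scrape annotations picked up by the cluster-wide config
        prometheus.io/scrape: "true"
        prometheus.io/port: "{{ .Values.service.targetPort }}"
    {{- end }}
    spec:
    {{- if and (eq .Values.platform "aws") .Values.ingress.enabled }}
      # the ALB ingress controller attaches to a node port
      type: NodePort
    {{- end }}
      selector:
        app: {{ .Release.Name }}
      ports:
        - port: {{ .Values.service.port }}
          targetPort: {{ .Values.service.targetPort }}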
Speaking of ingress, let's take a look. The entire manifest is wrapped in a conditional, so if an ingress isn't required, we don't bother creating the resource; pretty straightforward. But here again, if we're deploying to AWS, we need to add a set of annotations that the ALB ingress controller uses to provision those Application Load Balancers, and there are quite a few of them. AWS provides a wealth of options and configurability, and the ALB ingress controller does a really good job of supporting them. If you have a TLS certificate that you want to attach to the ALB, you can provide its ARN here; you set the health check, and so on. Some of these options are sub-templated. For example, for the security groups we have a helper template that lists our security group and subnet names based on the type of ALB we're asking for. If we're asking for an internet-facing ingress, it puts the security groups and subnets that correspond to our public-facing infrastructure into those annotations; if it's an internal-only ALB endpoint, we use a different set of security groups and subnets. So that covers how we handle the AWS case. If we're deploying on premises, as Raphael mentioned, we use a shared nginx ingress controller in those clusters, so we don't need the AWS annotations; we just need one that sets the ingress class, in our case to shared-nginx. Then we have a few of the usual suspects: annotations, and the hostname. We actually have another helper template that provides a default fully qualified domain name, which teams can override if they want to use their own custom hostname.

As promised, I'm not going to go into ConfigMaps; they're really pretty bare-bones for the most part. The fluentd one is just a default configuration, and the generic ConfigMap template is basically empty: you feed in a key, whatever you want your ConfigMap key to be, then you feed in the data, and we stick it right into the manifest. It's super elegant.

What I will show, because I mentioned it earlier, is the horizontal pod autoscaler. This is where those two values, the replicas count and the max replicas count, come into play. If you provide both of them and you specify a max that's higher than the min (I guess we could have called it min), then the chart creates this manifest and uses those values in the appropriate fields: in the spec you have maxReplicas and minReplicas. Anybody who knows the HPA realizes that if you have the custom metrics API enabled, you can scale based on Prometheus metrics, for example. That's something we'd really like to offer our dev teams soon; they are keenly interested in being able to scale on things like requests per second, spinning up new pods when they hit a threshold. For now, though, we're using just the base functionality, which is CPU: the HPA looks at the average CPU usage across the pods, and when it exceeds a given threshold it creates more pods, up to your maximum, until the threshold is no longer being exceeded. A little bonus Kubernetes information.
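Here is a hypothetical sketch of an HPA template along those lines. The guard condition, the value names, and the CPU target are my reconstruction, not the chart's actual code.

    {{- if gt (int .Values.maxReplicasCount) (int .Values.replicasCount) }}
    apiVersion: autoscaling/v1
    kind: HorizontalPodAutoscaler
    metadata:
      name: {{ .Release.Name }}
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: {{ .Release.Name }}
      minReplicas: {{ .Values.replicasCount }}
      maxReplicas: {{ .Values.maxReplicasCount }}
      # base functionality only for now: scale on average CPU utilization
      targetCPUUtilizationPercentage: 80
    {{- end }}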
The last one I'm going to show is the PodDisruptionBudget. This is a relatively new-ish resource that we're also finding pretty convenient. When we're doing manual node maintenance, say we need to take a node down for whatever reason, a kubectl drain is first going to look at your pod disruption budget and make sure that the scheduler affects no more than your max unavailable number of pods during that maintenance. This is again something that was highlighted by an application team: they were wondering why all of their pods were returning 500s at the same time that we were taking a node down for maintenance. It was because of a few things; all of their pods had been co-scheduled onto the same node, which, oops. But something like the PDB is meant to address that, so that even if your pods do get co-scheduled, it makes sure that the required number of pods are up and running before the rest of them are killed. It comes in very handy.

You'll notice we have a conditional around this manifest as well: if .Release.IsInstall is a Helm condition that says we only create this resource on an initial installation. The reason is that the PDB, at least as we understand it, is currently immutable, so if you try to change a pod disruption budget after it's been created, Helm is going to flip out a little bit; you'll get an error message that the resource already exists, or that the field cannot be changed. There are a couple of open issues I know about on this, so hopefully it gets addressed soon.
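A hypothetical sketch of that PDB template follows. The .Release.IsInstall guard is described in the talk; the field names and the selector are my assumptions.

    {{- if .Release.IsInstall }}
    apiVersion: policy/v1beta1
    kind: PodDisruptionBudget
    metadata:
      name: {{ .Release.Name }}
    spec:
      maxUnavailable: {{ .Values.maxUnavailablePods }}
      selector:
        matchLabels:
          app: {{ .Release.Name }}
    {{- end }}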
I'm running super long, so at this point I'm going to hand it over to Raphael to try to squeeze into the last five minutes.

All right, can you guys hear me all right? I'm just going to run really quickly through an example of deploying an application with the web service chart. If you're interested, the code is available on my GitHub, at r0fls/hello-kubecon. It's pretty minimal; it doesn't actually include the web service chart or anything, but if you want to look at the files I'm pulling up in more detail, that's where they are. What the example does is take a basic Go application, a hello world. It runs the Go tests, builds the application, pushes the container image to our container registry, which in our case is Amazon ECR, and then finally deploys the application using the web service chart and Helm.

This is the app in question; as you can see, it just says "hello kubecon". I'll show you the files really quickly: a simple Go app, and here's the test (my hands are shaking, like I was hoping they wouldn't), which just checks that the status code is 200 and that the message is as intended. Then there's this values file. It's a much more minimal one: you don't have to fill in a lot of those values because they're already defaulted in the chart. So here's a kind of minimal example of what you would need to deploy this. The important part is that I'm telling it which image to use; this is where I'm going to push the container, so that's probably the most important thing. These are shared values that will be used both for AWS and on-prem. If we look at one of the others, you'll see I've also filled in the fully qualified domain name, and the platform is on-prem. You can specify multiple values files for Helm and they can override each other; these ones don't overlap at all, so that's not even a consideration.

Anyway, we can go to the website. I already deployed this so we wouldn't have to wait... whoops, how do I do that, three fingers? I also need the FQDN... oops. These are internal-only, so you have to get on the VPN, which we intended to do earlier. So "hello kubecon" would be showing up there, but it's not exposed externally; we would have had to configure that separately, and I didn't put that annotation on. So we won't be able to see the "hello kubecon", but you can use your imagination: it's just regular text. What we were going to do next (I can do it, but it will be far less exciting) is modify that message, or the FQDN, and then do a git push. So just imagine that we had seen the "hello kubecon" in the web browser; it's really there, I promise you. Oh, and the test will not be happy if we don't change it as well. The point is that once we do this, and I didn't actually go over what GitLab runners are, but you're probably familiar with them in some flavor or another, they're a lot like Jenkins or Travis CI: we're essentially just running some code in a defined container of our choice. And... whoops, we need to be on the VPN to push as well. There we go. Next you would have seen the message change after a moment.

So now, if you have any questions about any of this, you can contact us: Mike's on Twitter, or you can use my email address. [Applause]
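For reference, a hypothetical .gitlab-ci.yml along the lines Raphael describes might look roughly like this. The stage layout, images, chart and repository names, and the ECR_REPO, HELM_REPO_URL, and TEAM_NAMESPACE variables are assumptions, ECR authentication is omitted, and only CI_COMMIT_SHA is a built-in GitLab CI variable.

    stages:
      - test
      - build
      - deploy

    test:
      stage: test
      image: golang:1.9
      script:
        - go test ./...

    build:
      stage: build
      image: docker:latest
      services:
        - docker:dind
      script:
        # ECR authentication omitted for brevity
        - docker build -t $ECR_REPO/hello-kubecon:$CI_COMMIT_SHA .
        - docker push $ECR_REPO/hello-kubecon:$CI_COMMIT_SHA

    deploy:
      stage: deploy
      image: alpine/helm:2.17.0              # any image that ships a Helm 2 client
      script:
        # shared values plus a platform-specific overlay; later -f files win
        - helm upgrade --install hello-kubecon webservice
            --repo $HELM_REPO_URL
            -f values.yaml
            -f values-onprem.yaml
            --set image.tag=$CI_COMMIT_SHA
            --tiller-namespace $TEAM_NAMESPACE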
Info
Channel: CNCF [Cloud Native Computing Foundation]
Views: 15,382
Rating: 4.8709679 out of 5
Keywords:
Id: HzJ9ycX1h0c
Length: 33min 41sec (2021 seconds)
Published: Fri Dec 15 2017