Progressive Canary Delivery for Reliable Deployment


Good afternoon, good morning, or good evening, depending upon where you are in the world, and welcome to today's DevOps.com webinar. I'm Charlene O'Hanlon, moderator for today's event, and I welcome you. As you can see, we have an amazing lineup of speakers for today's presentation; it's going to be a great conversation, folks. But before we get started, we have a few housekeeping items to go over. First of all, today's event is being recorded, so if you miss any or all of it, you will have the opportunity to access it later: following today's webinar, we will send out an email with a link to the webinar on demand. We are also taking questions from the audience, so if you have a question for any or all of our speakers at any time during the presentation, please don't wait. Just use the Q&A tab on your interface to submit it, and we will try to get to as many as we can during the webinar. We also have a very interactive chat feature, so if you've got a question, suggestion, answer, comment, whatever, shoot it over in the public chat. We'll look at them as they come in, and hopefully we can even make them part of the presentation itself. We also have a couple of polling questions, and I always encourage everybody to take part in those. And finally, at the end of today's webinar, we will be doing a drawing for four $25 Amazon gift cards, so please stick around; hopefully you will be one of our four lucky winners.

All right, with that, let's go ahead and kick off today's webinar, Progressive Canary Delivery for Reliable Deployments, a panel discussion with Netflix, Snap, AWS, and PlayStation. Good stuff there. Our speakers today are Andy Glover, director of productivity engineering at Netflix; Paul Roberts (or "Powell" Roberts, as we said earlier), principal solutions architect over at AWS; Kartik Dadwal, senior product manager at Armory; Perry Kyle (see, I did it, I knew it; you guys put it in my mind that it was "Pear," but it's actually Perry Kyle), software engineer over at Snap; and Matt Richardson, manager of DevOps CI/CD over at Sony PlayStation. Guys, thanks so much for joining me today. What a great lineup we have. Kartik, you're going to be kicking off the conversation, right? So I'm going to take myself off camera and put myself on mute, and I'll be on the back end if you guys need me for anything.

Thanks a lot, Charlene. Hi, everybody, it's a pleasure to be here. I'm surrounded by folks in the industry who have a lot of experience doing deployments, so I'm really glad to be here, and I hope we all get a lot out of it, myself included. We have folks from Netflix, AWS, Snap, and PlayStation, so you're in the right company. A little bit about me: my name is Kartik, and I lead product management for Kubernetes at Armory. Before joining Armory, I started an analytics company, so I have a little experience building software myself, and I also have my MBA from UCLA Anderson, so go Bruins. Today, as Charlene shared with you earlier, we will be talking about progressive delivery, and I have some amazing panelists with me; I'm really excited to share this webinar with you. So why don't you each take a little time to talk about yourselves, what you're doing at your company, and a little bit about what motivates you outside of work. Andy, why don't we start with you?

Sure. That was a lot. So yeah, my name is Andy Glover. I work for Netflix, where I currently run productivity engineering, an org inside of platform engineering that's focused on making developers at Netflix more productive. Before running productivity engineering, I ran delivery engineering, the team that created Spinnaker a number of years ago and worked a lot with the good folks on this call and others on open-sourcing it and growing it, both at Netflix and in the community. Outside of work, I tend to run, bike, and swim a lot.

So that's triathlon. Andy, you do all the stuff. Thanks a lot, Andy; really happy you guys are here. Paul, why don't you go next?

Yeah, thank you, I really appreciate it, and I'm happy to be here. I'm a principal architect at AWS. I work with many large customers who are deploying software at scale, and over the past few years I've definitely observed a lot of interesting trends, with folks using things like Spinnaker and really focusing on progressive delivery. So my insight is going to be the trends and observations I've seen across our customer base, though there's certainly also overlap with how AWS deploys software at scale, which you can see from our almost daily announcements. For fun, I don't swim as much, but I like to do a lot of mountain biking, dirt biking, and hiking when there isn't smoke in Tahoe. Other than that, I'm really happy to be here.

Thank you, Paul. Matt, why don't you go next? That'll be awesome.

Yeah, so my name is Matt Richardson. I've been in the DevOps space here at Sony PlayStation Network for about five years, and it's been quite a journey: from deployments to on-prem, data center bare-metal machines, to VMs hosted in the public cloud. Our next journey is to containerized workloads running on Kubernetes, evolving the way we do those deployments, and we're now working very closely with Spinnaker to do progressive delivery. So there's definitely a lot of historical context on our journey there. Outside of work, Paul, you and I should hook up and go for a ride sometime, because I'm also into mountain biking and dirt biking.

That's awesome, thanks, Matt. Yeah, I'm actually planning to go swimming with Andy, and he doesn't know about it yet; I'll learn a thing or two from him. And next, we have Perry.

Yeah, my name is Perry. I work at Snap, the company that makes Snapchat. I've been at Snap for about four years. I originally started on developer tools, working on our in-house continuous delivery and deployment system, and we eventually repurposed that into using Spinnaker for much of our deployments as we moved to Kubernetes. Since then I've been switching over to the team that's involved in provisioning and deployment, as well as interacting with our internal service mesh, which is our own Envoy-based service mesh, similar to Istio if anyone has used the open source Istio. So that's my main focus. In my free time, I'm a hobbyist winemaker. That's my post-technology career; I almost have my degree in it, so I'm getting there.

That's pretty awesome. I feel like all of you have something outside work that I love to do, whether it's swimming or running, and since I'm getting old, wine is something I've started to pick up, so thanks a lot for bringing it up. Now, to kick things off: I think we have an audience from all walks of life. There are folks who are thinking about better ways of delivering their software, and some folks might have a bit more experience, so it would be awesome if you could start by level-setting. Andy, why don't you start and share with us what you think, and what your understanding is, when people ask you about progressive delivery? Let's assume that people on the call have a little bit of understanding, or would love to know a little more.

Sure. Yeah. And I think it's actually fortuitous that we're doing this webinar today. If you look at any business, you obviously want to be able to move quickly and understand the market. In fact, today if you google "Netflix games," you can see that Netflix released essentially a canary of some mobile games in Poland to people on Android devices. The idea of progressive delivery is just that: you can fashion a deployment pipeline that enables you to reduce the risk of breaking all of production, but that also gives you an opportunity to get a signal, valid or invalid, with a limited blast radius. I use the gaming one as an example of what a business would want to do, and every business represented in this webinar would want to do something similar, whether it's a new business line like games or just a new feature in some product. Rather than saying, "Let's deploy it to the whole world, cross our fingers, and hope it works," the idea with progressive delivery is: let's deploy this feature to, say, a small number of users and validate that it works, and if we get a signal that it's good, then we can roll it out to a larger audience. Hence the term progressive.

Absolutely. Thanks a lot, Andy. I think you touched on some really interesting things that I would love to dig into later, where I'm sure folks are interested in questions like: if I'm using Kubernetes versus VMs, how do I do progressive delivery? Platforms also matter, but I'll come back to that in just a tad bit. I know Perry at Snap is doing something really interesting, because Snap touches a lot of people; there are millions of users on Snapchat today. Do you have some insights that are really unique, that the audience can learn from what you've done at Snap in terms of progressive delivery?

Yeah. When I came to Snap four years ago, we actually ran the entire app off of one API. There was one big API, probably similar to a lot of you in the audience who might have a monolithic API that you can maybe deploy a couple of times a day, and you're happy if that's good enough. Then, as we had to change direction and Snap became more important to our users in terms of availability and latency, we switched to a more service-oriented architecture, so we have microservices, and we migrated people away from a single API that everyone tries to release on. What we've seen from just that migration is crazy: I checked the stats this morning, and we've deployed to production 600 times this morning. By contrast, when I came to Snap, my big first project was to figure out how to get us to 30 deploys a day, and that was a big effort just to be able to do that safely. So the first thing in getting to a state where you can progressively deliver things is a service-oriented architecture. If you have a monolith, it's quite difficult. You could do it, but allowing feature teams to have their own app that they're responsible for and experiment with is the first step; it's kind of the predecessor to progressive delivery. Second, the first step for us was just getting canary deployments to work, which is one aspect of progressive delivery: allowing a randomized set of requests to hit a certain back end that you can instrument. That's worked fairly well, but there are still some edge cases we're working on. And I think that's where progressive delivery is a little different from a one-box deployment or a canary deployment: how can we test on a subset of users without affecting all users with an equal amount of traffic? I think that's what Andy touched on: how do we target new things to certain regions, certain users, or certain A/B tests? That's what we're working on internally at Snap right now: how do we let our main edge actually route users to specific endpoints or specific pods in Kubernetes? We've been experimenting with that on a couple of our bigger services, it's been working fairly well, and we're going to expand that program. When we have so many users in so many countries, it's very useful to get data from specific users in specific countries, because certain bugs only crop up in certain regions. Or even device types; that's the other dimension you can split on in progressive delivery. Maybe some old devices start crashing while the new ones all work, because all your developers have iPhone Xs or above, so it looks fine there, but then for everyone who has an old iPhone SE, Snapchat doesn't load anymore. So that's something we've really been trying to figure out: how do we merge all these processes together into one delivery pipeline? We're not all the way there yet, but that's something we're working on integrating more.

Thanks a lot, Perry. As you talked, you touched on something really interesting: canaries and progressive delivery. I think the audience can really benefit if we help them understand that these things are a little different, even though they are connected; they're both about releasing your software. So Matt and Paul, can you lay out the difference between doing canaries and progressive delivery for the audience? Matt, you can go first, and Paul, please feel free to build on that.
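Perry's distinction between a plain canary (a randomized slice of requests sent to an instrumented back end) and targeted progressive delivery (by region, device type, or A/B test) can be illustrated as a single routing decision. This is a minimal sketch, not Snap's or Netflix's edge logic: the `Request` and `RoutingRule` types, their fields, and the percentage semantics are hypothetical, and in practice this decision usually lives in an edge proxy or service mesh configuration rather than in application code.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Request:
    """Hypothetical request attributes an edge router might see."""
    user_id: str
    country: str  # e.g. an ISO code such as "PL"
    device: str   # e.g. "android", "ios"


@dataclass
class RoutingRule:
    """Hypothetical rule: send `percent` of matching requests to the canary.

    Empty `countries` / `devices` sets match every request (a plain canary);
    non-empty sets restrict the canary to a targeted slice of users.
    """
    percent: float
    countries: set[str] = field(default_factory=set)
    devices: set[str] = field(default_factory=set)

    def matches(self, req: Request) -> bool:
        if self.countries and req.country not in self.countries:
            return False
        if self.devices and req.device not in self.devices:
            return False
        return True


def choose_backend(req: Request, rule: RoutingRule, rng: random.Random) -> str:
    """Return "canary" or "baseline" for one request (decided randomly per request)."""
    if rule.matches(req) and rng.random() * 100 < rule.percent:
        return "canary"
    return "baseline"


if __name__ == "__main__":
    rng = random.Random(7)
    # Plain canary: 5% of all traffic, regardless of who sends it.
    plain = RoutingRule(percent=5)
    # Targeted rollout: all Android traffic from Poland, nothing else.
    targeted = RoutingRule(percent=100, countries={"PL"}, devices={"android"})
    print(choose_backend(Request("u1", "PL", "android"), targeted, rng))  # canary
    print(choose_backend(Request("u2", "US", "android"), targeted, rng))  # baseline
    hits = sum(
        choose_backend(Request(str(i), "US", "ios"), plain, rng) == "canary"
        for i in range(10_000)
    )
    print(hits)  # close to 5% of 10,000
```

Because the choice here is made per request, one user can bounce between versions. Hashing a stable user or device ID into a bucket instead would make assignment sticky, at the cost of repeatedly exposing the same users to a bad canary.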
So that aligns well with some of our journey. We initially started deploying with what we called canaries, which meant spinning up a pod or a VM in a production environment, so any change would only affect a subset of your customers or a subset of the API calls. But once we deployed it, it became live immediately and was taking customer traffic. So we were getting signals back to know whether it was successful or not, but if it was not successful, we might be impacting our customers, which we don't want to do. We want to enable impact-free changes that deliver only the benefits and value we intend. Building on that, one of the other capabilities you get with progressive delivery is the ability to deploy to a production environment but not take traffic yet, to keep it dark, if you will. That allows internal testing in a production environment, where you have all the same back-end connections that you may not have in your staging environment. Then, once you're healthy there and you're getting your health signals on that dark side, you progressively direct or switch live customer traffic to those healthy pods, or whatever they may be.

Got it.

Yeah, Andy touched on the example of their launch in Poland, and one of the areas where folks are using AWS specifically in this progressive delivery model is locality: location matters. With progressive delivery, blast radius is something everyone's concerned about; you don't want to impact all of your users. Perry mentioned that if you have one API, kind of this monolith, you've got to be really careful. What we see from our customers is that even in single geographic regions, they're using our Availability Zones, which actually encompass multiple data centers, and once you start spreading across multiple Availability Zones, even within a region, you can start distributing your load and building in additional layers of resiliency. But I think one of the things that's really important on the progressive delivery side is the metrics you start looking at. Again, when we think about regions: where are my users? If Netflix just launched a gaming service in Poland, you probably don't want users accessing it from here on the West Coast in California, because there's high latency. So when you do these rollouts, you probably want to look at latency: how long it takes your users to round-trip to the servers or whatever they're interacting with. That's just one metric, and there are many different metrics people look at, but it's certainly a really important thing to consider as you're doing these deployments. And if you want to take it further, say you're doing deployments today throughout California and then need to expand to the East Coast, you want to deploy there so you have lower latency there. It's something we see a lot from our customers, and they leverage it every day.

Thanks a lot, Paul. Two things came out of this conversation that I want to touch on in just a tad bit, but before I do, I want to bring the audience's attention to something that's going to be really helpful for them. We at Armory are actually working on our progressive delivery feature right now, and we're working with some early-access (EA) partners as I speak. We would love for folks on the call who want to join our design partner program to fill out the form under the handouts section. If you click on Handouts or Offers, it will take you to a landing page where you can enter some very simple information, and then we can get back to you and see whether you're the right fit for the early-access design partner program; we would love to chat with you. So I wanted to bring that up, and it would also be great if we could get our first poll, if you guys don't mind. You should see this poll on the screen. Do you guys see the poll on the screen? Awesome.

Yep. Yeah, we have some people answering the polling question. The question is: where are you in your continuous delivery journey? You can choose from "currently outgrowing existing tooling," "anticipating outgrowing existing tooling in the next 12 months," or "satisfied with existing CD tooling." Go ahead and make your choice; we'll give the audience just a few seconds. While that's happening, a quick reminder: if you have a question for any of our star-studded lineup of panelists, go ahead and put it in the Q&A tab, and I'm sure they'll be addressing questions throughout. All right, I'm going to close this polling question in about five, four, three, two, one. It's closed; let's take a look at the responses.

Awesome, thanks a lot, Charlene, for helping out with the poll. It's very interesting; the results are pretty close. About 38 percent, almost 40 I should say, are anticipating outgrowing their existing tooling in the next 12 months, about 34 percent are satisfied with what they have, and about 30 percent are currently outgrowing their existing tooling. So roughly 65 to 70 percent of people are either outgrowing the tooling they use for CD and delivery now or will outgrow it in the next 12 months. That's very interesting, and I think it's a really good segue to my next question. It would be great if you could talk about the platforms or targets you use to deploy these applications, because I'm sure that as the audience thinks about using better tools, they might also want to think about better ways to deploy these applications as they move toward progressive delivery. Andy, let me circle back and start with you: can you talk about the platforms you use at Netflix, and how the audience members who are not at Netflix scale can think about different platforms and targets for their applications?

Sure. In terms of platforms for progressive delivery, Netflix uses Spinnaker, but interestingly, there's a mixture of other things used in tandem with Spinnaker. I did see a question about feature flags, and I think feature flags are part of the subset of ways one could achieve progressive delivery. And you don't need to use Spinnaker. Ultimately, what you need is automation that can deploy, that can put something somewhere; that's the easy part. The hard part is figuring out, if you really want to do a canary, what kind of traffic you want to route to it. Someone was asking whether you use headers and whatnot; it really depends on what you want to do. Going back to what Paul was talking about, whether it's latency or customers or regions, Netflix uses a wide variety of strategies there, such as round robin or device ID. One thing to keep in mind as you decide how you're going to route traffic is randomization, so that you're not always routing the same people to a canary and giving them a degraded experience, which we've learned through the years can happen. Then you need great observability. At Netflix, we have our own telemetry system called Atlas. Going back to everything everyone's talked about, if you really want to get a valid signal, you've got to produce a valid signal, and you've got to know how to interpret that signal. So you need automation, which could be Jenkins or Spinnaker or something else; you need some way to route traffic, through ELBs or some other intelligent routing service (at Netflix we use Zuul); you need great observability, or telemetry I should say; and then you need the ability to act on it, and hopefully that's plugged back into your automation. Earlier, someone talked about the case where there's a problem or a breaking change: you want to be able to detect it and then roll back or roll forward as soon as possible. All of that depends on great automation and great signals, or signal processing.

Thanks a lot, Andy. Matt, I would love to hear from you what exactly the target is on which you're deploying your applications. Are you deploying everything on Kubernetes, or do you have other targets, and what's your mechanism for doing progressive delivery on these targets? I'd love to hear what you're doing at PlayStation, or what you've seen in general.

Yeah, so we are targeting VMs hosted in a public cloud for most of our workloads today, and we've had success there with canaries, and with feature flags as a way of rolling out new features. Most recently, we've been investing a lot of research and development, and have already had some success, in deploying to Kubernetes and using some of the traffic-shaping services, everything from DNS load balancing to services within Kubernetes, to shape and direct the traffic (for us, mostly API hits) to specific services. So those are the two platforms, and we're also using Spinnaker to develop some of those deployment pipelines, as well as some other tools, to assist us in that journey.

Got it, thanks a lot, Matt. Perry, I would love to understand your deployment targets. Andy also touched on observing your running application: when you're doing progressive delivery, understanding whether the applications are performing as expected. So I'd love to hear from you: how are you doing progressive delivery at Snapchat, what's your target, and how are you observing the applications as you roll out new features and new versions?

Yeah, I think the core thing of progressive delivery is observability. If you can't observe the application, there's no way you can achieve progressive delivery, other than just guessing that five minutes is long enough for you not to get a customer ticket, which I may or may not have done earlier in my career. So that's the key point, and there are two things that need to be standardized. You need to standardize on a metrics stack if you want to deliver multiple applications using the same systems and observe them in the same way. And the applications also need standardized metric naming or tagging, so that just switching a tag lets me look at my dependency instead of my own service, and I can see if something is going wrong, either from automated systems or from a dashboard. Not the easiest, but one of the best ways to do that is to take over at least the HTTP metrics, or at least the connection metrics, and that's why we standardized on the service mesh: we can see gRPC response codes, or we can see that you're on HTTP/2 and throwing 500s, and that's an error. These are what we use to actually canary and to do ongoing health checks, both inside Spinnaker for deployments and for configuration changes. We have our own health check service that does something very similar to a canary whenever we roll out a routing change or something else that's especially dangerous. What actually took a while was getting everyone standardized on metric naming and a metrics stack, because people were using stuff all over the place when I first got to Snap. So the biggest thing that unlocked our ability to do a lot of this is observability: a good time series database of some sort, whatever flavor you want to pick, and being able to query it, know how to query it, and repeat that over and over again for different services.

Thanks a lot, Perry. Paul, Andy mentioned something very interesting, and since Andy's from Netflix, I'm pretty sure our audience caught the word Spinnaker, so I also want to get your take. For the sake of the audience, my understanding is that Spinnaker is one way of doing your deployments; it's a platform or tool that helps you deploy applications, and of course Armory has commercialized Spinnaker, so thanks a lot, Andy, for mentioning it. But we at Armory are also looking at other ways of doing progressive delivery outside Spinnaker; as a company, we're investing heavily in serving teams that don't use Spinnaker but still want to do progressive delivery. So Paul, I'd love to hear your take on how the engineers on this call can think about doing progressive delivery even if they don't want to pick Spinnaker, or just want to get going progressively.
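Perry's approach of judging a canary from service-mesh response codes (gRPC status codes, HTTP 500s), combined with the detect-and-roll-back loop Andy described, can be sketched as follows. This is a minimal illustration under stated assumptions: the `Sample` fields, the subset of gRPC codes treated as server errors, the 1.5x ratio and 1% floor, the traffic steps, and the `observe` hook are all hypothetical. Production canary analysis (for example, Spinnaker's automated canary analysis) typically compares many metrics with statistical tests rather than applying one error-rate ratio.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Assumed subset of gRPC status codes treated as server-side failures.
GRPC_SERVER_ERRORS = {"UNKNOWN", "INTERNAL", "UNAVAILABLE", "DATA_LOSS", "DEADLINE_EXCEEDED"}


@dataclass
class Sample:
    """One response as a service mesh might report it (field names are hypothetical)."""
    version: str            # "baseline" or "canary"
    http_status: int = 200
    grpc_status: str = "OK"


def is_error(s: Sample) -> bool:
    # An HTTP 5xx or a server-side gRPC status counts as an error.
    return s.http_status >= 500 or s.grpc_status in GRPC_SERVER_ERRORS


def error_rate(samples: Iterable[Sample], version: str) -> float:
    subset = [s for s in samples if s.version == version]
    if not subset:
        raise ValueError(f"no samples for {version!r}")
    return sum(is_error(s) for s in subset) / len(subset)


def canary_healthy(samples: list[Sample], max_ratio: float = 1.5, floor: float = 0.01) -> bool:
    """Healthy if the canary error rate is within max_ratio of baseline, or under an absolute floor."""
    base = error_rate(samples, "baseline")
    canary = error_rate(samples, "canary")
    return canary <= max(base * max_ratio, floor)


def progressive_rollout(steps: list[int], observe: Callable[[int], list[Sample]]) -> tuple[str, int]:
    """Walk through traffic percentages; stop and roll back at the first unhealthy step.

    `observe(percent)` is a hypothetical hook that shifts traffic to `percent`,
    waits for an observation window, and returns the samples collected.
    """
    if not steps:
        raise ValueError("need at least one traffic step")
    for percent in steps:
        if not canary_healthy(observe(percent)):
            return ("rolled_back", percent)
    return ("promoted", steps[-1])


if __name__ == "__main__":
    def fake_observe(percent: int) -> list[Sample]:
        # Hypothetical stand-in: the canary starts failing once it takes 50% of traffic.
        canary_errors = 1 if percent < 50 else 10
        baseline = [Sample("baseline", 500 if i < 1 else 200) for i in range(100)]
        canary = [Sample("canary", 200, "UNAVAILABLE" if i < canary_errors else "OK") for i in range(100)]
        return baseline + canary

    print(progressive_rollout([10, 25, 50, 100], fake_observe))  # ('rolled_back', 50)
```

The absolute floor keeps a near-zero baseline from making any single canary error look like a regression; the ratio catches a canary that is materially worse than the version it is replacing.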
Yeah, I mean, at AWS we see a lot of different platforms. We see folks using things like Flux with Kubernetes, customers using Argo CD to do delivery, and another project that's starting to gain momentum, which is Crossplane. So there are a lot of different tools customers are using to accomplish this, and even our own tooling, such as AWS CodeDeploy and the whole CodeStar suite, can help with this. With that said, it really depends on the stack you want to choose. Perry mentioned something really great when he talked about standardization, and I think what's really important for customers and engineers to think about is what tooling and what kinds of platforms you want to work with. Is your stack VM-based, container-based, function-based, bare metal? It really depends, and then you try to align with the tooling that best supports it. We've mentioned things like Spinnaker, but you really need to dive deep to determine which delivery platform is going to best help you with your progressive delivery goals, and then start making those investments, because if you make the wrong choice, it's going to take you a lot of time to correct it later. The other thing I don't think we've touched on much is service meshes. I touched on latency a little, and what we see customers doing when they deploy to things like Kubernetes is controlling access to Kubernetes via the network: they can segment which users go to which network segment within their specific cluster. So there are a lot of different ways you can go about this, and we see customers using things like Istio from a service mesh perspective. Lots of different options.

Thanks a lot, Paul. I really appreciate the insight about how to approach this when you're using Kubernetes. For the audience, we've just brought up a pop-up that again takes you to the same sign-up page, and I want to talk about it a little bit, since I think this is a perfect time. We at Armory are also building a solution that can help you do progressive delivery in a very simple yet useful way. If you click on the pop-up and sign up, I'm happy to talk to you and understand whether you'd be a good fit for the design partner program we're running. I just want to take a few minutes to talk about what we're doing. We're starting with a simple progressive delivery model where we want to meet your developers where they're at: allow them to trigger the deployments, but still control how the new feature is rolled out to your target environment. For example, if you're launching version two, you could say, "Roll out ten percent of my existing traffic to version two." Then you wait for a tad bit and check the metrics, as Andy and Paul talked about: you observe the application and see whether that ten percent of traffic is looking fine, and then you can say, "Roll out 20 or 30 percent." That way you can roll out version two slowly but gradually, and you can control the blast radius. With that being said, I would love to have the next poll. Charlene, can you help me out with the next poll? Thanks a lot, Charlene. On your screen you will see our next poll, which asks what your current or desired deployment targets are. I think this will be really good for the panelists to know about; since Paul from AWS is here as well, I'm pretty sure he'll be interested in this one. We have on-prem, VMs, Amazon EC2, EKS, ECS, and Lambda. All right, Charlene, how long will this poll be up, so the audience knows?

I'm going to keep the poll up for a full minute or so to let the audience look at all of the different answers available to them and make their choices, and then in about a minute or so I'll close it. Kartik, you'll see when it has been closed, and at that point, if you want to discuss it, feel free.

Sure. Actually, let me do my own submission as well.

For what it's worth, here on our side we can actually see the responses coming in, so it's really interesting to see the mix. Certainly from the AWS perspective, I see customers using many different platforms, and of course we have customers who are still on-prem. But I think the major takeaway is that when I think about the different compute platforms, or what I call the different compute primitives, we're certainly seeing a mix of containers, functions, virtual machines, and bare metal. The data we're getting on our side is really cool, so I can't wait to dive a little deeper into this, Kartik.

Absolutely, let's see what the results of the poll are. I think it's still open, but I see very interesting numbers. Oh, actually, it looks like the poll is closed. We had a lot of options: 13 percent are saying on-prem, about 12 percent EKS (so a good one for you, Paul), and about 11 percent are saying VMs. I'm just going to pull out the interesting numbers: it looks like Amazon EC2, EKS, and ECS form about 30 percent, and if I include Lambda as well, that's about 42 percent. If I also include Kubernetes, that's another 10 percent. So I see a fair distribution of people deploying to different targets. Paul, why don't you go ahead? I know you wanted to touch on the different targets you see at AWS, so please go ahead.

Yeah, yeah. I'm more just responding to the audience and what we're seeing here. It looks like the majority is certainly on the container side. Regardless of our respective services, I think that's definitely a trend all of us at our respective companies are observing: a big push to move to containers, and it's certainly reflected here. So that's interesting. Going back to the tooling, I mentioned a couple of things, whether it's Flux, Argo, Spinnaker, or Crossplane; these are all platforms today that can deploy to Kubernetes. We also see other projects like Pulumi in the mix that can also deploy to Kubernetes. But again, the major takeaway I'd have is that when you're evaluating these different delivery platforms, look at the richness they provide, whether they're going to meet your requirements, and what kind of metrics they're funneling back. I certainly know that on the Spinnaker side, when you're doing a deploy, you can see how the deployment is going and what percentage of the different pods have gone up. So I think that's really important as you look at the different tooling.

Yeah, I would add on to that, because it's also related: there's a question in the Q&A about the challenges faced in canary deployments, when you think about change not so much in the context of deployments but just change in terms of, like, technology and
companies and looking at new tools there's people process and technology and we spend a lot of time talking about technology like whether using containers bms uh bare metal and then like you're using flux jenkins spinnaker i think it's more important to focus in on like the process because that's that's a hard one to change or actually even harders people uh assuming you're on this crawl like people are interested in this uh the next one is think through the process when it comes to canaries like they are um easier said than done there's like that question is incredible in terms of like you know what are the challenges faced with canaries i think we've talked about a number of them during this call and those are challenges regardless of tooling in terms of like what is the deployment strategy and then what are the metrics how are you going to get them how you're going to like act on those metrics and then you know what do you do when it works well and then more importantly what do you do when it doesn't work uh and every tool or platform out there has different ways of handling it but you should think through that before you even look at a tool and kind of again think about from the business standpoint you've got different businesses represented here and netflix has uh many different let's say key metrics that will uh will leverage four quote-unquote canaries and i imagine every other business is the same so uh really focus on the process and i think if you nail that then you know picking a tool is is actually fairly easy thank you andy and perry i think andy talked about two things are very interesting that you have to not only think about the tool because a lot of people overthink the two but miss the process right of uh making the best out of that tool could you talk about like the different problems you ran into at at snap as you were thinking of not only the tools but the process and also you can talk about a little bit about you know how your deployment 
pipeline looks like but in the context of um other mid-sized company uh because the audience of course i'm sure have smaller scale than snap so what is the big takeaway that they can take from your learnings at snap that would be great yeah i think like in general like teams will try to be a lot more creative with their deployment strategy than they really need to be and so that's a thing that i would have pushed back more on earlier on if i would have actually probably spent more time thinking through the process of like why did they think they needed to have a integration test in every region before they deploy um things like that that that really kind of complex made it a little bit more complex for people to understand how they should be deploying um i think that was like one thing that i think that if you can like standardize a set of things and like either dog food on your own applications see if it works fine if you're like an infrastructure team like we are we have big applications to test on and then you can like sell that vision to other teams that are smaller or the same size um that's that that's a good thing you can do to start figuring out like what's the process that works like you know do you have post-mortems at your company is there like data you can pull to see like what where why have we had post-mortems what was the root cause like one of the things that we found as like you know almost all the postmortems there was some sort of error rate elevation on their endpoint right like there are somewhere that's not true but in general there's always like something that would uh indicate that something was wrong on the back and that's how we came up with our core metrics for instance like is what andy talked about it's like what actually captures an outage and like sometimes you think a metric is going to capture knowledge but it really is useless and just adds more noise to the problem so finding those key things and you should keep it as small as 
possible that's like this is something that's broken was something that took a while actually to figure out because we had some metrics that turned out to be kind of duplicated in in what they indicate right so you want to try and pick the right metrics that indicate a problem with different pieces of the system be it latency as well as uh error rate as well as possibly even 200s right because sometimes maybe the metrics you know had a brownout or whatever during the period and you shouldn't allow the a deployment to progress forward and so i think choosing the metric was and getting them to work for everyone was a key part part of getting stuff to be safer and people making less mistakes and then coming up with a way to stamp out these pipelines over and over again was the thing that we worked on this past year was like well we're pretty sure these work and like now we have a system that you just have to like put some config in your repo and now it spits out a spinnaker pipeline that has all these things because now we know that like it will work for all these different services and different uh sort of usage patterns right and that's something that we've really tried to figure out is like how do we stamp these things out but still allow enough flexibility to support you know very complex app deployments that might have 10 micro services that all go out to five different regions and different orders these are things that we're working on with tooling and automation thank you thank you perry uh matt um i would love to understand that when we talk about progressive delivery sometimes uh a lot of people think about oh do i need to do everything that progressive um progressive delivery comes with for example oh i'm actually controlling the blast radius by only sending 10 of my live traffic to the new version that i want to test um and then between sending 10 and then later on sending 20 i want to automate the process of hey after sending 10 of the traffic go and check 
out this metric if it's if it looks good from my metric provider then move on to the next step and they are trying to automate that right away i know we have been working very closely to take it step by step where what we are trying to do is let's try to do progressive uh delivery first right and then as things are working out for you as you're able to roll out traffic slowly to your new version that you're trying to um release if that works fine then try to add automated uh analysis between those steps so can you talk about how the audience on the call can just take one step at a time and move towards um progressive delivery that means just control the blast radius of this new version that's going out right and even if they have to manually go and check the metric provider for a month do that but like would love to understand you know your take on how they can slowly get started and not think of as a massive effort yeah i think uh tying it back to some of the previous conversation points it comes back to what's your individual service what's your workload what are the metrics that your service is admitting um and how to identify what is a successful deployment i think everyone's journey is going to be a little bit different so i think understanding what makes your service deployment successful how do you capture those metrics and then build progressive delivery around that and the approach that you take to deliver progressive delivery will depend a lot on your specifics i did see an interesting question in the q a which i don't see right now but it was essentially this progressive delivery improve to paraphrase improve the lives of the developers or the operations teams uh more the way i see it is progressive delivery if you have a separate operations team deploying versus development team progressive delivery gives confidence to the develop deployment process and confidence in deployments enables you to deploy more quickly more safely and allows the developers 
and engineers to deliver features to customers more quickly so i think it it feeds back all the way back to the developer because ultimately with that confidence comes faster iteration ultimately delivering value to your customers more quickly and more safely thanks a lot matt paula and andy would love to get your take as well because you guys come from a lot of experience working at netflix and aws how can the audience sort of like get started you know slowly but surely right towards progress or progressive delivery and not think of it as a massive effort that you know they need to like spend six months to understand first so would love to quickly hear your thoughts and how people can get started on the call go for paul you know i think it's with any technology don't be afraid to start experimenting number one and you know i see a couple questions coming in and uh you know someone's asking you know how do we deploy across uh availability zones regions clusters like how do we even orchestrate this and um you know i think i think you have to start somewhere and you have to begin testing and you have to come up with you know a culture or a methodology on how you want to deploy your apps and you know i think perry had a great answer earlier where he talked about you know if you have a monolithic app it's really difficult but then if you start if you have a service oriented architecture and you have a lot of micro services that you can focus on you can start pushing updates and you can be observing you know what's the user's response what are you seeing there what do you need to do do you need to make any adjustments i think that's really important so i think you know my recommendation would be like focus on minimizing blast radius for deployments think about the tooling different tools that you want to use and we've mentioned several on this call you know jenkins argo flux spinnaker you know there's many others out there in code deploy so but just start experimenting 
a little bit and you know begin testing and start capturing like kind of understand what your own success criteria is because it's going to be different for each customer and your requirements are going to be different for everybody but just just dive in and then you know the community here you know there's certainly all of us but the progressive delivery community is strong and there's a lot of people that can help you on this journey thanks paul uh andy would love to get your thoughts as well yeah i think it's something that uh perry said earlier um you know maybe just start with health checks like that is your canary like build in automated health checks into your uh services uh and then again back to the process you got to figure out well how will you actually validate the health check worked like is there something you can use to like paying the service and assuming that you know your health check passes then uh the next step you got to figure out is like can i get a little amount of traffic to it and this is you know again predicated on a lot of the the tooling but again the processes you want to think about like uh you know always alive or whatever never down or blue green deployments right um so just think simple in terms of i have one i'll call it a cluster but it's a pot whatever it is a pod cluster one thing has version one and then something else has version two and i'm gonna validate that the health check works on version two and once it passes to paul's point you gotta start somewhere allow some traffic to go to it and that's essentially you know the the the skeleton process for getting kind of canaries going um and of course you want to figure out how to shut down the old one and route all traffic to it and then you can get more and more sophisticated as you build up confidence that hey like we've actually got this working uh and it's a huge journey like netflix is still on that journey we're always learning how to do better canaries but it 
ultimately started with simple automation that like basically was able to say is this new thing healthy and can we route some traffic to it uh and then you'll quickly find that like the health check is not enough like you need some other metrics and it's it's a learning process but you gotta start somewhere thanks a lot andy um and i would i know we have about 10 minutes left on the webinar i just want to like again bring audience attention that we would love to hear from you if you guys want to um see if you guys are good um a good design partner for us right now we're working on on the progressive delivery solution at armory so we'd love to hear from you so if you go to handouts uh and under handouts if you click on offers or the handout tab uh you can click on uh one of the offers that we have that will take you to a page and there you can actually go ahead and fill up the information and we can get back to you and talk about our design partner program so we'd love to hear from you and see if you would like to enroll in our design partner program in army for progressive delivery and we have one last poll so would love for you guys to publish that thank you so much so the last goal that we have is how long does it take your developer org to push code from commit to production less than an hour one hour to one day one day to one week one week to one month or more than one month and as i read this it is so interesting i think terry said that just today you uh published new version like 300 times or something so uh perry i think uh you would look at this uh this question and uh would would love the audience to hit less than an hour i know i think he said 600 600 yeah it was it was 600 this morning as of when we started the webinar um andy i'm really curious do you think you guys can beat that number like today you launched like maybe thousand changes at netflix yeah yeah it just depends i i was answering another question like it just depends on the the service and 
the environment someone was asking about i think it was actually in response to perry saying 600 and there was a question how do you do that in a pii you know compliance strict environment and in fact you don't most likely uh that environment at least netflix deploys you know let's say far less frequently than some of our edge services which are deploying you know thousands of times a day uh so uh do you think you can unpack the last point a little bit like why uh for the sake of the audience like why would you not be able to deploy in a pi environment like 1000 times a day or 600 times a day well i i won't say it's impossible uh but for example in some pi environments you need like checks like a human has to actually validate that like this thing is good and so i think they're there and lies your upper bound in terms of like uh can you deploy a thousand times a day i guess if you have you know someone who can sit there and approve an employment uh deployment a thousand times a day or something like that so got it it's very interesting so are you saying that even at a netflix scale like um you would want somebody to make sure that they're looking at um this this release uh if there's anything to do with pi right so well what i'm saying is that by like law like socks compliance like there there are some issues that you you humans need to be involved got it thanks a lot i mean i'm glad you shared that because a lot of times i've spoken to people and they're saying oh if you have to move fast you've got to automate the process and i'm glad that you shared because it's coming from uh the behemoths of the world right um it's okay at times that you have to have a manual process like if it's needed so it's not it's not um you know unseen so thanks a lot for sharing that and now we have interesting um results for the poll so 22 of the audience said that it takes less than an hour so perry you have some competition i don't know maybe like how thousands today and then 30 
almost 30 percent uh say one hour to one day and then the rest of the audience are actually i should call this out because this is interesting uh about 25 between one day to one week and the rest is a week to a month or more so uh looking at the poll looks like you have almost uh 70 of the audience or 75 percent of the audience being able to deploy uh code in a week which is actually not bad given the fact that we in silicon valley think everybody moves fast but outside like people are still like catching up uh and on the interesting side almost 53 percent of the audience are able to deploy component within one day so that is that is good to hear that uh as well um i know we got about uh six minutes left and i want to give five minutes uh back to our uh our host uh would love to hear like one liner from you guys anything that you want the audience to take away as they are leaving the seminar and thinking about um continuous delivery or progressive delivery or just canaries right we would love to hear just one liners from you guys all right mandy like let's start andy paul yeah sure one liner uh if your business cannot do progressive delivery uh your competitor will figure it out and crush you so figure it out uh yeah uh focus on minimizing the blast radius uh and progressive delivery will help you quite a bit with that right perry yeah productions delivery is the faster you and safer you can deploy the faster you can beat your competitors or the faster that your developers can develop and actually focus on their job thanks barry matt deliver with confidence with progressive delivery and that'll feed back again to the developers and ultimately the benefit real beneficiaries are the the customers receiving this safe deployment thanks so thanks a lot matt i have a little more cheesy one that if you're thinking of progressive delivery i think to me progressive delivery means armory and i'll leave you guys at that thanks a lot so uh or to my host so she can uh close the 
closet webinar thanks a lot charlie know what to do all right great thank you so much you guys are awesome i i thoroughly enjoyed our conference your conversation today i love being on the back end and kind of watching the chats come into and and all the questions so awesome awesome stuff um uh i just a a a another plug their armory's got a compliance operations a k a comp ops webinar happening on september 30th at noon so i'm sure you can just go to the armory website check that out am i right cardig up you're on you're on mute that for those of us playing webinar bingo you're on mute has just clicked that one off so okay thanks a lot charlene yes you're right all right great great well just a quick reminder to the audience that uh today's event has been recorded so if you missed any or all of it you will have the opportunity to access it later on demand following the webinar we are going to be sending out an email that does contain a link to access that webinar the webinar is also going to be on the devops.com website so you can always go look for it there just go to devops.com webinars look in the on-demand section it should be right there waiting for you and also a quick uh reminder to anybody who did submit questions we had so many great ones too many for us to uh go through during the webinar so if we didn't get a uh if you didn't get your question answered during the webinar please know that the folks at armory are getting a copy of all the questions that came in and i'm sure that somebody from their organization or one of these fabulous organizations will be more than happy to follow up with you offline to get your question answered okay uh real quick i did mention at the top of the hour we would be doing a drawing for four 25 amazon gift cards so without further ado let's go ahead and do that drawing all right our big winners i'm sorry but no if you're in the webinar you are not eligible i'm so sorry to tell you that so but our first winner today is let's 
see uh rafael p congratulations raphael our second winner today is uh alan s congratulations alan i love this i love this our third winner today is oh timothy q congratulations timothy and if i mispronounced your name because i believe you are is the french version uh i apologize and our last winner today is uh maria v congratulations maria we're going to be uh uh sending all four of you an email uh to get your amazon gift card over to you so please check your inbox and if you didn't get if you don't see it there just check your spam folder and uh hopefully it will be there uh but uh but god what a great i'm just so bold over at today's conversation it was such a good one uh i do want to thank uh you guys all all five of you each and every one of them you guys are just awesome in your own right so thank you all so much for sharing your talent your expertise and uh and your experience with the audience all great stuff so thanks again two thumbs up on my end and i'm sure the audience gives you thumbs up as well and uh in that vein i also would like to thank the audience for joining me today and uh having such a great time with me this is charlene o'hanlon and i am now signing off have a great day everybody and please whatever you do stay safe cheers everybody thank you bye thank you thank you guys
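
Karthik's example of rolling 10% of traffic to version two, waiting, checking the metrics, and then moving to 20% or 30% can be sketched as a simple loop. This is a minimal Python sketch under stated assumptions, not Armory's product or any specific tool's API: `set_weight` is a hypothetical stand-in for whatever actually moves traffic (a load balancer weight, a service-mesh route), and `gate` is a hypothetical callback that can start as a human checking a dashboard and later become an automated metric query, matching the step-by-step approach discussed with Matt.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

# Hypothetical callbacks supplied by whatever tooling actually routes traffic.
SetWeight = Callable[[int], None]  # send `percent` of traffic to the new version
Gate = Callable[[int], bool]       # True if the new version looks healthy at `percent`


@dataclass
class RolloutResult:
    completed: bool  # True if every listed step passed its gate
    weights_applied: List[int] = field(default_factory=list)


def progressive_rollout(
    set_weight: SetWeight,
    gate: Gate,
    steps: Sequence[int] = (10, 20, 30, 50, 100),
    bake_seconds: float = 0.0,
) -> RolloutResult:
    """Shift traffic to the new version in steps, checking a gate between steps."""
    applied: List[int] = []
    for percent in steps:
        set_weight(percent)
        applied.append(percent)
        if percent >= 100:
            break  # fully rolled out; nothing left to shift
        time.sleep(bake_seconds)  # let the new version serve real traffic for a while
        if not gate(percent):
            # The blast radius was limited to `percent`; send everything back to the old version.
            set_weight(0)
            applied.append(0)
            return RolloutResult(completed=False, weights_applied=applied)
    return RolloutResult(completed=True, weights_applied=applied)


# Usage: a gate that rejects the new version once 30% of traffic reaches it.
calls: List[int] = []
result = progressive_rollout(calls.append, gate=lambda percent: percent < 30)
print(result)  # RolloutResult(completed=False, weights_applied=[10, 20, 30, 0])
```

Keeping the gate as a plain callback is the design choice that lets a team begin with a manual check and add automated analysis later without changing the rollout steps themselves.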
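
Perry's advice to keep the core metrics small, covering error rate and latency, and to refuse to progress when the traffic itself browned out during the window, can be illustrated with a baseline-versus-canary comparison. The thresholds (`min_successes`, `max_error_rate_delta`, `max_latency_ratio`) are illustrative assumptions, not Snap's or Netflix's values, and this is a deliberately simple threshold check rather than the statistical analysis that production canary tools perform.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    requests: int          # all requests served during the analysis window
    errors: int            # responses counted as errors (e.g. HTTP 5xx)
    p99_latency_ms: float  # 99th-percentile latency over the window

    @property
    def successes(self) -> int:
        return self.requests - self.errors


def canary_verdict(
    baseline: WindowStats,
    canary: WindowStats,
    min_successes: int = 100,
    max_error_rate_delta: float = 0.01,
    max_latency_ratio: float = 1.2,
) -> str:
    """Return "pass", "fail", or "inconclusive" for one analysis window."""
    # Too few successful responses (an idle window or a brownout) means the
    # comparison proves nothing, so the deployment should not progress.
    if baseline.successes < min_successes or canary.successes < min_successes:
        return "inconclusive"

    # Error rate: the canary may not be noticeably worse than the baseline.
    baseline_error_rate = baseline.errors / baseline.requests
    canary_error_rate = canary.errors / canary.requests
    if canary_error_rate - baseline_error_rate > max_error_rate_delta:
        return "fail"

    # Latency: allow some headroom over the baseline's p99.
    if canary.p99_latency_ms > baseline.p99_latency_ms * max_latency_ratio:
        return "fail"

    return "pass"


baseline = WindowStats(requests=10_000, errors=50, p99_latency_ms=200.0)
print(canary_verdict(baseline, WindowStats(1_000, 5, 210.0)))   # pass
print(canary_verdict(baseline, WindowStats(1_000, 40, 210.0)))  # fail (error rate)
print(canary_verdict(baseline, WindowStats(50, 0, 150.0)))      # inconclusive
```

Treating "inconclusive" like "fail" for promotion purposes errs toward a smaller blast radius, which is the point Perry made about not letting a deployment progress through a brownout.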
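
Perry described putting a little config in a service's repo and having tooling stamp out a Spinnaker pipeline, and Andy noted that compliance-scoped (for example SOX) environments still need a human approval. A minimal sketch of that generator idea follows. The config keys and the stage dictionaries are a simplified, hypothetical schema invented for illustration; they are not Spinnaker's pipeline JSON format or Snap's internal tooling.

```python
import json
from typing import Any, Dict, List

# Shared defaults that every generated pipeline inherits (hypothetical values).
DEFAULT_STEPS = [10, 50, 100]
DEFAULT_METRICS = ["error_rate", "p99_latency"]


def build_pipeline(config: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Expand a small per-service config into an ordered list of stages (hypothetical schema)."""
    steps = config.get("traffic_steps", DEFAULT_STEPS)
    metrics = config.get("metrics", DEFAULT_METRICS)
    stages: List[Dict[str, Any]] = []
    for region in config["regions"]:  # regions roll out in the order listed
        if config.get("requires_approval", False):
            # Compliance-scoped services get a human sign-off before each region.
            stages.append({"type": "manual_approval", "region": region})
        stages.append({"type": "deploy", "region": region, "traffic_percent": 0})
        for percent in steps:
            stages.append({"type": "shift_traffic", "region": region, "traffic_percent": percent})
            stages.append({"type": "canary_analysis", "region": region, "metrics": metrics})
    return stages


# Usage: the only thing a service team writes is this config, e.g. checked into its repo.
service_config = json.loads('{"regions": ["us-east-1", "eu-west-1"], "requires_approval": true}')
pipeline = build_pipeline(service_config)
print(len(pipeline))        # 16
print(pipeline[0]["type"])  # manual_approval
```

Keeping the defaults in one shared function is what makes the pipelines repeatable; per-service flexibility (region order, traffic steps, metrics, approvals) comes only through the config.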
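
Andy's suggested starting point is health checks: run version two next to version one, confirm that version two's health check passes, send it a little traffic, and only then route everything to it and shut down the old version. A minimal sketch of that skeleton follows, assuming a hypothetical HTTP health endpoint that returns 200 when healthy and a hypothetical `set_v2_percent` routing callback. The probe is injectable so the flow can be exercised without a network.

```python
import time
import urllib.request
from typing import Callable, List

Probe = Callable[[], bool]


def http_probe(url: str, timeout: float = 2.0) -> Probe:
    """Build a probe that is True when `url` answers HTTP 200 (assumed health endpoint)."""
    def probe() -> bool:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                return response.status == 200
        except (OSError, ValueError):
            # URLError/HTTPError (subclasses of OSError) cover connection failures
            # and 4xx/5xx responses; ValueError covers malformed URLs.
            return False
    return probe


def wait_until_healthy(probe: Probe, attempts: int = 5, delay_seconds: float = 1.0) -> bool:
    """Poll the probe until it passes or the attempts run out."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False


def health_gated_cutover(
    probe_v2: Probe,
    set_v2_percent: Callable[[int], None],
    canary_percent: int = 5,
    attempts: int = 5,
    delay_seconds: float = 1.0,
) -> str:
    # 1. Version two runs next to version one but gets no traffic until healthy.
    if not wait_until_healthy(probe_v2, attempts, delay_seconds):
        return "aborted"  # version one keeps 100% of traffic
    # 2. Allow a small slice of traffic to reach version two, then check again.
    set_v2_percent(canary_percent)
    if not probe_v2():
        set_v2_percent(0)
        return "rolled_back"
    # 3. Route all traffic to version two; version one can now be shut down.
    set_v2_percent(100)
    return "promoted"


# Usage with a fake probe (no network): healthy at first, unhealthy once traffic arrives.
answers = iter([True, False])
routes: List[int] = []
print(health_gated_cutover(lambda: next(answers), routes.append, delay_seconds=0))  # rolled_back
print(routes)  # [5, 0]
```

As Andy notes, a health check alone soon proves insufficient; the second probe in step 2 is the natural place to add richer metrics later.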

Info

Channel: DevOpsTV

Views: 102

Rating: 5 out of 5

Keywords: devops.com, devops, devsecops, continuous delivery, microservices, containers, devopstv, Armory, deployments

Id: j-JcFk70rv4

Length: 59min 28sec (3568 seconds)

Published: Thu Aug 26 2021