How Netflix Thinks of DevOps

Captions
So you come home after a long day of your rockstarring and your ninja-ing, and you sit down on your couch, which of course I envision looks something like this. You have a little bit of spare time at the end of the day, and you decide what you're going to do with that spare time. You're having a moment, and in that moment you have to decide: what am I going to do? What am I going to watch?

Maybe something with superheroes. How about that new lawyer show? How about that show with superhero lawyers? Maybe something a little romantic. Perhaps a comedy (number one BoJack fan). Maybe a romantic comedy. How about something grand and epic, or something epically spooky? Maybe a political documentary. I'm just kidding. I know, a prison documentary; it was funny twice. How about an actual documentary? Maybe something about retirees, powerful women, powerful women retirees in their natural habitat.

So you make your decision, whatever that might be. You're presented with the play button. You've made your choice, you sit down, you're ready, you hit play, and this is what you see. Never, yeah. This is fictitious; nobody's ever actually seen this before. We're debuting it at the conference today. Now, if you have had this unfortunate experience, I've long said we should actually change this out for this; that way, if we're having a bad day, we at least help you make healthy life choices. So anyway, you're presented with this, and you might say to yourself, "Huh, is it just me?" So maybe you decide, "I'll check Twitter, maybe Facebook, or a couple of those news sites." So what's going on at Netflix?

I call this next bit "a byte about Netflix." I used to call it "a bit about Netflix," but this is at least eight times better. I love tech conferences, because everybody gets the nerd jokes, and I have this experiment. So: Netflix is a large data and telemetry company that has the byproduct of streaming video over the Internet. OK, that was another joke. There are going to be a couple more; I just ask that you keep up. In 2016 Netflix went global. Netflix is now available in
over 190 countries in the world, minus those few little gray areas that just aren't quite ready for our brand of entertainment just yet.

When you talk to Netflix, you talk primarily to three different places. Netflix has a large infrastructure in Amazon Web Services; it's where we do all of our compute and storage. There's a little bit with Akamai: UI assets and small assets to help build out that UI that you look at. And then Netflix Open Connect is our purpose-built video CDN. Any video bits that you see while viewing Netflix come from our CDN.

A few years ago, when we first started out, we would stream from one of the big three: the Limelights, the Akamais, the Level 3s. But over time we found some problems. First of all, CDNs can be a little expensive, and if you have video files, a lot of them, and they're kind of big, it can be even a bit more. Secondly, they have requirements around their business processes and needs and goals that Netflix doesn't have. Any time they are to place a cache somewhere, whether it's in a data center or an IX or embedded in a network, they have to have some kind of profit model associated with the placement of that cache. So unless you're on a very large network, if you're on a medium-size network or a smaller one, they can't make as much investment in the caching infrastructure, and that means we can't put as much Netflix out there.

So we started building our own, because our goals are to help win those moments of truth that we were talking about earlier, and one of the best ways that I can do that, beyond amazing content (amazing content, thank you, thank you, you're following along very well), is that you have a good experience. So with the Netflix Open Connect CDN, we can take our caching machines, approach any ISP, give them a stack of caching equipment for free, and ask them to put it inside of their network. This provides some really nice benefits, really for everybody involved. For you, as someone who is consuming Netflix, those video bits
are now much closer to you, and you don't have to go through the drain point at your ISP's network out to the Internet. You're not worried about some of those choke points that happen in those areas, and typically your communication is simpler and cleaner. Your ISP oftentimes has to pay transit costs to talk to other networks (I feel like I'm doing another kind of show). They have to pay those transit costs on the Internet, and you may have seen a few of those stories that talk about how, at least in the US, Netflix peaks out at about 37% of US Internet bandwidth at peak time. That means that's 37% of the transit cost your ISP is paying for so that you can watch any of those shows. If we put those caches inside of their network, they get all that transit cost back. So it reduces cost for ISPs, it makes the experience better for our customers, and we win. That's just the story behind the Netflix Open Connect CDN. It's another one of our interesting open source projects: everything about the hardware and the software and how we put it together is available at openconnect.netflix.com, if you're curious about how we chose to run a CDN.

Netflix is a large microservices infrastructure. It kind of looks like this. I'll give those of you taking notes a minute. It's a little bit of a complex, ever-changing beast. I think it's pretty safe to say at this point Netflix is complex enough that there's not one person inside the organization who really understands all of it. Even if we break it down to just one of the services and look, for instance, at call paths coming off of that service, it still kind of looks like this.

So Netflix is made up of hundreds of microservices. There are thousands of daily production changes, and I don't mean "we updated a record in a database": code pushes, feature flag changes, things that actually change our production environment. Thousands of those. We run tens of thousands of virtual instances inside of
Amazon. We have hundreds of thousands of customer interactions every second. Now, I actually need to update my slide: we have millions of customers, and as of Tuesday, when we published those numbers, that's now at 81.5 million global Netflix customers. Billions of time-series metrics: currently we run about two and a half billion time-series metrics every minute that are delivered, processed, and stored. We've streamed tens of billions of hours of entertainment to our customers every quarter. We do this with tens of operations engineers and no NOCs. We also don't have anything that is a NOC and has cleverly been renamed to something else so nobody calls it a NOC. We don't have data centers anymore. Netflix made that transition (let me make sure I'm not getting ahead of myself; there we are). We started that transition in 2010, and six years later we finally finished our transition out of the data center, and now we're completely cloud-based.

So you may say, "That's really great, Dave, but I have a question." Well, I'm glad you asked. How does Netflix think about DevOps? Well, the truth is, we don't. You might say, "Dave, this is a DevOps conference. The word DevOps is literally in the name of the conference." So I've told you we don't think about DevOps; let's take a look at a few things we do think about.

We don't build systems that say no to our developers and engineers. There is no push schedule, there are no push windows, there is no crucible to production that people have to go through in order to have their code blessed so that it can go out into the production world. Engineers at Netflix would never see that second half of the screen. Every engineer, and really everyone at Netflix, has full access to our production environment. There's nothing there to tell them no. We don't take the time to build systems or have policies or procedures that prevent people from accessing the
production environment. So what do we think about? We think about freedom and responsibility. One of the goals at Netflix is that we want to hire smart people and get out of their way. If I hire someone who's good at what they do and they're intelligent, they need to have the freedom to make the decisions to solve the problems in the way that they see best. If we've created a lot of artificial constraints and guardrails, trying to predict what it is they need to do, it's done me no good to hire smart people. We also look for people who are the kind of people that take responsibility; they don't wait for it to be given to them. Freedom and responsibility is in balance, so you look for people that enjoy that freedom and exercise that freedom, and understand the responsibility that comes along with making those decisions.

We don't think about uptime at all costs. Now, if you've ever looked at Twitter when Netflix is down, some people think they're going to die. I wanted to let you know we've checked: nobody ever has. Now, there are some companies or organizations for which uptime at all costs is very important: things in healthcare IT, finance. In those kinds of areas downtime has a different kind of repercussion. Not so at Netflix. We don't look for uptime at all costs. So what do we do? We prize the velocity of innovation. I want those smart engineers that I've hired to use their freedom to develop new things and new features and new ways of exercising the system and new ways of delighting our customers, and Netflix as an organization knows that we are going to trade some amount of uptime to keep that velocity of innovation. I am proud to say our uptime, frankly, is rather good, but it is not our first priority. Our first priority is keeping our engineers doing fun and interesting and exciting things for those millions of customers worldwide.

We don't do a lot of processes and procedures. As I mentioned earlier, it's difficult to have a fast-moving
organization full of people solving new and interesting problems and assume that someone can build the guardrails that are appropriate to what they're going to be doing, or can think of all the processes necessary that will keep them safe in the decisions that they're making. It's really a very bureaucratic way of thinking of things. Bureaucracies have a certain function, and one of the primary functions of a bureaucracy is to be able to take that bureaucratic machine, plug in virtually any cog, and get the output that the bureaucracy wants. The cogs are only allowed to do certain things. It's very prescriptive: lots of guardrails, a lot of processes, a lot of procedures, so that any cog can fit in and get the job done. That's not what we're doing. We work very hard to look for the right kind of people, who aren't a bureaucratic machine member. We want them to challenge the things that we currently do. We want them to have new ideas. We want them to try things, and if we try to contain them too much, that's not going to happen.

So what do we do? We trust the people that we hire. That's why we don't have a problem allowing anyone into production. Now, when I mention that to some people, there's always this: "Well, if anybody can get into production, they'll just shut everything down." That's happened precisely zero times to us in all the years that production has been open to literally everybody at Netflix.

We don't do control; you may have kind of gotten that idea so far. What we do talk about is context. I have the privilege of being on the hiring crew for some managers that we bring into Netflix, people looking to help engineers do their jobs, and one of the things I talk about is that managing at Netflix is very different than it is at a lot of other organizations. In many organizations the job of the manager is to determine what needs to be done, figure it out, lay it out, and put their cogs to work. That's not true at Netflix. The primary job of a manager at
Netflix is to make sure the people they work with have a quality and constant flow of context about the business: the decisions being made in other departments, where other people are going, and what they're doing, so the people they work with can make well-informed decisions, exercise their freedom well, and keep moving and doing things. So we prefer context over control.

We don't do a lot of required standards. We don't have a "thou shalt write in this language and use these libraries and this framework in this IDE." We have a lot of JVM languages running around: we have Java running around, we have Scala running around, we have some Clojure running around. We also have Ruby and Python running around, we have Rust and Go running around, we have a lot of Node running around. So we don't have these required standards. What we focus on is enabling.

As a for-instance, the teams that are responsible for your interaction and the UI, if you access Netflix from a laptop or a desktop, something through a browser, decided a couple of years ago that they were making Herculean efforts to write their back end in Java and the front end in JavaScript. They said, "You know, we think we'd really get some time back, and we'd be able to exercise some new advantage and do some new things, if we ran our infrastructure on Node instead of Java." They did not have to go through a research process. They did not go through an approval process. They did not have to run a crucible to make that decision. They decided as a team that this was the best thing to do for the company, and they rewrote the portion of the system that they're responsible for and launched it on Node.

It's the same when we think about tooling. I mentioned earlier that at Netflix there are tens of operations engineers that run this whole thing. Currently the operations engineering organization at Netflix is about 70 people. Interestingly enough, the vast majority of those people are software engineers. What
they do is focus on writing tools and enabling other software developers to do their job and focus on the things that they're good at. If we hire someone to work in our billing group because they're really good at writing billing and processing code for billing, should they have to learn a whole lot about time-series metrics databases? Should they have to learn about the interesting and twisted path that it takes to get things running in the Amazon Web Services environment? No. We want them writing billing code; that's literally what we hired them for. So we work and focus on enabling our developers to be able to do exactly that: spend the time on the things that we've hired them to do.

We don't do silos. We don't do walls. We don't do fences. I was rather impressed: in my first two months at Netflix I spent time just running around (can you tell I'm a tiny bit gregarious?) and just walking over to other teams and going, "So what do you guys do?" I was impressed that they'd take the time to tell me what their team did, how they fit into the ecosystem, and, even more so, how they worked with the other teams: their dependents and their dependencies, the effects that they had on them, and how they worked together. I thought I was going to have to go to each one and try to piece it together myself, and here were these people that are used to talking to each other across teams doing it for me. We also don't have some of those traditional operational fences. The operations engineering group does not sit behind a fence over which code is thrown in the hope that it will show up in production. So we don't have any fences to throw things over.

What we do is focus on making ownership easy. I think probably everyone by now is familiar with "you build it, you run it." We focus on "you build it, you run it," but with that enablement idea again. It's one thing to say you're going to run your code, you're going to deploy your code, you're going to figure out the operating system, you're
gonna figure out your instances, you're going to figure out all of your ASGs, your cluster settings, and all your ELBs, and then you leave that thing out in production. "It'll be fine, and if it's not fine, you're getting paged. Best of luck, see you later." So we do "you build it, you run it" with a focus on enablement, making those kinds of things easy.

So, for instance, we have a fun tool called Spinnaker. By the way, I mention some of our tools; virtually all of these we've open sourced. You can find them at netflix.github.io. Spinnaker is one of those, and Spinnaker makes it easy for a developer to do what I term their job: putting the appy thing in the cloudy thing. For some reason they don't find that nearly as humorous. But we want to make that easy. So Spinnaker allows them to describe a little pipeline that says, "I want my code from here to live in these places, and I'd like to have these kinds of things run: the smoke tests have to pass, and I'd like it to run through the automated canary analysis system and look for any kind of performance regression." They just do that quick description, and then all they have to do is publish their code to a repository, and the system handles it for them from there. If the code smells funny, if it doesn't pass tests, if there's some issue with the way that it's being built out, it'll never make it into production. If it passes all of its tests, it shows up exactly where it's supposed to, traffic is managed, and now our eighty-one and a half million global customers are talking to that new code base. So we focus on making ownership easy.

We don't do a whole lot of guesses and gut instinct, and we try not to fall victim to the traditional thinking of "well, that's the way we've always done it." We do focus on data. Netflix is an enormous data company. I mentioned earlier those two and a half billion time-series metrics; that's only our operational data. That's not accounting for all the data we have to run all those algorithms that we
talked about, and that's not our financial data; that's another reality. A lot of the decisions, really the vast majority, that Netflix makes are based on data.

So, a few examples. When Netflix started streaming in 2007 (anybody around streaming Netflix in 2007? Horrible catalog at the time; I'm sorry about that), we started streaming from our data centers. In 2008 there was a fire in one of the data centers. Now, for those of you that aren't familiar: fires and data centers, very incompatible. So we had a decision to make. We were leasing space from someone; do we decide, well, we'll start doing our own data centers? We have some really bright people. I'm sure we could find a way to do data centers and data center management really, really, really well. But what we concluded is that if our job is to win those moments of truth, being a really, really, really good data center operator didn't really help us do that. So that's when we made the decision to move to the cloud. We look for partners that can do what we call undifferentiated heavy lifting: work that needs to be done, but for which having it done doesn't bring our customers a direct benefit. There are some people out there on the vendor floor today whose services Netflix uses, because we'd much rather have them do it and build it and let us use it.

So we decided to move to the cloud. In 2010 we did our first bit of streaming to devices out there in the world from Amazon's cloud, and by sometime in late 2015 we finally finished our transition from the data center to the cloud. For those of you bad at math, that's a long time. Making that transition is hard, but we finally made it.

Other times in which data is helpful: we do a lot of looking and digging at what people watch, what they enjoy, what they come back and watch again, and the related kinds of things they watch. So we were able to produce (there was House of Cards up there earlier) a show that was a very data-driven decision. We had a really good idea about the kind of
stories people like, the kind of actors they would enjoy, the kind of director that would bring that story to life, and we took a gamble and we produced it, and honestly it's been one of our biggest success stories ever. So everything Netflix does is driven on data.

Now, one thing that we don't do, as I mentioned earlier: we don't have a NOC or anything named like a NOC. This is a picture of the operations group at our headquarters in Los Gatos. You may notice one thing here that's different from a lot of other organizations' operations groups: we aren't surrounded by televisions showing us graphs. We're a data-driven, decision-driven company. We have two and a half billion time-series metrics. If I took a 15-inch MacBook Pro with a Retina screen and I stacked up 90 of them, I could almost get one pixel for every time series that we have. Who's going to stare at that? Interesting experiment, I suppose. We also believe that human beings are entirely too intelligent, and frankly entirely too expensive, to spend time staring at a screen hoping they'll pick out a problem in a complex system. So we invest in data, and we invest in algorithms, and we invest in systems that can comb over all of that data very, very quickly and let us know when there's a problem. And we lead that charge in the organization by this example: there aren't TVs with screens and graphs and so on. I can tell you, as the person taking the picture, it's not like they're behind him; we weren't trying to be deceitful. So we are data-driven operationally. And this: I work on the team that is responsible for "you press play and it works," or if it doesn't work. That's our work environment; we're data-driven as well.

So, as I said, we don't do DevOps. What we do is focus on our culture. I've mentioned a few words and phrases that are important in the Netflix culture and that we bandy about: we talk about things like freedom and responsibility, context over control. Even in our hiring process, if somebody's going to
come in and see if Netflix is a good fit for them, the first thing we always tell them to do is to go through our culture deck and make sure that the way we do things and the way we think about things is compatible with the way you think about things. There have been some people I've had the privilege to interview who were brilliant, but they would not have been a fit in our culture at all, and we value our culture and its benefits so highly that we will pass on people who would have a negative impact on it.

So the result of the Netflix culture really looks a lot like DevOps, but the important thing is the focus. DevOps is a wonderful result of a healthy culture and healthy thinking. If you have problems, you can't just hope to take the DevOps cream and rub it all over things and all of your problems will go away. Surprisingly, it doesn't work that way. Now, other people have said this to me (not you, of course): "It sounds very nice, Dave, but it won't work where I work." And I'm always curious, so I have to ask some questions about why this thinking or this methodology doesn't work, and invariably I end up making some blunt statement like this: you don't have a DevOps problem, you have a culture problem. If you want the benefit of DevOps, or whatever the new phrase will be for this change in mentality in the way that we've done IT and engineering over the years, you have to address your culture. Just giving people DevOps titles won't fix the problem.

If you're curious about that, the slide deck that talks about some more of these things is about 127 slides right now, I think. You can find that at jobs.netflix.com. "Will it go through all the points of the culture and all the things that we value?" Well, I'm glad you asked. There's a lot of information about people we'd like to have come play with us at jobs.netflix.com. I have a couple of cohorts with me: Brian and Blake are here with me from Netflix. We're hanging out at our table out
there. Now, we're a little different than some other people you may traditionally see at conferences. We don't have anything to sell you. The first month of the service is free to you and literally billions of other people. All the software we make, we give away on GitHub. But we do have some really nice swag that we like to share with people, and we love answering questions and talking about what we do, how we do things, and why we enjoy it so much. So please come by and chat with us. I'm Dave Hahn. I'm a senior SRE on the CORE team at Netflix. That's where you can find me out on the Internets.

[Audience question] So our call centers exist in two pieces. The question was how much of this culture exists in our call centers, and I'll let Brian holler at me, since he knows this information a bit better than I do. We currently have, I believe, 21 call centers around the world. Close enough? He's not correcting me. We have 21 call centers around the world. Nineteen of those are actually outsourced to business partners, so the culture there is dictated by what the business partner believes is most advantageous. We have recently opened two Netflix-branded and Netflix-owned customer service centers, one in Utah and one in Yokohama, Japan, that will start doing more and more of our customer service work, and our culture is just as important there as it is in Los Gatos.

[Audience question] Sure. So the question is, how does this whole freedom and responsibility set of verbs deal with things like PCI audits? We do have a small segmented area inside of Amazon where we do all of our payment processing, and if you want access to it, you just have to go ask. So it's still rather compatible, within reason, so that audits are reasonable, but it's still under the same flag. We run all the same tools, all of those kinds of things.

Any other questions? One over here, yes. "How does our culture affect our interview process?" Did I hear that correctly? I like this question. So, as part of our interview panel,
we have people on our talent team who actually specialize in feeling out whether or not someone is a culture fit, so that's always tested, because we really don't like it when we bring somebody in and hire them and change their lives just to go, "Sorry." So it's a very important part. We also have a very free-flowing feedback environment. You are expected, if you work at Netflix, to give people feedback all the time, to the point where people come pull it out of you if you're not providing it. We do the same thing in our interview processes, and I've said to people during interviews, "You know, I like you a lot, you're a great engineer, but you don't want this job." So very much so, even during the interview process. It's very important, because if we're going to bring them in, they have to be able to work with people like me.

[Audience question] So I'll answer the second one first, because that's easier. I mentioned that we're kind of free-flowing on languages, and the question was how things like templating standards are enforced. They're handled team by team. That team is responsible for that; it's that "you build it, you run it" idea. There are teams that have developed everything from coding standards to template standards to things that work well for that team, but they're not pushed throughout the rest of the organization. So it's a team-by-team question. Then your other one was the two-pizza, three-pizza Amazon kind of thing? I'm not familiar with that one at all, so if you can tell me what you mean, I'd be happy to answer. Otherwise, I eat, so sounds great. Yes, two, please. OK, so the average team size at Netflix, last I counted, was 11. Now, we do have those teams grouped together. For instance, the operations engineering team I mentioned is 70 people; that's broken up amongst, I think right now, six teams. So maybe not intentionally the same, but very much the same kind of model. We don't have large teams.

Let's see, I think Jason's going to push me off the stage, so I will be back at the Netflix table, happy to answer more questions. If you'd like
intelligent answers, Brian and Blake will be there as well. Thank you all so much. [Applause]
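
Editorial sketch: the pipeline the talk describes for Spinnaker (publish code, require the smoke tests to pass, run canary analysis to look for a performance regression, and only then let the build take traffic) can be illustrated roughly as below. This is a minimal Python illustration of the gating idea only, not Spinnaker's actual API or pipeline format. The names `run_pipeline`, `canary_regressed`, and `PipelineResult`, the use of mean latency as the canary metric, and the 10% tolerance are all assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List


@dataclass
class PipelineResult:
    """Outcome of one pipeline run (hypothetical type for this sketch)."""
    deployed: bool
    reason: str


def canary_regressed(baseline_ms: List[float], canary_ms: List[float],
                     tolerance: float = 0.10) -> bool:
    """Return True when the canary's mean latency exceeds the baseline's
    mean latency by more than `tolerance` (10% is an assumed threshold)."""
    return mean(canary_ms) > mean(baseline_ms) * (1 + tolerance)


def run_pipeline(smoke_tests: List[Callable[[], bool]],
                 baseline_ms: List[float],
                 canary_ms: List[float]) -> PipelineResult:
    """Gate a build: every smoke test must pass, then the canary must not
    show a latency regression. Only then is the build marked deployed."""
    for test in smoke_tests:
        if not test():
            return PipelineResult(False, f"smoke test failed: {test.__name__}")
    if canary_regressed(baseline_ms, canary_ms):
        return PipelineResult(False, "canary shows a performance regression")
    return PipelineResult(True, "all gates passed")


if __name__ == "__main__":
    def service_responds() -> bool:
        # Stand-in for a real smoke test against the new build.
        return True

    # Canary latency is close to baseline, so the build goes out.
    print(run_pipeline([service_responds], [100, 102, 98], [101, 99, 103]))
    # Canary latency is about 30% worse, so the build is stopped.
    print(run_pipeline([service_responds], [100, 102, 98], [130, 128, 132]))
```

The point of the sketch is the ordering: a build that fails any gate never reaches production, which matches the talk's description that code which "smells funny" or fails tests never makes it out.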
Info
Channel: Coding Tech
Views: 330,273
Rating: 4.9044943 out of 5
Keywords: devops, devops at netflix, devoperations, servers, software development, computing, cloud computing
Id: UTKIT6STSVM
Length: 29min 33sec (1773 seconds)
Published: Mon Mar 19 2018