Implementing SRE practices: SLI/SLO deep dive - David Blank Edelman - DevOpsDays Tel Aviv 2018

Good morning. My name is David Blank-Edelman. Believe it or not (I kind of don't believe it), I work for Microsoft, where I'm a senior cloud ops advocate, and I pay a lot of attention to site reliability engineering stuff. Let me get a little bit of a sense (let me just switch my thing over here) of how many people in this room know what site reliability engineering is. Okay, that's good. How many people in this room consider themselves to be SREs for a living? Okay, so look around: these are the people you want to talk to if I get this wrong. Your job is to keep me honest, to make sure that I say the right stuff.

For those people who don't know what SRE is, let's do a little bit of a level set, just so you understand where the stuff we're talking about comes from. This is my best definition of what site reliability engineering is. It's a big block of text; I'll read it to you, but there are three important words in it. Site reliability engineering is an engineering discipline devoted to helping an organization sustainably achieve the appropriate level of reliability in their systems, services, and products. If you would like to help me translate that to Hebrew later, I would very much like that; that would be very cool. I would like to be able to show it and say, "I don't know this so well, but this is what it means."

Here are the words from this definition that I want to highlight. The first one is reliability. So my first question for you is: what does this slide mean? Please raise your hand. I mean, I worked really hard on this slide. Yes, sir? Yeah! Holy smokes, that is the first time in a good three years that I've given talks like this in which someone has gotten this right. This is in fact what happens when a PHP app errors out. Or here, I'll put this one up. I tend not to put up Java stack traces if I can help it, just because it causes people in the audience to twitch a
little bit, so we'll go back to the PHP one, because it's nice and soothing. So here is the point: you can have the most amazing application or system or service in the world. You can pour millions, billions of shekels and dollars into your app. It can do so many things: it can walk the dog, it can go get your dry cleaning, it can serve in the army. But if it's not up, it's worthless to everyone, and all that money goes downhill. That's why reliability is the thing we focus on with SRE, as one of the primary things you want to ensure in your systems.

Another important word in this definition is the word appropriate. By appropriate I mean (and this is going to be a little bit of heresy) that it is almost never the case that you need a hundred percent reliability, or that you even want a hundred percent reliability. There are certain exceptions. The things that fly in the air, you really want them to be a hundred percent reliable. The thing that ticks in your chest, you want to make sure that's reliable. But you don't need it everywhere else, and in fact you don't even want to try. There are many reasons for that. One reason is that, chances are, even if you create the most reliable thing in the world, you're going to be using something that has dependencies, and I bet those dependencies are not a hundred percent reliable. Or let's assume that I have an application and it is on the web. I don't know what the ISPs are like in Israel. Are they up a hundred percent of the time? So there you go. If they're not, and you have poured a lot of money into it, and your users and customers can't get to your thing a hundred percent of the time, it's not worth it. There are lots of other good reasons we can talk about why a hundred percent is the wrong goal. So that's why appropriate is important.

And the final thing I want to say, and I'll point this out
later, is sustainably. It is the case that you cannot create reliable systems (and you'll hear me say this again) from people that are burnt out, from people that cannot continue to run your systems. It's just not possible. In fact, it's not even desirable; it's not even ethical to do this. So it's really about coming up with a practice that you can sustain, not something where we go all out all the time and at the end you just have somebody over in the corner who has been extra fried.

Okay, so site reliability engineering is trying to deal with this push-pull, this sort of tug-of-war that goes on in DevOps as well. On this side you have the people who are responsible for creating the software, whose job it is to iterate on the software, to write more, to add features, to make it do things. And on this side you have the group of people whose job it is to make sure that the software runs reliably in production, which means that the less you change it, the better; their job is to focus on operability and keeping stuff running. So you have this tug-of-war, because on one hand you have people whose job it is to change things as much as possible, and on the other hand you have a group of people whose job would be very much helped if things never changed once they were running.

The other thing that I think is really important to say whenever I talk about site reliability engineering is that site reliability engineering is not the next evolutionary step from DevOps. It is not that we were once sysadmins, then we grew up and became DevOps, and now someday we hope to mature and become SREs. That is not the case. In my opinion, site reliability engineering is just another parallel track of operations that is attempting to solve similar problems. So please don't take it that I think that everybody in this room, I hope someday you'll be, oh
sorry, that is not what I hope. I hope that you'll take something from SRE, but I don't think you should expect to grow up and become an SRE.

The best way I know to talk about this in context is to start with a talk that was given at a conference that I helped start a long time ago called SREcon, which is a conference for SREs. At that conference, the very first talk, given on the very first day, the very first time we had it (and now it's been held on several continents), was given by Ben Treynor Sloss, then at that point Ben Treynor; he hadn't married at that point. He was the person at Google in 2003 who came up with the notion of site reliability engineering, and in his talk he put up this slide. You don't have to read all these things. In fact, I'm not even going to let you read all these things. We're just going to focus on some of them, so you have some context.

So let's talk about the first three things that I have highlighted here. The first thing is this notion that you should have an SLA for your service, which is kind of weird, because they don't really talk about SLAs at Google; I don't know why he used that term. You measure and report against that SLA, and you use error budgets, and we'll talk about what error budgets are later. But here's the general idea. Step one: figure out your system, and what in your system you want to measure so you understand whether it is reliable. That's a service level indicator: what is the indicator for reliability? Once you have an indicator, your job next is to come up with an objective: where do I want that indicator to stay, or to stay above or below, or however you measure it? Once you have that, you have a service level objective, and that's exactly what we'll be talking about in this talk. Congratulations, you have these things. Your next step is to put that SLO into place in your monitoring system, so that we can
all look at the monitoring system and go, "Hey, how are we doing? Oh, we're not doing so good," or "Oh, we're doing great." It's important to have a monitoring system that everybody trusts and that is a single source of truth.

Once it's in your monitoring system, you can do things like this. Let's say you have a new version and you're trying to think about whether you should release it. I'll use an example to make it easy. Let's say I want this thing I'm running to be up 90% of the time. It's a terrible SLO, but we'll use it. So, 90% of the time. I look at my monitoring system and I say, "Hey, our thing has been available 95 percent of the time." Well, now I can make the decision: okay, maybe I'll launch a new version. Maybe it's okay to perturb it; maybe it's okay to add a little bit of change to the system. Or maybe I look at the system and I say, "Hey, wait a second, our thing has only been working 60 percent of the time. What's going on here?" And I might choose at that moment to make a different decision about that new release. I might say, "We're not going to release anything new until we figure out what's going on with this." So that's the basic choice.

Okay, so why are these things important? What is the use of what we're going to learn today? How is this helpful to you? The first important thing to understand about SLIs and SLOs, service level indicators and service level objectives, is that they provide a common ground and a common language, so that everybody in the company, everybody in the organization, can speak the same language. We can all say, "Oh, I see, this is how we're measuring reliability. Oh, I see, we haven't been that reliable. Oh, we have." There are no arguments about this. There's no need to translate. Developers understand it, stakeholders understand it, all the VPs understand it. You're speaking the same language, and this
is tremendously useful, though those of us who know about the Tower of Babel know it's not always that useful.

The other thing that is really good about these things is that they allow you to focus on the data. You have a conversation that is based on data, not "Hey, I think we were up a lot in February," or "Yeah, you know, it doesn't feel like things are running so good." That's not the sort of conversation you need to have. You need something that's based on objective data, not somebody's feelings. And the other thing you get is a virtuous cycle. If everybody is paying attention to reliability in the same way, you have the ability to increase your reliability, as we look at it and say, "Hey, not so good. Oh, it's getting better." We know exactly how much it's getting better, and we're talking about it in a way that helps it get better.

There are two things I want to add as small asides, so you don't think I'm promising you something that I'm not. Number one: SLIs and SLOs are not magic. It is not the case that I can walk into a toxic organization, where everybody hates each other and they're all carrying a knife in their boots for when they need to use it in the next meeting, and fix it by having you start doing SLIs and SLOs. I hope you haven't, but we've all worked in situations that are toxic. SLIs and SLOs will not make your environment less toxic. They might help some, but they are not magic. That's the important thing to understand. Your company will not automatically become much more profitable, and your organization might not get more people, just because you do SLIs and SLOs. But it could help.

The other thing to realize is that it's not entirely clear that this is the only way to tell. I just want to be clear about that. I don't want to say I am selling you the one true path. Anybody who tells you
that they're walking the one true path, well, I think you're supposed to kill them. I believe that's the expression: if you see the Buddha on the path. But anyway, just going back to the previous awesome talk; sorry, I had to include it.

Okay, so let's talk about SLIs, service level indicators. The thing to understand about SLIs is that we're talking about reliability, and reliability means a lot of different things if you think about it. The first one, the one everybody thinks about, is availability: is my service up? Can I get to it? That's pretty obvious, and it's probably the thing you'll be dealing with most of the time. But it's not the only thing, and I want to make sure you have a broad understanding of reliability.

The other possibility is that you might care about latency. There's this notion that slow is the new down. Have you heard that before? So you might want to make sure that you have the right level of latency. Sometimes you care about throughput: if you're running, say, a batch processing system, or a pipeline, you care a lot about throughput and you need to measure that too. Or maybe you care about coverage: how much of the database have I processed? How much of the data have I processed? That's a big question. Correctness: maybe you think about this, maybe you don't (I hope you do), but are you doing something to measure whether your systems did the right thing when they processed the data? It's a little hard to write tests for that, but it's not impossible, and sometimes it's crucially important.

Quality is a strange thing; let me explain what quality means. Where I come from, there is a video streaming site (I'm sure they exist here as well) where you go to the site and it says, "Hey, here are the movies I recommend you watch," and then it allows you to watch a movie.
Yeah? Something similar here? Yes? Okay, so there's Netflix here also. Awesome, great. I am actually talking about Netflix. Wait, you have computers? Sorry, sorry. You can tell that I need to spend more time in Israel, just to get the lay of the land.

So, the Netflix way, and this is actually where the example comes from: at Netflix, if they're having a problem with their recommendation engine, the thing that puts up "here's what I think you should watch," they don't say, "Hey, sorry, can't stream, can't even get to the website, our recommendation engine's down, we're done." Instead, they serve data in a degraded fashion. They'll show you the upcoming movies they're about to show, or the most popular movies. If your recommendations don't show up on the page, they don't panic. They fix it, but in the meantime they give you the degraded information. So quality is that measurement of the fidelity of what I've just delivered to you: have I delivered the entire experience, or just part of an experience? And you want to measure that.

Then there's freshness, if you're dealing with a system for which freshness is important, let's say sports scores, or the Eurovision contest. I know you know about that one; that was a great win. Yes, I guess we pay attention too. I pay attention to that in the US;
I don't know about everybody else. But anyway, you might care a lot about freshness. And then finally, if you're in charge of a storage system or a database system, you kind of want to measure durability: if I put a bit in, am I going to get that bit out again later? That's important, and you need to be able to have objectives and indicators around that.

Now let me say the most important thing. People love to take a picture of a slide, and if you're going to take a picture of a slide, take a picture of this one. What I want you to understand is that everything we're talking about here is from the perspective of the customer or the user, be that internal or external. I'll get out of the way, in case anyone wants a selfie with this. It does you no good to measure how hot your CPU is running. Who cares? I don't really care. The question is, from the customer's perspective, am I achieving that reliability? Because we're doing it for a reason. We're not running machines because we like to walk over and pet them, or to pay money to Azure or something like that. We do it because we're trying to serve something and serve somebody. So all of this has to be measured from that perspective. If you take one thing away from this talk, this is a good thing to take away. If you want to take a second thing, I'm happy to do a commemorative picture and sign your picture later if you want.

So here's how SLIs are computed, just so you understand. This is pretty easy. SLIs are basically ratios. An example would be the number of successful HTTP calls over the number of total calls. Make sense? Or maybe the number of operations that completed in less than 10 milliseconds, or the number of full-quality responses, like we were talking about before, or the number of records processed over the number of records total. It's a ratio, so you can probably guess that.
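The ratios just listed can be sketched in a few lines of Python. This is only an illustration, not something shown in the talk: the function name `sli_ratio` and all of the counts below are made up for the example.

```python
def sli_ratio(good_events: int, total_events: int) -> float:
    """Return an SLI as the fraction of good events out of all events."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0 <= good_events <= total_events:
        raise ValueError("good_events must be between 0 and total_events")
    return good_events / total_events


# Hypothetical counts, for illustration only.
availability = sli_ratio(good_events=950, total_events=1000)  # successful HTTP calls / total calls
latency = sli_ratio(good_events=870, total_events=1000)       # operations finished in < 10 ms / total
coverage = sli_ratio(good_events=4000, total_events=5000)     # records processed / records total

print(availability, latency, coverage)
```

Every SLI in the list has the same shape, good events over total events, which is why one small function covers availability, latency, and coverage alike.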
It's not that hard. So let's do this terrible math. If we take the ratio and multiply it by 100, we get a percentage, and we're going to test your math skills in just a second. But here's a key thing I want to say about this: it is really crucial to say how and where you're measuring this in your SLIs. For example, I might say that ratio is "as measured at my load balancer." Does it make sense that I have to tell you where I am getting it from for us to have a really useful conversation? Because you may think it's the load balancer and I may think it's the server, and we are talking about two completely different things. Let's say you have a situation where not all your packets get to your server. If I go to the server and ask it how many successful calls there were, I may not know about the ones that failed, because the server never saw the data. So it's crucial to say that. Or you measure it at the client, if you're doing some sort of latency thing, or in your server log, or as determined by the app, or something like that. It's kind of crucial.

Okay, let's do the terrible math. This is how hard simple SLIs are. Are you ready? I'm going to show you some complex math. Let's say we have 50 successful HTTP calls out of 100 HTTP calls. The ratio is 0.5. Got it? This is how hard it's going to get. So if we were to do this multiplication (everybody have your calculator out?), somebody care to tell me how reliable that is? Yes, there we go: 50 percent availability. That's it. That's how hard that is.

Now, where do these numbers come from? When you're trying to decide where to go get these numbers to measure, well, maybe you go look at the server; maybe it's
the server itself that tells you, or there's some sort of log collector that pulls it up. Maybe I have some sort of client library that reports back to a telemetry system. Maybe it's reported by my application itself. Maybe the front end, the load balancer, tells me these things. Or maybe I get it from the monitoring or test infrastructure we're running. I hope you're running monitoring infrastructure; otherwise we'll have a second SRE talk after this. Here's the thing: it's your call. I'm not going to tell you where you have to do this measurement. It's up to you, but understand the trade-offs. Understand that if I measure it over here, I may not see all the packets; if I measure it at the server, I may not see all the failures, and that might be crucial.

Oh, and the other thing that's down here (I'm sorry, the tiny thing at the bottom) says you have to be able to trust your monitoring system for this to work. If somebody can walk over and say, "You know, I'm looking at the monitoring system and it's lacking the data from Tuesday; I think your monitoring system isn't worth dreck," (sorry, I don't know if I'm allowed to say that on the stage, but it's not worth anything), if somebody can say to you, "I don't trust your monitoring system," then this all goes out the window. So that's a crucial requirement to do these things.

Okay, great. So now let's do service level objectives. The way it helps me to understand service level objectives is to emphasize the last word. So I would like to hear, in your best American accent, the word "objectives." That was pretty good. It was almost like you've been doing it already. I want to appreciate how well you did that; you had the nice nasal quality. If you like, I'll show you my American Hebrew; it has a lot of "chets" in it. So, service level objectives are objectives.
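To make the measurement-point caveat concrete, here is a small sketch of the 50-out-of-100 arithmetic above, with the measurement point recorded alongside the number. The `Measurement` class and the server-side counts are hypothetical; they only illustrate how the same service can report different SLIs depending on where the events are counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    measured_at: str  # where the events were counted, e.g. "load balancer"
    good: int         # events that counted as successful
    total: int        # all events seen at this measurement point

    def percent(self) -> float:
        """SLI as a percentage: (good / total) * 100."""
        return self.good / self.total * 100


# The 50-out-of-100 example, as seen at the load balancer.
at_load_balancer = Measurement("load balancer", good=50, total=100)

# Hypothetical: 40 of the failed requests never reached the server,
# so the server only knows about 60 requests in total.
at_server = Measurement("server", good=50, total=60)

for m in (at_load_balancer, at_server):
    print(f"{m.percent():.1f}% available, as measured at the {m.measured_at}")
```

Under these made-up counts the server reports a rosier number than the load balancer, because it never saw the requests that were lost on the way in. That is the reason for stating the measurement point as part of the SLI.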
What do I want to make sure is true? What is my objective? So let's pretend I'm making an SLO cake, and let me give you the recipe. The first step is: what is it that I'm measuring? It could be HTTP requests, it could be storage checks, it could be operations. In this particular case, I'm going to measure HTTP requests, as reported by the load balancer, that have succeeded. Got that? Okay, so that's the first thing: what is it we're measuring.

Then the question is: what is the proportion? Well, I'm going to say 90%. Again, terrible, though for your app 90% might actually be fine. Other examples: successful 50% of the time; can read the data 99.9% of the time; returns in 10 milliseconds 90% of the time. There we go: 90% of HTTP requests, as reported by the load balancer, have succeeded.

And the last thing you need for a successful SLO is some sort of time boundary: in the last 10-minute period, during the last quarter, in the previous rolling 30-day period. In fact, SLOs are often done over rolling 30-day or rolling 7-day windows, as opposed to "this month," because month lengths change. But sometimes you want them per month, especially for financial stuff.

Okay, so congratulations: your first SLO. Applause, please. And the reason that's so weak is you're thinking, "Is it really this easy?" And the answer is yes, it can be really this easy. I'm not going to show you something where you walk out going, "I could never do that, that's alchemy."

If you want to make it more complex, you can; in fact, you'll eventually find yourself doing that. You'll start doing compound ones, where you say, "Okay, 90% of the reads take place in 10 milliseconds, but 95% did in less than 20 milliseconds." You start making these compound statements where you're trying to model the system in a better way, not just a simple thing. Or you might do things by percentiles, you know, 95th percentile, that sort of
stuff. That can come later. There's no reason why you have to start there, but it will eventually show up when you do this stuff.

Okay, so where do we get these SLOs? Where am I going to get these numbers from? How am I going to determine what the right numbers are? The first possibility is that maybe I already know there's a customer expectation that my latency has to be this low, because I have some monitoring that says I start losing business if it gets any slower than that. Or maybe somebody comes to me and says, "Hey, I want to use your service, and I expect a response from you within 10 milliseconds." Sometimes customers will just tell you, and you'll just know what they are, and that's great; you can use them, and that's fabulous.

Sometimes there's some other data, where somebody out there somewhere says, "I think the number here should be 90%," and you're like, "Cool." You don't know where they got it from, or maybe they don't even know where they got it from, but sometimes somebody will just hand you a number, and you could use that number if you wanted to.

It is also possible to use the current state of the system. You can say, "Hey, we've been running at 90% for the last three months; let's use 90 percent." I want to warn you about doing that. It's certainly possible, and in the absence of any other data it's not bad, but if you do that, sometimes you will box yourself in unnecessarily. Let's say you've been running at 95%, so you say, "I think 95 percent is a good SLO," and then you realize that you just had a really good week, and it's not usually like that. Or maybe the people who are expecting data from you don't need it at 95%, and you could have more room and slack if you wanted to. So, just a warning along those lines.

Okay, so what do we do with this stuff? Sorry, there's also the concept of something called an error budget, so let's talk a little about
error budgets. Okay, so here's error budgets. Let's assume that we've done that previous exercise and we've come up with three different SLOs: one that says the system should be only 60 percent available, or whatever you want to say, this one over here should be 70%, and this one should be 90%. The error budget is simply this extra stuff at the end. Does that make sense? It's this extra slack we have. It's this extra part at the end where I don't care if the system is down, because I have already met my objective. It's this pool of unreliability that I can draw upon. If I say it's okay to be up 60% of the time, and I have been up 62 percent of the time, the rest is, the American idiom would be, gravy, but I don't know what you'd say in Hebrew. Say again? You don't have to translate the word gravy for me, because I bet it's not that. [inaudible] Probably not so much.

So here's the thing: this, plus having some sort of plan or policy for what to do with this extra stuff, makes up an error budget. So what do you do? The questions I want to ask are: what do you do if you exceed your error budget, and what do you do if you exhaust your error budget? And I'm going to give you the only one true answer that anybody can ever give you, and I'm going to be absolutely honest with you, because anybody who tells you anything different from what I'm about to tell you is lying, or at least misinformed. The only answer to these two questions is: it depends. I want to be very clear about that. It really does depend on how you want to do things in your environment and what makes sense for you. But let me give you a couple of options.

So let's say you exceed your error budget. You've successfully been up for more than you expected. Yay, you did it, congratulations. Celebrate! Break out the, you know, the hummus, I guess? Is that right? You just pour it into, like, a champagne cup. Anyway, it's good, right? I
mean, this is a good place to be. So what do we do about this? Well, maybe what we choose to do is release faster. We might increase the release cadence; that's a perfectly reasonable thing to do. Or maybe we will increase the number of features that we're going to include in the product. Sounds great. We can do that because we can now be a little less reliable; we have a little more room to play. We can experiment a little bit more, we can have more experimental features, we can rewrite things and put those experiments in, because now we know that if we perturb the reliability, it's okay. We can simply increase the change in our environment if we've exceeded our error budget. Or, here's the thing: you can also do nothing. You could celebrate and do nothing different. It's a possibility. I just want you to understand that nothing compels you to go do anything. Or, as someone suggests, you could also go change your SLOs and make them more... something, if you wanted to, but I don't think that's necessarily the right thing to do.

Now let's say you've exhausted your error budget. For some reason that slack has gone down, and you are less reliable than you had hoped. Uh-oh. Now we might want to do something about it. We might release a little slower, because now we have to spend a little bit of time instrumenting our system to find out what happened. How did this go wrong? We might want to focus our resources on reliability and instrumentation, going back and looking at how this failed in production. That's probably a good use of your time. You might divert engineering resources to look at that. You might, if you have to, push the big red button, the thing that says: everybody stop, the entire company, the entire organization, stop. We have not
been reliable enough, we are so far down that this is a crisis, and we need to pay attention to it with all our resources. Sometimes that happens, and it happens in lots of companies. It happens at Google, where they will push the big red button and say, "Sorry, no more releases, no more anything, until we figure out what the heck is happening with our SLOs." Or, to go to your point, you might say, "Maybe our SLIs and SLOs aren't correct. Maybe they are too strict and I need to loosen them." That's okay; it's perfectly reasonable. And then again, here's the last thing you could possibly do. [Laughter] It says "nothing." You could choose to do nothing with this. You could choose to say, "This is information." Take it from someone who's been in couples counseling before: one of the things you learn when you speak to your spouse is, "That's information, what you're telling me. I don't necessarily have to do something about it, but now I understand it." We can have a long conversation about this afterwards; we'll have a whole open space about couples counseling if you want.

So, okay, great. Here's the key thing: no matter what you do when any of these things happen, make a plan and follow your plan. Don't be sitting around on Thursday or Friday going, "Hey, we have exhausted our error budget, what do you think we should do?" "I don't know, what do you think we should do?" You don't want to be doing that. You want to have a plan in place, and you want to follow that plan. Feel free to adjust it later, but the key thing to understand is: when this happens, make a plan and follow the plan.

So my question is, before I talk a little bit about how to get further into SRE: would you like to ask questions now, or would you like to know how I suggest you get deeper into the land of SRE? Are there questions? Otherwise we'll save the questions, and then we can have a little question
time and a little bit of discussion about my marriage, whatever you like. Only one marriage, 27 years. Yeah, so it's true, we're great. I'm going to go home and say, "You know, honey, I had a great talk and they applauded for our marriage," and she's going to be like, "Why are you saying this on stage?" But actually she probably won't, because it's been, like I said, 27 years. "What's your secret?" "Do nothing"? Yeah. I would like to suggest humbly that it is, just like DevOps, work. So is that really what we're going to talk about during questions? Because that's not exactly where I was prepared to go, but I can do it. "How many kids?" So great, this is lovely; this is how you know you're in Israel. I have one. His name is Elijah, he is 12, and he has his Bar Mitzvah coming up. It's going to be wonderful, I hope. So now we're all feeling great, I love it. Anyway, if you have a question that is not about me and my family, please, I just ask, pretty please.

Yeah, so the question is about when I say "make a plan, follow the plan." I am saying that is part of the construction of your error budget. You need to have a conversation with the stakeholders that says, "Okay, let's assume we're doing really great; what do we do about it? Let's assume we're doing poorly; what are we going to do?" That becomes part of the actual thing that you write down when you write down all these things, because you'd better be writing this stuff down. You don't want to be thinking, "What did we say, 60%?" "No, no, no, I'm pretty sure I heard 65% in that meeting." You want to have a plan that is written down and that everyone agrees to. And if you look at the plan six months later and it's stupid, like "if our SLO falls below 70% we should run around going 'help me, help me,'" if that's your plan, then maybe you want to change it. I think that's perfectly reasonable. Yes, sir? So the question is
how do you know when an SLO how do you know when you've got the right SLO essentially and the short answer is that's usually as part of a conversation usually you have this conversation you walk up to a developer like this and you say so tell me you have this great application how much do you think it needs to be up how is it going to fail what is important to you or maybe you come over here to this business this person here who runs the company and you say to this person you know what is important to you how much money should we spend so it's one of these it's one of these conversations where it depends where your job is to attempt to get an understanding communally from the stakeholders of what is the right thing how can you tell when we're running because for example you might look at the dependencies on that service and say they're all expecting you to be up 70% of the time so that would be one way you could find what matters to you you could look at drop-offs on point of sale there's lots of ways to determine what's there and you will always be tweaking these things up and down if that makes sense other questions I like coming out here in the audience you guys are you're a beautiful audience any other questions yes sir so the question is how do you deal with client diversity so what I like about what you're saying is that you're looking at or immediately you're starting to think clients and you're starting to think customers so good for you for thinking that way so the answer to the question is I'm gonna get a shirt by the way that just says it depends or I'm gonna walk on stage with like a set of adult diapers that just say depends and just wave it in the air if I have to but the short answer is it depends you might choose to make an SLO that's a compound one that says hey I want to make sure my 95th percentile is this my 50th percentile is this and etc so you might choose to make a more complex SLO to better model what's
important to you so the question is how many so the question is how many SLIs and SLOs do you have per application and system and service want to guess what my answer is anybody here anybody here like if I can if I if I can get the entire audience to say it depends does that help because I can do that here let's try that there you go so the answer the answer is you have to sort of play with it to see what makes sense to you too many and too complex this thing where you can't monitor it and you can't take it can't understand it and you can't understand what the system you're trying to model is does not work too simple the thing that says you know yes/no also does not work right so you have to find the balance that works for you on that system and it depends on the system itself and the complexity of that system I'll take one more actually let's go let's let's do this part here let's talk about how if you decide you want to get into SRE including the people that just walked in pleasure to see you you missed an awesome talk these people in the room will tell you so if you want to get into SRE here's the way I recommend because people usually ask me how do I get started the first thing I'd say is that there are three lovely books from O'Reilly and just as a disclaimer I got to edit one of them the one on the end the lemur book is mine and there are really good ones on how does Google how did Google do SRE how do they think that other people can do it and some other examples and some more how to do that sort of stuff this is what they did kind of how they did it and mine is sort of a book that tries to expand the question you know what's it mean to do SRE one of the important things that you have to know about how do companies that are not Google and Amazon do things how do companies that look closer to what you're used to do things and that's where that comes from I think it's a lovely book if you want to wait for the movie I think that's
perfectly reasonable sign the copies if you got a copy I'll sign it but that's the best I can I don't know how available it is here I know you can get this online electronically easily so the other thing is again this is again a disclaimer because I had something to do with this as well there's a lovely set of conferences that is taking place on three continents called SREcon the next one is going to be in Brooklyn New York which I think is a pretty cool place so come visit if you can you know make sure if you come you tell me so we could we can all get together and talk about that time where we were in the room together and I can't talk about this and the other thing that I want to say is how do you personally start for fun down here there's a link to an online course that I wrote for Microsoft which is free you can you can go get that I'll get out of the way for that the other thing is my experience is people typically start because they have a problem or they have some downtime or they have an epiphany they wake up in the middle of the night and they go hey wait a second I should do SRE but it doesn't usually happen that way usually what happens is they have a bad downtime the other thing you'll need to do is get management support lined up and we can talk about what that means I recommend when you read those books and this is my last line in case you're concerned one of the things that you should do is read the books critically don't look at them like like you know there's some sort of like the like they're so dari right don't be reading them to each other right as if this is the thing that you have to follow right look at them and figure out which makes sense for you in the actual books hang out with other SREs people who have the same problems you do and then maybe you start doing SLOs and error budgets so I love talking about this stuff I'm gonna try to do an open space this thing I hope you want to talk
about it tonight I got no dinner plans so if you have any suggestions let me know and you want to go out I'm happy to go out with a bunch of people and talk about this sort of stuff I want to thank you very much for being such a warm and welcoming audience it's great [Applause] [Music]
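Two ideas from the Q&A lend themselves to a small illustration: the compound SLO that puts a target on more than one latency percentile, and the error budget plan that is written down ahead of time. What follows is only a rough sketch under stated assumptions, not anything shown in the talk. The function names, the millisecond thresholds and the wording of the plan are all hypothetical.

```python
from dataclasses import dataclass
from math import ceil


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


@dataclass
class LatencyTarget:
    pct: float     # which percentile to check, e.g. 95
    max_ms: float  # hypothetical ceiling for that percentile


def compound_slo_met(latencies_ms, targets):
    """A compound SLO holds only if every percentile target holds."""
    return all(percentile(latencies_ms, t.pct) <= t.max_ms for t in targets)


def error_budget_remaining(slo, good, total):
    """Fraction of the error budget still unspent (negative if overspent)."""
    if not 0 < slo < 1:
        raise ValueError("slo must be strictly between 0 and 1")
    allowed_bad = (1 - slo) * total
    actual_bad = total - good
    return 1 - actual_bad / allowed_bad


# The written-down plan, agreed with stakeholders in advance.
# Thresholds and actions here are illustrative assumptions.
POLICY = [
    (0.5, "budget healthy: keep shipping features"),
    (0.0, "budget running low: slow releases, prioritize reliability work"),
]


def planned_action(remaining):
    """Look up the agreed action for the remaining budget fraction."""
    for floor, action in POLICY:
        if remaining > floor:
            return action
    return "budget exhausted: freeze releases until the SLO recovers"


if __name__ == "__main__":
    latencies = [100] * 90 + [400] * 10
    targets = [LatencyTarget(50, 200), LatencyTarget(95, 500)]
    print(compound_slo_met(latencies, targets))
    print(planned_action(error_budget_remaining(0.99, 9980, 10000)))
```

Keeping the plan as plain data rather than as something remembered from a meeting is the point of the sketch: the thresholds can be reviewed six months later and changed if they turn out to be wrong.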
Info
Channel: DevOpsDays Tel Aviv
Views: 7,036
Rating: 4.9722223 out of 5
Keywords: devopsdays, devopsdays tel aviv
Id: dplGoewF4DA
Length: 37min 36sec (2256 seconds)
Published: Thu Jan 03 2019