Self-healing networks using Ansible

Captions
Okay, it is officially 10:15 on Thursday, just so you know what day it is and where you are: you're at the Red Hat Summit. Before we get started I'm going to do something that's really important to me, which is to get a selfie of you guys, so hopefully you don't mind being on social media. Let's see... I got most of you. There we go. All right, the most important thing is done.

So today we're going to be talking about self-healing networks with Ansible. The basics of what we're going to try to establish for you today: we know we're not going to give you the code to go and do it (we'll talk about why that's the case), but what we will give you is some baseline information about what you should be looking for to be able to automate something like self-healing networks. So hopefully you can go with us on this journey a little bit. I don't have a clicker, so I'm going to be walking in front of you like this a lot; you're going to have to get used to me on that. Now I'm going to let my buddy John introduce himself.

Hi, I'm John. I've been with Red Hat for a little over a year now. I've been working with Ansible since day one, back when Michael DeHaan was writing it. I'm from Raleigh, and I've been attending the Triangle-area user group meetings with Red Hat folks for 15 years now at this point. So yeah, I'm a big fan of Ansible, and I sort of just landed in this great position of being able to do things with it every day.

Cool, thank you John. He's being very modest, just so you know, which is one of the things I love about him. John is one of the rock stars on my team. He's an architect; he's the guy that comes in and lays things down so that you can be successful in your business. And when he says he's done hundreds of automation engagements, he's actually being modest; I'd say it's more. So don't discredit this dude; this is a really smart dude, unlike myself, who doesn't have any of the experience he has.

My name is Walter Bentley. I'm the automation practice lead for Red Hat Consulting, and we'll talk a little more about what Red Hat Consulting is and what that means. I get the privilege of managing folks like this gentleman, and today we can deliver for you the things that you need. I've been in IT for eighteen years. I call myself a "New Texan", which is a title I made up, so you can't use it, you can't steal it, it's copyrighted: I am a New Yorker that now lives in Texas, and that's how that came about. Cloud advocate, always a believer in Ansible, always a believer in OpenStack; I don't think that will ever go away. And I am an amateur marksman and motorcyclist, and I do expound on the word "amateur" in a great way. This is my information if you want to follow me on Twitter. I have a lot of stuff on GitHub related to Ansible and OpenStack, if that's something you're into, as well as a blog that I don't keep updated, so don't judge me for that.

All right, so we're going to jump right into the agenda a little bit, just to give you an idea of what we're going to be looking at.
First, we're going to identify some of the things you should be conscious of when you're trying to determine network failures with an automation tool like Ansible. Then we'll go into how the "small guys" are doing it; of course, "small guys" is in quotes. We're going to review how Facebook, Google, Microsoft, and Netflix actually look at and solve their network failure problems, and then tie that information to what you can do in Ansible without the major investment those large corporations have made. You can still accomplish a similar approach, just on a smaller budget, so to speak. Last but not least, we're going to go through some real-life examples of how you can go about doing self-healing networks, or self-healing with Ansible in general, and some of the best practices around that. Then at the end we'll sum it up and show you how we can help if you want the extra help. All right, thumbs up? Everybody's good? Just making sure you're paying attention; I'm going to do that every once in a while.

So we're going to jump right in here: an automation approach to identifying network failures. I love this picture and I use it a lot in blog posts and things I write, because the idea is that one of these things is different from the others. You may ask yourself why capturing metrics for your network is important. It sounds trivial at first; you would say, of course I would capture network metrics. But the question is: how do you determine when something is really broken in a network? Just because you sent a ping and didn't get a response back, is it broken? No, it does not always mean that. Then you have to ask yourself: what's considered normal, network-wise? What would be considered normal network behavior for your network? The only way you can ever get to the point where you can determine what is normal in your network is by gathering what we call good-state metrics. Not just waiting for a failure to occur, but gathering those metrics on a normal, everyday, regular basis is what actually gives you your good state.
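To make that concrete, here is a minimal sketch (ours, not code shown in the talk) of what gathering good-state ping metrics with Ansible might look like. The "probes" inventory group, the target addresses, and the log path are all assumptions for illustration:

---
# Hypothetical sketch: record baseline round-trip times so "normal" is measurable.
# The "probes" group, target list, and log path are made-up names, not from the talk.
- name: Gather good-state latency metrics
  hosts: probes
  gather_facts: false
  vars:
    targets:
      - 10.0.0.1
      - 10.0.1.1
  tasks:
    - name: Ping each target and keep the summary statistics line
      ansible.builtin.command: "ping -c 5 -q {{ item }}"
      loop: "{{ targets }}"
      register: ping_results
      changed_when: false
      failed_when: false          # at baselining time, a lost ping is data, not an error

    - name: Append each result to a local baseline log for later analysis
      ansible.builtin.lineinfile:
        path: /var/log/net_baseline.log
        create: true
        line: "{{ lookup('pipe', 'date -Is') }} {{ inventory_hostname }} -> {{ item.item }}: {{ item.stdout | regex_search('rtt.*') | default('no reply', true) }}"
      loop: "{{ ping_results.results }}"
      delegate_to: localhost

Run on a schedule, a log like this is what lets you say later what "normal" round-trip time actually is for each path.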
There are two statements on the slide that conflict with each other; the first statement is mine and the second one is John's, so we're agreeing to disagree on this and he's going to expound on it a little more. I say a simple number of pings can do the trick, whereas John says a simple number of pings can't always do the trick. What we mean by that is: yes, you can do a simple network ping and get back a good response, but at the same time, do you really trust that ping, and what does it really tell you about what just happened in your network? John, do you want to expound on how you feel about that?

Absolutely, because that was sort of the impetus of this talk: going over how some of these "smaller" companies, which we'll be going through in greater detail, were putting out papers about how they were getting to four nines of uptime on their networking and finding those really hard-to-find edge cases. What you find out is that you can't rely on your network switches not lying to you, basically. A switch may give you back information, and it may even give you back good information, but if the input into the switch is what's actually bad, then you're going to get bad information back. So you have to work not only with the metrics the network switches give you; you also have to be able to create metrics from outside of that environment. You have to be able to measure the network itself from without, and that will give you the metrics you need to move past 99.99% uptime and stability. Network pings and round-trip times can do the trick for 99.99% of the way, but when you start getting past that, you need to be able to hit every route and every node and know that each and every interface you're going through is reliable. That's where this problem came from, and that's what makes it interesting: how do you do that when you can't always trust the network equipment to tell you the truth? So we're going to go through a few of our "smaller" friends and how they did it with their large internal networks.

Yep. So now that we understand that good metrics are really important (I'm not sure why this thing is not advancing... technical difficulty... there we go), we're going to go into the topic of defining failure states. Again, this may seem very obvious: I do a ping, I don't get a response back, that's a failure. Well, the reality is that everyone's failure states will be different. It depends on your network topology, it depends on how you lay it out, and it depends on what you consider latency in your network, because what I may consider latency for my home network, where my two daughters and my wife are all streaming Netflix at the same time, may not be the latency bar for your corporate office, where you have a three or five million dollar application that's making you revenue. The overall idea is that you have to define what your failure states will be for your network; you can't let anyone else define that for you.

The idea, again, is also to seek to be proactive. Don't wait for total failure, and don't use total-failure metrics as your determiner. You want to be able to see things going bad and figure out: if I saw this, and the next phase of it is total failure, then this earlier signal should probably be the benchmark I set for my failure state. Also, make provisions for hiccups and expect abnormality. What I mean is, when you create automation to remediate a network failure, build in some logic for hiccups. Hiccups do happen in a network; a hiccup doesn't mean an ultimate failure, it just means there was a hiccup, or an abnormal result was returned, so build in logic that can handle things of that nature. John, do you have thoughts on that?

I do; that's really where I'm coming from, because everyone's failure states will be different. What you determine to be a failure will be different from what another business unit or another business determines to be a failure. It may be that, as an HFT firm, you need latency guarantees above and beyond, but if you're more about pushing actual bandwidth through, then you need to make sure all your routes are up and that they're consistently up.
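As a sketch of that hiccup-tolerant logic (again ours, with a made-up 50 ms threshold standing in for whatever your own baseline says is normal), a check might retry a few times before declaring failure, then compare the average round-trip time against your benchmark:

---
# Hypothetical failure-state check: retries absorb hiccups, and the threshold
# comes from YOUR good-state baseline (50 ms here is only a placeholder).
- name: Evaluate one target against a locally defined failure state
  hosts: probes
  gather_facts: false
  vars:
    target: 10.0.0.1
    rtt_threshold_ms: 50
  tasks:
    - name: Ping, retrying so a single hiccup is not treated as a failure
      ansible.builtin.command: "ping -c 3 -q {{ target }}"
      register: probe
      retries: 3
      delay: 5
      until: probe.rc == 0
      changed_when: false

    - name: Pull the average round-trip time out of the summary line
      ansible.builtin.set_fact:
        avg_rtt_ms: "{{ probe.stdout | regex_search('= [0-9.]+/([0-9.]+)/', '\\1') | first }}"

    - name: Declare a failure state only when latency exceeds our own benchmark
      ansible.builtin.assert:
        that: avg_rtt_ms | float < rtt_threshold_ms
        fail_msg: "Failure state: avg RTT {{ avg_rtt_ms }} ms exceeds {{ rtt_threshold_ms }} ms"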
Failure states, whether they are round-trip time or packet loss or any of the many, many network metrics we can measure, get thrown into some sort of analytics machine to come up with a basis for what we're going to do now that we're in this state. Is this state a failure state? If so, what does that failure state mean? Does it mean we immediately kick off some automated remediation? Does it mean we escalate to a human to take a look, because it's sort of on the line, it's an environment we're not really using at the moment and it's not getting a lot of traffic? Or is this something that needs to go straight to an actual human actor to decide whether anything needs to be done at all? "Failure state" is a very broad term, and it's intended to be. When it comes to working out what your failure states are, why they're failure states, and how that fits into your business practices, that's where consulting comes in.

All right. So now that we've figured out that we need to collect metrics and determine what our failure states would be, the idea is that we need to analyze those failure states in some sort of real-time fashion to figure out a remediation plan. I know this is a controversial phrase, but failure is not always equal to "go fix it" every time. What we mean is that just because you get a network failure does not always mean you need to remediate it. There may be situations where, in the big picture, it's actually okay that you got that failure, and you can go about fixing the problem in another way, or maybe enable another interface to bypass the problem. All this is meant to say is that analysis has to occur as soon as you get a failure. You can't just react to a failure; you must conduct some sort of analysis on it, and fixing the issue may not always be the optimal course of action. That's the basic message.

That's sort of what I was touching on earlier: a failure state doesn't necessarily mean you're going to move into a fix-it state.

Yep. Okay, we're going to keep it moving. I know we're getting to the crux of it all, but we have to build it up; we've got to lay the foundation for you. So: auto-remediation versus remediation. I like to define words and make sure we all have a clear understanding of what is meant by them. Auto-remediation, also known as self-healing, is the idea that automation responds to an event or a problem by executing actions that can prevent or even fix the problem in real time for you. Remediation is more human-driven: it applies the same type of remedy, but the difference is that it has to be invoked by some sort of human. The idea is that you don't have to rush to auto-remediation right out of the gate. That can be a difficult thing, and you can take baby steps to get there. First have a remediation plan; first figure out how you're going to approach fixing your problem. That's step one. Step two is figuring out how to use tools like Ansible to set up that auto-remediation, and tools like Ansible Tower to help create the workflow to establish it. There are some errors you can catch and deal with automatically, and sometimes you just have to bail out; that's really where we're going with figuring out whether you want an automated process of error catching, or whether it's something that can't be handled in the moment.
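A toy sketch of that decision point, as we might write it (the failure class, task file, and names are all hypothetical): automation fixes only the failure categories it has been explicitly trusted with, and everything else is routed to a human rather than "fixed" blindly.

---
# Hypothetical sketch: failure does not always equal "go fix it".
# "failure_class" would come from your analysis step; all names here are made up.
- name: Analyze a failure before reacting to it
  hosts: localhost
  gather_facts: false
  vars:
    failure_class: unknown
  tasks:
    - name: Auto-remediate only the failures we trust automation to fix
      ansible.builtin.include_tasks: remediate_known_flap.yml   # hypothetical task file
      when: failure_class == 'known_flap'

    - name: Route everything else to a human for analysis first
      ansible.builtin.debug:
        msg: "Failure class '{{ failure_class }}': escalate to a human before remediating"
      when: failure_class != 'known_flap'
      # stand-in for your real notification step (ticket, chat, pager)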
All right, so now we're moving into how some of the "small guys" are doing it, and the reason we felt this was good is that we can learn a lot from what these companies have done. They've invested hundreds of thousands, if not millions, of dollars into troubleshooting the problems they've faced, and it stands to reason that if they've faced these problems, you may be facing them in your organization too. We thought that using their examples and learning from them is a really easy way to pull in the information they've figured out and use it for yourself to solve your own problems.

The first one we threw in here is Facebook, and at the bottom of the slide (we'll figure out a way to share it with you) is the article that goes into a lot of detail about what Facebook did. Facebook described facing what they called gray failures: problems that were not detected by traditional networking metrics. They had traditional network monitoring tools in place, but despite those tools being present, they were still encountering failures the tools weren't picking up. So they decided to create their own system, called NetNORAD, which basically treats the network like a black box. It focuses only on the network, and it does more than a traditional network monitoring tool in the sense that it not only checks and monitors the network, it also troubleshoots the network problem and independently tries to remediate it for them. That's a very high-level view of how Facebook went about solving their problem. Do you have to do this yourself? No, obviously not, and we're going to go over how you can approach some of the things they did.

Some examples of changes they made: using UDP instead of TCP. Their logic was that UDP is simpler, and they were able to embed custom information into the UDP messages that they would use later in their analysis. That's definitely something you can steal and use for yourself; I thought it was a great use case. They also broke the network down into clusters: clusters by DC, clusters by region, then clusters globally, and those clusters are how they set up NetNORAD to gather analytics and determine where they were having a problem. Another two things they used were outlier detection logic and proximity tagging. You may ask, what the heck is that? Outlier detection is basically the idea of applying filters to bad ping returns. Remember what we talked about before: just because you get a bad ping return, it doesn't automatically mean total failure.
If you get some sort of hiccup or anomaly in your network, it doesn't mean you're having a total failure. So they built in outlier detection logic that says: if you get a hiccup from something in the environment, check it again, and check it again, and check it again, and see if the problem went away. If it didn't, then it isn't an outlier; it's a true failure. Proximity tagging is where they tag those UDP requests so they know exactly which cluster a result came back from, compare that at the DC, region, or global level, and from there determine where the failure could possibly be.

Yeah, it's a very interesting article, and I do want to make sure we get that link up, because coming across these articles in succession is what stuck the idea for this talk in my head. These companies (well, more than just these four) were dealing with the same sorts of problems, but from different perspectives and with different metrics for what they considered failure. In the Facebook approach, what they considered failure was not fully utilizing every route. If there was a route that was not being used, or not available to be used, even if they were still getting traffic through, they considered that a failure. So they were reaching through each and every node to make sure every path through that node would be effective and useful at some point, because at some point the best routes being used will go down, and then you need something to fall back on. It's a preventative measure that saves them from an actual failure. The other interesting thing about using UDP instead of TCP is that it strips out all the extra CPU cost for the statefulness of TCP, while working a little bit of statefulness back in so they can figure out which parts of the network were acting up. They have actually open-sourced a tracer tool, written in Go; it's on their GitHub, and it's an interesting little program I recommend you go take a look at if you're curious how they do these sorts of things. They haven't open-sourced all of it, but there's enough for you to get an idea of what they're doing.

All right, now moving into the Google example. They also faced a similar problem with traditional networking; their quote was, "too many times the device either lies or does not tell you the whole picture." Again, this is a perfect example. We know Google has an enormous amount of compute running to handle their business functions, and they realized that traditional network monitoring just didn't cut the mustard. They posed the question: we can run IP SLA processes on each router, won't that solve our problem? And they realized that still didn't solve it, despite the fact that you would think it would let them collect the information they need. John?

Yeah, they were seeing a similar problem, where they were running into routes that would go down, but you wouldn't notice they were down, because they weren't the optimal routes that packets would be taking anyway.
So until you were actually testing the network from the outside, without relying on what the network switches were telling you, you wouldn't know you had a faulty route, and that would only come back to bite you later on, when you really, really didn't need it to: when the optimal routes were already down. Theirs was one of the earlier articles on this, approaching it from a graph theory perspective; if you want to get that far down into it, they have some very interesting math in that document. But really what it comes down to is making sure you're going through every possible route on your network, making sure each one is available, and if it's not, figuring out why that's the case and pushing through to that failure-state question: okay, it's a failure state, what do we do with it?

All right, did you want to grab a picture? You're good? Thank you.

One of the next examples is the Microsoft approach, and I won't go too deep into the detail on this one, because it's practically a dissertation that Microsoft put together. They created something called Pingmesh, which is meant to do large-scale data center network latency measurement and analysis. And when I say dissertation, it's an article I encourage you to read if you ever have a hard time sleeping at night, or if you really just love networking. They're collecting tens of thousands of terabytes of latency data per day with the application they created, and it's amazing to see an organization focus that much energy and time on troubleshooting and fixing a problem. And I love, as you mentioned before, the graphs: how they graph from a server within a rack, to the top of the rack, all the way across their data centers, and overlay those graphs on top of each other to determine where their network latency is. This is PhD-type stuff.

Yeah, it is a great paper. It has a lot on connection matrices and things like that, and it gets into the actual graph theory in some parts, so I wasn't actually falling asleep reading it, but that's also me. The other interesting thing about the way Microsoft approached this problem is that their failure states were different from the previous two we were talking about. The previous two were mostly about failures of routes not being optimized and used, whereas for Microsoft the biggest thing they worried about was latency, and that came in not only at the network level but also at the application layer. They didn't care at what level of the stack it happened; they knew that when you have a program that's acting slowly, you have to be able to show whether or not it's the network. To do that you have to have the network metrics: the round-trip times, the drop statistics, all those sorts of things. Looking at the network as a black box, as we've been saying, rather than believing what the network tells you about itself, is how they're getting all this data, analyzing it, and coming up with the failure states we've been talking about.
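As an illustration of the black-box idea (inspired by the Pingmesh concept, but in no way Microsoft's implementation), a full-mesh probe in Ansible could have every probe host measure every other probe host, so the network is tested from the outside instead of being trusted to report on itself. The "probes" group is again hypothetical:

---
# Hypothetical full-mesh probe in the spirit of Pingmesh (not their code).
- name: Every probe measures every other probe
  hosts: probes
  gather_facts: false
  tasks:
    - name: Ping every other member of the mesh
      ansible.builtin.command: "ping -c 2 -q {{ hostvars[item].ansible_host | default(item) }}"
      loop: "{{ groups['probes'] | difference([inventory_hostname]) }}"
      register: mesh
      changed_when: false
      failed_when: false          # collect everything first, judge it afterwards

    - name: Surface the pairs that lost packets, for graphing and analysis
      ansible.builtin.debug:
        msg: "{{ inventory_hostname }} -> {{ item.item }} failed (rc={{ item.rc }})"
      loop: "{{ mesh.results }}"
      when: item.rc != 0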
Their failure states, though, are based more around actual latency, as opposed to route utilization and dropped packets.

Finally we have Netflix, who sort of took this and then took it to the next level. Instead of just being about network-driven remediation, their view was: first we'll deal with the network problems, but the network problems weren't really the real problems; the real problems were the microservices making use of the network. So they implemented what they called Winston, an event-driven diagnostic and remediation platform, which some of you might also recognize in spirit as Ansible Tower: you get enough data, you get the right kind of data, and you can kick off some sort of remediation on your infrastructure. The infrastructure they were dealing with in this case was mostly microservices running on AWS. They didn't really care that much about latency, because they would just fail over to another region; they were more concerned with overall availability and bandwidth. Latency is obviously not something to completely disregard, but availability was the focus. Again, this goes back to why we said you need to define what your own failure state is, and that's where we can come in as consultants and help you figure out what your failure states would be as a business unit, and then how to remediate those, when you should remediate those, and which things should be remediated at all.

One thing I want to point out here is that Winston has a mechanism where low-level failure states are sent to tier-1 support, and tier 1 then decides whether to take action on it or not. That's something built into Ansible Tower as well: not just tier 1, but any tier of support can be sent a failure state, with a notification saying "you should look at this, and here's a playbook we can use to remediate it." This is where we get into the idea of: we have all this data, we have the failure states, and we have a possible fix, but should we run it, and who makes that call? That's one thing Ansible Tower does very well, and one thing Winston was made to do. And finally, the last thing I wanted to bring up is that when Netflix originally looked at this solution, they came in looking at the existing technology and decided to take what was available but also build on top of it. That works very much into the way we think as consultants: we will work with you within the existing infrastructure you have, so that you have a hybrid approach to get to a state where you're working through your failure states.
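As a sketch of that hand-off (ours, not Winston's code), a detection play could launch a pre-approved Tower/AWX remediation job template through the REST API, so the run stays visible to, and controllable by, the support tier. The hostname, template ID, and credential variables are placeholders:

---
# Hypothetical sketch: hand a detected failure to Tower/AWX as a ready-to-run job.
# The URL, template ID 42, and credential variables are placeholders.
- name: Route a low-severity failure into a remediation job template
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Launch the job template via the Tower/AWX REST API
      ansible.builtin.uri:
        url: "https://tower.example.com/api/v2/job_templates/42/launch/"
        method: POST
        user: "{{ tower_user }}"
        password: "{{ tower_password }}"
        force_basic_auth: true
        body_format: json
        body:
          extra_vars:
            severity: low
            target: "{{ failed_host | default('unknown') }}"
        status_code: 201          # Tower/AWX returns 201 on a successful launch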
So the thing I'm wondering about with all these different approaches (and obviously they're different approaches trying to solve at least marginally different problems): since you can't always trust the switches and the routers, is there a pattern to the typical data collection points? Do they ever talk about that?

That is something that's hard to talk about, because that's the stuff they will not open source. It is the secret sauce. And it's hard to imagine dropping a device on every segment to get a thorough data suite, so it becomes a cost-benefit question of how many cycles you're spending. These are all things that are very pointed in these cases: they're saying, we're trying to find out all this information, but we're also trying not to overload the network, because if you're looking at the network as a black box, then you're going to have to feed it extra traffic, extraneous traffic it's not normally supposed to carry, just so you can diagnose what's going on. So you get a Heisenberg effect: unless you're able to take advantage of some wonderful zero-copy buffers or something, you're really going to have a problem keeping the overhead down; it's not going to be sustainable otherwise. That's also why I was trying to point out that they were looking at different parts of the network for failure, because each of their secret sauces does different things based on what they're looking for.

Yeah, I was kind of hoping there was a broad pattern, but it sounds like the broad pattern is that they find the parts that hurt their business the most and then they test for those. In the case of Microsoft it's latency; in the case of Netflix it's availability; and the secret sauce encapsulates those.

All right, so now we're getting to the part we enjoy the most: the topic of "try not to boil the ocean with Ansible." The idea is that we don't want you to think you need to replicate everything Microsoft or Google or Facebook did with their custom-made applications in Ansible. You don't want to boil the ocean with Ansible, but you can use Ansible to do some really significant things with network testing, and John is going to recap that for us.

Yeah. The great and terrible thing about Ansible is that it will do pretty much whatever you want it to do, and I mean that in a very literal way. It's designed so that you can put basically any modular piece of code into it, run it as a module, gather metrics from it in any language, and then use Ansible's built-in facilities to come up with states and remediation plans for the data you've gathered. A real problem with that, though, is that the scope tends to start creeping. Even the subject of this talk, the more I wrote it, became less about networks and more about remediating everything; the networks were just the best example I had for saying: okay, we're going to look at how we can help you as a business figure out what your failure states are in terms of tech, and then where you need to put an automated remediation or a human actor to remediate a failure. That's where consulting comes in. We can help you with when you should be using Ansible and how you should be using Ansible, because sometimes Ansible isn't the best answer for something. It can do almost everything, but that doesn't mean it should. We like to think of it as the glue that holds most everything together.
In terms of consulting and services, what we offer are ways to look at the white-box metrics, where we're actually polling the network switches themselves, and then, when we stop believing them, to set up a black-box style of gathering metrics, take that collated data, and figure out the state of the network and whether any action needs to be taken. There are mechanisms within Tower that let you run these sorts of scans on a regular basis or on a triggered basis. And when you throw Tower into the mix, you also allow multiple levels of your organization to be actual actors in the remediation process, so you can trust a tier-1 or tier-2 support staff member to deal with a problem without giving them full access to everything. Ansible Tower is designed specifically around that use case: you avoid handing out full control while still allowing people to fix, with some discretion, whatever problems come up. Ansible gathers the data, comes up and says, "okay, I think this is the problem," and sends it off to tier 1 or tier 2; tier 2 takes a look and says either "Ansible, you're crazy" or "no, that's right on the money, this is what we'll do to remediate the problem."

That's really where Ansible comes in. A lot of the companies we've been talking about had a specific use case centered around networking, but we expand this so you can do it with any layer of your stack and any layer of your application development process. We could do ten presentations (which I'm sure you don't want to sit through) about the specifics of each part of the stack, the idiosyncrasies of those parts, and where Ansible can step in using the same sorts of logic. What that helps create is a sort of lingua franca that everyone on your development team, your ops team, and your DevOps team (which we all have now) can look at and say: this isn't really where I work, but I can figure out what they're doing. That is a very valuable way to let business units work together and figure out how to work with each other easily.

Cool. You covered most of what I wanted on that slide. You hit it right on the head. The bigger takeaway is that what we've outlined are some simple steps that Ansible can handle, just to give you a general idea of what you could do with Ansible to start the process of setting up automation to detect and remediate network failures.
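Pulling those steps together, a closed-loop version might look like the sketch below (our illustration, not a Red Hat deliverable): a Tower schedule runs the check, and a rescue block performs one bounded, pre-approved fix before handing off to humans. The "edge_routers" group and the bounce_interface.yml task file are made-up names:

---
# Hypothetical closed-loop check: one bounded self-healing action, then escalate.
- name: Scheduled health check with bounded self-healing
  hosts: edge_routers
  gather_facts: false
  tasks:
    - block:
        - name: Verify the device still answers on its management port
          ansible.builtin.wait_for:
            host: "{{ ansible_host }}"
            port: 22
            timeout: 10
          delegate_to: localhost
      rescue:
        - name: Attempt one known-safe remediation, then stop
          ansible.builtin.include_tasks: bounce_interface.yml   # hypothetical task file

        - name: Leave a trail so a human reviews any recurrence
          ansible.builtin.debug:
            msg: "Auto-remediation ran on {{ inventory_hostname }}; escalate if it recurs"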
So this next slide is a shameless plug; I'm going to give you that warning now. But in all reality, we are here for you. If you're not familiar with Red Hat Consulting and what it is: we are Red Hat Consulting, we will come out, and we will help you with your business process. We're not staff augmentation; we are there to work with you and to give you the power, the skills, and the knowledge, and then hand off to you, so that you can go off and do great things within your organization. You'll get a gentleman like John (they won't all look like John, just so you know, so don't expect John to show up every time), and he can come in, work with you, work with your enterprise architects and solution architects, figure out how you want to approach a problem, design it up for you, and get you on your path to setting up the automation around it. I promise I'm almost done with this slide. One more thing: we are subject-matter experts. This is what John does every day: deal with Ansible, deal with the real-life problems organizations are facing, and help solve them. That's hard to find out in the world, so definitely use us, in the sense that we are willing to mentor you, not just come in, do something, and walk away. We're there to support you throughout the process. All right, my shameless plug is over.

I want to say thank you very much for showing up today; we really appreciate your participation. I do want to open it up for questions. If anyone has questions, I promise I will not answer them; I will let John answer them. Most importantly, I appreciate your time this morning, and thank you again. If you have any questions, this is the time. Oh, we've got a question.

Hi John. You were touching earlier on the, I can't remember the term, the gray failures. Who was it that was experiencing those? Was it Facebook?

Yes, Facebook.

So, typically, when you come into an organization, obviously there are various layers you have to abstract from each other to break things out. I know in our environment we have multiple layers, and that's very prevalent in our network, so there's a lot there. Typically, what do you find when you walk into an organization and you have to address that, using Ansible or some other product? How do you find yourself abstracting those various layers? Do you take a layer-by-layer approach, like, okay, we're going to start with the network as the base foundation and work our way up, or do you go top-down from the application?

Well, you have to understand not only the pieces but the actual puzzle itself. Ideally, you want to understand each part and how the parts fit together, and then move from there. That's not always possible, because in many cases some business units aren't engaged in the engagement, but you work with the information you have. In that case, I do tend to get a little nosy if I see something in part of someone's stack that seems a little off but is coming from another business unit; I'll try to get permission to go talk to them and see what's up, because it seems like something from their side is throwing off the side I'm trying to optimize. But it's never a cut-and-dried thing. That's really one of the big things I was trying to get across: if I were to come into your organization and look at the different verticals and areas of who's running what, each one will have different goals.
Those goals reach toward an overall goal that's different at every company I've been to. So having that context, being on-site, sitting down and getting input from the stakeholders, and actually running through examples of where your problem areas are: that's where we end up addressing them, taking a step-by-step approach with what becomes visible.

I know in our network it's not always visible, so we kind of go by our gut, and sometimes that works, and sometimes it takes a lot of time to track it down.

That is one of the tried-and-true methods: "I feel like I've seen this error before." Every piece of software has its own idiosyncrasies, so at some point you'll see an error that you saw five years ago, and you just hope you remember how you solved it. But that tends to be the way we break it down.

Great, thank you.

Any other questions? Anybody have any other questions they want to throw out there? We're here for you; we're not running away.
Info
Channel: Red Hat Summit
Views: 4,591
Rating: 4.609756 out of 5
Keywords: Ansible, Consulting
Id: ZqvTtFnBV-c
Length: 40min 47sec (2447 seconds)
Published: Mon May 15 2017