Network Automation: past, present, and future

Captions

Starting with a panel session in the next two minutes, talking about network automation. But more than network automation, the team that's coming up as the panelists are going to talk about some of the experiences they've had through their careers, growing up and being in this part of the business, and hopefully give you all some insight as to what to do and what not to do as part of your career path going forward. So without further ado, I want to go ahead and introduce Scott, who's going to run the panel, and I'll just sit on the side and moderate.

Appreciate it, thanks. Hi everyone, how's everyone doing today? Good? All right, I can tell that lunch has taken a toll on you. I must warn you that if I catch you sleeping I'm going to call you out, so just be aware. I won't be up here for very long; I'm going to transition very quickly into the panel, because they're the real stars here. I have just been invited to moderate and guide a little bit. My name is Scott Lowe, and I'm really honored to be able to moderate this great group of individuals. A lot of them you probably recognize by name or may have met in the past. They've all made massive contributions in the areas of network automation and network programmability.

Very quickly, I'll introduce who you have on the panel this afternoon, and then each of them will give you a more in-depth review of their background and bio. We have Kirk Byers with Twin Bridges Technology, David Barroso with Fastly, Jeremy Stretch with DigitalOcean, Jathan McCollum with Dropbox (almost lost that), and then Mircea Ulinic with Cloudflare. Each of them is going to be discussing a particular aspect of network automation. We have some topics on how you get started with network automation and how you get going, and we'll be looking at various components that you might be leveraging within network automation, from Python libraries to
sources of truth, the back-end databases that you will leverage in order to do automation, as well as some other configuration management tools that are going to be discussed. So there's a nice range of topics, and you've got a great level of expertise here to share that information with you. We are going to save some time at the end for questions, so I would just ask that you hold your questions until the end, if you don't mind. There are microphones nearby, so towards the end we'll open it up to Q&A from the audience; you just go to the microphones and ask your question, and one or more of the panelists will respond and try to answer it for you. So, to get things started and kick us off, I'd like to introduce Kirk. Kirk, take it away.

All right, so my name is Kirk Byers. I am a longtime network engineer. (Can we go on to the... already screwed up.) So: I'm a longtime network engineer. I run a lot of Python courses, and I run Ansible courses. I'm a CCIE in routing and switching, emeritus, which means I'm old and I don't want to take the written exam anymore. I also work a lot in open source: I'm the main creator of the Netmiko Python library, I work quite a bit on the NAPALM team, and I also occasionally run the SF Network Automation Meetup.

Today I'm going to talk about a couple of points on network automation. I'm going to talk about getting started, and I'm going to invert this, because I thought it would be fun, and say how we could fail. I'm not trying to hammer on negatives, because it's very easy to be negative and it's hard to do a lot of things, but I thought it would be fun, and it would probably illustrate some anti-patterns from which we can look for positive patterns. At the end, I'm hopefully going to provide some context on resources you can use and places you can go to more proactively get better, whatever stage you're at. So let's dive in and talk about, if we're getting started in network automation, how
and what are some ways we can really mess this up.

One pattern that I see is that people, early on, try to start with high-risk and difficult problems. As opposed to looking for easy wins that give you a lot of value, they look for something that's really hard, really challenging, and carries a lot of risk, where you can cause some serious mistakes. Just to pull one out randomly, let's say we wanted to programmatically program routes on the router; for whatever reason, we picked that problem to solve. That's a hard problem. I don't really want to do that problem. I don't really want to abandon routing protocols and say I'm going to go and dynamically program routes into my router. There are much easier wins that you can get instead of that, and the kind of patterns you want to start looking for are tasks that are low risk but time consuming. If you think about it, operational actions like information gathering are clearly going to be easier than config, and even within config there are certain changes that are much easier than others, and changes that are a lot higher risk than others. So early on you definitely want to be focusing on low-risk, easy, big gains. Across time you can start expanding and building on more and more things, but don't try to do that at the beginning.

Another pattern that I see that's problematic is this all-or-nothing mindset, where you think that in order to do automation you have to do everything: either I automate my entire environment or I don't do anything. This is really the wrong mindset to be in. You need to be looking for small parts. You have some big process you're doing; maybe it's 20 steps. Automate steps 3, 7, and 9, and across time keep adding on to it. But don't think of it as "everything has to be automated or nothing is." Look for these small incremental gains and then build on what you have.

Another
anti-pattern I see, and I've fallen into this as well, is trying to reinvent everything yourself. There are a lot of amazing, intelligent people who have come before us, and they've built a lot of amazing things. Try to reuse the things they've already done; don't try to go and rebuild what's come before you if you can avoid it. Sometimes you get into cases where what exists doesn't solve your problem very well, and then you can build something new, but especially early on, when you're coming to automation, this is not the problem you typically have. So spend that extra 30 minutes looking around: what are the solutions available that do X? Can I reuse one of them? Does it solve 75% of my problem or not?

Another pattern that I see that's problematic, and this gets into how much programming you need to know, is superficially copying things without much comprehension. I think you're going to have a lot of trouble in network automation if you don't know some amount of programming. You can say "I don't want to do anything with automation"; you can be over at that end of the spectrum and have a long and successful career. But if you're going to be in automation, I think you're going to need to know some amount of programming. We could talk about how much programming you need to know, but without it you're going to be in a really bad situation: you're not going to have a core tool available that you need to be successful. I would say this applies if you're using frameworks as well. I'm a lot more familiar with Ansible than Salt, but if you're using Ansible and you don't know some amount of general-purpose programming, you're going to have some serious difficulties with Ansible. Ansible has two programming languages embedded in it: it has its own, and it has Jinja2, and in many ways they're very obscure, at least from a syntax perspective. So I think it's probably naive to think that you're going to get away from programming at all,
even using some of these frameworks.

What other anti-patterns do we see? Failure to learn good debugging processes. In my head, I like to word this as "always have running code." This is probably the most common pattern that I have: I try to make the gap between when I make changes, when I execute, and when I fix things to get back to running code very small. You want a small feedback loop. If you go and write a big amount of stuff and then have to fix all the problems to get it running again, that's problematic; that's pretty hard. I would also add: figure out, with whatever you're doing, how to get a good feedback loop. Maybe this is just print statements, maybe it's logging messages. In certain contexts it's actually harder to get information out of your system. Figure out how to get a good feedback loop and build upon that.

Another anti-pattern that I see is over-engineering the solution. Really ask yourself: how much does performance matter here? How much does this other aspect matter here? A lot of times they don't. Don't solve problems you don't have; don't solve problems that don't exist. You can add a huge amount of unnecessary complexity by doing that.

OK, so a few other anti-patterns. This is a little bit similar to a previous one: whenever you learn something new, try to apply it as quickly as you can. If you want to retain it, if you want to make it useful in your life, then you need to apply it. Find something small, maybe even tangential to what you're doing, and apply it in that context.

Another anti-pattern is being too busy to automate. If you have no time in your work schedule, if you're a hundred and twenty percent booked, you're not going to get any extra time to think about how to make your systems and processes better across time. I've noticed myself fall into this pattern, and it's not a good pattern to be in, where you think to yourself, "I should really spend the two hours here, go and add this automation, and this thing would
be done," but then you think, "I don't have the two hours; I just have to get that done." That's going to happen from time to time, but if that's your constant pattern, if that's always happening, then you're going to go a year, two years, three years down the road and be in the exact same boat you're in today. You're going to be doing the exact same operational tasks and solving the exact same problems.

These last two start to get into slightly longer-term items. First, failing to learn how to reuse your code. This is definitely a more intermediate skill, but programming has this amazing capability: you can build artifacts that you can reuse later. So yes, you can solve the particular problem that you're solving, but you can also build a set of artifacts that help you solve the next problem, and the next problem. You need to learn how to take advantage of the capabilities that any given programming language offers for doing that. A similar item is the failure to use available developer tools. As you start to do more and more automation, you're going to need to add things like Git, code quality checkers, unit testing, and CI tools. These things add some amazing capabilities, and you eventually need to become familiar with them.

All right, so those were some ideas, from a personal perspective, on how people fail. Now I'm going to talk very briefly about devices, that is, some characteristics of devices in the network automation context. One problem, and I've done this as much as most people, is that we make everything different. We make our configs different; we make our vendors, platforms, and OSes different; we make the number of features we're using different. This is going to make automation harder. Sometimes you just have to solve the problem you have to solve, but at least have in the back of your head this question: this variation that I'm adding, is it worth this
difficulty that I'm going to introduce?

The second point here is that we, as purchasers of hardware, need to buy equipment that has a good, usable API. It needs to have a hundred percent feature parity with the CLI. We basically need to get out of screen scraping. I've worked on a screen-scraping library; I'm done. I don't really want to maintain it for the next 20 years. We need to move on.

And then a few other points, not necessarily minor. We need hardware that has a good commit mechanism: we need commits, candidate configs, rollbacks, and diffs. We need good mechanics for how to programmatically make configuration changes on a device. And we need virtual devices for testing. The bottom one is that I think you're going to have trouble with automation at a given organization if you don't have buy-in from management and from a lot of people. You can do a certain amount, yes, but I think you're going to struggle if you don't have larger momentum behind you. And with that, I will transition back to Scott.

All right, thanks Kirk, that's great, good job. So up next, following Kirk's presentation, we're going to have David come up, and he's going to be talking about NAPALM. Here you go.

Hello. That talk was so depressing, Kirk. So my name is David Barroso. I'm a systems engineer at Fastly. Before that I was a network engineer at Spotify, which is where the project I'm going to be talking about actually started. Anyone who is familiar with NAPALM here, raise your hands. Oh, that's a lot of people. Right, so NAPALM stands for Network Automation and Programmability Abstraction Layer with Multivendor support. It's basically a Python library that tries to abstract how you operate your networks. Kirk was mentioning, for example, that if you have multiple different vendors, they operate differently. NAPALM tries to solve that problem: you get a set of methods that manipulate configuration and that get operational data, and no matter which
vendor you're dealing with, those methods are going to behave consistently. NAPALM supports many vendors, or rather operating systems, like IOS, IOS-XE, IOS-XR, Junos, FortiOS, Pluribus; there are a lot of different vendors. It's integrated with SaltStack, and there are also integrations for StackStorm and Ansible, so you can use it with your own code or with any existing framework out there. And it's used in many large networks, like Fastly, LinkedIn, Cloudflare, and DigitalOcean. There are many other companies using it out there, but their legal teams didn't allow me to put them here.

So why NAPALM? If you have tried to automate a network that has multiple vendors, you probably ended up with the spaghetti code you can see on the left of the screen, where you say, "OK, if my network operating system is this one, I have to do this; if it's this other one, I have to do completely different things." It doesn't really scale. You keep writing code just to deal with all the vendor differences instead of solving the problem that you wanted to solve in the very beginning. This is where NAPALM comes in handy: you just solve the problem that you want to solve, because NAPALM is going to take care of all the particularities of all the vendors behind the scenes for you.

Here's an example where we are just getting data from the devices. You can see here that I'm doing it for a Juniper device and for an EOS device. You don't know what's happening behind the scenes, and you don't really care. You just want to know certain data from the device; you say "OK, give me this data," and you get not only the data, but you get it normalized. For example, Juniper might give you the uptime in seconds since boot, while EOS might give it to you as a formatted time, like twelve months, five minutes, whatever. So NAPALM not only takes care of getting the data but also of normalizing it for you. It's the same if you want to change the configuration of the device:
instead of using PyEZ to load the configuration on the device if you're using Junos, you just use NAPALM, and it will take care of the configuration behind the scenes. Similarly for EOS: instead of using pyeapi, you just use NAPALM, and the same code that you would run for one device works on the other one. Note, though, that the configuration has to be syntactically correct for the particular vendor you're dealing with.

Now we also have OpenConfig support, and by this what I mean is that we can actually parse a device configuration and return an OpenConfig object. A lot of vendors keep saying, "Yeah, now we have OpenConfig support," but that's pretty much marketing. You probably have IOS 12 in your network, I'm pretty sure that's pretty common here, and that obviously doesn't have any sort of OpenConfig support. NAPALM can take care of that: it just parses the native configuration and gives you an object. You can also do the opposite: you create an OpenConfig object where you put all your data (in this example we're just creating an interface, setting the description and the MTU, and configuring an IP), and then it can give you the corresponding native configuration for your device.

So in summary, NAPALM aims to allow you to focus on what you want to do, not on how to do it. It also brings OpenConfig support to all those vendors that claim to have it when it's just pure marketing, and to all those old vendors that don't have it. And in NAPALM we don't want to pick sides. There is this war between "should I use Ansible, should I use Salt"; I personally don't care, use whatever you want. We integrate with anyone that wants to integrate with us. And this is all I had.

Great, thanks David. I'll take that. All right, awesome, good information there on NAPALM, and I'm sure that's something that you'll want to dig into afterwards. Coming up next we have Jeremy from DigitalOcean. Jeremy, here you go, take it away, sir.

Thank you. Hi folks, my name is
Jeremy Stretch. I'm a senior network developer at DigitalOcean. What that means is I have one foot in traditional network engineering and another in, I guess, what you might call DevOps, which is basically developing applications in support of network operations. Specifically, you might know me as the lead maintainer of the NetBox open source project, which I'll talk about a little bit today. You might also recognize my name from PacketLife.net, which is a blog I used to maintain.

So what is IP address management? That's what I'm going to be talking about today. Typically, when we talk about IP address management, we're talking about number spaces on the network: not just IP addresses, but prefixes, VLANs, route distinguishers, things of that nature. In practice, when we say IPAM, what we're really saying is anything that's important as a number or as an attribute of the network. Sometimes we'll extend this; you might also see the acronym DCIM, or data center infrastructure management, and that entails things like physical connections, geographic locations or sites, rack elevations, and things of that nature. But for the purposes of what I'm talking about today, IPAM just kind of means all of that.

An IPAM database should function as the authoritative registry in your network. What that means is that any time you need to know something about the network, this is where you should be going. We're very guilty across the industry, for various reasons, whether it be organic growth or acquisitions or whatnot, of having many different sources from which we pull information. You might have some sites that exist in a series of legacy spreadsheets, others might exist in a database, and where we should go kind of depends. Ideally you want to have all of that in a centralized location, a central authority that everyone can both populate and reference. The form an IPAM system takes really depends on your organization and what makes the
most sense for you. The popular solutions are, obviously, commercial and open source purpose-built applications. Other people take it upon themselves to build things in-house, and, I'm sure, a great many still use spreadsheets. Now, can I get a quick show of hands: how many people still maintain either part or all of their IP addressing in spreadsheets? It's a safe space, you can admit it, it's OK. All right, not that many; that's good. A lot of people who have commented to me that they've gone over to NetBox have said, "We're coming from spreadsheets; this is actually enabling us to move off of spreadsheets." That's pretty crazy, because there have been IPAM tools out there for many, many years, with varying degrees of adoption and maintenance. Of course, the final solution is also nothing: some people are in a small enough environment that they just keep everything in their heads, and that obviously doesn't scale very well. The link at the bottom here is a Wikipedia page; go there. It has an entire list of, I think, just about all the known IPAM applications that are out there. There are probably close to 60 or 70 of them in the list.

At DigitalOcean we built our own IPAM system, and I'll take a little bit of time to talk about why we did that. Like most companies that are very young (I've been at DO about three years now), when I started, everything was in one spreadsheet, I'll admit it. We started re-evaluating the approach as we began growing more and more quickly. Obviously, spreadsheets simply don't scale, and as we got more and more people on the team it became more and more problematic to maintain these things. We looked at the market and at what was out there, and obviously there are two routes you can go: there's open source and there's commercial. What we found among open source options were some limitations, including lack of IPv6 support. Some of these applications had been developed many, many years in the past, and the author only really implemented IPv4
because that's all they needed at the time, and then it fell out of maintenance, so IPv6 never really got added. The lack of VRF support could also be explained: many organizations, especially some years back, simply didn't need VRF support, had no need for it in their networks, or didn't have the time to allocate to development, which is understandable. One of our requirements internally was to have something to track physical locations (again, that's the DCIM concept that I talked about), and many IPAM applications simply don't have that feature set. It's just not part of what they offer, which is, again, totally understandable. And of course one of the most common problems in open source is the lack of maintenance. Sometimes things simply fall out of maintenance because the primary developer changes jobs or has a life change, and they simply don't have the time to accommodate development anymore.

On the commercial side we have a different set of problems. The foremost, of course, is that it turns out on the commercial side people often want you to pay for things. Not that I have anything against commercial software, but the licensing structure of some products didn't make sense for us. We're a cloud hosting provider; as such, we have a huge amount of public IP space, as I'm sure many of you in the service provider space do. Unfortunately, when a product is licensed by volume, the vendor wants to capture the value that's provided by the application, which is perfectly fair. However, when you're a cloud provider you have a huge amount of IP address space, but that doesn't necessarily mean the product provides proportional value to you. When we allocate things, we're typically doing it a /20 or /19 at a time, and we're managing that one prefix. So it's a huge amount of space, but it's only one prefix, and some vendors still want
to charge you based on the aggregate amount of IP space that you're managing. So that didn't make any sense to us. Another common thing was that some commercial applications bundle all their features with their products, which is great if you're more on the enterprise side and need things like DHCP and DNS, but from the service provider or hosting side those were extraneous. In that case we'd be paying for something we don't necessarily need. And probably the most concerning bit, for us at least, was that we didn't have a guaranteed opportunity to expand the features. If there's something missing from the product that we need, we wouldn't necessarily be able to add it. We can ask whoever makes the software very nicely, but there's no guarantee we'll actually get what we need to move forward.

So our solution was NetBox. I started developing it internally in late 2015, and it was open sourced in June 2016. I've been blown away by the rate of adoption that we've seen so far. It's a Django application: it's built on the Django Python framework, it's a web application, and it has a REST API. I won't talk too much about it, but I'm going to give you an idea of what it is.

NetBox was developed internally to be our source of truth. You'll probably hear that phrase come up more and more often; Jathan will talk about it as well. The thing that I always talk about whenever someone asks me what I mean by source of truth is this: every network has a desired state and an operational state, and very, very rarely will they ever be the same. By that I mean the desired state is what you want your network to look like, and the operational state is what your network actually looks like. The important question here is: what do you do when they're different? Like I said, they're practically never going to be the same, but what do you do when they're different? You need to have some authoritative source of truth that you
can reference to say, "No, no, no, this is how it's supposed to be; what's here right now is wrong." Because of that, maintaining the integrity of your IPAM data is crucial. This is going to be the top of the hierarchy that defines where everything should be and what should exist on the network.

So if you go to set up a source of truth, you obviously need to get data in there. How do you put data into this? Let's say you find an IPAM solution that works for you, you set it up, and you say, "OK, now what?" Well, you have all this operational data from the network as it exists now, you have some documentation too, I'm sure, and you have your existing spreadsheets or a legacy application that you're migrating from. How do you get data into it? There are a few different ways, depending on the application. NetBox, for one, supports a CSV import. This is great if you're moving from spreadsheets, because it can more or less just take CSV and import it directly into objects in the database. Another common approach is to use the REST API. This can be great, especially if you have a capability for webhooks or another API in your existing tool. You can program things directly through the command-line shell. It's a very, very powerful tool: if you sit down and learn the Django ORM that NetBox employs, you can actually do immense heavy lifting with just a few lines of code. Or you could directly manipulate the database. I advise people against this, because whether it's NetBox or a different tool, you don't necessarily know the assumptions or the validation that the developer has in place, and you risk corrupting the data if you're manipulating SQL directly.

The biggest thing here is: don't blindly copy data from the network into the database. Again, this goes back to operational versus desired state. Just because something exists on the network doesn't mean it's supposed to be there on the network. Here's
an example: this is a 4x10-gig LAG between two switches. There's nothing wrong with this configuration. This will work; you'll get 40 gigs of throughput through this. However, if you were to take LLDP data from these switches and import it blindly into a database, you've now corrupted your database. You say, "Well, it's not a big deal," but what if, months from now, you say, "OK, I want to change this to independent layer-3 links"? Two of them are going to break, because you're working with the assumption that they're connected in sequence. Obviously, it looks like two of them have gotten swapped. It's funny, because after deploying NetBox and using LLDP validation on our network, this is a very, very common problem that I've seen: whoever on the data center team was installing this stuff swapped links, and nobody ever noticed, because nobody actually went back to check, because it was all working.

Why does this fit in with the theme of this talk today, along the lines of automation? Well, if you're going to deploy NAPALM, if you're going to use configuration templates, if you're going to script things out and automate things, that data has to live somewhere. An IPAM is a great place for it to live. Not all of it, but you can get most of it in there. What goes in an IPAM is things like device IP addresses, what platform they use, and what NAPALM driver to use. NAPALM is an excellent tool, but to connect to a device you have to know what it speaks. Is it a Juniper device? Is it a Cisco device? You can put that in your IPAM. When you render device configurations, that data has to come from somewhere. Ideally you can pull it from your IP address management system. You can look at NetBox, or whatever your chosen solution is, and say, "OK, this interface is connected to this other interface, and these are the IP addresses and the VLANs," and you can use it to validate operational state against desired state. For example, NetBox
will act as a NAPALM proxy: by clicking a link, NetBox will reach out to a device, pull LLDP data from that device, compare it against the database, and highlight any inconsistencies. That's one example of using a web interface. If you script that out or do something through the API, you can do that for an entire site at once, maybe daily if you want. That's the kind of gain you get by having that data available. You just have to do the heavy lifting up front to get that into IPAM. Once you have a fully reliable source of truth, these are the kinds of things you can do. Real quickly, the REST API is a great one to leverage. For example, in NetBox you pull up the API for a specific prefix and make a POST request to the available-IPs endpoint, and it will automatically provision a new IP address within that prefix for you.

So, in summary, real quick: pick an IPAM solution that meets your needs and your budget. Always protect your source of truth. Always validate data before you put it in there; there should always be a human at some point who looks at the data before it goes into the system. And leverage APIs where you can, to tie in not just the devices, and not just for scripting purposes, but also other existing systems that you might have. OK.

All right, excellent. Thank you, Jeremy; lots of great information there. Continuing on the theme of the importance of having a network source of truth, we have Jathan McCollum from Dropbox to come up next. Here you go, sir.

Hello, I'm Jathan McCollum. Let's see, how does this thing work? All right. So I am a network reliability engineer at Dropbox. What the hell is that? We pretty much formalized NetDevOps as a role at Dropbox, and we're very spiritually connected to the SRE organizations, which you may be familiar with, which is basically a formalized DevOps role. What does that mean? It means that we take software very seriously at Dropbox, and
some of our new architectures are completely driven by software, which is going to be pretty cool, but we'll get to that later. I'm the maintainer of Network Source of Truth, which is something that we open sourced at Dropbox not long after I started there, and I took over that project. Network Source of Truth is almost everything that NetBox is, except it's more stripped down. We intentionally let Jeremy go first so he could get that out of the way, so I can contrast NSoT's approach with NetBox's. I also maintain another project called Trigger that you may have heard of. It's a network automation toolkit that way outdates Netmiko and NAPALM, and also shows its age in some ways, but it also does a lot of things that they don't. That's not what we're here to talk about today. I was a net eng at AOL for a very long time, almost 13 years, and then I went to Salesforce and ended up at Dropbox.

So what is NSoT? It's a source of truth database. NSoT is much more focused on inventory as a concept. What does that mean? IP address inventory, device inventory, interface inventory, circuits, and coming soon, protocols. What we're really focusing on here is the ability to both declare and define what you want your network to look like, and then use that data to drive configuration and whatever else, in similar ways to what Jeremy was just talking about. But most importantly, NSoT is API first, and that's one of the big differences from some other traditional approaches to network inventory and IPAM, and I'm going to talk about that. Everything in NSoT is done using the REST API. That might sound terrifying at first. There is a web interface, but the web interface also uses the API. Nothing in NSoT has special database access, and that is an explicit design decision that drives the way we do input validation and the way you interact with NSoT all the way through. So, for example, if you wanted to go and put an
IP address in, you're creating your IP address entry, and you give it a crappy IP address that's not real, it's going to barf at you from the server side, not the client side. That's a very important relationship to have with the application, because it allows you to customize your interfaces very well, and you know that if you get an error, it's always going to come from the server. The errors in the API for NSoT are uniform all the way from the top to the bottom, so the errors look the same even if you're working with, say, an interface versus a network. There's also a reference implementation of the API, which is the CLI; that's a Python library as well. So if you're a Python shop, it makes it very easy to get started using NSoT without having to worry too much about actually interacting with the API, because we've already done that for you. Similarly to NetBox, NSoT is also a Django application. We're using Django REST framework, which gives you a browsable version of the API, which is also very handy for engineers.

OK, so there are a couple of things that we really wanted to focus on, and one of which, I'm going to jump ahead a little, is feature parity across all the user interfaces that we are publishing. The problem is that, just kidding: right now the CLI is the perfect reference implementation of the API. Everything NSoT can do can be done from the CLI. The web UI has languished a little bit, and that's just because we had someone shift teams and their core strengths went with them. We are talking about rebooting the web UI. So if you go and install NSoT today, you will find the web UI does not do everything you want it to do, especially because networks are one of the core strengths of NSoT, and the web UI for NSoT is a little lacking in that area. But we'll get there. One of the core principles here for NSoT is we want it to be really easy to
set up and use, and literally, if you're familiar with anything in Python, it's really easy to set up. Another thing is we wanted it to be really easy to get your data in and get your data out. With most traditional IPAM solutions out there, a lot of the open source ones and especially the commercial ones, good luck with that. I'm sure that's why a lot of people use spreadsheets, right? Because you can do whatever the heck you want with a spreadsheet. But we also wanted it to be flexible and customizable, especially for our environment at Dropbox, where we move very quickly and things change. So NSoT has this attribute-value system. It's very highly optimized, and it also allows you to do almost whatever you want with NSoT. For example, the device objects in NSoT, which I'll cover in a minute: you could use them for hosts if you wanted to. It doesn't care, and we actually have seen people doing that; I'll talk about that too in a second.

Last thing is loose coupling. What does that mean? We want to be able to horizontally scale any part. We don't want to be tied to PostgreSQL, because we're a MySQL shop. You can use SQLite if you want to; some people are using SQLite just because it's easy. You can set up whatever web front ends you want, you can do caching; there are all kinds of extra things that you can do with NSoT that don't come out of the box.

The data model is pretty linear, and we're going from the top to the bottom here. NSoT is really invested in being a desired-state vehicle. You can use it for discovered data if you want to; we've seen people do that, but that's not what we're using it for at Dropbox. So, sites: we have them at the top level. They might not be sites the way you think of them; they're namespaces. With namespaces, basically, you can have overlapping objects. You can have different groups of network objects, for example, that conflict with each
other. At Dropbox we're a one-site shop, so we're not using that feature, but it's there. Attributes and values are really where the flexibility comes in. When I said that NSoT is a bare-bones solution, it is: it doesn't come with any predefined attributes or values. So if you think about a device, what are the most common attributes that you'd probably have? Vendor, hardware type, those kinds of things. You're going to have to create and populate those yourself. The device, for example, by default only has a hostname. Then we go to the network: you connect interfaces to devices, you attach networks to interfaces, you bind interfaces to circuits. These are all things that we do in the real world, and this is exactly how they're modeled in NSoT. Lastly, changes: everything that you're doing in NSoT is logged in a change log, and coming too is the ability to roll back changes and restore objects to a previous state, but that's not really there yet.

I think I kind of skipped around. Like I said, you can use NSoT however you want; it's as minimally opinionated as possible. One of the other really cool things is the ability to do set queries. This is something that at first seems a little weird, but it's actually very powerful. Based on attribute values, you can do unions, intersections, and differences. It starts getting really mathy there, but the way we've expressed it with text is actually pretty cool. I'm not going to get too far into that right now; I just wanted to raise the point that set queries are a very powerful way to filter. You could say, like, vendor equals Juniper, minus hardware type equals switch, to get all Juniper devices that aren't switches, for example. It's a very expressive way of filtering queries to look things up.

That's kind of it. This was meant to be a hard and fast introduction to NSoT, but it's fairly well documented, and there's the
Python client, which also includes the CLI utility. We do hang out on the Network to Code Slack, and we're also on IRC at #nsot. And that's it, thanks.

All right, awesome, thank you, Jathan. I'll take that from you. OK, and to wrap things up, we have Mircea, who is going to be talking about some event-driven network automation stuff. And just to remind you, keep an eye on the time.

Yeah, hi, my name is Mircea. I'm a network engineer at Cloudflare. I'm also a member and maintainer of the napalm-automation community, together with David and Kirk. I've integrated NAPALM in Salt, and I'm also an OpenConfig representative, and sometimes I blog at my complicated name dot net. Today I will give a very brief introduction to event-driven network automation, because there's a myth saying that network automation is only about configuration management. I can't agree with Mr. Abraham in this context, and I'll try to prove how false this affirmation is, because we have plenty of events happening around our networks, and they can be internal or external.

Some of the most common ways your network is trying to communicate with you internally are SNMP traps, syslog messages, and the new kid, streaming telemetry. It's millions of messages your network is trying to communicate to you. Let's take the first two, for example. SNMP traps have been there for years, but most people have been using them only as notifications, like: hey, you have an interface down. OK, thanks, I'm going to manually apply a configuration change now, or, in the last few years, I'm going to manually run a command that is going to do what it has to do. But you see, the pattern is still manual. Syslog messages: most people don't use them at all, or they store them on a server somewhere, I don't know where the server is, or some people probably use them as some kind of notifications. But there's plenty of important data in those syslog messages. They can give
you details like an interface is flapping, or the optics level is below the threshold, or chassis alarms, or the NTP server is down so your network is unsynchronized, or simply a BGP neighbor is leaking their entire routing table to you. So do you really want to ignore this data?

About streaming telemetry: while SNMP is about polling data at specific intervals, telemetry is the opposite. The device will push notifications to you when it has something to tell you about, and these notifications are usually structured documents whose hierarchy follows the YANG models. There are two main organizations that write these models: OpenConfig and the IETF. Although I'm a very big fan of streaming telemetry, it's still supported only on very new operating systems. On IOS-XR you will need 6.1.1, while many people are still running 4.something; on Junos, at least 15.1, but that depends on the platform, and there are some platforms that have it only on 17.1, which was released only a couple of months ago. And even so, the variety of features is not that big.

Let's go back to the syslog messages. I have here two snippets of syslog messages from Junos and IOS-XR. Although they present exactly the same notification, saying that an NTP server is unreachable, they look totally different. This is why, in the napalm-automation community, we decided to normalize this as well, as we do with the other parts that David has already presented. napalm-logs is an engine that continuously listens for syslog messages, either from the network devices directly via UDP or TCP, or through different systems like Kafka, ZeroMQ, and so on. Then napalm-logs will compute a document that is structured following the same YANG models from OpenConfig and IETF, and these structured documents are then binary serialized, encrypted, signed, and published over various channels like ZeroMQ, Kafka, and so on. To visualize this: in the center is napalm-logs, which is continuously listening
for messages from the network devices, or pulling them from Kafka, and will publish the structured documents to ZeroMQ and so on, where various clients can connect to retrieve those documents. From the examples I had previously, with those two snippets, napalm-logs will produce a message like this one, structured following the openconfig-system model. It doesn't matter if you received the notification from IOS-XR or Junos and so on; the document will have exactly the same structure. It's system, NTP, servers, and it tells you that 172.17.17.1 has stratum 16, which means it's unreachable.

As I said, there are various clients that can connect to collect those structured documents. You can either connect directly to them, or use a very well-known framework like StackStorm or Salt. Now I'm going to exemplify with Salt. For who is not comfortable with it, Salt is a data-driven automation framework, and anything that happens, if you run a command on the CLI, will go to an event bus. If you, for example, run net.arp to retrieve the ARP table from a device, you will see an event like this on the bus. Each event has a unique tag that identifies the event, and the event also has a body. One of the clients already embedded into Salt that retrieves messages from napalm-logs is an engine called napalm_syslog. It's very easy to configure: you only say, consume messages from napalm-logs running at this address and this port, and it will import them onto the Salt bus. Importing means that it will have exactly the same body, but it will also assign a tag to the event. In this tag, at the top, is napalm/syslog/junos, because the message has been received from a Junos device, then a unique label of the message, NTP_SERVER_UNREACHABLE, and at the end the last namespace is the hostname of the device. The slash goes between two consecutive namespaces.

At the end, you can fully automate your configuration changes. For example, you can uniquely
identify your events using, for example, the reactor system. When you have a message that has the pattern I just exemplified, you can match it with the reactor. For example, you say: match when you have an event having the tag napalm/syslog, followed by anything, then NTP_SERVER_UNREACHABLE, from any hostname, and run this command, which is basically the equivalent of what you would run from the CLI manually to deploy a configuration change. And this can be extended to any kind of event you can imagine triggering in your network. Is the myth busted?

All right, great. Thank you, Mircea, appreciate it. All right, so you've just heard from all five speakers on various aspects of network automation. I'd like to take a moment now and see if anyone has any questions they would like to address to the panelists. So let's start here in the middle. Go ahead, sir.

John O'Brien, University of Pennsylvania. This has been a fantastic panel, and I'm going to try to constrain my enthusiasm for questions. One of the intrinsic challenges that we've had in my environment is that, while there are clear technical measures to facilitate the provisioning of resources, it's kind of hard to make sure that people clean up after themselves, that things are deprovisioned, and that that's reflected in the sources of truth. Can you comment on strategies that you've found successful for that?

OK, let's get Jeremy and Jathan to jump in on that.

I'll take this one for a sec. Delete it from your source of truth. It's not easy. I think one of the strategies is you have to have a good lifecycle, a pipeline of how things get in and how things go out. What we've done at Dropbox, for example: anything that goes into NSoT is tied to physical hardware. We're still not good about deleting things from NSoT yet, although I'd like us to be. So what we do is we have an attribute in NSoT that represents the actual state of your hardware. Right now it's still a
little bit overloaded, but that's a good strategy, right? You can say the state of this device is decom, versus the state of this device is in production, and then that's something you can filter on if you don't want to delete stuff.

That's actually a really good point. I'll just add that in NetBox, one of the approaches we've taken is work I'm doing right now on a feature called validation reports. It's in the current version, 2.2 beta, if you want to go online and take a look at it. Basically, it amounts to this: remember, you have this data available, so you can leverage tools like NAPALM, which is built into NetBox, and validate the network against it. You can reach out using NAPALM, go talk to a Juniper router, and say, give me all of your IRB interfaces that have an IP address assigned to them. You pull that data and match it against what's in NetBox, and you can flag anything that's not supposed to be there, whether it's there and it's not in the source of truth, or it was and somebody decommissioned it. You can have a report that runs at the end of every day and says, here's everything that's not supposed to be there, every IP address or VLAN that's not supposed to exist on this device. You can email that to your team, or however you prefer to have it delivered. That's the approach I would take.

Awesome, great, thanks. Let's go over here. Yes, sir.

From a beginning developer standpoint, and I'm not trying to start a holy war here: Python 2 or 3? What tools are you guys developing in, and what should we be learning?

All right, Kirk, you want to jump in on that, and then have somebody else follow up?

Yeah, so I think we've really reached the point where Python 3 is at parity. At this point you might as well just do Python 3; you're not going to run into too many obstacles, and for a lot of things, especially for a
beginner, the delta is not too big. So I would just start with Python 3 at this point.

All right, go ahead, sir.

Adam Mills from Roblox. I had some questions for the gentlemen who presented about the DCIM products. Both of those products seem to be very network focused. How do you guys address the 50 or 60 other devices that are in the racks? Do you track those? Do you have to separate and have two different sources of truth for both of those products?

So in NetBox you can import anything. It doesn't have to be a network device: it can be a power supply or a PDU, it can be a console server, it could be a server, anything. You can create as many device types as you want, you can define manufacturers and hardware types. NetBox supports console connections, power connections, and data connections, so pretty much anything that goes in a rack you can put there, whether it's rack-mounted or not. The only real limitation we have right now is around furniture, so patch panels and shelves and stuff like that we don't really do a good job with. Was there a specific type of device that you had in mind?

Just the rack-mounted servers. Right now we're in the process of evaluating a couple of different DCIM products, Device42, you know, some of the big-name ones. Yours are really intriguing based on the APIs and stuff that you guys have. But at previous companies that I worked at, there was a very tight coupling: does the guy who's racking the servers enter the devices properly? That kind of kills automation if the teams don't work together. So I was curious if you guys had broken that, because I know DigitalOcean and Dropbox have a huge server footprint globally: are those in the same system? If you have a failure on, you know, port 43, do you know what that is?

We're getting there.
Right now, in our instance of NetBox, we're only doing the network engineering stuff. However, we're adopting it on the data center side as well. Our DC team is using it; it's just a matter of importing everything into the new database and how we're going to do that. Unfortunately, the one thing you can't automate is rack elevations. You need someone to physically go there and look, take a picture: where is everything in the rack? There's no way to get that data without a human, so that's kind of the slowdown, the trip point, right there. But yeah, you can put everything in there, and you can actually assign permissions to different teams, so that only the data center team can manipulate a server and only the network team can manipulate a router, if that's what you want to do.

Short answer for Dropbox is two systems for that.

Thank you.

All right, go ahead, sir.

Thank you. Markham our third tip is double global. This is a question for David, mainly. I wonder how you deal with those commands that don't fit into OpenConfig for whatever reason. I know that the OpenConfig models are huge and cover a lot of parts of the configuration, but there are some commands that you cannot parse and cannot put into OpenConfig, and I wonder how NAPALM works in that case.

So, if we're going to talk about the YANG integration with NAPALM, I mentioned OpenConfig because that's what everybody knows about, but the idea is that it's integrated with YANG models. It could be an IETF model, it could be an OpenConfig model. If there is something that is not supported by either the OpenConfig community or the IETF, it will be a matter of bringing the topic to them and seeing if they want to fix it, or otherwise just writing our own models for that. So that's fine. There is an example already: in fact, we're using the IP model to manage all the IP addresses associated with the interfaces, and the OpenConfig model
doesn't support the secondary keyword that you need on different platforms. So we extended the existing model with that keyword so we could have it in napalm-yang, because we needed that to be able to do the translation to native configuration on platforms like IOS or EOS.

Thanks.

OK, go ahead.

John O'Brien again, from Penn. Penn is very organizationally decentralized, so even though our enterprise network is called PennNet, you might as well call it BYOD-net, or the network of unmanaged devices. One of my pain points has been trying to monitor and mine MAC tables, ARP tables, and neighbor tables, because of what seems to me to be kind of clunky and inadequate support on the vendor side. Is this a problem that you've encountered, and what sort of solutions do you have for it?

Yeah, there are a couple of ways to do this. You can continuously monitor the size. So basically, are you interested in the size specifically, or in what you have inside the tables?

Well, I've encountered both of those problems. The size: I basically have to ingest the whole thing to count how many entries are in the table. But also just monitoring, over the course of time, which IPs and MAC addresses have been active, and where.

Yeah. For what IP addresses are in the tables, you need to continuously retrieve this data. But if you have too many addresses in these tables, I know for sure that there is a syslog message that tells you that you have exceeded the maximum number you have set, or what is set by default, and you can use this, as I exemplified, with napalm-logs or whatever other tools are available. I think I had a pull request over the weekend: someone just added exactly this, and it will be available for the MAC address tables in the next release of napalm-logs.

And just to add to that, NAPALM does make it easier for you to pull the MAC address table and the ARP table from multiple vendors, in that we have normalized it, including doing the screen scraping on some
legacy platforms, so that you just get basically a data structure back that says, you know, here are your MAC addresses, here's your ARP table.

OK, great, thanks, guys. All right, we have time for one more question. Unfortunately, the other two guys in line, I'm so sorry; I'm sure if you grab one of these guys during the event they'd be more than happy to talk with you. So go ahead, sir.

Jason Belk, network engineer. My question is going off Kirk's comment about being too busy to learn automation. In all of your experience, network operations is an interrupt-driven job, where something breaks and someone pings you, or you see an alert. So how do you do the critical thinking that it takes to work on these more complex problems, doing coding, when at any time you might be pulled into an escalation or something?

Yeah, and I think this is why a lot of the DevOps events talk about organizational things. I think there's a lot here that has to be embraced at an organizational level. An organization has to realize that if they have everybody at max capacity, that's a failure scenario. There are really good books on this, The Phoenix Project and The Goal, where they're exemplifying this pattern: you're operating at a hundred and ten percent, you're in this constant chaos and firefighting mode, and then a year down the road you're still in that. So I think you as an individual can bring it up and try to tell people, hey, we need this extra time so that we can start improving our processes and improving our systems. But ultimately the organization has to embrace that that is net valuable for you to do, and give you that time, or else you're going to have real struggles in order to enact it.

OK, and that's it for time, so let's give these panelists a great round of applause. Thank you.
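
Editor's sketch, not part of the talk: the event tag layout and the "match anything, then the error label, from any hostname" rule described in the event-driven section above can be illustrated in a few lines of plain Python. This is only an approximation under stated assumptions. It is not Salt's reactor code; the hostname edge01, the action name redeploy_ntp_config, and the helpers build_tag and match_reactions are all hypothetical, and fnmatchcase merely stands in for glob-style tag matching.

```python
from fnmatch import fnmatchcase

# Hypothetical reaction table: glob pattern on the event tag -> action name.
# The action name is made up for illustration only.
REACTIONS = {
    "napalm/syslog/*/NTP_SERVER_UNREACHABLE/*": "redeploy_ntp_config",
}


def build_tag(os_name, error, host):
    """Join the namespaces described in the talk with slashes."""
    return "/".join(["napalm", "syslog", os_name, error, host])


def match_reactions(tag, reactions=REACTIONS):
    """Return the actions whose glob pattern matches the event tag."""
    return [
        action
        for pattern, action in reactions.items()
        if fnmatchcase(tag, pattern)
    ]


# "edge01" is a hypothetical hostname.
tag = build_tag("junos", "NTP_SERVER_UNREACHABLE", "edge01")
print(tag)
print(match_reactions(tag))
```

The same pattern matches a tag from any operating system and any hostname, which is the point made in the talk: the normalized error label, not the vendor-specific syslog text, is what the automation keys on.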

Info

Channel: NANOG

Views: 14,230

Rating: 4.8578682 out of 5

Keywords: NANOG 71

Id: aQFbSovedIE

Length: 63min 7sec (3787 seconds)

Published: Tue Oct 03 2017