The Single Source of Truth for Network Automation

Captions
Okay, thank you everybody. Goodness me, how do you follow Betty? She probably has no recollection of this, but the very first operator community where I gave a technical presentation was NANOG. I prepared some slides, and Betty had a look at them the day before and helped me make them better. She probably has no memory of that at all, but I was looking back through my archive and saw that, yes, this was the first conference I ever spoke at, so thank you for helping me find my voice all those years ago.

This is a presentation I put together because I've worked on a few automated networks, and there's a lot of content out there, a lot of learning and understanding, about how to run an automated network from the point of view of writing software: scripts and systems that go and change your network devices, that go and change what's configured out there. But one of the things I've learned that's very important is that you need to be able to run not just an automated network but an automated business, and have your network influenced by what the business understands. I haven't found an enormous amount of content about having the automated network take inspiration and instructions from an automated business. Because I've worked on a couple of businesses that operate like this, I thought I'd put together some slides about the things I've learned, and you can grab me afterwards and tell me what I've got wrong.

For most network engineers the journey starts when they discover they can talk to their equipment using scripts rather than CLI instructions. A great place to start with automation is producing scripts that give you reports: what's happening on my network right now? You get information out of the network that's actionable and useful. The next part of the journey is taking that learning (as you write software that gets reports out of your systems, you become more confident and more familiar with the tools and libraries you can use) and starting to write scripts that effect a change on the network: something that will automatically bring up peering sessions in a consistent manner, for example. Eventually some of those systems start to work together. You end up with a more complete suite of tools and it feels more like an application; you're starting to manage the network rather than manage a device, and you have something that actually helps you make decisions about rather more than pure configuration. The ultimate place to reach, the next step and perhaps the hardest, is a fully automated and integrated business: a technology business with a set of processes that are actually enforced and delivered by your software, where you stop configuring the network and start configuring the entire product. Generally this has been solved by businesses with pretty large-scale challenges, such as large access providers and hosting companies, but it's now something that a medium-size ISP or an IXP, for example, can implement.
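To make that first reporting step concrete, a minimal script of the kind described above might look like this. It's just an illustration, not something from the talk; it happens to use NAPALM, which comes up again later, and the driver name, hostname and credentials are placeholders.

    import napalm

    # Connect to a device (driver, hostname and credentials are
    # placeholders; substitute your own platform and equipment).
    driver = napalm.get_network_driver("eos")
    device = driver(hostname="switch1.example.net",
                    username="automation", password="secret")
    device.open()

    # An actionable report: which interfaces are enabled but down?
    for name, iface in device.get_interfaces().items():
        if iface["is_enabled"] and not iface["is_up"]:
            print(f"{name}: enabled but down (last flapped {iface['last_flapped']})")

    device.close()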
So this presentation is based on that. These are my technical perspectives and thoughts on the architecture of a greenfield deployment of an alternative business: an internet exchange point as-a-service operator. We build IXes.

So what motivated us to build our network and our product in a fully automated manner? Because we had to replicate it. Our intention as a business is to build lots of these, all identical, and you can't do that without process and software. By getting it right, by having strict but efficient processes, you deliver efficiency and leanness in your business right from the beginning. It helps your customers too, because they get an assured service: a service that works the same way however and whenever it's set up, because it's delivered by the same tools and the same applications. It also allows you to integrate with third-party databases such as the IX-F database and PeeringDB. Some of the reason we do this is that we were quite frustrated with some of the legacy models that had been used, so the chance to align the business and the technical process with an efficient software stack was in our DNA from the beginning. The way this presentation may differ from other automation presentations you've seen is that it mainly focuses on how and why you should build a data model (we'll talk about exactly what that is in a moment), and hopefully we'll also have time to talk a little about how we architected the software, and about software testing from a network point of view.

OK, so what do I mean by "data model"? I've said it a few times now; what do I really mean by that? A data model is a structured description of all the things your business needs to know in order to operate. It needs to describe which people you interact with, which organizations are in your supplier or customer base, and which products you sell, and the technical elements describe configurations and infrastructure: where your switches are, where your routers are. When you start to build your data model, begin by mapping the steady state of the business, when nothing is changing: who are the people, the organizations, the products we sell? What is it that my business knows when nothing is changing? Once you've modeled that, on a very large sheet of paper or a whiteboard, you can then model the interactions between these things. For example, a person who works for an organization requests a product, which is configured in a certain way; a quotation for a product is prepared for a person at a customer. These interactions also need to be modeled.

Why would you care about that from an engineering point of view? Why am I standing in front of an engineering conference saying you need to model quotes in your organization? There's actually a benefit to you as an engineer. Very quickly you realize, for example, that when you're setting up monitoring of customer services, if you've modeled the quotations, the thing somebody intends to buy from your company, you can monitor the thing somebody wants to buy, not necessarily the thing that got rolled out. If you model everything to do with an order, you can take advantage of that at later stages of the engineering process.
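As a sketch of what modeling the steady state plus the interactions might look like once it leaves the whiteboard, here is one possible rendering in code. It is purely illustrative, not the actual schema from the talk, and the entity names are assumptions.

    from dataclasses import dataclass, field
    from typing import List

    # Steady-state entities: what the business knows when nothing is changing.
    @dataclass
    class Organization:
        org_id: int                    # surrogate key, unrelated to the data itself
        name: str
        asns: List[int] = field(default_factory=list)  # an attribute, not an ID

    @dataclass
    class Person:
        person_id: int
        name: str
        org_id: int                    # interaction: a person works for an organization

    @dataclass
    class Product:
        product_id: int
        description: str               # e.g. "1G exchange port"

    # Another interaction: a quote prepared for a person at a customer.
    # Modeling this is what later lets you monitor what was sold,
    # not what happened to get rolled out.
    @dataclass
    class Quotation:
        quote_id: int
        person_id: int
        org_id: int
        product_id: int
        accepted: bool = False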
Once you've modeled the information, the people, the processes, the products, you have to understand where this data lives in your organization. Whether you're an established organization or working on a greenfield product or application, you'll probably find that data naturally lives within different teams: invoicing will likely live with finance, information about prospective and current customers might live with sales, and technical information will live in a support or engineering function. So you've modeled exactly what information you need; now model where it is. It's actually fine for data to live in different tools and different databases around your organization; it's much worse if the data doesn't exist in a database at all. What you have to watch out for is the same information living in different databases within your organization and being subtly different each time. For example, you might find that information about an organization lives separately in your finance system, your sales team's tools and your support team's tools, each thinking about the same organization but with subtle differences, and in engineering maybe we don't care about the organization at all, because we just deal with Fred. That's not okay. It's not fine for data to be authoritative in more than one place. Data needs to be authoritative in a single place, and other databases around the organization simply need to link, to index, to that authoritative record. For example, if the CRM is going to be the database where information about organizations lives, that's fine, but the other tools in the organization need to look at that database to get information about that particular type of record.

So when you build your data model and work out where the data lives, it's important to follow some rules of engagement. Store any item of information just once in the organization; the benefit is that if the information is wrong in the place you store it, you only need to fix it in one place and then it's correct across your entire organization. (Storing data once and linking to it is what database people call third normal form; if you want to read more about good design, that's the thing to look up.) When you store a record in a database, it needs a unique identifier which has nothing to do with the data itself. An AS number, for example, isn't a good record identifier for an organization, because an organization can have more than one AS, or might change the AS it uses: an AS is an attribute, not an ID number for the record within your database. And you need to decide where each record is authoritative. Because you're working across teams, it's important to have buy-in across the organization when you're planning this; you have to model your data as a business.

Now, looking at some of the network-specific aspects: when you're modeling your data, be very careful to make sure that infrastructure-centric and customer-centric (or service-centric) data don't live within the same database record. This makes your database much more scalable, easier to maintain, and better from a portability point of view.
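A toy schema pulling these threads together (surrogate keys, attributes kept as attributes, records that link to a single authoritative source, and infrastructure kept separate from services, which the next example expands on) might look like this. It's an illustration only and assumes nothing about the real database.

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
    -- The authoritative record for an organization. The primary key is a
    -- surrogate: it has nothing to do with the data itself.
    CREATE TABLE organization (
        org_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL
    );

    -- An AS number is an attribute, not an identifier: an organization can
    -- have several, or change them, so they live in a linked table.
    CREATE TABLE org_asn (
        asn    INTEGER NOT NULL,
        org_id INTEGER NOT NULL REFERENCES organization(org_id)
    );

    -- Infrastructure-centric data ...
    CREATE TABLE port (
        port_id INTEGER PRIMARY KEY,
        switch  TEXT NOT NULL,
        name    TEXT NOT NULL
    );

    -- ... kept separate from service-centric data. A service merely links
    -- to a port, so it can be moved or restacked without losing history.
    CREATE TABLE service (
        service_id INTEGER PRIMARY KEY,
        org_id     INTEGER NOT NULL REFERENCES organization(org_id),
        port_id    INTEGER REFERENCES port(port_id)
    );
    """)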
A specific example of that separation is to make sure that information about a network port and the network services that live on that port are separate pieces of information. It means you can stack multiple services on a port further down the line, and it's much more portable: you can move services around to a different port without losing the service history, because you've separated the infrastructure element from the service element.

Once you've modeled your data, you're going to realize that in engineering you're responsible for a bunch of information, and there are some interesting database fashions you might read about when trying to select a database to hold it. There are many, but two common types you might consider: the document-store or NoSQL type of database, and the relational database. I've actually worked on automated networks that used both kinds for authoritative engineering data, and I've found that relational databases better suit the needs of an engineering database covering services, ports, configurations, that kind of thing. We tried a non-relational database, a document store, for this at a previous organization, and the developers really liked using it because it was very extensible: you can just extend the document when you want to store new types of data as your service evolves. But I personally find that strictness is a benefit when building a database that describes configurations, products and ports; you get real value from having a strictly defined database for engineering information.

In our current stack we use MySQL for truths about users, ports, services and service states: things that are inherently relational in the set of information we need to remember. We use a time-series database, InfluxDB, for time-series data about those things: port utilization, light levels, error counts, information about a single element that changes over time goes into InfluxDB. And we use some third-party databases that I recommend you have a look at as well, like the Euro-IX database and PeeringDB, which are really good for learning what people are going to do with their connections to your network.
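On that time-series side, writing a port-utilization point into InfluxDB from Python might look roughly like this. It's a sketch using the influxdb client library; the database, measurement and tag names are made up for illustration.

    from influxdb import InfluxDBClient

    client = InfluxDBClient(host="localhost", port=8086, database="metrics")

    # One point per poll: the element it describes goes in the tags, and
    # the values that change over time go in the fields.
    client.write_points([{
        "measurement": "port_stats",
        "tags": {"switch": "edge1", "port": "xe-0/0/1"},
        "fields": {
            "in_octets": 184467440,
            "out_octets": 92233720,
            "rx_light_dbm": -3.2,
            "input_errors": 0,
        },
    }])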
So now we've talked about the model: we've mapped everything we need to know, mapped the interactions between those things, and worked out where they're going to live in the organization. It's time to break out a text editor and write some software. What might the architecture look like? The architecture we've settled on is the one on the screen right now, and the area I find most interesting is the single API layer in the middle. The API in this architecture acts as a gateway to many different kinds of database. We don't use Sugar ourselves, but it's an example of a CRM with an API that can store information about organizations and people; MySQL is where we store information about network elements; InfluxDB is the time-series database; and there's the gateway to the actual infrastructure where our service lives, on switches and on servers. They can all be addressed through endpoints on our single common API.

This makes it really easy for us to develop and monitor the platform, and much easier to make changes to back-end services, because we just write new back-end classes that talk to the different databases behind that single API layer. It also makes it easier to expose data to customers, and that's a really good thing. We learn quite a lot about what's happening on the network. If I learn that an error counter is incrementing on a particular port, why not make that information available to the customer via the API, so that if they build tools against our system they can see what we see? I'm only running services for their benefit, so why wouldn't I share what I know about the services they're getting from us? Having a single API layer that can address all of the information we hold as an organization makes it easy to expose parts of it to our customers.

A single common API layer also means that no matter what the back-end storage or messaging format actually is, whether it's an XML API to an invoicing system, SQL databases, a time-series database, however the back-end data is stored, you can address it in its native format from your API and expose it to your tools, your portals and your customers in a single format like JSON. You might have to talk to one back-end in very ugly XML, for example, but you don't have to expose that horror to your customers: you can give them nice, clean, consistent JSON for every kind of information record you hold.

Here's an example of what that looks like. We have a single API layer, as I mentioned. The JSON document on the left is a description of one of our services: if you have a service with us, you can authenticate and ask our endpoint, what do you know about this service? Here I'm collecting information via a BGP route server running BIRD and from our internal SQL database, different servers, and exposing it in a single JSON format to our customer, despite the fact that none of these back-ends natively talks JSON. On the right is information about a port which is in use by a customer. Here I'm pulling information from the switch, from our internal SQL database and from our InfluxDB database, and again exposing it all in a single format, JSON, to our customer. So a customer can ask a single endpoint for information about their services, and although the back-end information lives in many, many places, a single request to a single endpoint gets them that information in a single format. That's what this architecture allows you to do.
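A skeletal version of that gateway pattern, one endpoint merging several back-ends into one JSON answer, might look like this. It's purely illustrative: the route, field names and helper functions are invented for the sketch, with stand-ins where the real system would query MySQL, InfluxDB or a switch.

    from flask import Flask, jsonify

    app = Flask(__name__)

    # Each back-end gets its own small connector; the API layer just merges.
    def port_record_from_sql(port_id):
        return {"port": "xe-0/0/1", "speed_mbps": 1000}     # stand-in for a MySQL query

    def port_stats_from_influx(port_id):
        return {"in_octets": 184467440, "input_errors": 0}  # stand-in for an InfluxDB query

    def port_state_from_switch(port_id):
        return {"oper_status": "up"}                        # stand-in for a device query

    @app.route("/api/ports/<int:port_id>")
    def port(port_id):
        # One customer request; an answer assembled from several back-ends,
        # always returned as one clean JSON document.
        record = port_record_from_sql(port_id)
        record.update(port_stats_from_influx(port_id))
        record.update(port_state_from_switch(port_id))
        return jsonify(record)

    if __name__ == "__main__":
        app.run()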
So now we've got our data model, and we've got an API, a way of addressing that model in a consistent manner. How do we actually use that from the point of view of a network engineer? Once you have confidence in your data model, once you find that you're using this information through your tools, you're looking at it, you're changing it and it's working, you can start to harness that power for network configuration via templates. And when your data model extends across the business, you can do that with much greater accuracy and devolved control.

What do I mean by that? Here's an example. We don't actually have anybody in the organization at Asteroid with provisioning in their job brief. Our salespeople, or the customer themselves, can deliver exchange ports directly from the quotation. A customer describes what they need, a quotation is produced, and if everyone's happy the port can be delivered from the quote: when the quote is accepted, it gets rolled out. The advantage of doing that is that we deliver not just the exchange port and the related services, like route server sessions, but also our monitoring, so we monitor what is sold, not what gets rolled out. If you've ever had a situation where you're monitoring what got rolled out but the customer actually intended to buy something different: if you use the information in the quote to roll out the product and to set up your monitoring, then you're delivering what was ordered and monitoring what was ordered, not something different.

These things work together in a bit of a fire triangle. You've got structured data, your data model, data that is correct and complete; you've got templates that describe how monitoring, network ports or server configurations should be set up; and you've got a way of rolling that out, the automation, which we'll look at in a moment.

The templating engine we use is Jinja, and I think these slides will be online afterwards, so if you want to look at this in a little more detail you can, or I can share more examples with you afterwards too. The thing I like about Jinja is that it isn't specific to a particular back-end configuration format: it's just a way of generating flat files. If you can express a piece of config as plain text, you can express it quite easily and develop it quite quickly in a Jinja template, and it can take variables from your JSON API. In our case, what does this one do? It's a BIRD configuration snippet that sets up a BGP session. There's the boilerplate that stays the same each time, wrapped around the session, and then there's information coming out of our API; you can see it in the double curly brackets, like the network's AS number and the IP addresses, all coming from our JSON API. Jinja also lets you use programmatic methods within your templates: the top and bottom lines of this config snippet indicate the start and end of a for loop, so I loop over the block as many times as there are entries coming in from the JSON.
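The slide itself isn't reproduced in these captions, but a snippet in the same spirit, a Jinja for loop stamping out BIRD BGP sessions from API data, might look like this. It's an illustrative reconstruction, not the actual template from the talk.

    from jinja2 import Template

    template = Template("""\
    {% for s in sessions %}
    protocol bgp peer_AS{{ s.asn }} {
        local as {{ local_asn }};
        neighbor {{ s.ip }} as {{ s.asn }};
    }
    {% endfor %}
    """)

    # In practice these values would come from the JSON API.
    print(template.render(local_asn=64496, sessions=[
        {"asn": 64511, "ip": "192.0.2.10"},
        {"asn": 64512, "ip": "192.0.2.11"},
    ]))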
For rolling that out, we don't actually have hand-written scripts that roll out configuration; I use Ansible for changing server configuration. The task on the slide rolls out the config from the last slide. Instead of having to write scripts to do that, which would be pretty similar each time, we just manage config files that describe the end state of what we'd like to be configured. There's a bit of config that goes and gets the information from our API, generates the config, rolls it out onto the server, and restarts the daemon. You can do conditional logic nicely in Ansible tasks as well, so you're not limited: you can see that the line starting with "when" indicates that we should only build a route server session if that is flagged as true in the JSON, meaning we would like a route server session to be configured.

What are the advantages of doing configuration through something like Ansible rather than a piece of software you have to write and maintain yourself? Well, you don't need to write or maintain that piece of software yourself: it's already there, it's already complete, and it allows the API layer to stay pretty lightweight. All the API layer has to do is retrieve and update database records, or retrieve and update information from a back-end worker, and that's not difficult to write; you can write it in any language that's familiar to you. There's so much good back-end code in Python that a lot of network engineers become very familiar with Python very quickly, and I chose Python for our API because you get very familiar with it when you're working in network automation. It also means the automation layer is very lightweight: it's just configuration files, the sort of thing we've all been using for a long, long time.

When you're building your data model and your API layer, it's important to make sure the decisions you take allow you to store, retrieve and process business logic as well as purely technical information. Remember to model and build tools for everything, not just the things that touch your network. For example, we have a system called Campaigns that allows networks to suggest where they'd like us to build an IX next, or lets someone support an existing campaign. We model that in the database as well, because it means that when we convert a campaign into a running exchange, there's a bunch of information already in there, already structured, that we can use in the initial configuration. So there's a strong advantage to modeling and mapping everything your organization does, not just the stuff that touches your network.

Now, although I said I'd talk mostly about the data model and the stuff for the rest of the business, no network automation talk would be complete without a little about how we talk to the switches and servers that run our service. When you're running an automated network, at some point you're going to need to write something that goes and rolls out config. What do you need to consider for the messaging, the communications, between your common API layer and the back-end? You need to consider inter-process communication: how are you going to trigger things when the API layer needs to instruct the back-end? You need to consider how you can model state, the state of a network job: did it roll out completely, did it fail, why did it succeed? You should treat things as device independent, so that when your vendor evolves to a new configuration dialect or a new product line, or you want to pick a new vendor in the future, you can do so just by writing a new back-end worker rather than a whole new load of software. And you should consider, if you can, whether your back-end systems cope with device swap-outs if a device fails, for example.

For the inter-process communication there are a couple of approaches, and these are the two I've looked at before; both actually work fine. It can be message-queue based, with a system like RabbitMQ: a system that lets one process leave a message on a queue describing something that must be done ("here's something I would like done"), while another process picks the message off the queue and says, okay, I'll go and do that for you. RabbitMQ is a good one because it has very good support in the major scripting languages, it's pretty fault tolerant, and you can impose strict ordering, guaranteed delivery and high availability.
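As an illustration of that queue pattern, the producer side might look like this in Python with the pika client. The queue name and message body are made up for the sketch.

    import pika

    # Producer side: the API layer leaves a note describing a job to be done.
    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = conn.channel()
    channel.queue_declare(queue="provisioning", durable=True)
    channel.basic_publish(exchange="",
                          routing_key="provisioning",
                          body='{"action": "configure_port", "port_id": 42}')
    conn.close()

    # A back-end worker would pick the job up with channel.basic_consume()
    # and report the job's state back when it finishes.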
But we didn't use RabbitMQ for this project; we built a separate web service layer instead. We have a central API, but when we talk to the back-end servers and switches we go through another web services API. That's because it allowed us to use an exactly identical technology stack to our central API: once we were familiar with the technology, we simply had two platforms that look very similar. It's very extensible, because if it needs to support a feature that isn't in a message queue, it can; we just write the feature. And it allows the system to be decentralized, so that as long as the mothership is available there can be many sites, with many exchanges in front of them and worker processes that are independent of each other. If we lose power in one city, it doesn't affect provisioning or any other part of the system in another city.

When it comes to actually rolling out config onto the network devices themselves, I looked at writing a generic worker system versus a separate worker system per back-end, and in the end we chose, this time, to write a different worker per back-end. This means there's a bit of copy-paste overhead, which is normally a definite anti-pattern (if you find yourself copying and pasting code lots of times, it usually means you've done something wrong), but in this case I decided it was a little overhead that could be justified, because it means that if I need to treat different vendor systems slightly differently, I don't have to treat them generically. If a particular vendor has a particular quirk, I can address it in that vendor's back-end worker without having that hack in all of our back-end workers. I use a system called NAPALM, because it allowed us to continue with Ansible when addressing the network stack as well as the servers; it gave us quite a lot of technology consistency. It also means we can do switch and server swap-outs really easily, because we just run a NAPALM command that says "roll out the whole config", and that's our device emergency strategy done. It comes for free with NAPALM; as it says here, there's no need for a specific software feature, just an operational process that says this is how to do it.
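That "roll out the whole config" swap-out step might look roughly like this with NAPALM. It's a sketch: the driver, hostname, credentials and file name are placeholders.

    import napalm

    driver = napalm.get_network_driver("eos")       # placeholder platform
    device = driver(hostname="replacement-switch.example.net",
                    username="automation", password="secret")
    device.open()

    # Push the complete intended configuration onto the replacement device:
    # load it as a candidate, review the diff, then commit.
    device.load_replace_candidate(filename="switch1_full.conf")
    print(device.compare_config())
    device.commit_config()
    device.close()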
OK. Now, I don't come from a developer background; I come from a network engineering background, and one of the things that has actually helped me write code that's so far been quite reliable is that I learned how to do automated software testing at a previous organization that did automated networking. It was completely new to me at the time, but I'd credit automated software testing as the thing that took me from being a mediocre scripter to someone confident enough to write more complicated architectures.

The important software-testing mantra to learn is red, green, refactor. Red: write the test first, before you write the code, and run it; the test fails, because the code doesn't exist. Then write the code and run the test: the test goes green, because the code exists and you wrote it properly. Then refactor the code to make it more efficient, and run the test again: has it gone red, or is it still green? You start from a working state, so you can make improvements to your code for efficiency, for readability, or to make it work better as a library that you just call, for example. And because you started with the simplest form of that code and proved the test works, you can then write more complicated things and check they still work by running the same tests. Red, green, refactor.

Because we use Python, I use pytest as the framework for testing our back-end. There are lots of methodologies in automated software testing, and as I mentioned, because I'm not from a pure software development background, I found the unit-testing methodology, where you mock outcomes and mock data and then run tests against that mock data, quite difficult to understand, and I often found it pretty hard. But I found that writing integration tests, tests that in my development environment go and do the thing for real and then indicate whether it worked or not, to be quite easy, and they have my back. So if you're like me, you might find the methodology of integration testing easier to adopt than unit testing.

Write lots of tests. Write tests for everything you write, and remember to cover desired exceptions where something important must be refused. The example on the screen now is a test that makes sure a regular user, through our API, can look at their own contact record, and then checks that they can't see all the records. Check that the thing you want to work works, but also check that the exception you want to catch can actually be triggered.
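The on-screen test isn't reproduced in these captions, but one in the same spirit might look like this. It's a sketch using pytest and requests against a development instance; the URLs, token and status codes are assumptions.

    import requests

    API = "http://localhost:5000/api"    # a development instance of the platform
    USER_TOKEN = {"Authorization": "Bearer regular-user-token"}   # placeholder

    def test_user_can_read_own_contact_record():
        r = requests.get(f"{API}/contacts/me", headers=USER_TOKEN)
        assert r.status_code == 200
        assert r.json()["email"]         # the record actually came back

    def test_user_cannot_list_all_contact_records():
        # The desired exception: a regular user must be refused here.
        r = requests.get(f"{API}/contacts", headers=USER_TOKEN)
        assert r.status_code == 403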
Run the tests from scratch every time you write a block of code. It's really lightweight: write something, run the tests; write some more, run the tests. You can even develop a framework where the tests run while you carry on writing the next bit of code, with something like a Growl notification popping up to say, hey, you've broken something, go back. It really does catch things. There's an efficiency you get from writing tests: you spend a lot less time debugging, even though you spend time writing the tests. It's an investment.

This is what an error looks like, and this would have been quite a good one, because it looks like I've made a mistake that broke generating quotes. The test goes red and tells you what you've done wrong, and it's much easier to go back and fix a bug when your own computer is telling you that you've made a mistake here, in this library, in this module, than it is trying to figure something out a week after you rolled out a change, when someone from accounts can no longer send an invoice.

A couple of extra thoughts. There are lots and lots of ways of expressing information to customers, lots and lots and lots of them, and I really can't stress enough the value of expressing the information you know to customers in the JSON format. So many third-party databases, like PeeringDB, are making their information available in JSON, and you'll find that lots of people in this room are writing their own applications, scripts and reporting systems based on looking into those databases. If you, as their vendor, can also express data in the same format, all of your stuff can be plugged into work that other people are already doing. If you have to pick one way of expressing information to customers, JSON is the way to do it.

So what is the single source of truth? It's the data model that you make available through your series of tools. It's under the control of all departments, not just engineering, so that they all have a vested interest in keeping the information accurate. It's used to configure services and network elements, and it's accessible to all departments and to customers as well, so that they can perform more self-service tasks. Having this information in one place and one set of tools has generated so many efficiencies with a significant benefit to our organization; not least, for example, our account managers are able to do some of our first-line support using the tools we give them, which helps account managers do the thing they really want to do, which is help their customers. We've generated so many efficiencies that I can't stress enough how happy I am with where we've got to with the automation work we're doing.

OK, I don't know if there's time for questions, but if anybody has any you can either grab me in the hallway in the breaks or approach the microphones now. Steve, is there time? Yep, there is. So if anybody has any questions, I'd be really happy to take some.

Hi, John O'Brien, University of Pennsylvania. Could you speak briefly about the significant technology alternatives you considered, and why you went with the stack you described?

OK. So there are the vendor-specific controllers you can use to control network state and configuration and roll out network changes, and they give you a head start: you've got to do a lot less writing of software if you use your vendor's system, like Contrail for Juniper, or something like that which takes you further along the network automation path. But I wanted to be really vendor agnostic. I didn't want to be tied to any particular part of the system, and when you write your own API layer and worker stack yourself, you have one hundred percent control of vendor selection in the future. As long as the vendor has a programmatic interface, an API, anything better than telnet and an expect script, then you're not tied to any particular vendor. I happen to really like a vendor, but one day they might upset me, and I hope they don't, but if they do, I'm not tied to their provisioning system, so I can move. In terms of things like OpenDaylight and the software controllers and orchestration utilities on offer: again, I found it just as easy to use NAPALM, and it was less for me to learn. I'd already learned Ansible, which I needed in order to control our servers anyway, and figuring out NAPALM was almost no extra work, so it was a much shorter step up than learning a different open-source controller.
And also, I didn't really find anything out there that allowed me to reach back into the business. If I literally want to be able to express a product and make it roll out from our quoting system, how would I plug that into something which is purely focused on network configuration state or configuration management? I'd have to do a load of work anyway, so I might as well just do this, because then I know the data model is exactly as we need it. I don't know if that answers your question? It does, that's fine; thank you very much for a nice talk. Thank you.

Hi there. Andrew Danforth, Linode. You said you wanted a single source of truth across all departments. Was that an effort on your part to coordinate across departments? In other words, how did you get buy-in from multiple departments to have a single source of truth?

So, I don't make any claims that this is easy in all companies, but I was very, very lucky to be able to do this on a completely greenfield idea on this occasion. One of the reasons we wanted to start this business and offer this product is that we actually wanted to understand the most efficient way possible of delivering IX access as a service, and there's no way of doing that without modeling everything we know and making it repeatable and replicable; it's the only logical way of approaching the product we wanted to build. So I was very lucky that, at the time we decided to build this business, everybody wanted to contribute to a lean, efficient system that works much as I've described it. We had to evolve the idea a little, because everybody had good ideas, but there was so much buy-in, based on the fact that we were frustrated with the legacy model, that everybody wanted to do this work. It's so important to get buy-in from the whole organization; you have no option but to do that. In a larger organization with entrenched processes it will be more difficult, but if you have processes that you can capture in your data model, you're actually already part of the way there: if you're a process-driven organization in sales and finance and so on anyway, that's half the work of figuring out what you need to do already done. So it might be possible, hopefully it's possible, to get buy-in across departments if you're already quite process driven. Thank you. Thank you. I don't see any more questions, so I think that's it; thank you so much, everybody.
Info
Channel: NANOG
Views: 3,529
Rating: 4.86 out of 5
Keywords: NANOG, Tuesday
Id: LCW_ve1lWi4
Length: 40min 35sec (2435 seconds)
Published: Sun Jul 01 2018