Cumulus Networks Real World Examples: From A Network Admin To A Cloud Admin

Captions
To kind of bring this back to the not-quite-ready-for-DevOps world: what's it like for a network engineer to move into the Cumulus Networks environment? My background is I've been doing data networking for about 23 years, so anybody that wants to compare notes on bridging CDCnet, or take a walk down amnesia lane about Penril stat muxes, I'm happy to do that. In past lives, based out of Seattle, Washington, I've worked at a number of vendors and web-scale providers up in that area, which has given me the ability to experience campus data networks, cloud data networks, large-scale and small-scale enterprises, private and public service providers, along with being a vendor. So I'm one of the traditional network guys with a stack of network equipment in my basement, from a bunch of different vendor eras, that I mock different things up on. These days that means I've got a couple of Cumulus Linux switches added in there so I can do interoperability between various platforms, just to try things out, or when Dinesh comes over and asks me to take a look at something.

I think we touched on this a little earlier: one of the important things is we really are trying to bring a functional router/switch, a traditional-capability network box. We started in the data center and have moved toward the L2 enterprise sort of model with MLAG, like Dinesh talked about. But the important part is we are unabashedly Linux: the bash shell is your primary entrée when you first get into the box, and then we have all of these add-ons, Quagga and other methods, to add different features and functions into the box. We do a number of the very traditional things: layer 2, STP, trunking, MLAG (our CLAG), routing through Quagga. You can have traditional routed interfaces if you want, dot1q subinterfaces, SVIs on bridges, all the sorts of things that traditional network engineers
are going to be very familiar with. What it's really about is where those things moved. As Matt touched on, it's sort of like the progression anybody has gone through from a version of IOS, to being the Junos guy, to now being, say, the Cumulus guy: it's finding where those things are. As some of my co-workers can probably attest, when I went from being the IOS bigot, where I knew IOS backwards and forwards, and had to learn Junos for the first time, there was definitely swearing coming out of my office and going down the hallway: where is this darn thing? But really it was just understanding where things had moved, not that they were fundamentally different.

So one example is: how do you actually configure IP addresses, a first-hop gateway, and OSPF? In IOS, which has evolved over many, many years, a lot of things are grouped around the interface itself: that's where you apply the IP address, modify the OSPF cost, apply VRRP. Junos came around ten years or so later and broke it out into a little more programmatic model, where interfaces are clustered in one spot and protocols are clustered in another. On Cumulus it's not very different; we started from Linux. In Linux, when you configure the interfaces, you go to the /etc/network/interfaces file and apply IP addresses there. You also happen to be able to launch applications when an interface comes up, so one of the example knowledge-base articles that I wrote talks about how you can install vrrpd if you want to run a version of VRRP: pull it down from the Debian distribution, install it, and after the interface comes up, spawn it as a way to run VRRP. We also have documentation on other protocols you might want to add on, but again it comes back to the interfaces file that you map them to. And if you want to configure your OSPF config, like Dinesh showed earlier, you can go
into Quagga and set your OSPF cost. So really, it's finding where those things have gone, not completely rethinking it, not doing it completely differently. I didn't put arrows on here, but you can kind of see where things moved. The same idea occurs when you start talking about layer 2. In IOS you go to the interface, say it's going to be an access port for a VLAN, define the SVI, and apply the IP address there. The Junos example is an older config, so I can't attest that it's a hundred percent up to date with their latest, since I don't have boxes that run it, but it's the same sort of thing: you define the bridge, map the interface to it, create the SVI or the equivalent, and turn STP on. On Cumulus, that all happens inside /etc/network/interfaces. I'm also showing one of the enhancements we brought in with 2.5, the VLAN-aware bridge. This is how we now get higher scale, up to 2,000 active VLANs on the box. You now have the ability to specify that a port is bridged on the port itself, instead of defining it on the bridge, which is the traditional Linux way to do it. You can also apply certain specific things to the interface that historically in Linux were very bridge-centric, which was okay when you had a handful of interfaces, but at the scale of 64 or 96 ports on a box it's a little unmanageable. So we've been able to enhance ifupdown2 to give you the ability to group what's going on on the interface, as opposed to it all being directly on the bridge.

[Question] At this point, is this how Cumulus recommends doing day-to-day configuration: all in the config files? Because one of the things I saw when I looked at Cumulus that was kind of a struggle is that real-time configuration, using brctl or whatnot, is a different syntax from what goes in a config file, and I think there's a huge opportunity for human error where the syntax differs. How do you address that? That's one place I think a modal CLI differs from the UNIX conf-file format.

It's a very good question. We are in a moment of transition right now. Historically, ifupdown itself was not designed to operate well applying deltas; it was designed around "this is the atomic config I have when I boot, apply all of it." We've made an enhancement in ifupdown2, which we've returned to open source, and the author of the software actually gave presentations, at the Linux Foundation event I believe, about trying to see it adopted into the community. One of the things that gives us is the ability to apply the deltas from that static config file. You can now go into the interfaces file, make the changes you want, have it rerun, and it won't reset everything; it figures out what's different, so you're not resetting all the interfaces in the process, you're just applying the deltas, really nice and clean. The one final thing is that our user documentation now needs to shift to that, because it's been a fairly recent introduction, and it's one of the things we're actively driving. I'm in customer experience, which owns post-sales support, the documentation, user training, and some of the pre-sales stuff, which is more where I am. We've kicked off going through the user documentation and the training we give people in the field to make that the recommended approach, while still documenting the original way, so if that's what your automation is built on you can keep doing it, but you have this option going forward. Yeah?
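To give a flavor of the VLAN-aware bridge and delta-apply workflow described above, here is a minimal sketch of what such a stanza can look like in /etc/network/interfaces. The port names (swp1, swp2) and the VLAN range are illustrative assumptions, not values from the talk:

```
auto bridge
iface bridge
    bridge-vlan-aware yes
    bridge-ports swp1 swp2
    bridge-vids 100-200
    bridge-stp on
```

After editing the file, ifupdown2's `ifreload -a` re-reads it and applies only the deltas, rather than resetting every interface, which is the transition in recommended workflow being discussed here.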
[Question] A question slash suggestion: one of the things that I found very useful in managing a very geographically distributed network under Junos is the confirmed command. It doesn't seem like too far a stretch, if you're already modifying ifupdown2 to apply diffs, to apply some logic in there that says, by the way, roll this back if it's not confirmed. That would actually be extremely easy to do with bash: copy the old file, apply the delta, and if you don't get a commit confirmation back, or you don't kill that process (I'm sure there are other ways, I'm free-form thinking), revert to the old one and rerun ifupdown2.

So, to our earlier conversation: if you'd like to suggest a script to start doing that, that could become the adopted one we carry going forward, or we could look at doing it as part of the tool chain itself. ifupdown2 was really customer-driven. Besides adding things like JSON support so all these automation tools can work with it, one of the things the main developer, Roopa, added is the capability to run commands interactively on the command line, like ifconfig and ip route, and then snapshot that and write it out as a config. Exactly, nice.

[Question] My question was: how many places is the configuration scattered? Because I see that as a potential issue. Are you in an L2 environment or an L3? Okay, so if you're predominantly L2, because you're manipulating the physical interfaces, the majority of what you're configuring is just in /etc/network/interfaces. That sets up your CLAG configuration, our MLAG equivalent, the bridges, the L2, the mapping of the ports. Now, if you want to set the hostname, it is a Linux box, so you modify the hostname; if you want to set up NTP, you go modify the NTP config. But the provisioning of the network function of the box is predominantly in network interfaces.

[Question] And the reason for the question is, I occasionally get to play with different variants of Linux, and I usually run into the "okay, where the heck did they stick it in this version?" problem. I've also gone back to other vendor boxes, f5 I think was one, where I said to a customer, hey, I just want a snapshot of the system. Oh, well, that's 30 different files. Let me answer that second part separately in just a second. So that's one phase of it. If you're doing L3, then it depends on which protocol suite you chose. We recommend Quagga, so in that case you have network interfaces for configuring your IP addresses and Quagga for configuring OSPF and BGP. If you're biased toward BIRD, then bird.conf is where that would be. If you decide to write your own esoteric script, or have your own traffic-engineering protocol that runs on top, those live with the specific packages that are manipulating the routing table. ACLs in all cases appear in /etc/cumulus/acl/policy.d and the files inside there. So it's not iptables? No, it is iptables: iptables rules hosted in that directory. You run cl-acltool to install them, and that runs automatically at system boot or when you make a change. It is the normal iptables syntax, but it's our tool chain doing the loading into the hardware, as opposed to using ufw or some of the other tool chains out there. And to elaborate slightly: it's iptables, ebtables, arptables, ip6tables, the full breadth of the rule set. What the tool does is go through and make sure that what you've specified is realizable by whatever hardware platform you're on, and that the resources meet the constraints, before it starts changing anything. So, kind of back to your confirmed perspective, this is one of
those tools where it looks it over and either says, yeah, this will work, and does it, or says, sorry, you lose, not gonna work, maybe you need to rethink this, you can't use that feature. It does it safely, because it compares against what's in hardware, attempts to apply what you're doing, and if that didn't work, it returns to what you originally had and informs you.

On some of the things you brought up about managing those different files: that's where a background in managing UNIX systems becomes incredibly powerful, because you can use things like etckeeper and vtysh, script all that stuff, and push your configs out, or grab them with RANCID and things like that. [Comment] Well, my thought process here is that what I've found very valuable over the years, and it's probably just a tweak to my standard process, is to diff the state of the system against some text files. It's simpler than holding a tree of files in my mind. So to your point about the sets of files: we have a knowledge-base article on version control, which gives you the ability to rapidly compare. I mean, that's my allergy to GUI-based systems: how do you document what you just built without doing screenshots? Right, but here you can use something like Git or Subversion; there are multiple ways to approach it. And on backing up all of the files, we have a knowledge-base article that discusses the primary important files. Has anybody done a reinstall on certain operating systems where the SSH keys change and suddenly you have to go flush out known_hosts? Well, because we're purely Linux, you can back up those existing keys as you move between operating systems, or install a set for that specific box, so you don't run into that problem. We have a knowledge-base article that talks specifically about this, and I created a really hacky bash script that will go and back those up every time. We're working on a more formalized version, not of what I did, but a more formalized approach where, when you do upgrades between images, it will attempt to calculate all of the deltas between the images and ensure that everything moves across. Also, our support team has created a script, a little more elegant than my super quick one, that they'll work with customers to leverage; it does that same sort of backup of all the files and gives you the ability to migrate them between images or platforms. [Comment] Yeah, because if you know the files, it's easy enough to script and just concatenate them. Exactly. There's always the joke that my voice only carries as far as Saint Louis. The point is, we could build something like that ourselves, but then it would be yet another way that we're different and you're always learning. Whereas here it is just Linux, so there are six ways to do it, and we don't prevent you from doing any one of them. The knowledge-base articles give you some tips, you can come up with your own; again, that's the power of an open system. [Comment] I understand that trade-off completely; what I'm sitting here thinking is, if you have somebody who's new to this, having it all in one place where chunks are parsed out appropriately helps. Yep. Our knowledge-base location is wide open for people to look at, and I definitely suggest taking a look to see what sorts of examples are there.

So we've hit the first couple of things: we're just Linux, we're just a standard CLI, and you've got multiple things to add on. The one I'll throw out there, kind of in that knowledge-base-article camp: because we're just Linux, I imagine a number of us run wide-open servers accessible on the open Internet, whether they're VMs or their own boxes. One of the problems we're all going to have with that is port knocking, or someone
attempting to get into it. On my own boxes I happen to run fail2ban, because it sort of works and does enough. It turns out, and I've got this mocked up, I just need to write the knowledge-base article and then I think we'll do a blog post on it, that you can install fail2ban on top of this, or even better, install fail2ban on the servers that live below it and have it log into the switch to insert the rule that you want. It's all just Linux. I found one guy who wrote up exactly how to have fail2ban log into another server, and there's another series of blog posts someone was working on where, instead of this kind of crufty automatic login, you have a database that you put the entry in, which then pushes it out more elegantly. The point being, by taking these off-the-shelf packages, you can just grab this, and suddenly I have a box that will black-hole traffic in my network when my server finds someone knocking on my SSH ports and I want them to stop. I'm not saying it's the perfect solution for production, but it's an idea to get people started, and something we can just support directly. [Comment] I also want to make a comment about fail2ban: one of the frustrating things with traditional networking gear is that you pretty much have TACACS or RADIUS as your AAA component. Because this is Linux-based, if you want to use two-factor auth and add Duo Security or Google Authenticator, it's an apt-get away. Right, and that's very powerful if you want control of that environment. You already knew the next slide was coming.

In one of our customer examples, a very large-scale, layer 3, multi-tiered Clos environment, they had an existing fleet of Linux servers in their deployment, which meant they had their own homegrown tools that had evolved over a long period of time. One of the things we were able to help them do, because we're just Linux, is move the relevant, useful portions of those tools onto our platform, because we looked like just another Linux box in the fleet. What was really interesting, as Matt alluded to, is that because they had been doing Linux for a long time, they had their own approach for consistent authentication and consistent authorization across all of their Linux boxes. Because we were just Linux, they could take that off the shelf. They're not based around Debian, they're more of a Red Hat sort of shop, and it still took less than a man-day to put their own authenticator onto the box. That suddenly meant that instead of having a grafted-on TACACS or RADIUS that's not a first-class citizen and isn't the primary repository of authentication, the platform could fit into their primary authentication system, and they no longer had the nonlinear scaling of something that very few people understood, because it fit into their tool chain.

Another customer example we're working with, kind of a pilot right now, is not web scale; this customer is more of a distributed enterprise. They have sites worldwide, the traditional wiring-closet approach. What they were running into is that their existing hardware was end-of-life, and it was also a stack-based approach. To the question earlier about chassis: stacks are also very popular because they gave you one management node to virtually manage a number of devices. The problem is that stacks are not interoperable within the stack, so you're tied to whoever you've deployed in that wiring closet, and some of them also have limits on the total number of devices you can stack. So their first
approach was: how can we do networking differently so that stacking is not the primary approach, given that they then have this automation problem as the flip side? We've been working with them on a decoupled model. They now have a selection of hardware platforms they can mix and match, local aggregation and local access in the wiring closet, using traditional protocols, spanning tree inside of there, to block the redundant links and only open them when a failure occurs. We did an initial pilot with them; we actually got to go in early because the prior testing was doing really, really well, and we proved out the solution. Now we're working with them while the next-generation hardware that fits their specific need gets ready; we expect to go into a larger-scale pilot with them, and if we're successful, we'll get to move on to production.

[Question] With the customers you've got out there: I've been thinking about the questions Pete was asking about the multiple files, and I was starting to think, okay, how do we configure this, there's no CLI. And then I thought, probably my head's in the wrong space, and I should stop worrying about how to type my config and start looking at different ways to put the config on there. Which leads me to: what kinds of things are customers doing right now to manage the configuration of the devices? Are they actually getting in there and screwing around with a quagga.conf file, or are they being a little more mature than that?

I think it depends on scale. If I've got six devices in my network, because my data center is of that scale and I only need, like, two thousand VMs, then the management of that is fairly easy and efficient; it's not a ton of switches. We see those customers mostly doing manual configuration, or maybe something lightweight, what I like to call the network engineer's approach. Ansible is a wonderful network engineer's approach to automation, because it's "how do I push a specific set of commands I want to run on a box." I'm sure it does a lot of other fancy things; I'm summarizing from my perspective and don't mean to discount that it can do other stuff. So there's that class of customers: they're either doing it manually, or, I mean, everyone here has probably had a notepad that you cut and paste onto different boxes; that's definitely an approach. As you start moving into the tens, hundreds, or thousands of boxes, automation has to be fundamental to make that work. What we see is that customers either have their own in-house tooling, because they've been doing this long enough in their own Linux environment that we dovetail right in, or they're already a Puppet, Chef, or Ansible customer to begin with. Now, instead of being this grafted-on thing that looks different, we fall right in: it's just an interfaces file, a more complicated one, but just an interfaces file; hostnames are set the same way, NTP too, all of those things, and we fit right into what they're doing. Picking those tools up and leveraging them seems to be very popular. Part of what we do to make that better is a bunch of demos you can run on our workbench that show you how to do this, and the code is all fully accessible: (a) it's up on GitHub, (b) it's up on the workbench. Copy the modules, use them as the foundation for what you want to build for yourself; we're very happy to see customers do that.

[Comment] Just to add a little bit: obviously the really big people deploying truckloads of gear do lots and lots of automation. With the ones that do six, I'm starting to see this trend where they're recognizing that bash is your gateway drug to automation. They've always known that they should automate things, they've always known that
the notepad approach kind of sucks. So what they're starting to do is put together the interfaces file, but using the templating mechanism in ifupdown2 called Mako. They'll put it together as a Mako template with variables for the differences between the six boxes, push that config file out, edit the few variables on each box, and now they have something they understand extremely well: they changed a couple of variables and they're up and going. Interestingly enough, that fits into the classical operational mindset of touching every piece of hardware, but the abstraction of it into something that gets pushed out through Puppet, or goes through an Ansible playbook, is a very, very small step from there. I'm going to skip over the one last customer; I think I was going to hand it back to JR at this point, so thank you.

I just wanted to thank you guys for suffering through us for the last two hours or so; hopefully the espresso was good. The takeaway is this: the stuff Nolan and I talked about up front, the introduction, the architecture, all of that is well and good, but the pieces I really wanted to get your head wrapped around, the most important thing to us as a company, are the second two presenters you heard. Matt's perspective and David's perspective really drive almost every decision that we make here; not them personally, but the people they represent. That DevOps mindset that Matt was talking about applies not just to customers, it also applies to partners. I talked about this a little in my original presentation: when you become a cloud admin, and this is what we're seeing with customers, the four people responsible for standing up an OpenStack instance, or the four people responsible for standing up a Hadoop cluster, are not four network admins, or a network admin and a server admin and so on; all four people are cloud admins, and you're all on the hook together. What they tend to do is pick an operational framework that makes sense for them. Some of that's going to be based on the latest cool thing on the web, some of it on what they know and are comfortable with, some of it on what they've licensed and have available as a corporate standard. So operational frameworks are very, very widespread, and it's fundamentally impossible for an infrastructure company, and we consider ourselves largely an infrastructure software company, to come in and say: this is the only way you're going to do it, we've decided this is the cool way, you must do it our way. We don't enable anybody that way. We enable people by saying, look, you have the opportunity to lay this in however you want. In my mind, that's the heart of the DevOps mindset: the network isn't the bastion on the hill; the network is a means to an end in the context of building out a cloud, and everybody building that cloud has to be hyper-efficient and leverage a reasonably common tool set. Someone on the team might be a network expert, someone might be a storage expert, but they all have to be able to speak common sets of languages across the board. So that's kind of us in a nutshell. Thanks again for coming by; we appreciate it. Do we have time for one more question?

[Question] When will you have this running on an x86 server with DPDK? (Did they pick up the question? Yeah.) So we're looking at that right now; it's slightly unclear. When you look at that space in the market, that solution kind of sounds like, when does a server become an edge router, is kind of what the question is. It
ends up that that's one way of coming up with an interesting edge router. We're working with a couple of customers, actually a pretty broad set of customers, on a slightly different interesting way of becoming an edge router, where they want hundreds of gigabits of bandwidth running through one of these kind of classical hardware situations while managing a full BGP table in slightly creative ways. That seems to be a pretty interesting driver as well, and it's the one we're working on. As far as DPDK plus Cumulus Linux: it will happen, I don't know the date; probably sometime this year would be my guess, at least in terms of dabbling and something coming out, but I can't make a commitment because we don't have it on the roadmap.

[Question] Speaking as somebody who's done nothing but networking: last year I took it upon myself to learn Python, and REST and JSON along with it. Learning Python opened things up; I understood a whole lot more than I ever did, and it came pretty fast. Bash is something I don't have any experience with, and I'm sure there are quite a few of us out there. Is it easiest to just jump in and do it, or are there online tutorials you can run through to get the syntax and things? Does your remote lab cover that?

Yeah, we have tons of that kind of stuff. Let me take three steps back. Bash is a user interface into Linux systems, and it's also a programming language; that's why it's kind of elevated itself to being the user interface of choice for Linux systems if you're going to a command line. It lets you do quick one-line scripts, write standard scripts that are interpreted, or run individual sets of commands. From the Cumulus perspective, if you want to understand how to interact with a Cumulus Networks platform, kind of Linux plus networking together, we have a roughly weekly class called CL101. It's instructor-driven, it's free, you just sign up, come online, and walk through; you'll hear people talking through some material and work through some hands-on labs to get your head wrapped around it. We have online labs you can do yourself, kind of stay and play all by yourself: reserve a customer workbench instance and go set it up. And for the set of customers, a reasonable set, who have a networking team that needs to learn this stuff fast, we have boot camps and jump-start programs where we come in and spend one, two, three days with them, walking through something that's often crafted for their deployment. Say, for instance, it's an L3 BGP type environment: we'll teach the basics, but then go through examples and trial stuff around deploying IPv4 plus BGP in that context.

You said Python and REST and all that, and I said ouch. The reason I said that is that from my perspective as a programmer, I always write from the easiest thing to the hardest. I start in bash; if it gets too complex for bash, I move to Python; if it's too complex for Python, I'll move to C++ or C; if it gets too complex for that, I'll go to assembly. The reason I'm bringing that up is that it might be worth it for you, as you're evolving your head around this space, to just go back and mess around with bash and get your head wrapped around what you can do there. That doesn't mean it's the right solution for every problem, but oftentimes what I find is that I can start in bash and get the thing pretty close to done, not perfect, but kind of sketchy, and it helps me out in two ways: first, I know where I need to go, what's the right language to jump to next; and second, I get to prove that my general thesis was about correct or flawed one way or the other. Or look at boot-up scripts for examples, init scripts, all that kind of stuff, and learn from those.
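As a concrete flavor of the "start in bash" advice above, and of the Mako-style per-box variable substitution mentioned earlier, here is a minimal bash sketch that renders a per-box interface stanza from a couple of variables. The function name, port name, hostnames, and addresses are all made-up illustrative values, not anything from the talk:

```shell
#!/bin/bash
# Minimal sketch: render the per-box differences of an interfaces stanza
# from variables, the bash-level version of templating six similar boxes.
# render_uplink, swp1, and the addresses are hypothetical examples.

render_uplink() {
    local hostname=$1 address=$2
    cat <<EOF
# ${hostname}
auto swp1
iface swp1
    address ${address}
EOF
}

# Two example boxes that differ only in their variables.
render_uplink leaf1 10.0.0.1/31
render_uplink leaf2 10.0.0.3/31
```

The output is the same stanza twice with only the hostname comment and address changed, which is exactly the "edit the little variables on the box" workflow described above, and it moves into a Puppet manifest or Ansible template with very little change.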
Info
Channel: Tech Field Day
Views: 9,213
Rating: 4.8730159 out of 5
Keywords: Tech Field Day, Networking Field Day, Networking Field Day 9, NFD9, Cumulus Networks, Cumulus Linux, Cumulus, Software Defined Networking, Network Operating System, Linux
Id: _EUlu6gcQZg
Length: 30min 43sec (1843 seconds)
Published: Thu Feb 12 2015