Netdev 0x12 - Introduction to FRRouting - Tutorial

Captions

I guess, are we ready to go? Yeah, that's fine. Welcome, everyone. I'm Donald Sharp, I work for Cumulus Networks, and I'm going to talk to you today about FRRouting. If I can make it go to the next slide... there we go. So we're going to cover how FRRouting came to be, then some basic routing, then advanced data center routing, and finally we're going to talk a little bit about where I hope to take FRRouting in the future.

So what is it? It's just another routing stack. It's the same thing that you can get from all the other major vendors today. Development is supported by a large number of companies. Configuration is actually really similar to other vendors. There's two different ways you can configure it: from a file and from an interactive CLI. FRR runs natively on Linux and many other platforms. The one caveat there is that all features actually only work on Linux. That's not true for the other Unixes: BSD and the BSD variants don't have all the features, and their networking stack is not nearly as advanced as what Linux has. FRRouting actually uses the kernel's routing stack for packet forwarding. We're control plane; we don't do that. And finally, we're GPLv2+. That effectively means that we have a couple of other licenses that are compatible with GPL that are being used as well.

So where did FRRouting come from? It all started around 1996 with a routing suite called Zebra. In 2002 Quagga was forked from Zebra, and that continued on until 2016, when we forked from Quagga. Our first release was actually in January of 2017 with version 2.0, and now we're up to version 5, basically a year and a half later. So why did we do it? We wanted faster-paced development. We wanted the ability to work with people in a cooperative way. We needed a CI system that wasn't available. And we also wanted the ability for people to work fast and make mistakes and correct them in a timely
manner. We also really wanted community consensus on the features and how we did the work together, and this is all under the auspices of the Linux Foundation.

All right, so this is the basic new work that we've done. The first five items, babeld, eigrpd, nhrpd, pbrd, and ldpd, are new daemons since we forked. I'll get into those a little later: what they are, how they work. We've added large communities, a bunch of EVPN support, and RPKI, which is a public key infrastructure for route validation in BGP. You can do MPLS-based VPNs and VRF route leaking, and we've implemented RFC 5549. So most of our work is actually in BGP, and that is the focus of what most people are interested in currently. For IS-IS we've added multi-topology support and SPF backoff. For OSPF we've got experimental segment routing, and it also supports 5549. We've added sparse mode to PIM, and we also added the ability to do VRFs using both the VRF device and namespaces. The one caveat there is that not all daemons actually support VRF yet; you can really use VRFs with BGP, zebra, PIM, and OSPF currently. And I don't actually have all the new routing features listed. You can go to the website at the bottom and see what we've added as well, if you're interested.

So where do you get FRRouting? We actually do most of our work on GitHub, so you can go to github.com/FRRouting/frr and download the source. We've spent a lot of time recently working on our documentation, so we've got a pretty good dev guide on how to build and set up FRRouting if you want to build from source. We also have the ability for you to download RPM-based and Debian-based packages for a wide variety of platforms. These are the packages that we actually provide, and the distributions and versions that you can download a package for right now. So I list caveats for both BSD and Solaris
based systems. I kind of covered it already, but a lot of the new features that we've added do not work on the network stacks available on those platforms.

So with new features come new dependencies on kernel versions. I just wanted to call out some of the major dependencies that we've introduced in the last year, year and a half. If you're using VRFs, that depends on the l3mdev device, which is a 4.4 kernel for limited and 4.8 for full functionality. Did I miss anything there, David? Is that right? Yeah, it's good enough. Okay, that's fair. David's right: 4.10 or higher for all the bug fixes that we're aware of currently. BGP EVPN support depends on the new NTF_EXT_LEARNED netlink message and also the ARP suppression feature, available in 4.17 for NTF_EXT_LEARNED and 4.14 for ARP suppression. To get PIM-SM going we needed to add some RTNL multicast netlink message upgrades so we can get a bit more data from the kernel, and we also needed to add a new callback via the PIM socket, which is IGMPMSG_WRVIFWHOLE, to allow PIM register packets to work.

All right, so once you've gotten FRRouting compiled and ready to go, how do you start it? It's pretty simple. We've integrated with systemctl now, so you can just do the normal systemctl start, stop, reload. Did I miss anything? Start, stop, yep, restart and reload. It does what you think it should do. The one thing that's actually interesting that we've implemented is the reload feature. If you modify the frr.conf file, which is the configuration file for FRR, and then issue a systemctl reload, it will take the configuration file, compare what you actually have running against that file, and apply the diff, so you end up consistent. And here's an example on the side here: the systemctl start frr, and vtysh is the interactive shell that people can use.
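The "compare the running configuration to the file and apply the diff" idea can be illustrated with a small Python sketch. This is only an illustration under loud assumptions: it is not FRR's actual reload logic, the function name `config_diff` is hypothetical, and it treats the configuration as a flat list of global lines, ignoring nested contexts such as a `router bgp` block. It negates lines that are running but missing from the file by prefixing `no`, and adds lines that are in the file but not yet running.

```python
from typing import List


def config_diff(running: str, target: str) -> List[str]:
    """Return commands that would move a flat running config toward target.

    Hypothetical helper for illustration only; real configurations have
    nested contexts that this sketch does not handle.
    """
    running_lines = [ln.strip() for ln in running.splitlines() if ln.strip()]
    target_lines = [ln.strip() for ln in target.splitlines() if ln.strip()]
    commands = []
    # Anything currently running but absent from the file gets negated.
    for line in running_lines:
        if line not in target_lines:
            commands.append("no " + line)
    # Anything in the file but not yet running gets added.
    for line in target_lines:
        if line not in running_lines:
            commands.append(line)
    return commands


# Example data (made up): the static route's next hop changed in the file.
running = "ip route 10.0.0.0/24 192.168.1.1\nlog syslog\n"
target = "ip route 10.0.0.0/24 192.168.1.2\nlog syslog\n"

for cmd in config_diff(running, target):
    print(cmd)
```

Only the changed line is touched; `log syslog` appears in both inputs, so no command is generated for it. That is the property that makes a reload less disruptive than a restart.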
Typically one of the first things that people ask me after they've got it up and running is, "What the heck's going on? Where did my logging go?" That's been one of the harder points to get across. If you're not sure where your logging is going, type show logging inside a vtysh shell and it will tell you where each daemon thinks it's sending its output. It's actually kind of important, because people can't find it sometimes. We support three different types of destinations: you can send the output straight to syslog, you can log it to a file, or you can log it to standard out. Standard out really is only useful if you're a developer and you're doing some debugging; it's not for production usage. The other common question that people ask is, "Why does my log file have no output?" We've actually designed FRRouting to not be chatty unless debugs have been turned on. This is mainly because there are a lot of platforms that FRRouting is run on that don't have good hard drives or SSDs, or whatever they are. SD? Yeah, sorry. And if you're not sure how to figure out what's going wrong, under the vtysh shell type debug, the protocol name, and a question mark, and it'll tell you the different types of debugs you can turn on, which will go to wherever you set your logging to go. That's one of the first things that I do. I don't always remember all the debugging commands; I just know that I can hit question mark, which will give me help. And as an aside, if you want help on any command, question mark and tab are your friends.

So how does FRR work? We have a daemon called zebra, which provides the RIB functionality and control of what gets inserted into the kernel. A routing protocol like BGP or OSPF, they all connect to zebra and they do things like tell zebra to install routes, and zebra takes that and talks to the Linux kernel via the netlink bus to install those routes. The one interesting thing about zebra is that we actually have two netlink sockets: one just for listening for new netlink data from the kernel, and one where we're actually sending commands down.

All right, so in the previous... sorry, let me go back a slide. My presentation-fu was not great. So in the previous slide, if we looked at it real quick, there was a ZAPI protocol referenced. That's the protocol that zebra and all the routing daemons use to talk to each other. So why didn't we just use netlink? That's a question that I get sometimes as well. When ZAPI was first written we didn't have the netlink protocol in the Linux kernel, so we had to have something. Additionally, netlink doesn't provide the data communication that a routing daemon needs: things like redistribution, passing around router IDs, turning on BFD, next hops, and passing around capabilities. If you're interested in looking at more, you can go into lib/zclient.[ch]. That's where we've defined the ZAPI protocol and how it works. It's essentially just a stream of data being sent from the routing daemons to each other, and it uses named sockets. Right now it's only really used to talk from FRR daemon to FRR daemon, and we actually do change the format quite often, so it's just something to be aware of.

All right, here's just a quick slide to give you places where you can learn more about FRRouting.

All right, so I'm going to talk about basic routing next. Actually, I'm just going to skip this slide. So why routing? Networks today are becoming increasingly complex, and just adding a static route isn't going to cut it for connecting computers together. Additionally, networks are not getting simpler: with things like VMs and containers, you're actually now starting to need the ability to route between VMs on the same host. FRRouting allows you to do that. So FRRouting allows us to control all this chaos that can happen in a network, and so one
of the really great things about this is that Linux now has the ability to interact with the entirety of the network via a standards-based approach. So if you are using a Cisco or a Huawei or an Arista for routing, you can put FRRouting on your host or on your server, connect to those machines, and get routing from them.

One of the other subtle differences between the Linux kernel and zebra is this idea of a RIB versus a FIB. A RIB is a routing information base; it's commonly called the control plane. This is what FRRouting does, and what zebra actually does: zebra is in charge of the RIB. It makes decisions about what gets installed and how it's installed. The FIB, or forwarding information base, is the data plane; this is the Linux kernel. In general, when you talk about routing there's this clear line between the RIB and the FIB and who owns what. At the bottom here is just a simple look at how the kernel and FRR both show the same information.

So I'm sure you're all familiar with this: when you do an ip route show on the Linux kernel, one of the items it gives you is a metric, and the Linux kernel uses that metric to decide how to route packets, based upon the best metric. Well, the problem with metric is that every single routing protocol has a different idea of what a metric is. For instance, RIP uses hop counts and is only allowed values between 1 and 16, EIGRP uses a composite metric which is a uint32_t, and OSPF is a link-state distance, which has a range starting at 100 and going up. So how does zebra decide what's the best metric to use? Routing suites have this idea of administrative distance. We've decided a priori that each routing protocol has a different distance. So if RIP and BGP both install the same route into zebra, zebra is going to look at those routes and decide that the BGP route is better based on this administrative distance value
we've assigned to it. Lowest wins, and it's a valid number between 0 and 255. On the right-hand side are the basic values that FRRouting uses for administrative distance. This is actually also consistent with pretty much every other vendor out there. There is no standard here; everyone's just kind of agreed this is what to use.

All right, so how do I determine what to install? This is just an example slide. I wanted to show a kernel route. The K there on the left-hand side, the 4.3.2.1/32 route, is a kernel route, and the parentheses, 255/8192, are the admin distance and metric. In zebra, if I enter ip route 4.3.2.1/32 with outgoing interface enp0s10, that creates a static route in zebra. Zebra at that point in time makes a decision that the static route has a better admin distance and selects it, and you can tell that by the greater-than symbol, which says that zebra's decided that's the one we want to use. On the right-hand side is the kernel, showing what the output of ip route show is.

All right, so with the Linux kernel anyone can put routes into the kernel, and as a routing suite we have to be able to interpret those routes and figure out what admin distance they have, so that if a routing protocol like BGP or OSPF or RIP installs the same route, we know who wins. So what we've done is that, for routes that are not generated by FRRouting, we interpret the kernel metric as two pieces of information: the high-order byte becomes the admin distance and the low-order three bytes become the metric. If you look at the example here, we do an ip route add 4.3.2.1/32 dev enp0s9 metric four billion something, and further right we have the hex number that that translates to, which is 0xFF002000. So FF becomes 255 and the metric becomes 0x2000, or 8,192 decimal.

So why is this important? If you're actually going to be using VRFs inside of FRRouting, you
need the ability to terminate... I should actually take a step back. The Linux kernel has this concept of table lookup and rule lookup. When a packet comes in and the kernel has to do a lookup, it goes, "I need to look up in table 1000," and from the ip rule show we know what table to go look at. If I'm using VRFs, I do not want that route search to leave that table, because the Linux kernel, if you don't have a default unreachable route, will go on to the next table. In this case it would go to 32765, then use rule 32766 if it doesn't find a match, and finally look in 32767. So what we've done is that, with the metric that we've installed, the default unreachable route in the kernel can be interpreted as a high admin distance, and because of that we also have the ability for FRRouting to override that default route if some other routing protocol wanted to put something in.

All right, so I just kind of talked about VRFs, and FRRouting supports two different types of VRFs. There's the VRF device... I'm sorry, I've skipped a slide here. So VRFs are separate tables for a collection of different interfaces. A VRF provides layer 3 segmentation, and it also provides the ability for routing protocols to install the same routes into different tables. Now why would you want to do this? There's a whole bunch of different reasons, but the main reason is that you want to have the separation for some reason or other. Say you have a telephone network and you have a data network, and you want them to use different routing tables. Or I'm a provider and I have a whole bunch of customers that are connected to me, and those customers don't want their routing tables to be mixed, and they don't want any of their packets to end up in the wrong place. So that's what
a VRF provides you. Security segmentation: I have a requirement from some security organization saying that these packets must stay on this network and cannot go anywhere else. And the other reason for VRFs is that I have overlapping IP ranges, from when two companies start to merge. Down at the bottom I have a link to David Ahern's good VRF tutorial on how it's actually implemented in the Linux kernel. So FRRouting actually has... yes?

Sorry, just a question: what's the practical difference between VRF and network namespaces?

So the practical difference is that a VRF device creates a master device that you attach all the interfaces that you want to be in that VRF to. Route lookup is done by a new rule inserted into the kernel, and that causes the VRF device to send the FIB lookup to the correct table. This also allows route leaking via just installing a route with the appropriate outgoing next hop, so lookups are extremely easy from a RIB perspective. In the Linux kernel you specify next hops by a gateway and device tuple, right? You can also do just a gateway by itself or a device by itself. But if I want to do route leaking across VRFs with the VRF device, I have to specify an outgoing interface, otherwise it will not work.

All right, so what is a namespace? The way namespaces work is that you create the namespace and then you move your interfaces into that namespace. Route lookup is fairly similar: you look up in the default table for each namespace. The real big caveat here is that route leaking can only be done by creating a bridge outside the namespace and installing taps into the different namespaces, or I need to go to the front panel, where I've put those interfaces in different namespaces, and plug a cable in between the two of them. So that's how I can do leaking between VRFs in a namespace-based setup. That is
totally heavyweight. If I want to do more than a couple of VRFs, I all of a sudden have to manage a large number of bridges and a large number of interfaces, and I don't get the easy route leaking by a simple route table lookup that the VRF device provides you. So again, namespaces are like VRFs, but they're really heavyweight and I wouldn't recommend using them. Anything else? Yes, sir.

Hi. Is it possible to create the VRF interface directly from FRR? Because I'm playing with it, and I need to do it in systemd and then start it.

So that capability is not there right now. When we implemented the VRF device functionality in FRRouting we were under heavy time pressure, so it just wasn't done. Cumulus, where I work, also has this idea that you kind of render unto Caesar what belongs to Caesar: VRFs are an interface, and interface control and creation is controlled by ifupdown2. That's kind of the decision we made. David, you know I do that on purpose.

Another question, not related to VRF, but there are feature demands for BFD. Is there a future for that?

There's a pull request for it right now. There's actually two different ways. Currently you can get BFD integration if you go to the Cumulus GitHub page and download the PTM package, and you know, that kind of sucks for a lot of reasons, right? It sucks in that the integration's harder. There's a gentleman named Rafael Zalamena who's taken that PTM BFD code and has started integrating it into FRRouting, and there's a pull request right now that we're working on trying to get in. So hopefully in the next month or so we'll have BFD support directly inside of FRRouting.

Just to add some additional words on the request to add the ability for FRR to create VRFs. The decision at the time, as I recall, was that you don't use FRR to create things like bridges, which are connecting devices for layer 2. So the decision was, well, let's not do
the same thing for VRF. That way you have an interface manager whose job is to create the different networking features like bonds, macvlans, bridges, and VRFs, and its job is to connect them all together. So it is something that could be done, because it's a routing concept, but at the same time the argument is that you're connecting a series of devices together, which becomes more of an interface manager domain. So that was part of the argument as well.

Exactly. Render unto Caesar.

And then about the network namespaces: when you look at what a network namespace is, it's not just routing tables. The idea of VRF is just to have multiple routing tables, which is essentially policy-based routing, to say these lookups go to this table and these lookups go to that table. Namespaces are a complete network stack separation, so everything from your devices all the way up to your sockets. Because it is separate tables you can use it for VRFs, and certainly many companies do, but every single company that's done it has learned that it's an extremely heavyweight, extremely painful solution for what you need for layer 3, which was the motivation for doing the VRF device.

Yeah, managing a bridge interface every single time you go to leak a route is a huge task, and so is keeping that right. So the one thing I'd like to go back to: can we create VRFs inside of FRRouting? We just need someone to program it up. Someone just needs to take the time and do it, and then we'd take it. So if anyone wants to do it, feel free. From my perspective, sure, that would work. You're right.

All right, so in this next section I'm going to cover basic topology and some basic configuration of some routing daemons. I'm going to use this topology for what I'm talking about for the next little bit. There's five routers, all connected. You can see the IP addresses for everything. If you see an IP address that ends in .1, that
comes from R1, .2 comes from R2, .3 comes from R3, and so on. Loopback interfaces are in the 192.168.240 range, and here are the configuration files I'm using for everything.

All right, so what is BGP? BGP is what is used to make routing decisions on the Internet. If I'm connecting a company to another company, or I'm connecting to a service provider, you're going to use BGP. It connects autonomous systems together. It has a policy language that is both rich and featureful, to allow operators a very fine level of control. In general there's two modes of operation for BGP: iBGP and eBGP. I is internal, E is external. If you're doing internal BGP it's usually one of two things: route reflectors or full mesh. I'm not going to spend a lot of time talking about iBGP; I'm going to spend more time talking about eBGP. BGP is considered a path vector protocol. It uses an AS path to determine what routes to install, and we use the AS path also for loop avoidance: if I get back a route with an AS path that has my own AS number in it, I know it's a loop and I know not to install it. I'm also going to greatly simplify BGP route selection. It's a complicated process, there's a whole RFC for it, but the basic idea is what I'm going to cover. BGP is layer 3 only; it uses TCP/IP for connections. As for the bottom two bullet points, the first one is a book by Dinesh Dutt, BGP in the Data Center. It's a really good book; if you want to learn about BGP it's a great thing to read. And if you really want to be kind of masochistic you can read the RFC, but who wants to do that?

All right, so how does BGP determine a routing path? This is a slide where I want to show how R1 would decide to reach R2 from a BGP perspective. A side note here: AS 5 is the autonomous system for R1, R2 is AS 10, R3 is 15, and so on, and you build the path using the autonomous systems. So if I'm on R1 and I want to know my route
to R2, I need to compare all the different paths I have in the network. The first one is directly from R1 to R2, and that's the AS path of 10. The second one is the middle, which is from AS 20 to 15 to 10, and that's a length of 3. And the last one, on the bottom, is 25, 20, 15, 10. As I mentioned earlier, the AS path chosen is the shortest one, and in this case the shortest one is the direct connection between R1 and R2, so that's the route the router would choose to install to get to R2.

This next slide, I wanted to show an example of ECMP. If I wanted to get to R3 from R1, there's two paths that would actually be equal: the top path, which is 10 to 15, and the middle path, 20 to 15. That would be considered ECMP, and we would install both next hops, going through R2 and R4, from BGP's perspective. And finally, the third one is the 25, 20, 15. I'm going to skip this slide; this is the same thing showing how it works for R4, and the same thing for R5. I think it's kind of self-evident, and you can come back and look at it later.

So to set up BGP you need to know two basic things: who am I peering with, and how do I decide on what routes to include. In a nutshell, that's how you configure BGP. You specify your neighbors, so in this case we have neighbor statements, neighbor 192.168.210.2, .4, and .5, and I have to specify my remote AS number as well. BGP by default, if I set up neighbor relationships, will not move routes around or pass them from neighbor to neighbor. You have to tell BGP what routes to include. A common way to do this is to say redistribute connected.

And the network statement.

Yeah. So in BGP you must use either a redistribute statement or a network statement. The network statement will only import existing prefixes that are in the RIB, and if you want to turn that off you can just do no bgp network import-check, and any network you've entered will
automatically be included. Again, I'm not covering policy. Policy is covered by route maps and AS-path lists and a whole bunch of other stuff, but it's too complicated to really get into in a reasonable amount of time. I want to call out multipath-relax: that allows BGP to use different AS paths of the same length as ECMP. And finally, each router needs to have the neighbor IP address set for it to work properly. I didn't show all the different setups for R1, R2, R3, R4, and R5, but it's effectively the same: you have to create a neighbor statement and you have to specify the AS, the autonomous system, of the neighbor. After you've done that, you do a show ip route and you can see that we have a whole bunch of BGP routes.

All right, so I've set up BGP and it's gone to hell. How do I fix it? I most often use show bgp ipv4 unicast summary and show bgp ipv4 unicast. The first one shows me the neighbors that I have configured and the number of prefixes I have received from each peer, and the second one shows me all the different paths. The two most common debugs I use are debug bgp neighbor-events and debug bgp updates. The most common mistake that I make when I set up a BGP peering relationship is that I forget to set the neighbor address correctly. I mess that up all the time; it's really easy to mess up. This is just some quick debugging examples and what the output actually looks like.

All right, so I'm going to talk about OSPF next. OSPF is considered an interior gateway protocol. It uses link-state routing, and what that basically means is that each router floods information about itself, and every single bit of information it has received from every other router, to everyone else. So it is pretty heavyweight, in the fact that there's a lot of data being sent. If you just have a couple of neighbors it's not that big of a deal, but if
you have a lot, you start to send a lot of data around. It uses, I always mess this name up, Dijkstra's algorithm to figure out what to install into the RIB. It also uses IP, or layer 3, to advertise router information from OSPF router to OSPF router. There's two different OSPFs: OSPFv2 and OSPFv3. I don't know why they did it this way, but v2 is IPv4 and v3 is IPv6. v3 also provides the ability to do both IPv4 and IPv6, but no one does that, I don't know why. For policy control OSPF uses areas, and I'll get into that in a second, but areas allow policy, and that's kind of beyond the scope of this talk. If you do have more than one area, you must also have an area 0.

I want to talk about how Dijkstra determines the best route to use. Dijkstra actually has some optimizations that are great for computers, to make it faster to determine. I'm going to skip over that a little bit because it ends up being the same. So between R1 and R2 there's a number, 60. That's the weight of the interface that we're going to be using for Dijkstra. R2 to R3 is 60, and so on. From R1's perspective, if I want to get to R2, the best weight of the three different paths is 60, so that's what it's going to choose. The second path is 100 plus 5 plus 60, that's 165, and the third path, on the bottom, is 60 plus 60 plus 5 plus 60.

The next thing I wanted to call out: from R1 to R3, if you recall, BGP had a nice ECMP path between R1 and R3. That's not true with OSPF with the examples here. From R1 along the top the weights are 60 plus 60, and that's 120, but in the middle, 100 plus 5 to get to R3 is 105, so that's a better weight, and that's the route that will be used from OSPF's perspective. So I'm just trying to call out here a key difference between BGP and OSPF in how they determine routes. This is just further examples of the same thing; I'm going to skip it in the interest of time.

All right, so on the left here is all
you need to do to configure OSPF: router ospf, network 0.0.0.0/0 area 0.0.0.0. You put that on every router and you will have a working OSPF network. It's not great for more than a handful of routers, but if you're doing something simple that's what you would use.

All right, so I've configured OSPF; how do I debug it? I use show ip ospf interface, neighbor, database, route, and database router as commands to figure out what's going wrong, and my two most common debug statements are debug ospf packet and debug ospf nsm. Again, it's just quick examples.

All right, the next routing protocol I'm going to talk about is IS-IS. It's also considered an interior gateway protocol, like OSPF. It uses link state, and it floods all router information about every router it knows to every other router. It also uses Dijkstra. One of the main differences between IS-IS and OSPF is that it uses layer 2 for connections between two routers, and the protocol it uses is CLNP. I don't know a whole lot about it, but it's layer 2. IS-IS is considered really easy to extend due to the TLV support in its packets; you can basically carry arbitrary payloads. For policy you can have three different levels: level 1, 2, and 1-2. Level 1 is just an individual area of a bunch of routers working together, you use 2 to connect areas together, and level 1-2 is for connecting multiple areas together.

So this is a basic example of an IS-IS setup. You'll notice that under OSPF you just have to have a network statement. You can't do that in IS-IS: you have to go to every single interface you want to be included in IS-IS and say "use IS-IS". And finally, you need to configure a router isis statement with an arbitrary word, like the one I used here. And that net 47.0 and whatever is a holdover from early IS-IS. Originally you could use it to send information about how that particular router was being used in the network, and I think people kind of found
it didn't really work too well, but you still need to have a net statement. It's got to be a 20-tuple of data, and they have to be unique. I don't know why. You basically need more config than OSPF because IS-IS does both v4 and v6. Here's an example of IS-IS configured and working, and here's how I've debugged IS-IS in the past: some basic show commands and the basic debug commands. Here's some sample output.

All right, so I've gone over BGP, OSPF, and IS-IS. Which one should you use? The reality is that you use what's best or what you know best. They're all complicated, they all have their own pluses and minuses, and if you're just setting something up for yourself or for a small network, use what you know best. BGP scales better than either OSPF or IS-IS, mainly due to the lack of link-state flooding. And finally, BGP can also handle many AFI/SAFI combinations that are missing in other routing protocols, and it's very commonly used for both overlay and underlay networks, or VPN networks.

All right, so I haven't talked about a whole bunch of different protocols, and I want to mention them at least real quick. If you want to do non-link-local multicast routing, you're going to use PIM. I've lumped RIP and EIGRP together because they're both distance-vector routing protocols. Most people don't use RIP; it's ancient and not really great. EIGRP in FRRouting is not ready for production use right now. It works, but you can easily make it fall over. NHRP is really, really complicated, and it just has a special use case that's not really that common. Basically, I want to have a one-to-many connection; that's what NHRP is used for. It's the DMVPN solution. PBR is policy-based routing. Again, it has limited use cases. It's static routing with a twist: you can do things like match on source and destination and ports to figure out how to route the packet. FRRouting has Babel, which is a wireless mesh, in-home routing protocol. And finally we also have LDP, which is label distribution for MPLS.

So what does FRRouting provide? It provides you the ability to run routing anywhere in your network. Traditionally you go to your network vendor and they sell you a box that you take to your network, you put it down, you plug your cables into it, and you plug your hosts into it, and the two don't mix. I think we're discovering today, with the proliferation of hosts, VMs on hosts, containers, and advanced routing use cases, that FRRouting allows you to run your networking stack directly on your Linux box, wherever you want, and to provide the networking that you need wherever you want. One of the problems that networking vendors have is that they only have control of the box that they sell you. With FRRouting, you have control of your network everywhere. The other great thing it provides is that it allows you to connect to those closed-source vendors just using the standard routing protocols that they are using as well.

All right, so the next section is advanced data center routing. I'm going to cover modern data center architecture and what it looks like, RFC 5549, and a very quick overview of BGP EVPN. The modern data center is a Clos network. It's named after Charles Clos, some guy, I think he worked at Bell Labs, who invented this back in the 50s or 60s, so it's been around for a long time. A Clos is typically a layer of routers or switches on the top, called the spine, with leaves or ToRs underneath that are connected, and you can see each leaf is connected to each spine. So what does that provide? It provides known latency between each host in the data center. It provides guaranteed bandwidth for your applications. It's really easy to build, because you're just plugging in and doing the same thing for every single leaf or ToR, and everything goes fine. And when something does go wrong, you've limited your failure scope. Typically what we have on the left is called a pod, and in a data
center that's basically just a unit of work or control. If I want to have more pods, I just add another layer: I add more spines and call the middle layer leaves. The other thing that the modern data center architecture gives you is ECMP. I can do multipath between all my different hosts without using any layer 2, and that gets rid of STP. STP will shut down redundant links, and that limits your bandwidth. And finally, MLAG, which is used in the data center, is extremely hard to get right.

So what is RFC 5549? It's the ability to have v4 routes with v6 next hops. Think about it this way: I have a v4 pod and I want to get somewhere else, and I can specify how to get there with v6 next hops. Why would I want to do that? Well, maybe I'm running out of v4 address space and I have a very limited amount. This allows you to get around that. It's really ideal for data center point-to-point links. It allows you not to have to put a v4 address on every single interface in your data center, and it's also commonly referred to as unnumbered. In this example here, if I put an IPv4 address on every single interface between the spines and the ToRs, I need to have 45 IPv4 addresses. If I convert to using 5549, I can reduce the address space to 9 IP addresses. In this particular example I just need one IP address for the loopback on each of the spines and one for each of the ToRs. Now obviously, if you have a bigger data center, the ratios change based on how you fill out your Clos.

All right, so unfortunately the Linux kernel does not really allow you to do v4 routes with v6 next hops currently. It's something that's coming, right, Roopa? Soon. So how does it work? We have to derive the MAC address of the peer via v6 router advertisements. So what we do is we
create router advertisements, and from the peer's advertisement we get its link-local address. From that we can derive the MAC address, and using that we can insert into the neighbor table an IP address of 169.254.0.1, the interface, and the MAC address, and I can use that as a next hop.

At the bottom of the slide here I wanted to talk a little bit about how this actually works from FRR's perspective. In BGP I have to tell it to use unnumbered, and what that causes BGP to do is send a message to zebra saying, hey, could you turn on router advertisements for that interface? Zebra says sure, I can do that, and it sends a router advertisement. I have the arrows going to the right from zebra to the kernel. I really mean zebra to the network, but from zebra's, or FRRouting's, perspective it's sent out a socket and it's taken care of by the kernel. That's why I've left it there; I don't want to give the impression that it's actually really talking to the kernel. So I send out a router advertisement, and eventually I'm going to get a router advertisement back from a neighbor out that interface. At that time zebra does two things: it installs that neighbor entry that we just talked about, the 169.254.0.1 and the MAC address, and it also tells BGP the link-local address for that interface. BGP takes that data and forms a TCP connection using the link-local address of the peer. At that point it does the normal BGP mechanism of bringing the neighbor up and then gathering routes from it, and then installs the routes. Zebra will see the v6 next hop that BGP installs and go, oh, I should use the 169.254.0.1 next hop instead. You're looking at me funny. OK.

All right, so how do I set that up in BGP? It's actually pretty easy. At the top I have the interfaces, and if you note, they don't have IP addresses, so we removed those. The second thing is the neighbor statements have changed subtly. I have neighbor, the
actual interface name, then the keyword interface, and then remote-as external. I'll come back to the external in a second here, but the neighbor statement becomes the interface you want to use. We've added a feature to FRRouting called remote-as internal/external. What that basically means is that if I write internal, I expect the autonomous system of my neighbor to be the same as mine, and if it's not I'll reject it. The second one is external: if I receive a different AS than mine I will accept it, and I'll reject the same AS.

So why is this good? Well, from a configuration perspective in a data center, it reduces my config to something I can cut and paste. I can take that config and put it on every single router in my network, fixing the "router bgp" line to the correct AS number if I've changed it, but it becomes incredibly simple. Since it simplifies my network, I've reduced my errors, and I've got this cut-and-paste configuration across a large section of my data center.

I wanted to call out the subtle difference between, over here on the left, the FRRouting perspective of the route and, on the right, the kernel perspective of the route. We have the BGP route 4 to 42, and you can see it's got a via with a v6 address. On the right we've installed it differently, with the 169.254.0.1 and the dev swp1. And if you look at the bottom here, we have the neighbor table entry for each of the different neighbors I've specified, and the red one is the swp1 one.

All right, so we also have the ability to do OSPF unnumbered. I actually kind of hate the name. It's awful. It's not unnumbered; it's the same IP address everywhere. With BGP I didn't have to put an IP address on the interface. With OSPF unnumbered I have to put a /32 on an interface. I can put whatever /32 I want on, just as long as it's the same one. On the left is the configuration. If you recall, earlier it was "router ospf", "network 0.0.0.0/0 area 0.0.0.0",
and that was it. Well, to do unnumbered OSPF I have to configure each interface as point-to-point and specify what area it's in. Over here in the middle we can see that each interface has the same IP address, and at the bottom is a quick pointer to the documentation on how to use it, if you're interested in going further for yourself. And this is just the results of unnumbered.

All right, so why do I actually need L2 in a data center? There are still a lot of legacy apps out there that need layer 2 connectivity. I'm doing link-local multicast, typically for service discovery; maybe I only have the MAC address of whom I need to talk to; or they're not using TCP/IP. In a data center, typically hosts under a ToR can only talk to each other, but maybe I want to have hosts under a particular ToR talk to another set of hosts under a different ToR. That's why I need the ability to pass layer 2 across those. And finally, there's an assumption that IP addresses stay the same even when a host or an endpoint is destroyed and recreated elsewhere in the network. This is especially true for VMs or containers.

So how do we solve that? We typically solve that with VXLAN encapsulation. It provides layer 2 segmentation over a layer 3 network. The L2-over-L3 network allows you to leverage all the data center links, and you have ECMP paths to get to the host you're interested in. VXLAN encapsulation is just a bunch of tunnels, effectively, right, Roopa? It's a lot more complex a control plane, but it allows much higher availability.

All right, so in this example I'm going to rework the topology a little bit. The thing I want to call out here is that R1 and R3 become the leaves, or ToRs, in this VXLAN network, R2 becomes the spine, and R4 and R5 are going to be configured in the same /24 network. Here's my interface configuration. R4 has 192.168... I'm sorry, 214 is what I'm really interested in here: 192.168.214.4 and 192.168.214.5. So how do I create it? You create a VXLAN interface and you create a bridge that connects to the VXLAN, on R1 and R3.

All right, so what is BGP EVPN? It takes VXLAN networks and extends the control plane across the data center. So it's layer 2 networks across layer 3 underlay networks, with a unified control plane. The way it really works is it distributes the MAC addresses around the data center. In the picture here, the MAC addresses for A, B, C, D, E, and F would be distributed to all the leaves, and if, for instance, E wanted to get to A, all it would have to do is just send a packet, and the MAC address would allow it to get there. There's an EVPN in the Data Center book that you can get for free from Cumulus Networks. You sign up for it, you give your email, and they'll give you the book for free. It's a really good book, and it really explains how EVPN works in the data center. Roopa also has, from a previous netdev conference, her EVPN or Linux bridge tutorial, which I've linked here as well.

EVPN setup is actually pretty simple for a basic use case. The only thing you really need to do is add this new idea of an address family, and it's the address family l2vpn with the evpn SAFI. On R1 and R3, all you really need to do to get it to work is specify advertise-all-vni. You'll notice R1 and R3 have it, and R2, which is a spine, doesn't need to have it. BGP also has this concept of not using an address family unless you tell it to, so you do need to specify that you want to use a particular neighbor in a particular address family, or it will not work.

So here I've configured BGP, and these are just some show commands. The big one here is "show bgp l2vpn evpn". EVPN has five different basic route types. In this example we're only using route types 2 and 3. Route type 2 is the MAC distribution via VXLAN across the network,
and type 3 specifies how to handle the broadcast, unknown unicast, and multicast traffic across that network.

All right, so where are we going with FRRouting? Currently the main big thing we're working on is a zebra rewrite. I think it was mentioned in an earlier talk today that David Ahern is working on next-hop groups, breaking out routes and next-hop groups in the Linux kernel. Well, that's going to bring new netlink protocol messages that we need to actually use, so that's something we're working on. We're actually doing another CLI rewrite, again, to take advantage of NETCONF. There's a semi-active pull request that will start that conversion; that's going to happen in the next couple of weeks, hopefully. The other thing we're actually thinking about doing, and some people have actually started doing, is full remote data planes. What I basically mean by that is that the Linux kernel would no longer be the FIB; it would be something else entirely. That would basically allow you to install routes onto another device anywhere, as long as you set up the appropriate communication methods between the different devices. People are asking for a whole bunch of features and we kind of need people to work on them, so if you're interested, we have a whole bunch of feature requests. And finally, the last thing is we're going to have type 4 EVPN, which is multihoming. All right, that's it. Thanks.

Yes? Sorry, questions. Where do you want me to go? Oh, sure. Yep, it's going to be built each... So effectively, when we did the CLI rewrite a year and a half or two years ago now, we had to go through and touch every single freaking daemon and rewrite everything. It's going to be the same thing; it's probably going to be more invasive and intrusive. So the goal, from my perspective, is to provide a middle API. Well, you do, because we want to take the CLI and pull it out and use that middle API. I
don't believe that one. The goal is to provide an API for configuration between each daemon. FRR is written with a very tight coupling between the CLI and the implementation, and we have to rip that apart and put that middle layer in. Once we've done that, you can have NETCONF and YANG talking through it, or in reality anyone who wants to write their configuration du jour can just write to that API and things will just work. You said you had two more questions, did you?

So it's the FPM, the forwarding plane manager. Well, it's going to change, and I think we discussed this as well, but to give everyone context: currently zebra talks to the Linux kernel via a synchronous methodology. What that basically means is that we install a route and then we wait for an answer. We need to rip that out and make it so that you say "install this route" and at some point in the future we get a notification back saying whether it was successful or not. So for every single type of thing you want to do to the kernel, we have to change it from synchronous to asynchronous, clearly define that API, and then program everything to it. One of the real problems we have is that the different kernel interfaces are tightly coupled with zebra, so it's again ripping things apart, putting the middle layer in, and then programming to it. Does that answer your question?

OK, I'm sorry. So to further extend that: we rip it apart, put that middle layer in, and I want to be able to write a whole bunch of modules that I can pick and choose from. I want to use the Linux kernel module, click. I want to use a forwarding plane manager that someone wrote, bloop. They just have to write to that API, respond to it, and pass up whether it worked or not. There are ongoing talks about how that is properly done right now, and there's some contention. So, you
know, FPM users don't like... right. Did you know that there are two different ways you can use the FPM? There's netlink and there's this other one, protobuf. Whoops, sorry. I'm sorry, can you... you speak so soft, and I'm going deaf. Could you come up? Volta, today. So there's a guy from Volta Networks who's working on it, and he submitted a pull request a couple of days ago to start this ripping apart. I've actually started working on it myself; over the last six months or so I started that work, the synchronous to asynchronous. There's just a lot of work there.

It's there. I'm not aware of anyone using it. As with most things, there's a million little features that someone's thrown at it. Yeah, it's probably been two or three years since it was put in.

There's a lot. I mean, we have probably close to 30 feature requests that people have asked for. I basically tell people: sounds awesome, bring me code. Yeah, you can't... right. And, you know, I work for Cumulus and I do what is best for Cumulus, so with an RFC that matters for a non-data-center usage, I'm not going to spend a lot of time on it. Any other questions? All right, thank you.
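
The OSPF path weights quoted in the talk (R1 to R2 at 60, and R1 to R3 at 105 through the middle rather than 120 along the top) can be checked with a small shortest-path sketch. This is only an illustration under stated assumptions, not how FRRouting implements SPF: the link weights 60, 60, 100, and 5 come from the talk, while the name "r4" for the middle router is a guess, and the bottom path through a fifth router is left out because its individual links are not fully described.

```python
import heapq


def dijkstra(graph, source):
    """Return the lowest total weight from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        # Skip stale heap entries that were superseded by a cheaper path.
        if cost > dist.get(node, float("inf")):
            continue
        for neighbor, weight in graph[node].items():
            new_cost = cost + weight
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return dist


# Link weights as quoted in the talk. "r4" is a hypothetical name for the
# middle router; the slide's actual label is not in the transcript.
LINKS = [
    ("r1", "r2", 60),
    ("r2", "r3", 60),
    ("r1", "r4", 100),
    ("r4", "r3", 5),
]

# Build an undirected graph: each link has the same weight in both directions
# (an assumption; OSPF interface costs can differ per direction).
graph = {}
for a, b, weight in LINKS:
    graph.setdefault(a, {})[b] = weight
    graph.setdefault(b, {})[a] = weight

dist = dijkstra(graph, "r1")
print(dist["r2"], dist["r3"])
```

With these weights only the single cheapest path to R3 is selected, which is the contrast with the BGP ECMP case drawn in the talk.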

Info

Channel: netdevconf

Views: 8,699

Rating: 4.884892 out of 5

Keywords: netdev, netdevconf, netdev 0x12, linux networking, frrouting, frr, linux routing, frr tutorial, linux, OSPF, RFC-5549, EVPN, BGP

Id: NxP9lBvoawE

Length: 74min 3sec (4443 seconds)

Published: Sun Aug 12 2018