BGP Made Easy

Captions

Oh, sweet. OK, so it looks like everybody made it in from breakfast, so welcome to BGP Made Easy. My name is John, and feel free throughout this presentation, if you have questions about something, to step up to a microphone and ask. I'll try to fit you in. I do this presentation a lot at NANOG on the Road; it's usually a hair shorter than this, so there's plenty of time in the slot for questions to be asked and answered. I very much encourage that. That's one of my favorite things about NANOG: it's a learning experience, and if you have questions you should ask.

So a lot of us run BGP. We use it to multihome on the Internet. But what is it? Well, since we're at NANOG, I'd very much be remiss if I didn't give you the snarky answer and tell you to read RFC 4271, because, you know, the short answer is always the correct one, especially at the microphone at NANOG. So that is one of the things I like to tell people to do. But really, what is it? It's an exterior gateway protocol, and it's the only one we use. It's used on the public Internet, and it's used for routing between autonomous systems. An autonomous system is a discrete network.

What does it do? It signals the path to every destination on the Internet, in the case of using BGP for the Internet. Most major service providers' routers don't contain a default route; if they do contain a default, it points towards Null0 so the traffic is dropped. That's very common. The default gets very nebulous when you have a lot of egresses from your network. Because of that, every router at a major service provider typically knows where every Internet destination is, and at a big provider the routers often have a very complete view of the internal routes too, not just the global table. That's required to make an intelligent routing decision and not carry stuff randomly across the country.

BGP learns multiple paths, and this learning of multiple paths is complete in that it learns lots of
them, but it only retransmits its best path. That's one of the things people have to think about when using BGP: a given router may see six or eight paths to something, or even more, but it only redistributes its best one. There are some ways around that, like BGP add-path, but they're not in wide use on the Internet. So typically a given router has one path that it likes towards a destination. If it's internal, that could be towards a loopback or foreign address that has multiple paths between the two, but often it means that all the traffic towards a given destination is going one way.

Why would you run BGP? Well, there are a lot of obvious reasons. Multihoming and provider independence are probably the most common. We're seeing it pretty often these days, too, with corporations and IP VPNs, so they can control their own routing for private networks. That's quite a common use case these days. While that's not the global Internet table, it's still a table that needs to be managed, and you don't want to import or export the wrong routes, because you could break your other locations, that sort of thing.

Why else would you do it? Equipment and port redundancy. That's also a common reason. As Internet access has become more and more important, people don't want one piece of equipment to be their sole point of failure, or one port or one circuit to be the point of failure for their access to the Internet, and BGP is how you do that with a third party. You can't run OSPF towards a third party because it's not secure. For anybody who's used an IGP like OSPF or IS-IS: you can't filter the routes that go into or out of it very easily, so it's not really suitable for use between networks that aren't managed by the same group.

The third use case is really peering. This is typically larger ASes, but with the advent of lots of Internet exchanges around the world, a lot of small ISPs and enterprises are actually participating in
peering directly. That most commonly comes in the form of an IX port, and you probably remember the discussion about IXes from yesterday morning, if you were in the opening session. IXes range from large to small, but everybody on them speaks BGP. Whether you've joined AMS-IX with an 8x10-gig port or you've joined a little IX in, like, Denver or something with a one-gig port, those all have fundamentally the same routing policy requirements. You need to run BGP, you need to make agreements with your neighbors, with the other people on the IX, and you need to configure those neighbor sessions.

The fourth reason people run this is connectivity quality. A lot of us these days are looking for better and better quality connectivity to the Internet. Most commonly this means something like a cloud service. Say you're an enterprise: you might join an IX and establish a peering session just because a couple of your cloud providers are there and you want that traffic to go directly. Or you might speak to multiple providers for quality reasons. If one provider has a bad route to a destination you commonly go to, you get another provider; you get redundancy, and you can also manipulate which one is used.

The last one, which I just recently added to this slide and which I think is really important now with the advent of ever larger DDoS attacks, is that you can do remote triggered blackholing with most providers. That's where you announce a specific route, like a /32, a single IP in IPv4, to the provider, and they drop traffic destined for that prefix. That's a really important feature if you're under attack and it gets too big for your circuits. There's obviously third-party mitigation you can engage if you need to stay online, but sometimes what's being attacked isn't actually a particularly important device and you want to keep all your other services up. So remote triggered blackholing is something that's good to have in your toolkit of options.

Well, how does BGP work? I think somebody, before we came in today, was asking whether I'd again talk about what an autonomous system is. The answer is yes, because I think what an autonomous system is isn't always clear to people. BGP divides networks into autonomous systems. I'll get to what exactly that is in a second, but it's a logical way of thinking of either one router, a group of networks, or a group of routers. What does it do? It exchanges routing information to build the global table that forms the Internet. It exchanges the routes you advertise with the routes they advertise, and vice versa. And most importantly, for the edge of one's own infrastructure, it allows the application of policy. This is really the most important aspect. An IGP doesn't allow policy to be implemented; it lets you implement metrics, but it really doesn't implement policy. Policy is very important for us in the ISP world because it makes the network reflect what our business decisions are. I think this is the part we alluded to a little bit yesterday in the opening, where business and engineering kind of mix: you've got to make your engineering reflect your business practices.

But before we get to that: an autonomous system is typically a single network. It could be one router; it could be hundreds or thousands of routers. More importantly, it's controlled by a single administrative domain, and that's really where this definition comes in. It's all connected together in one infrastructure, and it's one administrative domain. That could be lots of people, but they're all working to the same policy. You see this globally. Take a tier one: it's all one AS, it's all one NOC, it's all one engineering team. The decisions they make are global; they're not unique to, say, the Seattle POP, because the network operates the same way across its entire footprint. That leads into a common routing policy. The common routing policy means that they all behave the same way:
their interactions with other providers behave the same way in every city. Those networks are identified by an ASN, which is a globally unique number assigned the same way IP addresses are, through the hierarchy of RIRs from IANA. It's very important that those are unique, because people tie your routing information to them. Although, just like with IP addresses, remember that there's nothing protecting you from somebody using your ASN. That's totally possible at the moment because people don't do RPKI. That's a different presentation, but it's very, very important to note that this, just like the rest of routing, is entirely honor-based.

How do you exchange routes with BGP? Exchanging routes is done by peering. Peering has multiple definitions and does not necessarily mean free. This is probably one of my biggest pet peeves in routing linguistics: "peering" is used to describe both settlement-free interconnection and a BGP peer, which are not the same thing. A BGP peer means that you've established a routing adjacency with somebody. It can be for money, it can be for free, it could be over an IX, over a dedicated circuit, over any number of media, but it does not necessarily mean it's free. That's an important distinction throughout this talk, because I think it's confusing that we use the same term.

Unlike with an IGP, neighbors are set up on both sides, and policy is applied and executed. This is really important: it's an affirmative neighbor setup. There's no neighbor discovery in BGP, and the reason is that you don't want neighbors to be created dynamically. You want to be able to create them on a one-off basis. Think about being at an IX: it may seem convenient that you could plug into the IX and peer with everybody magically, but you actually don't want that. If you're a big network, you want to understand who you're connecting with when you turn a session up, what capacity implications that has, and what performance implications that has.
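The "affirmative neighbor setup" point above can be illustrated with a toy model. This is only a sketch, not a real BGP implementation: the `Router` class, the `session_established` helper, the router names, and the ASNs (taken from the range reserved for documentation) are all assumptions made for illustration. It shows only the idea from the talk that a session exists when both sides have explicitly configured each other, and never through discovery.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Hypothetical stand-in for a BGP speaker (illustration only)."""
    name: str
    asn: int
    # Maps a configured neighbor's name to the ASN we expect it to present.
    neighbors: dict = field(default_factory=dict)

    def add_neighbor(self, other_name: str, remote_asn: int) -> None:
        """Explicitly configure a neighbor; nothing is ever auto-discovered."""
        self.neighbors[other_name] = remote_asn


def session_established(a: Router, b: Router) -> bool:
    """A session comes up only if each side configured the other with the right ASN."""
    return a.neighbors.get(b.name) == b.asn and b.neighbors.get(a.name) == a.asn


# Usage: one-sided configuration is not enough.
us = Router("us", 64500)
them = Router("them", 64501)

us.add_neighbor("them", 64501)
print(session_established(us, them))   # only our side is configured

them.add_neighbor("us", 64500)
print(session_established(us, them))   # both sides are configured
```

The first check fails because only one side has a neighbor statement; the second succeeds once both sides agree. A real router also negotiates capabilities, timers, and address families, none of which is modeled here.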
So the only exception to that would be, of course, route servers, but route servers typically tell you what you're doing, and they're optional to join at most IXes. So that's the static configuration, and that's what makes BGP unique. That's really what defines an EGP, an external gateway protocol: you need to have those neighbors statically configured.

OK, so policy. What is routing policy? There's operational policy. In this category I put only accepting certain routes, only the set of routes you should be getting from somebody. This really applies to customers and peers, but a lot of people don't actually apply route filtering to their peers. They trust them, and if they don't trust them, they just don't peer with them. Increasingly it's probably a good idea to put some sort of filtering on your peers if you can, but given the number of routes involved, that generally means you're going to have to do it in an automated fashion. That would be generating prefix lists programmatically, for example from a routing registry, and using that to build the prefix list of what you should see from a peer.

Policy also lets you match on communities or set communities. This is most commonly used for things like remote blackholing. Or if you look at the transit guides for most major tier ones, they'll have options to prepend once, twice, or three times, or to not announce to specific peers, or other types of policy you might want. If you have two or three upstreams and you want one of the other tier ones to return traffic a specific way, you can generally influence that using communities. That's specifying operational policy, and it's one of the real features of BGP.

But really, that takes a second seat to business policy. I hate to say that as an engineer, but it's true. We're in business to make money, or, if we're a nonprofit, we're trying to be as efficient as possible. We don't want to take routes that we
don't get paid for, or otherwise compensated for, across our networks. Nobody here is in the business of taking something from somebody, carrying it across the country, and handing it to somebody else for free. We'd love to think we are sometimes, but that's definitely not the case.

The typical problem would be this: you're a small ISP and you get the global table from transit. You might have two transits, so you don't want to re-announce routes from one transit to the other, but you will announce your customers to both. That gets really tricky if you use the basic knobs in BGP, which are AS-path filtering, prefix lists, and other things like that at egress. In my operational experience, and I would wager this is mirrored by many people who have worked at bigger providers, the most common problem we see with small customers speaking BGP is that they're filtering using one of these methods that doesn't really scale to having any downstreams, and then they add one. When they add that first downstream, you find route leaks. Or, when that downstream customer cancels, you notice that you're getting a route from your customer that's via their other transit, from their former customer, because it was all filtered with prefix lists. That obviously is bad business policy. You don't want to pay one transit and then send data from your other transit out of it, across two ports you pay for.

The other really important piece of this is route leaks. Route leaks are real bad. There have been many instances of this; if you search the NANOG mailing list archives you'll find large and small ones going back years and years. It's been a feature of the Internet for as long as we've been running BGP, from the AS7007 incident a long time ago, which basically broke the entire Internet, to more recent ones like Global Crossing accepting lots of routes from PCCW, which you guys might
remember. That makes outages. What most likely makes it an outage for you is that the entire Internet, or a big portion of it, tries to get to another big portion of the Internet via your network. You can almost guarantee that you're not big enough to take that traffic load, nor is anybody near you. It's bad in both cases: either it overruns all your links with traffic you're not getting paid for, or, if it doesn't fill your circuits, you're wasting a whole bunch of money. That's not something you want to do. The other problem is that you're blackholing stuff, which the rest of NANOG will shame you for. God forbid you're on the NANOG list, everybody knows who you are, and then your AS does that, because you will get a very large amount of contact. I don't speak from experience on this, at least, but I have watched it happen. NANOG is not quiet when somebody makes a mistake.

As you've likely guessed, even if you're new, the question is what types of peering relationships you want to enforce. That's really what it comes down to from a business standpoint, and typically you have three. Most networks have transit. I'm excluding tier ones, since I'm assuming you've probably already got your BGP well in line if you're a tier one. If you're not a tier one, you have transit, you might have peering, and you probably have customers. Those have three very different business implications. The first one, transit, costs you money. I don't think I'm just speaking for myself when I say that transit is the last resort. If I have a route that either is free or is a customer route, i.e., they pay me, I'd much rather send traffic to those routes. So typically you want your network policy to mirror your business policy, and I like to order it in a hierarchy of where the money flows. With transit, I pay them, therefore it's the lowest priority from my side. For this particular presentation I use what we use in our
network for local preferences, so in this case it would be 50, 95, and 110. Local preferences are totally arbitrary. A higher preference means the route is picked first. It overrides AS path length, which is an important detail if you're talking to people about why their route was preferred. I don't care what the path length is: if I get the route for free, I'm going to take the free one over the paid one, and that's a very common policy on virtually all networks. Tier ones do this too; they take customers over peers in virtually all cases.

Often you can influence which one of these tiers you get with communities. If you really want to use a transit as a backup, you can often set your routes to something like "lowest possible," and in a lot of networks that's set below their peering and below their transit, so that unless they don't see the route from anywhere else, they'll never use the port. That's backup only. Sometimes you see it called a last-resort or backup-only preference level, and it's pretty commonly used by people who want, say, a smaller circuit for backup: they want it to be there and they want it to take over, but not to carry any traffic the rest of the time.

I'll warn you, this presentation is rather Cisco-centric. A lot of this can be translated to Juniper, but it takes a little bit of work because the syntax is basically completely different from IOS. I do show some IOS XR examples in here as well. As we keep adding more platforms to the global Internet, you have to translate this for each platform, because all of them have slightly different ways of implementing what were route maps, or route policies on IOS XR. I think that's an important caveat here, but a little bit of googling can help you figure out how to do it. You have to have a fair bit of programming understanding, but it's not complicated. It's just if-then statements, so it's pretty easy. In IOS,
prefix lists can be applied directly to the BGP peer. I do recommend this for customer-facing ports because IOS is not a default-deny platform; it's a default-allow platform. So my advice is always to put the prefix lists in multiple spots in your configuration, so that when the NOC tech inadvertently removes one of the lines, you can't accept a full table from a customer and re-announce it. That, thankfully, is going away as people deprecate IOS from their core infrastructure. Junos was smart from day one, and IOS XR finally got a clue about deny by default: if there's no policy, it doesn't do anything, as opposed to "if there's no policy, it does all kinds of bad things." The subtle laughs you get are from people who have experienced that problem: "Oh my god, the network blew up because this line of config didn't exist," as opposed to "nothing happened because this line didn't exist," which is a very drastic change.

On IOS you'd use route maps. They can match all kinds of stuff; most importantly, they can match communities, by way of community lists. That is really important because it allows you to move your filtering around. IOS XR replaces all this with route policies.

I've always said that the best way to handle this across the network, and every major network in the world does this, is to filter with communities, and that means filtering all of your egress announcements with communities. When you get a route, you classify it: you apply a community, which is just a tag. It lets you explain to your routers where you got the route from, or what they should do with it, even if it's the same route. A classic example: I had a customer email about this the other day. They said, "Well, my other upstream peers with you, and I don't want you to re-announce the route you get from them via peering, the one that's not on the transit port," and I'm
like, "That's the default behavior." It took a little while to explain that this is actually implemented on a per-session basis. A route from a settlement-free peering session will never get re-announced to transit because it doesn't have the right tags on it, whereas a route from a customer session will get re-announced to transit and peering because it has the right tags on it and a higher local preference. That's a great example of where this kind of policy gets applied.

Filtering only at ingress works great for both single-router ASNs and multi-thousand-router ASNs. This means you can still prefix-list your customers, and you should. I would encourage everybody to prefix-filter their customers, and max-prefix limit them too, so that they don't leak routes for recreational purposes, which seems to happen a lot on the Internet these days. You can filter at the customer edge, and your NOC can update it: if they want to add a prefix, they only have to update one prefix list. Then anything you get via that session gets a tag with your communities that says to your network, "Hey, this is a customer. I want to announce this to transit. I want to announce this to peers."

You have to design the policy in such a way that you don't do things like announce to transit but not to peers. At least in our case, we would not want a customer announcing something to us that tells us not to also announce to our peers or our customers if they're announcing to our transit, because customers can find some very creative ways of blackholing themselves. That's especially true if they do stuff like "do not announce to your customers" and you have single-homed customers that don't have a default. That gets real fun. So you have to craft the policy in a sane way, but you'll notice that if you look at any of the major networks in the world, their policy is quite similar. The numbers they use are different, but the policy itself is quite similar.

The biggest advantage of this, though, is that it really lets you scale. You
could start at one router and keep adding more, and you don't have to change the configuration. You might add more communities over time to support it, but the beauty of communities is that you can have multiple, so you can keep your old ones even if you want to change them. Leave your old ones on, add some new ones, rewrite your policy, change your ingress, remove your old communities. It's totally safe to do, as long as you're not deleting your entire config in the process. On IOS you can do that in real time, and it can be done even during the day if you plan it well.

The other side benefit of this is that the egress policy facing your peers and transits can be set up to deny by default. If you didn't already get it from this talk, one of my favorite things is a deny-by-default policy. I think if a network doesn't know what to do with a route, it should definitely not send it anywhere. That is the preferred outcome, and I think that's the way most transits are now working. At least in our infrastructure, if we don't have a tag on a route, it literally goes nowhere. It doesn't go to customers, it doesn't go to transit, it doesn't go to peers. That way, when somebody turns something up that wasn't intentional, which in theory shouldn't happen on a well-managed network but still somehow does sometimes, those routes don't escape our infrastructure and cause havoc on the Internet. Hopefully they weren't needed internally, but more importantly, they don't make it out.

There's another really cool thing inherent in that. When you speak BGP with a customer and you want to send them a full table, you can send them a full table that does not include your internal routes. This is another thing we see very commonly with small providers: if you look at a full feed that they send you, it typically includes more specifics out of
their own routes. That's because they're doing a prefix match and sending out everything that matches a specific set of prefixes, instead of saying, "Hey, we're only sending our aggregates, because we're matching them with communities," the same stuff we send our upstreams. For larger networks, those internal routes can be quite a large number of routes, and you don't really want them in a customer's table, because if the customer accidentally leaks them to a transit, the more specifics will take over, which is very bad when it happens. And because customers are often speaking BGP now with things like firewalls, if a route leak happens on a firewall, that's quite bad for whoever's route was being leaked, because almost certainly it's going to suck the traffic in, drop it, and not send it out the other transit port. So it's even worse than the saturation or double-paying example. It actually blackholes the traffic.

In any case, what else can they do? Communities can do all kinds of stuff. At least in our infrastructure, we use them for remote triggered blackholing, and we indicate the location we got the route from. Internal routes are indicated by metro and region, all kinds of cool stuff. This really helps as you scale. If you're a larger provider and you're working with CDNs, they often want data on where a route originated: where is the customer that's in this IP block? That's a pretty important thing to give them because it translates directly into customer experience. What we're all here for is to have people keep paying their Internet bills or paying for their hosting services or whatever. If we didn't have that, none of us would really have jobs, so performance is king. It's also a good way to work with these partners so that they know where to send the traffic in the way that's easiest for your infrastructure. If you're talking to a CDN, like in our case, in both California and Washington, I don't want them to send me the traffic in
California for my Washington customers. I want them to know that it's coming from this POP and should be served off this interconnect. You could do that with MEDs and things, but it's much more positive if they know, "Hey, that originated from Tukwila, Washington," because that POP has that code and they can just map it geographically. It also helps geolocation, which is pretty key for a lot of these providers now.

Triggering DDoS mitigation is probably the newest use that's come up. DDoS mitigation systems now do diversion routes. I don't know how many of you are familiar with how most of the mitigation systems work, but typically they inject a route, which could be a /32 or whatever is under attack, into your BGP table internally. Instead of the traffic going towards the customer, you pull it towards the scrubbing boxes to scrub it, and you may do that at lots of locations in your network. You still need to route that traffic back to the customer eventually, and a popular way to do it is to inject it back into a VRF that doesn't have the diversion route and then carry it over MPLS to the destination. At least that's what we do in our case, and I think it's very common. It's a way to do it all with BGP, and doing it with BGP means it can scale. It can be done in an automated fashion, nobody has to configure something, it goes back to normal when it's turned off, and you can do it to many, many destinations at once. There isn't really another good way to do that.

I've also seen BGP and communities used for a number of other custom applications. A lot of people use it for anycasted services. You might want to tag those with communities so that other devices around your network can watch for the service-specific hosts, that kind of thing. In any case, I'm going to switch to some
examples here. I'm from AS11404, so I use our examples. This comes back to the policy we were discussing, where you don't want to announce to transit and peers but not customers. You can see how the communities build on each other. There's an announce-to-customers-only community, there's an announce-to-peers-and-customers community, and then there's announce-to-all-three. That 993 community would be what we use for customers that need full transit; it announces to everybody. In reverse, we build what we send to customers from a combination of those three plus the summary communities for all transit and all peers. I think this is a pretty common way to do it. It eliminates all your internal routes from the feed you give to customers and makes it a very positive thing.

The only real way to break this badly is to have somebody turn up a transit or a peer with a policy that sets the preference but doesn't apply a community. Then you don't re-announce the route to anybody. This is probably the best example of where this can go wrong, but it's also a great example of where you should be fully templating and automating your config. If you're turning up new peers and you're creating new route policies for every peer, you have a problem: either you're not grouping them, or you're not automating, or you're not templating, or all three. That's a policy thing. These days you don't want to be creating custom policy on the fly. It's an area for mistakes. But at least this way, if you create one that's mistaken, it only breaks you. It doesn't break the rest of the Internet.

If you want more examples of this, we have an engineering website that literally has no graphics, at that URL. It gives you examples of more communities that can be implemented and how they're implemented.

I like to show this in action. This is IOS, of course, and this is towards a customer, so you can see the ASes in this are
real; they're other ASes of our own. Same with the prefixes; we don't use customer ones for this purpose. You can see where we import prefix lists and assign a route map. That route map sets the local preference and the communities appropriately. In this case it says it's a customer route with transit, and, if you looked it up online, two-thirty-ten is the Westin Building, so that's a transit customer in Seattle. You can tell that just from the communities. Then you can see what we send out in that full-tables-out policy.

I would also really recommend that if you're generating this, you use simple names. Use names that make sense to a human. I've always called this the "if you page an engineer drunk at 2:00 in the morning" test: can they figure out what the policy does? A name like "full tables out" is sufficiently obvious. "Oh, that's our full-table feed to a customer. OK, cool." I've seen a lot of providers get really into naming things in what I would regard as a non-human-readable way, and I've always thought that's quite dangerous, because when something really goes wrong, humans need to look at the configs. You don't want that to be your standard mode of operation, but you want it to be possible, and you want it to be intelligible.

I don't get paged very often, a few times a year now, but when I get paged it's usually at 3 in the morning, after I fell into a deep sleep five or ten minutes before. I don't know about you guys, but those are the times when I'm very thankful for easy naming conventions, when I'm trying to figure something out in a hurry. I really believe it helps the mean time to restore quite significantly. We've had this debate internally, and I actually do call it the "if the engineer is drunk" test in our internal meetings. Because that never happens, right? Nobody's ever been paged while drunk or incredibly tired.

This is the
same exact implementation, but implemented as a route policy, so this is IOS XR. You can see we're setting the local preference and we're setting the community. There's a community set there; those are the communities. On IOS XR it's easier to use community sets, since there's a customer-in community set for this router, as opposed to setting them manually in the route policy. I think that's another example of where you can summarize: for a given router, every customer should have the same community on that router in most cases. So if that's the case, do something simple: call it, you know, BGP-customer-in. That's what every customer that's full table, that's announced to the global Internet, gets on that router. Share as much of the config between route policies as possible, so that they're simple and people aren't touching the complicated parts of the config, because that's where it always goes south. We had an incident just the other day where a customer called in saying their remote triggered blackhole wasn't working. That config was tested in the lab; it's the config that's on all the routers. That did not stop the on-call engineer from reconfiguring the policy, even though the underlying cause was the customer not sending the right community tag for the blackhole. There was no policy problem on our side; it was the customer not sending the right community. That is the kind of thing you want to avoid people thinking needs to be troubleshot. You shouldn't have to troubleshoot your default policy. It should be well vetted, and if you have a lab, or even a spare router or two, setting it up and testing the various iterations of it and making sure it all works is great, because then they can be cookie-cutter across the whole infrastructure. Cookie-cutter is the key to reliability. It also is the key to automation: even if you don't automate up front, if all your configs look the same, it makes it much easier to automate in the future. This is the same config towards the
transit. This is towards XO; this is one of our XO peers, or transit peers, and you can see we're doing descriptions that match that rule: keep it simple. Then there's a route map going in and out on each, and there are a couple of things in here that I like to note. We always ignore MEDs from transits. I don't want a transit to tell me to haul it across my network to somewhere else to hand it to them. I'm already paying them, and they charge the exact same amount wherever I send it, so I'm not carrying the traffic across my transport between markets. So I always reset MEDs that I get from transit. For peers especially, I would typically set them to zero unless it was otherwise negotiated; it just depends on the player. Then you can see that lower local preference of 50 being applied. That lower local preference of 50 is the transit local preference that says, hey, if nothing else is available, I'll take this. This config can get a little more complicated: if you have paths you like and don't like, you can make more iterations of it. The reason I don't set, like, 50, 51, 52 for my preference of transit, peers, customers is that I like to have a little bit of ability to raise and lower, because in the real world you find that certain paths, or certain transits or peers, are bad and you don't want to take them. That lets you duplicate this route policy, apply an AS-path match to it, and match a path you either like or don't like. Typically in ours, paths I like I actually set to, like, 55; paths I don't like, to 50; paths that basically don't work, to 49, meaning never use this path unless there are no other ways to that destination available. I always do it with local preference, and I think something that's very important to get people in the mindset of is: don't ever block something you're getting from a full-table transit. Just lower the preference enough that you don't care, because if you block it, that's asking for you to create black
holes. At some point somebody will change an upstream, at some point you'll go through a few upstreams, and pretty soon your only path to some random network, which will drive tech support calls, is actually blocked, and people are like, why can't I get to this website in Romania? I don't know; you look at it and it's like somebody blocked the whole destination. You don't want to do that. You want to just lower the local preference so that, worst case, they're headed towards a route that has a low local preference. That's the "I don't care at all" local preference, because it'll still work. In this case you can see the outbound route filtering being applied as well. It's matching just two communities: 993, which is the one that's transit, and then I keep a specific one that says announce only to this one transit, and that's used for specific traffic engineering internally. We have some cases where, for some of our routes, we just list a couple of transits in that market and don't announce them to our global transits. But we don't want customers to do that, because it gets weird in a hurry when customers try to do it, because they don't usually understand where you're connected to which carriers. Some real-world examples: you can actually see this if you do a show ip bgp on one of our routers. This is an old 6500; it's actually not there anymore at this site, but it shows it quite well. It was a Portland router. You can look at this /20 and see that it was coming in from the loopback of a Seattle router, and the reason we're seeing the loopback is that this is all next-hop-self infrastructure for BGP, so all the iBGP sessions are built between loopbacks. That's the part I don't really cover in this, how to build your iBGP infrastructure, but that is its own mess in itself, and I'll take lots of questions on that. Depending on your topology, building iBGP infrastructure gets really hard, because a full mesh typically isn't possible if you have tons and tons of routers, but route
reflectors have to be designed very carefully to not create new single points of failure. I've seen a lot of cases where people have gone to route reflectors but not understood that, OK, I have five sites, and these three sites on this side are all route-reflector clients of two routers in this one POP. What happens if that POP goes dark? Does all your connectivity really go through it? Often the answer to that is actually no, but you lose all your routes, and if your connectivity's up with no routes, that's the worst possible situation. So iBGP configs actually take a little while to design in a way that scales well. Big networks typically would do full mesh between major sites and then route reflectors from those sites downwards, but if you're regional it's much more complicated, because you've got to understand the individual fiber routes around that. Here, though, you can see, in bold, that this is being announced to transit and it's a customer in Seattle, just like we saw on the import from the prior slide, where the prefix was imported into BGP via the route map. You can see that it's coming from Seattle; in this case Seattle actually means the Westin Building, which has its own code. This is the same thing, but a blackhole. I use the other router, but you see that router's now IOS XR, that pair, and this is a great demonstration of showing something that's been blackholed. The only caveat with creating your own blackholing infrastructure across the network is that you have to rewrite your next hops, and that gets a little bit tricky, because if you actually want to distribute a blackhole, you have to go around on all your routers, and when you set this community they have to recognize it and rewrite the next hop to be, like, a null interface. In our case, what we do is actually use the example subnet, which you'll see here: we use 192.0.2.0/32, which on every router in the
infrastructure is routed towards null zero. That allows you, if you set that as the next hop, which is done using this community, the 11404:666 community, to route it all towards null zero. Having it actually be distributed is particularly important given the volume of DDoSes that people see now. You don't want one router sinking that traffic; you want your whole infrastructure, at every edge, sinking it. But what's cool is, if you implement it with a community, you can do the same stuff we did before and re-match that policy and re-announce it to your transits and rewrite it to their blackhole community. This is my personal favorite: yes, we might be blackholing, but I also want every transit of mine that supports remote triggered blackhole to blackhole it themselves and never even deliver it to me if we're blackholing. You can do that because you're tagging it with communities; you're not just statically routing it towards null zero, so you can put that as part of your egress policy. So I'm intentionally finishing early here, because we got filled up with a whole bunch of people, and it looked like they had questions initially, so I'd love to open this up for people to ask questions, because we'll have about 30 minutes. Ask questions. Yes, Nico? I don't. But should I filter them? A lot of customers, actually, I'd say probably 70% of our BGP-speaking customers, announce their global aggregates and then a whole ton of /24s from their global aggregates. This has become a huge problem across the internet these days. You just have to look at any of the routing reports, and the internet is, like, what, 75 percent /24s now, and that's really just not healthy, because most of those were not, you know, RIR-allocated /24s. Jump in. Sandy Murphy, Parsons. You said over and over again: prefix-filter your customers. Can you discuss filtering on peering relationships? Yes. I like to look at hijacks and then look at who passed
that bogus route around. Yeah, and that's a really good question. Prefix filtering on peers has proved to be incredibly difficult, and there are basically two reasons that are not necessarily directly related, but I think administratively they're related. One: at least in North America, and unfortunately you really need it for the whole world, there isn't a good routing registry that has useful data. Anybody who's been around for a while knows that if you actually pulled a filter based on, like, RADb AS-SET data, you would have like ten times the number of prefixes in that AS-SET that are actually being announced, if not 20 or 30 times, I know, because nobody ever removes the data. It's just garbage. People don't get removed from AS-SETs; people don't remove route statements from RADb or any of the other IRRs. You go in there and, like, I've got ARIN prefixes that got allocated to us by ARIN five or six years ago that still had RADb entries from the mid-90s in there, from ISPs that went bankrupt in '99, that then went through the whole timeout for all their IPs to go back to ARIN, and then went through the cycle for me to get it, and the stuff was still live in the routing registries. So effectively they're not useful for peering filters. Really, the gold standard solution would be RPKI, right? Strangely enough, standing at the mic. But yeah, no, I knew that that would be the gold standard, but then you have the exact same problem: instead of there being a whole ton of garbage data, there's just no data. There are a few prefixes that are registered, but as a percentage of the global Internet it's quite small. So what do you do with all the ones that don't match? Do you set them to a lower local preference? What do you do? I think I'd be of the mind that, if I had verification on it, I'd probably set anything that matched to my default local preference for the peer and then set all the ones
that didn't match lower. It doesn't really solve the prefix hijack problem, because prefix hijacks are typically more-specifics, so those would still take over; local preference isn't relevant at that point. I'd love to see us get to a point in the internet where we could actually drop traffic that was towards unregistered prefixes, but I don't know how many people in the room have polled their customer base. I know I haven't, but I can guess what would happen to our call center if I did that. We'd be finding that every small web host in the world that didn't know what either RPKI or a route registry was would fall off the internet from our perspective, and I'd have a hundred thousand of our half a million cable subscribers calling our call center at the same time saying, my internet doesn't work to my favorite website. We'd find out a lot of new favorite websites, I'm sure. That's kind of the problem I see with it. I think we might get there with RPKI; it's going to be a little like the v6 deployment, unfortunately. I think it's already going faster than the initial v6 deployment. So, one of the problems that I've heard expressed is that you have to include in your prefix filter everything in the customer cone, not just the customer but the whole cone, which means you're regenerating your prefix filters all the time. With the RPKI, you express the origin, and it doesn't matter how it gets to you; you can decide on the origin. Your examples of prefix filtering would be good places for it; it's an easy match for valid and invalid, and you have not-found. So at some point I was wondering whether you thought that that ability would make it possible to check on the peering connections, because you don't have the volume problem of having to recommit a configuration file that's, what's Jared's figure, 400k or something? Yeah, and that's really the problem with classic prefix lists, because the numbers that Jared throws out are, I think,
quite accurate. Because the Internet's gotten so big, even if you put some good summarizing code in, prefix filters really just don't work on peers, because frankly routers are slow, and IOS XR is still 32-bit, not 64-bit, which I don't know how many people actually knew. The hypervisor for it is 64-bit, but the actual code that the BGP daemon runs is 32-bit, so it's severely limited in the amount of RAM it can address, and other things like that. I don't think Cisco is unique in that particular problem. But I think, really more to your point, if we had fast enough CPUs in the routers to do that, or an outboard way to do it, to feed the filters to it and not put the infrastructure at risk, I think that would be the way to go. We're kind of moving in that direction: routers are getting faster, and there are outboard solutions for processing the updates. From our standpoint, at least in my infrastructure, I'm particularly nervous about making my accepting of routes dependent on, like, a server somewhere, so I'd have to have it set up so that I'd allow by default, effectively: if I was down internally and couldn't validate, I'd accept. That's easily solvable, but you do have some problems like that that you have to work through, because you don't want to make your infrastructure more fragile. But if we could do it, that would be great. In our infrastructure, thinking about it, just our own customer cone would be a huge amount of work for our support team to make sure they all were validatable, because if we're their sole upstream, I can't drop their prefixes if they're not validated. Given we have about 2,600 routes, you start thinking about how to scale that up in terms of your NOC staff contacting and walking what in some cases are marginally clueful customers through how to do that; that gets really interesting. The other thing we see a lot, that I don't know how it's
going to be reflected, is people swapping space around, either for DDoS mitigation providers or one provider subleasing some IP space to someone else. Keeping the Whois records up to date seems to me to be a problem, let alone keeping signed origin validation up to date, because my suspicion is that those are kind of two levels of difficulty: the Whois is a pretty low bar, and the origin validation is a hair higher bar, but not much, right? But if you find that, like, a company's been bought three times and they've given this space to whatever company ended up with those assets... I just went through an issue where we were using a bunch of IP addresses that were four years out of date from a merger and acquisition, because nobody had realized it, and it had about fifty thousand cable modems on it. That would have been a problem if that was validated, because, well, (a) we wouldn't have been able to announce it, but nor would the other party that acquired part of this bankruptcy. So how do they update the Whois there? Painfully. Painfully, yeah. So once you've re-established the relationship with ARIN, it's just a click to do, really? Yeah. The real trick here is that the Whois data is sort of hopelessly out of date in a lot of cases, because getting up to date with ARIN is difficult on purpose, because people steal space. But when you're going through an M&A... I mean, I've probably put, in the last year, across the three or four companies we bought, like 20 or 30 hours into dealing with ARIN on transferring stuff, and I know the reviewers and they know that I'm not a spammer trying to steal the IPs. Just providing the documentation required by policy is tricky. If you start making that process be required to keep them reachable as you merge the infrastructure, that actually becomes really tricky, and I don't totally know how you do it, because if you buy the
whole company, you probably have their old access, but if you buy part of somebody from bankruptcy... Acquisitions are not always as convenient as: you bought the whole thing and you now have access to their old contacts. That of course would work; you wouldn't need ARIN to update. And as we're proceeding into an era when IP addresses actually have some monetary aspect to them, maybe people will start paying a little more attention to Whois entries. We could hope. I can hope. Yeah, I think you're going to see that, and we're probably going to see theft, if we haven't already. I'm sure we've seen theft already that we just don't know about, because IP addresses are going right now on the marketplaces for, you know, twelve or thirteen hundred bucks for a /24. People are going to be trolling those for what defunct company has what space that they can swipe and nobody will notice. It's sad, because being dishonest actually wouldn't be that hard in those cases, because you're really providing legal documents to ARIN that they have no way of validating. If you do a transfer, you're sending them your purchase agreement, but do they know that the purchase even happened? Is that even publicly available? Unless they're public companies, probably not. I know we don't publish anything about what entities we acquired or how we acquired them, so just because they came from our lawyers, that doesn't mean they're actually the real docs, right? It's a sad state of affairs that the internet is so based on trust and that we start seeing that trust evaporate as resources and actors get less and less honest, basically. Anyway, does that answer it? Thank you. Any other questions on this? How many of you guys run BGP for your ISP? So, a good percentage of the audience, and some of you I know are big networks, I see, raising their hands. Any other actual technical questions or implementation questions? Yeah, jump in. Yeah. Yes, so if we
pop back to that slide, let's see here. Really, one of the things we do, and I think this is very common, and obviously the vendors that are involved are, you know, Arbor, Radware, the usual suspects, but they effectively all work the same way, is you have some sort of mitigation cluster that's usually hanging off dedicated ports on one or more of your edge routers in some form. Typically we use not the core but the edge, because we don't want to carry it into the core; we'd rather dump the junk as quickly as possible towards our transit and peering edge. Typically how you would do it is you'd inject into your default table a diversion route that says shoot it towards this box. It might be a loopback IP that you're injecting, because if you have a lot of scrubbers you might set the next hop to a loopback, and that loopback might be statically routed towards every scrubber that you have, so that all scrubbers are taking it with one route injection. That would be pretty common. But to get that traffic back into the global table, following the normal forwarding domain, on Cisco you create a VRF that imports your whole global table except the diversion routes with that next hop, and the clean side of the traffic goes into that VRF. Those routes from the global table are leaked into it without the diversion routes, so that VRF doesn't have the more-specifics that would create a loop, and then it routes the traffic in that VRF to the next router downstream. In some cases you might carry it over MPLS the whole way to avoid a loop; you can do that too. Hopefully it's not... you can do... Yeah. Yeah, absolutely. So the question was, do you need to import the whole table? I've done that mostly because, at least on IOS XR, it doesn't have much of a memory penalty anymore. It used to have a FIB problem doing that, but now it doesn't; it actually shares the FIB entry, so you can do it
safely if you're directly copying your global table into the VRF. Yes, so that's the answer to why I do it that way on XR, because I actually lab-tested it and was like, this is cool, it only uses like 20 megs of RAM to do this, because I was worried about it. For those who don't know, at least on most routing platforms the gating item is the number of routes you can put in the FIB, the actual forwarding table, because that's what has to be forwarded in hardware, and a lot of times VRF routes count as a second route. But they've made it so that in this particular case, if they're identical, they can use the same entry, which is particularly important for this. Does that help answer your question a little bit? I don't know what platform you guys use; that particular answer would be dependent on the platform. No other questions? We're exactly an hour into this. I was hoping I'd have like 25 minutes of questions, based on the number of people that walked in. Oh well, I guess we're going to finish a little early then. It seemed like sort of a [unclear] question setup. So, in a lot of cases, in ours in particular, we have an internal group that handles DOCSIS, right. James Ashton, Wow cable. We are finally getting them moved over to BGP, so that we're doing BGP with the CMTSes and accepting their routes and are therefore able to easily tag communities, but prior to this it had been all static routes and OSPF, because that was all the... Yes, it sounds really familiar. Yeah, I'm sure. But your export filter to your commercial customers is a lot more complicated when those routes are not in BGP until they get to the edge. Yeah, and I think my recommendation for that particular config would be: if you have to speak OSPF to a device on your network, speak it in another OSPF instance and then import that into BGP if you possibly can. Like, let's say you have a CMTS and then you have aggregation routers at that head end or whatever; if you possibly can, put your CMTSes in another
OSPF instance so that you're not polluting your main OSPF table. But I come from the cable side too, and I can tell you that in our infrastructure that wasn't even possible, because the amount of gear that was deployed didn't allow it. Exactly. That and M&A. And yeah, you always end up with networks that you haven't had time to do anything to yet. Yeah, so you always end up with those routes somewhere in your IGP, because that's just how everybody always did it. Yeah. So in the end, at least until we get everything moved over to BGP, which we're in the process of, we're still using actual prefix lists. All of our upstream and peer routes come in through communities, but as far as our internal stuff, we don't really have any way of doing it other than a prefix list at the moment. And I think, at least in our case, the way we handled that was probably the gold-plated approach. It happened to coincide with a CMTS upgrade and a rebuild of the backbone routers, which is a luxury that most people don't get. So we left the old turd going, with its OSPF everywhere and everything in the OSPF table, turned up a new infrastructure, connected BGP between the two, and started rolling CMTSes to BGP one at a time. Because there were no customers on the edge AS, the default would work, and you could not create routing holes by doing it, so you could basically nab the traffic before it reached the legacy infrastructure at the BGP-connected CMTSes. So we've been rolling in one CMTS at a time, but honestly that is very hard to do live if you don't also have a big other project for replacing hardware going on. In cable, that happened to coincide with us going to 16 and 32 QAM everywhere for downstream, which, for those that aren't cable, means prepping to offer one-gigabit cable service, which was a big capital initiative on our side. And I think that's
likely the same as a lot of people; these days they're doing big upgrades in their DOCSIS plants to try to offer next-generation speeds, but that often means more CMTSes, which made it a little more flexible. If we hadn't been doing that refresh, I think we would have been trying all kinds of crazy route policies too, and trying to shrink OSPF domains and import them at specific strategic locations, and doing it all without creating a new single point of failure, which is the fun part. Only in very specific circumstances. So the question was, and this is the trick question, because redistribution is typically quite bad: should you redistribute connected and statics? I typically, on our internal instances, do, and I do it without filtering, into our internal BGP table, but that also means it never gets re-announced to anybody. That's specifically for our enterprise team putting an ip route out towards a customer circuit that has, like, a dedicated IP allocation, so that just works without having to contact the backbone team. The question that I think Nico was bringing up is really the one of, should you redistribute OSPF into BGP? The only situation in which I would ever do that is if you have a dedicated instance for talking to a device that doesn't speak BGP. That's reasonably safe: put some communities on it, have that OSPF instance only be on a couple of routers and only on the devices that don't speak BGP, so that it's a very small, contained number of routes. Then that's probably safe. You still need to put some careful guardrails on it, like not importing your default and some other fun things like that. It is very ugly, and actually the place we've come up against that most often is acquisitions, and we usually use an internal set of communities that we just apply to say, hey, this is an internal-only route, it came from something like this. So that would be kind of my method for it, and then try to eliminate it as quickly as possible.
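The guardrails described here for importing a small, dedicated OSPF instance into BGP can be sketched in Python. This is only an illustration of the logic, not the speaker's router configuration: the community value 65000:100, the route cap, and the function name are all assumptions made up for the example.

```python
import ipaddress

# Hypothetical values, chosen only for this sketch.
INTERNAL_ONLY = "65000:100"   # "internal only, never re-announce" community
MAX_IMPORTED_ROUTES = 200     # cap that keeps the imported set small and contained


def import_ospf_routes(ospf_prefixes):
    """Return {network: communities} for routes allowed from OSPF into BGP."""
    imported = {}
    for text in ospf_prefixes:
        net = ipaddress.ip_network(text)
        if net.prefixlen == 0:
            # Guardrail: never import a default route (0.0.0.0/0 or ::/0).
            continue
        # Tag everything so export policy can refuse to announce it onward.
        imported[net] = {INTERNAL_ONLY}
    if len(imported) > MAX_IMPORTED_ROUTES:
        # Guardrail: a dedicated instance should only ever carry a few routes.
        raise ValueError("too many routes from the dedicated OSPF instance")
    return imported
```

For example, `import_ospf_routes(["0.0.0.0/0", "10.20.0.0/24"])` keeps only 10.20.0.0/24, tagged with the internal-only community, so an egress policy that matches on communities never sends it to customers, peers, or transit.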
And I think that comes from... there are sort of two schools, which we were actually just talking about, where older networks typically had every device running OSPF, and newer networks typically have the IGP and BGP, and customer routes are only in BGP, just like peering and transit routes. That's kind of how we've changed as the number of routes has gone up. Part of that is that a lot of backbones now have many more links, so actually controlling OSPF table size, or IS-IS table size, is pretty important for convergence times. A big ISP might have 10 or 20,000 customer prefixes; you don't really want those in your OSPF anymore, because you want your SPF to rerun and actually come up quickly, so that your BGP sessions that rely on that IGP being up and converged can start being brought back up very quickly if a router loses connectivity. Anyway. Oh yeah, jump in. So in those cases it's not necessarily bad to have them redistributed into BGP. Classically you would have put them into your IGP, right? But now, if you can do next-hop-self, you don't need them in your IGP, which is the reason for setting next-hop-self on your iBGP sessions: instead of the peering or transit far-end network, you actually get the loopback on the router. But there have been a fair number of DDoS attacks, and I think this is probably why you're mentioning this. The prominent one was towards, I think it was Cloudflare via LINX, I don't remember exactly who it was, but a lot of traffic ended up destined to the exchange prefix and people were carrying it. So the IP block that belonged to the exchange was under attack, and that was reachable because people were carrying it in their IGP. Oh, so it happened to you too. So I think there are sort of two pieces of policy there that are important. Whether or not you carry it in your IGP or in your iBGP is probably not that important, but what you really need to do, if you're
at one of these IXes, is have proper egress filtering facing the IX. You shouldn't have any traffic destined towards those prefixes except your own BGP sessions; there should be no other traffic destined towards them, and you should only accept traffic from them. The reason to accept traffic from them is so that traceroutes work, right? Because if you accept traffic from the IX prefix, that's the TTL-exceeded messages that make traceroute work. But you should never have anything towards the IX prefix, and the same thing goes for all your transits and peers. I think any of us that have been under pretty large DDoSes have probably noticed that people like to DDoS everything they see in the traceroute. I don't know how many of you have seen that, but I've seen it multiple times, where literally every hop in the traceroute, like, you take the destination that's being DDoSed and go eight hops back, every one of those /32s that the customer can see in the traceroute is now under DDoS at varying levels. In our infrastructure we solved that by allocating all of our backbone subnets out of dedicated prefixes that are filtered at the edge of our network everywhere. They're only ingress filtered, they're not egress filtered, so traceroutes still work and customers can't tell they're filtered. The only way they could tell is that you can't ping what you see in the traceroute, which sometimes breaks things like MTR, but we tell people they shouldn't be running that anyway, because it's not useful. Our official response is that pinging the routers on the way to the destination has no bearing on the end-to-end latency. Well, and I think there's a difference between using it internally and re-announcing it to customers. So, like, if you looked at it, we're a member of the Seattle IX, Nico's a member too, and the Seattle IX prefix is in our IGP, but we drop traffic destined towards it, and it's not tagged with a customer-announce community, so a customer would never see it in
their feed from us. It's technically in ours, but there'd be no way for the IX to detect that, other than our own looking glass, because we actually drop the traffic. Yeah, but for the IX's purposes it actually doesn't matter which it is, because as long as you're dropping the traffic you're doing the right thing. You need to not deliver that traffic to the IX, and that's really the gold standard: just don't deliver the garbage traffic. There should be no traffic to an IX. I actually should probably add that to one of these presentations: ACLs towards an IX or towards a transit. It's actually pretty easy, because all you really have to allow is BGP, which is easy: it's one TCP port. You know what port it's going to be on, it's 179; you can put that in your edge filters. You know what destination it's going to talk to. So at least in ours, our typical ACL is that we allow anything to anything on the edge devices that is source and destination of the IX prefix, so the other IX members can ping us and do whatever they need to do directly towards our edge, but nothing internally on either side can pass towards those prefixes, which is kind of a nice solution. Any other questions? No other questions? OK, thank you guys.
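The IX-facing filter described at the end of the talk can be modeled in a few lines of Python. This is a rough sketch of the decision logic only, not an actual router ACL: the peering LAN prefix 198.51.100.0/24 is a documentation range standing in for a real IX prefix, and the function name is an assumption.

```python
import ipaddress

# Hypothetical IX peering LAN (a documentation range, not a real exchange).
IX_LAN = ipaddress.ip_network("198.51.100.0/24")


def permit_towards_ix(src, dst):
    """Decide whether a packet may be forwarded, given source and destination."""
    source = ipaddress.ip_address(src)
    destination = ipaddress.ip_address(dst)
    if destination in IX_LAN:
        # Only IX-LAN-to-IX-LAN traffic (BGP sessions, member pings) may reach
        # the IX prefix; customers and internal hosts are dropped.
        return source in IX_LAN
    # Traffic sourced from the IX LAN, such as TTL-exceeded replies, still
    # passes, which keeps traceroute working through the exchange.
    return True
```

With this model, `permit_towards_ix("198.51.100.10", "198.51.100.20")` is allowed, a customer at 203.0.113.5 sending to 198.51.100.20 is dropped, and a TTL-exceeded reply from 198.51.100.20 back to that customer is allowed.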

Info

Channel: NANOG

Views: 27,910

Rating: 4.7366257 out of 5

Keywords: NANOG 67, BGP

Id: 3qZX1zsMLbU

Length: 65min 42sec (3942 seconds)

Published: Wed Jun 15 2016