AWS re:Invent 2018: Behind the Scenes: Exploring the AWS Global Network (NET305)

Captions
All right, hi folks. Welcome to the behind-the-scenes session focusing on exploring the AWS global network. The original title was something about global networking, higher performance, and cat photo distribution; we heard some feedback that that sounded a little too seedy, so we decided to change it a bit. There will be some cat photos, although a lot of this is going to focus on the AWS network, and we'll dive into a little more detail.

My name is Tom Scholl. I'm a senior principal network engineer in the AWS networking group, mainly focused on a lot of our backbone connectivity as well as our network infrastructure as it relates to traffic to and from the internet. I'm up here with Steve Seymour, who is a principal solutions architect. He works across Europe, the Middle East, and Africa with our customers, focusing on their network architecture and requirements when deploying into AWS. He'll come up a little later in the presentation and talk about another part.

In this session we're going to cover a few different parts. One of them is some of the key themes you'll find throughout the presentation on the AWS network, which include a variety of different items. From there, we'll talk about what the network actually looks like from our perspective. I've given this talk to other customers as well as people inside Amazon to describe what the AWS network looks like, and one of the ideas is really starting from the bottom up: start from the data centers and work your way up through the AWS regions, to the backbone, and out to our internet edge at our edge POPs.

So let's start with an overview of the AWS network. As I mentioned, there are some key themes we're going to go over. Security and availability are a really important part of it. From a security perspective, you'll see how we do things like implementing security controls at the perimeter of our network. Availability is also really important: you'll see throughout how we do things like strong isolation from failures, and how we've taken concepts you may have seen in the software and services space, like cellular architectures, and translated them over to the networking infrastructure side. We'll also talk about scale, which is something we constantly have to do in the network: continually scaling it up to support customer traffic. A final theme is performance, which really means things like low latency between regions and between availability zones, as well as throughput, and within a region things like what you can do with jumbo packets.

So let's start with something a little simpler: an example of a customer traffic flow. In this case we have a cat that wants to get to an EC2 instance where there are other cat photos. At a high level it basically looks like this: something on the internet needs to talk to a region, and within that region we have an EC2 instance that runs inside a VPC, which lives inside an availability zone, which lives inside a region. Now, there have been other talks at re:Invent in the past about the network, for example about VPC, some of the technologies that go into it, and how it operates, and there have also been talks about some of the things we've done with server and network cards. But we've never really talked in great detail about what actually goes on inside the network infrastructure; this is our first attempt at really going a level deeper.
So let's take a look at what that actually looks like to someone like me. Within the AWS network we have a lot of different things stitched together. We have data centers, which live inside availability zones, which are part of regions. We have transit centers, which are basically our on-ramp and off-ramp onto our global backbone. And then we have internet connectivity, which comes into the transit centers within a region as well as into edge POPs at the perimeter of the network. From your perspective you don't really have to know all these different things; we certainly have to, but you don't need to know all of this in order to build on top of AWS.

So let's start from the bottom up, basically from the AWS regions, and go on from there. One of the concepts we have is availability zones. Availability zones provide failure isolation from other availability zones within a region, and they have direct connectivity between each other. The availability zones themselves can include one or, more typically, many data centers. The interconnectivity between data centers within an availability zone is built for low latency, and they're in close proximity. One of the main considerations is the scalability part: we have to continue to scale the network as things grow. An availability zone can't really be capped; we have to be able to scale it out further and further.

Looking at an AWS region from this perspective, you have multiple parts: data centers within availability zones, and the transit centers, with different kinds of connectivity between them. Between data centers we have a lot of connectivity stitching them together within an availability zone; then we have inter-availability-zone links; and then we have the transit centers, which provide the connectivity to the outside. One thing that's really interesting is that an Elastic IP lives within an availability zone, and when traffic comes into a region, we send it directly to that availability zone. We don't spray it across the other ones; it goes directly to where the address lives. If you move that Elastic IP to another availability zone, the traffic moves accordingly. Likewise, traffic that leaves an availability zone goes directly to the transit centers to egress, whether for inter-region traffic or to the internet. (Say again? Yeah, we're actually going to have a meet-and-greet after this session; we'll be there and we can talk about it.)

All right. So within the availability zones we have a number of data centers, and there are really two types of traffic we have to support. We have side-to-side, basically host-to-host, communication, and then we have up-and-down, or north-south, traffic, which is traffic to and from the internet as well as inter-region traffic. Another really important part about availability zones is that they have to be elastic, and the way we treat them, we have to think about the different dimensions along which we scale: intra-availability-zone capacity, inter-availability-zone capacity, and how you scale up the internet and inter-region capacity through the transit centers.
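To make the Elastic IP behavior described a moment ago concrete, here is a toy model; this is purely illustrative and not AWS's actual implementation. The idea is just that ingress routing keeps a mapping from each Elastic IP to the availability zone that currently hosts it, and moving the address updates that mapping.

    # Toy model of Elastic IP ingress routing (illustrative only).
    # Ingress maps each Elastic IP to its current home AZ and sends
    # traffic straight there -- no spraying across AZs.

    eip_home_az = {
        "203.0.113.10": "us-east-1a",
        "203.0.113.11": "us-east-1c",
    }

    def route_ingress(dst_ip: str) -> str:
        """Return the AZ that ingress traffic for dst_ip is sent to."""
        return eip_home_az[dst_ip]

    def move_eip(eip: str, new_az: str) -> None:
        """Reassociating the EIP updates the mapping; traffic follows."""
        eip_home_az[eip] = new_az

    print(route_ingress("203.0.113.10"))   # us-east-1a
    move_eip("203.0.113.10", "us-east-1b")
    print(route_ingress("203.0.113.10"))   # us-east-1b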
So when it comes to building a scalable data center, what are some of the things you need? You have to build the network out of network building blocks: components that you can bolt onto the network and keep adding, again and again. You have to make it easy to do that and to grow the network in right-sized increments. Another part that's really critical is strong isolation boundaries when you go and build these building blocks, along with the additional capacity they have to deliver.

Now, another really important part of the data center is the actual networking technology you use, which includes the routers, the connectivity and links between them, and the control plane. The control plane is a really critical part, mainly because it's the protocols and the controllers that actually drive the network; without a control plane, or with one that isn't working properly, there really isn't a network. From our perspective, the control plane can't scale out of control as you scale the network: as you attach these additional building blocks, the control plane cannot be put at risk. So we do a lot of things to compartmentalize the control plane as we scale out the networks within a data center.

Here's a high-level drawing of what the cellular data center network architecture looks like. Inside it we've got these individual cells that each serve a particular function. Some of these are the access cells, which take in ports from hosts, for example; then you have cells responsible for intra-availability-zone connectivity, cells responsible for inter-availability-zone connectivity, and finally edge cells, which handle traffic leaving the region for the internet or other AWS regions. Within each of these cells are a number of routers. We like to go really wide when we deploy these routers: we have rows of these devices in a particular layer. Going really wide brings us certain really nice availability properties. We know routers are going to fail and links are going to fail, and the idea is that we've over-engineered the layers within each of these cells so that they're redundant within themselves. They're also compartmentalized to a degree, so we have a lot of control over how much control-plane state can actually be shared between them, and we have very specific touch points for how that's accomplished. And then between all these cells we have a fabric of many, many devices, many routers, that provides the interconnectivity between the individual cells themselves.

Now, one thing that's very interesting is that as we build this network, we don't really like the concept of an active and a standby router. What I mean by that is you don't want to have to wait for something to fail and traffic to move onto another path before a device gets used. We really don't like that idea, because if it's something you're not exercising frequently, how do you know it's actually going to work? So a lot of the forwarding architecture within the network is built on the basis that everything is active and forwarding traffic. We don't see any of those surprises that way, and we don't like the sort of surprise where you expect something to work and it doesn't.
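Going back to the point about compartmentalizing control-plane state at cell boundaries: one common generic technique (not necessarily the one AWS uses) is to aggregate a cell's internal prefixes into a covering summary before exporting them, so neighboring cells carry far less state. A minimal sketch:

    import ipaddress

    # Prefixes known inside one cell (example addressing).
    internal = [
        ipaddress.ip_network("10.1.0.0/26"),
        ipaddress.ip_network("10.1.0.64/26"),
        ipaddress.ip_network("10.1.0.128/26"),
        ipaddress.ip_network("10.1.0.192/26"),
    ]

    # Export only the collapsed summary at the cell boundary, so
    # neighboring cells carry 1 route instead of 4.
    exported = list(ipaddress.collapse_addresses(internal))
    print(exported)   # [IPv4Network('10.1.0.0/24')]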
So, one of the things I brought up was the networking technology we use in the network, and routers are a really big part of that. There are two different flavors, or schools of thought, when it comes to using routers in these kinds of environments.

One of them is large chassis routers. There you're looking at something that could be a half-rack or full-rack system. You've got a router that's fairly large and brings a lot of ports to the table, but it also brings a larger failure domain: if you lose that device, that's a whole lot of capacity and ports going with it too, which isn't always optimal. They do bring some flexibility; for example, larger chassis-based systems generally have line cards, and on those line cards you can have different modules, different port mixtures, different speeds, so you get some level of flexibility if you need it. There are also fewer devices to manage, which for some people is really important. Finally, on the larger chassis boxes you have a different kind of forwarding architecture, in which there are multiple stages of forwarding that have to happen within the platform.

Now flip that around and look at single-chip routers; it's a little different. You're looking at a one- or two-rack-unit type platform, a bit smaller, with a fixed number of ports. Fewer ports, but you get a nicely contained failure domain. It also gives you the property that you're going to have a lot more routers to manage in order to provide the equivalent level of capacity, which brings its own things to accommodate. But it does have a simpler forwarding architecture, which is nice when you compare it to some of the larger chassis platforms.

So let's take a look at that. If you look at a larger chassis-based platform, let's start from the top down and walk through what's going on inside this type of box. You've got the line cards, and the line cards themselves have a few different things on them: the front-facing ports, and one or multiple switching and forwarding ASICs. Basically the whole chassis, the enclosure, is lots of individual routers working together; each of those line cards is almost a router unto itself. Now you have these line cards, and you're going to have traffic that needs to talk between line cards. For that you need some sort of switching fabric providing interconnectivity between the line cards, and you probably want multiple of these switching fabrics so that they're redundant, in some N+1 configuration or a variation of that, to facilitate transporting traffic between the line cards. What's kind of interesting here is that on these platforms, if you wanted to tap what's going on between a line card and the switching fabric, that's not just regular IP traffic; it's typically some sort of packet being encoded and transmitted across there. So if you were trying to dig in and troubleshoot it, you can't just pull out Wireshark and take a look. In a lot of these cases it's some kind of special encoding with its own under-the-hood routing going on, so it's a bit more complex.
Now, you have these switching fabrics, probably several of them, connecting all these line cards together. The next part down that's really important is the route processor, or supervisor: basically the brain of the router. This is where the routing protocols operate, it's how you configure the device, and it's what handles programming the forwarding tables of all those line cards in the chassis, which can be quite a task. Then there's another really important thing, which is telemetry. You have all these ports in this one box, and that's a lot of statistics to carry: port counters, packets per second, bits per second. All that telemetry coming off those line cards generally bottlenecks on that single route processor and its CPU, along with NetFlow data, sFlow data, things like that.

Now, you've got this brain that runs the device and handles all these important functions; are you really content with just having one? A lot of platforms will offer an additional route processor or supervisor, but then you're going back to that same problem of having an active and a standby, which brings its own complications. If one of the route processors fails and it flips over to the other one, that process might not be hitless; all the routing protocols might bounce as a result. Or, in some cases, there's additional complexity in that some state gets replicated to the second route processor, and with that comes more complexity and potential software issues.

So you've got the brain, the route processor, and you probably need more than one of those. Then you finally get to the bottom, which is the power supplies and the fans. On some of these bigger chassis platforms that are really tall, you sometimes have things like power zones, because not all the power supplies can energize the entire platform. It gets really complicated when you're trying to figure out which power zone energizes one part of the box versus another, and what happens if you lose this one; can everything run on the remaining power supplies? It's almost like looking at a modern commercial aircraft, where you have a green hydraulic system and a yellow one and a blue one, and certain ones serve certain areas. I think that's pretty complicated to have to worry about in some of your routers. So it's a more complicated platform, and if one component in there breaks, can you actually troubleshoot it rapidly and effectively? The other option, if you think the device is sick, is to shift traffic away from it, but then you're also losing all the capacity that goes with it.

All right, so flip this around to the other platform, a single-chip-based device, a 1RU or 2RU kind of platform, something we really like in our data centers. Here you get the front-facing ports, a single routing and switching ASIC, a single route processor or supervisor, and dual power supplies. It's really simple: bits go in, bits go out, a single lookup, and the packet leaves the device. Compare that to the large chassis box, where you have to go through the ingress line card, through the switching fabric, to the egress line card and out again. On this platform it's a lot simpler.
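To put rough numbers on the failure-domain contrast between the two platforms (the port counts below are made up, purely for illustration): losing one big chassis takes out all of its ports at once, while losing one router in a wide stripe of small boxes costs only a thin slice of capacity.

    # Illustrative failure-domain math with made-up port counts.
    chassis_ports = 288            # one large chassis router
    small_box_ports = 32           # one 1RU single-chip router
    stripe = 9                     # small routers deployed side by side

    total_stripe_ports = small_box_ports * stripe   # 288, same total

    loss_if_chassis_fails = chassis_ports / chassis_ports          # all of it
    loss_if_one_small_fails = small_box_ports / total_stripe_ports

    print(f"chassis failure loses {loss_if_chassis_fails:.0%} of capacity")
    print(f"one small router loses {loss_if_one_small_fails:.0%}")  # 11%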
Anything coming in and out of this device is going to be IP, so it's really easy to troubleshoot. You have a single route processor, which makes things simple: if it fails, that's fine, the device goes out of service, you lose the control plane, and everything else goes away with it. So it's something we like a lot. Another part is that on these platforms we actually build our own operating system that goes on these boxes, so we have ownership of the code that's deployed on them, the routing protocols, the entire stack, which gives us a lot of visibility into what's really happening under the hood. It's a very simple architecture that sticks with the KISS principle, and we like it quite a lot.

Now, if you're going to have all these small-chip routers in a network, you have to be able to operate the network properly, and there are a few things that are really important there. One of them is the automation component: building programmatic configuration. When you have so many devices, the router itself can't be the authority. You have to build a system that actually understands the network topology, that understands what is connected to which devices and what functions live on a device. That knowledge shouldn't live on the router; it should live in the system that's responsible for it, because at some point that router is going to fail, you're going to swap it out, and then you're going to want to bring it back into service. You don't want to be in the position where you have to cut and paste whatever the previous config was onto the new device. You want a system to bring that device online, give it its correct configuration, and go through a return-to-service workflow where you're doing the right pre- and post-checks, so you can safely deliver the device back in line, doing what it was doing before. You really can't have situations where routers with specific functions have some kind of unique quality; the term I like to use is that you don't want artisanal, farm-to-table, handcrafted routers. You want to make it very uniform and very simple to understand.
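Here is a minimal sketch of what such a return-to-service workflow could look like. The check names and the render_config helper are invented for illustration, but the shape, generate config from an authoritative topology model, run pre-checks, apply, run post-checks, follows what's described above.

    # Sketch of a return-to-service workflow (all names hypothetical).

    def render_config(topology: dict, router: str) -> str:
        """Generate the router's config from the topology model --
        the system, not the router, is the source of truth."""
        lines = [f"hostname {router}"]
        for local_port, (peer, peer_port) in topology[router].items():
            lines.append(f"interface {local_port} description to {peer}:{peer_port}")
        return "\n".join(lines)

    def return_to_service(topology, router, apply_config, run_check):
        """apply_config and run_check are callables supplied by the
        operator tooling; run_check returns True when a check passes."""
        pre_checks = ["links_clean", "bgp_sessions_expected"]
        post_checks = pre_checks + ["carrying_traffic"]

        if not all(run_check(router, c) for c in pre_checks):
            raise RuntimeError(f"{router}: pre-checks failed, aborting")
        apply_config(router, render_config(topology, router))
        if not all(run_check(router, c) for c in post_checks):
            raise RuntimeError(f"{router}: post-checks failed, rolling back")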
So you have a lot of these devices, and you also have to deal with the realities of running a big network with a lot of links: how do you actually know the network is running healthy? We use a variety of techniques to validate what's going on inside the network and whether there are any problems.

One of them is active data plane probing. We build probe traffic that goes across the entire network, over every router and every link, so we get visibility into end-to-end connectivity: what is the latency of that communication, and is there packet loss? One really cool thing this brings us is the ability to triangulate a fault in the network; we can actually identify where the problem is and then generate an alert.

Another part is statistical deviations and anomaly detection. To break that down into a simpler way to explain it: what traffic goes into a router should come out. It's real simple: however many bits go in, you should see about that much come out. There's always some small discrepancy, because some traffic is actually bound for the device itself, but you can look across every router in the network, see if something isn't matching up, and generate an alert saying something abnormal is going on. As I mentioned, in our cells we have lots of routers spread horizontally, so you can also identify situations where we think all these devices are roughly equivalent, and if one router isn't pushing as much traffic as the others, we can use that to generate an alert too, saying something doesn't look right here.

The routers themselves also generate a fair amount of telemetry that's really useful for us. For example, we grab a lot of the syslog information that comes out of the routers, classify those messages, and feed them into our alerting system. There are other things you can look at as well: the ASICs on these boxes generate their own messages, and we can interrogate specific registers on the ASICs themselves to understand whether they're doing what we expect and whether anything looks strange. Finally, we can look at things that are finite resources on a router, like table sizes, whether forwarding tables or link adjacencies, to know whether something is growing out of control or doesn't look right, and generate an alert.

The other part is deep component monitoring. What I mean by that is looking at the hardware itself: the power supplies, the voltages on the platform, the environmental and temperature data, to know whether everything looks correct. That also extends to some of the components we put into the router; for example, are the pluggable optics operating at the right voltages and temperatures? Another thing that's really neat is that you can look at the light level you're receiving on a link and understand whether it's within the threshold it should be operating at, or whether the signal is degrading over time and this is a component that's going to fail at some point in the future, so we can go and take action.

And action is a really important part, because all this network monitoring doesn't do any good if you're just collecting alerts; you have to actually mitigate the problem. In our case, all of our alarming feeds into an auto-mitigation system that can go and take action, and it's machines that do that. Those aren't alerts going to people; it's completely computer driven. To give you an example: an alert could come up and drive an action where, if a link is taking errors, we shift that link out of service, or if something doesn't look right with a router, we take that device out of service. We've built these cells wide and over-engineered them so that we can take routers out of service at any point in the network without compromising on availability or the capacity demands of customer traffic.
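The "bits in should equal bits out" check from a moment ago can be sketched in a few lines; the tolerance value here is arbitrary and just illustrates the idea.

    # Sketch: flag routers whose egress doesn't roughly match ingress.
    # Counters are bits per second sampled from each device.

    def find_suspect_routers(counters, tolerance=0.02):
        """counters: {router: (bits_in, bits_out)}. A small discrepancy
        is normal (some traffic is destined to the box itself), so only
        flag deviations beyond the tolerance."""
        suspects = []
        for router, (bits_in, bits_out) in counters.items():
            if bits_in == 0:
                continue
            if abs(bits_in - bits_out) / bits_in > tolerance:
                suspects.append(router)
        return suspects

    sample = {"r1": (9.8e9, 9.79e9), "r2": (9.8e9, 7.1e9)}
    print(find_suspect_routers(sample))   # ['r2']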
Another part within the AWS regions is the intra-availability-zone and inter-availability-zone connectivity. This is basically the bulk fiber we use to connect everything together. We've established these spans of connectivity within an availability zone and between the availability zones to provide a lot of capacity for the network, and we optimize for low latency.

A really critical part of this is physical diversity. When we connect these things together, we want to make sure the redundant paths really are diverse. We use geospatial coordinates to identify the specific fiber path, and we know where it actually runs in the ground, so we can verify that the paths leaving a particular data center go through diverse entry points and don't converge together along the way. That's all in our controller infrastructure, and there are a lot of mechanisms we can use to validate that it's working as we expect. For example, we can take the optical light loss levels, work out how long that path should be, check that it matches up with our coordinates, and validate that the fiber is on the path we expect. If there's a fiber cut, we can look afterwards and understand whether any properties of that fiber changed in a way that needs to be looked into. That's going on all the time, and the data is always being collected.

On the slide here are a few pictures that I took at previous events. There have been talks from James Hamilton about some of our cables; we've talked about one fiber cable that has 3,456 individual fiber strands in it. We've actually gone bigger than that; we've doubled it, with a 6,912-strand cable. If you look at the cable there's quite a lot of fiber in there, and it has a metal core, which gives it strength but also allows us to trace the cable from the surface, which gives us some nice capabilities. On the bottom right, the blue cable is one we have to use in one part of the world, specifically Australia, because there's a particular termite there that will eat through fiber, and this cable has a special coating that prevents that from happening. At the meet-and-greet I've got some fiber samples in my bag; if you want to come by, you can see some of these and take a look at the different fiber densities.

Finally, another part is how we get more capacity out of that fiber. I mentioned we've got these fiber cables with a lot of strands, and you can light a strand up for, say, 10 gig or 100 gig. One of the interesting questions is, with all the fiber in the ground, can we get more out of it? That's where we use technology like dense wavelength-division multiplexing, or DWDM. We take a fiber pair, transmit and receive, and instead of it carrying just one 100-gig interface, we can put multiple 100-gig interfaces on it by transmitting lasers tuned to different frequencies. So with the fiber already in the ground, you can extend it and bring even more capacity onto it over time, without having to put down another cable.
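The DWDM gain is easy to see with a little arithmetic; channel counts vary by system, and the numbers below are just an example.

    # Illustrative DWDM math: one fiber pair, many wavelengths.
    per_channel_gbps = 100    # each wavelength carries a 100G interface
    channels = 40             # example channel count; systems vary

    single_wave = per_channel_gbps             # fiber lit with one laser
    with_dwdm = per_channel_gbps * channels    # same glass, 40 lasers

    print(f"one wavelength: {single_wave} Gb/s")
    print(f"with DWDM:      {with_dwdm} Gb/s")   # 4000 Gb/s = 4 Tb/s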
One other part of the AWS regions that I touched on earlier is the transit centers. They're our on-ramp and off-ramp onto the backbone and to and from the local internet connectivity, and they're optimized for local internet interconnection within a particular geographic area. The availability zones are connected redundantly to all the transit centers that are local there, and we generally put these facilities in locations where we can easily connect to and run fiber between different external networks and establish peering relationships.

So now that we've moved on from the AWS regions, let's talk a bit about the global backbone. A number of AWS services ride on top of the AWS global network backbone today. An example is AWS Direct Connect, which takes traffic from Direct Connect POPs right over the backbone to bring your traffic into a region. Internet connectivity can also ride on top of the backbone; we can pick and choose which locations on the internet we want to bring traffic to and from. Region-to-region communication also goes over the backbone. Finally, services like Amazon CloudFront: the CloudFront POPs are connected onto the backbone, and their connections for origin fetches, if the origin is an AWS resource, for example an origin fetch from S3, will run over the backbone. Or if you're using something like S3 Transfer Acceleration, that will go through a CloudFront POP that talks to our network.

This is a picture of the AWS global backbone. It's actually from Peter DeSantis's re:Invent 2017 Tuesday Night Live presentation; he's doing a Monday Night Live presentation tonight with an updated version of the backbone, so if you want to see it, you can take a look and see how the topology has changed between then and now. Just to give you an idea, this is connectivity all around the globe, a hundred gig for all of our backbone circuits. It's pretty cool, something we're definitely proud of, and I highly recommend taking a look at Peter's presentation later this evening.

So let's ask a simple question: why have a backbone network? One reason is security. The idea is that when we have region-to-region traffic, we want it riding on our own infrastructure rather than the internet. We don't want to have to worry about traffic going onto the internet to get to another region and the potential there for traffic being diverted or black-holed or anything like that. We want to make sure that's not a factor, so that's the security angle of keeping the traffic on our own infrastructure.

Availability is another really important part. What I mean by that is, if I go to a transit provider and say I want to drop 200 or 300 gigs of traffic on them toward some endpoint, there's a good chance it might not get there. A lot of the core transit providers that make up the internet scale based on what they observe over a period of time, so you can't just drop in a sudden rush of traffic without congesting something in their network. The idea here is that if we control the infrastructure, we know what we need to scale it for. We have to take that responsibility on and provide the right redundancy, because with some of those transit providers there's some level of oversubscription; that responsibility becomes ours, and we have to build it into our network.

Another part is reliable performance. When we build this backbone network, I get to pick and choose what types of paths we put traffic on, and I can optimize for low latency when connecting between any two points, but also for how the whole thing stitches together, optimizing for any traffic from any region to any region or edge POP.
Finally, connecting closer to customers. This is a really interesting one. If a region were just an island with some network connectivity to the internet, you'd be limited in your options when it comes to things like traffic engineering: I only have so many providers to select from, and if there's some kind of fiber issue in the area and multiple upstream providers are impacted, I don't have a lot of options. With the backbone, all of a sudden my options open up. I could send that traffic to ingress and egress in a different city; I have a lot more to play with, and a lot more options to exercise when things outside my control happen. That gives us a lot of great availability characteristics and lets us avoid things like congestion or problems on an internet exchange, peering disputes, anything like that.

The main thing behind this presentation, and I don't know how much we've really said this in the past, but we definitely want to stress it: all commercial region-to-region traffic stays on our infrastructure, except for China. That's something we definitely want you to walk away with.

So when it comes to building a global backbone network, there are a few things to consider, and probably one of the biggest is extreme auditing of fiber paths. When you want to connect two locations together, that involves looking at what the end-to-end latency is going to be, what path you expect that fiber cable to take, and what end result you're going to get. And when you bring that into the network, how will other regions or other locations rely on that link? You can actually model the network and say: I'm going to bring up a link between these two locations, assign the routing costs we'd put on it, and get an idea of how that impacts all regions in a full mesh of connectivity.
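To illustrate that kind of modeling, bring up a candidate link, assign it a cost, and see how all-pairs latency shifts, here is a toy shortest-path model. The node names and millisecond costs are made up.

    # Toy latency model: add a candidate link and compare all-pairs
    # shortest paths before and after. Costs are invented milliseconds.

    def all_pairs(links, nodes):
        INF = float("inf")
        d = {a: {b: (0 if a == b else INF) for b in nodes} for a in nodes}
        for (a, b), ms in links.items():
            d[a][b] = d[b][a] = min(d[a][b], ms)
        for k in nodes:                       # Floyd-Warshall relaxation
            for i in nodes:
                for j in nodes:
                    if d[i][k] + d[k][j] < d[i][j]:
                        d[i][j] = d[i][k] + d[k][j]
        return d

    nodes = ["IAD", "DUB", "FRA", "NRT"]
    links = {("IAD", "DUB"): 66, ("DUB", "FRA"): 25, ("IAD", "NRT"): 145}

    before = all_pairs(links, nodes)
    links[("FRA", "NRT")] = 130               # candidate new span
    after = all_pairs(links, nodes)
    print(before["DUB"]["NRT"], "->", after["DUB"]["NRT"])  # 211 -> 155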
Another part that's really important is path hazards: specifically, when you're building a backbone, where does that fiber actually go? Does it go through a transit tunnel, over a bridge, across a waterway? We work to understand those risks and assign a score to understand how likely a given fiber is to get cut. That's really critical. In some parts of the world fiber cuts happen more often, maybe due to lots of construction without much permitting and control, and when you deal with those situations you have to find a way to solve it, usually by dropping in additional capacity on even more redundant paths to deal with what could essentially be a lot of simultaneous fiber cuts. That's something we do in certain parts of the world.

The other part is repair expectations. Terrestrial fiber, if it gets cut, could take a few hours, maybe a few days, to fix; other things can come into the mix depending on what happened (if power lines came down too, generally the power gets fixed before the fiber), but for terrestrial it's relatively simple. When it comes to something like a subsea cable, you're talking about multiple weeks to get a ship out there to pull it up from the bottom of the ocean and repair it. As a result, you have to factor that into how you build out capacity across the network: some of this capacity may be out for a while, so do we have enough capacity on alternate paths to deal with that?

Path diversity is another thing that's really important. I mentioned it's a concern in the metro area, and it certainly is on the backbone side as well. You may have multiple fiber paths between two locations, but do they ever cross or intersect? In some cases a cable will intersect with another one at a street intersection. Can you really consider those cables truly diverse? The answer is no, you can't. We use this thing called shared risk link groups, SRLGs, a term from the networking industry, to identify those points of commonality, and we have to include them in our failure modeling: yes, these paths do actually converge at some point, so I have to assume they will both get cut, and to handle that we have to build additional capacity on other, truly diverse paths around it.

Another part on the fiber side is understanding capacity and scale when we're putting that infrastructure in: how many hundred-gig units am I going to get out of it, what's the maximum I expect to have, and at what point do I need to grow and build another cable on a performance-equivalent path? You have to understand the underlying technology and the optical capabilities to know exactly how much traffic you can squeeze out of that infrastructure.

I talked about the latency part already, but one thing to understand is that there's the latency of a particular cable, and then, when it gets cut, there's going to be a penalty as traffic reroutes onto another path. That's the reality: to get additional diversity, there will be some latency penalty. So one of the things we think about, for every link in the network, is: assume it will go down; what latency will we see when that happens? Once you account for that, you can make a judgment: is that too much, and should we invest in building another path so that the latency bump during one of these faults isn't significant?

I've mentioned 100-gig interfaces on the backbone a lot; it's all optimized for 100 gig, and we really like a hundred gig here for one simple reason: it gives you a lot more burst headroom. As we drop in a lot of hundred-gig interfaces, if there's a sudden traffic burst, I have room to play with before any kind of traffic-management system kicks in. Compare that to ten gig: there's simply not enough room. So we really like a hundred gig, and that's basically the new normal for a lot of the backbone infrastructure.

Finally, when you're building a backbone network, what should it look like? We took a lot of the design patterns we saw in the data center space and replicated them into the backbone, and you'll see that in a picture in a second. Not only that, but we took a lot of the same tooling, for example the network monitoring and auto-remediation systems, and we use that in the backbone space as well. It's proven to work pretty well for us.
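Going back to the shared risk link group point: a minimal version of that diversity check might look like this. The SRLG identifiers and segment names here are invented.

    # Sketch: two paths are only truly diverse if their fiber segments
    # share no shared-risk link group (SRLG), e.g. a common conduit
    # or street intersection.

    def shared_risks(path_a, path_b, srlg_of):
        """srlg_of maps a fiber segment to the set of SRLG ids it's in."""
        risks_a = set().union(*(srlg_of[s] for s in path_a))
        risks_b = set().union(*(srlg_of[s] for s in path_b))
        return risks_a & risks_b

    srlg_of = {
        "span-1": {"conduit-17"},
        "span-2": {"conduit-17", "bridge-3"},   # shares a conduit with span-1
        "span-3": {"tunnel-9"},
    }
    overlap = shared_risks(["span-1"], ["span-2", "span-3"], srlg_of)
    print(overlap or "paths are diverse")       # {'conduit-17'}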
So here's an example of what an AWS global backbone fabric looks like. It's similar in structure to the data center space, in that we have all these cells. At the edge there are two main kinds of cells we use: the transit center and edge POP cells are one component, and that's where traffic on-ramps and off-ramps into the backbone networking fabric. We have a fabric stitching all these cells together, and then we have the backbone cells, which is where we actually land the long-haul 100-gig circuits that make up the backbone.

One of the things we do here, once again, is this notion of going wide. When we add capacity, we spread it out and stripe it across a number of devices (there's a small sketch of this striping below). We don't just take a bunch of hundred-gig circuits, land them in a LAG on one router, and call it good. We don't like that, because we know that router is going to fail at some point, and we don't want to lose all the capacity that goes with it. So we stripe that connectivity across many devices, and when you lose one, you're only losing a small percentage of the traffic. That's how a lot of the backbone network infrastructure is built; it serves getting traffic to and from the backbone, but it also handles transiting traffic that may pass through a particular location.

One thing I definitely want to cover before I get into the edge POPs is some backbone path additions we've made. We've made a number of investments in subsea cables over the last few years. Two years ago, I believe, James Hamilton talked about the Hawaiki cable, which goes from Australia to the west coast of the United States. I'm happy to announce that earlier this year it came up and online, and we have traffic running on it right now. We also have another cable going from Japan to the west coast of the United States, the JUPITER cable, which we're working on and which should be coming up in the next few years.

And finally there's a really interesting cable called the Bay to Bay Express cable, which goes from Singapore, stops in Hong Kong, and then moves on to the west coast of the United States. What's really interesting about this cable, if you can see it in the drawing, is this little part in the middle of the ocean, which is a branching unit. Traditionally, when subsea cables go from one point to another, they'll have branching units along them that allow them to extend some connectivity off to a particular country, so traffic can hop on and hop off. But traditionally in subsea networks that was a fixed set of undersea fibers, and there aren't many fibers in some of these undersea systems; we're only dealing with maybe a dozen or so strands. Some of those fiber strands would come off and stop in that country and then hop back on, but it was fixed: it was in the cable, no one was touching it again, and whatever you got is what you got. What's really neat about this cable is what's called a wavelength selective switch, a WSS, which allows you to remotely reconfigure the cable using wavelength-division multiplexing technology. You can actually control how much capacity goes to, say, Hong Kong, which makes it really dynamic: you can scale up the cable in different ways based on how traffic is actually growing, versus the previous world of subsea cables, where it was fixed and there were only so many options.
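On the striping point from a moment ago, here's a tiny sketch of spreading circuits across a row of routers instead of stacking them on one box; the circuit and router names are made up.

    # Sketch: stripe new 100G backbone circuits across a row of routers
    # instead of landing them all in one LAG on one device.

    from itertools import cycle

    def stripe(circuits, routers):
        assignment = {r: [] for r in routers}
        for circuit, router in zip(circuits, cycle(routers)):
            assignment[router].append(circuit)
        return assignment

    circuits = [f"100G-{i}" for i in range(8)]
    routers = ["bb-r1", "bb-r2", "bb-r3", "bb-r4"]
    print(stripe(circuits, routers))
    # Losing any one router now strands only 2 of the 8 circuits.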
So I just really wanted to highlight that this is something we're constantly looking at, not just in the Pacific but in the Atlantic and other places as well, and terrestrially too, looking at different fiber investments, mainly to lower latency, provide additional diversity, and support some of the capacity demands we have. With that said, I want to transition over to Steve Seymour to talk about some of our edge POP infrastructure.

Cool. So, yeah, we've spent a bit of time looking at the data centers and the backbone, but there are these other really important pieces of our infrastructure, which are the edge POPs. We have two particular ways that we connect out to the public internet: you've heard about one of them, the transit centers, and the edge POPs are the other type. These have something in common. The idea with both of these types of locations is that we want them to be somewhere where we have lots of options for connectivity with the outside world, where we can connect with other providers, with other networks. So these are typically based in places you've probably heard of as interconnection facilities: carrier hotels, carrier-neutral facilities, colos. Those are the kinds of terms we associate with this, and these are buildings deliberately intended to host large amounts of network infrastructure, both ours and other providers'.

Now, within those facilities you quite often find the coexistence of something called internet exchanges. Internet exchanges are basically large fabrics, effectively switches, that all of these different providers can connect into, and those fabrics give you the option of connecting with all of those providers with perhaps just a small number of connections. Those internet exchanges can actually span multiple buildings; a particular colo provider might have multiple buildings in a city, and a particular internet exchange might span across them.

So what is the purpose of these edge POPs? Well, first of all, the transit centers are associated with the region and are physically located quite close to it, but we want to extend the edge of our network further out into the internet. The edge POPs allow us to extend that global infrastructure right out to what we call the internet edge. We've built this backbone network that Tom has just been talking about; if we can place these edge POPs out on that backbone, much closer to where our customers want to consume our services from, we can get them onto our network in a much easier way. It also helps us scale that connectivity: we can pick up lots of connectivity in multiple locations, and we can increase it very easily by adding connections at these locations. And of course, as I said, these are buildings and facilities where there are lots of other network providers, so we can say that this is the best possible location to connect with a particular network, and we can do that in all of these edge POPs around the world.

Within the edge POPs themselves we've actually got quite a lot of different services, and these should be services I'm guessing you're pretty familiar with. We've built things like AWS Direct Connect; as Tom mentioned, that's the service that allows you to connect from your infrastructure into AWS, so it makes sense that we'd want to put the equipment for that out at the edge of our network.
This is a place where you can get your fiber, bring your connectivity in, perhaps via a partner, into that location, connecting to our infrastructure. We've got CloudFront, our content delivery network; we've got Route 53, our DNS service; and we've also got Shield, our DDoS protection service. You can kind of see why these make sense to be out in these edge POP locations. But the edge POPs perform a couple of other functions as well as hosting those particular services: obviously they give you access to the global backbone, that on-ramp into our infrastructure so you can be carried across to our regions, but they're also where we pick up our external internet connectivity itself, where we pick up peering sessions with other providers, with transit providers, with our peers, and obviously with these services I've been talking about.

This diagram is probably starting to look a bit familiar now. It's an example of the cellular architecture we have within an edge POP, and you can see that those cells are pretty similar in terms of how we think about things. The difference, though, is that we make it very easy to attach the services I mentioned: for example, we can take CloudFront, we can take Direct Connect, and we have a very easy way to connect them into this fabric we've built out at the edge POPs. So this is definitely a pattern we see repeating in multiple places within our infrastructure. It needs the cells that enable us to connect onto the backbone, and you saw some examples of that earlier; it has the cells that enable us to connect out to the rest of the internet; and we can keep scaling that as we need to. It's very easy to add in the services; you can see them at the edge of this particular diagram.

So let's talk a little bit about CloudFront first of all. This is actually a really critical service out in our edge POPs, and it's very important because the point of CloudFront is to cache content and get it closer to where our customers are viewing that content from. Bearing in mind that these edge POPs are all over the world, these are locations that are very well connected to, perhaps, people like your broadband provider at home, so the people looking at the content that's going to be cached in CloudFront are very close to where these edge POPs are.

If we take Route 53, this is slightly different. What actually happens here is we use something called anycast: we take a set of IP addresses that we use for Route 53 to represent the DNS servers we have, and we advertise those IP addresses from multiple edge POPs around the world. This works for both IPv4 and IPv6, and anycast is a very nice way of drawing traffic from customers to the nearest edge POP. So it's slightly different from the way CloudFront and content delivery networks work. (There's a toy illustration of the anycast idea below.)

I've mentioned Direct Connect a couple of times. These are definitely intended to be in places very easily accessible to our customers: we want Direct Connect to be in a colo facility where you might have your infrastructure, so connecting to us becomes as simple as ordering a cross connect from that particular provider. That means you get the most optimal connectivity you can into our infrastructure, and you're then on our backbone, which gives you access to all of our regions, so you can use things like Direct Connect gateway at that point to access VPCs in any of our commercial regions around the world.
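The anycast idea mentioned above can be sketched as "same address, many origins, the network picks the closest"; the POP names and routing metrics below are invented.

    # Toy anycast model: the same service address is announced from
    # many edge POPs; each client lands on whichever POP is cheapest
    # from its own network's point of view.

    announcing_pops = {
        "LHR": {"client-eu": 8, "client-us": 75, "client-ap": 160},
        "IAD": {"client-eu": 80, "client-us": 6, "client-ap": 150},
        "NRT": {"client-eu": 190, "client-us": 100, "client-ap": 12},
    }

    def anycast_pop(client: str) -> str:
        """Pick the POP with the lowest routing metric for this client."""
        return min(announcing_pops, key=lambda pop: announcing_pops[pop][client])

    print(anycast_pop("client-eu"))   # LHR
    print(anycast_pop("client-ap"))   # NRT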
Now, when we deploy Direct Connect into one of our edge locations, we obviously need to make sure it's highly redundant for you to connect to, so whenever we deploy a Direct Connect location there will always be multiple routers available for customers to connect to. Our recommendation for customers, though, is that they actually meet us in multiple edge locations; that's why we deploy Direct Connect into these multiple locations around the world, as close as we can get them to you as a customer.

Now, something that's really quite cool that we have out at the edge is Shield, our DDoS protection service. The great thing about this is that the edge POPs are where traffic enters our network, and by having Shield out at the edge POPs themselves, we can actually scrub that traffic right at the edge before we carry it across the network. That means we avoid carrying that traffic all the way to a region before it perhaps gets dropped; we can stop it at the internet edge and preserve the rest of our backbone for real traffic, traffic we do want to make it to the regions. So that's really quite a cool thing we have deployed out at the edge.

And then that final piece that's crucial out at the edge POPs is the internet connectivity itself. You might be aware that there are two different types of internet connectivity you would typically pick up at a location like this, generally referred to as transit and peering. When you have a connection to a transit provider, you get access to all of the networks connected to that transit provider, so to the whole of the internet: all of their peers, everyone they are connected to. Peering, however, gives you the ability to connect to one particular network: you are now connected to that individual network and the networks that it is connected to, and you have the most direct path possible into that peer network. We think that is absolutely the way forward; it's what we do a lot of. We have thousands of peering relationships at these edge locations all around the world, trying to give you the most optimal path from our infrastructure into and out of those particular networks. We can establish that peering in a couple of different ways, and that's something we're going to dive into in a second.

So, I've said peering a couple of times; what does that actually mean? Well, there are these concepts of private peering and public peering, two slightly different ways of looking at things. What is private peering, a PNI, a private network interconnection? This is where we talk to a particular peer and establish a physical connection between our infrastructure and theirs: we arrange a cross connect, fiber between one of our routers and one of theirs, and then we add more connections from more of our routers to theirs to increase that capacity. It means we can define how much is needed and work with that particular peer to build as much redundancy into that connectivity as possible. Public peering, however, goes back to using constructs like those internet exchanges I mentioned earlier: these are fabrics where many different providers connect in, just like we do, and over that smaller set of connections you get access to all of the peers that are on that particular exchange.
The thing about that, though, is that we have all of that traffic coming over a relatively small number of links; it's not one link per peer we're connecting to. It is, however, a really good way of picking up a lot of connectivity, and we quite often do it when we move into a new location. So you'll see us connect to lots of peers over a public peering exchange, and then, as we start to see traffic flow with a particular peer, we'll have a conversation with them about moving, perhaps, to a PNI and increasing that connectivity, because we now know what the traffic looks like and how much is actually needed.

Just as an example, you can see the difference between those two types of setup and how direct they actually are. At the top there we've got the PNI, and it's as simple as our network on one side of the connection and the peer's network on the other: we arrange the cross connect, we configure BGP, we bring up the session, and we're now announcing each other's networks to each other. It's as direct as you're going to get. In the lower example, where we're going over an exchange, you can see our router on one side, connecting into the switch, the fabric I'm talking about, and on the other side of it are all of these different peers. At that point we obviously have a certain amount of bandwidth into that exchange. We monitor it very closely, but it's possible we could end up with lots of traffic coming over there, so that's something we watch very, very carefully, and then we shift peers over to private peering when possible.

How does all of this work? Tom talked about the control planes we have internally; well, BGP is the control plane of the internet. It's a very stable, standard environment for establishing session information between two peers. In those previous examples, between two routers, it's what's used to exchange the information about what is reachable behind those two networks. The two devices on each side of a peering session exchange messages regularly, advertising that these particular networks are reachable via me. We identify those networks using something called AS numbers, autonomous system numbers, and basically BGP here is providing that reachability information.

So what does that actually look like? This is another example, where we've got an Amazon router on the left-hand side connecting with an external network provider. You can see that we've actually got a couple of different AS numbers out here: we've got the AWS global network on AS16509 on the left, and it is peered with this external provider, so we have a BGP session between those two routers. Behind that router, though, there are actually two networks, and what BGP enables us to do here is see that there are multiple paths out to these different networks; there's another network behind the one we're directly connected to. We receive all of that information using BGP, and we can then take it and decide whether we want to use this path, or whether we have another, more direct path to that second network via another location or another set of equipment.

The internet uses this mechanism to exchange routing information between all of the different peers and providers out there. The internet routing table at the moment is over 700,000 prefixes on IPv4, and just over 60,000 on IPv6. But BGP is just a control plane protocol: it exchanges that reachability information, it doesn't actually carry any of the traffic itself, and it doesn't carry any details about the performance of the networks providing this connectivity, or even their capacity.
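A drastically simplified version of the route selection just described might look like this; real BGP best-path selection has many more tie-breakers, and the prefixes and AS numbers below are documentation values.

    # Sketch: pick the best of several BGP-learned paths to a prefix
    # by shortest AS_PATH (one of many steps in real best-path selection).

    paths = {
        "198.51.100.0/24": [
            {"next_hop": "peer-A", "as_path": [64500, 64511]},
            {"next_hop": "transit-B", "as_path": [64501, 64502, 64511]},
        ],
    }

    def best_path(prefix: str) -> dict:
        return min(paths[prefix], key=lambda p: len(p["as_path"]))

    print(best_path("198.51.100.0/24")["next_hop"])   # peer-A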
BGP also doesn't carry any details about the performance of the networks that are providing this connectivity, or even their capacity. It just says that a particular network is reachable via this path. But we still need that information, so we have to look at some other ways to get it.

You've seen that within the edge PoP we have a very similar design to the data center infrastructure, and one of the things you might have noticed is that we still need to make sure we go wide in terms of our connectivity. When we establish these connections to external peers, we want to do that across multiple routers on our side. We absolutely acknowledge that hardware will fail at some point on either side of that connectivity, so we plan for that in advance and make sure the connections are striped across multiple routers on our side. The problem is that while we know lots about our network and our infrastructure, we don't necessarily know much about the peers we're connecting with. For example, we may have four routers on our side of a connection, but what's on the other side of it?

We can do a couple of things to help us here. In this example you've got a whole bunch of Amazon routers on the left-hand side and three networks that we're connecting to on the right. At the top we've got a peer who happens to have two routers, so that's fine: we can establish peering sessions across multiple of our routers, spread across the two that they have. In the middle, though, we've got a peer who only has one router. We've established multiple peering sessions, which is good from our side, but on their side it's only a single router, and we may not know that; they may not have told us. We can actually extract something from BGP to give us a hint about the situation on their side: the BGP router ID, an identifier for that particular router. We can use our infrastructure to look at where we're seeing that router ID, and in this case we're seeing it across multiple routers on our side, so we have a pretty good idea that the peer might only have one device on their side connecting to multiple routers on ours. That means we can have a conversation about it, see if it's something we can change or improve, or simply build it into our planning and our modeling for connectivity failover.
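Here is a minimal sketch of that router-ID heuristic, with made-up session data. It isn't AWS's tooling, just the shape of the check: if every session with a given peer reports the same BGP router ID, those sessions probably all terminate on one device.

```python
# If the same BGP router ID shows up across several of our peering
# sessions with one peer, that peer is likely terminating all of those
# sessions on a single device. Session data below is illustrative.

from collections import defaultdict

# (our_router, peer_name, observed_peer_router_id)
sessions = [
    ("aws-rtr-1", "peer-a", "192.0.2.1"),
    ("aws-rtr-2", "peer-a", "192.0.2.2"),   # peer-a: two distinct devices
    ("aws-rtr-1", "peer-b", "192.0.2.10"),
    ("aws-rtr-2", "peer-b", "192.0.2.10"),  # peer-b: same ID on every session
]

ids_per_peer = defaultdict(set)
for _our_router, peer, router_id in sessions:
    ids_per_peer[peer].add(router_id)

for peer, ids in sorted(ids_per_peer.items()):
    if len(ids) == 1:
        print(f"{peer}: all sessions land on one device; "
              f"raise with the peer or factor into failover modeling")
    else:
        print(f"{peer}: {len(ids)} distinct devices observed")
```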
So we have this connectivity, and we have the information about which networks are reachable via these various peering sessions, but we don't know much about the performance at this point. What we need to be doing is monitoring the data plane itself, where the traffic is actually flowing, and there are a couple of ways we can do this. We can use the log data from our various services to collect internet performance data and feed it back into our infrastructure: the background telemetry from the services people are using is really useful in giving us an idea of how that connectivity is performing. We can also use some of our other tools to monitor traffic flowing over the internet.

But we now need to consider what happens when something goes wrong, because things will fail, and this is where it's really important to have something that is automatic. We've got thousands and thousands of these routers across our network; we're certainly not logging into individual routers and shutting down individual connections when we see problems. We need to do this automatically, and this is generally referred to as traffic engineering: a way to look at the data we have and shift our traffic around depending on what we're seeing from those metrics. We ideally want to route around problems as they happen, and that needs to happen in two different directions. This is something you may not be aware of, but out on the internet things are very rarely symmetric: traffic doesn't always flow over the same path in one direction and come back the same way. So we have to look at this from both an inbound and an outbound perspective.

In terms of outbound traffic, the approach is actually quite simple. We've got our connectivity out here, and we can identify that packets are perhaps not making it to a particular destination when they go over one of these paths. At that point we can say something is impairing that connection: perhaps an interface is congested, or perhaps a connection is down further downstream where we can't even see it. Using the information we have from BGP, we can see that there was an alternative path we perhaps weren't using before, and we can shift our traffic out of the AWS network onto that other path to reach that particular peer. What about the opposite direction? If it's an inbound problem, perhaps we're not seeing the traffic come into our network that we're expecting, how can we influence that? Again, this is where BGP comes in: we can push BGP messages out to our peers saying we would prefer traffic to come in via a particular path. On the right-hand side of the screen, perhaps that particular flow is interrupted, so we use BGP to signal that we would prefer the traffic to come in on a different path. So these are two tools that we have to manage that connectivity.
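As a toy illustration of the outbound half of that traffic engineering, here is a sketch of the shape of the decision: watch a data-plane loss metric per egress path and move traffic off an impaired path onto a BGP-learned alternative. The threshold, path names, and loss figures are all assumptions; the talk doesn't describe AWS's actual mechanism.

```python
# Toy traffic-engineering decision: shift outbound traffic off an
# impaired egress path. Everything here is illustrative, not AWS's
# implementation.

LOSS_THRESHOLD = 0.01  # assumption: more than 1% packet loss = impaired

# egress path -> measured packet loss, from data-plane telemetry;
# the first entry is treated as the configured primary path
measured_loss = {"path-via-peer-x": 0.06, "path-via-transit-a": 0.001}

def pick_egress(paths: dict[str, float]) -> str:
    """Use the primary path unless it is impaired, else a healthy alternate."""
    primary, *alternates = paths  # iterating a dict yields keys, in order
    if paths[primary] <= LOSS_THRESHOLD:
        return primary
    healthy = [p for p in alternates if paths[p] <= LOSS_THRESHOLD]
    return healthy[0] if healthy else primary

print(pick_egress(measured_loss))  # -> "path-via-transit-a"
```

For the inbound direction, one common BGP lever is to make the announcement over the impaired path less attractive, for example by AS-path prepending, though the talk doesn't name the specific mechanism AWS uses.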
All right, wrapping up the presentation here, I want you to walk away with a few conclusions from the things we've covered. One of them is that we have strong isolation from failures in the network, and we do this all across the network, whether in the data centers, the availability zones, or the regions: we've broken the network down into individual components and built fault isolation into them, across all of the layers.

Another part that's really important is the extensive monitoring. We talked about how we use it inside the data center networks, on the backbone, and on the internet-facing side, and how that network telemetry feeds into auto-remediation systems that go and take action without operator involvement.

We talked about large amounts of capacity and overprovisioning. It's something we firmly believe in when building our infrastructure: we have to anticipate that links will go out of service and routers will fail, and we want to weather that without compromising availability or the customer's ability to send traffic through the network. And you have to do that at the same time as scaling, all the way from host ports to the different layers within the network, and be able to bolt on additional capacity with ease.

Finally, there's our custom hardware, things like the single-chip router platforms I gave as an example, where we've innovated and brought in additional functionality. We don't need all the features a vendor might drop on us in a routing platform; we only need the features we really care about, and we can fine-tune the device to our requirements. I don't have to worry about a vendor feature I don't even care about breaking the thing that I need, and having that end-to-end control lets us build an infrastructure for exactly the things we need to serve our customers.

So that's about it. I wanted to thank you for joining. We're going to have a meet and greet after this in the Aria East, Level 1; it starts about 15 minutes after the session and runs for half an hour. I've brought some samples of fiber cables if you want to look at them; they're in my backpack (the police dog took a look at it going through security). We'll do Q&A there rather than here, so we're going to head straight over now. Thanks a lot, thanks very much. [Applause]
Info
Channel: Amazon Web Services
Views: 16,121
Rating: 4.7808218 out of 5
Keywords: re:Invent 2018, Amazon, AWS re:Invent, Networking, NET305, Amazon CloudFront, AWS Direct Connect
Id: tPUl96EEFps
Length: 55min 53sec (3353 seconds)
Published: Tue Nov 27 2018