AWS re:Invent 2016: Amazon Global Network Overview with James Hamilton

Captions
Here's the Amazon regional network: 14 regions all across the globe, with four more announced for next year, which is great news. I love what we've got; we're going to have 18 regions. And these regions are real regions, AWS regions. These are not "hey, I stuck two racks at opposite ends of the same data center, they're relatively independent, there's a wall between them so fire is unlikely to spread, and there probably won't be a flood, so those are availability zones." No. Our availability zones are real: they're separate buildings, and they actually do survive all of those different faults. I'll show you in detail, because I want you to see exactly what the difference is, and the best way to know the difference is to see the difference, simple as that. And there are 68 points of presence spread throughout the entire globe.

This is the one that's great; you haven't seen this before. I got a question two years ago: does Amazon have a private network, do you deploy private networks, do you spend on that? A lot of companies talk a lot about it; have you considered it? Yeah, we thought about it. That is all 100% Amazon-controlled resources. That's AWS. If traffic is flowing between one region and another, it's flowing on that network, and that network is managed by one company. It's not passed from one transit provider to another, from one interconnection site to another interconnection site. My rule is: if you've got a packet, the more people that touch it, the less likely it is to get delivered. It's as simple as that; one administrative domain is way better than many administrative domains. And sometimes on the internet weird things happen, like one company not getting along exactly with another company while they're trying to work out a contract, and maybe the resources get a little squeezed during that time to help rush the contract along. We're not going to do that. If it's running on our network, it's under our operational control, we give you better quality of service, and we always have the assets to survive a fault. There's no way a single link failure will ever have any impact on anyone in this room, because we have the capacity to survive it; we engineer it that way. Simple, but we'd be crazy not to.

This, by the way, is not a little 10 gig network; this is a 100 gig network. Every one of those links I'm showing you is 100 gig, absolutely everywhere, and of course 100 gig is not enough in many places, so it's many, many parallel 100 gig links all over the place. This is a pretty important asset. When we started this, I've got to admit I was a little concerned, because it's really, really expensive. But the networking team is 100% committed that this is the right thing to do from a quality-of-service perspective, and the team is really good at finding great value. These private resources are short-term leases, long-term leases, dark fiber lit under IRUs, and in several cases now we're laying our own cable. We'll do whatever is most cost effective to get the resources we need, and because we're not religious about there being one true way, we get good value.
Let me show you an example. This is our latest project, the Hawaiki trans-Pacific cable, and the reason I want to show it to you is that the groundbreaking in New Zealand was last week, which is kind of a big deal. This is a trans-Pacific cable that runs 14,000 kilometers, and at its deepest point it's 6,000 meters below the sea. That's about three miles below the ocean. It's an interesting challenge, and I can't resist telling you about it because I was so captivated by it myself. Every time you get involved with technology you learn it's always harder than it looks. How hard could it be? A string of fiber between Australia and the U.S.; doesn't seem that bad, shouldn't be a problem. It turns out, signal-to-noise ratios being what they are, you have to have repeaters every sixty to eighty kilometers. That's unfortunate; okay, got it. They have to be able to work for 20 years without service. Okay, I understand. Oh, and they have to work three miles under the sea. Okay, I'm with you. And these repeaters are electrically powered. Oh, that's not good. Now you've got electrical equipment three miles underneath the sea that's supposed to last 20 years, and you've got to get power to it. Those really, really long extension cords you see used on some lawns don't seem like the right thing, so you've got to find a way to get power to these things.

What they do, if you look closely, is wrap the fiber in copper sheathing. If you look at the cross section you'll see a bundle of fibers, some insulation, and then a couple of layers of copper. Now the problem is that there are a lot of repeaters, so you would need a lot of copper, because it would be carrying a lot of current; it takes a lot of power to run them all, and it's just not cost-effective to do it that way. So what's the trick? The same trick that gets played on long-haul terrestrial power transmission lines: if you need to deliver a lot of power, you can either deliver a lot of amperage, which means you need a lot of conductor, or a lot of voltage, and of course they go with a lot of voltage. The reason those conductors are relatively small is that they're running very high voltage. These devices run on direct current, and it's actually ten thousand volts positive DC and ten thousand volts negative DC.

One more little tidbit, just because I think it's a really interesting one. If you look closely, you'll see there are two conductors. What if one failed? What if someone anchors in the wrong spot? What if a fisherman trawling gets a little aggressive and hopes to catch something bigger? If one of these conductors is open to the sea, are you down, absolutely down? There's not a third conductor, so how do you get redundancy in this cable? It's an interesting trick. You've got ten thousand volts positive and ten thousand volts negative. If the negative conductor is shorted to the sea, it floats up to zero. So what you do, if you're managing the cable, is lift the positive one to twenty thousand volts. You still have twenty thousand volts hitting every repeater, the same difference of potential as before; you're using the sea water as the third wire, if you will. A really cool trick: you go service the cable fault and then shift it back down and run it the same way as before. So in the few cases where you actually need it, you've got redundancy. Kind of surprising.
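To make the trick concrete, here is a tiny sketch (my own illustration, not anything from the talk): the repeaters only see the difference between the two rails, so shifting both rails by the same amount changes nothing for them.

```python
# Toy illustration of the powering trick described above (my own sketch,
# not AWS or cable-operator code). Repeaters only care about the potential
# difference across the two conductors, not the absolute voltage to ground.

def repeater_supply(v_positive, v_negative):
    """Voltage seen across the repeater chain's two conductors."""
    return v_positive - v_negative

# Normal operation: +10 kV and -10 kV rails.
normal = repeater_supply(+10_000, -10_000)      # 20,000 V

# Fault: the negative conductor is open to the sea and floats to ~0 V.
# Operators lift the positive rail to +20 kV and use seawater as the return.
fault_mode = repeater_supply(+20_000, 0)        # still 20,000 V

assert normal == fault_mode == 20_000
print(f"repeaters see {normal} V in both configurations")
```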
Okay, back to that beautiful network; I have to show it one more time. Now what we're going to do is choose one of those regions. I'm going to choose a fully developed region, because I want to show you how big it can get and the full richness that's possible inside a region. So let's dive into a region and see what we've got. The first thing we're going to look at is an actual AWS region. This isn't fictional, this isn't what I hope it will be someday, this isn't an artist's rendition; this is what's actually there. Every one of our regions, and there are 14 worldwide going to 18 worldwide, has at least two AZs, and when I say an AZ I mean a building, a separate building; we'll come back to that. Most of our regions have three AZs, and all of the new ones we're building are three AZs right now; it just feels like the right place for us to be. This particular one is five AZs, so it's relatively big from a scale perspective.

Every region has two transit centers. The job of the transit center is to provide connectivity from the region to the rest of the world. Our private network, the Amazon global network, connects up into the transit centers. Customers that are direct-connecting to us are hooking up through PoPs or possibly up through the transit centers. Everyone we're peering with comes in through the transit centers, and all of our transit providers come in through the transit centers. So two transit centers is another constant we'll always have.

Now we need to wire this up. We've got five AZs. The first thing we do is run fiber inside each AZ to hook it up internally. Then we run fiber to hook the AZs up with each other, and then we run fiber to hook the transit centers up to all of the AZs. You see what's going on: there is a lot of redundant fiber here, and the word redundant is a wonderful thing in the networking world, because it means that when things go wrong, when someone decides to dig a hole in the wrong spot, things keep running. You don't feel that redundancy while everything is running, but when you don't read about an outage, it's because of that redundancy. In this particular design we have 126 unique spans, which is a pretty substantial number, and many of those spans carry more than one fiber; in fact there are 242,472 fiber strands across those 126 spans. That is a lot of fiber.
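For a sense of the density, the two numbers quoted above imply an average strand count per span; this is just my own back-of-the-envelope, and real spans will vary widely.

```python
# Back-of-the-envelope from the two numbers quoted above; the per-span
# figure is only an average, real spans will differ.
unique_spans = 126
total_strands = 242_472

strands_per_span = total_strands / unique_spans
print(f"~{strands_per_span:.0f} fiber strands per span on average")  # ~1924
```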
Here's another interesting little tidbit, at least it caught my interest: we use two-inch conduit, and the industry pretty much runs on two-inch conduit. Well, we're running a lot of fiber between these buildings. Do you want to dig another hole and run another two-inch conduit? Not especially. And fibers are small, so you'd think it wouldn't be that hard, but you need strength, otherwise the fibers will break when they're pulled. So you need a core that's structurally strong enough to take the pull, and the whole bundle has to be armored well enough that it survives when the construction workers pull it and can stand up to the environment underground. The company we're working with is doing phenomenal work. They start in a pretty conventional place: it's very common to have ribbon cables, fibers that are laid out in ribbons. What they're doing is taking the ribbons, folding them into a V and stacking them so they fill up a quadrant, then doing it again and again, and by the time you're done they somehow get 3,456 fibers into that cable. We're the first company to deploy this technology, and we absolutely love it; it saved us a ton of money, because we're running so much fiber.

You might ask, because in the networking world, instead of running a lot of fiber you've always got a choice; there are other things you can do. One of them is what I showed you back there on the Hawaiki cable: every fiber pair is running a hundred waves of a hundred gig. You can run parallel waves on the same fiber, so instead of running a hundred fibers you can run one fiber with a hundred waves on it. What you saw back there was a thirty-terabit cable with six fibers. So why don't we do the same thing here? The reason is that the current technology, CWDM or DWDM (coarse or dense wavelength-division multiplexing), costs more. The bottom line is that for short distances it's more cost effective to run independent fibers, while for long distances it always wins to run multiple waves on the same fiber. Silicon photonics will probably change that, and our plant will probably eventually end up running multi-wave; I'm very confident that will happen, but it's not happening anytime soon, it'll take a little bit of time yet. So almost all of those spans run a single wave per fiber; not every one, but almost all of them.
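The thirty-terabit figure is straightforward wave arithmetic, assuming the six fibers are used as three transmit/receive pairs (my assumption; the talk only gives the totals).

```python
# Wave math behind the "thirty-terabit cable with six fibers" claim
# (my arithmetic, assuming the fibers are used as 3 transmit/receive pairs).
fibers = 6
fiber_pairs = fibers // 2            # 3 pairs
waves_per_pair = 100                 # DWDM wavelengths per fiber pair
gbps_per_wave = 100                  # 100 Gb/s per wavelength

capacity_tbps = fiber_pairs * waves_per_pair * gbps_per_wave / 1000
print(f"{capacity_tbps:.0f} Tb/s design capacity")  # 30 Tb/s
```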
Okay, let's jump into a fully scaled AZ. Again, remember this is a specific region; these are actual numbers from that region. Every AZ is one or more data centers. No data center is ever in two AZs; no games like that, like I told you. The third point, the network links, we've already covered. The final thing is one I should know, but in fact it blew me away: we have several AZs, not one, several, and these are just one part of a region, with 14 regions worldwide, where a single AZ has 300,000 servers. Wow. Big numbers.

Here's a data center. This is one where we kind of go backwards: a lot of the numbers I've shown you I find to be big, surprising, and considerably bigger than they were last time I showed them, but these numbers haven't really gone up that much. Last time I think I said 25 to 30 megawatts; right now I'm saying 25 to 32, and almost all of our new builds are 32 megawatts. Why aren't we building bigger facilities? We could easily build 250-megawatt facilities; I've been in 60-megawatt facilities; there's nothing challenging about it. Here's what's going on, and the reason is the same reason we do everything: it's data, we just use the data. When you're scaling up a data center and you're very small, adding scale gives you really big gains in cost advantage, but as you get bigger and bigger it's a logarithm; it just gets flatter and flatter, and it gets to the point where the gains from going a lot bigger are relatively small. The negative of a big data center, on the other hand, is linear. If you have a data center that's 32 megawatts and 80,000 servers, it's bad if it goes down, but we're at sufficient scale that you don't notice; we can work through that. But double that to 160,000, triple it, quadruple it, and start to get upwards of half a million servers, and if that goes down, the amount of network traffic needed to heal all the problems means it's not a good place to be. So our take right now is that this is about the right size of facility. It costs a tiny bit more to head down this path, but we think it's the right thing for customers.
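A quick back-of-the-envelope on those facility numbers; only the 32 MW, 80,000-server, and 250 MW figures come from the talk, the rest is arithmetic.

```python
# Back-of-the-envelope on the facility sizing argument above. The inputs are
# the numbers quoted in the talk (32 MW ~ 80,000 servers, 250 MW as the
# "we could easily build it" size); everything else is simple arithmetic.
mw_per_facility = 32
servers_per_facility = 80_000

watts_per_server = mw_per_facility * 1e6 / servers_per_facility
print(f"~{watts_per_server:.0f} W per server, all facility overheads included")  # ~400 W

# The blast radius of a single facility failure grows linearly with its size:
for mw in (32, 64, 128, 250):
    servers_at_risk = int(servers_per_facility * mw / mw_per_facility)
    print(f"{mw:>3} MW facility -> ~{servers_at_risk:,} servers offline if it fails")
```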
That's a region. Let me show you a little bit on networking. I always have to have my rant on old-school networking, because it held back the industry for so long: vertically integrated networking equipment, where the ASICs, the hardware, and the protocol stack are all supplied by a single company. It's the way the mainframe used to dominate servers, and it's actually an interesting observation: if you look at where the networking world is, it's sort of where the server world was twenty or thirty years ago. It started off with: you buy a mainframe and that's it, and it comes from one company. The networking world is in the same place, and we know what happened in the server world. As soon as you chop up those vertical stacks, you've got companies focused on every layer, innovating together and competing to make great things happen, and the same thing is starting to happen in the networking world, which is a wonderful place to be. What's happening is that it is already causing the ratio of networking to server to go up; in other words, for a given server size, the amount of networking required to support it is going up. Partly it should have gone up before, but networking was artificially expensive, so server resources were getting stranded. Now that networking is moving to a commodity, that's no longer happening.

We run our own custom-built network routers. That particular one happens to be a top-of-rack form factor. These routers are built to our specs, and, this is the wonderful thing, we have our own protocol development team. When I rant about how poorly served we were by vertically integrated routers, I mostly talk about cost, and it was cost that caused us to head down our own path. But it turns out, as big as the cost gain is, and oh my god it is a big cost gain, the biggest gain is actually reliability. What happens is networking gear is very expensive, and every company has people like me with big ideas who say, "I've got a requirement, I'd like you to add some incredibly complicated piece of code to your system." The vendor says sure, and after a while the networking gear is absolutely, completely unmaintainable, and when the next release comes out they don't test all that stuff people like me asked for, because nobody uses it anyway, and it just doesn't work. Our networking gear has one source of requirements: us. We show judgment and keep it simple. It's our phones that ring and our pagers that go off if it doesn't work, so it's well worth keeping it simple. As fun as it would be to add a lot of really tricky features, we just don't do it, because we want reliability. So it's a much more reliable system, and I honestly wouldn't have guessed that when we headed down this path; I was making excuses initially, saying it won't be as reliable; it was way more reliable from day one.

The second thing is, okay, you've got a problem, what do you do? Well, if a pager goes off we can deal with it right now. These are our data centers, we can go and figure it out and fix it. We've got the code, we've got skilled individuals working in that space, we can just fix it. If you're calling another company, it's going to be a long time. If they have to duplicate something that happened at the scale I showed you in their test facilities, how many test facilities look like that? There's not one on the planet. So it's six months; even for the most committed, best-quality, most serious company it takes six months, and that's a terrible place to be. So we love where we are right now.

We jumped on 25 gig early. 25 gig looks like a crazy decision if you look at it, and I was heavily involved in that decision, so I'll defend it. The industry standards were 10 gig and 40 gig; why would you build 25? You're just asking for trouble, and by the way, 25 was really new at the time and there was a bit of an optics shortage happening, so it was risky as well. But here's what's going on: if you're not willing to find a way to solve the optics problem, you have to run 40 gig, and that's where most of the world went. We're confident that 25 is the right answer, and I'll show you why real fast; it's super simple. 10 gig is a single wave; that's 10 gig. 40 gig is four waves of the same thing, so, it's not quite this bad, but 40 gig is almost four times the optics cost of 10 gig, which is just not a great place to be. 25 gig is one wave; it's almost the same as 10 gig, again not quite true, it's a little bit more money, but it's almost the same price as 10 gig. What that means is that on this model we can run 50 gig, which is more bandwidth, and we get to do it at much less cost, because we're only running two waves. From an optics perspective it's absolutely the right answer. We buy enough that it doesn't matter; vendors are extremely happy to serve us, so it's not a problem, but I believe this is where the industry is going to end up, because whenever you've got the right answer, when it looks that good, it'll happen. And it has happened: we are deploying these in unbelievable numbers, which is good; I'm glad we are.
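The argument is essentially lane arithmetic. Here is a rough model; the relative optics costs are placeholders of mine, since the talk only says 40 gig is "almost four times" a 10 gig optic and 25 gig is "almost the same" as 10 gig.

```python
# Rough lane arithmetic behind the 25 gig decision. The relative optics cost
# numbers are my own placeholders, not figures from the talk.
OPTICS_COST = {10: 1.0, 25: 1.2}   # assumed relative cost per lane/wave

def link(lanes, gbps_per_lane):
    bandwidth = lanes * gbps_per_lane
    cost = lanes * OPTICS_COST[gbps_per_lane]
    return bandwidth, cost

for label, lanes, rate in [("10G", 1, 10), ("40G (4x10G lanes)", 4, 10),
                           ("25G", 1, 25), ("50G (2x25G lanes)", 2, 25)]:
    bw, cost = link(lanes, rate)
    print(f"{label:18s} {bw:3d} Gb/s at ~{cost:.1f}x optics cost "
          f"-> {bw / cost:.1f} Gb/s per unit of optics cost")
```

With these assumed ratios, 25 gig and 50 gig deliver roughly twice the bandwidth per unit of optics cost compared with 10 gig and 40 gig, which is the shape of the argument made above.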
Here is the ASIC that runs in our routers today. I am super excited about this, because remember I said earlier that servers went down this path and de-verticalized: you can now buy the ASIC individually. At our very large volumes, you can buy the ASIC without buying the rest of the gear. We work with Broadcom; this particular one is a Broadcom Tomahawk. At the time, it was really the most complex application-specific integrated circuit by transistor count on the planet. These are absolute monsters. But the beautiful thing is that it's a 3.2-terabit part. What does that mean? It's 128 ports of 25 gig, and all ports can run flat out with no blocking; it'll flow 3.2 terabits through this part at the same time. No wonder it's a complicated part. Why do I like that? Well, non-blocking is a wonderful place to be, but the real reason I like it is that there's a healthy ecosystem: Cavium, Mellanox, Broadcom, Marvell, Barefoot, and Innovium are all building parts. There are six-terabit and thirteen-terabit parts coming, and they'll be around the same price, in just the same way it went with servers.
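The non-blocking figure quoted above is simple port arithmetic; this is just my own check of the numbers.

```python
# Port arithmetic behind the "3.2 terabit, 128 ports of 25 gig, non-blocking"
# description above (my own check of the quoted numbers).
ports = 128
gbps_per_port = 25

switching_capacity_tbps = ports * gbps_per_port / 1000
print(f"{switching_capacity_tbps} Tb/s with every port running flat out")  # 3.2 Tb/s
```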
So now what happens is that networking gear separates out into basically two costs: it has the ASIC cost and it has the optics cost, and that's basically all there is; all the rest is lost in the noise. What this means is that the ASIC is on a Moore's Law pace, and that is a fantastic thing. Optics aren't, but with silicon photonics they're soon to be heading down that same path, and many of the optics we're running today are in fact multi-chip versions of silicon photonics. So good things are happening.

Software-defined networking is a big topic today and has been super important for the last couple of years. We've had it since the beginning of EC2, because you need it from the beginning of EC2 in order to offer a secure service the way we do. Starting somewhere around 2011, I believe, we made what was realistically a fairly obvious observation, but an important one: whenever you've got workload that's very repetitive and happening all the time, as almost any network packet processing is, you're really better off taking some of that down into hardware. So what we did is offload the servers and drop that network virtualization code down onto the NIC. Lots and lots of gains follow from that. The first gain is that more resources are available: cores are freed up on the servers, which is good news. The second gain is that things that are hard to see but are happening, little disturbances on the server like flushing TLBs, are now moved off. And it's considerably more secure: if a hypervisor is compromised, you still don't have access to the network, because it's a separate real-time operating system running on that NIC, running all of our software. So that offload does wonderful things.

Another observation, and it's kind of an obvious one but a super important one that applies to every level of computing, so it's a nice rule to keep in mind: if you offload to hardware, in rough numbers you run at roughly a tenth the latency, roughly a tenth the power, and roughly a tenth the cost. So it's a big deal if you can do it. A second observation: people say the reason we had to build custom networking gear is that you could never have the bandwidth we have in our data centers otherwise. That's not true. I can give you any bandwidth you want; it's just more parallel links, and I can do it with anyone's equipment. It's not even hard to do; it's hard to pay for if you're using commercial gear, but it's absolutely not hard to do. You know what is hard to do? Latency. Bandwidth is money; physics is harder. With physics you've got a challenge: the speed of light in fiber is the speed of light in fiber. There may come fibers that are a little bit faster, but that is basically the fastest you're going to go. So latency is key, and when you move to hardware, the latency fundamentally changes. The way I look at it is, I tell software people: the things you measure are called milliseconds, and if you put it in hardware, the things you measure are nanoseconds and microseconds. You're changing by big margins, so this is the right place for us to go.
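To put the "latency is physics" point in numbers, here is a quick lower bound from the speed of light in glass; the distances are my own example values, apart from the 14,000 km cable length mentioned earlier.

```python
# Lower bound on fiber latency from the speed of light in glass
# (refractive index around 1.5, so roughly two thirds of c).
# The 1 km and 100 km distances are my own example values; only the
# 14,000 km figure comes from the trans-Pacific cable discussed earlier.
C_VACUUM_KM_S = 300_000
C_FIBER_KM_S = C_VACUUM_KM_S * 2 / 3   # ~200,000 km/s

def one_way_delay_ms(km):
    return km / C_FIBER_KM_S * 1000

for km in (1, 100, 14_000):            # inside an AZ, across a metro, trans-Pacific
    print(f"{km:>6} km -> at least {one_way_delay_ms(km):8.3f} ms one way")
```

No amount of money moves that floor, which is why pushing the packet-processing path into hardware, where the added latency is measured in microseconds rather than milliseconds, is where the real gains are.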
Here's some good news, here's some great news: do you believe we're in the semiconductor business? Isn't that great? We're in the semiconductor business at Amazon Web Services. Not only are we building hardware, which I thought was pretty cool, we built this: billions and billions of transistors. Every server we deploy has at least one of these in it, and some will have a lot more. This is a very big deal. Imagine what we can do if I'm right about those trends I told you about on hardware implementation, the latency, the power, the cost. If I'm right on that point, and I'm fairly confident I am, it means we get to implement it in silicon. So now we've got, in the same company, reporting into our infrastructure team, digital designers working on this, hardware designers working on the NICs themselves, and software developers. When you own horizontal and vertical, you get to move at the pace we're used to, make changes at the pace we're used to, and respond to customer requirements at the pace we're used to. We think this is a really big deal, and if I'm right that there's going to be an acceleration in the amount of networking resources available to servers, then this is a wonderful place for us to be, because we'll be able to step up to it at relatively low cost as a result of some of the decisions I've outlined so far. Good news there.

Let's look at this one. One of the ways you manage faults is that when you have a fault you do a postmortem, you say "I shouldn't have done that," you learn from it, and you don't do it again. That's fine, you should do that, and we're religious about it, but I also look at other people's faults and try to learn from them. This one caught my eye because, oh boy, I know this fault, I know it by heart. I know it because we run at very large scale, but it's actually a very rare event; I almost guarantee this company had never seen it before and will probably never see it again. It does happen, though, and a very rare event at very large scale happens, unfortunately, more frequently than you'd think. Let's look at the impact. This airline's chief financial officer reported they lost a hundred million dollars of revenue because of this fault; the cancellations are listed there, and 2% of their monthly revenue was gone as a consequence. The report was: switchgear failed and locked out backup generators. Let me stop you there, I know that one. What happened? I happened to be in the data center for this one, just a fluke; I happened to be in one of our Virginia data centers when exactly this event happened.

I should tell you the way switchgear works, and then I'll tell you what happened. The utility feed comes in through the switchgear and goes down into the uninterruptible power supply, and as long as the utility is there, of course, that circuit runs. If the utility fails, the switchgear waits a few seconds, because the utility usually comes back very rapidly; most faults are incredibly short and it's not worth starting a generator. If it doesn't come back, the generator starts up, spins up to full RPM, the switchgear waits until the voltage stabilizes and the power quality is good, and then it swings the load over. Those poor old generators hit 1,800 rpm in about seven or eight seconds and take the load in about 15 to 20; it is not a good job, do not apply to become a generator in a backup data center, because they get the load awfully fast. Okay, so that's the way it's supposed to work.

What goes wrong is this: if there is a fault out there that looks like it might have been a short to ground inside the data center, the switchgear is "smart" and doesn't bring the generator onto the load, because it could damage the generator, and they view it as a safety issue, which is rubbish. So what happened? I'm in the facility, and six hours later the switchgear manufacturer came to the facility to explain the problem to us, and our data center manager was absolutely apoplectic, just completely apoplectic, that he had a generator running and we didn't hold the load. Unacceptable. And what's interesting is that the switchgear manufacturer was absolutely unapologetic: that's the way it has to be. Fine, there are other switchgear manufacturers, we'll buy from someone else. They're all the same. What are the odds? They're all the same.

So here's what we've done. The picture I'm showing here is normal commercial switchgear, and we still use that, but we changed the firmware. The firmware that controls this switchgear does not do what I just described: if there's a fault and it might be a short, we bring the generator online anyway. We're going to do that, and the reason is that it's what you want us to do. What's the impact, maybe it's unsafe, what are the risks? Let's look at it. The vast majority of the time, the fault is outside the data center anyway. The one I had experience with, and it's kind of funny because the same pole got hit twice, but that's a longer story, there must be a bar nearby: someone drove into an aluminum utility pole, which fell across two phases of the power lines, and even though the fault was entirely outside the building, the switchgear decided it was very unsafe and wouldn't go. So let's look at what can happen. In that case, you switch to the generators and nothing goes wrong, no problem at all; that's perfect. Let's say there was a short somewhere in the facility and a branch circuit breaker kicks out: everything else runs fine, the secondary power takes over for those servers, and again you're fine. Okay, let's say it's a fault very high up in the system: the generator is going to come up into a dead short. It might destroy the generator, it might not, I don't know; we've never seen it, and to my knowledge I've never even read about it happening, but it could happen. So maybe it destroys the generator, and from my perspective that's three-quarters of a million dollars. We're very frugal; we do not want three-quarters of a million dollars of damage. But on the other hand, we certainly don't want to drop the load, so we'll take that risk, and if it happens, we've got a backup generator to back up that one. All of our facilities are those magic words, redundant and concurrently maintainable, which is to say you can have a system offline for maintenance and have a fault at the same time, and everything still keeps running. So it'll just keep running through that, no big deal.
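Here is a minimal sketch of the behavioral difference described above; it is my own pseudocode-level illustration, not actual switchgear firmware, and the condition names are hypothetical.

```python
# Sketch of the decision difference described above (my own illustration,
# not real switchgear firmware). "suspected_ground_fault" is a hypothetical
# name for the condition that made the stock firmware lock out the generator.

def stock_firmware(utility_ok, generator_ready, suspected_ground_fault):
    if utility_ok:
        return "run on utility"
    if suspected_ground_fault:
        # Protects the generator, drops the load: the airline's outage.
        return "lock out generator (load is dropped)"
    return "transfer load to generator" if generator_ready else "wait for generator"

def modified_firmware(utility_ok, generator_ready, suspected_ground_fault):
    # Bring the generator online even if the fault might be a short:
    # risking a generator beats dropping the load, and a backup generator
    # plus a concurrently maintainable design covers the worst case.
    if utility_ok:
        return "run on utility"
    return "transfer load to generator" if generator_ready else "wait for generator"

event = dict(utility_ok=False, generator_ready=True, suspected_ground_fault=True)
print(stock_firmware(**event))     # lock out generator (load is dropped)
print(modified_firmware(**event))  # transfer load to generator
```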
So that's what we've done there, and we're proud of it. We think it's one of those details where nobody is going to buy AWS just because we do things like that, but we've had this fault twice. The Super Bowl in 2013 was down for 34 minutes with exactly the same fault. It does happen, and it doesn't happen here; it hasn't happened here in years because of this.
Info
Channel: Amazon Web Services
Views: 43,364
Rating: 4.9806762 out of 5
Keywords: AWS, Amazon Web Services, Cloud, cloud computing, AWS Cloud, Amazon CloudFront, James Hamilton, re:Invent 2016
Id: uj7Ting6Ckk
Length: 33min 35sec (2015 seconds)
Published: Tue Jun 27 2017