AWS re:Invent 2019: [REPEAT 1] AWS Transit Gateway reference architectures for many VPCs (NET406-R1)

Hello, hello. How are you guys doing today? Having a good re:Invent? Right on. Yeah, we heard the announcements this morning, so a bunch of networking things that he just was like, yeah, it's some networking stuff, and went on to other things. So yeah, transit gateway has some changes, some VPN, that kind of thing.

So welcome, everyone. This is the transit gateway session for reference architectures. I'm Nick, I do network stuff. Today what we're really focusing on is transit gateway: a little bit about how it works, and then really digging into the reference architecture. So talking about the costs, the architecture, the scalability, and really just walking through how I have conversations with customers who are trying to build out their enterprise network, their security, their segmentation; how to think about all the different options that AWS provides to you; and making it a little bit simpler by making recommendations and then letting you know how to alter those to meet your individual requirements.

Cool. So let's start off with the concept of having a lot of virtual private clouds, or VPCs. When we talk about the VPC, we say it's your data center in the cloud. It's very segmented, it's yours, nothing goes in or out unless you allow it to. But is it really a data center? Because I work with customers that have 100, 200, 300, 500, a thousand VPCs. I think I've worked with one customer that had 20 data centers once; that was the most. So the fact that you can create these VPCs through an automation script means it's not really a data center. If you're creating data centers through automation, then there's probably a job for you here at AWS; we need to build more data centers. So there's that concept that you might have a lot more of these things than a normal data center.

Also, the security of a VPC and an account is a little bit different. Unless you have a very well organized
identity structure in your data center, you probably have one set of credentials that gets you into everything in your data center. Your root account does that, and so you'd be very careful with that root account; you have one of those per account.

The other thing to think about between data center networking and cloud networking is the people that are operating and owning those networks. On premises you have network teams and security teams that manage those networks. In the cloud, it might be that the marketing team has their own VPC and they know nothing about networking, or potentially developers who are still terrified of networking because it's, you know, different. So you have people owning, operating, and running networks that aren't necessarily network engineers.

How many people identify themselves as a networking person here? Okay, cool deal, so quite a bit of those. How many of you guys are developers and maybe DevOps, that kind of thing? Yeah, so a good mix of that too. So that shows you the different people trying to do networks in AWS.

So we take a look at some of the challenges. A lot of customers will start off with a dev and a prod, pretty simple, then connect with VPN or Direct Connect to on-premises. If this is your architecture, you could probably leave, because when I said many VPCs, I mean more than two. As you go on, you start getting more people, more projects; you start realizing you probably want some shared services, get some peering in there; some other people create some more VPCs. So maybe six months or a year into your cloud journey you're at maybe six VPCs.

And then someone goes, all right, well, you're doing so good, we also want to do disaster recovery, so we need to duplicate all that. We also have a WAN network that we need to use; we need to connect our VPCs between regions. Then someone says, let's spin up a landing zone, so now we have a hundred new VPCs that we're not sure what to do with. And then over here the developers put an API in their dev VPC, so the dev VPC is now also prod. How do we deal with that? We need a new dev VPC for them. And then apparently we just spun up some sort of credit card application, so we've got to do some firewall stuff, proxies, and I don't know how to do that. At some point the network person is like, what are you guys doing? This is not what I'm used to.

So that's what we're going to do today: how do you wrangle these types of challenges? The concept is you take a lot of those lines away, you replace them with transit gateway, and you start thinking about this in a better way. Our network person also speaks a little bit of Spanish and says "me gusta", which, I'll do some translation here, means "I like this cloudy goodness".

So let's get into it. Transit gateway. I should also warn you: I have 99 slides and over 900 animations. Clippy actually came to me and said, you should probably turn this into a Microsoft Movie Maker, so I got a badge from Clippy. This thing is going to keep moving, so hopefully you ate lunch and are still awake. We're going to move pretty fast.

Transit gateway is your router in the cloud. It's a regional concept. It centralizes your Direct Connect and your VPN. It scales to 5,000 attachments and VPCs, so it's a scalable router. You can do flexible traffic segmentation: you can use the route tables to put things in different route tables and make sure things don't talk to each other. It also uses network interfaces: when you attach a VPC, it puts a network interface in that VPC. If you've been around AWS for a while, we talked about transitive routing for a while (which is a terrible term, by the way), but transitive routing meant that either the source or the
destination for a packet had to be on a network interface in that VPC. Now that we're using network interfaces, it allows us to do a lot more routing tricks, which we'll get into.

One of the first things customers ask when I talk about transit gateway is: okay, do I need two? What happens when this thing breaks? Did you just put a router in the cloud? Because routers break all the time, and I don't like that. Hyperplane is the way that we address this. Hyperplane is a distributed state management system that we have deployed on EC2 in each one of our regions. It's the platform that runs Network Load Balancer, NAT gateway, EFS mount points, and transit gateway, so it's seen a lot of packets. It's a reliable platform for us.

The way it works is, whenever you put an attachment in an availability zone, you get shards of bandwidth in that availability zone across those EC2 instances, and how much bandwidth we allocate to those shards is how much bandwidth you get. For transit gateway you get 50 gigabits per second of bandwidth, which is also why we charge you per attachment, because we're allocating bandwidth for you. As you have more attachments, you get more bandwidth across more AZs. If other tenants allocate more interfaces, then they also get this. It's a technique we use called shuffle sharding. This means that if any one of these instances fails, or if any one of our customers tries to blow up their entire transit gateway, it's not going to impact other folks, because we've distributed that load across this fleet. It also means that even though we have this availability-zone-specific data processing architecture, we still expose the transit gateway at the regional level. We abstract you away from the availability-zone-specific concepts so that you just see one regional router. That's a nice feature of the Hyperplane concept. So your router is not
going to break; that's what you need to know.

So let's talk about some configuration examples. I see it in two use cases. One is flat: I'm the network person, I just want everything to talk to everything, and if you can't reach something it's not my fault. The other is: I have some responsibility for security, I don't want things to all talk to each other, so how do I make sure things are isolated?

If we take a look at this, we have our transit gateway and a default route table; this is default functionality of transit gateway. We have a VPC, and when that VPC attaches, its routes automatically propagate to that route table, so 10.1 in this example. Then there's the VPC route table. The transit gateway route table supports 5,000 routes, but the VPC route table only supports a thousand. Five thousand is bigger than a thousand, which means we can't just dump all the routes into the VPC route table automatically. So you have to statically define which routes you want to send to the transit gateway. In this case we just said 10/8 is a static route to the transit gateway. Per VPC you'll need to configure that; we'll talk about an orchestration solution that solves that later. As you add more VPCs, more routes show up, and because they all have routes to each other, they can all talk to each other. So that's good, and this is the default behavior.

From a wording perspective, I use route domains and route tables interchangeably. Sometimes it's just easier to say route domains, because VPCs and transit gateways both have route tables, which can be a little confusing.

If we take a look at isolated, we have the same VPCs, but in this case we only have one route. We don't want them to talk to each other, only to VPN. So we'll create another route table for VPN, we'll attach VPN to that route table, and then we'll propagate the VPN routes to the VPC route table, so you have a 0.0.0.0/0 route
there. Then in the route table for the VPN you have the VPC routes propagated. You can notice here that even though we have four VPCs, they're attached to the same route table, and that route table's policy says the only thing you're allowed to do is go to VPN. That means these VPCs can't talk to each other, but they can get on-premises. So one route table, one policy, allows us to do that for many, many VPCs.

These slides all get posted on SlideShare, and it'll be on YouTube in a day or two, so don't worry about memorizing all my slides. I wouldn't recommend it.

If we take a look at how propagations work, it looks like this. When you attach a VPC to a route table, that basically defines where it's allowed to go; in this case it's allowed to go to 0.0.0.0/0. You propagate your routes to the people that should be able to reach you. In this case the VPCs want to be reached by VPN, so we propagate their routes to that route table. We want to make sure those two things match, that the routes we have for them are also propagated back, so that there's two-way communication.

All right, let's get into the reference architecture. This is the recommendation that we're going to build out through the rest of the presentation. We have basically four main VPCs; this is where we put our accounts with workloads: dev, test, prod, shared services. We'll talk about how shared VPC works, which enables you to have multiple accounts in the same VPC. You also have some other accounts hanging out there that don't have workloads in them: landing zones, administrative accounts, billing. We're going to connect all of those to a single transit gateway and then use the route tables to do segmentation. Then we're going to connect all of our VPN and Direct Connect to that transit gateway. And then we're going to start talking about some optional network services: how would we centralize outbound connectivity, how do we centralize
inbound connectivity, how do we do inline inspection between VPCs for things like intrusion prevention systems, those kinds of things.

All right, cool deal. The first thing I usually talk about with customers is the account strategy. How many VPCs do you have? How many do you want? How are you creating new VPCs? Where's that line drawn? For a lot of customers this comes down to two core decisions. Do you want large VPCs, so you have less infrastructure and you put all your instances in there? Like very large: tens of thousands of instances with hundreds of different development groups in a single VPC. You need to be good at IAM, you need to be good at resource tagging, and you need to figure out how to build that and use the tagging to do it. But we've had a lot of customers that have done that, because that's just the model they work best in.

Particularly when I've worked with more traditional enterprises, they're used to giving VLANs out to business groups. They're used to having a set of infrastructure per tenant: this is your VLAN, and you can do whatever you want inside that VLAN, but once you leave the VLAN boundary we have to do something. That's more like creating a VPC for them, so they feel more comfortable with the VPC as the segmentation boundary, where every group and every tenant gets their own VPC. At scale you run into architectural problems with having hundreds of VPCs, because we have limits on our side, but it does make the billing and the blast radius simpler to think about.

One of the things we can do now is combine these concepts with VPC sharing. VPC sharing allows you to have an infrastructure account where you have full control over the VPC. You set up the VPC, internet gateway, route tables, the policies, whether or not you want it connected to on-premises, all the infrastructure stuff, NAT gateways, that kind
of thing. Then you can take those subnets and turn them into resource shares. You can say that the public and the private subnets become the yellow share, and just the private stuff becomes the blue share, and we're going to share each of those out to different sets of accounts. Those accounts then only see the subnets. They don't see the networking gobbledygook, they don't know about internet gateways; they just say, hey, I want to deploy instances, and they deploy their instances. This is good if you're a network person, because you may not want the developers mucking around with network stuff. And if you're a developer that doesn't like networking, it's good because you don't have to go mess with network stuff.

If you take a look at what it looks like from the user's side, they will only see their instances and the subnets that have been shared with them. If you go and look at the other tenant, they would only see their own instances; blue would only see the private subnets and their instances. So they don't really know who else they're sharing the swimming pool with, and they still have full control over their data, their security groups, all those types of things.

So why not use this? It's good because you get better allocation of your resources. It's good because you get better separation of duties between your teams. You can actually use it for cost avoidance, because if you're within the same availability zone there are no peering charges or things like that. But some people still find separate VPCs to be simpler. There are fewer worries about blast radius. There are fewer worries that one of the tenants decides to spin up some crazy Kubernetes cluster that takes a thousand IP addresses, so now you have to manage allocation of those addresses. Or one of the tenants decides to do something dumb, like put an unpatched WordPress server in the VPC, and someone else in a different account opened up an overly permissive security group rule, so now that tenant
could now technically talk to another one, if you do other things. So there are some things to think about there. If you have lots of semi-trusted accounts that you want to put in the same infrastructure, it's a good idea. But if you don't have someone to manage all this, if no one actually wants to take ownership of creating the VPCs, you may as well go distributed.

There are a few caveats. You can't run Amazon FSx or the classic CloudHSM endpoints, which you shouldn't really do anyway; you should just use the new stuff, it's better. And as of last week we can now do Network Load Balancer, so the consumers can put Network Load Balancers in their VPCs. The shared VPC still has the same limits as a normal VPC, so you want to think about the limits as well. And you can't eject people from that VPC: if someone does something in that VPC you don't like, it's their subnet, they can do that. So you have to think about whether that makes sense for you organizationally. But otherwise it allows you to condense a lot of your infrastructure in one place, it saves IP addresses, and you can save on peering, those kinds of things. So if you're doing a lot of peering today, maybe you should take a look at VPC sharing instead, because you still get that account-level granularity without all the separate infrastructure. There's another session tomorrow and on Thursday; you can do a search for shared VPC in your app.

So let's talk about segmentation. Segmentation is interesting because it's one of those things that customers would like us to be a lot more prescriptive about, but when I talk to customers, these conversations are just wildly different. Some customers have two VPCs where the people in those VPCs literally hate each other; they would never, ever want to be in the same VPC. Some customers have two organizations that they put in separate VPCs because their org structure was different, like the web app and databases are in
different VPCs because they wanted to bill them differently. Does that really make sense? So you have to think a little bit about why you're creating these segmentation boundaries. You also have to think about the relationship between the security and networking teams: do you have a central networking team to manage these things, or is that embedded in each one of the service teams? And also governance and compliance: if your compliance officer just doesn't like the idea of shared VPCs, that might be a deal-breaker. So it's something you have to think about from your own perspective.

So we have these accounts; how can we secure them? There are a couple of options. The baseline option is IAM and security groups. We've had these forever, our customers have been really successful using them, they still exist, and we highly encourage you to use them. A reminder that no matter what you do, by default instances can't talk to anything until you whitelist things. So don't create overly permissive security groups; if you're worried about security, that's step one.

From there you can do other things. Within the VPC you have route tables. These control whether a subnet is public or private, whether it has access to transit gateway or not, or even which transit gateway it might talk to. At the VPC level you also have network ACLs. The network access lists allow you to do what I consider broad-stroke things: this subnet should never use this port, or this subnet should never talk to these groups of subnets. You can actually use them to do tenant isolation within a VPC, or to block a certain IP address or port. And then you've got the concept of just using separate VPCs, like we said. Now with transit gateway we can do hundreds and thousands of VPCs, so if you want to have lots of VPCs, it's much less of an issue than it was six months or a year ago.

One of the things I see with customers is that there's
this tenant and infrastructure shared security line. If you're the networking and security organization, you can't always go and force people to use the correct security groups, so sometimes you have to take on some of that responsibility at the transit gateway layer, or potentially at the firewall layer. Beyond that, you can do groups of policy at the transit gateway: the transit gateway route tables allow you to set up multi-VPC security policies and segmentation. From there you can also add additional services, so if you want to do egress filtering or firewall inspection, intrusion detection, those kinds of things, you can do that too. Plus there are things like GuardDuty, which allow you to do a cloud-based sort of IDS.

So let's do some examples. You guys still with me? Awake? Yeah? All right. I put these pineapples on here, and these ducks, because apparently someone, I don't know who it was, requested more ducks last year. So I put more ducks in to make sure people are still awake.

Let's take a look at this. This is one model we already talked about: it's just flat, let everything talk to everything. It sounds like it might be overly permissive, but if you're using security groups, that might be the way you're locking things down. This is the network team saying: I'm allowing connectivity everywhere; the way that you secure it is up to you. Definitely a legitimate model.

If we want to create something a little bit more custom, how do we do that? I like to think about these as connectivity matrices. Let's think about the types of things we have: in this case we have VPN, VPCs, and shared services. We'll create a matrix here and fill out what we want them to do. We don't want VPCs to talk to VPCs. We do want VPN and shared services access for VPCs. VPN should only talk to VPCs, and shared services should talk to VPCs. So if we figure this out, how many different policies do we
have? It's two. We have two different policies here, even though there are three things. This means when we go to configure this, we can think about the associations and propagations along this little table here. If we take a look at this in transit gateway, we'll bring up a little cheat sheet so we don't have to memorize the boxes. We'll create two route tables and attach things where they need to be. We're going to propagate the VPN and shared services into the VPC route table, because those are those two green checkmarks, and then the VPC routes go into the VPN route table. So that's how you take this concept of "I want this to talk to this but not to this": you fill out that matrix, turn it into route tables, and that turns into associations and propagations. That works pretty well.

But someone might say, hey, I can't manage the shared services from on-prem, and I would like to do that. Okay, let's go change the policy and see what that looks like. Now from VPN we want to access shared services, so we'll make a more permissive policy for those route tables. We still have two policies, so we'll create the same two route tables and do association and propagation. Here we've got those two route tables and we're just going to propagate different routes: we're going to propagate the shared services route and the VPN route into the route table associated with VPN and shared services. Now we can access shared services via VPN. We can't go between VPCs, but we can talk to shared services and we can talk to VPN. So it's a pretty sane policy. When I talk to customers, something like this is 80 or 90 percent of the way for most of them.

So my recommendations are security groups and IAM; let's not forget about those. Just because we have these networking policies doesn't mean we should depend upon them. The closer you can do the
security to the applications and application owners, the faster you'll be able to move and the more granular that policy will be. That's getting closer to cloud native, as opposed to your DMZ in the cloud. Shared VPCs are great if you can use them and can afford the looser tenant-to-tenant boundary; they make infrastructure a little bit easier. Separate VPCs: if you like simple, VPCs are simple, but you take on some infrastructure challenges. And for transit gateway, really try to think about grouping your sets of policies together, so that you don't have a separate policy for every single VPC. You really don't want to do that.

So they go: Nick, yeah, but that doesn't always happen, you can't always do that. Okay, well, let's see what happens. I remember there are developers in the room, so I'll try to talk nicely about them. The developers are innovating, and so they put up an API that the production services want to use. The old dev VPC is now the prod VPC, because they've done such a great job. But from a security perspective, we just did our cheat sheet, and you're not allowed to do that. Production can't access other VPCs; we already said that. Why'd you do that? So can we fix this problem now, because production needs this and this VPC is important?

Well, a couple of things. One thing you could do is go and have a conversation with them and say, don't do that. Or you could say: hey, you have a service you want to share with people? Let's put it in the shared services VPC; that's what we built that for. Maybe they'll move it over there. Maybe they wrote it in CloudFormation or Terraform and they can just relaunch it somewhere else. That'd be nice; not always the case, but that'd be good.

What other things could we do? Well, we could put a Network Load Balancer in front of the API and then use PrivateLink. PrivateLink makes that API look like it shows up in someone else's VPC, even if
they have the same address space we do. It's a nice feature of PrivateLink, so if you can use PrivateLink, that's great. Some things you want to think about with PrivateLink: it's on a per-app basis, application by application. If you have lots of things and you talk to lots of things, you might want to look at transit gateway. They both scale to the thousands, so scale is not really an issue for either one. The cost models are a little bit different; one involves the load balancing, the other one has a per... the cost usually isn't a big deal on this. So you may want to think about whether you can use PrivateLink for this. For me it usually comes down to the load balancing component: is it a pure client-to-server relationship?

Okay, so what are some other options? What if it's not an API? What if it's a database, which is not really good for load balancing and PrivateLink? Well, we have this feature that's been around for a long time called VPC peering, so you could just peer those two VPCs together. As long as that API or database doesn't need to be shared with more than 125 things, we can do that with peering without touching anything else.

At some point someone's like: hey, aren't we in a transit gateway session? Aren't you going to tell me how to do this with transit gateway? Can't I just share this with transit gateway? Yeah, you can, so let's take a look at that. We have to create a new matrix here. Now we have two new types of things: the database VPC, and the people that want to consume the database VPC. The database VPC really should only contact the people that want to consume it, and the consumers should only consume the database VPC. Otherwise the same policies still apply, and we'll need to fill out the rest of this matrix to disallow and allow the bi-directional communication. So now we're at four route tables. But before we had nine check boxes and now we have 25, so just by adding a single new policy
we've almost tripled the amount of configuration we've got to do and the amount of complexity we're managing in this matrix. If this happens a couple more times, this thing might be 100 check boxes. So our network administrator comes in and says "no bueno", which, I'll translate again, means "that's not so scalable, that could cause problems". You can do this, but you've got to be careful with it.

So, options. Ask them to move it. "Don't do that", the same thing I tell my four-year-old, and it's about as effective. And then you can use PrivateLink and VPC peering. Again, if you're doing a lot of VPC peering, a lot of these exceptions, take a look at shared VPCs, because maybe you should just bundle all these things into one VPC. In other words, build groups of these things.

Now we're talking a lot about managing these groups of things and the VPCs, so I'd like to talk a little bit about something we released two weeks ago called STNO. You can do a Google search for it if you like. It's a management and orchestration thing we built; you run a CloudFormation template in your own account to set it up. It gives you a management console. It will automate the attachments to your VPCs, and it will automate inserting routes into your VPCs so you don't have to manage static routes all over the place. It also gives you a place to audit and control workflows for attaching VPCs, as well as controlling who can talk to you, that kind of thing. And there are no servers to manage; it's all serverless, which is nice. It's all Lambda.

It has some defaults built in. It has these four route tables: flat, isolated, hybrid, and infrastructure. So roughly think about dev VPCs, production VPCs, then your hybrid infrastructure, and then maybe some shared infrastructure. This also aligns to
Control Tower and Landing Zone, so over time this will build into the tool that we would use with Control Tower and Landing Zone for those cases where you have hundreds of accounts that you need to connect automatically. In this scenario, just like the transit VPC (this is built by the same people who built transit VPC), if the spokes want to join the transit gateway network, all they have to do is tag things. You tag your subnets and tag your VPCs, and it all happens automatically.

Let's take a look at how that works. Up top we have our spoke VPC. They have put some tags on the subnets where they want transit gateway to attach. You do need to attach to a subnet in every availability zone; that is important. The tag here is just "attach to TGW"; you don't need to put anything in the value field. Then on the VPC you tag it with "propagate to" and "associate with", which define which route table it gets associated with and where it gets propagated to. So if you're the network administrator, you may have to tell your users which tags to use.

From there it goes through CloudWatch Events into EventBridge to a Lambda. It's going to write it to DynamoDB so we know what happened, and then it's going to kick off a state machine and go see what the transit gateway looks like, to make sure that all those things exist and everything is valid. From there it will automatically start configuring things using cross-account roles: it'll put the static route in that VPC and do the attachments. It's configured to work within your organization, so with Organizations you can just say anything within my organization is allowed to do this. So that's nice.

Well, what if you don't want all this automagic? There are some sensitive parts of your network. Okay, we have a different workflow that requires approval. When this inbound tagging thing happens and the
state machine goes to check the transit gateway, if there's a specific tag there and that tag says approval required: yes, so the route table's tagged, then it kicks off the approval workflow. The approval workflow will then say, hey, I'm going to send an email to the network admin, and they can log into the management console. The management console is going to read DynamoDB to find out what's going on, and if you approve, it then goes through the same workflow and creates the attachments and the route inside the VPC. So it's less manual work for administrators, it centralizes that, and it allows the spokes to just tag things and get network access. It works with Resource Access Manager, which makes this whole thing a little bit simpler to use. A lot of customers have been waiting for that for a while. If you want to try it out, here's the link, if you want to take a picture for a second, or search for AWS STNO. Cool. And these will be on SlideShare, like I said, in case you're not handy with your phone. So now, OK, let's connect this to on-premises. How do we do that? Well, we actually have a lot of options. We can connect over VPN or Direct Connect through a virtual private gateway; that only gives us 1.25 gigs per tunnel, so it's a little bit limited. We could connect in through Direct Connect, through the VGW or with Direct Connect gateway; that scales pretty well, up to 500 VPCs, and I'll show you how to do that. You can also create your own EC2-based VPN, so if you have a firewall or a router vendor you just really want to use, you can. Historically people used this for transit VPC to get around a lot of limitations that Transit Gateway solved. Usually the complexity here is more about the management than anything else. It's pretty capable, there are a lot of options, but it's the management that becomes sort of challenging. And then obviously the home-field favorite is Transit Gateway. So the ones we're going to review here, because
we're being a little selective about what we talk about, are Direct Connect and Transit Gateway, because they scale better. So Direct Connect works like this. We have a point of presence; I think there are like 50 to 75 of them around the world, something in that range. You get a router there, you get connectivity there, and you create a virtual interface to your VPCs. So basically you create a VLAN and that maps to a VPC. You can do that 50 times on a 1 gig or 10 gig port, on dedicated ports. Because this is an architecture session, we should do this the right way, and that means you should go get a second connection at a different Direct Connect location and connect in from there. This actually makes you eligible for three nines of SLA, as the bare minimum HA architecture for us. So that gets you to 50 VPCs, but I want more, give me more. So what happens if the VPCs are in different accounts or even different regions? If I want this physical port connected to lots of places, well, that's what Direct Connect gateway is for. A Direct Connect gateway allows you to connect to different regions and different accounts across the world. You can connect up to 10 different VPCs to a single Direct Connect gateway, so it gets that scaling function, and that's one private virtual interface that does that. You can create up to 50 private virtual interfaces, so 50 times 10 is 500. As of sometime mid this year, we now allow this multi-account, multi-region functionality. So one physical port essentially gets you about 500 VPCs, if you like clicking a lot or if you're good at automation. Another way to do this is with the transit virtual interface. You get one of these per physical port, and what ends up happening is you associate it with a Direct Connect gateway and you connect that to Transit Gateway. You can advertise up to 100 routes into AWS, and you can choose 20 static routes to advertise per
transit gateway, and you can connect up to three transit gateways per physical port. That's one of the reasons for the architecture we've chosen here, which, like I said, is one transit gateway for dev, test, prod, et cetera, because if you want to do this across the world you're sort of limited to choosing three transit gateways. But as of today, because we just announced transit gateway peering across regions, what you could do is start peering your transit gateways out here and use transitive routing to get there. So you could say, maybe this transit gateway on the right here is the one in Asia Pacific, and it would connect into Dublin and Frankfurt and France. So you can actually start getting a little bit more flexible in how you handle your transit gateways. You do incur some additional charges here, but you do get more scalability across Direct Connect. So that's pretty cool, because that limit of three transit gateways was pretty limiting for a lot of people, and so this is a big change for us. There are some other options. If you already have Direct Connect working and you don't have any problems with it, but maybe you want to add VPN as backup, that's great; you can continue using Direct Connect as is and just add VPN as backup. This would be if you want to add encryption or VPN backup to that. Also, whenever you send traffic from on-premises into the transit gateway there's that two-cent ingress charge, so if you're sending lots of traffic you may want to set up direct Direct Connect communication to those VPCs that have heavier traffic, to reduce data transfer charges. The other option is you can also use a public virtual interface. So you create a transit gateway, it's obviously in the AWS cloud. From Direct Connect you set up a public virtual interface. That public virtual interface is going to advertise all of our routes to you, including our VPN endpoints. So then you create a VPN to the transit gateway and
that can ride over Direct Connect, so you just choose Direct Connect as the path for VPN. That's good for a couple of things. One is it's super scalable, because you can VPN all over the place, but it also gives you encryption over Direct Connect. So if you're regulated or care about the security and encryption of your circuits, this is the recommended pattern for encrypting your circuits. And if you want more than three transit gateways, it also does that. Cool. So let's talk about VPN a little bit. VPN is basically the same as it's always been. The main difference is now we support equal-cost multipath, so each VPN tunnel is still 1.25 gigabits per second, but if you want more bandwidth, add more tunnels. The only major caveat to think about is to make sure your on-premises gear supports that. For example, routers are generally pretty good at it, but some firewalls have maximums of like three or four tunnels, or they don't like traffic coming in and out of different tunnels, because firewalls just don't like asymmetric routing very much. So you may want to inspect your on-premises gear a little bit. Also new today is accelerated VPN, so it's another option. What this allows you to do is use Amazon's global network to connect to the closest place to your branches, or whatever wants to come in. AWS Global Accelerator basically uses anycast, so we put the same addresses around the world and you can connect in through VPN. So you create one VPN policy and give it out to all your branches, and they will automatically find the closest part of our backbone, use that as an entry point into our network, and then they can come in and connect to the transit gateways. So it's better for latency, a little bit more reliable, and you get to leverage this giant network that Amazon has for your, maybe, SD-WAN sort of connectivity. So let's talk about network services a little bit. We've built this architecture
out, we have some accounts, we have connectivity to on-premises, but maybe someone wants to do outbound filtering for PCI compliance, or someone wants to bring in a centralized ingress load balancer because they want more advanced WAF functionality than we really provide, or something like that. The way you would do this is by attaching it to the transit gateway. These next few slides are pretty important, otherwise a lot of slides won't make sense later, so I'm going to try to focus on the mechanics of this. We have two methods to connect these network services to transit gateway. The first is this interface-based attachment. From the transit gateway perspective it looks like a normal VPC. What we're going to do here is assume we want to operate across three Availability Zones. In this model we create sort of a private and a public subnet, and we put the instances in the public subnet. Public is just a word I'm using; it's not actually public in this case yet. And let's just say we want to do Internet egress. We'll need two different route tables for this, because the VPCs and our services VPC will need different policies. In the VPC route table we're going to create a static route, 0/0 to the transit gateway: all Internet traffic, transit gateway. OK, which means we're going to need an Internet gateway on our services VPC now. So what we're going to do is attach those private subnets to the transit gateway, and this is specifically because each one of these can now have their own route table. We will propagate that 0/0 route over, or actually add it as a static route, to the outbound VPC here. So traffic comes in, that route table says go to our services VPC, and traffic's now going to come out of those network interfaces there on the right. From there we're going to create these dedicated subnets that have a route in each one of those subnets that points to the network interface local to that Availability
Zone. So traffic comes from the transit gateway and goes, I'm going to go to the Internet, how do I get there? I'm going to look at my VPC route table. The VPC route table says I should go to the ENI of this instance. As it's a static route, there's no HA there. From there it's going to go to that instance. If it's a firewall it's going to do firewall stuff, if it's a proxy it's going to do proxy stuff, whatever it does. The next thing it's going to do is say, how do I get out to the Internet? So it's going to take a look at its outbound route table, and we have a 0/0 route to the Internet gateway. So whatever we put in this public subnet is where the traffic goes after it goes through the instances. In this case we want the traffic to go to the Internet, but we could also in theory point this route back to the transit gateway, and then it would take a U-turn and go back to the transit gateway. In this case we're just going to the Internet, so it's going out that way. So the route tables are defining how this transit gateway sort of works. OK, so the public subnet, the second subnet on the right, defines the egress behavior. Because we're going out to the Internet we do need to apply source NAT, because whatever our VPCs were, this services VPC doesn't know where the traffic came from. The private and public mappings of IP addresses don't carry over across VPCs, so we need to make sure that if we go out to the Internet we have a local address. So we do a source NAT here. In theory we could use a NAT gateway, and we'll show you that example. You also need the return route. When traffic goes out on the Internet and comes back, we undo the NAT, and it says, hey, I need to go to 10.something, how do I get there? Well, we have a return route in that public subnet that says go back to transit gateway. It goes back to transit gateway, and that route table says 10.1 is one of my attachments, and it goes there. So we've created sort of this hairpin through another VPC,
all using route tables. We could replace these with NAT gateway, and actually NAT gateway is a good candidate for this, because NAT gateway has high availability built in, because it's built on Hyperplane. This is a pretty common pattern; this is probably the most common pattern I see with customers. As opposed to putting NAT gateways in every one of their VPCs, they can now put them in one VPC, and it's just less management. This also means you can pretty safely put the 0/0 route in all of your VPC route tables, and then transit gateway can figure out from there if it needs to go on-premises or needs to go to the Internet; it's all sort of figured out. So what design decisions have we just made when we did this? A couple of things. One is it's all native performance; there's no tunneling, so there's no overhead there, and we can do up to 8,500 MTU. But we've also taken on a couple of other things here. One is we can only create one attachment per AZ, and that attachment is related to the subnet route table, so we can really only get one route per Availability Zone, so it's not horizontally scalable. Plus, on the transit gateway we try to keep traffic within the same Availability Zone if possible, so we may not get equal balancing across those three instances. If most of my instances are in AZ A, most of the traffic is going to go to the instance in AZ A. So also not horizontally scalable. But the bigger thing, the thing I don't like, is that you've got three static routes pointing to those network interfaces. If one of those firewalls has a bad day, well, you're just going to black hole some portion of your traffic until that gets fixed. There are a lot of solutions that use Lambda or some other things to do monitoring. That's OK, it works, people have been doing that for a long time, but it can create some challenges. Those things aren't super reliable; you might find that it's a 10-second or 15-second failover, or longer, just depending. So test that stuff out. There are no
guarantees around how fast that failover happens. For me, when I test it, it's usually pretty fast, but it's something you want to keep in mind. So this is the higher, native-performance option, but you take some HA and reliability work on. The next option I like more, I'll be honest about it. So with the transit gateway you have the same services VPC, but this time we only need to create three subnets. We put our instances where we want, we're going to create the same two route tables, we're going to put the same static route into our VPC, and we're going to put an Internet gateway on the VPC. Same stuff so far. The difference is now we're going to create a VPN from these instances into the transit gateway, so it looks almost like an on-premises router. It's going to use public addresses to do this as well. The benefit is we can now advertise the 0/0 route from all these instances and then propagate that into the VPC route table. So now we're not limited to three instances anymore; we can do up to 50. So we get 1.25 gigs per tunnel, multiply that by 50: pretty healthy amount of bandwidth. The other good thing is we're using BGP to do this, so there are health checks associated with BGP, there are health checks on VPN, so we don't need to make any route table changes if something fails. Hopefully that's detected by those two protocols that we've been using for a long time. You'll still have the return route to get back to the VPCs in the services route table; that will be advertised to the instances over BGP, so that's how they get the return route. And then we'll put the 0/0 route to the Internet gateway, because that's the only thing it has access to. We still apply the source NAT out to the Internet; the same rule applies, that we need a local IP address when we go to the Internet gateway. So what do these design notes look like? Well, whatever instance that is needs to support VPN, support NAT, support a couple of
different things, so there's a little bit higher bar there. There is IPsec overhead. But what we've gained for this: one, it's horizontally scalable, so we can do up to 50 of these instances if we so choose, and the high availability is built into those protocols. Dead peer detection and the BGP keepalives are going to keep this thing moving, and I don't have to worry about Lambda and other sorts of workarounds. So this is the horizontally scalable option with HA built in. And you don't have to completely soak all this in right now, because most of the partners that we work with have turned these things into repeatable patterns, CloudFormation templates, those kinds of things. But the core mechanics, that's what this stuff revolves around. Does that make sense? Yeah, mostly cautious nods here. OK, so now we're going to take those two patterns and apply them to some actual use cases. In this case we're going to do an outbound services use case. This would be for NAT gateways, outbound proxies, these kinds of things. If your instances only talk to seven URLs and you want to do some URL filtering, great pattern. In this case we do NAT gateways, we do the interface-based method, and (I just watched this movie and I liked it so much) there's a gremlin hiding here that a lot of people don't see. What happens if 10.1 pings 10.2? Is that allowed? Well, it has a route to go to the Internet, but not a 10 route. So what happens is the packet goes to the NAT gateway, and because the NAT gateway has the return route to get to 10.2, it actually never goes to the Internet. It goes to the NAT gateway, takes a U-turn, and comes back. We probably don't want that in a lot of cases, so we can add a black hole route to the VPC route table for these instances. That way we block that traffic as soon as it comes into the transit gateway. OK, we can do something very similar with VPN. So it's the same VPN attachment method
that comes in, goes out to the outbound VPC, and goes to the Internet. The same gremlins apply, so you probably want to put in the black hole route. But in some models, maybe that's a firewall on the edge that you're using here, so maybe the firewall will just block that anyway; you may want to just handle that as part of your normal firewall policy. So this could in theory also be your east-west pattern if you want. OK, so what happens if you want to centralize your ingress? You can do this for WAF, or if you want to do third-party load balancing, or inspection in some cases. We also, not pictured here, have a new feature called ingress routing that allows you to do this by putting a route table on the Internet gateway or the virtual private gateway of your VPC. It's pretty neat. It doesn't really apply to Transit Gateway directly, so that's why it's not in here, but it's worth calling out; go check it out. So in this case we want to centralize our ingress. We put in a load balancer, optionally, or use Route 53 or IP addresses, whatever we want to do. Traffic comes into the central VPC. What we're going to do is apply source NAT when it comes in, and we're going to advertise the /32 routes that we're source NATing into the transit gateway, and then we're going to put that in the transit gateway route table so the VPCs have the return route to get to, call it, your WAF. This means when traffic comes in through the WAF, it gets NATed, it goes to the transit gateway, goes to the instance, and then the instance has a return route to know how to get back to that WAF, and it goes back out to the Internet from there. So you can centralize your Internet ingress; pretty cool use case there. Another pattern we see here is customers that want to put third-party devices or transit VPCs or firewalls in front of Transit Gateway. So maybe your networking team wants to have a familiar device in front of AWS, to treat it like their own
data center. The way this works is you might have your edge VPC, and let's just say you do the VPN integration method there. Then from on-premises you create a VPN to those devices, and that gets the traffic into AWS. You do your firewall or your SD-WAN or whatever you might want to do, and then from there it goes into the transit gateway to all your different VPCs, and VPC-to-VPC traffic is still handled through the transit gateway. You can also do this over Direct Connect if you wanted to, so you can connect into those instances privately. The use cases here are really about things like encryption over Direct Connect, so you can have some routers or firewalls there. Or if you're using a non-dedicated, hosted Direct Connect, where you really only get one VIF, this allows you to create a private VIF to that edge VPC, and that edge VPC is then your ingress point into the rest of AWS. Or if you want to do SD-WAN: if you have a proprietary encryption protocol or an SD-WAN protocol you want to be everywhere, this allows you to bring that and front it into AWS without overloading that device with all the VPC-to-VPC traffic. If you have a transit VPC where you have a lot of spokes connected, you can take those spokes, put them on transit gateway, and then keep your router layer on the outside. So it's pretty useful, and I see a lot of customers doing this. Or as a firewall: if you treat AWS as an untrusted data center and you want to put a firewall in between everything and AWS, this is a really common pattern too. Cool. I'm going to show you how to do a firewall in between all your VPCs, but I have to do a little bit of philosophy first. I firmly believe that a lot of developers came to AWS because the DMZ inside your data center is so effective at getting nothing done. I mean, we're talking about three, four, five weeks to open up a firewall rule, and that's
because you call them up and you go, hey, firewall guy, I need you to open up a port. No, sorry, I'm on this call; some developer thinks the network's the problem, and I'm on a sev 2 because I'm trying to prove the firewall's not the problem again. Sometimes it is, a lot of times it is, but either way it's not a priority for him; it's just sitting in some service queue somewhere. Meanwhile the developer's like, OK, well, I guess that task isn't getting done this sprint. Thanks, firewall people. So that was a bit of a rant, obviously, but I think there's a time and place to put firewalls in, maybe especially for compliance, or if it helps you move faster. I think a lot of customers actually move faster once they bring firewalls into AWS. They see the cost and the complexity and the value versus the risks that they're solving, and it gives them more data to say yes or no, we should or should not be doing this. And that's probably faster than convincing the security Jedi Council that firewalls are a bad idea. Those people don't change their mind very often, and you're not really ever ready for them. So you can bring them in, try them out, but you should take a look at alternatives, and please, if you can, automate this stuff and make this a little more streamlined. GuardDuty is great; there are other options like agents, that kind of thing. So before we move our entire DMZ stack to AWS, let's at least think about it, that's what I'm saying. All right, so here's how you move your entire DMZ stack to AWS. What you can do here is you would still have maybe a 0/0 route or a 10/8 route in your VPCs. So if you want traffic from 10.1 to 10.2 to go into the firewall, we create a 10/8 route and have that route point to the VPNs. On the right here, those instances would advertise 10/8 or 0/0 to the transit gateway, and you would propagate that into the VPC route tables. So packets come from 10.1, they hit a route table, and it says
to get to 10.2 you have to go to the services VPC. It goes there via the VPN, firewall stuff happens, the return route points back to transit gateway, and then the route table there points back up to the VPC. So it does the same thing on the way back, but importantly, it may not select the same VPN tunnel to go back. So you do need to apply source NAT here on the firewalls. If there's more than one path, then you have to apply source NAT on the first path coming in. So what that means is apply source NAT, or, what I've seen some customers do in smaller environments, run active/standby firewalls, where they'll use BGP AS path prepending or other ways to make sure there's only one active firewall. You lose the horizontal scalability, but you're relieved from having to use NAT. So that's one option there. Cool. You also need to advertise all these /32 routes that you're source NATing; that way you have a return route to get back to the firewall, if you're going to choose the source NAT approach. OK, so like I said, hopefully these things aren't something you have to implement yourself. If you're using a lot of commercial off-the-shelf options, we've been working with them over the last year to make sure that we have integrations, CloudFormation templates, and reference architectures to implement these things. Each vendor puts their own flair on it, so some use interfaces, some use VPN, those kinds of things. These are the folks we were working with, and a lot of these also worked with us on the Network Manager launch, so they have integrations in Network Manager too. Cool. One of the things I've also seen as part of this services concept is centralized PrivateLink. I think we've got something like 41 PrivateLink services now for AWS, and rather than putting each one, all 40, into hundreds of VPCs, customers would like to put them in a single VPC. You can do that. You can
put that in the shared services VPC. You create a Route 53 private hosted zone and then share that out with your other VPCs, and there are a couple of different ways to do that. I've put NET321 down there at the bottom for anyone who wants to really go deep on PrivateLink and doing secure connectivity. And then let's talk about NAT for a second here. NAT allows you to do something interesting. So how many of you know how NAT gateway works? OK, hesitant hands, I like this. There's something about NAT gateway a lot of people don't understand, and I have to explain this before I show you the next thing. So NAT gateway: we'll just say that we've got an instance, 10.1.13.13, there's a NAT gateway that has an EIP, an Elastic IP address, of 54.1.1.100, and then there's an Internet gateway. Whenever you create the NAT gateway, it checks to make sure the Internet gateway's there, because that's how we expect customers to use it. Hint: we're not going to use it like that here. So when the packet goes from the EC2 instance to the NAT gateway, what address is going to be the new source address? It was 10.1.13.13; after it goes through the NAT gateway, what's it going to be? It's going to be 10.1.13.100, because the NAT gateway is very simple, it's actually a very stupid thing. It just takes packets, puts its own private source address on them, and sends them wherever the route table tells it to. In this case it tells it to send it to the Internet gateway. Now, we have a different fleet whose sole responsibility is to turn private addresses into public addresses. So at this point that fleet will see this packet and say, I know the public address for 10.1.13.100, it's 54.1.1.100. And that fleet exists at the Internet edge. So if we go back and delete the Internet gateway, and let's say we send this to a virtual private gateway instead, and we send the packet there: now the NAT gateway does its NAT, looks up where to send it next, sends it to the virtual private gateway, and it's
going to have a private address of 10.1.13.100. The EIP never gets applied. OK, so now we can do some cool things. I was working with a customer that didn't have enough IP addresses in their system, so they were giving developers /28s, and the developers were like, I want to run Kubernetes and Lambda and all this cool stuff, and I can't do that with ten addresses. And they were like, well, sorry, we're just poor on addresses, so just get creative. So here's one of the things you can do. Let's just say, for the purpose of this argument, that 10.1 is our routable space; it goes on to our on-premises network and we can use it. We can add a secondary CIDR range, so in this case we can add 100.64. It's special space that we can add to any VPC. So we can add a second one, and we can give them something crazy like a /16. From here, we have a normal transit gateway doing normal transit gateway stuff. It doesn't really matter what that route table is, just ignore that. We can create a second transit gateway, create an attachment to it, and only put it in the non-routable subnets. From there we create a 0/0 route that points to this other transit gateway. Then we can connect and create a NAT gateway fleet. It looks a lot like our egress example, but in this case the egress route points back to the transit gateway. So packets come into the transit gateway and do a U-turn at the NAT gateways, and so we've now turned our 100.64 addresses into the 10.1 space. And then we can create a VPN into this new transit gateway, and anything that's in that 100.64 playground space, if it needs to authenticate to on-premises or hit an API or hit a database on-prem, now has that one-way access from your play space through NAT gateway and on to on-premises. So it's kind of cool, and I've seen a lot of customers do that if they don't have enough addresses. Cool. And you might want to filter that space, because you'll probably get those
through the other transit gateways; you can just filter the 100.64 stuff when it gets on-prem. So I'd also like to announce that we have something else we can do: we can do multicast in the cloud. Man, you guys are really underwhelmed by that. So yeah, we can do multicast in AWS now. The way this works is one instance sends a packet and that turns into multiple packets. It's magic. So if you have applications that have multicast dependencies, financial services, media and entertainment, those kinds of things, we can now do multicast. Or maybe you want to run some application that has a clustering algorithm, something like that; you can do that. We do have, basically, multicast domains on the transit gateway, so you can control who can and cannot subscribe to certain groups, to make it more restrictive. So it's pretty neat. I think there's a blog post out on it this morning, so check that out. So let's talk about multiple regions. You can connect multiple regions by using Direct Connect gateway; we covered that already, so that's from on-premises into AWS. But then how do you connect VPC to VPC? The new announcement: we can now peer transit gateways across regions. So you can use the peering; it's just a new attachment type. It's static only, and you can't do it within the region. OK, so if you had multi-region connectivity, you could set something like this up, where you've got multiple transit gateways all peered with each other. So you can do a full mesh and use our backbone everywhere. We encrypt all that traffic. Pretty great. And then you can combine this with the accelerated VPN, so now you can create an accelerated VPN that comes into this global network and connect from anywhere to any part of the rest of your internal network. One of the things you do want to do is create a different ASN for each of your transit gateways, because today it's all static, but in the future we'd like to do dynamic routing, and you'll need different ASNs for that. So when you create transit gateways, you
know, make sure you give them unique ASNs. The standard stuff, for example, does that for you automatically, but you have to delete the whole thing to create a new ASN, so listen, please, that's the one thing you need to do. OK, we also announced global traffic manager, or sorry, global Network Manager for Transit Gateway. So now you can view your transit gateways, the attachments, and where they're going on a map, and yeah, it's pretty cool. You can check that out too. The cost: there are a couple of things you want to think about. Well, not worry, but think about. There's an attachment cost, like I talked about with the Hyperplane stuff; I think in Virginia it's $36.50 per attachment. And then there's also a two-cent data processing charge for the sender. So that's what it costs. Some of the things I see that drive design decisions: VPC peering is one cent in, one cent out, so it's roughly equivalent. And then for the ingress data into AWS, if you're doing a lot of data transfer you may want to use VPN or Direct Connect to bypass Transit Gateway to reduce data transfer costs. And for those cases where you're doing the VPN integration into transit gateway, that's not a public thing; it stays on our network, and it's also charged at that inter-AZ rate. Even though you're using public addresses, it doesn't go over the Internet. That's a common question. If you're incurring a lot of peering cost, VPC sharing is probably a good idea to look at. So let's wrap this whole thing up. Some of the takeaways: we have a lot of options, and I hope that we've given you a pretty good way to make all this make sense for you. You can layer these things to get additional security. In terms of personal advice, from what I've seen with customers: I used to be in the hardware world, and I'm used to network engineers building out seven-year roadmaps for their network, because that's how long you have to keep the
hardware for you're not allowed to go back and ask for more hardware once it's out so you have to predict how much bandwidth you're gonna need in seven years which is an impossible exercise so everyone just quadruples their estimates I haven't seen a network architecture diagram that lasts through AWS's innovation for more than like eighteen months and sometimes even twelve months like I was hoping I didn't have to make a whole bunch of new slides this year I had to redo the whole stupid deck because we came out with features I actually did this session on time last time but then we released more features so you know try to keep your architectures and your vision for like the next 12 to 18 months don't build like three year sort of architectures I get like a lot of Visio diagrams like people think it will work and I go do you actually have this running no no no no no no okay let's test this before you actually try this so start simple if you can if you have complex hairy problems try to sort of contain them you're like do I need inspection for everything or just that one app like we can use route tables to contain that and more than anything else just go out there and test it you can get all this stuff off marketplace hope you guys all have a nice dinner tonight but you can test a lot of this stuff for less than the cost of dinner so you can go on marketplace do your bake-offs try it out see if it works for you and that's the best way to find out what's going to work so there's a training slide of some sort and thank you thanks very much for coming really appreciate it
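As a rough illustration of the cost comparison quoted in the talk, here is a minimal Python sketch. It uses only the speaker's approximate figures ($36.50 per attachment in Virginia, a two cent data processing charge for the sender, and one cent in plus one cent out for VPC peering). Treating the attachment figure as a monthly charge and the per-unit charges as per gigabyte is an assumption made here, and the function and constant names are hypothetical. Check current AWS pricing before relying on any of these numbers.

```python
# Approximate figures as quoted in the talk (Virginia, 2019).
# Assumptions made for this sketch: the attachment figure is per month
# and the per-unit charges are per GB.
TGW_ATTACHMENT_PER_MONTH = 36.50
TGW_PROCESSING_PER_GB = 0.02    # charged to the sender
PEERING_PER_GB_EACH_WAY = 0.01  # one cent out plus one cent in


def transit_gateway_monthly_cost(attachments: int, gb_sent: float) -> float:
    """Attachment charges plus data processing for traffic sent through the gateway."""
    return attachments * TGW_ATTACHMENT_PER_MONTH + gb_sent * TGW_PROCESSING_PER_GB


def vpc_peering_monthly_cost(gb_transferred: float) -> float:
    """No attachment charge; each GB is billed once on the way out and once on the way in."""
    return gb_transferred * PEERING_PER_GB_EACH_WAY * 2


if __name__ == "__main__":
    # Hypothetical workload: two VPCs exchanging 1000 GB in a month.
    print(transit_gateway_monthly_cost(attachments=2, gb_sent=1000))
    print(vpc_peering_monthly_cost(gb_transferred=1000))
```

With these figures the per-GB data charge comes out the same either way, which matches the "roughly equivalent" remark in the talk; the difference in this sketch comes from the fixed attachment charge.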
Info
Channel: AWS Events
Views: 33,385
Rating: 4.8684931 out of 5
Keywords: re:Invent 2019, Amazon, AWS re:Invent, NET406-R1, Networking & Content Delivery, AWS Direct Connect, Amazon VPC
Id: 9Nikqn_02Oc
Length: 66min 19sec (3979 seconds)
Published: Wed Dec 04 2019