Advanced Architectures with AWS Transit Gateway

Captions
good morning, my name is Alan Hockney, I'm a senior manager of solutions architecture at AWS. Welcome to this session on advanced architectures with AWS Transit Gateway. The session will provide you with the fundamental architectural and functional knowledge that you'll need to build advanced architectures with the AWS Transit Gateway — and I'll just call it TGW for convenience. I won't be spending time in the console or showing you click-by-click configuration walkthroughs; rather, this session is designed to share key concepts that allow you to leverage the robust capabilities of the TGW. The session will start at a modest pace, covering the basic components and operation of the TGW; however, it will accelerate quickly into advanced use cases and assumes an intermediate to advanced level of IP networking knowledge. A final note before we get going: while you can implement the architectures shown, you should consult with a Solutions Architect to make sure each is the most appropriate solution for your given need.

So let's get right into it. Customers tell us they want robust networking capabilities that allow simple, seamless, and global connectivity between their AWS cloud environments and their on-premises environments. But we also heard from customers that they were running into challenges: they wanted an easier way to manage point-to-point connectivity at scale, they wanted more bandwidth for their VPCs, and they wanted a centralized place to manage routing within, into, and out of the AWS cloud.

Let's take a look at AWS networking before the introduction of the Transit Gateway. A customer would build a VPC, and then another, and they would want to connect those two together — and we can do that, we have VPC peering. But what happens when I add the next two VPCs? Well, I can peer those, but as you know, VPCs are not a transitive point for packet routing, so for all of these VPCs to talk to one another they need to be fully meshed. VPC peering is OK for 4 or 5 or 6 or 7 VPCs, but what happens at scale? Here's the formula for calculating the number of peering connections required for a full-mesh VPC infrastructure: n(n−1)/2. Let's say that I have 10 VPCs — do the maths, that's 45 peering connections, so at least 45 API calls for peering, plus API calls into each of the VPCs to update route tables to point to the peers. All right, you can automate that fairly simply, but what happens when you have 100 VPCs? That's 4,950 peerings, which starts to become complex to manage. Moreover, VPC route tables support up to a thousand static routes and up to 125 peers, so a full mesh of 100 VPCs will create a challenge.

But customers told us we could also help them in other places. So this is the diagram that we had — but what about edge connectivity, when you want to connect your on-premises infrastructure? When using AWS Site-to-Site VPN, you create a customer gateway, then define a VPN connection between that customer gateway and a virtual private gateway that's attached to your VPC, and then you need to go through that process for each VPC that you want to connect. Similarly, with AWS Direct Connect you create a private virtual interface, or VIF, for each of the VPC virtual private gateways that you want to connect to. We simplified this with AWS Direct Connect a while ago by creating something called the Direct Connect gateway, which allows you to connect multiple private VIFs and virtual private gateways together with a single global network object.

With this picture as background, we set off to provide our customers with something simple to use that scales to thousands of networks. Enter AWS Transit Gateway: a network transit hub that you can use to interconnect your VPCs and on-premises networks. Now when you want to connect those VPCs, you simply attach them to the TGW. If you need access over VPN, you can attach that AWS Site-to-Site VPN to the TGW. And newly announced in April, you can now attach Direct Connect gateways to TGWs created in U.S. regions, excluding GovCloud.
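The full-mesh arithmetic above is easy to check with a short sketch (illustrative, not from the talk's materials):

```python
def full_mesh_peerings(n: int) -> int:
    """Number of VPC peering connections needed to fully mesh n VPCs."""
    return n * (n - 1) // 2

# 10 VPCs need 45 peerings; 100 VPCs need 4,950 — and every new VPC
# adds n-1 more peerings plus route-table updates in every existing VPC.
print(full_mesh_peerings(10))   # 45
print(full_mesh_peerings(100))  # 4950
```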
Let's run down the anatomy of a TGW. Resource attachments are the logical connections made to the TGW; they are both a source of and a destination for packets. Today the TGW supports three attachment types: Amazon VPC, AWS Site-to-Site VPN, and AWS Direct Connect gateway. Many of you connect AWS Client VPN to the TGW today using a VPC, and our roadmap includes the ability to connect AWS Client VPN directly to the TGW.

A moment ago I mentioned that we recently launched Direct Connect gateway support for the TGW, so I just want to take a brief interlude here to explain how that works. First, we introduced a new VIF type: a transit VIF. So now you can have 51 virtual interfaces on your Direct Connect connection — one transit VIF, and fifty VIFs that are a mix of public VIFs, which provide access to AWS public services using public IPs, and private VIFs, which provide access to Amazon VPCs using private IP addresses.

Let's see how a typical configuration with Direct Connect gateway and virtual private gateways works versus the integration with Transit Gateway. I have my on-premises environment and I have a Direct Connect location; I provision a connection between the two of them and I create a private virtual interface. That private virtual interface is subsequently attached to a Direct Connect gateway, and then, for the VPCs that I want to have access to my on-premises environment, I associate the virtual private gateways of those VPCs with the Direct Connect gateway.

Now let's see how this works with a Transit Gateway. Let's create a Transit Gateway, provision our single transit VIF on that connection, and attach that transit VIF to the Direct Connect gateway. As shown, this doesn't work — let me explain why. A Direct Connect gateway will support either private virtual interfaces and virtual private gateways, or transit virtual interfaces and Transit Gateways; you can't mix the two of them on a single Direct Connect gateway. So let's create a new Direct Connect gateway and attach the transit VIF to it. In the Direct Connect console, or using the APIs, you can attach the Transit Gateway to the Direct Connect gateway, identify routes to originate, and you can also then attach the Transit Gateway to the VPCs.

Some other things to note about the Direct Connect gateway integration: jumbo frames over the transit VIF are 8,500 bytes for the MTU; a Direct Connect gateway can support up to three Transit Gateways; conversely, a Transit Gateway can support up to 20 Direct Connect gateways; and for each Transit Gateway you can originate up to 20 routes towards the on-premises environment. A last note I should make about Direct Connect in general — whether you're using the transit VIF, a private VIF, or even a public VIF for accessing public services like S3 and DynamoDB — think about redundancy. You should certainly be provisioning a second connection to another AWS Direct Connect location to provide redundancy for your infrastructure.

OK, back to the TGW anatomy lesson. We talked about our attachment types; now let's talk about TGW route tables. Route tables in the TGW allow you to create routing domains, similar to virtual routing and forwarding — VRFs — in the typical networking world. In a typical configuration, when you create your Transit Gateway you also get a default TGW route table. Route table associations are used to make next-hop routing decisions for packets received from an attachment by the TGW; each attachment can have a single route table association, and oftentimes this association is with the default TGW route table. Route propagation allows you to define which of the TGW route tables will learn routes from the resource attachments; you can propagate to zero, one, or many TGW route tables.

So here's the route table for our diagram. I've added a static default route at the top — we'll talk a little bit more about that later.
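The association and propagation mechanics just described can be sketched as a toy model — the class and attachment names below are invented for illustration, not an AWS API:

```python
from ipaddress import ip_address, ip_network

class TgwRouteTable:
    """Toy TGW route table: propagations decide which routes a table learns;
    the association map below decides which table a packet's lookup uses."""
    def __init__(self):
        self.routes = {}  # network -> next-hop attachment id

    def propagate(self, attachment_id, cidrs):
        for cidr in cidrs:
            self.routes[ip_network(cidr)] = attachment_id

    def lookup(self, dst):
        dst = ip_address(dst)
        matches = [(net, att) for net, att in self.routes.items() if dst in net]
        if not matches:
            return None
        return max(matches, key=lambda m: m[0].prefixlen)[1]

default_table = TgwRouteTable()
# Each attachment has exactly one route table association.
associations = {"vpc-a-attach": default_table, "vpn-attach": default_table}

default_table.propagate("vpc-a-attach", ["10.1.0.0/16"])
default_table.propagate("vpn-attach", ["172.16.0.0/16"])

# A packet arriving from the VPN attachment is looked up in the table
# that attachment is associated with:
print(associations["vpn-attach"].lookup("10.1.0.10"))  # vpc-a-attach
```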
I've also added propagated routes for 10.1, 10.2, and 10.3, propagated via their respective attachments from the VPC, the Site-to-Site VPN, and the Direct Connect gateway. You can specify whether to automatically associate with and propagate to the TGW's default route table at the time that you create your TGW. These features enable you to exert as much or as little control over attachments and TGW routing as is appropriate for your particular environment.

So now we have a complete TGW configuration, but we still have work to do on the edges. We've defined the route table in the TGW, but the edge resources still need to understand how to get to the TGW from a routing perspective. Let's talk about how routes are learned, both by the TGW and by the attachments. In the case of a VPC, you have two options: the VPC will automatically propagate to the route tables that you specify in your TGW, or you can statically define a route entry in the appropriate TGW route table. In the case of AWS Site-to-Site VPN, we offer two configuration modes: in static mode, you'll need to specify a static configuration in the appropriate TGW route tables; if you're using the dynamic mode of AWS Site-to-Site VPN, routes are propagated over BGP, and you can select which route tables receive that propagation — and in addition, you can still input a static route entry. In the case of Direct Connect it's the same: you can statically define a route, or you can take the BGP-propagated routes from the Direct Connect gateway and specify the route tables in the Transit Gateway that will receive them.

So that's how the Transit Gateway route table gets its routing information. Now I want to talk about how the edges — these attachments — get routes to the TGW. In the case of the static AWS Site-to-Site VPN configuration, your customer gateway will need to have a static route that points towards the TGW; in the case of the dynamic configuration, BGP will propagate the routes in the associated route table to the customer gateway. In the case of the Direct Connect gateway, at the time that you associate your Transit Gateway you are allowed to provide up to 20 routes to originate towards your on-premises environment. These routes do not have to have anything to do with the contents of the route table — you are able to statically define the routing prefixes that will be originated and then transmitted by the Direct Connect gateway to your on-premises equipment over BGP. In the case of a VPC, you'll need to statically define the path to the TGW in the VPC route table.

To give you a sense of what that looks like: I have a VPC on the left and on the right, and a Transit Gateway in the middle that's receiving propagated routes from those VPC attachments. In order for the VPCs on the left and right to communicate, a static entry is entered in both of them pointing to the Transit Gateway; the Transit Gateway has the next-hop information to forward the packet.

This is what a default configuration might look like: all of the attachments are associated with a single TGW route table, and all the attachments are propagating to that same default route table. You'll note that although I have four attachments, I only have three entries in the route table — the on-premises environment is using Direct Connect as the primary connectivity point and a dynamic AWS Site-to-Site VPN configuration as a backup. In this case, the Direct Connect gateway route advertisement is what is installed in the route table. Which brings us to the question of how we select the best path. Here's the path-selection behavior. First, longest prefix match. Second, static route entries — including Site-to-Site VPN if you're using the static configuration; that static route is preferred over dynamic routes, and this distinction is important because it's a slight difference in behavior from the virtual private gateway. After static routes, VPC-propagated routes are prioritized, then Direct Connect gateway, and then dynamically received AWS Site-to-Site VPN routes.

All right, I'm going to go into two common use cases, three advanced use cases, and then a customer pattern that we're seeing start to repeat. As we look at the specific use cases, I'll walk through the configuration of each of them first, and then I'll verbalize a packet walkthrough and highlight a bit on the screen. As I mentioned previously, this is going to get deep fairly quickly, so if you need to come back to this on YouTube, feel free to.

All right, let's start with the first of our common use cases: a flat network. I have a VPC, 10.1/16; it has an attachment to the Transit Gateway and is associated with the default route table. The VPC route table has a 10/8 route pointing to the Transit Gateway attachment, and the Transit Gateway is propagating the VPC attachment's route into the default route table. I can add a 10.2, a 10.3, and a 10.4 VPC, each of them configured identically: propagating into the default route table, associated with the default route table, and attached to the same Transit Gateway. In this scenario, if I want an instance at 10.1.0.10 to talk to an instance at 10.2.0.20, the 10.1.0.10 instance will put its packet out to the default gateway in its local subnet — that default gateway is the implicit router for the VPC. The implicit router will do a lookup in the route table associated with that subnet in the 10.1 VPC and find a route for 10/8 with a next hop of the TGW. The packet is forwarded through the attachment to the Transit Gateway. The Transit Gateway does a lookup in the route table associated with the attachment — in this case we only have one route table — identifies a route entry for 10.2/16 with a next hop of the attachment for VPC 2, and the packet is delivered. In this configuration, which is the default configuration, you have full connectivity between your VPCs.
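The lookup behavior described here — longest prefix match first, then static over VPC-propagated over Direct Connect gateway over dynamic Site-to-Site VPN — can be sketched like this (route types and ordering per the talk; the code and names are illustrative):

```python
from ipaddress import ip_address, ip_network

# Lower number = preferred; applied only to break longest-prefix-match ties.
PRIORITY = {"static": 0, "vpc": 1, "dx-gateway": 2, "vpn-dynamic": 3}

def best_path(routes, dst):
    """routes: list of (cidr, route_type, attachment). Returns attachment."""
    dst = ip_address(dst)
    matches = [(ip_network(c), t, a) for c, t, a in routes
               if dst in ip_network(c)]
    if not matches:
        return None
    best = max(matches, key=lambda m: (m[0].prefixlen, -PRIORITY[m[1]]))
    return best[2]

routes = [
    ("10.0.0.0/8",  "vpn-dynamic", "vpn-attach"),
    ("10.2.0.0/16", "vpc",         "vpc-b-attach"),
    ("10.2.0.0/16", "static",      "inspection-attach"),
]
print(best_path(routes, "10.2.0.20"))  # inspection-attach: static wins the /16 tie
print(best_path(routes, "10.9.9.9"))   # vpn-attach: only the /8 matches
```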
Now, a word of note, because my terminology is going to bounce around moving forward: when I say "route domain," or when the slides say "route domain," that means a route table inside of the Transit Gateway.

OK, let's look at another common use case, similar to the one we just saw. I have VPCs 10.1/16 through 10.4/16; they are associated with the routing domain for VPCs, they're attached to a common Transit Gateway, and each of them has a VPC route table with a default route pointing to the attachment for the TGW. In this case we're going to add a VPN attachment, and that VPN attachment will be associated with a different TGW route table — we'll call this the routing domain for VPN. The VPN will advertise a default route; we will propagate that default route into the routing domain for the VPCs, and we will propagate the CIDRs for the attached VPCs into the routing domain for the VPN.

Now let's say that I want to send a packet to an on-premises piece of gear, 172.16.1.1. My instance, 10.1.0.10, puts the packet out on the network; it goes to the gateway in the VPC. The implicit router picks that up, does a lookup in the VPC route table associated with that subnet, finds a default route pointing to the Transit Gateway attachment, and the packet is transmitted to the Transit Gateway. When the Transit Gateway receives the packet, it does a route table lookup in the table associated with the attachment for VPC 10.1/16; that route table has a default route pointing to the VPN attachment, and the packet is forwarded down the VPN. Now, when 172.16.1.1 wants to reply back to 10.1.0.10, it sends its packet through the local infrastructure; it comes through the VPN and is received by the TGW. The TGW does a route table lookup in the route table associated with this attachment — the routing domain for VPN — finds an entry for 10.1/16 with a next hop of the attachment for VPC 1, and the packet is delivered to VPC 1. Note that in this configuration there's no east-west connectivity within AWS between these VPCs.
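That isolation falls straight out of the two route tables: the VPC routing domain only ever contains the default route learned from the VPN, so any VPC-to-VPC lookup can only resolve to the VPN attachment. A sketch with invented names:

```python
from ipaddress import ip_address, ip_network

def lookup(table, dst):
    """Longest-prefix-match over a list of (cidr, attachment) routes."""
    dst = ip_address(dst)
    matches = [(ip_network(c), att) for c, att in table if dst in ip_network(c)]
    return max(matches, key=lambda m: m[0].prefixlen)[1] if matches else None

# Routing domain for VPCs: only the default route propagated from the VPN.
vpc_domain = [("0.0.0.0/0", "vpn-attach")]
# Routing domain for VPN: the CIDRs propagated from the VPC attachments.
vpn_domain = [("10.1.0.0/16", "vpc-1-attach"), ("10.2.0.0/16", "vpc-2-attach")]

# VPC 1 -> on-premises: resolves to the VPN, as intended.
print(lookup(vpc_domain, "172.16.1.1"))  # vpn-attach
# VPC 1 -> VPC 2: *also* resolves to the VPN — no east-west path inside AWS,
# but a hairpin through the customer gateway unless you block it there.
print(lookup(vpc_domain, "10.2.0.20"))   # vpn-attach
```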
However, there is full connectivity between the VPCs and the on-premises environment. And a footnote, something I'll review a little later: nothing prevents — unless you prevent it — a hairpin turn through the VPN customer gateway, so just be advised of that. But within AWS, there's no direct connectivity between these VPCs.

Let's look at some advanced use cases. The first is centralized NAT. In the configuration here we have VPC A, 10.1/16, and VPC B, 10.2/16; each of them has an attachment to the TGW, they're associated with the VPC route domain, and each has a VPC route table pointing its default route to the TGW. On the right side we have an outbound VPC, which is attached to the TGW. You may not be able to see it too well on the screen, but within the outbound VPC there are some lines: the horizontal slivers indicate Availability Zones and the vertical slivers indicate subnets.

So why have I put the attachment in a separate subnet from the source-NAT device or NAT gateway? We want the NAT device to be in what we call a public subnet — as you can see from the route table in the top right, it has a default route pointing to the internet via an internet gateway, and a route for 10/8 that points back to the attachments for the TGW. But putting the Transit Gateway's attachment in its own subnet has the effect of allowing us to define an ingress route table. What happens here is that I can define subnet route tables for each of the Availability Zones where the Transit Gateway attachments exist, and in each I'll define a default route that points to the specific NAT instance or NAT gateway in the same Availability Zone. So in fact this VPC has four route tables: one route table for the public subnets, and three individual route tables for the Transit Gateway attachment subnets.

Let's see what a packet flow through this might look like. If an instance, 10.2.0.20, wants to communicate out to a node on the internet, 10.2.0.20 will create the packet and deliver it to the local gateway. The VPC implicit router will do a lookup in the route table associated with that subnet; the VPC route table says the next hop is the Transit Gateway, and the packet is forwarded through the attachment. The TGW then does a lookup in the route table associated with that attachment and finds a default route pointing to the outbound VPC attachment; the packet is delivered to the outbound VPC. Because the destination is not in the local subnet, the packet goes to the default gateway for that subnet, which causes the VPC to do a lookup in the subnet route table; the next-hop address for the default route is the specific NAT gateway or NAT instance for that Availability Zone, and the packet is forwarded on. Once that NAT instance does its source NAT, the packet is put out to the implicit router, forwarded on to the internet gateway, and on to the internet. When the packet is returned, it's received by the NAT device, which corrects the destination address; it sends that packet — now destined for 10.2.0.20 — to the Transit Gateway. When the packet is received by the Transit Gateway, a route table lookup is done in the associated route table, and the packet is delivered to the attachment for VPC B.

Now, this configuration has a couple of trade-offs. On the pro side, it's very simple to configure: it's an attachment and some route table configuration in the VPC. The trade-off is that the Transit Gateway will prefer keeping packets in the same Availability Zone. So if a packet is transmitted from a VPC instance in Availability Zone A, the packet will be delivered to the NAT gateway in the corresponding Availability Zone in the outbound VPC. What does that mean? Because we have a statically defined default route in the route tables of the outbound VPC's subnets, if there is an issue with that particular NAT gateway or NAT instance, you will need to intervene to redirect the traffic, either manually or through automated scripting.

There's another approach you can take — oh, one more thing I should have mentioned first: I have a blackhole route here that I didn't speak to before. The reason for this blackhole route is that, in this case, I do not want VPC A and B to talk to one another. If I don't have this blackhole route, it is possible for traffic from VPC B to make its way to VPC A: the inbound traffic will have a 100.64 address, because it will in fact be NATted, and it will be returned because of the route table entry you see in the outbound VPC for 10/8. So, just to note: if you wish to prevent instances from talking to one another in this configuration using the NATs, this blackhole route is required.

All right, another way you can configure centralized NAT is to take advantage of equal-cost multipath (ECMP) over VPN, which is available to you with AWS Site-to-Site VPN on the Transit Gateway. With Transit Gateway we give you the ability to horizontally scale your VPN using ECMP — we've tested up to 50 gigabits, and we're sure you can go higher than that if you need to. With this configuration I can spin up as many instances as I like; the VPNs terminate on those instances, and the instances announce a default prefix, which is installed into the VPC route table. VPC A's and VPC B's route information is propagated into the outbound route table, and otherwise this looks very similar to what we saw before. So 10.2.0.20 puts a packet out; it's received by the implicit router, and the next hop is the TGW. It's received by the TGW, a lookup is done in the associated route table, and the next hop is the VPN attachment — at this point the traffic is flow-hashed across the ECMP links. The NAT instance will make routing decisions internally; when it determines that the packet needs to go out to the internet gateway, it will put it back onto the subnet within the VPC, and that packet will go through the implicit router and on to the internet gateway.
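The flow-hashing caveat can be made concrete: an ECMP implementation picks a path from a hash of the flow's 5-tuple, so the same flow always lands on the same tunnel — until the number of tunnels changes. A toy sketch (not AWS's actual hash function):

```python
import zlib

def pick_tunnel(src, dst, sport, dport, proto, n_tunnels):
    """Hash a flow's 5-tuple onto one of n_tunnels ECMP paths."""
    key = f"{src}|{dst}|{sport}|{dport}|{proto}".encode()
    return zlib.crc32(key) % n_tunnels

flow = ("10.2.0.20", "93.184.216.34", 40000, 443, "tcp")

# Stable: the same flow hashes to the same tunnel every time.
a = pick_tunnel(*flow, n_tunnels=4)
b = pick_tunnel(*flow, n_tunnels=4)
assert a == b

# But add or remove a tunnel and the flow may be rehashed onto a different
# path — which is what resets connections through a stateful firewall.
print(a, pick_tunnel(*flow, n_tunnels=5))
```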
Return traffic works the same as before: it's received by the NAT instance — which had done the source NAT, so the return comes back to the same one — the 10.2.0.20 address is restored as the destination, the packet is received by the Transit Gateway, the associated route table is consulted, and the packet is forwarded to the attachment for VPC B.

So again, trade-offs here. This takes a little more effort to set up, but you have no bandwidth throughput limitations — at least no theoretical ones — and you can continue to scale your NAT instances horizontally; in the prior example, the attachments will burst up to 50 gigabits per second. The challenge with this one, however, is that, as I mentioned, ECMP is flow-hashing the connections. So if there is a change in the configuration of the edges — the number of VPN attachments that we have — then your flows could be rehashed, and if you have a stateful firewall in that capability, that will likely cause a connection reset.

Let's look at an ingress use case. We've got our VPC A on the left, attached to a VPC route domain, and we have our edge VPC on the right, again using the VPN ECMP approach. A's routes are propagated to the edge route domain, and the instances on the right propagate their routes to the VPC route domain. Let's see what it looks like for traffic coming into the environment — for fun, we'll put an Elastic Load Balancing load balancer on the front of it to load-balance the packets across the front ends. A packet comes in to the NAT instance, the target workload is determined, and that instance then rewrites the destination address for the target — let's say 10.1.0.10 — and source-NATs with its 100.64 address. That packet is sent down the VPN tunnel; when it's received by the AWS Transit Gateway, the packet is assessed against the associated route table for that attachment, which says the next hop is the attachment for VPC A, and the packet is delivered. When 10.1.0.10 wants to return the packet, it sends it to the implicit router; the VPC route table is consulted; the 100.64 address — remember, that was the source-NAT address — has a next hop of the Transit Gateway; the Transit Gateway has a route to the VPN; and the packet is delivered back to the source NAT and onwards to the internet gateway.

One last use case — and I'll forewarn you that when you think I'm done, I'm only halfway, so hold on. We're going to do VPC-to-VPC inspection: 10.2.0.20 wants to talk to 10.1.0.10. The configuration is very similar to what we did before for centralized NAT using VPN, so I won't go over the whole configuration again — let's just watch how the packet flows, keeping in mind that I've taken away the blackhole route. 10.2.0.20 originates the packet; it goes to the implicit router; a subnet route table lookup occurs; 10/8 is available via the Transit Gateway. When the packet is received through the attachment, a route lookup in the associated route table occurs, and the default route points to the VPN attachments. In this case the VPN attachments are receiving the routes for these VPC subnets — I mentioned earlier that in the case of a VPN attachment, the VPNs will receive over BGP the routes in the associated route table. If you notice, the inline VPC's route table only has the default route, and that's so these instances can actually reach the public IPs that terminate the Site-to-Site VPN; the actual routes — the 10.1 and 10.2 addresses — are known to the instances because they're participating in BGP, so the VPC route table for the inline VPC does not actually need that route data. So the packet has gone through the NAT instance and been returned; the inbound packet — remember, it's going to 10.1.0.10 — has at this point been source-NATted, so it looks like it's coming from 100.64. When it's received by the Transit Gateway, the associated route table is consulted, and the packet is forwarded to the attachment for VPC A.

I'm halfway! Now the 10.1.0.10 instance is going to send a response to 10.2.0.20, and I think you know how this goes at this point. The packet is forwarded; the VPC subnet has a route for 100.64 pointing to the Transit Gateway; the Transit Gateway has a route pointing to the VPN attachments; in the inline VPC the source NAT is reversed; the packet is passed back to the Transit Gateway; a 10.2 route exists and points to the attachment for VPC B.

So let me reiterate the thing I said at the beginning of the session: all of these things work and you can do them — but talk to your SA before deciding you need to.

Let's look at a reference architecture. We've talked about flat and segmented common use cases; we've also talked about how to address traffic outbound to and inbound from the internet; and I can do inline services for both east-west and north-south. If you want to tie together Transit Gateways in multiple regions, you have a couple of options — well, you have one option at the moment, which is to use our transit VPC pattern: a hub set of VPN appliances that participate with the Transit Gateways to provide connectivity. Of course, if you're just connecting two VPCs, you can use inter-region VPC peering. AWS Transit Gateway does have inter-region support coming soon.

Now, the last thing I wanted to share with you is a pattern we've seen start to emerge with customers, and I want to go through it largely because we're seeing some anti-patterns in how customers are trying to deliver this type of architecture — so we want to give it to you very specifically, so you know what to do if you want to implement a similar architecture. What we're going to try to do here is have a shared-services VPC where all of our key resources exist — VPC endpoints powered by PrivateLink, Route 53 Resolver — with connectivity available to and from it through the VPN and Direct Connect gateway from our on-premises environment. So let's start to build this. Now, for this particular configuration, I
tested this with Session Manager. Session Manager — just if you don't know — is a fully managed AWS Systems Manager capability that allows you to manage Amazon EC2 instances through an interactive, one-click, browser-based shell or through the AWS CLI. Session Manager effectively provides you secure and, more importantly, auditable instance management without the need to open inbound ports, maintain bastions, or rotate and maintain SSH keys. It's a very, very powerful capability, and it also works through VPC endpoints using PrivateLink — so you can have a completely isolated VPC and still be able to get into instances using IAM authentication. So I'm going to use that as my VPC endpoint example.

So let's start to build this. The first thing I need to do is share my Transit Gateway with the other VPCs. I should note that the VPCs for development and test are in their own isolated accounts; production and the shared-services VPC are in a common organization; and the AWS Transit Gateway is created by — and effectively owned by — the shared-services account. So I need to use AWS Resource Access Manager to share this Transit Gateway with the other VPCs. I'll create a share, I'll define the shared resource — which is my Transit Gateway — and I'll define the principals that I'm sharing it with: in this case you see an account for development, an account for testing, and the organization that includes the production VPC. In each of those accounts I will accept the resource share, and that allows me to create attachments to the Transit Gateway. At this point — and I'm using a default configuration: single route table, single association, single propagation point — I have connectivity between all four of these VPCs and my on-premises environment.

OK, the next thing I want to do is set up the VPC endpoints for Session Manager — there are three of them. To do that, I will provision these VPC endpoints, powered by PrivateLink, in the shared-services VPC. When you provision your VPC endpoint, you have this option, "enable private DNS name," and I want to spend a moment talking about this, because this is where we're seeing a common anti-pattern. If you check this box, what happens behind the scenes is that AWS will create a Route 53 private hosted zone for you and associate it with the shared-services VPC. The effect is that when you resolve the public DNS name — let's say ssm.us-east-2.amazonaws.com — you'll get back the private IP address, and that's the behavior you want. But if Amazon has created that private hosted zone for you, you have no mechanism to share it with your other VPCs. So what we'll do instead is not check this box; we'll leave it as it is, and after creation we'll get back an endpoint-specific DNS hostname for the VPC endpoint. Then we can create a private hosted zone at the apex — in this case ssm.us-east-2.amazonaws.com — create an alias record at the apex pointing to that endpoint-specific hostname, and then, using the AWS CLI or APIs, share that private hosted zone with the other accounts. Note that associating a private hosted zone with VPCs outside the account that owns the private hosted zone can only be done via the CLI or the API — but it is possible. So please do not use Route 53 Resolvers talking to each other to communicate zone information.

At this point, if a box in the development VPC wants to get the IP address for ssm.us-east-2.amazonaws.com, it will query its .2 resolver, it will get back the private IP address, and since it has network connectivity, it can get to the endpoint. Now, I also said that I wanted to enable connectivity with the on-premises environment, so what we'll do here is establish, using Route 53 Resolver endpoints, the ability to resolve from our corporate infrastructure and from our VPCs for the appropriate zones — for example, I want the corporate infrastructure to be able to reach that SSM private endpoint.
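The DNS behavior being set up here can be summarized with a toy resolver: a private hosted zone at the apex, shared to every VPC, aliases the service name to the endpoint-specific hostname, which resolves to the endpoint's private IP. All names and IPs below are invented for illustration:

```python
# Toy view of the shared-private-hosted-zone pattern (all values invented).
private_hosted_zones = {
    # apex zone with an alias record at the apex, pointing at the
    # endpoint-specific DNS hostname returned when the endpoint was created
    "ssm.us-east-2.amazonaws.com":
        "vpce-0123-example.ssm.us-east-2.vpce.amazonaws.com",
}
endpoint_records = {
    "vpce-0123-example.ssm.us-east-2.vpce.amazonaws.com": "10.5.0.17",
}

def resolve(name):
    """What a .2 resolver in any VPC associated with the zone would return."""
    target = private_hosted_zones.get(name, name)  # follow the apex alias
    return endpoint_records.get(target)

# Any associated VPC (dev, test, prod) now gets the endpoint's private IP:
print(resolve("ssm.us-east-2.amazonaws.com"))  # 10.5.0.17
```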
And I want the development account, for example, to be able to resolve corp.internal addresses. We'll use Route 53 Resolver endpoints to do this. There are two types of resolver endpoints: an outbound resolver endpoint, which allows queries in the AWS cloud to be conditionally forwarded to an on-premises DNS server; and, conversely, an inbound Route 53 Resolver endpoint, which allows the on-premises environment to conditionally forward to AWS to get resolution for things like private hosted zones.

So that's all well and good: the corporate infrastructure now has the ability to resolve that SSM fully qualified domain name — it will get the internal IP address, and it has the connectivity — and the shared-services VPC can resolve corp.internal addresses. But at this moment the development, testing, and production accounts cannot; they need access to that outbound endpoint. To do that, you'll again use AWS Resource Access Manager: Resource Access Manager will let you create an outbound forwarding rule that you can then share with the other three VPCs. You accept that rule, and now, when an instance in the development VPC wants to resolve corp.internal, it makes that request to its .2 DNS server; in the background, Route 53 Resolver makes a query through the outbound endpoint to the on-premises environment, receives the answer, and returns it to the instance in the development account. Please use this as a reference architecture — and I'll say it again, because we've seen it as a common anti-pattern: please do not connect inbound and outbound endpoints to communicate zone information. We have the ability for you to share private hosted zones and to share outbound endpoints.

Some parting thoughts as I wrap up the session. The TGW provides customers a centralized place to simplify at-scale VPC and on-premises connectivity. The TGW provides ECMP for VPN attachment types, which allows you to scale your VPN connectivity horizontally. And the TGW provides robust routing capabilities through the use of route domains. As you've probably heard, the vast majority of our roadmap — more than 95 percent of it — comes from our customers. If you have input on how we can deliver additional benefits to you using the Transit Gateway, please let us know; we want to hear from you. I want to thank all of you for joining me, and I hope this session helps you better understand the TGW, its capabilities, and the flexibility it affords. I'm happy to take questions out on the side afterwards. Enjoy the summit, and thanks for your time. [Applause]
Info
Channel: Amazon Web Services
Views: 26,627
Rating: 4.9239545 out of 5
Keywords: AWS, Amazon Web Services, Cloud, cloud computing, AWS Cloud
Id: S9fEydjJ9qo
Length: 41min 50sec (2510 seconds)
Published: Thu Jun 20 2019