AWS re:Invent 2016: From One to Many: Evolving VPC Design (ARC302)

My name is Rob Alexander, and welcome to my talk on VPC design. This is the fourth year we've been doing this talk — how many of you have seen it before? A couple? Good, thank you, I appreciate you coming back; there is new stuff, don't worry. We always start with this slide, because this talk is based on practical designs we see working out in the wild, in production workloads with our customers. Every year we distill those down and find the best designs we see working, where customers are doing interesting things with VPC, and that's what this talk is all about. So do try these at home — there's nothing theoretical here. I'll also point out that I do read all your feedback, and the overwhelming feedback last year was: more designs, fewer screenshots of router configurations. You'll notice we cranked the level down from 400 to 300, and that is directly due to fewer screenshots of router configurations. So please give me feedback again, let me know if I got the porridge right, and we'll correct next year if I didn't get it quite right.

It's a 300-level talk, so I don't spend a lot of time on the basics. If you don't know what a route table is in VPC, don't know the difference between a security group and a network ACL, don't know what Direct Connect is, or don't know what an internet gateway is or why you would want one, there are a few talks you might want to watch in advance. These two will give you a firm grounding, so you can go back and catch up on the things you might not have caught the first time around.

When I first started this talk four years ago, it was mostly about why you would even want one of these things — customers asked, "Why do I want to go into VPC? What is it good for?" That has quickly changed. My conversations today are more like: how do I build something like this? So that's what this talk is about — getting from one VPC to many. But first, you have to start with one.

A VPC, a virtual private cloud, is your isolated container, your own private network within AWS where you define all the networking that goes into it. Upon creation, the one big decision you have to make is what IP space to allocate to it — the CIDR block designated for all the resources you're going to launch in that VPC. That CIDR is fixed on VPC creation; you can't change it after you designate it. The biggest is a /16, all the way down to a /28, and I always recommend: go big, as big as you can. If there is no reason for you not to do a /16, do a /16. If you're worried about wasting IP space, don't be — I've never had a customer come back to me and say, "I wish I had made that VPC smaller." The opposite, however, happens way too much.

A few guidelines on IP space design. Plan for expansion: even if you're only going into one region today, think about other regions when you're planning your space and make sure it will still make sense with them. And absolutely consider future connectivity to your corporate networks, partner networks, any kind of external connectivity you might need — have that in mind when you're designing your IP space and make sure you're not overlapping. Overlap is just asking for headaches down the road that you don't need.

The next thing, after the VPC's space, is that you've got to carve out some subnets. A VPC is a region-wide construct; subnets are AZ-specific. Make sure you distribute your IP space equally across all AZs, and if you're in a region that has three or more AZs, plan for that: even if you're only going into two AZs on day one, save an equal distribution of IP space for at least three AZs. I promise you, in a year or two you will want that space — maybe even in a few months, depending on how you grow. Each AZ has its own resource pool; for example, if you're in the Spot market, that's a whole other market over in that third AZ that you're not taking advantage of. So plan for it.

So what about subnets — how big, how many should I have? Well, not this many. And this is not uncommon: I spent many years in enterprise IT, and we used subnets as a unit of deployment and a unit of isolation. That's very common, but the tools you have available in VPC are much richer and give you a lot more functionality than a subnet. If you do one subnet per application across three AZs, you're going to have at least one person whose full-time job is managing your IP space, and it's just not necessary.

So these are my recommendations for VPC subnet design. First, traditional switching limitations — the reasons you might have designed that way in your own data center — don't apply here. It's a virtual network: there's no performance impact from having thousands and thousands of instances in a single subnet, there's no broadcast domain, you don't have any of those limitations in VPC. So I always advise: consider large, mixed-use subnets. Don't use them as isolation containers. Make sure you have a very large, diverse pool of IPs to draw from, so you're never in the situation where one subnet is for app A and you've run out of IPs over there, but you can't pull IPs from another subnet because that one is for application B. It's just not necessary in VPC. Use security groups instead to enforce isolation — by default you can have 500 security groups in a VPC, which is a much more robust way to isolate things than subnets. And use tags for grouping resources: tags can drive cost allocation, and they can drive very fine-grained permissions and controls through Identity and Access Management (IAM). You can't get that out of subnets.
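The sizing advice above can be sanity-checked with a quick script. This is my own sketch (not from the talk) using Python's standard ipaddress module; the corporate range is a made-up example:

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")           # go big: a /16 gives 65,536 addresses
corporate = ipaddress.ip_network("172.16.0.0/12")   # hypothetical corporate network

# Future connectivity: make sure the VPC CIDR does not overlap corporate space.
assert not vpc.overlaps(corporate)

# Distribute the space equally across at least three AZs,
# even if you only launch into two on day one.
az_blocks = list(vpc.subnets(new_prefix=18))  # four /18s: one per AZ, plus spare
print(len(az_blocks), az_blocks[0].num_addresses)  # 4 blocks of 16,384 addresses each
```

Keeping the per-AZ allocations equal up front is what makes it painless to light up that third AZ later without renumbering anything.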
So what do you use subnets for? You use subnets as containers for routing policy. That's really my primary recommendation — think of them as containers for routing policy, and we'll talk about what that means. And if you don't know what I mean about performance and thousands and thousands of instances in a single subnet, and why that doesn't matter with VPC, please go watch NET401; it will give you a really good under-the-hood perspective of how VPC really works.

Here's a good starter home. This will work for 80 percent of you, both for starting out and a long way into the future — we have many very large and complicated customers using a very similar subnet design in a single VPC. The /22s you see here give you a little over a thousand addresses in your public subnets, and the /20s give you a little over four thousand each. That distribution still leaves you about 50,000 IPs for future expansion. It's a good start, but we're actually going to begin with three subnets per AZ, because I want to focus on that point about routing policy: I have three different routing policies for this VPC, so there's a subnet dedicated to each one of those routing policies, and we'll go through what that means.

I took my own advice and put a /16 on this VPC. The first thing to point out is that by default, all the subnets in a VPC can talk to each other — it's a flat network, a star. There's a virtual router, represented in each subnet by the .1 address handed out by the VPC DHCP server, and every route table by default has a local route that facilitates routing between the subnets. That route is there by default, and you cannot manipulate it.

The first of my subnet policies is a public subnet: anything I deploy in this public subnet automatically gets a public IP. I'm going to create an internet gateway, an IGW. This is a logical construct that facilitates the egress and ingress of traffic into the VPC. Just attaching the IGW doesn't mean you've automatically opened up your VPC to public traffic; it only makes that possible. What you also have to do is provide a route to the IGW, which you see there: your default route now points to the IGW, and with that plus public IPs, instances in that subnet can egress out the internet gateway to public networks — and, depending on their security group and network ACL configuration, they can also be reached from the outside.

Now, my routing policy for the bottom subnet is very different. I want that subnet to have only private IPs — no public IPs, and no routes to public networks — because it's going to be tied into my corporate network later. So I'm going to create a virtual private gateway, a VGW, which you see represented there at the bottom; that will later be a termination point for things like VPN tunnels or Direct Connect connections. You see I've added a route for my corporate CIDR block, representing my corporate network, directed toward the virtual private gateway. Everything private is now directed there to reach those external networks.

And my third routing policy is the subnet in the middle, which is a kind of hybrid. It's also a private subnet — everything in it has private IPs — but we also want the instances and resources in those subnets to be able to reach public networks. So how do you do that? Well, the first question is why you would even want that — why would you want private resources to reach outside? There are a lot of reasons, and some might not be as obvious as others. The first is AWS API endpoints: if you're making any API calls from instances in these subnets, whether that's auto-scaling calls, or moving ENIs, or trying to reach
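To make "subnets as containers for routing policy" concrete, here's a small sketch of my own (not from the talk) of how a route table resolves a destination — the most specific matching prefix wins. The gateway IDs and CIDRs are invented:

```python
import ipaddress

# One route table per routing policy; each maps CIDR -> target.
public_rt = {
    "10.0.0.0/16": "local",         # implicit local route, present in every table
    "0.0.0.0/0": "igw-aaaa1111",    # default route to the internet gateway
}
private_rt = {
    "10.0.0.0/16": "local",
    "172.16.0.0/12": "vgw-bbbb2222",  # corporate CIDR toward the virtual private gateway
}

def resolve(route_table, destination):
    """Return the target of the most specific matching route (longest prefix wins)."""
    dest = ipaddress.ip_address(destination)
    matches = [ipaddress.ip_network(cidr) for cidr in route_table
               if dest in ipaddress.ip_network(cidr)]
    best = max(matches, key=lambda net: net.prefixlen)
    return route_table[str(best)]

print(resolve(public_rt, "93.184.216.34"))   # default route -> the IGW
print(resolve(private_rt, "172.20.1.5"))     # corporate route -> the VGW
print(resolve(private_rt, "10.0.42.7"))      # intra-VPC -> local (the flat star network)
```

The subnet itself carries no policy; which of these tables a subnet is associated with is the whole story.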
AWS public services like S3, Kinesis, Lambda, or SNS. Those are all public regional resources that exist on the public network, so you have to be able to get out of the VPC to talk to those endpoints. Same with third-party services: any software platforms, PaaS, or SaaS platforms you might use obviously exist outside your VPC.

The traditional way to do this is to deploy a NAT instance: an EC2 instance providing network address translation. It has a public IP and fronts the instances that have only private IPs, translating and allowing them to egress out, and you facilitate that in the route table by pointing your route at the NAT instance itself. The problem with that is scaling: as your traffic increases, the NAT instance is a single point of failure with a dedicated amount of bandwidth. It doesn't scale — it's a single EC2 instance, and eventually you're going to burn the thing up.

So how do we do scalable, available NAT? This is our first set of requirements — we'll go through a number of these as we work through the different designs. We want public subnets for resources reachable from the internet; private subnets with egress-only access to public networks; scalable, available NAT; and for now we're dealing with just one AWS account, one region, and one VPC.

Back to the NAT instance on fire: we used to have all these complicated ways to deploy multiple NAT instances and adjust route tables between them to get some kind of high availability out of EC2-based NAT. Fortunately, late last year we launched the NAT gateway, which from an object's perspective is pretty much a drop-in replacement for the NAT instance: you point your routes at it the same way, and it takes an EIP — an Elastic IP — as its public IP address. From a design perspective it looks like a NAT instance, but under the hood there's a lot more going on that makes the NAT gateway special.

So why would you use a NAT gateway over a NAT instance? Most of us know how NAT works: a packet comes in with a private source IP and port, and the NAT instance replaces that source IP with its own public IP. That's all well and good. The tricky part is the source port, because that has to be unique per connection when you're talking out to external networks. If, for example, you have lots of instances going out to get a security update from the same endpoint — the same destination IP and port — then the source port has to be unique for every single connection. As more and more connections go out to the same destinations, every time you open a new connection the NAT instance has to check its table, choose a new unique source port, and in a very atomic, transactional way open that connection while guaranteeing the source port is unique. This is quite similar to a database problem, but databases work in milliseconds and this needs to work in microseconds. The traditional scaling for a NAT instance has been vertical: keep everything in memory, keep it fast, keep it in microseconds. Horizontal scaling of NAT instances has been very difficult, and that's where the NAT gateway comes in: we applied some very innovative engineering to address the problem of horizontal scale for NAT. The NAT gateway is fully horizontally scaled, has full replication of connection state, and is highly available.

So there we are — you drop in the NAT gateway. There are a few requirements. It's not like an IGW or VGW, which are VPC-wide services: you deploy a NAT gateway in a specific availability zone, where it is highly available and fault tolerant for that availability zone. You still need the internet gateway, and you still need separate subnets so you can define the routing policy that uses the NAT gateway. It requires an Elastic IP assigned to it for a public address, and each NAT gateway today is burstable to 10 gigabits of bandwidth. It actually appears as an elastic network interface in your subnet when you create it, but you'll notice there's no security group assigned to it, because you can't assign a security group to a NAT gateway.

So how do you secure access, if you want to be very restrictive about who can actually use the NAT gateway? First, network ACLs still apply, so if you're doing coarse-grained control on subnet access, that still applies on the way to the NAT gateway. Second — what I've been harping on — use routing policy: have subnets whose route table allows them to use the NAT gateway, and subnets that don't. The third one is really the key: use security groups to restrict outbound. I find most customers never touch the outbound rules on their security groups, and by default, when you create a new security group, outbound is open to all traffic — you see there a destination of everything and a port range of everything; that's a default VPC security group. What I recommend is changing your default so it covers only RFC 1918 space — only the space within your VPC. If you're doing a lot of connectivity to other networks, this might look a little different, but the default here is outbound restricted to my VPC CIDR only, so instances can only talk to things inside the VPC. Then I create a separate NAT-enabled security group that lets them go anywhere, so the default is much more locked down.

You can do this — you can have one NAT gateway with multiple AZs using it — but as I said
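The outbound lockdown just described can be sketched as data. These are boto3-style IpPermissions dictionaries written as plain Python, so nothing here actually calls AWS — the CIDRs and the port choice are my own invented example:

```python
# Outbound (egress) rules, expressed as boto3-style IpPermissions dictionaries.
# Nothing here calls AWS; the CIDRs and ports are invented for illustration.

VPC_CIDR = "10.0.0.0/16"

# Replacement for the wide-open default: egress only within the VPC.
default_egress = [{
    "IpProtocol": "-1",                       # all protocols
    "IpRanges": [{"CidrIp": VPC_CIDR, "Description": "intra-VPC only"}],
}]

# Separate opt-in group for instances allowed to use the NAT gateway.
nat_enabled_egress = [{
    "IpProtocol": "tcp",
    "FromPort": 443,
    "ToPort": 443,
    "IpRanges": [{"CidrIp": "0.0.0.0/0", "Description": "HTTPS out via NAT"}],
}]

# The locked-down default never references 0.0.0.0/0.
assert all(r["CidrIp"] != "0.0.0.0/0"
           for rule in default_egress for r in rule["IpRanges"])
```

Instances get the default group automatically and only receive the NAT-enabled group deliberately, which inverts the usual "open outbound" posture.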
before, it's an AZ-specific deployment. For real availability — if something ever happened in that AZ, you would lose all NAT functionality in your VPC — we always recommend deploying one NAT gateway per AZ. Oops, sorry — too fast to take the picture; you'll get the slides later.

Pros and cons of the NAT gateway. On the pro side, it's a drop-in replacement for NAT instances, so it's very easy to migrate: you just update your route table, and you can even migrate the EIP over if you want to be consistent. The only caveat is that you'll obviously lose any connections that are open at the time you do the cutover. It's fully managed, highly available, fault tolerant, scalable to 10 gigabits, and it supports VPC Flow Logs, so you can monitor and track all the network traffic going through the gateway. On the con side, there are no higher-level functions — it just does NAT. Those of you running NAT instances that also do things like IPS, UTM, URL filtering, or packet inspection — the higher-level security functions — that's not the NAT gateway. We have lots of partners that offer those features in their NAT instances, whether that's Sophos or Fortinet — a whole range of partners providing security appliances that do these higher-level functions. And, as I mentioned, you can't associate a security group.

So here we are: we've got our one VPC, and now we're moving on to the next one, where we have an internal-only corporate IT app that we want to connect to our internal corporate network over IPsec tunnels. What are the considerations when we're talking about multiple VPCs? When you start moving from one to three to ten to a hundred VPCs, what should you be thinking about, why would you want to do that — and why not one big VPC? My first question back, whenever a customer asks me that, is: well, why not one AWS account? Because those two go hand in hand — as soon as you have more than one account, you have more than one VPC.

There are a few things that go with having both multiple accounts and multiple VPCs. The first is blast radius. If you have one VPC and thousands of developers, somebody's going to step on somebody else's toes. IAM allows you to give very fine-grained resource control over who can do what in your VPC, but you cannot partition API limits with IAM — so if, for example, one misbehaving app burns through your throttle limits on EC2 describe calls, that affects everyone else. There is a certain blast radius you must consider when you're putting resources into a VPC. (This clicker is going to drive me crazy.) Here's another very good reason to have both multiple accounts and multiple VPCs — probably the best reason out there; if you're not doing this, come talk to me. Same with this one: regulatory and compliance reasons — keep them in separate accounts with separate resources. This is also another very common one: disaster recovery in a separate region and a separate account. You have all of your data replicated to a different region for disaster recovery, so that if there's some kind of compromise of your production account, that compromise cannot affect your disaster recovery, because it's all in a separate account and a separate VPC, completely isolated.

Then we do some things a little fancier, to illustrate where we go with more VPCs and more accounts. Here we have an account and a VPC per business unit, and we're sending all the application logs, S3 access logs, and ELB logs out of those business units' accounts and VPCs. The business units are able to manage their own bottom line, have cost visibility, and get their own bill because they have their own account, but we're sending all of their digital exhaust to a separate account. We're also using things like AWS Config to
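The central logging account pattern hinges on a bucket policy in the logging account that lets the business units' trails deliver logs. Here's a sketch of that policy built as a plain Python dictionary — the bucket name and account ID are invented, and you should check the current CloudTrail documentation for the exact statements your setup needs:

```python
import json

LOG_BUCKET = "arn:aws:s3:::central-audit-logs"   # hypothetical logging-account bucket
BU_ACCOUNT = "111122223333"                      # hypothetical business-unit account

# Bucket policy allowing CloudTrail in the business-unit account to deliver logs here.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # CloudTrail checks the bucket ACL before delivering
            "Effect": "Allow",
            "Principal": {"Service": "cloudtrail.amazonaws.com"},
            "Action": "s3:GetBucketAcl",
            "Resource": LOG_BUCKET,
        },
        {   # deliveries land under AWSLogs/<account-id>/ and must
            # grant the bucket owner full control of each object
            "Effect": "Allow",
            "Principal": {"Service": "cloudtrail.amazonaws.com"},
            "Action": "s3:PutObject",
            "Resource": f"{LOG_BUCKET}/AWSLogs/{BU_ACCOUNT}/*",
            "Condition": {"StringEquals": {"s3:x-amz-acl": "bucket-owner-full-control"}},
        },
    ],
}
print(json.dumps(policy, indent=2))
```

One statement pair per business-unit account keeps the audit trail flowing into the isolated logging account without granting those accounts any read access back.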
track all the changes within their VPCs — so if somebody changes a security group, changes an internet gateway, or alters a route table, you're going to know about it — CloudTrail to record all of their API calls as an audit log, and then VPC Flow Logs again, so you know all the traffic going in and out of their VPCs. All of this goes to another account and another VPC dedicated to logging, auditing, and analysis of this data. We might have EMR there running MapReduce jobs to reduce the data, Amazon Redshift doing data warehousing for querying and analytics, and of course S3 as the storage back end.

Now that we've gone through some of the reasons you would consider multiple VPCs, let's dive back into our private app. This is the one we showed before, where we have only a private connection to the corporate network over an IPsec tunnel — but the application wants to make heavy use of S3 as a primary data store. So how do we facilitate, in a private-only VPC, access to a public resource like S3?

Our next round of design requirements: VPN connectivity to a private-only VPC; no egress from the VPC to public networks, so no public IPs whatsoever; private IP access to Amazon S3; very fine-grained egress access control, so we control exactly what goes out of the VPC and can filter it; and still one AWS account, one VPC, one region.

By default, if you did nothing else, the route to S3 would go down into your data center, all the way out your data center's public border, and back around to the region — obviously not an ideal way to go. Fortunately we have VPC endpoints, which exist to meet exactly this requirement: private IP connectivity to AWS services, in this case S3. With VPC endpoints there's no internet gateway required, there's no NAT, you're not using public IPs, it's free, you have very robust access control, and it's private IP connectivity to S3.

Here's an example of creating an S3 endpoint and assigning it to a subnet — this is the CLI, and you'll notice you actually tell it which route table you want. It creates the route for you: you see the destination is a prefix list, and the target is the endpoint itself. We'll talk in a minute about what a prefix list is.

To give an example of the interplay, here's a subnet that does have an internet gateway. What happens if you create and assign a VPC endpoint to this one? You have your default route to the internet gateway, but you're still using the prefix list for the service to access S3. That's obviously a more specific route for S3, so it will be used instead of the internet gateway for S3 in that region. If you access S3 in another region, the endpoint doesn't help — it's only for this region, in this case us-west-2 — so you would go out the internet gateway to reach S3 in another region.

And what if you had a NAT gateway? If you already have a default route out the NAT gateway, it's actually really important that you also add the VPC endpoint to that private subnet, because otherwise you're funneling all of your S3 traffic through the NAT gateway — and the NAT gateway is charged on the data transferred through it. The VPC endpoint is not; it's private IP connectivity. So it's a cost decision to add a VPC endpoint to subnets that are using NAT to get out of the VPC.

Prefix lists are an abstraction: they represent the entire public IP space of S3 in the region. All you need to do is reference the prefix list, and it figures out the actual back-end IPs. This solved a lot of headaches customers had before in trying to keep track of what actually constituted S3's IP space — those ranges change all the time in the back end. You can also use prefix lists in your outbound security group rules: similarly to what I advised with the NAT gateway, you can restrict access to the VPC endpoint. Even though the subnet itself has a route to the VPC endpoint, you can apply a security group to instances, restricting them outbound, so only certain instances in the subnet can use the VPC endpoint — by referencing the prefix list.

And here's the fine-grained access control I mentioned. There are multiple stages where you can apply policies to control egress out of your VPC. First, I can apply an IAM policy to the VPC endpoint itself, which can be as detailed as users, groups, and specific objects. In this case I've gone fairly generic: a specific bucket, and only for Get and Put. So instances in subnets that have the route to the VPC endpoint are allowed out only if they're going to that bucket. That's on the endpoint itself — but I can also apply a policy on the bucket, so you have it on both ends. The bucket can say: only specific VPC endpoints, or only specific VPCs, can talk to me. Here we check that the request is coming from a very specific VPC endpoint, and if it is, you're allowed access to the bucket. Multiple stages of access control.

Just to recap quickly: first, the route table — again, the subnet defines routing policy — gives the resources in the subnet access to the VPC endpoint. Second, the policy on the endpoint itself says what can go out of this VPC through this endpoint, and which resources in S3 it can access. Third, the bucket policy says who can access the bucket. And fourth, the security groups. Multiple stages. So we'll
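Those two policy stages — on the endpoint and on the bucket — can be sketched as plain JSON documents. This is my own illustrative shape; the bucket name and endpoint ID are invented, and a policy like the bucket one below is exactly what locks you out of the S3 console for that bucket:

```python
import json

BUCKET = "arn:aws:s3:::backups-example"   # hypothetical bucket
VPCE_ID = "vpce-0f00baaa"                 # hypothetical endpoint ID

# Stage 1: policy attached to the VPC endpoint -- only this bucket, only Get/Put.
endpoint_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": "*",
        "Action": ["s3:GetObject", "s3:PutObject"],
        "Resource": f"{BUCKET}/*",
    }],
}

# Stage 2: policy attached to the bucket -- deny anything not arriving via the endpoint.
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [BUCKET, f"{BUCKET}/*"],
        "Condition": {"StringNotEquals": {"aws:sourceVpce": VPCE_ID}},
    }],
}
print(json.dumps(endpoint_policy, indent=2))
print(json.dumps(bucket_policy, indent=2))
```

Either stage alone is useful, but applying both means a compromised instance needs the right subnet, the right endpoint, and the right bucket before any object moves.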
just walk through endpoints in action here. We create a backups bucket for all these internal apps so they can push backups up to S3, but we have a compliance application with more fine-grained requirements around what can go out, so we create a separate endpoint for it. You can create multiple endpoints with different policies in a VPC; the restriction is that a given route table can have only one route to S3 — that prefix list. So, for example, if I wanted to add more buckets, I wouldn't create new endpoints for these subnets, because they already have a route to an endpoint; instead I would alter the endpoint policy itself to allow access to the additional buckets.

Pros and cons of the endpoints. Pros: secure, highly scalable, highly available access to S3 — you don't have to worry about scaling it or its bandwidth; fine-grained control of access to content in S3 from the VPC — you can control which VPCs and which endpoints can access which buckets; and no public IPs are required, so all source IPs accessing S3 stay private — your private IPs from your VPC. Cons: if you restrict an S3 bucket to a specific VPC or a specific VPC endpoint, you can't use the S3 console anymore to manage that bucket. I hope it's obvious why: the console doesn't run in your VPC, so as soon as you put that policy out there, you'll go into the management console and get a nice big error. And you need to enable DNS — that doesn't mean you can't use your own DNS, but if you do, you need to forward on to the Amazon VPC DNS, because we have to resolve those prefix lists.

All right, two down — what's next? This is the part where this usually happens: the easy deployment methodology of VPCs, the ability to spin them up very fast, do projects, do tests, and then actually deploy applications and connect them back into your corporate networks — you're going to have a lot of VPCs very fast. And while that dynamic nature of VPCs is a great feature, adding IPsec tunnels, bringing in new networks, propagating those into your corporate network, making change control on your network border — these are impactful changes, and when they come rapid-fire like this, your network engineers are not going to be happy with you.

To emphasize what's involved in just one of these VPN connections, let's zoom in on one and see what real HA VPN-to-VPC connectivity looks like. Every time you create a VPN connection to a VPC, it's two tunnels — that's the redundancy on our end. The tunnels land in separate availability zones, and they connect down to one of your customer gateways — the VPN device terminating the tunnels on your end. To provide HA on your end, you're going to want two customer gateways, so that you have redundancy on your side too. Now we've got four tunnels — four tunnels involved in each connection.

And then BGP. Please don't do static routes; always dynamic, always BGP. This is the arrangement when you connect to us: we advertise down our ASN and the CIDR block that comprises the VPC itself, and you advertise up your network, your internal corporate network. What also comes down from us are MEDs — multi-exit discriminators. This is a way for us to use BGP to influence the routes that come down to you: we use MEDs in the VPN service that backs the VGW to tell you which tunnel we're going to prefer, so you need to make sure you honor those when they come down. And you can do the same to us: if you want your traffic to prefer one CGW over the other, you can send up to us either MEDs or AS-path prepending to favor one side over the other. Then, obviously, you would re-advertise the VPC CIDRs you're learning from the VPC into your internal network through
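The tunnel-preference mechanics above can be illustrated with a toy route-selection function — this is my own simplification of BGP's decision process (real BGP compares many more attributes), with invented ASNs and tunnel names:

```python
# Toy BGP-style tunnel selection: among routes for the same prefix,
# prefer the shorter AS path, then the lower MED. (Invented values;
# a real BGP implementation evaluates many more attributes.)

routes = [
    {"tunnel": "tunnel-1", "as_path": ["65001"], "med": 100},
    {"tunnel": "tunnel-2", "as_path": ["65001"], "med": 200},
]

def best_route(candidates):
    return min(candidates, key=lambda r: (len(r["as_path"]), r["med"]))

print(best_route(routes)["tunnel"])  # tunnel-1: equal AS-path length, lower MED wins

# AS-path prepending makes a route less attractive without touching MED:
routes[0]["as_path"] = ["65002", "65002", "65001"]
print(best_route(routes)["tunnel"])  # tunnel-2 now wins on the shorter AS path
```

Honoring the MEDs sent down from the VGW is just this comparison happening on your customer gateways; prepending is the same lever pointed in the other direction.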
whatever IGP protocol you might be using on your corporate network alright so starting to stress out your network pipes are starting to really stress out your network operations team and all these things are accessing common shared services that exists on your corporate network so whether that's DNS or directory services logging monitoring security common data sources all that's coming down so our next set of requirements is to fix this situation so how do we centralize network connectivity so we're not being so impactful to our network teams centralize the management security and the common service access give account owners some kind of control and freedom over their own VPC resources but still have centralized security control over that access and now we're moving into many eight of us accounts many VP C's still one region so this model is the hub and spoke with peering so we are going to facilitate this by moving those common shared services up into a hub B PC so this shared services V PC that you see here and then all of the spoke B pcs will be peered through V PC peering so this is a one to one relationship between a spoke and the hub that facilitates private connectivity between V pcs and you'll notice the shared services V PC is the only one that has a consolidated connection down to your corporate network and it's also the only one that has any egress to the public network so the IG W and the V PC endpoint they're all only in the shared services v pcs the spokes are completely isolated so let's zoom in on the Shared Services Hub and one of the spokes to see how it works so here we see the establishment of the peering connection itself so this is a API call where both sides have to agree so you send it to each other and you're both thumbs-up that facilitates the creation of appearing connections that doesn't mean everything can start talking to each other all of a sudden what it does mean is you can now start creating routing rules that reference the 
You'll see here I create a private route table on the spoke that references the shared services subnet through the peering connection. You can be as specific as you want — you don't have to route the entire VPC; you can take this down to a /32, which means your spoke would only be able to talk to a single host over that peering connection. On the other end, we give the shared services VPC access to the entire VPC space of the spoke, so now these two can actually route between each other: full private IP connectivity between the two VPCs. But what about if I want to reach down from the spoke into my corporate network? I can add a route for that — I've done that here, referencing the peering connection, because that's the direction of my corporate network — but when I actually try it, the hub VPC is going to see traffic trying to egress that's not part of its network, and it's going to drop it. That's by design: you can't have a transitive relationship with VPC peering. I can't be peered with something and then reach other things that that VPC is connected to, whether that's external networks, other spoke VPCs, the VGW, the IGW, or the VPC endpoint — all of those are transitive relationships. All I can do with a peering connection is talk to the actual CIDR space of the VPC I'm peered with. But what if we do want to facilitate that kind of connectivity — what are our options? Probably the most common first step is to deploy a centralized proxy. A lot of customers already have centralized proxies for their outbound access, and they'll deploy a centralized proxy layer up in the shared services VPC to provide HTTP connectivity and facilitate this end-to-end communication. So how would that work? Here we have an internal ELB deployed into the shared services VPC, backed by a proxy fleet — and this is not a transparent proxy.
It's not inline — it's an explicit proxy, so you have to actually configure your endpoints, your instances or devices, to use the proxy for HTTP communication. Here you see traffic going across the peering connection to the ELB, hitting a resource in that VPC that can then use the VGW down to your corporate network, and similarly egress out to the internet. This proxy can be a centralized source of URL filtering, determining what content can and cannot go out — a centralized resource for all HTTP traffic — and you'll notice the proxy is the only thing that has routes to all these resources. So let's zoom in and see how this is actually configured. You'll see I'm building up layers, but keep in mind that by default a VPC network is flat — everything can talk to everything — so all these layers I'm representing are formed with security groups, routing policy, and network ACLs. First we've got an internal ELB — all private IPs — and that's the destination for the proxy. Backing it are the actual instances performing the proxy function — whether that's open-source Squid or a third-party vendor product — sitting behind the ELB, and they are the only ones with public IPs and the only ones with a route to the internet gateway. And for the private subnets in the spoke VPCs: you can now use security groups across peers — that actually wasn't possible until this year — so the ELB itself can have a security group referencing the spokes, saying these spoke resources are able to access the ELB, and similarly for accessing public resources and for using the VPC endpoint. And finally, administrators: operations staff back on your corporate network are going to have to land on something.
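Because the proxy is explicit rather than transparent, every spoke instance has to be pointed at it. A minimal sketch of the environment variables most HTTP clients honor — the ELB DNS name and port here are hypothetical:

```python
import os

# Hypothetical internal ELB fronting the proxy fleet in the shared services VPC.
PROXY = "http://internal-proxy-elb.shared.example.internal:3128"

def proxy_env(no_proxy=("169.254.169.254", "localhost")):
    """Build explicit-proxy settings for a spoke instance.

    169.254.169.254 is excluded so instance-metadata calls stay direct
    rather than being sent across the peering connection.
    """
    return {
        "HTTP_PROXY": PROXY,
        "HTTPS_PROXY": PROXY,
        "NO_PROXY": ",".join(no_proxy),
    }

# Apply to the current process; in practice you'd bake this into the AMI,
# user data, or configuration management.
os.environ.update(proxy_env())
```

In practice this configuration lives in whatever provisioning mechanism you already use; the point is simply that "explicit" means each endpoint opts in, unlike an inline transparent proxy.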
In the shared services VPC, that's something like a bastion host — an administrative host they land on, from which they can then jump to spokes over any peered relationship — and they can also use the proxy. So if you want to facilitate, for example, private S3 connectivity through the shared services VPC, you can certainly do that: those devices can be configured to use the proxy in shared services, and out they go. A few to-dos on the shared services hub. Make sure you're using IAM to control things — we talked about these teams owning their own VPC and having their own account, but you've got to make sure they can't alter the design you've put in place: they can't tear down the peering relationship or put an IGW on their VPC. All of the high-blast-radius, network-specific API calls — make sure you're locking those down. Create NetOps roles and apply them to every one of your accounts, so that your network operations team is the only one that can jump in, assume that role, and make network-related VPC changes. There's a link there — we now have managed policies, and there's one for network administrators, which means AWS is responsible for keeping it updated: as we release new network-related features and products, that policy is automatically updated, so take a look at that. We've talked about CloudTrail, Config, and VPC Flow Logs — make use of those to track everything that's going on in all your VPCs. So, a quick pro/con on this one. We're minimizing on-premises network changes by consolidating down to a fixed number of pipes up into the region; we've reduced the latency and the cost of cloud applications, because those shared services have moved up and no longer have to be reached down on your corporate network; and we've given spoke accounts control over their own resources.
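The IAM lockdown just described — spoke owners keep their freedom, but the high-blast-radius network calls are off limits — might be expressed as an explicit deny attached to non-NetOps roles in each spoke account. The action list below is illustrative, not exhaustive; for the allow side of your NetOps role, start from the AWS managed NetworkAdministrator policy mentioned above:

```python
import json

# Illustrative set of high-blast-radius, network-specific API calls to deny
# to everyone except the NetOps role. Extend to taste.
HIGH_BLAST_RADIUS_ACTIONS = [
    "ec2:DeleteVpcPeeringConnection",
    "ec2:CreateInternetGateway",
    "ec2:AttachInternetGateway",
    "ec2:CreateRoute",
    "ec2:DeleteRoute",
]

def deny_network_changes_policy(actions=HIGH_BLAST_RADIUS_ACTIONS):
    """Build an IAM policy document denying design-breaking network calls."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Deny", "Action": list(actions), "Resource": "*"}
        ],
    }

print(json.dumps(deny_network_changes_policy(), indent=2))
```

An explicit deny wins over any allow the account owner grants themselves, which is what makes this safe to stamp out across many accounts.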
But we have centralized control of everything coming out of them, with the proxy and with security groups working across all the peers, so we're securing the traffic. On the con side for hub and spoke: you're managing the cost and operation of the actual proxy — all the instances that comprise it are your responsibility. I mentioned it's not a transparent proxy, so you're having to configure everything to point at it, and it's restricted to layer 7, to HTTP and HTTPS. And we point out the lack of transitive networking, and the peering data transfer cost — there is a cost to use peering, a cent a gigabyte — so if you're doing very significant data transfers between peered VPCs, you'll want to pay attention to that traffic and figure out whether peering makes sense for you. Alright — so here we've got multiple hubs. Maybe you have a dev hub and a prod hub, but it's not limited to that; you might have a data services hub tied in as well, and they can be connected to each other. Peering is very flexible and can also be dynamic — something I don't think is quite apparent at first blush: it's just an API call to establish connectivity between two VPCs. So you might have a deployment use case with a promotion of data from one environment to another, where traditionally those environments wouldn't be connected, but programmatically you could just issue a VPC peering connection, connect them up, transfer the data, and tear down the connection. But now we've got your next hot mobile app in one of these spokes, so we're going to zoom in on this one spoke and the hub. We have this mobile app, and this is what I'm calling hybrid serverless — which I'm trademarking — and I'm seeing this more and more: you have a private VPC, the resources in there have no IGW at all, completely private, but you're using public
AWS resources to provide a front end into the resources in the VPC. For example, Amazon API Gateway provides a landing point for your mobile APIs, with strong authentication and metering, and it also provides access to Lambda functions. Lambda can reach inside your VPC — it can privately access your VPC by creating an elastic network interface that the function then uses; the interface itself has a private IP from your VPC, and the function uses that to access resources in your VPC privately, so there's no public access here. Here we have a backing store of Amazon Aurora deployed into this VPC for the mobile app to use, and that works great. What doesn't work is the new functionality we want to launch, which depends on legacy applications that are still back in your data center. This is very common: customers are advancing their mobile application at a certain pace, releasing new features, but they have dependencies on legacy applications — they want to build these things up in the cloud and still be able to reach back to the legacy apps and make use of the data that's still on the corporate network. How do you do this? We've already talked about no transitive networking, and here we've got all these spokes that now have requirements for end-to-end layer 3 connectivity — no longer just a proxy at layer 7; we want end-to-end connectivity — and we've also got some other regions out there that are growing that we'd like to bring into the fold. So now we move on to VPC mass transit, and here are our requirements: again we're centralizing and minimizing the network connections, but now we want to allow end-to-end routing with minimal operational overhead, and we want to leverage the AWS network — with many AWS accounts, many VPCs, and many regions. This is where we bring in the concept of a transit VPC, and a
transit VPC is dedicated to just that: facilitating connectivity between networks. We keep it as simple as possible — all that goes in here are dedicated EC2-based routing devices that are responsible for maintaining VPN tunnels to a lot of different networks and facilitating the communication between all of them. There are no other resources in the transit VPC; it just has a default route out to an IGW, and nothing else in the VPC depends on these routers. Then we start adding spokes, and the spokes are dead simple — they just point a default route out to their VGW. It kind of turns the model on its head: these EC2-based VPN devices are basically acting as customer gateways, but in this model it's the reverse — they're not really the client, they're the hub. As we add more and more spokes — again, all BGP, all dynamic routing — we're learning all these networks, and these centralized transit routers facilitate end-to-end connectivity between all the spokes. We can do the same for bringing in external networks: we just land the VPN tunnels and connect them directly to these transit routers, and, as I mentioned with the proxy, these become centralized points for traffic control, security, and filtering. If this sounds like something you're interested in, the Solutions Builder team at AWS has actually built a solution for transit VPC, and we're going to talk a little bit about how that works. There's a CloudFormation template that sets all of this up for you, based on the Cisco Cloud Services Router — the CSR, a full-stack virtualized ASR, so all the features of your hardware ASR but with a BYOL or pay-as-you-go license model. You launch the CloudFormation stack for transit VPC, and in the template all you have to do is choose what bandwidth you want: at the low end it's two routers at 500 megabits per second per
router, all the way up to 4.5 gigabits per second per router. It creates that transit VPC for you, deploys the CSRs, and assigns carrier network space — this is not RFC 1918 space, it's shared address space. Then it has a very cool automation to actually start creating your transit connections. The spoke connections, see, are very simple: all we have is the IGW out, and it uses S3 for storing configuration, so we have a route to a VPC endpoint. And how do we do availability for these things? I'm sure you're wondering — these centralized routers are now responsible for all this communication between all these spokes. We use EC2 auto recovery. If you're not familiar with this feature, it's all driven by a status-check alarm called StatusCheckFailed_System. This is a CloudWatch metric that's a summation of a bunch of EC2-related health checks rolled up into one single metric describing the health of the underlying host — its power, its networking, its software, everything about the physical host hosting your instance. What you can do is set an alarm on that metric, and when it fails, trigger a recovery of the instance, which means it will be migrated to new hardware automatically and will retain all the characteristics of that instance — private IPs, the ENIs that are attached, the EBS volumes that were mounted, all the way to the instance ID itself. It is identical. This is supported on all the instance types you see there, EBS-only — it's dependent on EBS storage to facilitate the migration. So how do you add a spoke? This is all driven by tagging. You go to the VGW of the spoke and add a tag that says transit VPC spoke equals true. There's a Lambda function out there running every minute that cycles through all of your accounts and all your regions, checking for this tag on any VPC that does not already have VPN connections; when it finds that tag, it builds the necessary VPN connections.
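Before going on with the tag-driven workflow — the auto-recovery alarm described above can be sketched as the parameters you would pass to CloudWatch's put_metric_alarm. The instance ID and region are placeholders:

```python
# Sketch of the CloudWatch alarm driving EC2 auto recovery for a transit
# router. Thresholds here are illustrative; tune them to taste.
def recovery_alarm_params(instance_id, region="us-east-1"):
    return {
        "AlarmName": f"recover-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 2,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        # The built-in recover action migrates the instance to new hardware,
        # keeping private IPs, ENIs, EBS volumes, and the instance ID.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }

params = recovery_alarm_params("i-0123456789abcdef0")
# With credentials configured, you would then call:
# boto3.client("cloudwatch").put_metric_alarm(**params)
```

Because recovery preserves the ENIs and private IPs, the VPN tunnels and BGP sessions terminate on an identical-looking router after the migration — which is what makes this acceptable HA for a stateless routing tier.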
It creates the customer gateways if it needs to, creates the VPN connections, bundles the configuration up into an XML file, encrypts it, and stores it on S3 — and putting that object onto S3 triggers another Lambda function, the Cisco configurator. That takes the configuration, turns it into something Cisco-friendly, and then uses the same private access Lambda has — it creates an ENI, has permissions to SSH into the transit routers — and pushes that configuration into the CSRs automatically. So within a matter of minutes those networks are connected: the tunnels come up, BGP is shared, everybody's happy. There are also some knobs and dials in there for route influencing: if you want a preferred route between your CSRs, you can do that too. By default they're active/active, which is fine if you're stateless, but if your CSRs are going to do stateful firewalling, or anything that needs to maintain state and can't tolerate asymmetric routing, then you'll want to look into this feature. This, again — representing all the tunnels here — is driven by a tag that you put on the VGW: preferred path equals CSR1 or CSR2, and that will again drive a configuration change to prefer one or the other. And I promise this is my only router-configuration screenshot — I just wanted to show you what it does: it automatically adds AS-path prepending to influence that route. This was from CSR2, the one you didn't want to prefer, so it doubles up on the prepend — which makes it obviously less preferred — and applies that route map to that neighbor, all fully automatic. And the same goes for removing spokes: all you have to do is change that tag — to false, or blank it out, or whatever — and the poller, running every minute and checking the state of that tag, will tear the connection down.
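The per-minute decision that poller makes — create on a true tag with no connections, remove on a cleared tag with existing connections, otherwise do nothing — can be sketched as a small function. The tag key and data shapes below follow the talk's description and are simplified stand-ins for the real solution's API responses:

```python
# Assumed tag key, per the talk; the shipped solution's key may differ.
SPOKE_TAG = "transitvpc:spoke"

def poller_action(vgw_tags, has_vpn_connections):
    """Decide what the transit-VPC poller should do for one VGW.

    vgw_tags: dict of tag key -> value on the spoke's VGW.
    has_vpn_connections: whether this VGW already has transit VPN connections.
    """
    tagged = vgw_tags.get(SPOKE_TAG, "").lower() == "true"
    if tagged and not has_vpn_connections:
        return "create"   # build CGWs + VPN connections, push config via S3
    if not tagged and has_vpn_connections:
        return "remove"   # tear down connections, reconfigure the CSRs
    return "noop"

print(poller_action({"transitvpc:spoke": "true"}, False))  # create
print(poller_action({}, True))                             # remove
```

Everything downstream — the XML on S3, the Cisco configurator, the SSH push — hangs off this one reconciliation step, which is why adding or removing a spoke is just a tag edit.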
It will tear down that connection and push the new configuration with those connections removed — the Cisco configurator Lambda function launches and reconfigures your routers for you automatically. It's fun stuff — I encourage you to check it out. It's really easy to launch the template, get running, and start figuring out whether it's going to work for your multi-VPC configuration. Alright, so here we go — it looks very similar to hub and spoke, but now it's all layer 3. This is all driven by IPsec tunnels into these centralized routers, with BGP driving the dynamic routing between all of the various networks, so you can go everywhere: branch to spoke, spoke to spoke, spoke to the internet — full end to end. So now we're starting to get there. We're bringing that second region into our transit as well; we're representing here that you can run a transit hub in each region, or you can move other regions over and have them as spokes — it's completely up to you how you want to run the transit VPC and where you want your point of consolidation to be. And I'll just point out that we're also running a full mesh between transit hubs in different regions over the AWS network. So, real quick, pros and cons of the transit VPC. End-to-end routing; the central transit routers can perform higher-level functions and are a central point of control for you; the spoke VGWs are dead simple and HA by default. It minimizes on-premises networking changes, because you're not changing anything about your connectivity to the transit VPC — all the really dynamic stuff, launching all these new VPCs, is happening in the transit VPC itself. And it can minimize cost: if you're already doing this, you can offload this function from purchasing new networking gear for a colo and instead move it up into a transit VPC. On the con side,
we see that availability and management of the transit routers is up to you — we talked a little bit about the availability — plus the licensing cost of Cisco. And the cost of data transfer is important to point out, because everything transit-related is going over the IGW, so it's all internet-out: for example, if you're going from transit to spoke, or spoke to transit to home, each one of those legs is an internet-out step, so you need to pay attention to your data transfer costs. So, if everyone's okay — I'm running low on time, I didn't realize how this was going to go — I've got one more section, about five minutes, but I'll try to blow through it. This is bringing Direct Connect into the picture, and more regions. We want to use an existing private network that we have — we have an MPLS network, we don't want to use the internet anymore to connect our regions, we want to take advantage of that existing network — and we want to bring in more regions. So: dedicated, scalable network connectivity that leverages our existing corporate network, predictable latency, and many, many regions, many accounts, many VPCs. Transit VPC with Direct Connect works perfectly fine, and there's actually a cool little feature here with what we're calling a detached VGW — for those of you who are the VPC hackers in the crowd, you'll noodle on this one tonight and figure out what else you can do to play with it. You can use a VGW without it actually being attached to a VPC — I don't know if you've ever tried that, but you can. So if I tag this thing with the transit tag, the transit routers are going to build up tunnels to this VGW. Right now it's one-sided — there's nothing else attached to the VGW — but you can attach a private Direct Connect interface to it. So now on one side you've got the private virtual interface for Direct Connect attached to this VGW, and on the other side you have your VPN tunnels into your CSRs, so all the BGP propagation happens and
is facilitated through this free-floating life raft of network connectivity that isn't attached to any VPC. This scales up to about a gigabit of flow, so if you're doing multiple gigabits you can create more of these; but if you're doing tens of gigabits — 20, 30, 40 gigabits of traffic over Direct Connect — you don't want to do this. You want to go the more traditional route, which is setting up the private Direct Connect interface directly into the VGW attached to the transit VPC and then running tunnels over Direct Connect. You can do this privately, like this — here we're representing the tunnels from your on-premises VPN devices directly into the CSR transit routers — or you can also do it over public; it's up to you, and there are pluses and minuses to both. With public, it's obviously a public network now: you're getting all of AWS public IP space advertised down to you, and you have to secure that on your side — you have to treat it like a public network, so you'll have firewalls and your own security layers on your side. But you can still run the tunnels across the Direct Connect connection — it's private, dedicated bandwidth — and you'll just be terminating those tunnels on the public IPs of the transit routers instead of the private IPs. The advantage of public is that, once in the U.S.,
once you connect at one Direct Connect location with a public Direct Connect interface, you get access to all the AWS regions. So if you were running transit hubs in multiple regions, through this single connection — here we represent being directly connected at the Equinix facility in Chicago into the new US East region — you obviously have access to all your VPCs in that region, but you would also have access to the remote regions, so you'd be able to VPN to them and it would still all be over AWS's private network. You would not be able to do that with a private Direct Connect interface — private Direct Connect virtual interfaces can only be connected to VPCs in the same region you're direct-connected into. I'll skip this — it can be homework for you later; there's a really good deep dive if you want all the details of Direct Connect and VPNs, NET402. And just real quickly: hooking up an existing MPLS IP-VPN network to a region is dead simple. Most of the providers are already in our Direct Connect locations, and the AWS region is already a point of presence on their global cloud, so all it is is adding that new PoP to your existing IP-VPN — calling up your provider and saying, hey, I want to add this to the network. They're facilitating all the routing; you have that relationship with their PE routers in each of your locations, and they just bring that new network onto the VRF representing your cloud, and now you've got all of the CIDR blocks from the AWS region. Same with bringing in multiple locations — here we have Chicago and London. The only thing to watch out for is that as your MPLS cloud gets to a certain size, there are some limits: there is a 100-route max in the VPC route table, so if you have many, many points of presence on your MPLS cloud, you're going to need your provider to either summarize those down into nice, clean, smaller aggregates or advertise a default route to you, because 100 is a hard limit for us in the VPC route table.
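The summarization your provider would do to stay under that 100-route limit is just collapsing adjacent point-of-presence prefixes into covering aggregates, which the Python standard library can illustrate directly. The prefixes here are made up:

```python
import ipaddress

# Illustrative per-PoP prefixes advertised from an MPLS cloud.
pop_prefixes = [
    "10.10.0.0/24", "10.10.1.0/24", "10.10.2.0/24", "10.10.3.0/24",
    "10.20.0.0/24", "10.20.1.0/24",
]

def summarize(prefixes):
    """Collapse contiguous prefixes into the fewest covering aggregates."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]

# Six routes become two aggregates: 10.10.0.0/22 and 10.20.0.0/23.
print(summarize(pop_prefixes))
```

The alternative mentioned above — a single default route from the provider — is the degenerate case of the same idea: one route covering everything, at the cost of losing per-PoP specificity.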
You're also obviously going to see our ASN coming in from different locations, so you have to do things like BGP AS override to make sure there aren't any BGP loops when you're bringing in multiple regions. Alright — so, pros of using Direct Connect with transit VPC: private network, no internet dependencies, predictable latencies because you have that dedicated bandwidth, those dedicated links into the regions, and access to the public networks of all US regions over a single US-based Direct Connect connection. On the negative: you're getting all of our public IP space, and you might not want all of that, so you might have to do some filtering on what you accept from us as far as public network announcements; the 100-route limit we just talked about; and obviously the cost of a provider network and of all these Direct Connect connections. Alright — so we finally got to the picture. Thank you very much for your time — I ran a little bit over. Please give me feedback. Thank you.
Info
Channel: Amazon Web Services
Views: 20,762
Keywords: AWS, Amazon Web Services, Cloud, cloud computing, AWS Cloud, AWS re:Invent 2016, aws reinvent, reinvent2016, aws, cloud, amazon web services, aws cloud, re:Invent, ARC302, Rob Alexander, Expert (400 Level), Architecture
Id: 3Gv47NASmU4
Length: 66min 5sec (3965 seconds)
Published: Thu Dec 01 2016