AWS re:Invent 2019: Optimizing for performance in CloudFront: Every millisecond counts! (NET309-R1)

Captions
So in today's session we're going to be talking about optimizing for performance in Amazon CloudFront: every millisecond counts. I'm Tino Tran, a principal solutions architect with AWS specializing in edge services. Joining us on stage in a little bit will be Karthik Uthaman, a senior software development engineer from our CloudFront engineering team, as well as a very special guest, Chris O'Brien, a senior engineering manager from one of our customers, Tinder. In today's session we're going to give you an overview of the CloudFront global content delivery network, talk about some of the workloads we see frequently from our customers, and then bring Chris on stage to share Tinder's experience and some of the results they were able to achieve by using the service. After that we're going to pull back the covers a little bit and take a look under the hood at some of the optimizations we make and some of the things we think about as we deploy our network. So why don't we hop into it.

Okay, so as I said before, CloudFront is our global content delivery network, and this slide gives you an eyeball view of our global presence. The little dots you see are our points of presence, or edge locations, and our AWS regions. We're growing at a very rapid rate: as of last week we had 210 points of presence around the world, with our first one in Rome launching just last week. Every time I present this slide I have to make sure it's up to date, because we're always rolling out new edge locations. Alongside these 210 points of presence we also have what we call regional edge caches, a mid-tier cache that sits between your origin servers and the edge locations around the world. The purpose of these regional edge caches is to reduce load on your origin but also provide a better cache hit rate for your applications. This is baked into the network and you get it as part of the service; there's no additional charge for it. The next thing you'll see on this graphic are the lines: between the edge locations and our AWS regions we have a global backbone network that's privately managed by AWS. Later in the talk I'll go into why this is significant and some of the things we think about as we deploy this backbone. We're growing at a very aggressive rate, 50% year over year, and that's a trajectory we're hoping to continue in upcoming years, so next year when we come back this slide will look drastically different. And because we're deploying our own infrastructure and managing our own network, we have the opportunity to do optimizations at both the network and server layers. We're deploying our own hardware, and we'll talk more about that later on.

Okay, so this session is called Optimizing for Performance in Amazon CloudFront, but let's take a step back a little bit and set some context: why does performance matter, and what are we hearing from our customers? The first example I have here is a web page that's taking a long time to load. We know that on average a web page visitor will wait about two seconds for a page to load before they move on, and if your business depends on that website, that's not good. There's also a ripple effect: if your page takes too long to load, the crawlers that search engines use to index your website will spend a certain amount of time on the page before moving on. If it takes too long to load, they're able to index less, which reduces your chances of showing up in search results. Also not good for business.
The last example, which we see more and more every day, is video streaming workloads. As the world moves toward over-the-top delivery platforms for live and on-demand video, we're hearing more from our customers that they want the capacity and scale required to support these workloads. They want their end users to have a good experience: no buffering, no playback issues. This is a big deal for these customers, and being a customer-obsessed company, these are also our motivations in providing a good service. For me there's a little bit of a cherry on top: I have a very demanding customer at home, my four-year-old daughter. Every time her video stream starts to buffer she knocks on my door and says, "Daddy, come play with me." Usually it's not a big deal, but when you work from home like I do, sometimes that's also not good for business.

All right, so I talked about a few of the workloads, but we really have customers using us for all kinds of workloads, from API acceleration to large file downloads like software updates and game patches, and of course static delivery such as reports, images, and files. We also have a very cool feature built into the service called Lambda@Edge. Lambda@Edge is the ability to invoke Lambda functions on the request and response path of a CloudFront distribution, and in doing so you can do things like render pages right there at the edge. You're invoking Lambda functions that run Node.js or Python code; it's literally a full programming language at the edge.
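To make that concrete, here is a minimal sketch of the kind of viewer-request Lambda@Edge function the speaker is describing, written in Python. The event shape is the standard CloudFront Lambda@Edge event; the "/hello" path and the HTML body are made-up examples, not anything from the talk.

```python
# Minimal Lambda@Edge sketch: generate a response right at the edge for one
# path, and pass every other request along the normal CloudFront path.
# The "/hello" URI and the HTML body are placeholder examples.

def handler(event, context):
    request = event["Records"][0]["cf"]["request"]

    # Render a tiny page at the edge instead of going to the origin.
    if request["uri"] == "/hello":
        return {
            "status": "200",
            "statusDescription": "OK",
            "headers": {
                "content-type": [{"key": "Content-Type", "value": "text/html"}]
            },
            "body": "<h1>Rendered at the edge</h1>",
        }

    # Returning the request object forwards it to the cache / origin as usual.
    return request
```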
Here are some of our customers; this is just an eyeball view of some of the customers that have implemented the service successfully. I'm sure if you did a dig on some of the sites you're surfing on the internet you'd find more, but this gives you a sense. We have customers in every industry using the service. For example, in media we have Prime Video, Hulu, MLBAM, and Sky News. Prime Video actually uses us to stream Thursday Night Football in over 200 countries around the world, and Sky News recently used us to stream the royal wedding to 23 million viewers. In financial services we have customers like Intuit who rely on us for their content delivery. Slack also uses us; their messaging service integrates with a number of communication platforms around the world, and they rely on us for API acceleration. These are just some examples, and with that being said I'm going to bring Chris on stage to share Tinder's experience implementing the service and their journey.

Thanks, Tino. A match made in the cloud; hopefully that's the last of the puns, with no guarantees. I'm a senior engineering manager for Tinder. I work on the cloud infrastructure team, and we're responsible for making sure our customers have the best possible experience on the platform. Many of you probably know what Tinder is, but for those of you that don't, Tinder is the world's most popular app for meeting new people, with billions of matches to date. Our size and scale mean greater choice and access to a diverse set of matches. And if you've never used Tinder, or you know what it is but have never used it, let me walk you through some of the steps.

The first thing is to create a profile. You want to present your best self: enter a short bio, set your location, specify your discovery settings for the potential matches you might be interested in meeting, and then upload photos. Let's see it in action. When you first log in you'll be presented with a series of potential matches. You can see photos, you can choose to dive a little deeper and read the person's profile and look at some of their other photos, and then choose to swipe left if you're not interested or swipe right if you are. If both of you swipe right, it's a match, and you can get to chatting; you can share some laughs and maybe even set up a date.

So while it looks like a fairly simple, intuitive app, it's actually very complex under the hood. There are multiple API calls happening at any given time for each of the functions I just described, and all of them happen over HTTPS, which means there's a TLS handshake for every single call. The further the customer is from our infrastructure, the higher the latency they may encounter. As I've alluded to, many of our customers, especially those far from where our infrastructure is hosted in us-east-1, could encounter latency issues. To set a little more context around how that comes into play, let me give you a high-level overview of our technical architecture. We have a number of different clients, iOS, Android, and web, all connecting over the internet to AWS us-east-1, where our back-end infrastructure is hosted. As I mentioned, every one of these API calls needs to establish a TLS handshake because it happens over HTTPS. Some of you may remember this from school or from studying for job interviews: there's a lot that goes into a TLS handshake, a lot of information that needs to be exchanged. So we thought, and it turned out to be correct, that this handshake might be creating an excessive amount of latency for our customers around the world, and we started to measure that performance from our clients. These numbers are reported from our iOS clients as an average in each country: India is over 700 milliseconds just to establish the handshake, Germany is 470 milliseconds, and even the US average is 210 milliseconds. With numbers that big, the natural question is how we might reduce this latency.
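The talk doesn't show how these client-side numbers were collected; here is a small illustrative sketch of one way to time the TCP connect and TLS handshake separately from a client, which is the kind of measurement being described. The hostname is a placeholder.

```python
# Rough client-side sketch: time the TCP connect and the TLS handshake
# separately, the way you might when profiling per-call handshake cost.
# "api.example.com" is a placeholder hostname, not Tinder's real endpoint.
import socket
import ssl
import time

def handshake_times_ms(host, port=443):
    ctx = ssl.create_default_context()
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=10) as tcp:
        tcp_done = time.monotonic()                      # TCP three-way handshake finished
        with ctx.wrap_socket(tcp, server_hostname=host):
            tls_done = time.monotonic()                  # TLS handshake finished
    return (tcp_done - start) * 1000.0, (tls_done - tcp_done) * 1000.0

tcp_ms, tls_ms = handshake_times_ms("api.example.com")
print(f"TCP connect: {tcp_ms:.0f} ms, TLS handshake: {tls_ms:.0f} ms")
```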
One thought we had was, like Tino talked about, that we could leverage Amazon CloudFront's global presence. They have all these points of presence throughout the globe; our infrastructure might be hosted in us-east-1, but CloudFront is much closer to the customer. Of course, CloudFront is a CDN, and we have dynamic content that isn't cacheable, so the question we asked was: isn't CloudFront a CDN, isn't that just for static content? What we didn't realize at the time, but realize today, is that there's actually a proxy mode. Proxy mode gives us a number of benefits: we can terminate the TLS handshake geographically closer to the end user, at the POP; we can reuse connections from the POP to our origin; and we can use the optimized AWS global network to communicate from the POP to the origin instead of going over the slower public internet.

With that theory in mind, it came down to implementing it. Taking a quick step back, here's another way to look at our technical architecture at a very high level: we have all these clients, iOS, Android, and web, connecting over the internet to an Elastic Load Balancer hosted in us-east-1, and behind that load balancer are a number of instances hosting the Tinder application. What we wanted to do was put CloudFront in the middle. The clients would still be connecting over the relatively slow internet, but they'd be connecting to a CloudFront location geographically closer to them, where the connection would be terminated, rather than having to traverse the internet all the way back to Virginia.

Obviously, having as many customers as Tinder does, things could potentially go wrong: maybe our theory was not correct, maybe we had a misconfiguration. So we wanted to choose a test location, a test country. We wanted enough active users to get a conclusive result, a balanced distribution of iOS, Android, and web clients, and of course the location should be as far as possible from us-east-1 for the greatest improvement; it doesn't do us much good to run the test in the United States. To break the suspense and spoil it, the country we selected was Indonesia. But as I mentioned, there's always the risk that things go wrong: the theory could be wrong, or we could make an error in configuration. So how could we route only a percentage of the traffic in Indonesia to Amazon CloudFront, and how could we roll back if CloudFront or the configuration had any errors? The way we did this was by leveraging Amazon Route 53 traffic policies. Traffic policies allow us to layer a number of different record types to accomplish exactly what I just described: geography-based records to specify that the policy only applies to Indonesia, weight-based records to specify that only a percentage of customers, in this case 25%, will be routed to CloudFront, and a failover policy to respond quickly in real time if there are any problems.

During the Indonesia test we verified and tested the configuration in our staging environments, then routed partial traffic, the 25% of Indonesia I mentioned, to CloudFront. Of course, things don't always go smoothly; we thought we had anticipated every possible issue, but we did not, and we encountered a few errors on our web clients. With the traffic policies I just described, though, we were able to roll back easily and ultimately complete the deployment in 20 days.
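The talk doesn't show the actual Route 53 traffic policy document, so here is a small simulation of the decision logic that layered policy expresses: geolocation first, then a weighted split, then failover. The 25% figure and the Indonesia scoping come from the talk; the endpoint names are placeholders, and this is not the Route 53 traffic-policy document syntax itself.

```python
# Illustrative simulation of the routing decision a layered Route 53 traffic
# policy expresses (geolocation -> weighted split -> failover).  This is the
# logic only, not the actual Route 53 policy document format; endpoint names
# are placeholders and the 25% split is the figure mentioned in the talk.
import random

CLOUDFRONT = "dxxxxxxxxxxxx.cloudfront.net"                 # placeholder distribution
ORIGIN_ELB = "tinder-origin.us-east-1.elb.amazonaws.com"    # placeholder load balancer

def resolve(client_country, cloudfront_healthy=True, cloudfront_weight=0.25):
    # Geolocation rule: only Indonesia participates in the experiment.
    if client_country != "ID":
        return ORIGIN_ELB
    # Failover rule: fall back to the origin if CloudFront health checks fail.
    if not cloudfront_healthy:
        return ORIGIN_ELB
    # Weighted rule: send roughly 25% of Indonesian resolvers to CloudFront.
    return CLOUDFRONT if random.random() < cloudfront_weight else ORIGIN_ELB

print(resolve("ID"))   # ~25% of the time: the CloudFront distribution
print(resolve("DE"))   # always the origin ELB
```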
So this is the result, as measured by our clients, on average. Without Amazon CloudFront, the TLS handshake was being established in 550 milliseconds; with Amazon CloudFront it was being established in 70 milliseconds, which is a nearly 90 percent reduction. That obviously makes the cloud infrastructure team and Tinder engineering feel really good, but what does it mean for the customers? It means that app login, loading of profiles, and image and video uploads are all 30 to 45 percent faster, and other APIs saw a performance improvement of 40 to 60 percent depending on the payload size.

With those results looking very promising, we wanted to deploy it globally. We went country by country, again leveraging the Amazon Route 53 traffic policies, and ultimately rolled out to all the countries around the world. This is a snapshot of the handshake latency before CloudFront, which I showed at the beginning, and this is what it is today. In the case of India, which was 750 milliseconds, it's now about 50 milliseconds, and the US, which was 210 milliseconds, is now 60 milliseconds. Again, Tinder engineering feels really good about these results, but what's the actual impact on our customer base? I think the biggest takeaway is that it means more active users than before. As I mentioned, Tinder's size and scale mean greater choice and access to more potential matches, and naturally, with more customers on the platform, that means potentially more matches for new, returning, and existing customers alike. Users that were going away because of a slower experience returned to the application, and of course the overall experience on the application improved: profiles are loading 20 percent faster, and we're also seeing increased activity on the application. Image uploads grew by 15 percent, total swipes, left and right, increased by 3 percent, and overall browsing on the app is faster.

Just to recap: Amazon CloudFront serves our dynamic workloads, and we're experimenting with using CloudFront for our static content as well. In fact, our episodic series called Swipe Night, which some of you may have heard about and which ran over four weekends in October, is an interactive series whose content was hosted on Amazon CloudFront, and we're going to continue to expand that throughout 2020. Ultimately, implementing Amazon CloudFront was simple and fast, and our API performance improved by 30 to 45 percent. With that being said, I'm going to bring Tino back up on stage, and he can talk about some of the improvements under the hood that made this possible.

Thanks, Chris. Awesome. So now that we've shared some of the results you can achieve by using our service, let's take a look under the hood, at our global network, and in this part of the talk particularly at our backbone network. Earlier I showed you a slide with the eyeball view of our global footprint, and I mentioned that between those edge locations and our AWS regions we have a global backbone network that's privately managed by AWS. This backbone network is very significant to our service and gives us a lot of ability to provide our customers with consistent, reliable performance.
We're going to dive into the details, but before we do, I'll share how I like to think about it, because I'm more of a visual person. As a solutions architect, part of my job is to fly to different cities around the world and meet with our customers to help them design and architect their solutions. One of the challenges I face when I arrive in these cities is figuring out how I'm going to get to their offices. I have a number of options. I could rent a car and take the side streets; if I'm lucky I'll hit all the green lights and get there on time, but usually I'll hit red lights and be late. In some cases I can take the highway, which is usually fast, but if I hit it at the wrong time of day there's rush hour or a traffic accident, and again I'll be late. Now, some of these cities have very sophisticated metro or subway systems. You can literally get off the airplane, walk down to the subway station, and hop on a train to get where you need to go. It usually works, it's on time most of the time, and even if the one train you're trying to use is down, there's usually a secondary line that goes to the same destination. With that, I can reliably get where I need to go without all these outside factors. To me, the backbone network is like having a sophisticated metro system: it allows us to provide that reliable performance to our customers. Let's take a deeper look at some of those benefits.

I mentioned reliable performance, but the first point I want to talk about is availability. Having a backbone network allows us to provide better availability in terms of the network path, and by that I mean path diversity: we want to ensure we have multiple paths to the same destination once you're on our network. By operating traffic on our own network, we can do things like understand the parts of the world where fiber cuts are more likely to happen, in which case we can lay down more fiber and ensure we have truly redundant paths. On top of that, we're also looking at capacity and scale. Operating traffic on our own network allows us to scale based on our understanding of our customers' workloads, as opposed to relying on transit providers, who typically scale based on traffic patterns they've seen historically; if a customer needed to drop a lot of traffic onto one of those networks, it's not guaranteed to work, because you'd be dependent on their capacity. The next thing is performance. We're always monitoring and optimizing the paths from one point to another, and not just the primary path: we know things break all the time, fiber cuts happen, and we want to ensure reliable performance, so we're also measuring and monitoring our secondary paths. The third thing is proximity. Having a global backbone network allows us to be closer to our customers. In certain parts of the world you're limited in the number of options you have for connecting to networks, and having a backbone opens up our options: we can do our own traffic engineering to route ingress and egress traffic through different cities, which helps us overcome things like congestion at certain internet exchanges or even, sometimes, peering disputes.
The last point is security. By having this global reach, we're reducing the number of networks between your viewers and AWS infrastructure, and at the on-ramp to this global backbone network, at our AWS POPs, we're doing things like DDoS detection, looking for bad signatures, and filtering out malformed requests. Once traffic is on the backbone network, it's on infrastructure that we know and manage, where AWS takes responsibility for its security. Also worth mentioning: this backbone network is used for cross-region traffic, so any region-to-region traffic rides only on the backbone, with the exception of China.

Okay, so I talked about the backbone network and the infrastructure on the backhaul, but how do we actually connect to the internet? The first way AWS connects to the internet is through our regional transit centers, which sit within the AWS regions themselves; networks can come and peer with us directly there. But we also have our points of presence, our edge POPs. These are the same edge locations where we deploy CloudFront and other edge services like Route 53. When we're deploying these points of presence, what we're thinking about is that we want to be where the eyeballs are, where all the viewers are; the closer we are, the better. In the cities where we deploy POPs, what you'll typically find are interconnect facilities, where networks come and peer directly with one another. Within the same facilities there are also internet exchanges, where providers offer network switches you can plug into and quickly access a large number of networks. We actually leverage both: we'll start with the internet exchanges, and as we learn more about the capacity and scale we need, we'll peer with those networks directly. Doing so helps us scale network connectivity for AWS and Amazon as a whole, and CloudFront in particular benefits from all of this connectivity. It also allows us to optimize for cost: when you use the CloudFront service, you're not paying for data transfer from an AWS region to the CloudFront edge location; you're paying for the data transfer out from the edge location to the public internet, which tends to be a fraction of what it would cost going directly from the region. If you go to a website like PeeringDB, a public database of this network connectivity information, you'll see we're in over 100 facilities and connected to approximately 170 internet exchanges. That's all on the public internet; we also have private peering, which is not disclosed there.
Okay, so that's the global infrastructure, that's the backbone network. In this part of the talk I'm going to go over the last-mile connectivity: what are some of the optimizations we do from the viewer to our edge locations and back? The first thing I'll share is our intelligent routing. The first thing that needs to happen when somebody makes an HTTP request is that you need to tell them exactly where to go, and the way that happens is a DNS query. If I want to go to tinder.com, I type that into my browser, and behind the scenes the browser makes a DNS query to the local ISP resolver. In most cases, if that resolver doesn't have the answer, it does a recursive query to the Route 53 servers. In the case of CloudFront, Route 53 will recognize that this is a CloudFront domain and hand the query off to the CloudFront DNS servers that sit in the same location. From there, CloudFront can dynamically figure out the most optimal POP to send that viewer to.

When it does that, it's looking at a number of factors, because we've learned that it's not just about sending your viewers to the nearest edge location; that's not guaranteed to give them the best performance. We're looking at a number of variables. We're looking at performance: we're actually measuring the round-trip time of a TCP handshake from viewer networks around the world to our different edge locations, and we take about a billion of these samples a day. That lets us understand what connectivity looks like at that point in time; we can see if there's network congestion and decide accordingly. The next thing we look at is POP health: we want to make sure we don't send a viewer to an edge location that might be out of service due to a hardware refresh, because we're always upgrading our servers, or to a location that is almost overwhelmed. We're also looking at server capacity: within those edge locations we want enough CPU, I/O, and memory, so we have the processing power to handle the request with the least amount of latency possible. And the last thing we look at is network connectivity: we monitor our links to these different networks and make sure we're not sending traffic to edge locations that might be close to flooding their links. With all that information we can provide an answer in real time to the client, and the client can then go ahead and make a connection.
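To make the idea concrete, here is an illustrative sketch of how those four factors could be combined when choosing a POP. This is not CloudFront's actual routing code; the thresholds, field names, and POP names are made up for the example.

```python
# Illustrative sketch only (not CloudFront's actual algorithm): filter POPs
# on health, server capacity, and link utilization, then pick the lowest
# measured round-trip time from the viewer's network.
from dataclasses import dataclass

@dataclass
class Pop:
    name: str
    healthy: bool       # POP health (e.g. not mid hardware refresh)
    cpu_util: float     # server capacity headroom
    link_util: float    # utilization of links toward the viewer's network
    rtt_ms: float       # measured TCP-handshake RTT from the viewer's network

def pick_pop(pops):
    candidates = [
        p for p in pops
        if p.healthy and p.cpu_util < 0.85 and p.link_util < 0.90
    ]
    if not candidates:
        candidates = [p for p in pops if p.healthy]   # degrade gracefully
    return min(candidates, key=lambda p: p.rtt_ms)

pops = [
    Pop("SIN2", True, 0.60, 0.40, 18.0),
    Pop("SIN3", True, 0.92, 0.30, 17.0),   # nominally closest, but short on CPU
    Pop("KUL1", True, 0.50, 0.50, 29.0),
]
print(pick_pop(pops).name)   # SIN2
```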
All right, so that was our intelligent routing. The next thing I'm going to talk about is TCP congestion control. Performance is super important, and one of its factors is throughput. This is the same TCP you might have learned about in college. Being a CDN, we want to send data back to the viewers as quickly as possible and achieve the most optimal throughput on that last-mile connection. Ideally, the latency for the round trip of sending a packet from one point to another and getting it back would just be the length of the pipe. However, if you look at this diagram, some of the pipes are of different widths, because some of the routers along the way have different bandwidth limits. If you send too much traffic, some packets might get stuck in a buffer queue at one of the points along the way, and even worse, if you send far too much traffic, you might hit a congestion event like packet loss. TCP has been around a long time, and there are many congestion control algorithms that can be used to determine how much data can be sent over a connection. Most of them rely on a packet loss event to decide whether to scale down the congestion window or keep ramping it up to achieve better throughput; a congestion control algorithm like CUBIC, for instance, will scale down on loss and gradually ramp back up.

At Amazon we're always experimenting with these congestion control algorithms, and earlier this year we deployed a new one called BBR, which was originally created by Google. What BBR does is manage the congestion window based on actual congestion rather than packet loss. It does that by measuring the round-trip time and the bandwidth limit along the path: it's looking for the bottleneck bandwidth, and the round-trip time is ideally the length of the pipe, so it probes the network to identify what the optimal round-trip time should be. It's also probing for changes, because traffic engineering happens and paths change, so it periodically sends bursts of packets to detect whether the path changed and the bandwidth limit decreased or increased. By doing this it can rapidly adjust the congestion window, and its goal is to saturate the bottleneck: the router along the path with the lowest bandwidth. We want to send as many packets as possible, right up to that limit. We've seen some pretty good results since we rolled this out. One of our customers, Tutela, a service that monitors mobile networks around the world, actually reached out to us and asked whether we'd made any changes, because all of a sudden they saw a jump in throughput on CloudFront endpoints. The answer was yes, we rolled out BBR. A very simple change, and we saw about 5 to 20 percent latency improvement, varying by POP and region, with most of the benefit coming from networks that have a lot of packet loss, like mobile networks.
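As a heavily simplified sketch of the core BBR idea just described, the model below keeps estimates of the path's minimum RTT and maximum delivery rate and sizes the sending behaviour from the bandwidth-delay product, rather than reacting to packet loss. This is an illustration of the principle, not the real BBR implementation; the gain values and sample numbers are made up.

```python
# Heavily simplified sketch of the BBR model (not the real algorithm):
# estimate the propagation delay (min RTT) and the bottleneck bandwidth
# (max delivery rate), then derive pacing rate and congestion window from
# the bandwidth-delay product.  Units and gains are illustrative.

def bbr_model(samples, cwnd_gain=2.0):
    """samples: list of (rtt_seconds, bytes_delivered, interval_seconds)."""
    min_rtt = min(rtt for rtt, _, _ in samples)                   # propagation delay estimate
    btl_bw = max(delivered / interval                             # bottleneck bandwidth estimate
                 for _, delivered, interval in samples)
    bdp = btl_bw * min_rtt                                        # bytes in flight that just fill the pipe
    return {
        "pacing_rate_Bps": btl_bw,       # send at the bottleneck rate, no faster
        "cwnd_bytes": cwnd_gain * bdp,   # allow a little headroom above the BDP
    }

# Example: a ~40 ms path delivering ~1.2 MB per 100 ms measurement interval.
print(bbr_model([(0.040, 1_200_000, 0.100), (0.042, 1_150_000, 0.100)]))
```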
Okay, so the next thing I'm going to talk about is TLS. As more and more traffic comes onto the internet, more and more sensitive data is being shared, and one of the standard ways of protecting that data in transit is TLS. For customers, doing this termination yourself can be challenging, because TLS keeps growing; we know it's doubled over the past three years, and we now see it on close to one hundred percent of our traffic. It also adds latency, and when you're a CDN whose job is to reduce latency, that extra handshake is not a great thing. And the protocol versions keep changing, because security is evolving and new versions keep coming out. Managing all of this is not something customers really want to do themselves, so they tend to offload it to a CDN like CloudFront.

One of the things we're actively rolling out is a library called s2n. s2n is an open-source TLS library from AWS, and it lets us scope the library down to the focus of our use case, which is TLS. In comparison to something like OpenSSL, which is on the order of five hundred thousand lines of code, we now only have to review about six thousand lines of code with s2n. That allows us to react faster to security patches and deploy those kinds of changes, and to review the code with more scrutiny. In deploying it we're already seeing less event blocking in our termination servers, which gives us more capacity at our edge locations.

Now, I mentioned that TLS comes at a cost: it's an extra handshake that happens after the TCP handshake, and it's about two round trips. In the first round trip you typically get the server certificate, you authenticate the server, and you negotiate which cryptographic algorithms you're going to use to encrypt the subsequent messages. The second round trip is the exchange of keys, which enables features like perfect forward secrecy. So there's value in it, but it comes at the cost of two round trips, and as you saw from Chris's slides, there's a lot happening there; depending on how far your users are from the edge locations, it affects your latency. But TLS also has a feature called session resumption, and this is something we implement in CloudFront. With session resumption, we provide the client with a session ticket that only the server can interpret; it's a bundle of cryptographic information about the security profile of the original connection. On subsequent connections, when a client goes to resume a TLS session, it can send its Client Hello with this session ticket, and the server can validate it and re-establish the connection with the same security profile as the original connection. That way we shave one whole round trip off subsequent TLS handshakes.
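Here is a minimal client-side sketch of session resumption using Python's ssl module, to show the mechanism from the client's point of view. The hostname is a placeholder; note that with TLS 1.3 the session ticket arrives after the handshake, so the cached session may only be populated after some data has been exchanged.

```python
# Minimal client-side sketch of TLS session resumption with Python's ssl
# module.  "example.com" is a placeholder host.  With TLS 1.3 the ticket is
# delivered after the handshake, so you may need to read data before the
# cached session is available.
import socket
import ssl

ctx = ssl.create_default_context()

def connect(host, session=None):
    sock = socket.create_connection((host, 443))
    return ctx.wrap_socket(sock, server_hostname=host, session=session)

first = connect("example.com")        # full handshake: two round trips
ticket = first.session                # cached session / ticket from the server
first.close()

resumed = connect("example.com", session=ticket)   # abbreviated handshake
print(resumed.session_reused)         # True when the server accepted the ticket
```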
All right, with that being said, I'm going to bring Karthik on stage to cover some of the server-side optimizations.

Thank you, Tino. I'm Karthik. When I'm not giving talks about server optimizations, I'm at my desk trying to make our web servers better and faster; that's what makes me an engineer. Tino talked about the network challenges we face and some of the optimizations we do on the network side, and because of those, we know the request ended up on the best possible POP for your customers. In my part, I'll talk about the different optimizations we do on the server side and toward the origin while we fetch the content. Specifically, we'll talk about how we optimize dynamic content and media workloads, and lastly about what we do to reduce the load on your origins.

There's an internet quote I read a while ago that there are only two hard things in computer science: cache invalidation and naming things. I totally agree on the naming things part, but when it comes to caching, it's not just invalidation that's hard; everything about caching is hard. As Tino and Chris mentioned, latency is super important: bad latency leads to a bad customer experience. To make latency better, we have to cache the content in our servers. When we serve the content right out of our cache servers, that's what we call a cache hit, and when the content is not in our cache servers and we have to go all the way to your origin server to fetch it, that's a cache miss. So to provide the best possible experience, all we'd have to do is cache everything at our edge locations; but as you can imagine, the entirety of the internet is so big that it's impossible to cache everything at the edge. The first challenge, then, is how we effectively utilize our cache space.

To understand that, we have to look at the edge location architecture. As you can see in the diagram, each POP has an array of physical servers, and each of these servers has three layers of proxies inside, which we call L1, L2, and L3. L1 acts as a load balancer; it also has a small cache that we call the hot object cache, kept exclusively for the really popular content. L2 is the cache layer; the sole purpose of this layer is to read from and write to the cache. L3 does not have any cache space; it maintains a connection pool toward the origin, and we'll see the benefits of those connection pools in a moment. When a request lands on an L1 host, it looks at the distribution configuration to make sure the request actually came for a valid customer, and it decides whether or not to satisfy that request. Once it decides to satisfy the request, it does the TLS termination, and then it immediately looks in the hot object cache. If the content is not in the hot object cache, it has to send the request to an L2 server, which is where the majority of our cache lies. At this point it could choose any random L2 server, but if you do that you're unnecessarily replicating content and your chance of getting a cache hit drops drastically. To solve that, we use a consistent hashing algorithm to select peers, and we use the URL as the key for that consistent hashing. As a result, if the same object lands on any of the L1 servers, they all end up picking the same L2, so we're optimizing for cache hit rate and reducing unnecessary redundant data in our cache servers. For the sake of simplicity, let's assume the content is not in the L2 server either, and the request has to be forwarded to L3. We know L3 doesn't have cache space; here we have to optimize for connection pooling. So we do the same consistent-hash peer selection, but in this case we don't use the URL as the key, we use the origin domain as the key. We send all the requests for a given origin domain to a single L3, and that way we get a good hit rate on the connection pool. The L3 then fetches the content from the origin, and we save it in our cache servers. That's how a cacheable request flows through our system.
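Below is a small consistent-hashing sketch showing the two ways of keying that were just described: URL as the key for L1-to-L2 peer selection, and origin domain as the key for L2-to-L3 selection. It is an illustration of the technique, not CloudFront's actual implementation; the node names and URLs are placeholders.

```python
# Small consistent-hashing sketch (not CloudFront's actual implementation).
# The same key always maps to the same peer, and adding or removing a peer
# only remaps a small slice of the key space.
import hashlib
from bisect import bisect

class HashRing:
    def __init__(self, nodes, vnodes=100):
        points = []
        for node in nodes:
            for v in range(vnodes):                 # virtual nodes smooth the distribution
                h = int(hashlib.md5(f"{node}#{v}".encode()).hexdigest(), 16)
                points.append((h, node))
        points.sort()
        self.hashes = [h for h, _ in points]
        self.nodes = [n for _, n in points]

    def node_for(self, key):
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.nodes[bisect(self.hashes, h) % len(self.nodes)]

l2_ring = HashRing(["l2-a", "l2-b", "l2-c", "l2-d"])
l3_ring = HashRing(["l3-a", "l3-b"])

# L1 -> L2: key on the full URL, so every L1 picks the same L2 for an object.
print(l2_ring.node_for("https://example.com/videos/segment-0001.ts"))
# L2 -> L3: key on the origin domain, so one L3 pools connections per origin.
print(l3_ring.node_for("origin.example.com"))
```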
What about dynamic content? How did CloudFront help Tinder, and how did we make Tinder's use case much faster? There are a couple of things. The first one is the TLS termination at the edge, which is very important. Apart from that, when the request lands on the L1 server we look at some of the distribution properties, for example the TTLs you've set in the distribution and the different types of headers you want forwarded to your origin. These properties tell us whether the response we'll get from your origin server is cacheable or not, and using that we decide whether it's a dynamic or a cacheable request. If it ends up being a dynamic request, we skip all the cache layers and go directly to the L3 layer. You might wonder why we still go through the L3 layer; why not go to the origin directly from L1? That's because the L3 layer provides two functionalities that are really crucial for better performance: reducing latency and improving throughput.

So why do persistent connections give you these performance benefits? To understand the latency part, let's look at the same three-way handshake we studied in school. We know that to open a TCP connection you do the three-way handshake, sending a SYN, a SYN-ACK, and an ACK, which takes about one and a half round trips; after that you send the HTTP request on top of the open TCP connection and get the response back. So the whole transaction takes about two round trips. But if you leave the connection open and send the subsequent request to the same origin on the same connection, you get the response back in a single round trip. Wherever we've done this, we've effectively reduced latency from two RTTs to a single RTT; that's a 50 percent improvement right there. The second part is throughput. To understand why we get better throughput on open, persistent connections, we have to look a little at how the TCP congestion window works; Tino already talked about this. Every TCP connection has a congestion window. The window starts at a really small value and grows toward the maximum value the receiver says it can handle, and you only get the best throughput once the congestion window has grown to that maximum. That takes a while, because of the way TCP works: there's the slow-start phase, then the congestion-avoidance phase, and if there's packet loss in between, the congestion window drops back down. It's probably not going to reach maximum capacity within the first request; it might take a couple of requests. So reusing the connection again and again gives you better throughput over time, and when you combine this with the managed AWS backbone network, which has very few packet losses, you get the best possible throughput.
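As a quick illustration of the connection-reuse point, the sketch below sends two requests on one keep-alive connection: the first pays the TCP and TLS setup cost, the second does not. The hostname is a placeholder.

```python
# Quick sketch of why connection reuse matters: the first request on a
# connection pays for TCP and TLS setup, later requests on the same
# connection do not.  "example.com" is a placeholder host.
import http.client
import time

conn = http.client.HTTPSConnection("example.com")   # one keep-alive connection

def timed_get(path="/"):
    start = time.monotonic()
    conn.request("GET", path)
    resp = conn.getresponse()
    resp.read()                                      # drain so the connection can be reused
    return (time.monotonic() - start) * 1000.0

print(f"first request:  {timed_get():.0f} ms")   # includes connection setup
print(f"second request: {timed_get():.0f} ms")   # reuses the warm connection
```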
So that's dynamic content. Now let's look at the Tinder case exactly, and how Tinder looked before they started using CloudFront. They had their origin in AWS us-east-1, and the test customers were in Indonesia. When those customers opened the app, the request went all the way across the globe to us-east-1 and was at the mercy of multiple ISPs and their performance; it literally took half a second to reach the servers in us-east-1, which led to slow TLS termination and a bad customer experience. Once they enabled CloudFront, their customers didn't have to go all the way to us-east-1; the request got routed to the nearest edge location, which in this case ended up being Singapore, and Singapore was just 70 milliseconds away, so the TLS got terminated right away. When CloudFront had to forward the request all the way back to us-east-1, we sent it over our backbone network, and we were able to reach us-east-1 from Singapore within 215 milliseconds. The overall round-trip time went from about 500 milliseconds to about 285 milliseconds, which is roughly a 40 percent saving in latency right there. So if you love lower latency, or you love getting your data faster, the match, CloudFront, is just a swipe away.

That's dynamic content, but we're not just for dynamic content; the majority of our workload is cacheable and media content. In 2019 alone we did a lot of large-scale media events. How many of you here are football fans? Did you watch the Super Bowl? A couple of Seahawks fans, anybody? Okay. If you watched the Super Bowl last year, or if you live-streamed Thursday Night Football through Amazon Prime Video, there's a good chance your content was delivered through CloudFront. Thursday Night Football on Prime Video streams to about 18 million viewers worldwide, and for the Commonwealth Games, which happened last year, TVNZ streamed through CloudFront to about three million viewers. This is awesome, but delivering live media does not come without challenges. There are multiple challenges in delivering live media content, but I'm going to talk about my favorite one.

Let's go back to Super Bowl day. We all gathered, grilled some burgers, had some beer, had fun, but when the clock hit game time we stopped doing all that, and what did we do? We started watching the stream, everybody watching the same stream at the same time. What happens then is that we end up sending a bunch of requests to the same POP in that region, and this causes a unique problem in the CDN world. We call it a flash crowd; it's also commonly known as the thundering herd problem. So what does this flash crowd do to the cache servers? It's the same scenario: a lot of people start watching the Super Bowl at the same time, and all the requests get spread across our L1 servers. But as I mentioned before, these L1 servers are trying to optimize for hit rate, so they forward the requests to a single L2 using the consistent hashing mechanism. This optimization works against us at this point, because it overloads that L2 server: it takes all that load and eventually can't handle any more requests. The L1 servers are still trying to send content back to our customers, so they try another available L2, and the same thing happens to that L2 server, and so on. This is what we call a cascading failure inside our system. To avoid it, we augmented our consistent hashing algorithm with a little more data. We gave it two properties: one, we track the popularity of the objects each L1 is serving, and two, we taught the L1s to learn the load of each L2 server. With this, when one of the L1 servers starts sending requests to an L2, it quickly learns that the L2 is at, say, 95 percent load and can't handle any more requests, so it spreads the load across the other available L2 servers. Even though that replicates the content across a bunch of servers, it's necessary at this point to keep serving the content to our customers and keep the streaming going without any buffering issues.
That's great, but this is the first time our viewers are watching this media, so nothing is in the cache yet; the caches aren't warm, and all these requests have to go down toward the origin to get the content. If we forwarded all of them to the origin, we'd see the same problem we saw at the L2 layer, just moved to the L3 servers and the origin. So instead of spreading the requests out again, the L2 servers are smart: they understand that all the requests coming in at the same time are for the same object, so they decide not to forward all of them; they forward just one request from each L2 server. That dramatically reduces the load on the L3 servers and on your origins. When the response for that one request comes back, the L2 uses it to satisfy all the other requests that came in for the same object, and it also caches it immediately, because that's what L2s are built to do: write to the cache. The response then bubbles up to the L1 and out to the viewers, but by the time it reaches the L1, the L1 knows the object's popularity, because it's been tracking it, and it realizes this request is super popular: a lot of people are watching it, so it's a good idea to put it in the hot object cache. It caches it at the L1 layer, and from that point on any new customer who starts watching the video stream from the beginning won't even have to reach the L2; we serve the content right from the L1, which is the best case possible for our customers, providing the best latency and throughput.
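Here is a hedged sketch of that request-collapsing idea: concurrent cache misses for the same key wait on a single "leader" fetch instead of each hitting the origin. It illustrates the technique; it is not CloudFront's code, and error handling is omitted to keep the idea visible.

```python
# Sketch of request collapsing: concurrent misses for the same key are
# coalesced into one origin fetch, and the followers reuse the leader's
# response.  Not CloudFront's actual code; error handling omitted.
import threading

class Coalescer:
    def __init__(self, fetch_from_origin):
        self.fetch_from_origin = fetch_from_origin
        self.lock = threading.Lock()
        self.inflight = {}                       # key -> (done event, result box)

    def get(self, key):
        with self.lock:
            entry = self.inflight.get(key)
            leader = entry is None
            if leader:
                entry = (threading.Event(), {})
                self.inflight[key] = entry
        done, box = entry
        if leader:
            box["value"] = self.fetch_from_origin(key)   # the only origin request
            with self.lock:
                del self.inflight[key]
            done.set()
        else:
            done.wait()                                  # followers wait for the leader
        return box["value"]

cache_fill = Coalescer(lambda key: f"<bytes of {key}>")
print(cache_fill.get("/live/superbowl/segment-42.ts"))   # placeholder segment path
```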
I mentioned we also work on reducing the load on your origins, and it helps to understand why we had to do that. We have 210 POPs today, and if we come back and talk about CloudFront next year it won't be 210; I'm pretty sure it will be a much bigger number. Every year we keep building more and more POPs, which means we're getting closer and closer to your customers. That's great, because they get the lowest latency and the content is delivered much faster, but it puts a different kind of pressure on the origin side. As we build more POPs, if you have a global viewership, the same content gets requested at all of our POPs, which means that on every cache miss we're going to your origin to fetch that content. That puts a lot of load on your origin, and you might have to scale it up just to keep pace with the way Amazon is scaling in terms of POPs. We wanted to avoid that, so we built a kind of super-POP that we call the regional edge cache, a mid-tier cache that sits between our POPs and your origins. After we built the regional edge caches, if there's a cache miss in a POP, instead of sending the request directly to your origin we send it to the regional edge cache. The regional edge cache is much bigger, so it can keep content there for a much longer period than a regular POP can. This reduces latency for your customers and also reduces the load on your origins. Here's a graph from when we enabled the regional edge cache in India: our P90 latency dropped by about 20 percent, which is great. And I love this next graph; it shows the amount of bytes we serve on a daily basis from the regional edge caches. The blue line is the total: we serve about 12 petabytes from the regional edge caches every day, and of those 12 petabytes we only pulled about 3.6 petabytes from our customers' origins, which means we end up saving roughly 8.5 petabytes of data being pulled from our customers' origins every day. That's a big win for our customers.

So to recap: Amazon CloudFront is a global network, with a year-over-year growth rate of about 50 percent. As Chris mentioned, CloudFront is easy to deploy, super fast, and easy to manage, and the AWS backbone network that carries most of our paths provides reliable performance back to your origin. We're always optimizing our infrastructure and servers to provide you with the best possible experience, and thanks to all these optimizations, Tino's daughter can now watch her videos without any rebuffering issues and he can be productive while working from home. That's the same smile we want to see on all our customers' faces, and that's why every millisecond matters. Thank you. [Applause]
Info
Channel: AWS Events
Views: 6,922
Keywords: re:Invent 2019, Amazon, AWS re:Invent, NET309-R1, Networking & Content Delivery, Tinder, AWS Lambda, Amazon CloudFront
Id: DeygvViFlXQ
Length: 51min 35sec (3095 seconds)
Published: Thu Dec 05 2019