AWS re:Invent 2017: Cache Me If You Can: Minimizing Latency While Optimizing Cost (ATC303)

Captions
Thanks for coming. My name is Konstantin Gonzalez, I'm a Solutions Architect out of the AWS Germany organization, and this is Markus Ostertag from our customer Team Internet, and today we're going to talk about caching. Our goal today is to help you earn more money and also save some money, so that you can afford your next trip to re:Invent next year, and we're going to do this through speed: this session is all about getting more speed into your application. We're going to share some best practices through some architecture patterns, and to make this a little more concrete, we have real-world experiences from Markus's company, Team Internet. As a side effect you're also going to save some money, and you'll see in a minute how that works.

This session is the first talk in the ad tech series, so please make sure you check out the other ad tech sessions. They're great sessions with lots of interesting stuff to learn about real-time bidding engines, machine learning for ad tech, and more, so check your schedules and go to the other sessions, which are going to be in a different room.

So what is ad tech? There are lots of different things going on in ad tech, but at a 10,000-foot level it's all about connecting publishers with advertisers, and that happens in some form of bidding engine. Bidding is time-limited in the ad tech business, because there's only so long you can let your customers wait on your website until you decide what ads to serve them. The bidding engine's limited time creates a hard cutoff by which your bidding needs to have happened, and then you go off and show your ads. That cutoff can be just a couple of hundred milliseconds in which your whole bidding process has to fit, and the more you can get done within that hard cutoff time, the more profitable your business is going to be. So it really comes down to having as much speed as possible in your application, so you get a lot of transactions and complete those transactions really, really fast.

Now, how do you get to speed? How do you improve your speed? There are really three strategies you can follow. One is to increase the rate at which you're running your transactions and minimize the time each transaction takes, and that can become really complex: there's only so much you can optimize out of your application. You can go really deep into the nitty-gritty details of your app trying to increase the rate, but at some point it becomes really complex. The other thing you can do is parallelize your application, and parallelization is also something that can turn out to be complex once you go beyond a certain level. The third approach we want to suggest to you is actually the lazy approach, and it turns out it's also the easy approach: do less. In a way, caching is a mechanism for you to do less in your application so that you can get more done in aggregate. Think of caching as a way to save time by doing less while still accomplishing more in terms of transactions, and we're going to show you how this works. Caching means: do the hard stuff that consumes your time only once, and then reuse those results as many times as possible. The reasoning is that in the cloud, and in IT in general, memory tends to be cheaper and faster than CPU, so if you compute a transaction once and save the results, it's a lot cheaper to retrieve those results than to compute them again. Our talk today is structured around four layers of the cake: edge, web tier, app tier, and database. Each layer in your application offers some great opportunities for you to save time and become faster through caching. OK.
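The "do the hard work once, then reuse the results" idea can be sketched as a tiny memoizing wrapper. This is a hypothetical illustration (the function names are made up, not from Team Internet's code):

```javascript
// Minimal sketch of "compute once, reuse many times".
// `expensiveBid` stands in for any slow computation; the names are
// illustrative only.
function memoize(fn) {
  const cache = new Map();
  return (key) => {
    if (cache.has(key)) return cache.get(key); // cache hit: no recompute
    const value = fn(key);                     // cache miss: pay the cost once
    cache.set(key, value);
    return value;
  };
}

let computations = 0;
const expensiveBid = (domain) => { computations++; return domain.length * 0.01; };
const cachedBid = memoize(expensiveBid);

cachedBid("example.com");
cachedBid("example.com"); // served from memory, no second computation
console.log(computations); // 1
```

Every call after the first is served from memory; the CPU cost is paid exactly once per key.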
So, this is how most applications look; in some way you can relate your application to this diagram. There is some database where all the truth lives; there is your actual application that sits in front of your database and where the transactions happen; then there is some kind of web layer where your application touches the internet (it could be a web server, a load balancer, or something else); and then, at the edge, the outside internet comes into play: this is where your actual clients, your users, are coming from. We're going to work from your user back to your database, through those four layers I mentioned before.

Let's start with edge caching, which is your first opportunity to become faster and add some speed to your application, and the way to do that is by using Amazon CloudFront, our content delivery network. Amazon CloudFront gives you 107 edge locations, and that number is increasing almost every day, in 55 cities across 24 countries, and it supports both static and dynamic content. What it does is run a network of proxy servers worldwide that are able to cache your static content, such as HTML pages, images, JavaScript, CSS, that sort of thing. But there is one secret behind CloudFront, and the secret is that CloudFront can add a lot of value, and a lot of speed, to your application even if it doesn't cache at all. The reason is that CloudFront gives you an optimized last-mile delivery mechanism, so think of CloudFront as a network-acceleration layer you can put on top of your application that gives your users worldwide faster access. I still remember, almost five years ago, sitting in a conference room with Markus Ostertag, now VP of Engineering, debating about CloudFront. I said, "Markus, why don't you use CloudFront?" He told me, "No, we have nothing to cache." So here's Markus, and he'll tell you what he did with CloudFront.

Thank you, Konstantin. Yes, as Konstantin just mentioned, my name is Markus Ostertag, I'm the VP of Engineering at Team Internet. Before I start talking about CloudFront and what we're doing with it, let me give you a rough overview of who Team Internet is. We are the leading company in the domain monetization business, everything around domain parking and buying and selling traffic from the domain parking space. We're only 35 people, our headquarters is in Munich, Germany, and we are a very, very tech-focused company: we're always trying to leverage tech as much as possible and squeeze everything out of it, so we try to stay small in the number of people we're hiring while leveraging tech as much as possible. We have two main products. One is called ParkingCrew, that's the red logo on the right, which is a domain parking platform; many of you might already have seen some of our pages, hopefully. The other product, the one I want to talk about in more detail today, is Tonic. Tonic is a real-time bidding marketplace for this domain traffic. I think many of you are working in the ad tech business, so this should be pretty familiar: we have a user who types in a domain name that is parked with a domain parking company; Tonic gets a call on a server-to-server basis from the domain parking company; we try to figure out who is the highest bidder for exactly that request; then we return the bid to the parking company, and the parking company, hopefully, sells the traffic to us.

The challenges for us when we started with Tonic were that we needed to support clients worldwide (that's pretty obvious); we needed latency below 300 milliseconds (I see some smiling faces in the audience; I know 300 milliseconds is a lot in the ad tech business, but it's still a challenge); and we needed 100% consistency at the database level. The reason for that is that we work on a prepaid basis: our users have some kind of budget or account on our platform, and we can only deliver a bid for someone who still has money in their account. So the questions that arose were: should we go multi-region, or is there an alternative? And just as Konstantin mentioned, when we talked with him about it, he said, "Maybe you want to use CloudFront."

As we talked, this was the idea we had in mind, the architecture we had: Route 53 on top of, or in front of, everything, and then the most important regions for us, which are the US, the EU, and Asia Pacific. And because we did need 100% database consistency, this is where it gets painful, because there needs to be some syncing between the different regions to get that consistency. All of you who have thought about building something like that, or have actually built something like that, will say: yeah, that's a complex problem, because master-master setups work, but they are complex to maintain, complex to work with, and there is always a bug in them, as always. As we talked about it with Konstantin, he said, "Maybe you want to do something different," and, just as he mentioned earlier, I said, "We can't use CloudFront, we don't have anything to cache. It's a real-time bidding system; what should we cache? Every answer is different from the one we gave just a few milliseconds before." But the point Konstantin made was that we don't need to cache anything in CloudFront: it's about optimizing latency for our customers.

So this is the architecture we're running today. We have CloudFront on top of everything; as Konstantin already mentioned, there are 107 different edge locations, and we're using all of them automatically, because we just clicked the button to deploy all over the world. And then we're serving everything out of the us-east-1 region, so we only have to maintain one region, we only have to maintain one database in that region (or multiple databases, but only in one region): no master-master syncing, no replication problems, no database consistency problems, because we're just using one region. This works pretty well for us; we're saving a lot of milliseconds on every single request, and as we all know, everything is about speed. So maybe you want to try it out: even with a TTL of zero, CloudFront might help you get a little bit faster than you are right now. But CloudFront can do a lot more things besides optimizing the last mile, and Konstantin will talk a little bit about Lambda@Edge. Thank you.

So, CloudFront can help you with caching, even if you think you can't cache; the second use case is network acceleration; and here's the third thing you can do with CloudFront. Since last year we have Lambda@Edge, which means you can actually modify content at the edge layer as it flows through the CloudFront network, and there are four points where you can do that. First, when the end user issues an HTTP request against the CloudFront cache, you can run your own Lambda code to modify the viewer request. If the request cannot be fulfilled by the CloudFront cache, you get a second chance to modify the request as it goes from CloudFront to the origin in your back end. As you deliver your content, you can run a third Lambda function to modify the origin response. And right before your response reaches your user from the cache, there's a fourth opportunity to run your Lambda code. It means you now get to execute your code right in front of your user, at the edge layer of your application, worldwide, and there are a couple of interesting use cases where you can leverage that. The first is content customization at the edge: as your content is being delivered to a user, you can make a last-minute, or last-second, or last-millisecond change to the content you deliver, for instance if you want to optimize for mobile devices. The second thing you can do is visitor validation: you can fail a lot faster if your visitor sends you a bad API call, because it doesn't have to go all the way to the back end and wait until the back end figures out, "Wait a moment, this is not a valid call, let's give them an error message." You can do that error checking at the edge and take load off your back end. And the third thing: you can use it for A/B testing. You can use the CloudFront edge layer with Lambda functions to differentiate between A and B groups in your A/B testing, and do it really fast, so that A/B testing doesn't affect the speed of your application.

Let's do a quick recap on edge caching. You can use CloudFront to reduce your last-mile latency, and whether you leverage caching or not, you will always get a benefit out of CloudFront if you put it in front of your application; I have a hard time imagining a scenario where CloudFront would not add any value, so try it out and see if it helps with your latency or your caching needs. The other thing is that you can save cost with CloudFront too, because gigabytes delivered over CloudFront are cheaper than gigabytes delivered without CloudFront through normal region networking. So even if it doesn't help you a lot with latency, which I would doubt, it really helps with your bill too. And you can now use Lambda@Edge to bring your code, your logic, closer to your users.

Let's move on to the next tier, the web tier. This is kind of like the edge of your actual application in the cloud; this is the bit where you deliver your content to your users.
In web-tier caching, what we do is introduce an extra layer between the web tier and your application tier, and there are a lot of opportunities to cache something even before a request reaches your back end. Now, some people will say: wait a minute, why do I need another cache here? I'm already caching at the CloudFront layer. Well, caching on a content delivery network like CloudFront is fine, but a lot of those requests are going to be forwarded to your back end anyway, because they are not cacheable or because they are original requests, and this is where you can have a bigger impact by deciding what to cache up front as part of your web caching tier. There are some popular solutions based on Varnish, nginx (or even an Apache module), or Squid, and whichever you pick, they all work similarly. The key idea is to cache your HTTP response in memory once it has been generated, to avoid crafting it from scratch again: instead of going through all the motions to build that HTTP response, you cache it at the web layer. That also means you should take a look at your instances and favor memory-optimized ones, like the R4 family. If you don't want to go through the trouble of modifying your existing application, you can use a neat trick: Amazon API Gateway gives you an in-memory cache right at that layer. So on top of the normal caching that CloudFront gives you, you can add an in-memory cache as part of API Gateway, and you can forget about everything else on this slide and just think of API Gateway as an opportunity to add a simple in-memory cache on top of your HTTP stack.

So customers say: OK, I get it, let's do caching. What are the best practices here? First, you can cache all of your static content: put all of those images and static HTTP responses in memory instead of on disk, and save some extra latency there. You can even cache logged-out users: if you know a user is logged out, your application is going to follow a different path, and having that information already there, that the user is logged out and you need to ask them for authentication again, is valuable information you can cache, and it saves work in your application. The other thing: look into your log files and try to identify the HTTP requests that are really frequent, and then cache those up front. This is kind of like active caching: if you know these requests are going to generate a lot of traffic, why not cache them up front so you can deliver the response a lot faster? And when you think about your caching strategy (we're going to dive deeper on this in a couple of slides), choose your TTLs carefully, because whenever you choose to cache something, you need to choose when to invalidate that information. If you choose a time-to-live that is too long, you may see some crazy stuff after every deployment, because then you're working with old data and it will confuse you, so make sure you have some sort of invalidation strategy. But even if you decide to use a really small TTL, just a couple of seconds, you will see in your log files that it saves a lot of time in aggregate.

Let's recap web caching (we'll repeat and go deeper into some of these issues in a couple of slides): it pays off to have an extra cache between your content delivery network and your application; Amazon API Gateway gives you an easy way to add an extra cache layer on top of your application without having to run all those extra servers; and make sure you understand your caching strategy: what do you cache, and how long is your time-to-live? OK, let's move on to the core of your application, or the first half of your core, which is the app tier.
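The TTL trade-off can be made concrete with a minimal time-to-live cache. This is a hypothetical sketch (an injectable `now` clock is used so the expiry behavior is easy to demonstrate), not a production web cache:

```javascript
// Minimal TTL cache sketch: entries expire after `ttlMs` milliseconds.
// The clock is injectable so expiry can be shown without waiting.
class TtlCache {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map(); // key -> { value, expiresAt }
  }
  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) { // stale: invalidate lazily on read
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }
}

// Even a very short TTL (a couple of seconds) absorbs repeated hot requests.
let fakeTime = 0;
const cache = new TtlCache(2000, () => fakeTime); // 2-second TTL
cache.set('GET /hot-page', '<html>...</html>');
console.log(cache.get('GET /hot-page') !== undefined); // true: still fresh
fakeTime = 3000;
console.log(cache.get('GET /hot-page'));               // undefined: expired
```

The short TTL bounds how long stale data can survive after a deployment, while still collapsing bursts of identical requests into one back-end hit.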
Your application sits on a bunch of EC2 servers, and we're now talking about how you can add value by putting a cache on top of that and save time here. If there's one thing you take away from this talk, it's that you can really cache everything. You can cache your sessions; you can cache your results; you can cache aggregations that you put together out of different database connections. You can even cache your templates: even though they're sitting there precomputed on disk already, there's a lot of value in having them in memory versus on disk. You can cache your static content within your application, even if it's cached somewhere else; you can cache your environment; you can cache your configuration. There is nothing you cannot cache, and you'll see in a minute why it makes sense to cache everything in memory, as fast as possible, because everything counts. Who knows Depeche Mode? I'm an old guy, I know, and I'm from Europe. There's a Depeche Mode song called "Everything Counts (In Large Amounts)", and nothing is truer in caching and ad tech, because many ad tech applications see tens of thousands or even hundreds of thousands of requests per second, and that means that even if you can shave a microsecond or a millisecond off the latency of your application, it will add up big time. Let's do a simple calculation. Assume you can save one millisecond on every transaction through some form of caching or optimization, and your application is delivering 10,000 requests per second. Let's add up the numbers; you don't need to pull out the calculators on your mobile phones, I did the numbers for you: it all adds up to more than 7,000 instance-hours per month that you save, that you don't have to provision for your application. This is not just saving money because you don't have to run all of those EC2 instances; it also does a lot for user experience: it makes your aggregate user experience that much better.
So you will see a lot more people. And it really pays off to be super nerdy here and try to shave as many milliseconds off your application as you can, because every millisecond is multiplied 10,000-fold in a typical application. The next logical question is: how do I find the milliseconds I want to optimize? That is where logging and monitoring come into play, so let's hear from Markus how he is dominating monitoring at Team Internet.

OK, now you've set the expectations right. As Konstantin said, it's important to know what you can cache and how you can cache it, and in the end it's always about trying things out. The problem is: if you try things out and you don't monitor those things, you don't learn anything. What you're seeing here is part of one of our dashboards; we push a lot of data to them and then try to figure out how we can optimize our caching. I just want to give you an overview, an idea, of what kind of metrics we are collecting, so that you might be able to adapt that to your application, or find some ideas you might want to try out in your own application. What you can see here is that each row is its own cache in its own business domain. For us, for example, we're caching domains, we're caching advertisers, we're caching budgets; we're actually caching everything. And for each of those caches we have three main metrics, which are the columns. On the left side you see the so-called "unique keys seen": how big is that dimension, that is, how many different values have we seen in a time frame like one second, five seconds, or ten seconds? This is important for the caching, because the bigger that dimension, the more values we're actually seeing in that time frame, the bigger the memory consumption of this cache will be, because obviously, if you're caching everything that comes in, the different values add up in memory. That's why this number is very important for us. Then, in the middle column, you see the hit ratio as a percentage: the dark blue part of the bars are the hits, and the light blue part are the misses. You can see in the first row that this cache is working at something like a 50 to 60 percent hit ratio, which is OK; the second and third rows are pretty good; and the people in the front row will see that there is a small line of light blue in the last row, so those caches are working at a 98 to 99 percent cache-hit ratio, which is awesome. This is exactly what you want to have. And then we have another metric that also tells us about hits and misses, but in absolute numbers: for every request we do against our cache, we push a metric saying whether this one request was a hit or a miss. Compared to the percentage, this tells us whether we're increasing or decreasing the overall volume of requests against our caches. This is important when we correlate it with the overall requests we're seeing, because sometimes we see a peak in requests but no peak in cache requests; that's because sometimes you don't need all the caches you have for those requests, because you're answering or blocking requests before they even hit the caches. It's very important to know which caches I still need to ask even if I'm blocking requests, because that tells you which caches need to be very flexible, scaling in and out, and which caches can be more stable and settled, and that gives you an idea of how you want to build your caches. As I said, it's all about trying things out.
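A minimal version of the instrumentation Markus describes, tracking absolute hits and misses, unique keys seen, and the derived hit ratio per cache, might look like this; the metric and cache names are invented for illustration:

```javascript
// Sketch of per-cache hit/miss instrumentation: absolute counts per time
// window plus the derived hit ratio, as on the dashboard described above.
class CacheMetrics {
  constructor(name) {
    this.name = name;            // e.g. "domains", "advertisers", "budgets"
    this.hits = 0;
    this.misses = 0;
    this.uniqueKeys = new Set(); // "unique keys seen" in this window
  }
  record(key, wasHit) {
    this.uniqueKeys.add(key);
    if (wasHit) this.hits++; else this.misses++;
  }
  hitRatio() {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : this.hits / total;
  }
  // flush() would normally push to a metrics backend and reset the window.
  flush() {
    const snapshot = {
      name: this.name,
      hits: this.hits,
      misses: this.misses,
      uniqueKeysSeen: this.uniqueKeys.size,
      hitRatio: this.hitRatio(),
    };
    this.hits = 0;
    this.misses = 0;
    this.uniqueKeys.clear();
    return snapshot;
  }
}

const m = new CacheMetrics('domains');
m.record('example.com', true);
m.record('example.com', true);
m.record('other.net', false);
console.log(m.flush()); // hits: 2, misses: 1, uniqueKeysSeen: 2, hitRatio: 2/3
```

The "unique keys seen" count is the cardinality signal that predicts memory consumption; the hit/miss counts feed both the ratio bars and the absolute-volume graph.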
So sometimes you build something in your application, you monitor it for (depending on how much traffic you're seeing) minutes, hours, days, or even weeks, and based on those metrics you find out: is this something I want to work on, or work on even more, because it seems to work but needs some tweaking, or was it just a stupid idea? Once you have these metrics, you can do something that is influenced, I would say, by Vilfredo Pareto. How many of you know the Pareto principle? Good; for the others: the Pareto principle says there is always a small percentage (or most likely a small percentage) of sources that have a huge impact on your overall system, which is why it's often called the 80/20 principle. You will find heavy hitters: those might be publishers that send a lot of requests to you, or advertisers that buy a lot from you, or things like "I'm seeing a lot of traffic from the US, but not so much from, for example, South Africa." You need to find those heavy hitters, those very few sources that make a huge impact on your system, because once you've found them you can do special things for them, and I really mean things like: I want to do totally different caching for you; I want to keep those things in memory on the application instance, not in something like an external cache. If you're able to find those heavy hitters, you can adapt to them inside your application, inside your caching, and you will make a huge impact out of it, because even if you do that for a very small percentage of sources, it will have a huge effect on your overall system. And talking so much about memory and caching in memory: Konstantin will now talk a lot more about memory and what we can do with it.

Thank you, Markus. So, we're in Las Vegas, right? If you walk through the casino you will see those heavy hitters: these are the people who throw a lot of money around, and they get special tickets to shows, special rooms, all kinds of special treatment. So find your special guests, and your special treatment for them is to give them RAM, as much as possible. Look at your existing machines and, thanks to Markus, you know you should be monitoring, so monitor RAM usage: how much RAM is actually used by your applications? If you're not using close to 100% of your RAM, you already have a caching opportunity there, so use the RAM you already have, even if it means duplicating data. If you find those heavy hitters and you have their records always present in RAM, on every single instance, then no matter where they show up, you can serve them immediately without asking another machine, because duplicate data is good if it helps you achieve speed. That also means you can preload popular data into your cache, not just cache the work you already did: you can use idle time to precompute data that can help your future customers, and have it in RAM, always ready to use. Most operating systems come with a file system cache (I'm just assuming yours does), but if you're programming in a specific language like PHP, Java, .NET, or whatever, there is always a caching framework you can use to leverage the RAM inside your machines. And what do you do if you don't have enough RAM on your application servers? You can add more RAM with Amazon ElastiCache, which is basically RAM as a service: it manages a fleet of machines that are nothing but RAM as a service and can be part of your application as an extra RAM-based caching layer. It comes with two engines, a Redis-based engine and a Memcached-based engine, and both are great; there are just different use cases you might want to look at to choose the right engine for you.
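One simple way to spot the heavy hitters Markus mentions is to count requests per source and keep the top few pinned in local process memory. This is a hypothetical sketch of that idea (the publisher names are invented):

```javascript
// Count requests per source and report the top-N "heavy hitters":
// candidates for pinning in local instance RAM instead of an external cache.
function topHeavyHitters(requests, n) {
  const counts = new Map();
  for (const source of requests) {
    counts.set(source, (counts.get(source) || 0) + 1);
  }
  // Sort sources by count, descending, and keep the top n.
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, n)
    .map(([source]) => source);
}

// Example: one publisher dominates the request stream.
const stream = ['pub-A', 'pub-B', 'pub-A', 'pub-C', 'pub-A', 'pub-B', 'pub-A'];
console.log(topHeavyHitters(stream, 2)); // ['pub-A', 'pub-B']
```

At real traffic volumes you would count over a sliding window (or use an approximate top-k structure) rather than an unbounded Map, but the principle is the same: find the few sources worth special treatment.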
So, Memcached is the easy option: it's fast, it's open source, but it doesn't offer any kind of persistence; it's a nice workhorse for caching everything on a caching service like Amazon ElastiCache. If you want to do fancier things around caching, we suggest taking a look at Redis. Redis comes with a lot more features: it also offers a persistence scheme, so if caching nodes go down there are mechanisms you can use to keep your cached content highly available, and it comes with a scripting language you can use to offload some computations right at the cache level, which can increase your performance even more and let you be more sophisticated. So it's a good idea to check it out, and it's even faster than Memcached in many, many cases.

Let's recap the app tier before we move on to the core of the application. The thing here is: really monitor everything; only if you can see your requests will you be able to spot the opportunities where you can gain speed by caching the right stuff. Try to find those heavy hitters, the big customers that dominate your application usage, and do whatever it takes to make them as fast as possible, because that will benefit all the other users of your application as well. Cache everything: there's nothing you cannot cache. And if you cache something, cache it in RAM; there's no value in caching something on disk, because RAM is always going to be faster than disk. So use that RAM, and consider something like ElastiCache to add more RAM on top of your application.

Now let's move on to the core of everything, which is the database. This is where the truth of your application resides, and similarly to how you place a cache on top of your application, you can place a cache between your database and your application. It's a little unintuitive, because we add an extra piece of architecture between the database and the application, and ironically it's going to increase performance and decrease latency even though it's an extra piece in the puzzle, an extra link in our chain; but we'll see in a minute how fast it can be. And this is where most of Markus's work of the last couple of years comes in, because he has really optimized caching all the way through to the database.

Thanks again, Konstantin. I think when you signed up for this talk, you were probably thinking about database caching, because this is what everybody focuses on, and as Konstantin said, I also think it's the most important part; that's why I love this part the most. As you can see on this slide, caching the traditional way means we have an application that talks directly to databases (I just took Amazon DynamoDB and Amazon RDS as examples here), and then we put something in between, like ElastiCache, as Konstantin just mentioned: if we can't do it in RAM on our own application servers, we should "buy" external RAM, and that's ElastiCache. But then we always come to the point where we need to think about cache invalidation. There is this very famous quote from Phil Karlton: "There are only two hard things in computer science: cache invalidation and naming things." And we all know how hard it is to decide whether a variable should be named foo or bar, right? So the other hard thing is cache invalidation. For me, there are two ways to cache. One is the typical cache invalidation with a time-to-live: you put a TTL on the key that says "please live for 60 seconds," or "live until this timestamp," and then the caching engine, Redis or Memcached or whatever cache you're using, invalidates it, and from then on it answers with "I don't have anything cached here." The problem that arises is that if we make the TTL very low, we're obviously not very efficient on the cache-hit ratio, because the cache invalidation will lead to
With more and more invalidations we need to ask our databases again, which is exactly what we actually don't want. But if we use a very long time to live, we have the problem that if something changes inside our backend database we do not get the right answer, because the cache still has an old answer saved, and it will stay saved for a long time because our TTL is very long. So the other way we can cache is to keep the cache in sync all the time, because if we always know that the cache answers with the same answer the database would give, then we don't need to take care about the TTL anymore, right? How can we do that? Synchronous writes: for every update or write, our application can write not only to the backend database, in this case DynamoDB as an example, but also to the cache. So every time we write against DynamoDB we also write against Redis or Memcached, for example. This works, but it needs a huge change inside our application. The benefit we get out of this is that some databases, for example DynamoDB and Redis, give us a so-called after-write return value. I just want to show you a little bit of code. The most important part here is the orange one; that's a call against DynamoDB, and for those who don't know, that's JavaScript, or Node.js. The return value UPDATED_NEW tells DynamoDB that after it did the write it should give us back the value which is now inside DynamoDB for exactly that object or data item. We can now do exactly the same with Redis; that's the black part. We do an INCRBYFLOAT on Redis, we update a value, and Redis gives us back the value which is inside Redis after we wrote it. As most of you will see, it's Node.js, and we do both calls in parallel: the DynamoDB call and the Redis call in parallel, and then the if clause says: hey, now compare those two values.
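The parallel dual-write with return-value comparison can be sketched like this. The two stores are stubbed in memory so the sketch is self-contained; in the real code they would be a DynamoDB `update` with `ReturnValues: 'UPDATED_NEW'` and a Redis `INCRBYFLOAT`, both of which hand back the post-write value:

```javascript
// In-memory stand-in for a store whose write returns the new value,
// like DynamoDB's UPDATED_NEW or Redis's INCRBYFLOAT.
function makeCounterStore(initial = 0) {
  let value = initial;
  return {
    async incrByFloat(delta) {
      value += delta;
      return value; // value *after* the write
    },
  };
}

// Write to both stores in parallel and flag drift between them.
async function dualWrite(db, cache, delta, onDrift) {
  const [dbValue, cacheValue] = await Promise.all([
    db.incrByFloat(delta),
    cache.incrByFloat(delta),
  ]);
  if (dbValue !== cacheValue) {
    onDrift(dbValue, cacheValue); // e.g. invalidate the key, alert, or ignore
  }
  return dbValue;
}
```

The comparison is the interesting part: it turns every write into a free consistency check between cache and database.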
The reason why we're doing that, and it's actually a real part of our application, is that we can now monitor whether our cache is running out of sync for whatever reason. We can either do something about it, like invalidating the cache right away, because obviously our cache now delivers wrong answers, which is bad, and invalidating means the cache will be updated automatically on the next call; or we can just alert on it so that somebody looks after it; or we can do nothing, because it's totally fine, or whatever. So this is not about what we can do; it's about how we can recognize that our cache is running out of sync. But most of you will now say: that's nice, but I don't want to change my application that much. Lucky us, we are running on AWS, and AWS always has a workaround so that we don't need to do this heavy lifting ourselves; we can use AWS services. For example, if you're working with DynamoDB or Amazon Aurora, there is a DynamoDB stream or a so-called stored procedure which can trigger a Lambda function. So why not let the Lambda function do the update inside my cache? Then I don't need to do that in my own application, because the Lambda function makes sure that the recently updated item or row inside my database also gets updated inside my Redis, inside my cache. This is decoupled; it has nothing to do with my application. My application can still write and update against the backend database all the time, but it can read from the cache all the time, because it can be sure that the value inside the cache is updated. The downside of this approach is that there is a small delay for cache updates; that's pretty obvious, because the trigger of Lambda and Lambda itself need some time, but this will be below a second, that's what we are seeing. So if you're currently working with a TTL of, say, ten seconds, this is still way better, because this works within a second and not within the ten seconds where you might deliver a wrong answer based on your TTL.
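A Lambda function fed by a DynamoDB Stream might look roughly like this. The `id` and `price` attribute names and the injected cache client are assumptions for illustration; the record shape follows the DynamoDB Streams event format (typed attributes like `{ S: ... }` and `{ N: ... }` inside `NewImage`):

```javascript
// Sketch of a stream-driven cache updater. The cache client is injected
// (in production it would be a Redis/Memcached client).
function makeStreamHandler(cache) {
  return async function handler(event) {
    for (const record of event.Records) {
      if (record.eventName === 'INSERT' || record.eventName === 'MODIFY') {
        const image = record.dynamodb.NewImage;
        // Unwrap DynamoDB's typed attributes (hypothetical schema: id + price).
        const key = image.id.S;
        const value = Number(image.price.N);
        await cache.set(key, value);
      } else if (record.eventName === 'REMOVE') {
        await cache.del(record.dynamodb.Keys.id.S);
      }
    }
  };
}
```

The application keeps writing to DynamoDB as before; only this handler touches the cache.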
But maybe we don't want to care about caching at all, and this is where a new service from Amazon comes in; I think it was launched six months ago or something like that. It's called Amazon DynamoDB Accelerator, DAX for short. The idea behind it is that we as customers don't need to take care of the caching at all; we just talk to DAX all the time. The benefit for us as customers is that there are SDKs out there, for Java and JavaScript right now, that are DynamoDB API compatible. That means we don't need to change our application; we just need to switch over. As you can see in the light blue code here, what we had was a DynamoDB client, and now we're just using the Amazon DAX client. From there on we can do our GetItem, PutItem, whatever we did against DynamoDB, and we now do that against DAX, and DAX takes care of caching things, saving things, taking care of TTLs, all this kind of stuff. It's actually a write-through cache, so we're just talking to the cache all the time and no longer care about when the backend database, DynamoDB in this case, needs to be updated; all of that is taken away from us as customers. You can serve multiple tables with one DAX cluster, so even if you're running hundreds of DynamoDB tables you can actually use just one DAX cluster, which is based on sharding, and you can have multiple nodes inside the DAX cluster. As we're talking in the AdTech track, speed is everything, we said it, so we did some performance testing of DAX. Without DAX, as you can see on the left, so talking to DynamoDB directly, we're seeing an average of five to six milliseconds, which is pretty good, right? We have consistent performance, we obviously have no warming phase because we're talking to our backend database, and we have detailed metrics per request, which is cool. With DAX we're seeing an average of 400 to 450 microseconds, so only a tenth of what we're seeing with DynamoDB directly.
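The client swap mentioned a moment ago can be sketched as a small factory. The constructors are injected here so the sketch is self-contained and testable; in real code they would be the DocumentClient from the AWS SDK and the client from the `amazon-dax-client` package, which exposes the same DynamoDB API — treat the exact option names as assumptions and check the docs:

```javascript
// Sketch: application code asks a factory for a client, and only the
// factory knows whether DAX is in play. The rest of the application keeps
// calling get/put/update unchanged, because the two clients share an API.
function makeClientFactory({ DynamoClient, DaxClient }) {
  return function makeClient({ daxEndpoint } = {}) {
    return daxEndpoint
      ? new DaxClient({ endpoints: [daxEndpoint] }) // write-through cache path
      : new DynamoClient();                          // plain DynamoDB path
  };
}
```

Keeping the choice in one place makes it easy to roll DAX out (or back) per environment without touching call sites.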
We have very consistent performance around those 450 microseconds. Obviously we have a warming phase, because now we're talking to a cache, but what we've seen in our application was that even on cold keys the average is below what we're seeing on average against DynamoDB directly. My assumption is that the reason for that is that our application's connection handling to DynamoDB is not as efficient as what the DAX team can do while talking to DynamoDB. Latency is everything, and every time we need to wake up or build up a connection, that takes time; obviously the DAX cluster does a better job at that than we do right now. You don't get per-request metrics on DAX, but you get a lot of metrics, like the cache hit ratio and all this kind of stuff, out of CloudWatch, with the typical CloudWatch delay, so still pretty good. Talking so much about caching, we say: cache everything. But "cache everything" sometimes means that you might forget something, and we did exactly that: we forgot about negative caching. When we built up one of our biggest caches, we had the situation that sometimes, actually unfortunately many times, DynamoDB answers with no result, and that's totally valid, because many times when we ask our DynamoDB, hey, give me the highest bid for these specific targeting options, DynamoDB says: I don't have a bid. And that's totally fine; it's also an answer. The problem was that in our application we had something like: if DynamoDB answers, then save that to the cache. But DynamoDB had no result, so we didn't save anything to the cache, and so for the same targeting options we did the request against DynamoDB again. That's what you can see on the slide now: without negative caching, so without saving that valuable information that DynamoDB has no result for this specific query, we had a cache hit ratio of only 25 to 30 percent.
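The fix, negative caching, is a one-line idea: store the absence of a result under a sentinel, so the same miss is served from the cache instead of being re-queried. A self-contained sketch with in-memory stand-ins for the cache and the database:

```javascript
// Sentinel meaning "the database said: no result".
const NO_RESULT = Symbol('no-result');

// Wrap a database query with a cache that also remembers misses.
function makeCachedLookup(db, cache) {
  let dbQueries = 0; // exposed only to illustrate the saved round trips
  async function lookup(key) {
    if (cache.has(key)) {
      const hit = cache.get(key);
      return hit === NO_RESULT ? undefined : hit;
    }
    dbQueries += 1;
    const row = await db.query(key);
    // The crucial line: cache the *absence* of a result, too.
    cache.set(key, row === undefined ? NO_RESULT : row);
    return row;
  }
  lookup.dbQueries = () => dbQueries;
  return lookup;
}
```

With Redis, the sentinel could be a dedicated value (or an empty string) stored under the same key, ideally with its own TTL.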
Then we implemented the really simple thing of writing this "DynamoDB has no result" answer to our Redis cache, and with negative caching our cache hit ratio went up to 89 to 95 percent. That's a huge increase in the cache hit ratio, which means we don't need to ask our DynamoDB that often, which obviously means we are faster, because our cache answers a lot faster than DynamoDB, and we save a lot of money, because we pay DynamoDB for provisioned throughput. So think about that when you're considering what you can cache: even in a no-result there is valuable information, because you don't want to query your backend database with the same query over and over again just to get the same answer. You can save the answer to the cache even if it's a no-answer. So let's wrap up the database caching part. Cache everything, even the negative results; just as I said, I think that's the most important part. We learned a lot out of that, and we saved a lot of money while learning it. Consider the cache auto-update with Lambda for those cases where it might be suitable. You can combine the application cache with the database cache, because, as Konstantin said, Redis and Memcached are a good solution for database caching too, so you can actually use the same caching cluster if you want to. And I highly encourage you, if you're able to, do use DAX, because DAX is an awesome service which takes a lot of the maintenance pain away from us as customers, and it just works out of the box and delivers awesome results. So now we have all the tiers, and I think Konstantin will wrap up the whole talk. Cool. So this is the recap. After the recap, as you go out of this talk and hit lunch and the rest of re:Invent, it's easy to forget some stuff, so maybe you should pull out your phone and take away two or three things you want to do in your application. So who's using CloudFront already? Let's have a quick poll. Okay, if you
didn't raise your hand, check it out; it can deliver value to you even if you think there is nothing to cache. Monitor everything. Who is monitoring really everything? Oh. Some really good tip: if you didn't raise your hand, just write down "monitor everything". This is how you find your opportunities to cache, and they will likely end up in the top 20 percent of your users, and you want to do something really special for them, which will have a huge impact on the rest of your application. Consider adding a web cache: even if you're using CloudFront already, it can pay off to have another cache between your app and CloudFront, if you're not caching at the API layer. Who is using ElastiCache already? Okay, if you didn't raise your hand, check it out; it can add value to your application, and explore what you can cache. And remember that song: everything counts in large amounts. Even if you can only save a millisecond per transaction, it will add up big time, and as you can see, using DAX it can be easy to save four or five whole milliseconds just by putting DAX in front of DynamoDB. That's mind-blowing, because for your 10,000-requests-per-second app you can multiply that saving by 10,000; you can save thousands of dollars, and more importantly, you will be those milliseconds faster for every transaction. That gives your users a better user experience, and it gives you more time to run those real-time bidding auctions, find those better users, and create more value. So cache everything, and with that, I hope you get a lot of speed out of this talk. We still have 10 minutes or so for questions. Thank you very much. Thank you. We do have a couple of mics up there, so feel free to walk up to a mic and ask your question. Hello. In your architecture diagram, when you went from three regions to one region, and from three databases to one database, what strategies did you use to mitigate that database as a single
point of failure? Actually, as we're working mostly with DynamoDB, AWS takes care of that. Obviously, if DynamoDB goes down in the whole region, we will have a problem, but what we're doing is replicating, with our own system, to another region, which is kind of a cold standby. So we have everything in another region, and then that's just something you need to commit to: if the whole region goes down, you will go down. If you're not okay with that, you should obviously still do the multi-region setup. But if you're okay with needing to manually migrate over, and you're working with CloudFormation or similar, that's pretty simple: you can migrate the whole architecture over to another region and spin up everything there, because the data is already there, if you make sure of that. Does that answer your question? Yes, thank you. Remember, DynamoDB already works out of three different Availability Zones; these are three different data centers with non-correlated risks. I work with a lot of enterprise customers; they are very happy if they are sitting in one data center and have a disaster recovery solution in a second data center, and with DynamoDB you get a fault-tolerant database that is already running across those three data centers. So having a disaster recovery solution in a second region is something that typical enterprise customers dream of; this is already a lot better than the usual setup. Next question. So for database caching you mentioned three options: using DAX, changing your application so that it talks to both ElastiCache and the database, or using a Lambda function. Do you have any criteria for how you decide which one of those is going to be the best solution? Try it out and monitor everything, that's the obvious answer, right? Actually, we figure it out based on the business domain the cache is working in. So sometimes we are more happy to change
the application and actually monitor whether we're running out of sync. I don't think there is a rule of thumb; you need to find out on your own what works best for you. If you are okay with the delay of one or two seconds, the asynchronous way is more on an architecture level than actually building something into your application, which might be hard. So if it's easier to add something to the overall architecture on the AWS side than to rebuild something inside your application, then obviously that might be the good choice. But changing from the DynamoDB client to the DAX client is really very, very simple and easy. I think Java and JavaScript are the SDKs that are available right now, so if you have an application that works with Java or JavaScript, I highly encourage you to just try it out, because it's a very easy fix inside your application which gives you a lot, coming from AWS, at that point. So my approach would always be: think about DAX first, then see if I can work the asynchronous way with the Lambda approach, and then fall back to rebuilding something inside my application, because that's most of the time the biggest pain, right? By the way, for those people who are still sitting here, thanks for sitting here, a quick plug: if you liked this talk, and since you're still sitting here you seem to have liked it, there's another talk by Marcus and me where we're going to dive deeper into how to save money, the Running Lean Architectures talk tomorrow. So feel free to drop in; we're going to go into a lot of detail on money-saving tips there. We still have some time left; next question, please. Okay, how do you approach handling the issue of cache coherence between the layers? Because that's kind of the third hard problem, right? Cache coherence between layers is a hard problem. You need to understand your application, and you need to make trade-offs about how old an answer you're
capable of dealing with. Sometimes it's okay to work with old information for the sake of better performance, and sometimes it's not, and this is where you get to choose your TTL logic and where you need to make a smart decision about your TTL. If it becomes critical to always have the latest data no matter what, this is where you want to be closer to the database and where you want a tactic where the cache is updated at the same time as your database; this is probably one of the other criteria, for the previous question, on how to choose which strategy to employ. But what I mean is, when you do that you can immediately update the database-level cache, but the in-memory application cache, you need to invalidate those as well. That's right, and with CloudFront you don't necessarily have to issue invalidations: you can actually configure a TTL of zero, and then CloudFront will simply issue a conditional request to the origin and ask: has this content changed? It is a lot cheaper to answer to CloudFront "no, this has not changed, feel free to use your cached copy", and you will save something even when you give CloudFront a TTL of zero. So for these types of data, feel free to give CloudFront a TTL of zero: CloudFront will still be able to cache content, but it will use those revalidation requests to verify that it is still working on current data. And you can have a similar protocol between the application layer and the database layer, or you can merge the application cache with the database cache into the same Memcached instance and have them work together. Again, in this specific case of CloudFront you can use those HTTP revalidation requests, which saves a lot of bandwidth, especially for images and the like, and yes, this is where you become a lot more geeky about understanding where the trade-offs are.
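The TTL-of-zero pattern works because of HTTP conditional requests: the origin compares the validator the cache sends back (for example an ETag) and answers 304 Not Modified, headers only, when nothing has changed. A minimal origin-side sketch (the `resource` shape is invented for illustration):

```javascript
// Answer a revalidation request: send the full body only when the
// cached copy's ETag no longer matches the current resource.
function handleConditionalGet(resource, requestHeaders) {
  if (requestHeaders['if-none-match'] === resource.etag) {
    // Cached copy is still valid: no body goes over the wire.
    return { status: 304, headers: { etag: resource.etag } };
  }
  return {
    status: 200,
    headers: { etag: resource.etag },
    body: resource.body,
  };
}
```

The 304 path is what makes a zero TTL cheap: the edge still avoids transferring the payload on every request.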
One thing I want to jump in with here: what we're doing is thinking about it as, the more control we have over the cache itself, the higher the TTL can be. That's kind of our rule of thumb. We don't control the cache of CloudFront, because that's entirely on the AWS side, but we have 100 percent control over what we keep in memory in our application, so I can invalidate those things in memory on my application server within microseconds. That gives me a huge opportunity to increase my TTL, because I know that if I want to invalidate something, I can do it right away. With CloudFront I can't do that right away, because that's not something I built inside my application, so I need to figure out what seems to be the right TTL. In our application we tend to have a lower TTL for those caches we don't control that much than for what we control ourselves, Redis for example, but also our own memory. Maybe that helps. And that's another great reason to have your own web cache on top of your application but before CloudFront, because then you can give CloudFront a TTL of zero and have that decision, whether it's still valid or not, under your control within your application. Thanks. Thank you. Next question. I have a question about how you cache transactional data, like the budget of the advertiser. Sorry, I didn't get the last part. The transactional data, like the budget of the advertiser. Okay, the budget. We're having that in Redis, and what we're doing is we update everything in Redis, and then in kind of a, let's call it a cron job, several times a day we're actually writing the budget from Redis back to, in our case, Amazon Aurora or Amazon DynamoDB. So we are not really afraid of losing something in there; even if Redis is not highly durable, we're totally okay with losing something like the last five or ten minutes. That's kind of the risk trade-off you need to make there. And so every five or ten minutes we're saving everything back to our DynamoDB.
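The budget flow just described, atomic counters in a fast store that are periodically flushed to the durable database, can be sketched like this. Both stores are in-memory stand-ins; in production the counter would be a Redis `INCRBYFLOAT` with a negative delta and the durable store would be Aurora or DynamoDB:

```javascript
// Sketch: fast budget counters plus a periodic flush to the durable store.
function makeBudgetTracker(durable) {
  const counters = new Map(); // stands in for Redis

  return {
    // Like INCRBYFLOAT with a negative amount: returns the new budget,
    // so the caller knows the exact remaining budget after the spend.
    spend(advertiserId, amount) {
      const current =
        counters.get(advertiserId) ?? durable.get(advertiserId) ?? 0;
      const next = current - amount;
      counters.set(advertiserId, next);
      return next;
    },

    // The "cron job": write every counter back to the durable store.
    flush() {
      for (const [id, value] of counters) durable.set(id, value);
    },
  };
}
```

Accepting that a crash may lose the last few minutes of counter updates is the explicit risk trade-off the answer describes.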
DynamoDB is then kind of the backend database, which is highly durable, so nothing happens to the data in there. Does that answer your question? But the key problem is that you deliver some advertising, you deliver some ads, then you need to deduct the budget, you need to do the calculation, and that has to happen atomically. Yeah, but Redis does all of that. As we are working in just one region, and that's the benefit of working in just one region, you can increment and decrement by float inside Redis in an atomic way, and then we get the return value right back. So we can say: hey, please decrease, or actually increment with a negative sign, so decrease the budget, and give me the answer after the decrease is done. As that's atomic, we know exactly the right budget we have for this advertiser right now. Okay, thank you. Last two minutes for questions. And please remember to rate this session; we would like to be back here next year. So let's have the last question here now. Do you have any solution for expiring data, especially in S3? I mean, when the data immediately knows when it should be expired. So you mean keeping a tab on when the data was last updated and when the previous version expired, right? Yeah. You can build a hash table in DynamoDB to have that information ready and then have S3 as the lazy part of that layer; that's one strategy you can use here, which means that the source of truth about the updates is in DynamoDB, but the actual data, the blobs, are on S3. That's one strategy. Yeah, actually use something like DynamoDB as a metadata store; so you build something like a metadata store in whatever database you want to have it in, which you then can update, and then you have kind of a pointer or link to the S3 object. But I don't think it works right out of the box in S3, so you always need to create something like a metadata layer yourself.
The other strategy you can use, if you're using CloudFront, is to issue version numbers as part of your URL. That means if you update the data, you update the URL to a different URL, which guarantees that CloudFront will not have that one in its cache, and it will always be forced to fetch it from the origin, because it encounters a new URL. So as your application generates a new URL with a more recent version of the data, CloudFront is guaranteed to go back to your origin to get that latest revision, without needing to issue a special invalidation call, which takes time and effort and is not under your control. So by giving out unique URLs for each new version of the data, you can have consistency right through the whole chain, through CloudFront. Okay, thank you. You're welcome. So thanks a lot for coming here, enjoy the rest of re:Invent, and thank you.
Info
Channel: Amazon Web Services
Views: 3,083
Rating: 4.9130435 out of 5
Keywords: AWS re:Invent 2017, Amazon, AdTech, ATC303, AI, CloudFront, DynamoDB, DynamoDB Accelerator (DAX), ElastiCache
Id: WFRIivS2mpo
Length: 58min 46sec (3526 seconds)
Published: Tue Nov 28 2017