ElastiCache Deep Dive: Best Practices and Usage Patterns - March 2017 AWS Online Tech Talks

Captions
Welcome to today's webinar, ElastiCache Deep Dive: Best Practices and Usage Patterns. Our presenter today is Michael Labib. Over the last 15 years Michael has worked in a number of industries such as airlines, retail, health care, and technology, architecting large-scale distributed systems. Today he helps organizations of all sizes architect and build sophisticated, high-performance cloud applications using in-memory and NoSQL services at AWS. We also have Darin, Bob, Justin Thomas, and Dan Szymanski, who will be our webinar moderators today, answering the questions you enter in the Q&A panel. Michael, welcome; the presentation is all yours.

Thank you, Jerry, and welcome everyone to the webinar. Today we're going to go through quite a bit of information on how to integrate Amazon ElastiCache into your workloads and the various benefits of in-memory data stores. We're going to look at various caching strategies for how you can implement caching in your applications, and we're going to follow that with a caching demonstration using Amazon ElastiCache. One thing I'll advise everybody: feel free to ask questions throughout the webinar, and we'll answer as many as we can.

To get started, let's level set on what Amazon ElastiCache is. It is a managed service for two open-source in-memory engines, Redis and Memcached, hardened by Amazon. Essentially, that means we have various enhancements included in those engines that make sure your workloads run efficiently and reliably in the cloud. It is obviously high performance, as the data is in memory, and it's fully managed, so you don't have to worry about the administration involved with managing your clusters. For the purposes of this webinar we're going to focus on Redis rather than covering both Redis and Memcached.

Redis is the most popular in-memory key-value store available today, and that's for various reasons. For one, it's very powerful: there are a lot of different commands and data structures you can use with Redis, many of which we'll talk about in the next few slides. It has persistence, so you can create snapshots and support your recovery point objectives. It has high availability, so you can create distributed caching topologies where your data can fail over to read replicas. And there are various other operations and commands, such as Lua scripting, pub/sub, geospatial queries, and more.

One thing I like to do is briefly introduce the various Redis data structures, just to give you some idea of how you can use them. The first data structure is probably the most basic: the string. It is binary-safe, so you can put essentially any type of data into a string value, and it can hold up to 512 MB of data. This is great for storing HTML documents, JSON objects, images, counters, or anything of that nature. The other thing I'll mention is that your key can also be up to 512 MB; both the key and the value are strings.
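As a rough illustration of the string commands just described, here is a minimal sketch using the Jedis Java client; any Redis client works the same way, and the endpoint, key names, and TTL are hypothetical.

    import redis.clients.jedis.Jedis;

    public class StringExample {
        public static void main(String[] args) {
            // Hypothetical ElastiCache endpoint; replace with your own.
            try (Jedis jedis = new Jedis("my-redis.example.cache.amazonaws.com", 6379)) {
                // A string value is binary-safe and can hold any payload up to 512 MB.
                jedis.set("page:home", "<html>...cached fragment...</html>");

                // Strings also work as counters.
                jedis.incr("visits:2017-03-23");

                // SETEX stores a value with a TTL (in seconds) in a single call.
                jedis.setex("session:42", 300, "{\"user\":\"alice\"}");

                System.out.println(jedis.get("page:home"));
            }
        }
    }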
Another data structure is the set, and a set is a great way to group members of a particular collection. For example, your key could be a customer list or a customer ID, and the members, the values, are your customers. The nice thing about a set is that it removes duplicates, so you can't, for example, add the same value to that set twice. This is great for capturing, say, unique visitors to your website, or for making sure you don't have duplicate customer records within a particular collection. Another interesting thing is that you can do operations like union, intersection, and difference with other sets. This can be very powerful: say you have a set with one customer list and another set with a different customer list, and you want to see which customers exist in both sets. That can be very useful for building recommendation engines.

Another data structure is the sorted set. It's similar to a set, except that your members are ordered by a particular score. It's also good for deduplicating information, because you overwrite any values that already exist in the set, you're grouping the collection under one key, and you're sorting it by a particular value, the score. A lot of use cases for a sorted set are things like a leaderboard, or time series data where you want to capture the order in which particular events occurred. Other operations you can do include getting a specific range of the members in that set, finding the ranking of an element based on its score, and so on.

Another data structure is the list. A list is great for building things like a message queue or a timeline. You can push and pop elements from the head and tail of the list, and you can also insert values at a specific position, so it's another way of capturing a collection of members under a given key.

A hash is another data structure, and it's an excellent way to represent an object. Say, for example, you have a key that represents a particular customer object or customer record. Each field and value could be attributes related to that customer: field one could be "first name" with the actual first name as its value, field two could be "last name" with the actual last name as its value, and so on. What's great about a hash is that you can add, get, and delete individual fields within that structure, so it's a great way to query just the specific information you're looking for, or to add attributes to that object.

Other capabilities of Redis include Lua scripts, so you can create little snippets of code that execute business logic on your keys; it's a great way to encapsulate some of that logic and have it execute inside the in-memory engine. There are also geospatial queries, supported as part of Redis 3.2, which let you execute commands on a special sorted set that handles longitude and latitude: a great way to recommend points of interest based on a person's location. Then there's pub/sub, which can be used to build notification systems, chat applications, and other things. What's great about Redis is that you don't have to do anything to turn on these capabilities; they're already there, available for you to use.
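To make the set, sorted set, and hash examples above concrete, here is a minimal Jedis sketch; the key names, members, and scores are made up for illustration.

    import java.util.Map;
    import redis.clients.jedis.Jedis;

    public class DataStructureExample {
        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("my-redis.example.cache.amazonaws.com", 6379)) {
                // Set: duplicates are dropped automatically, one way to track unique visitors.
                jedis.sadd("visitors:today", "cust-101", "cust-102", "cust-101");

                // SINTERSTORE writes the intersection of two customer lists into a third set.
                jedis.sinterstore("customers:in-both", "customers:list-a", "customers:list-b");

                // Sorted set: members ordered by score, e.g. a leaderboard.
                jedis.zadd("leaderboard", 1500, "alice");
                jedis.zadd("leaderboard", 2200, "bob");
                System.out.println(jedis.zrevrangeWithScores("leaderboard", 0, 9)); // top 10

                // Hash: one key, many fields; a natural fit for a customer record.
                jedis.hset("customer:223", "firstName", "Jane");
                jedis.hset("customer:223", "lastName", "Doe");
                Map<String, String> customer = jedis.hgetAll("customer:223");
                System.out.println(customer.get("firstName"));
            }
        }
    }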
Other value adds that Amazon ElastiCache provides, in addition to managing Redis for you, include being 100% open-source compatible, and this is also true for Memcached. What that actually means is that if you are currently running Redis or Memcached on your own and you want to bring that over to ElastiCache, all you essentially need to do is port over your code, maybe change your endpoints, and you're good to go. The other thing is cost: if you are, say, running Redis in a multi-AZ environment and managing it on your own, those Redis nodes in the different AZs are chatting with each other, so in addition to paying for those EC2 instances you're also paying a data-transfer-out cost. With ElastiCache that cost is included in the service, so when you compare the cost of your EC2 instances plus the data-out charge against the cost of just using ElastiCache, it's very comparable. From a TCO perspective it doesn't make a lot of sense to manage Redis on your own, and you also get the benefit of us managing the cluster, doing the failovers for you, and so on.

What we're going to do now is talk about various usage patterns for in-memory data stores, and specifically we're going to focus on Redis. The most common one is caching, and essentially people cache for two reasons. The first is to reduce the latency attributed to data retrieval from the database, and that database could be an Oracle database, DynamoDB, Cassandra; it really doesn't matter. If you have a latency-sensitive workload that you want to speed up, caching makes a whole lot of sense. The other reason is scale: for example, you have a lot of traffic and you're not getting the throughput you need for that workload. That's another reason to place a caching layer adjacent to your database. By doing that you're increasing your throughput to numbers in the range of 4.5 million writes per second and 20 million reads per second using Redis on ElastiCache, based on some benchmarks we've tested, and you're also getting sub-millisecond performance. From an implementation standpoint it's actually very easy to use a cache.

If you are using DynamoDB, another popular pattern is to automatically cache data that is written to a table. What you can do is use a DynamoDB stream and an AWS Lambda function: any time data is written to the DynamoDB table, the Lambda code triggers and propagates that change into your cache. What's cool about this is that you might not want to cache the data exactly as it's written; you might want to convert it to a particular data structure like a hash or a list, add some other kind of data decoration, or do something else with the data before caching it. This is a great way to encapsulate that business logic and have it execute automatically for you.
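Here is a rough sketch of what such a Lambda function might look like in Java, assuming the aws-lambda-java-events library and the Jedis client; the table layout, field names, endpoint, and TTL are hypothetical, and you would reshape the item into whatever structure fits your access pattern.

    import java.util.Map;
    import com.amazonaws.services.dynamodbv2.model.AttributeValue;
    import com.amazonaws.services.lambda.runtime.Context;
    import com.amazonaws.services.lambda.runtime.RequestHandler;
    import com.amazonaws.services.lambda.runtime.events.DynamodbEvent;
    import redis.clients.jedis.Jedis;

    public class StreamToCacheHandler implements RequestHandler<DynamodbEvent, Void> {

        @Override
        public Void handleRequest(DynamodbEvent event, Context context) {
            try (Jedis jedis = new Jedis("my-redis.example.cache.amazonaws.com", 6379)) {
                for (DynamodbEvent.DynamodbStreamRecord record : event.getRecords()) {
                    Map<String, AttributeValue> item = record.getDynamodb().getNewImage();
                    if (item == null) {
                        continue; // e.g. a REMOVE event has no new image
                    }
                    String customerId = item.get("customerId").getS();
                    // Reshape the item into a Redis hash rather than caching it verbatim.
                    String key = "customer:" + customerId;
                    jedis.hset(key, "firstName", item.get("firstName").getS());
                    jedis.hset(key, "lastName", item.get("lastName").getS());
                    jedis.expire(key, 3600); // arbitrary one-hour TTL
                }
            }
            return null;
        }
    }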
Now, there are various ways you can actually cache your data, and it boils down to a proactive versus a reactive approach. The proactive approach is that any time you write to your system of record, your database, you immediately cache that data into your distributed cache layer as well. The pro of doing that is that you increase the likelihood that you're going to find the value in the cache when your applications look for it. The con is that you are potentially caching data you might not even need, because you don't yet know what your applications will be looking for; you're just proactively caching everything, so there might be added cost with this approach.

The other, reactive approach, referred to as lazy loading, is where you cache data after you've checked the cache and found it's not there: you query the database and then react to that miss by caching the data. The pro with this approach is that you're caching only the data you know your applications are looking for. The con is that you increase the likelihood of a cache miss, where the data simply wasn't there in the first place, and you've added overhead to responding to that request. The other pro, though, is that from a cost standpoint you're only caching what you need. In practice, people use both approaches together: it's important to understand the frequency of change of your data and to apply appropriate TTLs to your cached data based on that frequency of change.
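A minimal sketch of the reactive, lazy-loading pattern just described, assuming the Jedis client and a hypothetical loadCustomerFromDatabase helper standing in for the real database call; the TTL would be tuned to the data's rate of change.

    import redis.clients.jedis.Jedis;

    public class LazyLoadingExample {

        private static final int TTL_SECONDS = 300; // tune to the data's rate of change

        // Check the cache first; fall back to the database only on a miss.
        public static String getCustomer(Jedis jedis, String customerId) {
            String key = "customer:" + customerId;
            String cached = jedis.get(key);
            if (cached != null) {
                return cached;                                    // cache hit
            }
            String fromDb = loadCustomerFromDatabase(customerId); // cache miss: go to the origin
            jedis.setex(key, TTL_SECONDS, fromDb);                // populate the cache for the next caller
            return fromDb;
        }

        // Hypothetical stand-in for the real database query.
        private static String loadCustomerFromDatabase(String customerId) {
            return "{\"customerId\":\"" + customerId + "\",\"firstName\":\"Jane\"}";
        }
    }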
Session caching is another place where it makes a lot of sense to use a distributed cache, abstracting those sessions away from the individual nodes you have. The reason is that in a dynamic environment your web server count can change: additional web servers can be thrown in as load increases, or taken out of your fleet when load goes down, and you don't want to lose those sessions. Taking the sessions off the nodes that host the web servers and putting them in a distributed cache makes a lot of sense in order to keep that data. A lot of frameworks and containers abstract the logic of using this cache, so it's very easy to store your sessions in either Memcached or Redis; often it's as simple as some configuration changes, with the lower-level details hidden from you, which makes it easy to inject this layer into your workloads.

IoT is a newer and, I'd say, popular use case, similar to a real-time use case where you have a lot of events coming in from sensors. This could be anywhere from hundreds to thousands to maybe millions of devices. One thing you can do, say you're using the AWS IoT service, is create a rule that calls a Lambda function and copies those events into your Redis environment. As I mentioned earlier, a sorted set will sort the data based on a score, so one popular way to manage your sensor events is to use a timestamp as the score; that way you have the order in which those events occurred. You can also use something like a hash to capture the actual details of the event or the device, so you can query for them later. If you want to retain that information for longer periods of time, you can always store it in an Oracle database or a DynamoDB table, or capture the raw event data and archive it, or do whatever you really want with it.

Here's a quick snippet of what that Lambda trigger looks like in an IoT rule. Essentially, in this example there is a transaction block that adds the sensor event to a sorted set called sensor data, where the score is the timestamp and the member is the device ID (an arbitrary device ID in this example). Then there's another data structure used here, a hash keyed by the device ID, holding all the attributes of that device, in this case the temperature reading. You can use both of these data structures together to query the information your applications need, and you execute it all at the end of the transaction block.
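The slide's code isn't reproduced in the captions, so here is a rough reconstruction of that idea in Java with Jedis: a MULTI/EXEC transaction that ZADDs the event into a sorted set scored by timestamp and writes the device attributes into a hash. The key names and fields are assumptions.

    import java.util.HashMap;
    import java.util.Map;
    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.Transaction;

    public class SensorEventWriter {

        public static void recordEvent(Jedis jedis, String deviceId, long timestamp, String temperature) {
            Map<String, String> details = new HashMap<>();
            details.put("deviceId", deviceId);
            details.put("temperature", temperature);
            details.put("timestamp", String.valueOf(timestamp));

            // Both writes are queued and executed together in the transaction block.
            Transaction t = jedis.multi();
            t.zadd("sensor:data", timestamp, deviceId + ":" + timestamp); // ordered by time
            t.hmset("sensor:device:" + deviceId, details);                // latest device attributes
            t.exec();
        }
    }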
Streaming data is another popular use case, similar to IoT in that you're capturing real-time data, but in this case it's not necessarily sensor data; it could be any type of data from any data source, whether it's clickstream data, stock feeds, or whatever. You can have an AWS Lambda function trigger off of a Kinesis stream, take that data, and write those records into Redis. Again, you can use a sorted set or other data structures to capture the data, and you can have an application, say on an EC2 instance, reading that data off your Redis cluster. And if you want to store the data in another data store as well, you can, whether that's RDS, DynamoDB, or whatever makes sense for your workload.

Streaming data enrichment is also an interesting use case. Say you have data coming in from various data sources into a stream and you want to do something with it before you start using it. In this example you might want to do some data enrichment: maybe you want to dedupe the records coming in, maybe it's customer information, so you take that data, put it into a set, dedupe it, and then maybe check whether that customer is also in a different set by comparing those sets against each other, or decorate the data with information you have in a hash. You can do all of those operations inside your Lambda function against your Redis cluster, and once you've cleansed or decorated the data, you place that new data into a cleansed stream for use by some other workload.

Streaming data analytics is another use case. Traditionally you may have a Kinesis stream and a Spark job running on EMR that consumes the stream and does something with the data, maybe a Scala job capturing the stream and running some business logic on top of it, and then all the aggregations or summaries are written to Amazon S3 or used by something else like Amazon Redshift. An interesting pattern here is using the Spark-Redis connector, where you can use Redis within that Spark code. This is very useful, not only for speeding up the workload because you're fetching data from an in-memory data store, but also from the business-logic standpoint: your EMR or Spark code gets significantly smaller, because a lot of times you're doing things like aggregating or decorating data, and if you're using Redis you can take advantage of those built-in data structures, which reduces the coding complexity of those Spark jobs. There are a lot of interesting benchmarks you can check online that show how this can speed up your workloads, in many cases dramatically.

ElastiCache Redis has essentially two different modes. One mode is more of a scale-up environment where all your data resides on one individual node, your primary node, and you can have a read replica with a copy of the entire data set; in the case of a failover your read replica is promoted to be the new primary through DNS propagation. The other mode, which is what we're going to talk about here, is cluster mode, which shards your data across a number of nodes and is more of a scale-out environment. Cluster mode enabled is supported with Redis 3.2. It allows you to scale up to 3.5 terabytes of data within your cluster, and you can have up to 20 million reads per second and 4.5 million writes per second, as I mentioned earlier. Another interesting thing is that failover is four times faster than in non-clustered mode, because it's not based on DNS propagation, which is great. Also, when you shard your data you lower your risk: only a portion of the data is affected if there is an outage, and we'll talk more about that in later slides. We also have the ability to resize your cluster, which is a newer release; we'll talk more about that in a bit. And lastly, we have cluster-level backup and restore, which means you don't have to back up each individual node: you can issue a single backup that copies the data from all the individual nodes, and we'll restore all that data onto the individual nodes as well.

Just to level set on how Redis Cluster works: with Redis Cluster you have 16,384 hash slots in your cluster, and every key you have maps to one of those hash slots. The hash slots are divided across the total number of shards you have, so for example if you have five shards, the 16,384 slots are divided by five, and that's where you get those ranges. On the client side, when you want to interact with the cluster, the client has a map of all the nodes and shards associated with your environment; it knows things like what each slot range is and which node is a primary and which is a replica, and it has that information readily available. So when you issue commands against your cluster, the client knows where to direct that traffic.

Let's visualize this. This is an example of a three-shard cluster with two read replicas per shard. Each shard here is depicted by a color, and the gray border marks the primary. If you notice, each of the shards has a different hash slot range, and the read replicas contain the same range as their primary. Within a Redis cluster you can have a total of 15 shards (in this example there are only three), and you can have up to five read replicas per shard (in this example we have two).
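For the client side of that, here is a minimal sketch assuming the Jedis cluster client: you point it at the cluster's configuration endpoint (the hostname below is made up) and it maintains the slot map internally, so each command is routed to the shard that owns the key's hash slot without extra work in your code.

    import redis.clients.jedis.HostAndPort;
    import redis.clients.jedis.JedisCluster;

    public class ClusterClientExample {
        public static void main(String[] args) {
            // Hypothetical cluster-mode-enabled configuration endpoint.
            HostAndPort configEndpoint =
                    new HostAndPort("my-cluster.example.clustercfg.use1.cache.amazonaws.com", 6379);

            // JedisCluster discovers the shards, slot ranges, primaries, and replicas
            // behind the endpoint and routes each command by the key's hash slot.
            try (JedisCluster cluster = new JedisCluster(configEndpoint)) {
                cluster.set("customer:223", "Jane Doe"); // lands on whichever shard owns the slot
                System.out.println(cluster.get("customer:223"));
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }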
So let's talk about how failure scenarios work. The first failure scenario: imagine a single primary shard fails. This is actually a very simple case. Within 15 to 30 seconds we elect one of your read replicas to be the new primary. This does not happen through DNS propagation; it happens within the Redis cluster environment, and you're able to start writing to your cluster again once those 15 to 30 seconds have passed. The other thing we do is repair the failed node. One thing I want to call out is that even during the failure you're always able to read, because you have replicas you can read from. I'll also call out that only the keys belonging to the failed shard are affected in this situation. Say, for example, your data is equally distributed across those three shards: only about 33.3% of your data is potentially affected by that write outage.

The second scenario is when a majority of your primaries fail. In this case we're showing two of your shard primaries failing, and this is traditionally a problem with open-source Redis, because it has trouble electing which read replicas should become the new primaries. With ElastiCache we have controls around this: we're able to do the election and repair the failed nodes, no problem. There is some additional latency added to that failover, but essentially we have the control to make sure your cluster ends up healthy.

All right, so one common question I get is how to migrate from a non-clustered Redis environment to a clustered Redis environment. It's pretty straightforward: all you have to do is create a new cluster, take a snapshot of your non-clustered environment, and restore that snapshot onto your clustered environment. Essentially, you seed each of those shards with your full RDB file, and each individual shard discards any keys that don't belong to its hash slot range. The other thing to make sure of is that you have a Redis Cluster client that knows how to interact with Redis 3.2 and above, and obviously you don't want to keep paying for the old environment, so you terminate the old cluster.

Another question I get is how to change the number of shards you have allocated. Say you created a cluster with three shards and now you want to change that based on what you're observing in your usage pattern or your data growth. This is also pretty straightforward: you create a new cluster, take a snapshot of the old one, restore that snapshot onto your new clustered environment, and then terminate the old cluster.
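Purely as an illustration of that snapshot-and-restore flow (the console or CLI works just as well), here is a rough sketch using the AWS SDK for Java. The identifiers, node type, shard counts, and parameter group are hypothetical, the snapshot has to finish before the restore can start, and the exact request fields are worth double-checking against the current SDK.

    import com.amazonaws.services.elasticache.AmazonElastiCache;
    import com.amazonaws.services.elasticache.AmazonElastiCacheClientBuilder;
    import com.amazonaws.services.elasticache.model.CreateReplicationGroupRequest;
    import com.amazonaws.services.elasticache.model.CreateSnapshotRequest;

    public class MigrateToClusterMode {
        public static void main(String[] args) {
            AmazonElastiCache client = AmazonElastiCacheClientBuilder.defaultClient();

            // 1. Snapshot the existing non-clustered Redis node.
            client.createSnapshot(new CreateSnapshotRequest()
                    .withCacheClusterId("legacy-redis-001")
                    .withSnapshotName("legacy-redis-snapshot"));

            // ...wait for the snapshot to become "available" before restoring...

            // 2. Restore the snapshot into a new cluster-mode-enabled replication group.
            //    Each shard discards the keys outside its hash slot range.
            client.createReplicationGroup(new CreateReplicationGroupRequest()
                    .withReplicationGroupId("new-clustered-redis")
                    .withReplicationGroupDescription("Cluster mode enabled migration target")
                    .withEngine("redis")
                    .withEngineVersion("3.2.4")
                    .withCacheNodeType("cache.m4.large")
                    .withCacheParameterGroupName("default.redis3.2.cluster.on")
                    .withAutomaticFailoverEnabled(true)
                    .withNumNodeGroups(3)          // number of shards
                    .withReplicasPerNodeGroup(2)   // read replicas per shard
                    .withSnapshotName("legacy-redis-snapshot"));

            // 3. Once traffic has moved over, delete the old cluster so you are not paying for it.
        }
    }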
We're going to go through some of these principles; I'm not going to call out all of them, but one of the main things for architecting for high availability is to make sure you really understand your RPOs, your recovery point objectives, and your RTOs, your recovery time objectives. Make sure you're creating enough read replicas to handle your load. From a memory perspective, make sure you're using memory correctly; for example, when you size your cluster, always set aside around 30% as reserved memory on each instance, just so Redis has enough headroom for processing like taking snapshots or replicating data across read replicas. I also recommend using the newer instance types when available; for example, an M4 will give you somewhere in the neighborhood of 30 to 34 percent more throughput than an M3 instance. I always recommend using the latest engine version, since there are always enhancements you want to take advantage of, and use a Redis client that makes sense for your workloads: a lot of smart clients know to send your read traffic to your read replicas and your writes to your primaries, so take advantage of that. Monitor your workloads and be aware of swap and CPU; those are important things to keep in mind as well. The last thing I'll call out here is that even though we can handle the case where a majority of primaries fail, I'd always recommend having an odd number of shards; it just helps the overall recovery time be a bit faster.

On monitoring your cluster: all your metrics are published to CloudWatch, and what's great about CloudWatch is that you can create an alarm on essentially anything. The idea is to be proactive about the health of your clusters. Be aware of CPU utilization; maybe you want an email when spikes occur. Swap usage you always want to see staying low. You never want to see evictions; evictions are essentially the engine evicting keys because it doesn't have enough memory to manage itself, so you might want an alarm on that. Those are the kinds of things I'd recommend looking at: definitely set alarms and be proactive about what's happening within your cluster.

Another thing to be aware of is how many connections are being made against your cluster. When your developers are writing code against the cluster, make sure they're using connection pooling and using those connections properly. If they're not, say there's a block of code that creates a Redis connection and then exits without closing it, you'll have an idle connection just lingering around, and in total you have about 65,000 max connections available. You want to make sure rogue code isn't causing potential problems, and there are some parameters that can help you mitigate that, like timeout, tcp-keepalive, and others. The other one to be aware of is your maxmemory-policy, which is really your eviction policy: in the event that an eviction actually occurs, you want to have selected a policy that makes sense for your use case.
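To make the "set alarms and be proactive" advice concrete, here is a rough sketch using the AWS SDK for Java that alarms as soon as any evictions occur on one cache node; the node ID, SNS topic, and thresholds are hypothetical, and the same pattern applies to CPUUtilization or SwapUsage.

    import com.amazonaws.services.cloudwatch.AmazonCloudWatch;
    import com.amazonaws.services.cloudwatch.AmazonCloudWatchClientBuilder;
    import com.amazonaws.services.cloudwatch.model.ComparisonOperator;
    import com.amazonaws.services.cloudwatch.model.Dimension;
    import com.amazonaws.services.cloudwatch.model.PutMetricAlarmRequest;
    import com.amazonaws.services.cloudwatch.model.Statistic;

    public class EvictionAlarm {
        public static void main(String[] args) {
            AmazonCloudWatch cloudWatch = AmazonCloudWatchClientBuilder.defaultClient();

            // Alarm when the Evictions metric rises above zero on one cache node.
            cloudWatch.putMetricAlarm(new PutMetricAlarmRequest()
                    .withAlarmName("redis-evictions")
                    .withNamespace("AWS/ElastiCache")
                    .withMetricName("Evictions")
                    .withDimensions(new Dimension()
                            .withName("CacheClusterId")
                            .withValue("my-redis-0001-001"))    // hypothetical node id
                    .withStatistic(Statistic.Sum)
                    .withPeriod(300)
                    .withEvaluationPeriods(1)
                    .withThreshold(0.0)
                    .withComparisonOperator(ComparisonOperator.GreaterThanThreshold)
                    .withAlarmActions("arn:aws:sns:us-east-1:123456789012:ops-alerts"));
        }
    }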
All right, let's talk about some caching strategies. For this part of the talk I'm going to move into a fictional workload. In this case we have customer data, our primary database is a relational database, and we have a cache cluster where we want to store customer information. In this example we have a Java application that queries the database and uses the cache. There are a few different ways you can cache data; we're going to talk about some of the common ones, but this is definitely not limited to all the possibilities.

One possible way is to cache your SQL result set. What this means is, say you execute a SQL statement against your database and get back a result set; you can take that result set, serialize it, and cache it in your cluster. What's nice about that is you can easily create an interface for your applications to interact with, like a DAO pattern: you say "execute this query," and the query logic checks whether the result set is in the cache, and if it's not, it goes and fetches it from the database. The con is that you still have to iterate over that result set; you're not really changing the data access pattern, you're just reducing the latency of fetching the data from your database.

Another pattern is to cache specific values from your result set. Say you execute a query, get back the result set, iterate through it, and pick out the fields and values you want; I see some people building maybe a JSON object or an XML structure out of that information and caching it in their cluster, typically as a Redis string. The pro is that this is really easy to implement and you have the flexibility to create anything you want. The con is that you're now dealing with different types of objects within your application: you query to check whether that particular JSON structure is in the cache, and if it is you iterate through it, and if it's not you query your database, get a result set, and iterate through that, so you're handling different kinds of data. You're also not taking advantage of some of the advanced data structures Redis provides.

Another interesting approach: say you query the database to get your result set, maybe it's a customer record, and you persist that information into a customer object. I'm using Java in this example, so in my domain object I populate the state of that customer, the first name, last name, and so on, and then I serialize that object as a byte array and cache that as well. This goes back to the point about the string data structure being binary-safe: you can do essentially anything you want with that value. The pro with this is that you can retrieve the object in its native form, with its state. The con is that you need to know what you're doing to support this style of application development. I'm going to demo this in a bit just to show you, from a capability perspective, how this is done and that it's just another option.
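Here is a minimal sketch of that serialized-object strategy, assuming the Jedis byte-array API; the Customer class below is a bare-bones stand-in for the demo's domain object, and the key naming and five-minute TTL are arbitrary.

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;
    import redis.clients.jedis.Jedis;

    public class ObjectCache {

        // Serialize the domain object and store it as a byte array with a TTL.
        public static void cacheCustomer(Jedis jedis, String customerId, Customer customer) throws Exception {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(customer);
            }
            jedis.setex(("object:customer:" + customerId).getBytes(), 300, bytes.toByteArray());
        }

        // Read the byte array back and rebuild the object with its original state; null means a miss.
        public static Customer getCustomer(Jedis jedis, String customerId) throws Exception {
            byte[] cached = jedis.get(("object:customer:" + customerId).getBytes());
            if (cached == null) {
                return null;
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(cached))) {
                return (Customer) in.readObject();
            }
        }

        // Minimal stand-in for the demo's Customer domain object.
        public static class Customer implements Serializable {
            public String firstName;
            public String lastName;
        }
    }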
Lastly, another caching strategy is to take the record you get from the database and use a specific Redis data structure for that data, because it matches your data access pattern. In this example I take the customer information and create a key from the customer ID; that's going to be my customer key, and the value is going to be a hash map of the customer attributes, so the first name, last name, and all those things will be fields of my map, and I persist that map into Redis. Maybe I do this because when I want to query that hash I want to query for specific attributes, and using a map simplifies the data access from my application's perspective. In this use case it makes a lot of sense, not only to reduce my latency but to speed up my actual coding objectives. The con, again, is minor: if the data is not cached, I have to start from scratch, pull the data from the database, and convert it into the data structure I need.
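A minimal sketch of that hash-per-customer approach with Jedis; the field names mirror the example and are otherwise arbitrary.

    import java.util.HashMap;
    import java.util.Map;
    import redis.clients.jedis.Jedis;

    public class CustomerHashCache {

        // Store the customer's attributes as one hash under one key, with a TTL.
        public static void cacheCustomer(Jedis jedis, String customerId, Map<String, String> attributes) {
            String key = "hash:customer:" + customerId;
            jedis.hmset(key, attributes);
            jedis.expire(key, 300);
        }

        // Fetch just one attribute without pulling back the whole record.
        public static String getAttribute(Jedis jedis, String customerId, String field) {
            return jedis.hget("hash:customer:" + customerId, field);
        }

        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("my-redis.example.cache.amazonaws.com", 6379)) {
                Map<String, String> attrs = new HashMap<>();
                attrs.put("firstName", "Jane");
                attrs.put("lastName", "Doe");
                cacheCustomer(jedis, "223", attrs);
                System.out.println(getAttribute(jedis, "223", "lastName"));
            }
        }
    }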
All right, let's talk through what's involved in the demo we're going to show. You can do this on your own; it's pretty easy to recreate. In this example I created a customer table with some customer fields. You want to make sure you set the appropriate security groups so your EC2 instance can talk to your database. You want to load data; I used a tool called Mockaroo, which is a free tool for creating sample data, and just inserted that data into the database. You also want to create an ElastiCache Redis cluster, create an EC2 instance where you can deploy your application, and make sure you've installed a Redis client on that instance so you can test against your cluster and look at the keys and so on.

Because I'm going to show you a variety of things in this demo, this is the object architecture and hierarchy I created. There's a Customer domain object, which has all my customer properties and captures the customer state. I created a DAO, which encapsulates the logic involved with retrieving customer information; that DAO is backed by an implementation class. I further created a cache DAO, which is an abstraction for taking an object, converting it into a byte array, setting and getting data in the cache, and returning it to the caller. So that's the structure I created and will walk through. The other thing I want to call out is the basic data retrieval business logic being demoed here: the applications, which are really just main methods, fetch information through the DAO; the implementation code checks whether the data is in the cache; if it's not, it retrieves the data from the origin database and then sets it into the cache, which is how lazy loading works; for this demo I'm just using an arbitrary five minutes to expire the data, and then it obviously returns the results. If you want to follow along, just grab the URL shown and you can look through the code as I show it. With that said, I'm going to switch over at this point and execute the demos.

First, I'm going to connect to my EC2 instance. The first demo I'm going to show is the row demo, where I fetch a row for a particular customer. Before I do that, let me duplicate this session and connect to Redis as well, so you can see I have no data in my cache; I do a KEYS and, as you can see, there's nothing in there. For the cache row demo I'm passing in the value 223; this mimics a customer record, and 223 is a value that's in my sample database. The first thing that happens is it checks against the cache and finds there's no data, so I have a miss. Next it fetches the data from my database, then it updates the cache, which is the lazy loading pattern, caching that data for five minutes; what you see here is the data that came out of the database. If I execute the same command again with the same value, customer 223, you can see that the second time, when it checks the cache, there is a hit, so that row was returned from the cache. If I look at what's inside my cache with a KEYS, I can see I have a new key, which is really my SQL statement; I used that as my key, and the value is the serialized result set that I got back.

Let's take a look at another example. For the second example we'll do the cache object demo, and let me pick a different customer record, say 225, and see what happens. The first time I execute this, it checks the cache and gets a miss: the object is not there. The way I implemented this code, it then checks whether perhaps there's a cached result set it can use to create an object from that row; no, that's not there either. So it fetches the actual row from the database, converts the row into an object, caches both the row and the object, and then returns the object. What's being displayed is the result out of that customer object (we'll look at the actual code in a bit), and this came out of the database. If I run the same code a second time, we see that it checked the cache, got a hit, found the object in the cache, and returned that object. If we look at the right-hand side and do a KEYS, we see a whole bunch of things got added: I have a new key with the SQL statement I just executed, and also a new key that says object customer ID 225, which is the actual serialized customer object that was stored in the cache and retrieved for the application.

Now, the third example combines a few things: it gets a row, caches that row, builds a customer object, converts that object into a hash, and persists that hash into the cache. The first time this executes there are a bunch of misses: it tries to find a hash and it's not there, tries to find an object and it's not there, tries to find a row and it's not there, so it starts loading this data into the cache, and it outputs the JSON structure of that hash.
That JSON is really just the hash's representation in JSON format, to make it easy to display. If I look over at my right-hand side and do a KEYS, what I see is the new row I just executed, a new object for that customer record, 432, and a new key which is the hash for customer 432.

Now let's take a quick peek at the code. Let's start with the first one, the cache row demo; it looks like I need to speed this part up, so I'm just going to go through this one example to keep on time. With the cache row demo, the first thing the application does is instantiate the DAO and call getCustomerRow; it doesn't know where this customer row is coming from, because that's encapsulated within my DAO. If I look at the implementation of the DAO, in the getCustomerRow method, essentially what happens is: I create my query, then I call getRowFromCache and check whether the row is in the cache. If it is, great, that's a hit; if it's not, I execute the SQL statement, set the row into the cache with the arbitrary five-minute TTL, and then I exit. Let's take a quick look at what that getRow from the cache looks like: I take the key, do a GET with the key bytes (the value is stored as a byte array), store the value in a byte array, and if it is not null I convert that byte array back into a CachedRowSet, set the result set from it, and return the row set. The other demos work essentially the same way, they just use different objects and different types, so I'm not going to go through those, to be good on time.
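The demo code itself isn't reproduced in the captions, so as a rough approximation of that getRow logic: the result set is held as a serialized javax.sql.rowset.CachedRowSet, stored and fetched as a byte array keyed by the SQL statement, with the demo's five-minute TTL. The key handling is simplified here.

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.sql.ResultSet;
    import javax.sql.rowset.CachedRowSet;
    import javax.sql.rowset.RowSetProvider;
    import redis.clients.jedis.Jedis;

    public class RowCache {

        // After a database fetch, copy the JDBC ResultSet into a serializable CachedRowSet
        // and store it under the SQL statement used as the key, with a five-minute TTL.
        public static void setRow(Jedis jedis, String sql, ResultSet rs) throws Exception {
            CachedRowSet rowSet = RowSetProvider.newFactory().createCachedRowSet();
            rowSet.populate(rs);
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(rowSet);
            }
            jedis.setex(sql.getBytes(), 300, bytes.toByteArray());
        }

        // On a read, pull the byte array back and rebuild the CachedRowSet; null means a miss.
        public static CachedRowSet getRow(Jedis jedis, String sql) throws Exception {
            byte[] cached = jedis.get(sql.getBytes());
            if (cached == null) {
                return null;
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(cached))) {
                return (CachedRowSet) in.readObject();
            }
        }
    }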
Just to recap what we talked about: caching can greatly improve your data retrieval speeds. There are various strategies you can implement, so use what makes sense for both your data access pattern and the type of data you're caching. The other thing to think about is that it can also greatly reduce the cost of your environment, whether you're trying to scale a database that can't handle the load, or comparing the TCO of deploying this on your own, with all the additional costs involved, versus just using a managed service like ElastiCache. Thank you very much, I appreciate your time; that is the overview.

Thank you, Michael, we appreciate your presentation and your demo today. On your screen everyone will see our survey questions, three on the bottom and one at the top, so please feel free to complete those; your feedback is very important to us. Your Q&A panel is in the upper left-hand corner, so you can still submit questions; we have about five minutes to complete our Q&A, so at this point we'll go ahead and start.

All right, it looks like we got a couple of questions; I'm going to go through a few of these. One question was: do you need to differentiate which node to connect to when doing a read or a write? A lot of this depends on which Redis client you're using. What we give you is a configuration endpoint to your cluster, and your cluster client essentially has a map of what your shards are, what your primary node is, and what the read replica nodes are for a particular shard. So you don't have to do anything besides connecting to the endpoint we provide and using a proper client that will direct your traffic appropriately.

Another question: can we use TTL settings to programmatically insert transient data that can be retrieved for a limited amount of time? Yes, you can assign any TTL value that makes sense for your application, whether that's a second or something much greater. Whatever the rate of change of your data is, match the TTL to it: if the data sits unchanged in your database for five minutes, maybe you just want to cache it for five minutes; if it changes every 30 seconds, maybe it makes sense to cache it for 30 seconds; if it changes every day or every week, apply a TTL that corresponds to that rate of change.

I also got a question: I have a lot of tables in DynamoDB, and I want to know how I can put some of that data into ElastiCache and query it within my Java application. This is true for Java or any programming language; the logic is the same. ElastiCache and DynamoDB, or your other relational database, do not talk to each other. What your application does, whether you create a data service for this or code the logic directly in your application, is check whether a particular record or key exists in your Redis cluster. If it does not, you query DynamoDB for that value, and then you might want to cache that value after finding it was not in the cache, applying an appropriate TTL for that data. The other thing to think about is whether this data is going to be queried at all; if it is, proactively cache it as soon as you insert it into DynamoDB, and this is where you might want a Lambda function that triggers automatically after the data is inserted into DynamoDB. When you do that, you have the flexibility to add any kind of business logic based on that record: maybe you want to cleanse it, cache only particular fields, or aggregate the information. You have that flexibility within the Lambda function.

Thank you, everyone, I think we only have time for those questions; appreciate you joining the webinar and have a good day. I'd like to extend a special thank you to our presenter, Michael, for his time and great presentation, as well as to Darin, Bob, Justin, and Dan for the Q&A moderation. If you have any comments on our previous and upcoming webinars, or suggestions on topics you'd like AWS to cover in future webinars, please feel free to email us at aws-webcast@amazon.com. Your feedback will help us improve our webinar programming. Thank you for taking the time to join us today, and enjoy the rest of your day.
Info
Channel: AWS Online Tech Talks
Views: 11,364
Rating: 4.716814 out of 5
Keywords: redis, in-memory, memcached, caching, cache, data cache, database cache, memcache, application cache, web cache, Amazon, AWS, Amazon Web Services, cloud computing, AWS cloud
Id: zmDUDSYnAv4
Length: 53min 39sec (3219 seconds)
Published: Thu Mar 23 2017