I've been using Redis wrong this whole time...

Captions
Redis is an open-source, in-memory key-value database. It also happens to be blazingly fast. In my current role, we have a Redis cluster that caches HTTP responses from a third-party dependency; at peak times, this cluster handles around 2 million requests a minute, which is pretty impressive. This has typically been one of the more common use cases of Redis that I've come across: speeding up I/O-bound operations by adding a caching layer to the application stack. But Redis can be used for so much more than this single use case. In fact, I'd go so far as to say that using Redis only in this capacity is a waste of its talents. So in this video, I decided I wanted to look at some misconceptions about Redis and how we can replicate database functionality with it.

The first question I want to answer is: how did we get here? Why has Redis so easily been relegated to the caching layer instead of being the actual database itself, and why does my co-worker flinch whenever I mention using Redis as the primary database? Well, I think there are two common concerns when it comes to using Redis as a primary data store. The first is that because Redis is an in-memory database, the data must be ephemeral. The second is that because it's a key-value store, it cannot handle complex data structures or queries. Let's take a look at each of these concerns.

The reason Redis is incredibly fast is that it stores its data in memory rather than on disk. Now, if you know how computers work, you're probably aware of why this may be a bad idea for a primary data store: if the instance is destroyed, the memory is wiped and the data lost. Or is it? Let's actually test this out. Here I have a single Redis node set up on my home lab Kubernetes cluster (yes, I finally have a dedicated home lab; let me know if you're interested in my setup and I'll happily do a video on it, maybe on another channel). Let's go ahead and connect to it. As I'm using Kubernetes for this cluster, it's easy enough to port-forward and connect in. Okay, so we're inside of our Redis instance. If I run the KEYS command, you can see that it's completely empty. Let's add a new value using the SET command: here we're going to set a key of powerlevel and a value of just over 9000. If we run the KEYS command again, we can see the powerlevel key now exists, and we can retrieve its value using the GET command.
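As a rough sketch, here's that first demo driven from Python with the redis-py client rather than the CLI; the host, port, and the exact value 9001 are my assumptions for illustration, not necessarily what's shown on screen:

    import redis

    # Connect to the port-forwarded instance; host and port assume a
    # default "kubectl port-forward" to Redis's standard port.
    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    print(r.keys("*"))           # [] -- a fresh instance is empty
    r.set("powerlevel", 9001)    # "just over 9000"
    print(r.keys("*"))           # ['powerlevel']
    print(r.get("powerlevel"))   # '9001'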
Now it's time to apply some chaos engineering: let's go ahead and delete our Redis instance. After a short while, a new Redis instance comes back to life thanks to the power of Kubernetes. If I connect to this new instance and run a GET command against the powerlevel key, we should receive nil back. However, that's not the case. So what gives? How does this data still exist even after restarting the Redis instance?

Well, it's because Redis actually has persistence built in. In fact, there are two different types of persistence that Redis uses. The first method is snapshotting, known as RDB: a point-in-time snapshot of the dataset taken at specific intervals, similar to the snapshot feature found in other databases. Whilst this method is good if you need to restore your data to a point in time, it's not what actually brings our Redis instance back to the same state it was in before we crashed it. That behavior is handled by the second persistence method, called AOF, which stands for append-only file. When configured to use an AOF, Redis writes an entry to a log file for every write operation the Redis server receives. During a restart, these operations can be replayed, reconstructing the original database. This is actually similar to how Postgres stores data using WALs, or write-ahead logs. In fact, the Redis documentation claims that combining AOF and RDB persistence will give you a degree of data safety comparable to what Postgres can provide. Now, that's a bold claim, but I'm inclined to believe it, certainly based on my real-world experience of using Redis at scale.
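For reference, here's a minimal sketch of turning on both persistence modes, continuing with the same connection; the snapshot schedule values are illustrative defaults, not the configuration from the video:

    # You'd normally set the equivalent "appendonly yes" and "save"
    # directives in redis.conf; CONFIG SET applies them at runtime.
    r.config_set("appendonly", "yes")      # AOF: log every write operation
    r.config_set("save", "900 1 300 10")   # RDB: snapshot after 900s if 1+
                                           # changes, or 300s if 10+ changes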
Okay, so data persistence in Redis can be set up to be comparable to one of the safest databases out there. But what about the second reason not to use it as a primary database: the lack of complex data structures? In a typical relational database, we're used to abstractions such as tables, rows, and columns. In addition to these, we have indexes, partitions, primary keys, foreign keys, and many others. All of these concepts allow us to query for data and perform complex operations. Redis is much simpler, but that doesn't mean it's any less powerful. Redis is much closer to the abstractions found in traditional computer science, such as arrays, hash maps, and sets. These are the very data structures traditional databases use to build the abstractions we know and use, and therefore we should be able to recreate those abstractions in Redis. Let's go ahead and look at some of them.

The first thing we're going to do is replicate a simple user table, which contains a user ID and an email address. In SQL, the table would look something like this. We can easily recreate it in Redis using the basic SET operation, which allows us to store a record as a key-value pair: we could just set the user ID as the key and the email as the value. That would get us what we're looking for, but there is a problem with this implementation. Because Redis does not have any logical groupings or schemas, we lose the information about what we're storing when we use a bare ID as the key. For example, this UUID could be related to a user, but it could also be related to a product or an order; there's no way to tell without actually looking at the data itself, and even then there's still a chance of misinterpreting what you're storing. Therefore, it's a good idea to specify the data type in the actual key, sort of like a namespace. The idiomatic way in Redis is to use a colon to separate the namespace from the identifier, so in our case it would be user:ID. This way we provide some additional context about the type of data we're storing, but it also has some other benefits.

Let me explain with a typical scenario you'd find within a web app: user authentication. In this case, we want to perform authentication with a user's email and password. Currently we can obtain a user's email address given their ID, but that's no good here; we actually want the inverse, obtaining a user's ID given their email address, if it exists in the database. So how can we do that in our current database setup? Well, the naive approach is to search through all of the user entries by using the KEYS command on everything in the user namespace. We could then iterate through every entry and check if the value matches our email; if one does, we have the user ID. By the way, in a production system, don't use the KEYS command: it can and will lock up Redis, which blocks any other queries from completing. Instead, you'd use the SCAN command, which is a little more complex but allows you to iterate using a cursor. For testing within this video, though, the KEYS command is absolutely fine.

This method of comparing each record's value in the user namespace against our input is called a sequential scan, and it works similarly to how a SQL database would perform the same operation. It's not the most optimal method, however, as it can take a long time to scan tables with a large number of rows. To improve performance in SQL, you would add a database index on the email column, which acts as a sort of lookup table, storing the user's email alongside a reference to the associated row ID. If this sounds very similar to a key-value pair, that's because it is, and that means we can easily replicate it in Redis. One approach is to store a key of the user's email, under an associated namespace, with a value of the user's ID. With this, we can easily obtain the user's ID from an email input using the GET command. And that's not the only benefit this model gives us: we also get a sort of unique constraint, in that we can see whether an email already exists in our system. There is a better data structure to use for unique constraints, but more on that later.
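Sketched out in Python, the namespaced record plus the hand-rolled email index might look like this; the namespace strings and email address are my own illustrative choices:

    import uuid

    user_id = str(uuid.uuid4())

    # Primary record under the "user" namespace: user:<id> -> email
    r.set(f"user:{user_id}", "dreams@example.com")

    # The index, under its own namespace: user:email:<email> -> id
    r.set("user:email:dreams@example.com", user_id)

    # Direct lookup of a user's ID from their email -- no sequential scan
    print(r.get("user:email:dreams@example.com"))

    # In production, iterate a namespace with SCAN rather than KEYS
    for key in r.scan_iter(match="user:*"):
        print(key)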
So we have our simple table and a pretty fast index, but in a normal application we'd want to store more data than just the user's email address: we'd likely want the user's name, a hashed password, and other fields. Keeping with the authentication model we've been playing with, how do we store the hashed password in our table? Well, we could use another key-value pair, storing the hashed password against the user's ID under some namespaced prefix, but that's not really scalable. Another option would be to store a JSON-encoded string of our user data structure as the value for the user's ID. This works, but it means that every time we want to add a new field to our user, we have to fetch the record, decode the existing value, make the change, re-encode it, and store it again using the SET command.

A better option is to use the hash type for our value. A Redis hash is basically a dictionary, or a hash map in other languages; it allows us to store a more complex data structure against a key. Here we're going to use the HMSET command to store multiple field-value pairs against the single key of our user ID. The fields we're going to set are email and hashed_password; the values for each are given after the field's name in the command. We can then retrieve the full structure using the HGETALL command, which takes in the primary key. Here you can see the full data representation of the user we just set. If we use the Redis SDKs in different programming languages, we can easily decode this into native data structures as well; here's an example using Python. What's really awesome about this approach is that if we want to add additional data to an entry, we can easily do so using the HSET or HMSET commands; for example, let's add a name to our user's entry. Just as we can add fields, we can also delete them using the HDEL command, although one thing to note is that the hash will no longer exist if no fields remain.

Using this data structure, let's look at how we'd handle a simple authentication flow. First, we find a user ID given the email address provided; if one doesn't exist, we return false. Next, we use the HGET command to get the user's hashed password from their user hash. Finally, we verify this hashed password against the password submitted. And we've ended up with something that looks very similar to how you would interface with a typical database.
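Here's a hedged Python version of that hash model and auth flow. Note that redis-py has deprecated HMSET in favour of HSET with a mapping, and the SHA-256 hashing is just a stand-in; in a real system you'd use something like bcrypt or Argon2:

    import hashlib

    # The key stored a plain string earlier, so clear it before reusing
    # it as a hash (Redis would otherwise return a WRONGTYPE error).
    r.delete(f"user:{user_id}")

    r.hset(f"user:{user_id}", mapping={
        "email": "dreams@example.com",
        "hashed_password": hashlib.sha256(b"hunter2").hexdigest(),
    })
    print(r.hgetall(f"user:{user_id}"))          # the full user record

    r.hset(f"user:{user_id}", "name", "Dreams")  # add a field later...
    r.hdel(f"user:{user_id}", "name")            # ...and delete it again

    def authenticate(email: str, password: str) -> bool:
        uid = r.get(f"user:email:{email}")
        if uid is None:
            return False                         # no such user
        hashed = r.hget(f"user:{uid}", "hashed_password")
        return hashed == hashlib.sha256(password.encode()).hexdigest()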
So what's next? We've looked at tables, sequential scanning, and some basic indexes, but what about ordering? In a typical database, you may want to return values based on a timestamp or a ranking. To showcase this, let's set up a counter for our users that tracks the number of times they've logged in. We could add this to our user hash, but it's going to be more powerful to put it in another data structure: the sorted set, which allows us to store unique members (our user IDs) with an associated score (in our case, the number of logins per user).

To get started, we can use the ZADD command, which inserts a member into the data structure, creating the sorted set if it doesn't already exist. The first argument for this command is the key we want to use for the set itself; in our case, this will be user logins. We then give the score, or login count, we want to associate with our member, which in our case is going to be 10, and finally the member itself, which will be our user's ID. We get back the value 1, which tells us we successfully added a single member. We can retrieve this entry using the ZSCORE command, passing in both the key for the set and the key of the member; here we get back the value of 10. As well as ZADD, we can also use the ZINCRBY command to either add members to our set or increment the score of existing members. If a member doesn't exist, it's added to the set and given the increment value as its starting score; in our case, we've added a new member with an initial value of 7.

That covers how to add entries to our set. Next, we're going to look at how to perform some range operations over the data within it. Before doing that, I'll first add three more members to our set, each with different scores. Now that our members are added, let's ask some questions. First: we want to order our users by the number of times they've logged in, in descending order. We can achieve this with the ZRANGE and ZREVRANGE commands, which let us select members by the range of their sorted indexes. If we use the ZREVRANGE command with the values 0 and -1, we receive a list of our members sorted in descending order. So how does the ZREVRANGE command work? The first parameter is the starting index, in our case 0, and the second is the ending index, in our case -1, which means to go until the last member. If we wanted our members sorted in ascending order, we'd use the ZRANGE command instead.

We can also scope these commands to retrieve only a subset of our members. Say we want only the top three most logged-in users: we can use ZREVRANGE with the values 0 and 2. We can do the same for the three least logged-in users by using ZRANGE with 0 and 2. As we're directly selecting indexes, you may want to know how many members exist within our set; we can retrieve that number with the ZCARD command. Using these two range commands, we can easily sort our members in either ascending or descending order whilst also limiting the number of results we get back, very similar to the SQL you see on screen.

This only applies to the sorted indexes, however, which is basically the relation of each member to one another. There may be situations where you want to filter on the actual score value itself, similar to a WHERE clause in SQL. For example, we may want to retrieve only users that have logged in at least 10 times. We can do that using the ZRANGEBYSCORE command, followed by the min, in our case 10, and the max; to ensure no upper bound, we just use the +inf value. This returns the members we have with scores of 10 or more, sorted in ascending order. If we want those members in descending order, we can use the ZREVRANGEBYSCORE command instead; we also need to swap the positions of the min and max parameters. We can use these commands to retrieve our least logged-in users too: we'd instead use the -inf value for the min and set our max appropriately, in our case 5, to retrieve any members that have logged in five times or fewer. So great: we not only have sorting by values, we also have filtering by range.
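In Python, that sorted set walkthrough comes out like this; the extra member IDs and their scores are invented for illustration:

    # member = user ID, score = number of logins
    r.zadd("user:logins", {user_id: 10})
    r.zincrby("user:logins", 7, "user-2")  # new member, starting score 7
    r.zadd("user:logins", {"user-3": 25, "user-4": 3, "user-5": 12})

    print(r.zscore("user:logins", user_id))                    # 10.0
    print(r.zcard("user:logins"))                              # 5 members

    print(r.zrevrange("user:logins", 0, -1, withscores=True))  # descending
    print(r.zrevrange("user:logins", 0, 2))                    # top three
    print(r.zrange("user:logins", 0, 2))                       # bottom three

    print(r.zrangebyscore("user:logins", 10, "+inf"))  # scores >= 10
    print(r.zrangebyscore("user:logins", "-inf", 5))   # scores <= 5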
We achieved all of this using sorted sets, but there's another type of set in Redis we can use for unique constraints: the standard set, which stores unique members. Earlier, we created a unique index for our user emails using just a key-value pair, but what if we don't want an index, only a unique constraint? For example, let's say we want to allow our users to select a username, but we want usernames to be unique within our database. Using the SADD command, we can add username values to a set, creating the set if it doesn't already exist. Then, when a user goes to set their username, we can use the SISMEMBER command to check if it's available: if the username already exists within our set, we get back a response of 1, which represents true; otherwise we get back 0. The set works as a sort of replacement for a unique constraint, but it does require storing related data across two different structures, which adds complexity to our data model.

The last database concept I think is worth touching on is actually really important when it comes to Redis, perhaps even more so than in normal databases: transactions. In databases, transactions are used to group operations together in an atomic fashion, i.e., either all of the operations are performed or none are. This concept is very important, as it prevents us from being left in a partial state. To visualize this, let's look at some example code for creating a new user. We first check whether the user's email already exists; if it doesn't, we proceed with adding the user entry, followed by setting the user's email-to-ID mapping. This code is susceptible to a race condition, however. In rare cases, it's possible that between the time we check whether an email exists and the time we set the user's ID for that email, another entry could have been written. This means our second write would overwrite that existing one, even though it returned success to the user. This is very bad. We can prevent it by adding the NX option, which stands for "not exists", to our SET command. This forces the SET to apply only if the record does not already exist, which prevents us from overwriting the original record and lets us know when we hit an error. However, it still leaves us in a partial state, where our user record was created when it shouldn't have been. We could resolve that using reconciliation, unwinding our partial state and removing any created entries, but this adds a lot of complexity and is still prone to errors. It's much easier to use a database transaction, which fortunately Redis supports.

Let's look at how to use them. To begin a transaction, we first call the WATCH command on the key we're interested in; in our case, this is the user email key we're about to set. The WATCH command causes any transaction to be discarded if a change is made to the watched key by any other client. Once we have our watched key set up, we enter the transaction using the MULTI command, and we can confirm that our client is in a transaction by the letters TX next to the prompt. Next, we add our user hash using the HSET command. Now, to simulate a race condition, I'll open up another client to Redis and set our user's email key, the key we're watching, to point to a different user ID. Back in the original client, we'll also attempt to set the user email index key, pointing it at our new user ID. Finally, we run the EXEC command to apply the transaction, which returns nil, letting us know the transaction didn't apply. We can validate that the user email still points to the existing user ID and that our user hash was never created, showing us that the transaction worked as intended.
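redis-py wraps the WATCH/MULTI/EXEC dance in its pipeline object, so a guarded create-user flow might be sketched like this; the key layout matches the earlier examples, and the error handling is my own framing:

    def create_user(email: str, new_id: str) -> bool:
        index_key = f"user:email:{email}"
        with r.pipeline() as pipe:
            try:
                pipe.watch(index_key)   # WATCH: abort if this key changes
                if pipe.get(index_key) is not None:
                    return False        # email already taken
                pipe.multi()            # MULTI: start queueing commands
                pipe.hset(f"user:{new_id}", mapping={"email": email})
                pipe.set(index_key, new_id)
                pipe.execute()          # EXEC: discarded if the watched
                return True             # key was modified concurrently
            except redis.WatchError:
                return False            # another client raced us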
Now that we've taken a look at how Redis can be used as a primary database, it's probably worth talking about a couple of reasons why it may not be worthwhile to do so. The first is that with Redis, a significant amount of work goes into creating what other databases give us for free out of the box. Take Postgres and SQL, for example: setting up tables, indexes, and constraints is all achieved declaratively, using a powerful query language. This contrasts with Redis, where each of these features has to be built from the primitives that Redis offers. That being said, it is a good thing to learn how to implement these features using primitives, but if you know you're going to need complex data structures and want to get started quickly, then a database like Postgres is going to be a better choice.

The second reason I would opt not to use Redis comes back to the fact that it stores all of its data in memory. Earlier, we covered that Redis storing its data in memory is much safer than one might initially think, but there's another elephant in the room: memory tends to be considerably more expensive and less abundant than disk storage. This means you'll likely end up paying more to use Redis with larger datasets, whether by scaling your Redis instances horizontally or vertically. However, if your dataset is small, or you really need the speed, then Redis may be worth that additional cost.

For me, on personal projects I typically couple Redis with Postgres, usually as the caching layer in front of the database itself. My datasets also happen to be rather small, so in the future I'd likely consider promoting Redis to the primary database for my initial iterations and only migrating to Postgres when I need to. Either way, I'd love to know your thoughts: has this video convinced you to try Redis as a primary database, or were you doing so already? Let me know in the comments down below. Otherwise, a big thank you to my channel members, including my newest one, Canarius, and a big thank you to everyone else for watching. I'll see you on the next one.
Info
Channel: Dreams of Code
Views: 220,998
Keywords: redis, database, cache, python, programming, cpp, golang, postgresql, databases, data, data storage, software, development, engineering, software engineer
Id: WQ61RL1GpEE
Length: 20min 52sec (1252 seconds)
Published: Sat Sep 30 2023