Redis Crash Course - the What, Why and How to use Redis as your primary database

Video Statistics and Information

Captions
In this video we're going to talk about Redis and how Redis can be used as a primary database for complex applications that need to store data in multiple formats. First we will see what Redis is and its usages, as well as why it is suitable for modern, complex microservice applications. We will talk about how Redis supports storing multiple data formats for different purposes through its modules. Next we will see how Redis, as an in-memory database, can persist data and recover from data loss. We'll also talk about how Redis optimizes memory storage cost using Redis on Flash. Then we will see very interesting use cases of scaling Redis and replicating it across multiple geographic regions. And finally, since one of the most popular platforms for running microservices is Kubernetes, and since running stateful applications in Kubernetes is a bit challenging, we will see how you can easily run Redis on Kubernetes. I'm proud to say that I have partnered up with Redis to make this video, so big thanks to Redis for sponsoring and making this video possible.

Now first of all, Redis, which actually stands for Remote Dictionary Server, is an in-memory database. Many people have used it as a cache on top of other databases to improve the application performance. However, what many people don't know is that Redis is a fully fledged primary database that can be used to store and persist multiple data formats for complex applications. Let's see the use cases and examples for that.

Let's look at a common setup for a microservices application. Let's say we have a complex social media application with millions of users, and let's say our microservices application uses a relational database like MySQL to store the data. In addition, because we are collecting tons of data daily, we have an Elasticsearch database for fast filtering and searching of the data. Now the users are all connected to each other, so we need a graph database to represent these connections. Plus, our application has a lot of media content that users share with each other daily, and for that we have a document database. And finally, for better performance of the application, we have a cache service that caches the data from the other databases and makes it accessible faster.

Now it's obvious that this is a pretty complex setup, but let's see what the challenges of this setup are. First of all, all these data services need to be deployed, run and maintained, right? This means your team needs to have some kind of knowledge of how to operate all these data services. Plus, of course, for high availability and better performance you would want to scale your services, and each of these data services scales differently and has different infrastructure requirements, which could be an additional challenge. So overall, using multiple data services for your application increases the effort of maintaining your whole application setup. Of course, as an easier alternative to running and managing the services yourself, you can use the managed data services from cloud providers, but this could be very expensive, because on cloud platforms you pay for each managed data service separately.

Now, on the development side, your application code also gets pretty complex, because you need to talk to multiple data services, and for each service you would need a separate connector and separate logic. This of course also makes testing your application quite challenging. And finally, the more services that talk to each other, the higher the latency, because even though each service may be fast on its own, each connection step between the services, or each network hop, will add some latency to your application.
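To make that development-side complexity concrete, here is a minimal sketch (in Python, with hypothetical hostnames and credentials) of what wiring up such a polyglot setup can look like: one client library, one connection and one set of failure modes per data service.

```python
# Hypothetical polyglot setup: one client library per data service.
import mysql.connector                   # relational data (MySQL)
from elasticsearch import Elasticsearch  # fast filtering and search
from pymongo import MongoClient          # document data (media content)
from neo4j import GraphDatabase          # graph of user connections
import redis                             # cache on top of everything

sql   = mysql.connector.connect(host="mysql", user="app",
                                password="...", database="social")
es    = Elasticsearch("http://elasticsearch:9200")
docs  = MongoClient("mongodb://mongo:27017")
graph = GraphDatabase.driver("bolt://neo4j:7687", auth=("neo4j", "..."))
cache = redis.Redis(host="redis-cache", port=6379)

# Each client brings its own connection handling, error handling,
# test fixtures and operational know-how.
```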
In comparison, with a multi-model database like Redis you resolve most of these challenges. First of all, you run and maintain just one data service, so your application also needs to talk to only a single data store, and that requires only one programmatic interface for that data service. And latency will be reduced by going to a single data endpoint and eliminating several internal network hops. So having one database like Redis that allows you to store different types of data, or basically allows you to have multiple types of databases in one, as well as act as a cache, solves these challenges.

So let's see how Redis actually works. First of all, how does Redis support multiple data formats in a single database? The way it works is that you have Redis core, which is a key-value store that already supports storing multiple types of data, and then you can extend that core with what are called modules for the different data types your application needs for different purposes. So, for example, RediSearch for search functionality like Elasticsearch, or RedisGraph for graph data storage, and so on. And a great thing about this is that it's modular, so these different types of database functionality are not tightly integrated into one database, as in many other multi-model databases; rather, you can pick and choose exactly which data service functionality you need for your application and then basically add that module.

And of course, when using Redis as a primary database, you don't need an additional cache, because you have that automatically out of the box with Redis. That means, again, less complexity in your application, because you don't need to implement the logic for managing, populating and invalidating a cache.

And finally, as an in-memory database, Redis is of course super fast and performant, which makes the application itself faster, but in addition it also makes running the application tests way faster. Because Redis doesn't need a schema like other databases, it doesn't need time to initialize the database, build the schema and so on before running the tests. So you can start with an empty Redis database every time and generate data for the tests as you need, and fast tests can really increase your development productivity.
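In contrast to the multi-client sketch above, here is a minimal sketch of talking to several data models through a single Redis connection using the redis-py client. It assumes a Redis instance with the RediSearch and RedisGraph modules loaded; the key names, index name and sample graph are purely illustrative.

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Core key-value / hash data
r.hset("user:1", mapping={"name": "nana", "followers": 1000})

# Full-text search over those hashes via the RediSearch module
r.execute_command("FT.CREATE", "idx:users", "ON", "HASH",
                  "PREFIX", "1", "user:", "SCHEMA", "name", "TEXT")
print(r.execute_command("FT.SEARCH", "idx:users", "nana"))

# Graph data via the RedisGraph module
r.execute_command("GRAPH.QUERY", "social",
                  "CREATE (:User {name:'alice'})-[:FOLLOWS]->(:User {name:'bob'})")
```

One client, one connection, one data service to operate, but several data models.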
Okay, great, so we understood how Redis works and all its benefits. But at this point you may be wondering: how can an in-memory database persist data? Because if the Redis process, or the server on which Redis is running, fails, all the data in memory is gone, right? And if I lose the data, how can I recover it? So basically, how can I be confident that my data is safe?

Well, the simplest way to have data backups is by replicating Redis. So if the Redis master instance goes down, the replicas will still be running and have all the data. But of course, if all the Redis instances go down, you will lose the data, because there will be no replica remaining. So we need real persistence.

Redis has multiple mechanisms for persisting the data and keeping it safe. The first one is snapshots, which you can configure based on time, number of requests, etc. Snapshots of your data will be stored on a disk, which you can use to recover your data if the whole Redis database is gone. But note that you would lose the last minutes of data, because you usually do snapshotting every five minutes or every hour, depending on your needs.

As an alternative, Redis uses something called AOF, which stands for Append Only File. In this case, every change is continuously saved to the disk for persistence, and when restarting Redis, or after an outage, Redis will replay the append-only file logs to rebuild the state. So AOF is more durable, but can be slower than snapshotting. And of course you can also use a combination of both AOF and snapshots, where the append-only file persists data from memory to disk continuously, plus you have regular snapshots in between to save the data state in case you need to recover it. This means that even if the Redis database itself, or the servers, the underlying infrastructure where Redis is running, all fail, you still have all your data safe, and you can easily recreate and restart a new Redis database with all the data.
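As a rough illustration of these options, here is a minimal sketch that sets the persistence settings at runtime with redis-py; in a real deployment you would normally put the same values in redis.conf, and the exact thresholds are just examples.

```python
import redis

r = redis.Redis(host="localhost", port=6379)

# RDB snapshots: e.g. snapshot if at least 10 keys changed within 300 seconds
r.config_set("save", "300 10")

# AOF: log every write continuously, fsync the log roughly once per second
r.config_set("appendonly", "yes")
r.config_set("appendfsync", "everysec")

# The two can be combined: AOF for durability, snapshots for faster restores.
```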
Now a very interesting question is: where is that persistent storage? Where is the disk that holds your snapshots and the append-only file logs? Is it on the same servers where Redis is running? This question actually leads us to a trend, or best practice, of data persistence in cloud environments, which is that it's always better to separate the servers that run your applications and data services from the persistent storage that stores your data. With a specific example: if your applications and services run in the cloud on, let's say, an AWS EC2 instance, you should use EBS, or Elastic Block Store, to persist your data instead of storing it on the EC2 instance's hard drive, because if that EC2 instance dies, you won't have access to any of its storage, whether it's RAM or disk storage. So if you want persistence and durability for your data, you must put your data outside the instances, on an external network storage.

As a result, by separating these two, if the server instance fails, or even if all the instances fail, you still have the disk and all the data on it unaffected. You just spin up other instances and take the data from the EBS, and that's it. This also makes your infrastructure way easier to manage, because each server is equal: you don't have any special servers with any special data or files on them, so you don't care if you lose your whole infrastructure, because you can just recreate a new one and pull the data from the separate storage and you're good to go again. So going back to the Redis example: the Redis service will be running on the servers and using the server RAM to store the data, while the append-only file logs and snapshots will be persisted on a disk outside those servers, making your data more durable.

Great, now we know you can persist data with Redis for durability and recovery, while using RAM, or memory storage, for great performance and speed. So the question you may have here is: isn't storing data in memory expensive? You would need more servers compared to a database that stores data on disk, simply because memory is limited in size, so there's a trade-off between cost and performance. Well, Redis actually has a way to optimize this using a feature called Redis on Flash, which is part of Redis Enterprise. So how does this work? It's a pretty simple concept, actually: Redis on Flash extends the RAM to the flash drive, or SSD, where frequently used values are stored in RAM and the infrequently used ones are stored on SSD. For Redis it just looks like more RAM on the server. This means that Redis can use more of the underlying server resources by using both RAM and the SSD drive to store the data, increasing the storage capacity on each server and this way saving you infrastructure costs.

All right, so we've talked about data storage for a Redis database and how it all works, including the best practices. Now another very interesting topic is: how do we scale a Redis database? Let's say my one Redis instance runs out of memory, so the data becomes too large to hold in memory, or Redis becomes a bottleneck and can't handle any more requests. In such a case, how do I increase the capacity and memory size of my Redis database? We have several options for that.

First of all, Redis supports clustering. This means you can have a primary, or master, Redis instance, which can be used to read and write data, and you can have multiple replicas of that primary instance for reading the data. This way you can scale Redis to handle more requests and, in addition, increase the high availability of your database, because if the master fails, one of the replicas can take over and your Redis database can basically continue functioning without any issues. Now, these replicas will all hold copies of the data of the primary instance, so the more replicas you have, the more memory space you need, and one server may not have sufficient memory for all your replicas. Plus, if you have all the replicas on one server and that server fails, your whole Redis database is gone and you will have downtime. Instead, you want to distribute these replicas among multiple nodes, or servers. So, for example, your master instance will be on one node and two replicas on two other nodes.

Well, that seems good enough, but what if your data set grows too large to fit in memory on a single server? Plus, we have scaled the reads in the database, so all the requests that basically just query the data, but our master instance is still alone and still has to handle all the writes. So what is the solution here? For that we use the concept of sharding, which is a general concept in databases and which Redis also supports. Sharding basically means that you take your complete data set and divide it into smaller chunks, or subsets of data, where each shard is responsible for its own subset of the data. That means, instead of having one master instance that handles all the writes to the complete data set, you can split it into, let's say, four shards, each of them responsible for reads and writes to a subset of the data. And each shard also needs less memory capacity, because it only has a fourth of the data. This means you can distribute and run shards on smaller nodes and basically scale your cluster horizontally. And of course, as your data set grows and as you need even more resources, you can reshard your Redis database, which basically means you split your data into even smaller chunks and create more shards.

So having multiple nodes which run multiple replicas of Redis, which are all sharded, gives you a very performant, highly available Redis database that can handle many more requests without creating any bottlenecks. Now I have to note here that this setup is great, but of course you would need to manage it yourself: do the scaling, add nodes, do the sharding and then the resharding, etc. For some teams that are more focused on application development and business logic, rather than on running and maintaining data services, this could be a lot of unwanted effort. So as an easier alternative, in Redis Enterprise you get this kind of setup automatically, because the scaling, sharding and so on are all managed for you.
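From the application's point of view, a sharded Redis cluster still looks like one database. Here is a minimal sketch using the cluster client from redis-py (version 4.x or newer); the host and port are illustrative and assume a Redis Cluster is already running.

```python
from redis.cluster import RedisCluster

# The client discovers all shards (masters and replicas) from one startup node.
rc = RedisCluster(host="localhost", port=7000, decode_responses=True)

# Each key is hashed to one of 16384 hash slots, and the client routes the
# command to the shard that owns that slot, so writes are spread across shards.
rc.set("user:1:name", "nana")
print(rc.get("user:1:name"))
```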
Now let's consider another interesting scenario, for applications that need even higher availability and performance across multiple geographic locations. Let's say we have this replicated, sharded Redis database cluster in one region, in a data center in London, Europe, but we have the two following use cases. First, our users are geographically distributed, so they are accessing the application from all over the world, and we want to distribute our application and data services globally, close to the users, to give them better performance. And second, if the complete data center in London, for example, goes down, we want an immediate switchover to another data center, so that the Redis service stays available.

In other words, we want replicas of the whole Redis cluster in data centers in multiple geographic locations, or regions. This means a single data set should be replicated to many clusters spread across multiple regions, with each cluster being fully able to accept reads and writes. In that case you would have multiple Redis clusters that act as local Redis instances in each region, and the data will be synced across these geographically distributed clusters. This is a feature available in Redis Enterprise and is called Active-Active deployment, because you have multiple active databases in different locations. With this setup we'll have lower latency for the users, and even if the Redis database in one region completely goes down, the other regions will be unaffected. And if the connection or syncing between the regions breaks for a short time, because of some network problem for example, the Redis clusters in these regions can update the data independently, and once the connection is re-established, they can sync up those changes again.

Of course, when you hear that, the first question that may pop up in your mind is: how does Redis resolve changes made in multiple regions to the same data set? If the same data changed in multiple regions, how does Redis make sure that no region's changes are lost and the data is correctly synced? How does it ensure data consistency? Specifically for that, Redis Enterprise uses a concept called CRDTs, which stands for conflict-free replicated data types. This concept is used to resolve any conflicts automatically, at the database level and without any data loss. So basically, Redis itself has a mechanism for merging the changes that were made to the same data set from multiple sources, in a way that none of the data changes are lost and any conflicts are properly resolved. And since, as you learned, Redis supports multiple data types, for each data type it uses its own conflict-resolution rules, which are the most optimal for that specific data type. Simply put, instead of just overriding the changes of one source and discarding all the others, all the parallel changes are kept and intelligently resolved. And again, this is done automatically for you with this Active-Active geo-replication feature, so you don't need to worry about it.
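To give a feel for the CRDT idea itself — this is a conceptual illustration, not how Redis Enterprise is implemented internally — here is a tiny sketch of a grow-only counter, one of the simplest CRDTs: each region only ever records its own increments, and merging two copies can never lose an update.

```python
def merge(a: dict, b: dict) -> dict:
    # Keep the highest count seen per region; merging is commutative,
    # associative and idempotent, so all replicas converge to the same state.
    return {region: max(a.get(region, 0), b.get(region, 0))
            for region in set(a) | set(b)}

# Two regions count "likes" on the same post while temporarily disconnected:
europe = {"eu": 3}   # 3 likes recorded by the EU cluster
us     = {"us": 5}   # 5 likes recorded by the US cluster

merged = merge(europe, us)      # {'eu': 3, 'us': 5}
total  = sum(merged.values())   # 8 -- neither region's writes are lost
```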
And the last topic I want to address with Redis is running Redis in Kubernetes. As I said, Redis is a great fit for complex microservices that need to support multiple data types and that need easy scaling of a database without worrying about data consistency. And we also know that the new standard for running microservices is the Kubernetes platform, so running Redis in Kubernetes is a very interesting and common use case. How does that work?

With open source Redis, you can deploy replicated Redis as a Helm chart or as Kubernetes manifest files and, basically, using the replication and scaling rules that we already talked about, set up and run a highly available Redis database. The only difference would be that the hosts where Redis runs will be Kubernetes pods, instead of, for example, EC2 instances or any other physical or virtual servers. But the same sharding, replication and scaling concepts apply here as well when you want to run a Redis cluster in Kubernetes, and you would basically have to manage that setup yourself. However, as I mentioned, many teams don't want the effort of maintaining these third-party services, because they would rather invest their time and resources in application development or other tasks, so having an easier alternative is important here as well.

Redis Enterprise offers a managed Redis cluster, which you can deploy as a Kubernetes operator. If you don't know operators: an operator in Kubernetes is basically a concept where you bundle all the resources and logic needed to operate a certain application or service, so that you don't have to manage and operate it yourself. So instead of a human operating a database, you basically have all that logic in an automated form that operates the database for you. Many databases have operators for Kubernetes, and each such operator has, of course, its own logic, depending on who wrote it and how. The Redis Enterprise operator for Kubernetes specifically automates deployment and configuration of the whole Redis database in your Kubernetes cluster. It also takes care of scaling, doing the backups, recovering the Redis cluster if needed, and so on. So it takes over the complete operation of the Redis cluster inside the Kubernetes cluster.

Well, I hope you learned a lot in this video and that I was able to answer many of your questions. If you want to try Redis Enterprise Cloud, be sure to check out my special link in the video description for a $200 credit. And if you want to learn more about similar technologies and concepts, make sure to subscribe to my channel, because I regularly make videos about different DevOps and cloud technologies. Also, comment below if you have any questions regarding Redis or any new topic suggestions. And with that, thank you for watching and see you in the next video.
Info
Channel: TechWorld with Nana
Views: 70,428
Keywords: redis, redis devops, redis explained, redis crash course, redis tutorial, redis database, redis as a primary database, what is redis, redis replication, how does redis work, why redis, techworld with nana, redis persistence, data persistence in redis, redis on flash, scaling redis, redis in kubernetes, redis use cases, how to use redis, how redis works, redis high availability setup, redis ha, redis microservices, operating redis, redis enterprise, multi-model database
Id: OqCK95AS-YE
Length: 23min 36sec (1416 seconds)
Published: Wed Dec 01 2021