Hello. Today we're going to
talk about scaling Redis. If your application
grows, then you're going to need more memory,
more CPU, and more throughput. So we're going to
learn what it means to scale Redis to give you
the extra capacity you need. We'll specifically learn
how Redis Cluster scales and provides high availability. Stay with me. [MUSIC PLAYING] Before we jump into
the details, let's first address the
elephant in the room-- DBaaS offerings, or Database
as a Service in the Cloud. No doubt it's useful to
know how Redis scales and how you might deploy it,
but deploying and maintaining a Redis Cluster is a
fair amount of work, so if you don't want to deploy
and manage Redis yourself, then consider signing up for Redis Cloud, our managed service, and let us do
the scaling for you. Of course, that route
is not for everyone. And as I said, there's a lot to
learn here, so let's dive in. We'll start with scalability. Here's one definition. "Scalability is the
property of a system to handle a growing
amount of work by adding resources
to the system." The two most common
scaling strategies are vertical scaling
and horizontal scaling. Vertical scaling, also called scaling up, means adding more resources like
CPUs or memory to your server. Horizontal scaling,
or scaling out, implies adding more servers
to your pool of resources. It's the difference between
just getting a bigger server and deploying a whole
fleet of servers. Let's take an example. Suppose you have a server
with 128 gigabytes of RAM, but you know that your database will need to store 300
gigabytes of data. In this case, you'll
have two choices. You can either add
more RAM to your server so it can fit the 300
gigabyte data set, or you can add two
more servers and split the 300 gigabytes of data
between the three of them. Hitting your server's RAM limit is one reason you might
want to scale up or out, but reaching the performance
limits in terms of throughput, or operations per second, is another. Since Redis is mostly
single-threaded, a single Redis server instance
cannot make use of the multiple cores of your server CPU
for command processing. But if we split the data
between two Redis instances, our system can process
requests in parallel, effectively doubling
the throughput. In fact, performance will
scale close to linearly by adding more Redis
instances to the system. This pattern of splitting
data between multiple servers for the purpose of scaling
is called sharding. The resulting
servers or processes that hold chunks of the
data are called shards. This performance
increase sounds amazing, but it adds some complexity. If we divide and distribute our
data across two shards, which are just two Redis
server instances, how will we know where
to look for each key? We need to have a way
to consistently map a key to a specific shard. There are multiple ways to do
this, and different databases adopt different strategies. The one Redis uses is
called Algorithmic Sharding, and this is how it works. To find the shard for a
given key, we hash the key and then mod the result by
the total number of shards. Because we're using a
deterministic hash function, this function will always assign
a given key to the same shard.
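To make that concrete, here's a minimal sketch in Python. The crc32 hash is just a stand-in I picked for illustration; any deterministic hash function would do the job.

```python
import zlib

def shard_for_key(key: str, num_shards: int) -> int:
    # Hash the key, then mod the result by the shard count.
    # zlib.crc32 is a stand-in; the point is that it's deterministic.
    return zlib.crc32(key.encode()) % num_shards

print(shard_for_key("foo", 2))  # same key, same shard, every time
```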
But what happens if we want to increase our shard count even further, a process
commonly called resharding? Let's say we add one new
shard so that our total number of shards is three. Now, when a client tries
to read the key "foo," they will run the hash
function and mod by the number of shards, as before. This time, the number
of shards is different and we're modding with
three instead of two. Understandably, the
result may be different, pointing us to the wrong shard. Resharding is a common issue with the Algorithmic Sharding strategy.
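You can see the scale of the problem in a few lines of Python, again with crc32 standing in for the hash. Going from two shards to three silently remaps most keys:

```python
import zlib

def shard_for_key(key: str, num_shards: int) -> int:
    return zlib.crc32(key.encode()) % num_shards

keys = [f"user:{i}" for i in range(1000)]
# Count the keys whose shard assignment changes when we go from 2 to 3 shards.
moved = sum(1 for k in keys if shard_for_key(k, 2) != shard_for_key(k, 3))
print(f"{moved} of {len(keys)} keys now map to a different shard")
# Typically around two thirds of the keys move.
```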
This can be solved by rehashing all the keys in the keyspace and moving them to the shard
appropriate to the new shard count. This is not a
trivial task though, and it can require a lot
of time and resources, during which the
database will not be able to reach its
full performance, or might even
become unavailable. Redis uses a clever approach
to solve this problem-- a logical unit that sits
between a key and a shard, called a hash slot. The total number of
hash slots in a database is always 16,384, or 16K. The hash slots are divided roughly evenly across the shards, so for example, slots 0 through 8,000 might be assigned to Shard 1, and slots 8,001 through 16,383 might be assigned to Shard 2. In a Redis Cluster,
we actually mod by the number of hash slots,
not by the number of shards. Each key is assigned
to a hash slot. When we do need to
reshard, we simply move hash slots from
one shard to another, distributing the data
as required across different Redis instances.
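Here's a sketch of that two-level mapping in Python. The CRC16 variant below is the one the Redis Cluster spec describes, and the slot ranges are the made-up example from earlier; treat this as an illustration, not production code.

```python
def crc16(data: bytes) -> int:
    # CRC16-CCITT (XMODEM), the variant described in the Redis Cluster spec.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    # A key's slot never changes: CRC16 of the key, mod 16,384.
    return crc16(key.encode()) % 16384

# The example slot-range assignment from above: shard name -> (first, last slot).
ranges = {"shard-1": (0, 8000), "shard-2": (8001, 16383)}

def shard_for_key(key: str) -> str:
    slot = hash_slot(key)
    for shard, (lo, hi) in ranges.items():
        if lo <= slot <= hi:
            return shard
    raise ValueError("slot not assigned to any shard")

print(hash_slot("foo"), shard_for_key("foo"))
```

Resharding now only means reassigning entries in the slot table, say handing slots 12,000 through 16,383 to a new shard, and migrating just the keys that live in the moved slots; the keys' slot numbers themselves never change.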
Now that we know what sharding is and how it works
in a Redis Cluster, we can move on to
high availability. Redis Cluster is what provides
sharding and high availability in open source Redis. High availability refers
to the Cluster's ability to remain operational, even in
the face of certain failures. For example, the
Cluster can detect when a primary shard fails and
promote a replica to a primary, without any manual
intervention from the outside. But how does it do it? How does it know that a
primary shard has failed, and how does it promote its
replica to the new primary? Say we have one replica
for every primary shard. If all our data is divided
between three Redis servers, we would need a
six-member Cluster, with three primary shards
and three replicas.
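As a quick aside, a client doesn't have to track any of this topology by hand. Here's a minimal sketch using the redis-py library's cluster client; the library choice and the 127.0.0.1:7000 seed address are my assumptions for this example. Pointed at any one member, the client discovers the rest and routes each command to the shard that owns the key's hash slot:

```python
from redis.cluster import RedisCluster

# Any reachable cluster node works as a seed; the client fetches
# the full slot map from it and routes commands on its own.
rc = RedisCluster(host="127.0.0.1", port=7000, decode_responses=True)

rc.set("foo", "bar")   # sent to the shard that owns the slot for "foo"
print(rc.get("foo"))   # the read is routed the same way
```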
All six shards are connected to each other over TCP and constantly ping each
other and exchange messages. These messages allow
the Cluster to determine which shards are alive.
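As a toy illustration of the idea, and a simplification rather than Redis's actual implementation: each member remembers when it last heard from each peer, and suspects any peer that has been silent for too long.

```python
import time

NODE_TIMEOUT = 5.0  # seconds; Redis exposes a comparable cluster-node-timeout setting

class FailureDetector:
    """One member's view of its peers, updated on every pong received."""

    def __init__(self) -> None:
        self.last_heard: dict[str, float] = {}

    def record_pong(self, peer: str) -> None:
        self.last_heard[peer] = time.monotonic()

    def suspects(self, peer: str) -> bool:
        # Silent past the timeout -> this member suspects the peer is down.
        last = self.last_heard.get(peer)
        return last is None or time.monotonic() - last > NODE_TIMEOUT
```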
When enough shards report that a given primary shard is not responding to them, they can
agree to trigger a fail-over, and promote the shard's replica
to become the new primary. How many shards need to agree
that a fellow shard is offline before a fail-over is triggered? Well, that's configurable,
and you can set it up when you create a Cluster. But there are some very
important guidelines that you need to follow. To prevent something called
a split brain situation in a Redis Cluster, always keep
an odd number of primary shards and two replicas
per primary shard. Let me show you what I mean. If you have an even number of
shards in a Cluster, say six, and there's a network
partition that divides the Cluster
in two, you'll then have two groups
of three shards. The group on the
left side will not be able to talk to the shards
in the group on the right side. So the Cluster will think
that they are offline and will trigger a fail-over
of any primary shards, resulting in a left side
with all primary shards. On the right side,
the three shards will also see the shards on the left as offline, and will trigger a fail-over of any primary shards that were on the left side,
resulting in a right side of all primary shards. Both sides thinking they
have all the primaries will continue to receive client
requests that modify data, and that is a problem,
because maybe Client A sets the key "foo" to "bar" on the left side, but Client B sets
the same key's value to "baz" on the right side. When the network
partition is removed and the shards try to rejoin,
we will have a conflict, because we have two shards
holding different data, each claiming to be the
primary, and we wouldn't know which data is valid. This is called a
split brain situation, and it is a very common issue
in the world of distributed systems. A popular solution is to always
keep an odd number of shards in your Cluster,
so that when you get a network split, the left
and right group will do a count and see if they are in the
bigger or smaller group, also called majority
or minority. If they are in
the minority, they will not try to
trigger a fail-over and will not accept any
client write requests.
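The whole majority rule fits in a few lines of Python; the nine-member cluster and the 5-to-4 split below are made-up numbers for the example:

```python
def in_majority(group_size: int, cluster_size: int) -> bool:
    # A group may trigger fail-overs and keep accepting writes
    # only if it holds more than half of the cluster's members.
    return group_size > cluster_size // 2

cluster_size = 9              # e.g. three primaries, each with two replicas
left, right = 5, 4            # any partition of an odd total is uneven
print(in_majority(left, cluster_size))   # True:  majority side carries on
print(in_majority(right, cluster_size))  # False: minority side stops writes
```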
Again, to prevent this kind of conflict, always keep an odd number of primary shards and two replicas
per primary shard. Well, that was fun. Bet you didn't expect this
video to scale that much. If you'd like to learn
more about deploying Redis for production, check out our
course in Redis University, Running Redis at Scale. We have collaborated with
our Technical Enablement Team members, Elena and Kurt,
to produce a curriculum to prepare your Redis instance
for large scale deployment. As always, thanks
for watching and I hope to see you again soon. [MUSIC PLAYING]