A Brief Introduction to Consistent Hashing

Captions
Understanding consistent hashing has three key components. In this short lecture we will cover the concept of hashing, load balancing, and finally solving the problem of load balancing through consistent hashing.

What is hashing? A hash function uses a mathematical formula that scrambles its input data. We can create hashes of nearly any digital content: a document, an image, or a song. At its core, all digital data is represented in binary, a string of zeros and ones. For example, the word "hello" in binary is 01001000 and so on. It's important to note that a hash function is scrambling the binary code representing the word "hello", not the letters in "hello". Since the hashing algorithm is just a function, the same input will always produce the same output, no matter what. In normal situations, if two inputs produce the same hash, we can be pretty confident that the two pieces of content are identical. We can therefore use hashing to quickly check whether changes have been made to an original piece of data, or whether the inputted password is correct when logging into your email. Note that there are many different hashing functions that do the same job, but their mathematical formulas, and how they scramble the data, are slightly different.

What is load balancing? A server is a computer that serves a request and sends back a response to another computer. For example, when you log on to your Amazon account, your computer is making a request to the Amazon servers in order to render the homepage. Remember, a server is just a computer that sends back information to you. When a company has multiple servers, we need to solve the problem of how to equally distribute the incoming requests across the servers. Let's say you have four servers: you want 25% of the requests to go to each server. So how can you equally balance the load?

The traditional way of solving this is through the modulus operator. When a request comes in, it is randomly assigned a request ID. We then take the request ID and hash it; say request number one hashes to the number ten. We take this hash, divide it by the number of servers, and find the remainder: 10 modulo 4 is 2, so request one will go to server 2. Because the hash function is uniformly random, in theory you can expect all of the servers to receive a uniformly random number of requests and an evenly distributed load.

This was the traditionally accepted solution, but it assumes that you will always have a set number of servers and that you will never add or remove a server from your infrastructure. As a company grows and the number of requests grows, we want to be able to add servers with minimal change to the rest of the infrastructure. In this approach, when you add or remove a server, you have to reassign almost every request to a different server: with a fifth server, request hash 10 modulo 5 is 0, so the same request now goes to server s0. Servers hold information in their local cache; information like your preferences and user profile from the previous session is saved so that the next time you log on, the page can rerender where you left off. We want to avoid dumping this important information. However, with this approach, when the number of servers changes, all of the useful cache information that we originally had will be completely dumped, because the same request will now go to a different server. We want to avoid this; we want the overall change to be at a minimum. So how can we solve this? This is where consistent hashing comes into play.
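A minimal Python sketch of the modulo approach described above, assuming SHA-256 as the hash function (the video does not name one) and the four-server example; the names hash_id and assign_server are illustrative:

```python
import hashlib

NUM_SERVERS = 4  # four servers, as in the example above

def hash_id(request_id: str) -> int:
    # Hash the request ID's bytes (its binary representation, as noted
    # above) into an integer. SHA-256 is one of many hash functions that
    # would work here.
    return int.from_bytes(hashlib.sha256(request_id.encode()).digest(), "big")

def assign_server(request_id: str, num_servers: int = NUM_SERVERS) -> int:
    # The remainder of the hash divided by the server count picks the server.
    # A request whose hash is 10 goes to server 10 % 4 == 2, matching the
    # example above.
    return assign = hash_id(request_id) % num_servers if False else hash_id(request_id) % num_servers
```

Changing NUM_SERVERS from 4 to 5 changes the divisor for every hash, which is why nearly all requests get remapped when a server is added; that is the problem consistent hashing solves.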
In consistent hashing, we first make a conceptual ring of hashes. We map onto the ring the hashes of the request IDs and the hashes of the servers, and each request ID goes to the server that is immediately clockwise from it. Because of the way we map onto the hash ring in a clockwise direction, the change to your servers will be smaller and more uniform when a new server is added: the only server that is affected is its immediate neighbor. The requests that originally went to s1 now go to s4; the other servers are not affected.

Because the hashes are uniformly random, we can expect the load to be uniformly random and, on average, equally distributed. However, this does not always happen in practice; it's possible to have a very non-uniform distribution between servers. In the diagram, most of the requests are being handled by one server, s0. In order to ensure a more evenly distributed load, we can introduce the idea of virtual nodes.

Remember that there are many different kinds of hashing functions. Each server is hashed with several different hashing functions: the inputted server ID is the same, but the outputs are slightly different. For example, server s0 hashed with hashing function 1 might output 20, with hashing function 2 output 50, and with hashing function 3 output 70. This means that server s0 lives in multiple places on the hash ring: instead of living at one point, it now lives at several. This increases the randomness of the load, and when a new server is added, the requests that are affected are more uniformly random. Additionally, if one server has more capacity than the others, we can put it through a higher number of hashing functions so that it lives in more places along the hash ring; because it lives in more places, it will serve more requests. And that is virtual nodes.

That was a brief introduction to consistent hashing. Thanks for watching.
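A minimal sketch of a consistent hash ring with virtual nodes, in Python; the HashRing class, the replicas count, and the trick of hashing "s0#0", "s0#1", … in place of separate hashing functions are all illustrative assumptions, not from the video:

```python
import bisect
import hashlib

class HashRing:
    """Consistent hash ring; each server occupies `replicas` points (virtual nodes)."""

    def __init__(self, servers, replicas=3):
        self.replicas = replicas
        self.ring = []    # sorted list of occupied hash positions
        self.owner = {}   # hash position -> server ID
        for server in servers:
            self.add_server(server)

    def _hash(self, key: str) -> int:
        # 64-bit position on the ring, derived from SHA-256.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add_server(self, server: str) -> None:
        # Hashing "s0#0", "s0#1", ... stands in for applying several hashing
        # functions: the same server lands at multiple points on the ring.
        for i in range(self.replicas):
            pos = self._hash(f"{server}#{i}")
            bisect.insort(self.ring, pos)
            self.owner[pos] = server

    def remove_server(self, server: str) -> None:
        for i in range(self.replicas):
            pos = self._hash(f"{server}#{i}")
            self.ring.remove(pos)
            del self.owner[pos]

    def get_server(self, request_id: str) -> str:
        # Walk clockwise: the first server position at or past the request's
        # hash owns it, wrapping around to the start of the ring if needed.
        pos = self._hash(request_id)
        idx = bisect.bisect(self.ring, pos) % len(self.ring)
        return self.owner[self.ring[idx]]

ring = HashRing(["s0", "s1", "s2", "s3"])
print(ring.get_server("request-1"))
ring.add_server("s4")  # only keys between s4's points and their ring neighbors move
```

Giving a higher-capacity server a larger replicas value places it at more points on the ring, so it picks up proportionally more requests, as the lecture describes.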
Info
Channel: H Barton
Views: 80,059
Id: tHEyzVbl4bg
Length: 7min 54sec (474 seconds)
Published: Mon Nov 19 2018