What is Database Sharding?

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
so so how do you query this database so I would optimize the queries using an SQL let's say we have a lot of data so optimizing queries is so you know old school so I could make an index on the table all right indexing is indexing is cute but you're looking for something it's just serious right we got lots of data secondly use a new sequel no we are not going to learn our DMMs now for the final time what do you think we should do so sharding I'll use shaadi okay hired what is shadi let's say your pizza and you can't have the entire thing by yourself so you break it into slices and call your friends over eight friends now each of these friends is going to get one slice of pizza what you have done effectively is partitioned the pizza according to each friends share just like that we can have servers which are going to be taking the load of the requests which are which are being sent into it so if there's over here now how do you get that - the pizza model user ID number zero is going to start here 100 starts here 200 what you have effectively done is taken all the server requests that you had and map them onto a pizza such that each of these slices is going to be served by one server in this case so ID number six the key thing to notice here is we couldn't eat the entire pizza by ourselves we needed friends to finish the pizza to handle the pizza effectively and when you're getting friends along you are effectively taking the range of the pizza and breaking the pieces when you are doing that you are partitioning the pizza this kind of partitioning which uses some sort of a key to break the data into pieces and allocate that to different servers is called horizontal partitioning horizontal partitioning depends on one key which is an attribute of the data that you're storing to partitioned data you can contrast this with vertical partitioning there's a link in the description below which uses columns to partition data effectively but we are focusing on the horizontal partitioning bit and specifically we are focusing on one concept which is sharding now we mentioned that sharding is taking one attribute in the data and partitioning the data such that each cell gets one chunk but what I mean by server's the server's here our database servers you can contrast this with what we have been talking till now about normal servers normal servers are application servers their platform servers which deal with data but they try to be as stateless as possible to keep things decoupled and really nice and clean this is going to be dealing with the meat of the data all right and we can't afford to have any goof-ups over here consistency is important this is one of the key attributes of any database that whatever data you persist in it is what you can read out of it later on and there is some sort of synchronization that if a person makes an update the new request is going to read that update okay so that is consistency also what we look at is availability meaning that the database should not crash and stay down you don't want that you want your application to be running all the time but consistency Trump's availability when it comes to data in most cases there are more things to think about what should you charge your data on in our case we have used user ID but in applications like tinder which use location you could shard on the location and then if a person says find me all the users in City X then X may fall in this shard and all you need to do is just read through this shard which is what this database database server number seven can do for you right that shard is going to be smaller in size it's also going to be easier to maintain probably going to give you faster performance everything good about sharding and the first problem that you have to take into consideration is joins across shard if these are across shards what's going to happen is the query needs to go to two different shots they need to pull out their data then join the data across the network and this is going to be extremely expensive so one of the problems here joins the second point comes when you look at the pizza and you realize that this is completely inflexible the shards are inflexible you can't have more pizza slices or less pizza slices it's already done but we want our database servers to be flexible in number so one of the really good algorithms for this is consistent hashing you should have a look at that there's one database which actually uses this and that is memcached right this doesn't really implement consistent hashing you can use an application logic above the database memcached to get your work done so it's not really a problem it might be a problem but you can't have dynamic number of shots now to overcome this problem what we do is take a shard which has too much data in it and then dynamic break into pieces so this pizza slice is like a pizza for us yeah when we magnify it enough it's going to be a really large slice and then we break it into smaller pieces so there's going to be some sort of a manager for every particular shot which is going to map the requests to the correct mini slice so to speak in the pizza slice single pizza slice using this technique which is hierarchical sharding we can get rid of the inflexibility over here so point number two is no longer a big problem now one of the smart things to do here is to create an index on these shots assuming your query requires that this index could be on a completely different attribute compared to the user ID and one of the good examples of this is find me all the people in New York who have age greater than 50 so if these are the city IDs then New York is going to land let's say here and then you can index on age so you'll find all users in New York within a given range of age so all of your queries are fast see that's the most important thing about sharding your read performance goes up and your write performance goes up because all of your queries fall on one particular point but what happens if a shard fails let's say there's some sort of electricity issue over there in that case you could have something like a master/slave architecture the master/slave architecture is a very common architecture what happens in this is that you are multiple slaves which are copying the master whenever there's a write request it's always on the master the master is the most updated copy while the slaves continuously pull the master and read from it what then happens is if there's a read request it can be distributed across the slaves while if there's a write request it always goes to the master in case the master fails the slaves choose one master amongst themselves right and so there's good single point of failure tolerance over here conceptually it's quite easy you just take your data break it into pieces break in two ranges essentially and then persist in different places but when it comes to practical application this is quite tough because this guy consistency is difficult to do and if you're just starting out with your system and you're thinking about starting I suggest that you take into consideration other mechanisms like indexing like using no sequel databases which internally actually use these kind of concepts but to use those ready-made solutions or to use well-known solutions like indexing is probably the way to go before you go for sharding a database even more difficult in charting is to hit the like and the subscribe button at the same time if you're able to do that then you'll get notifications for for the videos and I'll catch you next time
Info
Channel: Gaurav Sen
Views: 609,343
Rating: 4.8679166 out of 5
Keywords: sharding, database, database interview, system design, server side design, interview, partition, database partition, gaurav sen, gkcs, software interview, interview question, database design, scalability, scaling, database optimization, horizonal partitioning, partitioning
Id: 5faMjKuB9bc
Channel Id: undefined
Length: 8min 55sec (535 seconds)
Published: Tue Aug 07 2018
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.