How databases scale writes: The power of the log ✍️🗒️

Captions
Hey everyone, this is Gaurav. Today we'll be talking about a way in which we can optimize writes in our database. You have your client, which sends information to your server, and the server has to write to the database because it needs to maintain some state. How do you scale this?

You can think of your database as a data structure that speeds up queries. A traditional database uses a data structure called a B+ tree. This is like a binary search tree, except it isn't binary: each node can have multiple children. The B+ tree is preferred because it gives you good insertion and search times, both of which are O(log n). So whenever you fire a SQL INSERT or SELECT, that maps to an insertion or a search operation on the B+ tree, and the underlying data structure is manipulated. Each of these operations requires an acknowledgment from the database which says, "yes, I have successfully executed the request."

To scale our database, we want to reduce the unnecessary exchange of data, which is acknowledgments and headers, and we also want to reduce the I/O calls, which frees up resources and reduces request-response times. If we can condense all of this data into one block, send it to the database in one shot, and get back one acknowledgment, that's a good idea. It's a more efficient use of your bandwidth, because you aren't sending extra data around, and you're doing less work, because there will be one I/O call and one I/O response. Condense the data queries into a single query: that's our first idea. The server has to give up some memory to hold this bunch of queries, so it requires additional memory; that's one of the drawbacks. The reason we did this, and one of the advantages, is that we need fewer I/O operations.

If you have a lot of write operations, what is the fastest data structure for writing quickly? The linked list. It's the most basic data structure: whenever there's a write, you just go to the end and append to it, and that's exactly what we're trying to optimize here. Write operations on a linked list are O(1), constant time, because all you do is go to the end and add a node. We have a well-known structure that follows the philosophy of a linked list, called a log, and you can see where "log-structured merge tree" gets its name: we are going to use a log to persist data because it's really fast for write operations.

What is the disadvantage of a log? Read operations are really slow. If you need to find something, you can't just jump to one point; you have to search sequentially through the entire log, and if your database is one big log, that's a huge amount of data to read for every query someone sends. So the advantage is fast writes, and the clear disadvantage is slow reads. We now have two clear advantages we want to keep, because fast writes and fewer I/O operations sound really good, and some drawbacks we want to cut down on.
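To make that first idea concrete, here is a minimal Python sketch of a write buffer. The db.write_batch(records) call is a hypothetical API that persists a list of records in one I/O round trip and returns a single acknowledgment, and the threshold of six records is also just for illustration.

```python
class WriteBuffer:
    """Collects writes in memory (an append-only log) and flushes them
    to the database in one shot once a threshold is reached."""

    def __init__(self, db, threshold=6):
        self.db = db
        self.threshold = threshold
        self.records = []          # in-memory log: appends are O(1)

    def write(self, record):
        self.records.append(record)            # O(1) append, no I/O yet
        if len(self.records) >= self.threshold:
            self.flush()

    def flush(self):
        if self.records:
            self.db.write_batch(self.records)  # one I/O call, one ack
            self.records = []
```

The trade-off is exactly the one described above: the buffer costs some server memory, but many small round trips collapse into a single batched call.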
As for the additional memory, you can't really help that much: if you need to condense the queries and keep them somewhere, they have to be somewhere. So the additional memory is fine; it's not something we can optimize much. All you can do is constrain the maximum number of records you'll keep in memory before you flush to the database, so that part can be managed.

What we really don't want to do is slow down our read operations. Think about Facebook, where a lot of people are posting things: the only thing worse than a slow post is a slow read, where you want to know what your friends are doing and it takes you 30 seconds to get that news feed onto your browser. Slow reads are something we definitely want to avoid. So what can we do about that?

The first thing is that the slow reads come from what you are reading. You don't want to be reading from the log, because a linked list by definition gives you a sequential search, which is O(n), and that's pretty bad. Can you improve on this? Not as a linked list, no. But if you convert your data into a more sensible data structure, something like a tree, or a sorted list, a sorted array, that makes your searching really fast. If instead of the linked list you use something like a sorted array, your search time becomes O(log n).

This seems counterintuitive, because you already had a B+ tree which was giving you O(log n) search time and O(log n) insertion time. So what should we take, a B+ tree or a linked list? A linked list is great at writing but terrible at reading, and the reason we moved away from B+ trees is that they give us poorer write performance. That's why we have the magic combination of using a linked list together with a sorted array, to get great write speeds and great read speeds. If we had this data in a sorted array, a search would take O(log n) time, which is really fast, much less than the O(n) of a sequential search.

So can we convert our linked list to a sorted array? We can, but we don't want to do it in memory, because that defeats the purpose: the whole point of having a linked list was fast insertions, and insertions into a sorted array are really slow. So you don't do it on the server; you do it in the database instead. Whatever information you get from the client, you sort it and then persist it, so that the read operations are super fast.

Let's talk about the mechanics of how this is going to work. First, we mentioned that we need some sort of a log, which is appended to by the server. Every time a new record comes in, it goes to the end; the latest record here is J, with key 31. You keep appending to this log, and after a certain threshold on the number of records you can keep in memory, you persist all of it to the database in one shot. What happens in the DB is that this data is sorted before it is persisted, or rather, it's persisted in a sorted fashion. The keys in memory are jumbled up, while in the database they are 12, 17, 19, 23, 31, 47, in nice sorted order. On top of this you can do a binary search, which allows you to query the data efficiently.
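Here is a small sketch of this flush-and-search step, under the assumption that records are (key, name) pairs and that a plain Python list stands in for the sorted file on disk; the keys and names are the ones used in the example above.

```python
import bisect

def flush_sorted(buffered_records):
    """Sort the in-memory log once, at flush time, so the persisted
    chunk can be binary searched."""
    return sorted(buffered_records)

def lookup(sorted_chunk, key):
    """Binary search the persisted chunk: O(log n) instead of a scan."""
    i = bisect.bisect_left(sorted_chunk, (key,))
    if i < len(sorted_chunk) and sorted_chunk[i][0] == key:
        return sorted_chunk[i][1]
    return None

chunk = flush_sorted([(31, "Jane"), (12, "Tim"), (47, "Mac"),
                      (17, "John"), (23, "Gada"), (19, "Iris")])
print(lookup(chunk, 19))   # -> "Iris"
print(lookup(chunk, 20))   # -> None (not in this chunk)
```

The key point is that the sorting cost is paid once per flush, not once per write, while every read gets the benefit of binary search.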
But what if you have more data coming in? After this flush is complete and you want to append new records, you don't take each one and immediately push it to the database; you do the same thing as before, accumulate them in the in-memory log, and when you want to persist them, you send the whole batch to the DB.

When persisting, the DB can do one of two things. The first option is to take these six new records and merge them with the existing six, creating one sorted file of size twelve. But that's maybe not the smartest thing to do, because this is going to happen all the time: once there are 10,000 records in the DB and you send in six more, it needs to sort 10,006 records just to accommodate the additional six. The increment of six records forces the entire database to be re-sorted, and flushing is nothing compared to sorting a huge bunch of numbers, which has time complexity O(n log n).

So how do we optimize this? Instead of merging arrays all the time, you can keep the data in sorted chunks, so your database is represented by chunks: this is a sorted chunk of size six, and the new batch, after being sorted, becomes another chunk of size six. Now whenever there's a read query, you do a binary search on this chunk, and if you don't find the record, you do a binary search on the next chunk; there are just two chunks to go through. This is better than slowing down your write operations tremendously: your writes are still reasonably fast, and your reads are only slightly slower. But if you have n insertions, the number of chunks you'll have is n divided by six. If you draw the graph, it's a linear graph, so reads are again O(n); the graph is literally n/6. So your read times are still extremely slow at scale. Imagine something like Facebook, with billions of records in the database: one billion divided by six is still very, very slow.

How do you optimize this? You can use a hybrid approach, which is taking some records and merging them with other records, as long as the sort time is not too high. When you have six records here and six records over there, you merge them together into one sorted array, using the standard merge step from merge sort. I take the first elements of the two arrays; because the arrays are sorted, I know these are the smallest elements in their respective arrays. Here they are 12 and 1. Which is smaller? 1, so I take 1, Sam, and push it to the output. My pointer in the first array is still at 12, while the pointer in the second array has moved on to 36. 12 is smaller than 36, so I take Tim, and the pointer moves to John, at 17. 17 is smaller than 36, so I take John. That's the basic merging process: when you take two sorted arrays and want to merge them into a single sorted array, you move two pointers along and keep taking the smaller element. The next one will be Iris, at 19, and then the pointer moves forward again. Then there's 23, which is still smaller than 36, so Gada comes out, and then Jane, at 31. Finally, after Jane, the pointer is at 47, Mac; 47 is greater than 36, so you pull out Larry, and so on and so forth. You get a nice sorted array of size 12.

So what's happening now is that every time you get a chunk, you decide whether or not you should merge it with the existing chunks. In the background, we take chunks and merge them together into larger sorted arrays, so that when there is a search query, this large sorted array helps us reduce the overall time complexity.
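Below is a sketch of that two-pointer merge, together with a simple compaction policy that merges chunks of equal size (6 + 6 into 12, 12 + 12 into 24, and so on). The equal-size trigger mirrors the merge-sort analogy used here; real LSM engines use more elaborate compaction strategies.

```python
def merge_sorted(a, b):
    """Merge two sorted chunks by repeatedly taking the smaller of the
    two front elements (the two-pointer merge from merge sort)."""
    merged, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            merged.append(a[i]); i += 1
        else:
            merged.append(b[j]); j += 1
    merged.extend(a[i:])    # one side is exhausted; copy the remainder
    merged.extend(b[j:])
    return merged

def add_chunk(chunks, new_chunk):
    """Append a freshly flushed chunk, then keep compacting while the
    two newest chunks have equal size (6+6 -> 12, 12+12 -> 24, ...)."""
    chunks.append(new_chunk)
    while len(chunks) >= 2 and len(chunks[-1]) == len(chunks[-2]):
        right = chunks.pop()
        left = chunks.pop()
        chunks.append(merge_sorted(left, right))
    return chunks

# Example with the records named in the walkthrough (the second chunk
# only shows the two records that are called out explicitly).
a = [(12, "Tim"), (17, "John"), (19, "Iris"),
     (23, "Gada"), (31, "Jane"), (47, "Mac")]
b = [(1, "Sam"), (36, "Larry")]
print(merge_sorted(a, b))
```

Each merge is linear in the number of records touched, which is exactly why merging small chunks in the background is so much cheaper than re-sorting the whole database on every flush.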
To clarify things with numbers: if you get three chunks in total, that means you have 18 records broken into three chunks. The time required to search one chunk is log base 2 of 6, which is approximately 3. If you have to search all the chunks for a particular record, say record number 8, I search here and don't find it, I search there and don't find it, and I finally find it in the last chunk. The total number of operations is log2(6) three times, which is around 9 operations. If instead, in the background, I had merged two of those chunks into a single sorted array of size 12, then the operations for those two chunks come to about 4, and the remaining chunk costs 3, so 3 plus 4 gives you 7 operations. There's a clear saving here, but obviously you need to merge the two sorted arrays and then persist the result, so there is some overhead.

Let's see what the best approach is. If more sorted chunks of size six keep coming in, I'll take pairs of them and merge them, so I end up with two sorted chunks of size 12. Can I do better? Yes: I can take those two sorted chunks of size 12 and merge them into a single sorted chunk of size 24. log2(24) is approximately 5, while if I had kept all four chunks unmerged I would need 3 × 4 = 12 operations. So there's a clear saving, 5 instead of 12, if you merge arrays together. When do we decide to merge? It's very similar to how merge sort works: if you have two blocks of size 6, you convert them into a block of size 12; if you have two blocks of size 12, you convert them into a block of size 24; and so on, all the way up to the maximum possible size, which is n, the total number of insertions.

What happens with this strategy is that you have a number of blocks of varying sizes: this one is of size 6, this one is of size 12, and so on; you'll also have some blocks of size 24 which have been merged after some time. Your read operation is spread across these blocks, and to speed it up you can try some clever hacks.

The idea you can apply here is something called a Bloom filter; there's a video I made on that, you can check it out. The basic idea of a Bloom filter is quite simple. Suppose you have a book with a lot of words in it, and you want to find the word "cat" in that book. One thing you can do is search the entire book for the word "cat". Or the book can provide you a particular data structure called a Bloom filter. Say I have 26 positions in this Bloom filter: one for A, one for B, one for C, one for D, and so on. If there exists any word in the book starting with the letter C, I mark that position as true, as 1. So if the word "cat" exists in this book, the position for C is definitely going to be marked.

Now think about some edge cases. What if I have the word "corona" in this book? Will the letter C be marked? Yes. But what if I don't have the word "cat" and I only have the word "corona"? Will the letter still be marked? Yes. That's called a false positive. In Bloom filters you can have false positives, but you cannot have false negatives: if there is no word starting with C, there's no way that letter is going to be marked. That's the general idea of a Bloom filter.
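The 26-letter filter described here is a simplification; a standard Bloom filter uses a bit array and several hash functions, and LSM-based stores typically attach one such filter to each sorted chunk. The sketch below simulates the k hash functions by salting a single hash, which is an illustrative choice rather than how any particular engine implements it.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, key):
        # Simulate k independent hash functions by salting one hash.
        for salt in range(self.num_hashes):
            digest = hashlib.md5(f"{salt}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False positives are possible; false negatives are not.
        return all(self.bits[pos] for pos in self._positions(key))

# One filter per chunk lets a read skip chunks that definitely lack the key.
bf = BloomFilter()
for key in (12, 17, 19, 23, 31, 47):
    bf.add(key)
print(bf.might_contain(17))   # True
print(bf.might_contain(99))   # almost certainly False
```

Making the bit array larger (or adding hash functions) is the general version of "using two-letter prefixes instead of one letter": more bits of information per key means fewer false positives.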
Now you can think of the book as the database and the Bloom filter as the thing that answers your search query. If you want to know whether a record exists in a chunk, you create a Bloom filter on top of that chunk which tells you whether the record might exist in that list or not. This speeds up your querying even though you get false positives, and the rate of false positives can be managed by increasing the number of bits. Instead of recording whether there's any word starting with C, I can record two-letter prefixes, so I'm taking combinations now: with only "corona" in the book, "CO" is marked but "CA" is not, so a search for "cat" gets a proper negative, and I can speed up queries that way. I can use that concept to create Bloom filters on these chunks and speed up my search queries. Any time I merge two arrays, I'm going to need a separate Bloom filter for the merged chunk, and a good idea is to make that filter larger: there's more information in the bigger chunk, so the chance of a false positive is higher, and I need more bits in the filter to keep the false-positive rate down. By reducing false positives you indirectly reduce read times too, because you aren't doing any useless reads.

There's a lot more to log-structured merge trees, but this is a reasonable introduction to the concept. What you want to do is speed up writes; for that you use a data structure which is essentially a linked list, a log. But you can't read a log efficiently, so in the background you merge sorted arrays, which can be binary searched, to get fast read operations. Your writes are fast because you're flushing writes sequentially, and your reads are also fast because you're reading from sorted tables that have been built in the background. That's the general idea of an LSM tree.

Some important terms to take away: this sorted set of records is called a sorted string table, or SSTable, the reason being that it is a sorted table of string records. The other thing we did, taking two arrays and merging them together, is called compaction: compaction takes lots of sorted string tables and, using the merge process, condenses them into a single array so that we can do fast search queries.

If you have any doubts or suggestions on this, you can leave them in the comments below. If you liked the video, make sure to hit the like button, and if you want notifications for further videos like this, hit the subscribe button. I'll see you next time.
Info
Channel: Gaurav Sen
Views: 70,805
Keywords: system design, interview preparation, gaurav sen, system design interview, database, databases, log, log structured merge tree, write operations
Id: _5vrfuwhvlQ
Length: 17min 22sec (1042 seconds)
Published: Fri Apr 10 2020