B-tree vs B+ tree in Database Systems

Video Statistics and Information

Video

Captions Word Cloud

Reddit Comments

Captions

i get asked sometimes about the importance of data structures and algorithms and my opinion is is very clear on these these are very critical it really depends why are you learning learning them if you're learning for to pass an interview or to get a job then it's they are just a burden you're gonna memorize them you don't really understand what they do but if you truly go into this data structure and really look at the route why they will do it do they exist and what problems do they solve they're gonna make you an adept uh software engineer you're gonna use this to your advantage and you're gonna nobody's gonna their way through you at all because the you really know when to use this and i'm not going to talk about development here coding here really i'm talking about actual engineering working with a database right for example and that's the topic of today b trees we're going to discuss all about the concept of b trees and how would improve to b plus 3 and what is the difference and the advantages what problems do they solve what are the limitations of these things all of this stuff we're gonna discuss it in this episode stay tuned welcome to the backend engineering show with your host hussein nasser and guys if you like this deep dive uh discussions of our database concept you might enjoy my introduction to database engineering udemy course over 14 hours worth of exclusive content that you won't find anywhere else so if you're interested check out the pinned comment below and the show notes as well if you're listening on the podcast it really supports the show thank you so much scott all right guys so b3s in order to discuss this data structure why does why was it invented we need to go back into a very real use case you have a table in a database it's a huge table millions and millions of rows and you want to search for a particular row by its identification let's say id number 1007 you want to look at that id okay and you just you don't want to really look at that id for the sake of looking at the value 1007 no you want to pull more information right so you want to get to that row so you can pull more stuff if you want to search for 70 you already know 70 you're not searching for 17 for the 17 you want more and that's the key 17 the value is the tuple for the longest time what we do is one by one let's read the table page by page oh one page one thousand rows another page one thousand rows read it nope 1007 is not any of them so you have to read one by one until you find it and if you sure that this is unique so you're gonna stop the search right and then exit there this is called a full table scan you're scanning the entire table right and if it's not unique you know there's more to it you're gonna scan the entire table eventually which is very slow scanning one million rows and worst case scenarios two million three million very very slow so it's slow the trick is we're saying make it faster make it faster that's what we want so how do you make searching one million stuff faster well one way is to parallelize it let's just split the table in in three four five seven and then let's distribute the jobs on these threads or these machines and let this search that becomes complicated but that's that's one way of doing it another way of doing it is that partition it partitioning a very very important concept we break the table into chunks based on key sizes so okay values from one to one thousand and this partition values from two one thousand to two thousand is in this partition on this table so it's just essentially break it down to multiple tables right another solution we're doing is create an index so we search on the index this small data structure and then once we find it we jump into the exact page that we need in all these scenarios and many many other scenarios we always reduce the search space that is it we're not doing anything magic really if we want we do i made a video so like how to work with a billion row table okay it's it's a it's exclusive for members in this channel and the and the takeaway from that video was in order to work with a billion row table you need to avoid working with a billion rows yes it's not really rocket science don't search one billion rules find any sneaky way to avoid working with large datasets and then and and it's try to segment and eliminate things that you know is not it's not going to give you the results that's the trick and here's where this topic is indexes the indexing i made a video about indexing check it out but the idea of having this index so that i can search only the things that i am absolutely sure that the row that i'm looking at the value of look at is in that space right and when you've when you design and structure your index that way you're going to get a smaller set you're going to search fewer search space and that the fewer the the search space the smaller the search space the faster the results rocket science i know okay so what's saying how do we build these structures these trees came into the equation so okay how about we build trees so how do we build these trees for the longest time we had this idea of a binary tree which is a very very simple beautiful structure that says okay you put a value on the top and then larger values go to the right smaller values of this node go to the left very simple data structure okay so if we put a value of 100 here okay anything above 100 goes to the right of the tree right and i'm doing right here it's my right i don't know if it's you right and anything to the left of the three is less than 100. so if you're searching for the value of i don't know 200 then you always go to the root so that's one cost jumping to the root you say okay i'm looking for 200 you know what you're looking at is 200 greater or less than 100 this is a cheap check that gives you almost no cost at all but that check eliminates almost half if you're lucky half the search space because just like that you're now searching the right hand side because ah you know that 200 is going to be in that area and as a result you only going to search 100 and above so you're searching less results and this is a very simple example but the problem with binary three are by default they are not really balanced what does that mean take this example let's assume the root is one okay and then any value that puts uh that you add is greater than one what will happen two three four five six seven with you're gonna end up with a linear search space that you didn't really that is as absolutely useless because this is called an unbalanced tree you're gonna end up almost like if it's not even slower than a full table scan you're going to jump into multiple pages to get into the value of the 200 but you end up doing more work effectively to search that so binary trees didn't really satisfy database workload because of this lack of balancing okay so we need a self-healing self-balancing tree this is where b3 came into the situation and b3's guys stands for really doesn't have it the paper doesn't really say what it stands for it could be balanced could be boeing which is the company that uh the the research lab that this paper came from or could be bayer one of the creators of this uh the researchers well yeah it could be anything really and the beauty of this is it's really truly balanced so if you think about it what does really balance really mean that means as you insert the tree balances itself and if you're an engineer almost like wait a second you're balancing as i'm inserting stuff the moment you hear that if you really pay attention then there's a cost at all the cost and that is where you need to look so there is a cost on rights which is this constant rebalancing that needs to happen to balance itself so it's the left is equal to the right but at the cost of this as i'm writing i'm continually balancing my tree the reads my reads are gonna be very very quick quick as a result so b3s are nothing but a generalized binary tree you can have any number of children not just two you can have seven you can have a thousand really the databases actually calculate that on on the fly let's take an example all right so this is one of the most popular websites that demonstrates b trees and it comes from university of san francisco fantastic website really very simple right no fancy stuff just to the point so we're going to explain b trees here with in this website so effectively so let's say my b3 has a maximum uh chill uh number of child the degree of the order is three that means it can have up to three nodes okay so i'll go ahead and insert a bunch of values that's gonna insert a value of one so this is this is going to become the root which is one here if i insert the value of two right then when i want to insert a value of 2 we're going to look is 2 greater than 1 well yes but i'm not going to add it as a children because i can actually put the value 2 inside the note itself as an element okay so if i get an insert now i can have another value and here's the thing you might say jose how many elements or keys can you add in the node itself it is the value which is the maximum degree minus one as effectively it's always like that so three minus one so two is the maximum you can get so now if i want to insert three what will happen here is okay i'm going to insert three here but i'm gonna exceed my my coda in this node because i'm just exceeded by that so i'm going to split this note and that split hurts that split sometimes really hurts specifically if you're doing a lot of stuff the splitting of the page and doing this work that's all database io right think about it this way as i go through this so if i insert into the value of three it's gonna insert that but what will happen is two is gonna be promoted as root and one is gonna be to the left and three is going to be to the right let's go ahead and do this boom as we can see again this now i still if i now answer the value of four what will happen here so if i come to here and i say okay four i'm going to add four four is greater than two so it has really too nearly to go to the right of the tree right not to the left so what will happen here is as i insert this it will go to the four it will go to the add and then it will add it as another element in this right three and then you can see that now the three has three and four okay so how do you read this tree let's continue reading it so the value is 2 which is the key here there is something that is hidden here that is not shown which is the actual value of that point the pointer where this value points to so to the left we have the value of one to the right of the tree anything that is greater than two we have this node now you go to this node and you find three and four are elements in this node so let's add five what will happen what do you guys think what will happen when you add five well five is greater so it needs to go here right and if i add it right here it's gonna go to the node but we just decree exceeded the number of elements it should be just two we shouldn't go beyond two so now four is gonna be promoted as a node three is gonna go to the left and five is gonna to the right but since we want to keep the levels four are gonna be pushed so we're going to be 204 and let's go let's add it so we can explain this add it we'll get added and then promote that four and here's here's how you read this let's read this this is a little bit different than a binary tree so this node has the value of element of two an element of four okay the key is two and four to the left any values that are less than two are here any values that are between two and four are in this pointer you go here and you search there could be one element could be 700 depends on the degree of the tree and then any values that are greater than four goes to the left so it's a full balanced tree but look at the work that we're doing as we insert and we're just balancing and undoing all that stuff uh so and that's because we we have a very small uh element degree of the size which databases shouldn't use a value of three you should have a huge as as big as your page size all right so talking about the b3s let's talk about the the the benefits of this if i want to search let's find the value find value of three very simple right before we go through it three three well three is between two and four first of all i need to go jump to the root and say okay where's three three is between two and three two two and four so i mean i need to jump to this pointer so i follow this pointer to whatever note it points me we are seeing an actual node but in databases this is a page mostly at least the postgres implementation this is a page itself this is a page this is a page and there is thousands and thousands of elements in that page right so this is an i o this is an i o so now i go to three and then i found three this is a very very short three essentially and you just found it the value three i'm not really interested in the value three i'm interested is what is the content of the value three and that is the value here there is something that is not shown in this diagram i wish they they actually showed them but there is a there is a value there's not this this is not just the tree we're not looking at just the numbers here there is some content that associates with the key it's almost like a key value right so the 2 has a value next to it in b3s every element has a value attached to it so two could have any value it could point postgres implementation pointed directly to the tuple id right my sql implementation pointed directly to the primary key right in that case and that's why if you use a primary key in in my sequel that is like a good then your indexes are going to be a huge very huge right so be careful what primary key you use in my sequel right and and and post because they don't have that problem effectively because everything is a secondary key and they always point to the tuple all right so that that's that's what we're looking at so who's saying why are you telling us that there's values here because that is the difference in the other data structure that is called b plus three and what problems it is solved so now when i when i'm searching here i this is a page right this is a whole page so if there is two and the value of it whatever that value is if if if you if you're in my sequel then it's the pointer to the primary key so the primary key data type goes there if it's postgres is the tuple i believe it's i don't know if there's two bits maybe forgot what's the table id value size so that that is the tuple id in that case so you you put more bits here so this page is gonna be is gonna fit fewer elements because we are carrying also values right if i only had the keys i could fit more elements in my page which is let's say in pos because it's 8k i can fit more elements there and as a result i can search i can i can traverse through much more elements in a single io compared to this data structure which has the values which let's be honest the values are out of burden if you think about it another thing here is what if i am searching for one three and five at the same time how do i do that well oh let's search for one oh one is here so i'm gonna read here okay we found it okay now now i want to search for three oh so let's do it again go here and then three so you did a double jump to get to the value of three okay i wanna search for five well let's do it again here five so you did three ios to get to five and this is not uh a great example but you you you see that the work you did it's like three jumps three hops what if you're reading 100 you're gonna go all over the place to find these values because they are although they are ordered in the tree but the they are they cannot be they are not sequential once you find a value it has nothing to do with the value 5 despite them being next to each other in logical view so now we're looking at a b plus three and the difference between a b plus three and a b three is really just the realization that we're wasting uh space we're wasting space by putting the values in next to the key in the elements it says okay what if i don't put the value here so okay you have to put the value eventually right no i'm going to put the values at the at the leaf nodes only so now that oh saying that means you have to duplicate your keys as we're seeing the value three here also appear at the leaf but that is a very small cost compared to the benefits that we're getting all of a sudden now these intermediate nodes that we're searching are so tiny we're only searching keys here they effectively can fit entirely in the memory so i can traverse this stuff without actually having to worry about the value of them but once i find something here's another benefit of a b plus 3. once you find something there is a pointer to the next actual key so technically once you find something you can find pretty much everything range queries in this case and a b plus three are much more effective than an mb3 because if i give you hey find me all values between one and five in a b3 if you go to a b3 between one and five you have to search one and again you have to find obviously two you have to find three you have to have four i have to five because you have to find every one of them the values itself right so you have to traverse go back and forth the tree to find the stuff however in a b plus three just go and find one we found one once you found one just go because you know everything is sequential right if index are always sorted by default so if it's sorted then okay where's the next value oh there's two there's a point a nice beautiful pointer now you might say hussein isn't the pointer adding the pointer is a cost of course there's always the cost to everything so people studies okay the addition of that pointer really doesn't really affect much but the value of having this beautiful range queries where i can search and then immediately find two and three and four and five these might fit into a single page if it's like five but let's say you're searching for i don't know two thousand or three thousand right you might read it really depends again on the data size and the values and all that stuff you might read maybe three or four pages right depends on the on the size and all that stuff right but yeah that is the benefits of the difference between b trees and b plus trees almost b3s are never used in databases right although i see some value to the simplicity of the b3 right but studies shows that the cost the addition cost of the b3 which is b which is the duplication because you have to duplicate the key itself in the intermediate node and in the leaf node plus the addition of the pointers to to l to to to do a linked list at the leaflet is is almost negligible to the to the cost of the b3 right itself but if you like i mean if you're for instance if you absolutely know that your workload is gonna always gonna be okay i don't know key value store where you're always searching by one value you're never never gonna search by multiple values then you can use a b3 right but i still think that it's very hard it's harder to fit a b3 in memory fully compared to a b plus three right why a b3 is is you have to put it all or nothing really so a b3 is really it's it's all or nothing you can you can you have to put it all in memory i mean you can play tricks with paging it's like okay let's page some part of the tree but not all of it but yeah it's it's really unpredictable right but b plus three i've got reference post case because that that's the database i try to focus on according to postgres when they implemented their b plus trees 99 of the cost and storage of b trees is in the leaf this is just one percent all this nodes that helps the reverser just to get to one leaf is one percent so definitely you can fit a one percent in it's right in in memory let's say your index is 100 gig that's a huge index by the way if you think about it right then you can fit if you have enough memory if you can fit one gigabyte in memory that is fine then the traversal are so fast to get to one leaf and then you're gonna start doing some io once you get to the actual leaf value right and i i think that's still i'm still actually thinking about this stuff like is there benefits to using just pure b3 versus b plus three but always to me it looks like always b b plus three always wins over b3 right that's what i what i noticed i noticed like mongodb even being a key value store they they use b plus 3 they don't use b3 and maybe that that's uh that's a that's sort of actually let's check oh i take that back look at that mongodb uses b3 data structure uh they did not specify except the b plus three it looks like they are using just a b3 you know what i think that's enough uh showing off the the screen let's discuss this part a little bit you know this this reminds me of what it's like this whole thing with mongodb and just b3 according to their dock they don't not they're not using b plus three okay they're using a pure b3 so my guess is they're gonna have trouble fitting that tree in completely in memory as a result because the search you you cannot search through it effectively right just to get it to a leaf it's all or nothing if it's a b3 because the values are with the keys so you're burdened with that that made me remember discord remember the video we did on discord because i remember covering this mongodb the discord moved from mongodb to i believe cassandra and then later they moved to cellar db because of this exact thing let me read this the messages were stored in a mongodb collection with a single compound index on channel id and created app around november 2015 we reached 100 million stored messages and at this time we started to see the expected issues appearing the data and the index could no longer fit in ram and latency started to become unpredictable it was time to migrate to database more suited for the tax again i'm just putting two and two together here my guess is maybe because of the choice mongodb used a b3 for their indexes they can't fit the indexes efficiently in ram compared to if it was a pure b plus 3 where you can fit the actual traversal part of things in memory right and you can you can put some of the leaves which contains the keys and the data itself right which which points either point to the row tuple or the actual value and if you do that maybe that was it that was caused mongodb to not fit could not fit the b the b3 structure in entirely fully memory because think about this way if you can't fit in memory entirely then you have to go to this but how do you know what part to put in the disk and what not you can't have the choice unfortunately you have to put you you would decide okay i'm going to put half of it on memory and half of it but how do you know you might be unlucky that the queries that needs to traverse the tree or in this that you will you you'll end up hitting the the uh the desk constantly to retrieve pages just to traverse bad stuff so to me i still think b plus 3 is the superior structure regardless of your uh really use case if you think about it because like that's what i thought it's like okay i want a simpler data session i'm going to use a beautiful simple b tree i don't need to pay plus 3 because i'm just it's a key value store i'm just searching for a key and then give me my value and that's why was my initial response as it was in the video right like i started changing my mind while making this video then it was like wait a second that's fine and all if i have a small index but if it's a huge index with a lot of stuff then if i can't fit all of this stuff in memory it's gonna spill to disk then i'll end up searching all of that stuff in disk you might argue hussain if your b plus 3 is so large i don't know one terabyte then you're what is one percent of one terabyte it is 10 gig yeah it's around 10 gig so i mean do you have 10 gig war for ram well if you're managing a thousand was a petabyte worth of content please invest in a better memory you should have at least 256 or 512 ram to fit your 10 gigabit of index but think about it this way right it is it looks like bps3 always wins regardless and that's the discussion i wanted to have guys it says very interesting as like as i make these videos i keep i keep remembering old videos that we covered and i can we put things and we links together is fascinating i'm gonna reference the discord video that i made obviously in the description if you're interested in that stuff and uh that's it for me guys so what did we discuss we discussed the beauty of the data structure algorithms very important to understand not memorize for an interview actually understand so it helps you once you understand what things together you can put one two and two together and these are oh wait a second this doesn't make sense and that the power that's the that's where the power of the engineer comes comes in beauty then we talked about b trees and we talked about b plus trees and the differences between them and how do we how do they actually uh um how do they actually thrive in an actual database production we've seen an example of mongodb again that's what they say they say b3 they didn't say b plus three honest uh again i'm gonna i might make another video about b3s and then how postgres effectively determined that the degree of b trees because if you think about it like postgres or other database they don't say okay oh we're going to use i don't know a 2000 degree b3 no no they don't do that they derive that based on your data type of the index you're trying to create i'm going to make another video discussing that hopefully stay awesome goodbye this guy

Info

Channel: Hussein Nasser

Views: 15,670

Rating: undefined out of 5

Keywords: hussein nasser, backend engineering, btree, bplustree, b+tree, dbms, postgres btree, mongodb btree, b-tree

Id: UzHl2VzyZS4

Channel Id: undefined

Length: 31min 50sec (1910 seconds)

Published: Sun Jun 27 2021