10.2 B Trees and B+ Trees. How they are useful in Databases

Video Statistics and Information

Video

Captions Word Cloud

Reddit Comments

Captions

hi the topic is Beatriz and B plus trees this is one of the important topic and is a difficult topic for the students to understand this is useful in databases what are the things that are required for understanding B trees and B plus trees thoroughly this is the list of topics if you understand all these things hence you'll know all these things then you can understand what B trees and B plus trees are mostly type people try to understand how insertion and deletion is done just they try to learn the procedure without knowing everything so this video will be little nd reason I am going to cover all these topics that are related to B and B plus trees and help you understand perfectly what B plus a and B and B plus trees are let me read out the topics discus structure how are this looks like and how the data is stored on the disk then nix when you are storing data from databases then indexing is used for faster searching then what is indexing what is multi-level indexing then we will also discuss what are M basis trees and that--as massages are converted to B trees and then B plus trees so this is a little bit lengthy but I suggest you listen everything go through it then you will never forget this topic let us start with a disk structure let us look at disc structure this structure is likely discuss like a platter on which the concentric circles are made these are logical circles this is not physical and these are circles are called as a track so let us say this is a tag 0 track 1 track 2 and track 3 then this pieces are called as sectors so this is sectors and this is also sector like this let us say this is sector 0 sector 1 sector to sector 3 & 4 & 5 so discus divided into many tracks and sectors the intersection portion of track and the sector this one or this one all these intersecting positions these are called as blocks so any location on the disk can be addressed with track number and sector number so the address on a disk that is block address block address is mentioned in the form of track number and sector number so like this if I take this particular block then this is track 1 and sector 0 this block tracked to sector 1 so each block can have its address in the form of track number and sector number and typically the block size we take it as 512 bytes for steady purpose we take it as 512 bytes it can be of any size it depends on the manufacturer or for disk so we assume that every block is of fight will bytes when the disk is read when we read or write to and from the disk we always read and write in terms of blocks now if I take just one block that is not 512 bytes the beginning address that is first byte is 0 and the last byte is 5:11 then each byte can have its own address that is called as offset let us say this is the tempt byte so we can refer any one byte on the disk in terms of first of all block address and inside that block we will take offset that is the address of that particular point every block we assume that the addresses starts from 0 and last addresses fight will file Evan because total fight will so any address any byte we can take it as address and that address is called as offset to reach a particular byte on a disk we should know track number sector number and the offset let us say in this particular block this particular point I want to read this is suppose this is one byte for reading that by I should know these three things then let us understand one more thing this is mounted like this on a spindle and the head post will also be there so by spinning it will change the sectors by this arm movement of a header that is the header that is reading the data from the disk or writing the data on the disk by this movement tracks are changed so we get position ahead in a particular track and in particular sector by moving this post that is head post or spinning this one so by spinning sectors are changed by moving head tracks are change this about the disk now the data on the disk is always stored in terms of clocks now one more thing let us take this as main memory usually we call it as RAM RAM is a technology and the type of memories main memory with all our programs inside main memory and the programs will access data suppose they are accessing the data from the disk then the data from the disk must be brought into the main memory and then only it can be accessed and if any results are obtained and the results are again written back onto the disk so the data cannot be directly processed upon the disk it has to be brought into the main memory and then access and this is one important point to understand and one more time I'll tell you organizing the data inside the main memory that is directly you sued by the program data inside the main memory that is directly used by the program that is data structures and organizing the data on the disk efficiently so that it can be easily utilized that is DBMS so study of the methods or the approach used for storing the data in an efficient way there on the discus all about DBMS designing organizing everything now here study on this one is data structures now let us move on to the next how data is stored on the disk now let us know how the data is organized on the disk in the form of database I have an example database table here with some columns and some rows let us say they are hundred rows and how the data is stored on the disk in the form of blocks holiday we have studied that and assume that the block size is 512 bytes now this table the structure of a table is given here there are more number of fields I am not showing them there but here I have written all so total there are five fields and these are their sizes employee IDs of size 10 bytes and this is of 50 bytes and the Department is 10 bytes so on so total it takes 128 bytes now it means each row size is 128 bytes this each rules 128 bytes now when we want to store that table on the database then in each block how many rows can be stored that is how many records can be stored so number of records per block so the block sizes 5-12 and the required size is 128 so this is 4 it means we can store four rows of this table in one block it means that these four rules will go into one block and these four rules will go into next block every 4 rows are stored in one block so one block can contain just four rows as we have a hundred rows here that is 100 records here then how many blocks are required then 400 records how many blocks there are hundred records and each block can contain just four records so this is 25 so total 25 blocks required for storing this database table on a disk now when we this table is taking 25 lakhs imagine if you want to search for any particular record you have written a query select from employee table that is employ table and I want so-and-so record for employee ID 5 or employee name Smith if I am searching then how much time it will take so the time depends on number of blocks you are accessing so for accessing this entire table we have to access 25 blocks so number of block access is more 25 because the record may be present anywhere we don't know where the record is present so we have to search in the table so for that at most 25 blocks we have to access we can access block by block and check for that particular record which we are looking for but at most 25 blocks we have to access can we reduce this time yes now we will prepare index for this database table so index index what we will store in index we will store a key that is my ID and pointer pointer to a record so let us say employee ID 1 this is present here and employee ID - this is present here employee ID 3 this is present here so we will have a pointer actually to the table inside the disc inside the block so we'll have a pointer to a record so this we call it as a record pointer we will have a pointer to a record on the disk in the block for each record we will have an entry inside the index so this is called as a tense index next where do you store index we will store index also in the disk only let us say in this block we are storing an index this is for index this is for the database table and no record more blocks are there because it is taking 25 lakhs now index is also kept on the disk now the question is how much pays this index we'll take on the disk let us analyze see employee ID is 10 bytes then what about this record pointer the pointer that we have taken here the record pointer let us assume it is taking 6 bytes so each entry size is how many bytes 16 bytes each entry is taking 16 bytes so how many entries are there here hundred entries are there now the question is how many blocks required for this one so entry sizes 16 bytes right now number of entries per block so block sizes 512 and each entry size is 16 so this will be 32 so it means we can have a 32 entries in one particular clock so total how many entries are there here hundred entries so 400 entries hundred by 32 so how much this is three-point-something let us say 3.2 just assume it is more than 3 so it means that we require 3 plus blocks for storing this index so let us say 4 blocks are required for storing this index now whenever we want to search we access index so at most how many blocks we require four blocks four blocks of index and once we got a particular key let us say suppose I am looking for this using that address I can access the record so for accessing the record I have to access one block one block so maximum four blocks of index and one block of database table so first search in the index atmos how much time four blocks then once you got the index entry then go to that particular block of database so that is what no block so total four plus one blocks are required for accessing any database record for accessing any record we have to access four plus one that is just five blocks 35 blocks because we don't have to look into the entire table this is the benefit of index na next I will talk about multi level index next what if the index size is also growing how insurers have indeed known have thousand records thousand records means 250 blocks now here we require four blocks nor acquire 40 blocks because there will be thousand entries just I am adding zero to it when the rows where hundred we require four blocks now thousand let us say 40 blocks are required just assume I am NOT doing all calculations then the searching in the index itself is larger so can we have a index above an index yes now how you prepare this index for every entry here if shall we have a entry in this next level index high level index know for each block block so how many entries can be possible in this block 32 entries so we will have one pointer to 32 entries of this particular index right so let us say the starting key is one right next key will be 33 and it will be pointing on to the next block so this is a sparse index it will not have each entry for one entry here like it will not have the same number of entries it will have one entry for one block of index now as I said there are 40 now again what is the size of this one just sixteen only so how many are possible 32 are possible in one block then for this particular index how many blocks are required just one plus point something let us say two so two blocks are required so this high level index can be stored just in two blocks so database table into 50 blocks first level index 40 blocks then next high level index just two blocks so search into blocks once you get an entry access a block here then once you get an entry accessible okay so total how many blocks two blocks from this side one blocks from here one block from here so in this way by adding one more level of index we are reducing the number of block access so this is the basic idea originating for P and B plus trees I will just draw multi level index and then I will show you how it looks like let us look at multi level index once again suppose this the main index now above this we need one more index suppose this is too large then how this will be having it will be having a pointer to set of entries in an index then the pointer to set of entries in an index I am just taking randomly I am NOT drawing every counting and drawing so this is having a pointer to this suppose this is also large this is also large then you can have one more index which will have a pointer to set of entries and a pointer to set of entries this is how we can go on adding multi level index if the database table suppose here is a database table and if size is growing this index will grow and we will add one more index one more index adding multi-level indexes will reduce the number of block access so if I just turn it around and see it like this next level index then next level index if I turn it around and see this this all the multi level index looks like now what is the requirement of multi level index shall we add the index by manually shall we check that the database size is growing and then we should I add indices one by one that is high level in missus no we want the high level indices to be added automatically and deleted also automatically if this is increasing table size is increasing and this index is increasing then you add one more index if you are deleting the records then the ISIS eyes will reduce and then you remove the high level index we want self-manage high level indices that is multi level indexing so the final thing else we want self-manage multi level indexing and that's all it gives the idea of b-trees and B plus trees so now you can see that this multi level indices are looking like a tree if I just turn their side ups then it looks like this P trees and maple streets are actually originating from and they search trees so next I will discuss what are my search trees then finally we'll go to B trees and B plus trees let us talk about M based search tree C for knowing about my search tree let us recall about binary search tree which is useful for searching purpose and for searching any key value we start from the root and if the key value is smaller we go on the left hand side if the key value is larger we go on the right hand side like this is a binary search tree if I am searching for 15 then 15 is greater than this I will go this side 15 is smaller than this soil go this side and it is found so we know that the smaller keys are arranged on the left side and the larger keys are arranged on right side this is what done in binary search tree in binary search tree how many keys we can have per node just one key and how many children each node can have how many children there can be two children each node can have maximum two children so that's why it is called as a binary tree binary binds two children it means can we have a tree with more than two children or more than one keys let us check 20 and 50 these are the two keys now this side I will have the keys which are smaller than 20 so that is 10 and 15 and here I will have the keys that are in between these two that is 30 35 and here I will have the keys that are greater than this so that is 60 and 90 so yes we can have a Saoirse tree like this also there's a search tree this is the following the idea of same as binary search tree see the keys are arranged such that they are in the increasing order so if you see the keys here see first key is smaller than second key and smaller than third key if you have more keys they will be arranged like this and if I am searching for 30 then I'll start from here 30 is greater than 20 30 is less than 50 so it must mean this child come here today is found so no doubt the amount of time taken for searching is little more than binary search tree but the approach of searching is same so these trees are called as same resources so what is this envy see how many keys it is having in this example it is having pinkies then how many children it is having so each node can have maximum 3 children so this is three ways search tree this is a three way search tree this is called as three velocity so in this way my search tree means what a search tree in which each node can have at most M a children and a search tree so each node can have at most M children so how many keys n minus 1 keys so M s based on the number of children that is the degree of a node so my search trees are the extension of binary search trees you can say or else you can say binary search tree is a type of my search tree with a degree 2 so you can have degree 3 also degree 5 also degree hundred also degree thousand also so it means you can have 100 keys here and each node can have hundred hundred plus one children so you can have any degree of search tree this is about MV sauce tree let us look at the load of my sauce tree like for binary search tree Harvey prepare a notice we will have data here and two children then here you need more number of children so let me draw the node for this one if the MV search tree is search tree st am writing it suppose form a search tree means each node can have four children and three keys let me draw a node structure this should be a child pointer pointer to the child and this is key one and again this is a pointer to a child and this is a key to and disappointed to a child and this is key three and disappointed to a child so for child pointer so this is four weig and when it is a four-way then there can be three keys that is my search tree and Lord can have M minus 1 keys so it is just like this this is a diagrammatic representation and if you are making a data structure then the node structure looks like this so if there are a m'children than M minus 1 key spaces are required no next can we use this search trees these no structure for preparing index let us see so I will try to use this for making index so let us check how we can do this for for a search tree where there can be 4 children so this is a child pointer okay now this is key 1 and this is next child pointer then this is key 2 then this will next child pointer then this is key 3 and for child pointer so child pointer 1 2 3 4 then what is this I am adding see this after every key I have a pointer this is nothing but C if you have keys here and you are taking a my search tree benefit of my search tree other index then apart from the child pointer you should have also a pointer to a record that is a record pointer so yes we will also have a record pointer so this is a record pointer record pointer so we can utilize my search trees for indices that is multi-level indices like tree will have multiple levels so for making multi level indexes we can utilize my search tree but the node structure should be like this so just remember this known structure see keys are there and child pointer so for degree 4 so for child pointer 3 key so 3 record pointers for each record for each key there is a pointer to the record to the database record now if we simply use MV search tree what will the problem and how we can overcome the problem and we introduce B trees now we are going towards B trees now till here we have discussed about my search tree no I am taking you to be tree and in B tree we will use the same node structure just remember this again I may not be showing you this will there with no structure that is followed in B trees but first of all let us know what is the problem with MB search tree then how B trees are different from my search trees let us see how insertion is done in my search tree and what is its problem then we'll move on to B trees for that to make you realize the problem I have taken 10 way search tree 10 Ravens a degree 10 means each node can have 10 children and the nine keys and suppose I have to insert these keys let us see how I can insert first 10 there is nothing so I will create a node and insert 10 how many more keys I can have here 8 more key I can have here but for inserting 20 I will create a new mode as a child and insert 20 then inserting thirty ten that smaller 30 comes from right side so I will not insert here but I will come to this and 20 30 is greater in this oil insert 30 here so I can insert like this also but this is wrong I should first fill up this node then only I should think of creating next node but there is no control on my search tree you can insert as you like so it means if I have and the keys then what will be the height of and missus T n so this will be too much time-consuming language become similar to linear search that's the problem of return my search tree so what is the problem with my search tree the creation process is not under any control there are no guidelines for creation process you can create as you like if suppose you are creating it like this then it's a waste of time it will take a lot of time for searching then there must be some rules or guidelines for creating my search t yes those guidelines are called as B trees so B trees are nothing but in the search trees with some rules let us see what are those rules I will explain let us see what are B trees beeps trees are nothing but MB search trees with some rules so what are the rules let us see see the first thing every node you must fill at least half half what if the degree is M I mean by 2 children must be there so M by 2 seal value is taken not flow and by 2 missed if degree is it 10 means 5 a children must be there for an or every node must have 5 children then only you think of creating new node that's what to control the height of MB sir this is the rule you can create a next mode only if the current node is at least half of filled with the children this is fun and by do children must be there then second thing how we can impose this condition on the route we cannot impose this condition on a route if we impose then we cannot insert even a first value also so route can have minimum two children so route means the first one so first node can have minimum two children right so it means it can have just one key also no problem but what about all other nodes they must have at least half of the children this is the conditions - conditions not trade conditions then the third rule is all leaf nodes must be at same level all leaves at same level that's it these are the three things if you are following for creation of M based search tree so actually we are creating B tree and one more important thing here is that the creation process is food point is the creation processes bottom-up creation processes bottom-up so now I will take some keys and I'll show you how a B tree is generated from bottom up so for that I will not take a 10-way search tree I will take a small degree tree and show you how a B tree is created in other words how insertion is attack we will create a bee tree of degree 4 means each node can have 4 children and 3 keys so I have taken some of the keys here then I will be inserting more and more nodes so I will add more keys afterwards let us start first there is no node at all first key I am inserting so that is 10 first key so it can have how many children do children and as per the index 1 he said that it can have one record a pointer the next 20 20 will be inserted here that is greater than 10 40 is inserted here mixed is 50 there is no space for 50 already 3 keys are there degree is full so we cannot insert 50 50 there is no space then what we should do split this node and create one more node so out of 10 20 40 and 50 two keys I will take this site and one key I will send this side 40 I will send it in the root then this is the left child this the right child that's it so when the node was a fill there was no free space then for inserting a key we have splitted a node and one of the key we have send it on the road out of 4 including 50 out of 4 to this side 1 that side 40 up this is how splitting is done while creating a b-tree and you can see that the tree is growing upwards so that's it it's created bottom-up and I was telling you that this is B trees are useful for implementing multi level indexing that if you say this is one first level index then above that this is the second level index created automatically if you add keys here automatically this was created now let us see I will add more keys here then again you can see that this is growing so let us continue I will insert few more keys let us say 60 so 60 comes here then 70 70 comes here then 80 80 there is no space here 80 should come here there is no space then split again then what I should do including 80 50 60 70 and 80 so 80 should grow this side and 70 should be in the root that's the child now that tail looks like this then next I will insert 30 30 30 will come here then I will insert 35 there is no space for 35 here then split so as there is no space I will split this side so out of 10 20 30 and 35 35 this should go in the root and the 10 painting this side and 35 on the side right side 35 on this side then who will go in the route 30 so I will redraw it 30 40 and 70 I'll remove this yeah what's yeah this is the child Hispanic child and this is the child and this is the child this is what the tree looks like now see whenever the node is becoming filled we are splitting it and we are sending one of the key in the parent now we'll try to fill few more keys here and let us see I want to insert five so where's five income side income in the beginning so 5 10 and 20 now I want to insert 15 15 there is no space so 5 10 15 and 20 there is no space for 15 then arrange them in this order then there is no space man split it so for splitting I will create one more node on this side 5 and 10 this side and 20 this side now 15 should go up here this should work right I shall no point here everything that vitae why I am sending third key I want to send second key okay that is your choice so you can make it either less bias or right by asked so either way you can do that no it is left by us means it is more keys or on the left side so whichever one you want you can take it so 15 goes off 20 on this side and 5 and 10 decide now there is no space here also so here also you split it so out of how many keys 15 thirty forty and seventy so fifteen and thirty this side seventy that side and forty in the parent so this will be on this side this will be on this side now all these links will also change because we are splitting it so this is on this side this is on this side this one and I will draw it a little far from this one seventy then 70s left and seventies right so now you can see there are three levels one two three so three levels of indices are created and this is how v3 is generated by inserting keys and the height of a b-tree is increasing and this is same as multilevel in this so this is the creation process so it's under control and all these nodes are at same level so they are looking like more like or index and this has same level this is forming index and above one is also an index so I have shown you how to insert and how to create a b-tree now let us see how it is useful for the databases just minor things are very important and key points I am showing you as we discussed just now that every node will have a pointer to the child node as well as a record pointer this 20 will have a record pointer 5 will have a record pointer 10 will have a pointer 15 will have a record pointer 30 you have a record pointer every key will have a record pointer also that is pointing on the database so this is useful for implementing multi level index apart from the child pointer this is a record pointer so a b-tree node can have what key child pointer that is a block pointer and this is a record pointer that's it that's it about be trace I did not show deletion it depends how much we have understood from this one just you reply me back leave you a comment and if required I will prepare deletion also okay actually I wanted to show you what is the application of B trees and B plus trees so I have started my topic right from indexing and brought you up to this point so that you can understand the relation with the databases and B trees otherwise mostly people study just support B trees and B plus trees without knowing indexing and databases so now you can relate it with the databases now the last thing what is B plus tree it's a very simple thing let me show you quickly just I will modify this what is B plus tree in the B plus tree we will not have a record pointer from every node we will not have a record pointer from every node we will have record pointers only from leaf nodes only from leaf nodes will have a record pointers then what about these keys every key will have its copy in the leaf node so means who is missing here 15 is missing here so 15 will be here also 20 so 15 will have its copy here also so there will be record pointer from 15 also 20 also then 30 here will have a copy of 30 also and 35 know what about what about 40 40 we will have a copy here also 40 will be having a copy here so 40 and 50 and 60 so 3 record pointer so 70 will also be present in this one so every key in the tree will definitely present in a leaf and this leaf nodes are connected like this like a link list so this becomes a dense index now this is exactly like multi-level indexing so this is B plus tree so what are the differences from B tree to B plus C let me quickly tell you that all the keys are present in the leaf nodes and in non leaf nodes there duplicates may be present every key must be present in leaf and there are no record pointer for non leaf nodes the record pointer will be from only leaf nodes so leaves knows all these nodes as they are at same level they'll form a linked list these are the points then this becomes a dense index this will becomes a dense index then this is a sparse index next level the next level is pass index so rather than be 3 B plus 3 is more like multi level index so what is Plus in this one that I have shown you so if you understand me tree understanding B plus tree is not difficult that's all with this topic there are analysis and a lot of things are there so it's a big topic the purpose of coverage of this topic was to relate it with the databases all right so I have covered more from the database site than just be T's and B plus trees so just be plus B be trees and B plus trees can also be taken as a topic and discuss leave your comments let me know if required I will prepare a separate video

Info

Channel: Abdul Bari

Views: 647,605

Rating: 4.9533529 out of 5

Keywords: b trees, b+ trees, m-way search trees, algorithms, databases, dbms, abdul bari, bari

Id: aZjYr87r1b8

Channel Id: undefined

Length: 39min 41sec (2381 seconds)

Published: Fri Mar 30 2018