21. Database Indexing: How DBMS Indexing done to improve search query performance? Explained

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
welcome back this site and today we are going to do database indexing mostly database when I say it's a relational database management system which we are going to discuss so this question has been very very frequently as database indexing so I will share you my understanding of how database indexing works okay so before we start T1 discussing the indexing right we have to first understand these three things right how actual the table data is stored first thing that should be clear how table data rows are actually stored what type of indexing present understanding the data structure used for indexing and how it works right so once we have this three better Clarity then only we can understand how dbms make use of all these three things and able to perform the indexing okay so one thing worst I would tell you that so whenever the tables which you have let's say you have a table this is row this is column right and you have a data one two three four a b c d right so generally what question or what whenever the indexing come generally what engineer might be saying that okay we will create an index on this particular let's say we are doing an indexing on this particular column so we will create an index one two three four five right and it will point to this rows right and any query will happen it will happen on top of it it will label to fetch faster okay you want four so on this index it will able to find the faster and it will goes to point to the particular row okay so what I am trying to say is that this is very high level but today we are going to step down into one level deeper because that's where the interview follow-up questions asked okay so now let's go into little bit deeper and I'm not going to shut down now or cut down this video shorter because if if even if it is go little bit longer I will continue to do it I will not do any trimming for sure okay so let's first discuss how data is actually stored very very important so now let's say this is your table you have a row one two three four rows and three columns and let's say these are the three columns employee ID name and address so this is generally a very very you can say that typical table let's say employee table so this is just a logical representation this is just a logical representation this is now this is not how actual the data is stored in the DB right this is just a logical representation not a physical representation remember that okay so when we say that hey this is logical representation then how the actual data is restored okay so now let's say in our table we have this data four rows three columns so what dbms will do what dbms will do database management system what it will do it will create data pages right and one Data Page generally it's generally of 8 KB in size depends upon what DB you are using okay but generally the size of one data page is 8 KB okay each Data Page can store multiple table rows in it okay so now we know that this is just a logical representation these rows are actually stored in data Pages which dbms creates right how so this is how a typical Data Page looks like so when I say that it's a 8 KB so in 8 KB how many bytes are there 8192 bytes okay now 96 bytes are resolved for the header in a header what generally information it have it has page number how much free space this page has checksum and Etc okay so Header information I would say that it has very basic information like page number free space check summer Etc which is 96 byte after that eight zero six bytes eight zero six zero bytes are totally reserved for data records when I say that data records it is actual the rows so here the actual the rows will be stored so actual data is stored in here and last 36 byte is known as offset array actually so what does offset contains is its contain an array and each index of an array holds a pointer to the corresponding data in the data records right so in this data page this is an Adder let's say it has the pages Row 1 data Row 2 Data Row 3 data let's say only three rows are there in this offset there is an array 0 1 2 3 so 0 will point to let's say row 1. one will point to let's say row two so what I mean to say is that this offset array contains a array of you can say that a pointer and this what it point to it point to the actual rows now you can ask me hey why we need this offset array we will see that how it helps dbms but don't worry first of all what I want you to understand is how Data Page looks like so one thing very basic thing you should know is dbms creates data page and inside a data page actual the rows are stored okay so that only you should understand at this point of time and don't worry about the offset array we will see with an example and how dbms make use of it okay so offset is just an array of you can say that pointer pointer to what pointer to the particular row inside a data records space okay so total 8 KB data pages and out of this 8 KV 8060 byte is actually there for the rows or actual data and rest is for header and offset array okay so now it's clear that these rows which we are creating it looks like it is creating it's stored like a table but actually it is going here in the data page so now let's see this so if one table rho is let's say 64 bytes so now let's say this row 1. is of 64 bytes so this is let's say I'm just saying hypothetically let's say it is 16 bytes right this is also 16 bytes right this is 32 bytes I am just saying and it all combined to 64 bytes so one row one row is your 64 bytes right and we know that for storing the rows how many bytes it is available in our data page eight zero six zero bytes so eight zero six zero divided by 64. so almost around 125 it comes so means in one data page if it is a size of 8 KB then we can store 125 DB rows in it right if one row is of 64 bytes and one data pages of 8 KB then we can have 125 DB rows in it right and how data page is generally look like so we have header so I have just kept for Simplicity only page ID or page number like one right and then the second part is your data record data record with actually holds the data the sick the way the row is inserted into a table or by a user in the same way generally the rows are inserted so user has inserted Row one so it has inside this it put Row one Row 2 Row three Row 4 Row 2 Row three Row 4 and if let's say it has added five row five okay like this it will add how many it can store 125 DB rows it can hold then what is the third step third step I have told you it's an offset array so here I have created an offset array okay so this offset areas index like an array 0 1 2 3 4 5 right zero one two three four so zero it is pointing to let's say Row 2 this one is pointing to Row 4 2 is pointing to Row 5 3 is 3 is pointing to Row 3 and 4th index is pointing to R1 Okay so what does offset array hold is now it's clear it's an array and this array actually contains a pointer pointer to what the actual data which is present inside the this data page now you can ask me right hey what is this ordering at zeroth index R2 is stored right and at Fourth index R1 is stored so it doesn't make sense right it is not ordered not increasing order not decreasing order what is this order right so don't worry about that part because when I will come to that then it will make sense better how dbms is make use or offset array but now you should be clear with this that how's your data is actually stored by a dbms so this table is actually visible to you but for dbms the way it is stored is like a data page so for each page it gives a number and it stores the data in a page right in this data records area right and for each this rows there is an offset it also maintains the offset also where it put a pointer and the ordering dbms will decide when we discuss about indexing okay till now it is clear right all these numbers which I am saying 8 KB 8 KB 64 bytes that depends right that depends upon DB to DB so but for this one this is the most common generally you will see data Pages 8 by 8 KB cool so till now we know that okay this table how it is represented in one page so dbms remember that dbms creates and manage this data page so this this page whichever I we have talked who manages dbms so total control over the dbms so for storing one table data it can create many data pages so now let's say this table grows and it contains let's say Road 10 000 like low ten thousand so definitely 10 000 rows cannot come into page one right so it's upon dbms only to create multiple Pages it will create page to page three page four or page 100 and keep on storing the rows in it okay so dbms creates and manage this data pages and for storing one data table data it can create many data pages okay I know that the very big questions come into your mind here if there are hundreds of pages how dbm is gonna manage it right we will come to that one but for now for Simplicity a table data is generally stored in a data page and one data page is full dbms can create more data pages and keep on storing the table data till now very Basics we know now where these data Pages ultimately get stored these data Pages ultimately get stored in the data block now this data block is actually a physical memory this is actually a physical memory right like this on a disk there is a so you know right disk and then there would be some sector track so there would be one data block you can say the data block so what does data block means first understand what is data block means so data block is the minimum amount of data which can be read and write by one input output operation so you can say that this is the minimum amount of data which one IU operation can read or write so this is the minimum amount of data which you can fetch in a single shot or which data blocks can have one I O operation okay and who manage this data block it is managed by the underlying storage system let's say like disk or sdd let's say if it is as data block so it is managed by storage system underlying storage system so remember that data blog is not managed by dbms it is not managed by dbms it is managed by storage underlying storage system like this data manage dbms manage only data page right so whatever the data you have you have to put into somewhere into the memory or disk right and inside a disk what ultimately we have is data block so these data Pages which dbms creates who creates which dbms creates ultimately gets stored there inside a disk or inside a memory right inside a disk so it is stored actually in the data blocks and dbms has no control over this data blocks it can be scattered so now let's say it if this disk has thousands or millions of data blocks let's say dbms has no control that okay my page one goes here in this data block my page 2 for example it can go into this data block data block let's say 100 data block let's say one so what I am trying to say is that dbms only has control till data Pages it can decide what data I can store into this what rows I can store into a particular data page right but when it is writing to the actual disk memory right then It ultimately gets stored into a data block and it has no control over it okay so now you know what is data block it is the minimum amount of data which can be read and write by an input output operation so data block size can range from 4 KB to 32 KB and most common size is 8 KB so now let's say your data page which dbms manage it is of 8 KB data block which actually stores the data in the like a physical location it range from 4 KB to 32 KB but most commonly you will find is also same like an 8 KB so if it is 8 KB means one data block one data block stores one data page only but if it is let's say 32 KB then it can con store multiple data pages data page one data page two data page three right so one data block itself can restore multiple data pages okay so based on the data data block size it can hold one or many data page okay now there is one more thing is dbms maintains the mapping of the data page and data block because I told you that dbms has no control over the data block where this particular page where my page where the data page is stored it has no control over it so that's why it maintain a mapping that's why dbms maintains a mapping that hey data page 1 goes into this address data block one data page 2 ultimately goes to this address data block one itself so it is possible that one data block this is let's say data block one one data block can hold multiple pages data page one data page two it's stored in the same data block because data block size can be higher than the data page right so dbms maintains this mapping that okay this my my this page because dbms maintain this page right okay this page one is stored in this block this page 2 is stored in this block this page 3 is stored in this block right like this so remember dbms controls the data page like what rows goes inside in which page or sequence of the page Etc but has no control on the data block data block can be scattered over the disk okay so I think you should be clear with till now though first part how the table data actually is stored in the DB so this is not this is just a logical representation the ultimately is stored in a data page for one table there can be so many data pages right and each Data Page stored in ultimately the physical memory of data block and one data block can hold many data pages right and dbms maintains the mapping also which page goes into which data block okay till now we are clear now let's see the second part what type of indexing present in rdbms now here if you see that the Things become little you can say that interesting so this name you might have heard a lot clustered indexing and non-cluster indexing I know the things which will be going in your mind primary index as soon as you heard a name indexing it you will be like primary index secondary index and what is that okay Composite Index like there would be so many indexing so all these things should have come in your mind but from the indexing dbms perceptive there are categorized into two type clustered indexing non-clustered indexing so forget about like before we understand cluster indexing and non-cluster indexing first we have to understand what is indexing when we we are saying indexing indexing indexing but what does indexing means right what data structure is used for doing that indexing how it works then we can come to that ok cluster and non-clustered but first we have to better understand what indexing is how it is done so in a simple term if I say that what indexing is it is used to increase the performance of the database query so that data can be fetched faster okay so indexing is just nothing but doing to increase the performance of the query so that it help us to find the data much faster without indexing dbms has to iterate each and every table row to find the requested data so you know that it maintains the data Pages data page one let's say there are 1000 data pages inside this thousand data Pages there are let's say one lakh rows data right now if you have to find something that hey give me a data with employee ID 35 now dbms doesn't know that which this employee 35 record goes into which data page it doesn't know all dbms has till now is indexing okay this data page is goes into this block so what it can do is that in a worst case it will iterate over this mapping okay my page one it will goes to this physical memory load the data load this block into the main memory take out the rows read this page and see it read this page and see that hey does employee ID 35 present no if it is no it goes to the second one okay my page 2 is store is in this block again so it is already in memory so it will load that page page two right and then it will again it's page 2 it will check all its rows hey have I found this employee ID 35 or not let's say for example worst case this employee is present in the last data page the last mapping which is present here then it has to iterate over all the data pages and inside all the data Pages it has to iterate over all the rows so what is the worst time complexity big go of n so without indexing it has to check each and every row right and if there are let's say millions of rows present query can take some time to fetch the data right because it has to goes to that particular data block from the mapping fetch the data load into the memory read the data page take out its rows and then read it and see so we know that indexing is used to make the query run faster without it it would be Big O of n but with it what would be the best time complexity we can give so there is a data structure which is used which is known as B plus tree it provide big go of login time complexity for insertion searching and deletion so this is very very frequently used B plus 3 while for doing the indexing right so without indexing big of n is the time complexity to search any data inside a table with indexing our purpose is to make our searching faster and which data structure it use it uses B plus 3. right so B here is stand for balanced actually so now till now we are slowly understanding okay what indexing is and for making the indexing Works what data structure is used B plus 3 it provide a bigger login time complexity so before I go further into the cluster cluster index let's first see how B3 works it will clear your mind actually how this data indexing works so I'll talk about B3 so B3 and B plus 3 is very very similar right so if you understand B3 B plus 3 is somewhere near to exactly same okay so now I am saying B3 and At Last I will set that okay it is a B plus T how it adds certain additional feature but both are exactly same so I am just telling you how B tree works so it maintains the sorted data okay all leaves are at the same level M order B tree means each node can have at most M children's for example two order B tree means each node can have at most two children three order B tree means each node can have at most three children right you can have any number of this mm model so you can have let's say five order B tree means you can have at most five children okay and M minus 1 Keys per node so now here if you see that when I say that let's say three order B tree so let's say this is the three order B tree so it can have three children so in one node so in the three order B tree a one node can have three minus one M minus 1 keys right so three minus one two so it will have one key here one key here right and it will have pointers here so this is a pointer this is a pointer this is a pointer so this pointer will point to this child node let's say this is key one this pointer will point to this pointer child node 2. and let's say this is key 2 and this pointer will point to this one so in one node if in three order B3 you can maximum have M minus 1 so 3 minus 1 2 keys here and three pointers and these three pointers can point to a child so now you can say that hey what this three pointers generally signifies so let's see that we will come to that part so now let's say that I'll totally Show an example however it will become total clear let's store the below data in a three order B tree so three order B3 means it will have how many keys M minus 1 3 minus one two keys so I have here key one key one and key two and it will have three pointers pointer one pointer two pointer three so now here if you see that this pointer one left one so left one is like always point to the child node Which is less than this key less than this key and the right one is like greater than equals to this one so this pointer 1 will be whatever the child it will point to all its data would be less than this key and whatever this pointer to which is on the right side of this key whatever the child it will point to ultimately all its data could be greater than equals to this key right so similarly for this key to this pointer 2 is its at left side so whatever the child it will point to it will have a data less than keto only and this pointer will have a whatever the child it will point to it will have data greater than equals to right so that's why here we have two keys three pointers so that's how one node can have three Childs that's right three order maximum three child you can have right now let's see an example let's understand with an example it would be super clear so don't worry about till now what we have uh understand about this DB right pages and all so we will come back to again and everything will start making the dots clear but keep that till in your back of your mind that whatever we have is known till now okay so now let's say we want to store this data into a b tree okay so I am using a three order B tree okay so how many so understand this this line is a pointer let's say this is one pointer two pointer and third three pointer and two notes M minus one notes so two nodes so this is the notes slots I have so first I want to store nine so I have a key so this is the key one so I have it in space so I have stored a 9 here okay now I want to store the data 33 so it always maintain the sorted data I have a space in this node itself right so I have stored in the sorted order 33 so both my keys are now full so it also has the data this also has the data currently these pointers are pointing to nothing this is pointing to nothing now I want to store 75 so where I would store now because in my node because I'm creating a three order tree so in the three order I can only hold two notes two keys now where I would store 75 so now what it will do is that in the sequencing order it is like 9 33.75 so what it will do is so what it will do is it take the middle one and take it to the parent right so now what I have done is I have created a new node from the middle one from the middle one of this three and make it to the parent of this so now 33 now 33 left pointer points to this 33 right pointer will point to this so left pointer means less than 33 right pointer could have like greater than equals to 33 data it will hold okay so till now it's all clear and this is how the data will look now now we want to store 41 so 41 where we will store so it will first check at the 33 41 is greater than 33 yes it goes to the right side now here if you'll see that a 75 so 41 so it will sort this so it will do 41.75 and it will see that yes there is a sufficient space right so it will store it so this is how it will get stored after 41. right because 41 is greater than 33 it goes to our right side from the pointer and then it is stored in a sorted Manner and it has two nodes we can keep so it's okay now we want to insert 98 so 98 is greater than 33 it goes to the right hand side now it want to store over here so now again 41 75 98 in a sorted form but again we can't store three keys because it's a three level three order B tree we can only store two so what it will do is we will take the mid one and take it to the parent so its parent is 33 there is an space so it will store there and 75 left will point to 41 and Its Right will point to 98 so this one is like less than data 75 this one is greater than equals to 75 it will 0.2 now we want to insert 240. now I think you got the point right what we are doing so 214 it will check uh where it will go it will say that okay it is greater than 75 so it will goes here greater than equals to 75 it will go here there is already a slot empty so it will put it here in the sorted manner 214. now let's say we want to insert 126 126 is greater than 75 it goes to its right right and uh 98 217 so this is the sorted how it would look like but it breaks the B tree because we can't store three nodes only two node two keys we can store so this has to be mid one after sorting it has to take out so now it has to take to the parent so 33.75 126 126 left will point to this 98 it's right will point to 214. but here if you see that this parent itself will get break now because we can't restore 3 33 75 126 we can't hold it so this further has to be taken out the mid one 75 now it should has goes to its parent so there is no parent we created a new node 75 75 left will point to 33 75 right pointer will point to 126. 33 left is already pointing to 9 it's 33 right already pointing to 41. so it is like this so you got it right how B Tree stores the data in a sorted Manner and plus it can we can construct M array or M order of B trees right so let's say now I am storing 55 55 is uh 55 is less than here it is greater than this it will go here there is a free slot 41 free slot was there so it will store the 55 here similarly for 72 I think you can do now yourself that it will come here but it will break it will divide 55 it will take it to the parent and how it will look like so this is how B3 works right so now you know that balance tree this works I might have explained faster because uh you can learn this data structures uh more in depth or you can implement it but without understanding this indexing understanding the indexing would be incomplete so b b plus 3 is exactly the same as B tree with additional feature that child nodes are also linked to each other so in the B plus 3 it has additional feature that this all child nodes are linked to each other so that's okay if you understand the B3 B plus T is also exactly the same so now let's come back again to our dbms dbms is using the B plus 3 to manage its data page because I told you that for one table if it is a let's say one lakh rows it can have so many data page data page one data page two Data Page 100 let's say and inside each database there will be hundreds of rows so there might be some kind of ordering right how to order those right so that's where we that's where B plus 3 comes into the picture dbms uses B plus C to manage its data page and rows within that pages how okay so just have this like it is using the B tree exactly the same only with slide differences the root node or intermediated node holds the value which is used for faster searching the data possible that value might be deleted from the DB but it can be used for sorting the tree right Leaf node actually holds the indexed column value don't worry I'll tell you what I mean to say that how what dbms's do we will see that so uh what I mean to say is that here is so here when we are maintaining the B plus 3 the ultimately the leaf node this Leaf nodes actually holds the index values so let's say this is the column in this column I am this is I want to put an index so this is an index column so what is the index value 19 25 30 17615 so these Leaf node are actually the one which hold this index values the root node and the intermediary node this root node and intermediated node might restore some data let's say 32. which might even not even there earlier it was there but it might have deleted it's not even there now so it is possible that it might be present here into the root node or intermediary node because they just help us to do a faster searching go left go right and all but this Leaf node actually holds this current values of the column which is indexed right that's what I meant to say okay so now let's see how dbms is actually using B plus 3 to maintain the data pages and indexing now let's say this is the table very basic table right how many rows one two three four five six seven seven rows are let's say ro1 Row 2 to row seven right and we have two column column one column two very basic table now I have put an index on employee ID I have put an index on this employee ID so we know that whenever we are putting an index now dbms will apply B plus 3 on it apply what B plus 3 on it to sort to manage this data how right so exactly the same way which we it has done exactly the same way it will do this so now this is the column where I have put the index now it will read this 19 25 30. so now it will start creating a b tree out of it right so what before even uh so don't look at this pointer now right I will explain you but now see this first is 910 so you know that I am still using the three order so in three order means how many one node how many keys you can store M minus one which is 2 right so and there how many pointers would be there three pointers would be there and two keys I can store so 19 so in the key one I have stored 910 ok don't forget about this pointer now this pointing to database and all forget about this let's first focus on this construction of a b tree the second is 25 so now the second row is coming 25 so the key to was empty so I have put 25 into this part third one is 30. so now if I put 30 here it will break because it's a three order Tree in one node you can have only two keys you can't put 30 so you have after sorting whatever the mid one is you have to take it through it parent so it will look like this so there were no parents so I created a new node I put 25 there 25 now left pointer will point to 19 and Its Right pointer will point to this node which holds 25 30. so now this is the first different you should notice I told you right in the dbms when it used B plus 3 Leaf node actually holds the index column value these are the index column values right so Leaf node stores it the left pointer hold less than 25 right pointer store greater than equals to 25 so here if you see I have put the 25 here also I have put the 25 here also because in the leaf node itself we hold this value in future it is possible that 25 row is deleted so I might delete this but this could still be there this is an intermediate node this is just for faster searching anything comes he goes to write this right you will find values greater than equals to 25. but this doesn't signify that this is the index column value in the DB which is stored no okay got it so so this 25 right we have to put greater than equals to 25 so we are restoring the 25 right so 25 would be stored here and 30 so in a shortened manner then what we have to store 17 so in this 17 where it will go it is lesser than 25 it goes to the left side of it 17 is lesser so it will go to this there is already one space present so it will store in a sorted manner so 17 first 19 later after that 6 where the 6 will go so 6 is lesser than 25 it will goes there now there is no child so generally six would be in the first of 6 17 19 certain manner but again three keys we can't store over here we have to now divide take a mid one take it to the parent who is the parent 25 so it will in a sorted manner 17 25 17 left will point to 16 17 right because this is a still have the index value right so it will point to 17 and 19 greater than equals to 17 17 and 19. and 25 right will point to it's 25 30. so you see that three child it has currently each node then we want to store one so one is uh less than 17 it goes to the left so there is one space so it is stored in a sorted manner one six hahaha similarly the five so the five lesser than this it goes here so generally the 5 should be here and between a 1 and 6 but it will break so uh three keys we can't have it so it should goes to the parent right if 5 goes to the parent the five left would become one five right greater than equals to five and it's a key also so 5 should also be there five and six but at the parent level you can't have three notes right so again the mid one 5 17 25 again it has to go to the top so it's hold five it's right hold 25 now here if you see that I am not putting 17 25 right because this is just an intermediate node intermediately node has told you that it is not index column index values column index values are only the leaf right so I am only changing the intermediary node it's going top so that's why I don't have to put 17 here because it's not index column index value so 17 right will point to 25 and this thing so its graph will be so this B tree is constructed so now here if you see that at the bottom this bottom part this bottom part this is now sorted so here we have a mixed data but bottom what we have now of this battery is 1 5 6 17 19 25 30. see sorted right now we understand that okay how whenever we make any column index whenever we make any column index dbms do B plus 3 on it but shransh how the data pages are created now how it gonna link because you told that one table can have 100 of data page data page one Data Page 100 how those gonna manage right so because these are actually has the row so what till now I have understood is okay dbms has so many data pages data page one data page 2 like this each database is stored in a data block and one data block can have many pages like page 1 page two data block can have certain let's say page 5 page six something dbms maintains the mapping of page page one block one page two let's say again block one like this at mapping it does I know that whenever any column is indexed right it maintains a b tree You Now understand how B tree is constructed so that the data of this column is now in the sorted manner in a sorted manner but now we need to connect the dots so now B3 exists I understand this I know data pages I know data block but what is the dots connecting to all of this now let's understand this okay now let's understand this our table has empty and we are doing inserting the row from start right so now let's say we have make this column indexed and now only first row is inserted 19 E1 this is the row I told you now that first what dbmss will do it will create a B plus tree right for this data also so it will put 9 10. now remember in dbms with this key it will also hold one more pointer value or you can say that the pointer with this key is which ultimately points that this belongs to this data page this 19 stored in which database so now let's say I told you that this is the first time we are creating a table this is the first row so data page is currently empty so I created a data page one and stored one row 19 E1 and stored put a pointer here that okay this 19 is points to this data page 1. and let's say this data page Can Only Hold three rows Can Only Hold three rows okay so when I put 19 so dbms ultimately actually this is just our representation dbm is what ultimately has done is it created a data page one stores in this 19 E1 row it created a B Tree on this index column it's also maintaining the index put a 19 with this 19 it also maintains one more pointer that this 910 row belongs to this data page and also this data page ultimately has to be stored in some memory or data block let's say it is stored in one so it is also now storing this data page one data block stored in one so this is the data page one however in the disk okay so now second row is inserted so when second row is inserted it is inserted we are in the data page first so Data Page second row is inserted because Data Page can hold three rows it is only the second row it inserted properly now it will update its indexing because neuro is inserted right it has to update an indexing so it will check that okay one node is free one key is free so it will put 25 here and also manage one more pointer with it and this pointer that 25 is stored in where data page one only it put this pointer in which Data Page this 25 holds This Record holds right till now database is in data block one only now 30 so when 30 is stored so first it will be stored in the data page 1 only right so because it has the space all right so you can or you can say that first what it will do is it will try to see that which data page it has to pick right so one more question it has to comes to your mind that hey change if there are so many data Pages already exist and when you inserting a new row let's say we are inserting 30 this is database 1 data page two data page 100 which database to pick and insert which data page right this is a very interesting question that whenever you are insert a row which Data Page you will insert it how dbms knows it so this is how it knows it through B3 only so when I inserting a 30 so first it will goes that tries to insert a 30 into the tree B tree so it will see that okay it will try to construct a tree you already know that how we did it so ultimately it's gonna be look like 19 and here 25 30. so what it will see is that hey my nearest index is stored in which data page 25 was stored in data page one so what it will do is dbms will try to load check the data page one right so from its data page one memory it will check out its mapping database one it is there in the data block one it will load it right and what it will put is this row and put it back into the data block right and one more pointer it will store with 30 is that it is stored in data page 1. got it we will see one more example now let's say 17 I want to store now which data page so let's say if you have so many databases with data page it has to pick how it will determine first it will go into this B tree it will try to find out the correct logical sequence of this 17 so 17 is less than 25 it goes here there is a space available so it will be stored here now it will check its neighbor index that which data page it is stored and it pick one of the very uh you can say that one of the data page so 19 is stored where so 19 is stored in data page one right so it will see that okay let me pick also data page one so what it will do is dbms will pick the data page 1. so data page one it knows that it is stored in data block one it will hit the goes to the memory loaded from the data block one it loads the data page one now in the data page 1 we already know that it is full three data it already has it cannot insert 17. now what to do now what it will do is dbms will do page splitting page is splitting so now in the page splitting it will split I create one more page so it will create new page page 2 right and it will split the data so earlier page one has 19 25 30 now 17 it has to store run now it will just split the data so now let's say in page one it puts 17 19 rows data and in the page 2 it put rest of the data whatever the page one so page one has this three data right 19 25 30. so whenever it pick a particular page so it picked page one page one has no space so it now the page is splitting happen so in the page splitting a new page is constructed and it divides the data of the page one plus this new data right so 17 19 I have put into page one and 25 30 I have put here in the page 2. right it can be anything depending upon the dbms and it will update the pointer so now 17 will point to that it is in page one 19 it will update its pointer that it is also in page one now it will update the pointer 25 is present into page 2 30. it updated that it now it presented to page tools earlier 30 was present into page one so it has to update the pointer so it will update the pointer that this index value 30 it's all corresponding data is present into page 2. and also it will maintain a mapping that hey this page 2 is stored in let's say data block one itself okay now what data we have to store after 17 let's say six so it will try to insert a six six will go here okay so first it will try to identify its correct logical location okay so it will find out it's that okay 17 has to be go up it it find out its correct location now it will check its nearest neighbors nearest index in which database it has so nearest its all are linked in B plus 3 I told you right so it will check that okay this uh 710 this 17 is stored in page one so it what it will try it will try to insert the six and two page one so dbms will load the page one so from the physical memory it goes to a data block one loads the page one page one has now this two records it will insert the page one already has 17 E4 19 E1 it will store the six also because three we can store so it will store six and six data is E5 E5 right and it will update the pointer of this to that it is stored in page one similarly for one also so ultimately once goes here now here can you tell me one when it goes to the one one will see it's to its neighbor hey which database you are stored in six will say that hey I am a student data page one so dbms will try to insert the one and two also into the data page one so it will load the data page one it can easily load it that it has the data page one is stored in Block one so now it will store page one page one already has three rows it is totally full now 17 19 6 17 9 10 6 and it's data E4 E1 E5 E4 E1 E5 some data is already there now it tried to insert one but you can't insert one right it's already full so what will happen now page is splitting so it will create a new page so page 2 was already there so it will create let's say page three it will now split the data so let's say it's 17 19 it will keep here and it will put let's say 1 and 6 here 1 and 6 here right and it will update the pointer so one is now pointing to page three six is also now pointing to page three it has to update the pointer earlier 6 was pointing to page one now it has after splitting six has two points to page three right and also it will add a new mapping also that page one is there in Block one page two was there in Block two uh block one now it will have that let's say page three goes into let's say block two somewhere while storing into the memory it knows that in which block it goes so it does the mapping now does everything making sense to you that how everything is linked right one thing I still haven't tell you about this offset array how this offset array is used I will tell you when I tell you about clustered non-clustered but till now tell me that in the comment section is it clear to you I am going very slow you know right how the table data is actually stored in a page there can be this page ultimately get stored into Data blocks so dbms has to maintain the mapping page versus block right then for indexing B plus 3 I think it should be clear to you we have seen with an example so whenever we insert a row it first put its logical ordering into the B tree determine the best page it has to be stored because there can be so many pages determine the best page where this data has to be stored right so we have seen an example that where the logical ordering is it check its neighbor I tried to load that page and see if I can if that can be accommodated in that page if not page splitting happen whenever page is spitting happen a new page is created and it just split the data right and it has to update the pointers also it might be possible that earlier this neighbor node 25 let's say is also storing into page one but after splitting 25 goes into page 2. so this pointer has to be updated and whenever any new page is stored when what whatever the data block is ultimately inserted dbms has to maintain the mapping clear till now cool now let's come back to the indexing part hey what is the two type of indexing clustered and non-clustered now I know that what is how indexing how dbms actually does the indexing but what are these two type of indexing so see first let's understand the cluster indexing cluster indexing says that order of rows inside the data page match with the order of the index order of rows inside the data page match with the order of the index let's see with an example what I am trying to say so what but chance so let's say this is the table 1452 a c b d and this is the address okay so I told you at first that in the data page whatever the sequence in which the rows are inserted into a table generally in that order only the rows are inserted into a page only right so if you are creating a employee id1 so here in the data page the this row will come first after 4 then 5 then 2. now let's say I have put an index over here now you know that when you put an index dbms will do B3 on this index and it will have this sorted one two four five now you know that right how one two four five this at the base it would be sorted only and one will also hold a pointer to this so when we say that right one will hold a pointer to this one row how ultimately actually this one will hold 0.2 pointer to this page in which this row is present right now here the sequence indexing sequence says that one two four five and in the page order is one four five two the way it is inserted rows are inserted one four five two but here you are saying that the order of rows inside the data page match with the order of the index that's what clustered induction says how we maintain the order because you are saying that the order of the rows which are inserted in a data Pages depend upon how we insert it actually right so we inserted 1452 so it is stored like this one four five two but indexing on this column indexing is in sorted order so it should be one two four five so the order of the rows should be one two four five how that's where the offset comes into the picture I told you that in offset it maintain an array right and it has an array like zero one two three so whatever the index whatever the indexing or you ordering is so indexing ordering is one two four five sequential so at index 0 it would be it would point to employee id1 data so it will point to employee ID one data and then after that two so index 1 it will point to employed two data two data so third would be 4 so and the next in the sequence of the array would be four so now you see that even though the data Pages the rows inside the data pages are jumbled or depend upon how the user might have inserted mix mode after induct indexing whatever the sequence comes out the order is generally given by the offset so whenever the you will Traverse this Pages you will Traverse in this sequence of this offset so employee so at zeroth slot it is employee one then two then four then five one two four five like sequencing understood that the useness of this offset it holds the pointer and that's your cluster indexing is that the order of rows inside the data Pages match with the order of the indexing so it doesn't means that this rows or sequencing is changed this offset is change this offset array holds the data in this manner so that it maintain the indexing sequencing okay so there can be only one cluster index per table why because you can create an order of a row based on one index only right so now let's say this is employer this is employee ID let's say name can a cluster index can be made on two employee ID and name AC DB let's say indexes let's say for this a b c d right but in the cluster index the order has to be maintained so order can be maintained either through this or through this so that's why there in a DB in a one table there can be only one cluster index so what column can be a cluster index so the it give of prime uh first prior to the primary key so in this table let's say if I have created an employee ID as a primary key dbms used this as a clustered key and whenever it use it as a clustered key it will sort the data Pages based on that one whatever the index comes So based on this clustered key it creates a b tree B plus 3 and whatever the B plus tree ultimately the index sequence comes the data page is also it will insert in the same manner like not this offset the offset it will maintain in the same manner right so first priority it will go to the primary key if manually you have not specified any clustered key then dbms look for primary key which is unique and not null and it is used as clustered key but let's say your table doesn't have primary key then what what will happen how dbms will create a B plus 3 out of it when you don't have the primary internally which is not visible to you dbms create a new column and it is generally sequential incremental one two three four five six seven right with every row you insert so it created as a clustered key and store the data Pages based on this so any any new row you will insert it will add uh so this is auto increment column it insert which is not visible to you it is used as a cluster key so it use that value in a B plus tree to determine which data page to pick where to put what would be the offset sequence for that page and all now you can ask that hey what if let's say I have a big table I have let's say uh 1 million rows but I haven't defined the primary key there is no primary key so what happened would be dbms internally creates a new auto increment column and does a b tree out of it and maintains everything it maintains page one page two thousand data block one data block to anything whatever the mapping it has been doing it is doing even the rows ordering also it is doing but now you can ask that hey what if after now I added a primary key and make one column as a primary key so now as soon as you make a primary key I told you this will get priority so now dbms has to do lot of work it has to do regenerate this B plus 3 it has to update all the pointers now based on this new index column it has to update the B plus rear pointers it might have to load so many data Pages out of the block reshuffle the rows right so lot of work dbns has to do but everything has to be changed now based on this column indexing okay so that's what I did if a table there is no primary key then internally it creates a hidden column which is used as a cluster index this column just increase sequentially and guaranteed unique and not null so cluster index you already got it right uh it actually based upon this column which is you can say that the primary key column the primary key column which is always General consider the clustered key it orders your rows and Pages based on that then you can say that hey what is non-clustered key okay cluster key take care of primary key but you remember we have so many other Keys also secondary key composite key what about those also or let's say that I have a scenario where this name can be used a lot to query so it is better to put the indexing on it also so non-clustered is like I can put indexing or you can say that non cluster indexing I can put on this name secondary index so on the name also I can put an index let's say I am putting an index on name also but now this name would not be considered as a primary key or it's a secondary key let's say so whenever you are creating a secondary key composite key right ultimately it is doing non-cluster indexing so what does it means that so now let's say I am doing name So based on that B plus 3 on the name let's say it is able to maintain certain sequence that this is a this is B let's say this is D this is d okay this a now what so now two things so it will again this a data is stored in where let's say in page one so it will point to the page one this B is stored in somewhere so but here if you see that the ordering of this offsets are not based on this first correct The Ordering of this I would say that the ordering of the rows doesn't depend upon the secondary index second there can be many secondary index or composite index so I can have address also as an index so it will create one more B tree and stores the data and just maintain the pointer to the database right so we have only one clustered Index this cluster index actually drives this ordering of this row and we can have many secondary index on the non-clustered index as your name say the cluster you are not clustering the rows and pages right so we can have many non-cluster index or secondary index right and that's one of the reason when somebody asks like you should always create index you can say that we should not like I have 10 columns let's say column one column two column three column four you might be asking that why can't so this is primary key why can't I put this this I will also make secondary Index this is also I will make secondary Index this is also I will make secondary index over like whatever the query will come it would be faster now you understand at least grabs that there is an overhead so for each secondary index you have to maintain a B plus 3 for it now if there are 1 million rows in this table let's say 1 million rows means in the B plus 3 you are creating so many keys here right you can just consider the amount of memory it also required right let us say if you have three secondary index so means you are creating three B plus 3 like a non-clustered index one for this secondary index for this column two you might be creating another B plus 3 for non-clustered index uh an indexing you might be doing for column three for again 1 million record like this no assume that any row particular row is deleted you have to update the pointers also of this you have to delete that row or because this is the leaf node has the index value you have to delete that node also and when you delete it you might have it might be possible that you have to update the pointer also so that's why it is always say that create an indexing carefully it is not easy because there is a lot of overhead comes in what are the overhead first thing is this B plus 3 also needs memory so you have to worry about the memory also because this is also ultimately going to be stored in the disk only right and it dbms also whatever the data it is stored it also stored into a Pages which is known as index pages and this index pages are stored in disk only so there is a overhead that it maintain the index pages so because I told you that one million nodes it created definitely one million node cannot comes into one page it might have to create so many index pages also index pages also comes into so many uh let's say block five block six block seven so now you see the overhead which dbms has it has to maintain the index pages also index page is mapping also it has to maintain the data page where the data is actually stored and its mapping also blocks so just creating a secondary index Composite Index and indexing will just create a additional overhead and any addition any because anytime you add a new element you have to update your clustered index the primary key one then all the secondary you also have to update and in the update I already told you page split can happen when the page split happen it might divide right so index has to be updated so I think it's pretty much clear right so now you know that what all things has to be done so whenever whenever you have to it looks simple right so whenever you have to insert one row new row what all things it does right it first find an index page right what would be the first step it first find the index page right so in from the index mapping index page one index page two let's say block one block two so it will first load the indexes pages into the memory right because indexes Pages has your battery data combination indexing right so second what it will do is uh or you can say that so let's say or I would say that instead of inserting let's say you want to search because we are talking about search query so let's say you want to search a data with 35 let's say I'm saying ID so first it has to load the index page index page is loaded from this data blocks now from this index Pages it will run the B plus tree it will Traverse this B plus 3 to find which Data Page this 35 has so maybe you have a data page this indexes has the data so now let's say this 35 ultimately points to a particular data page data page 5 let's say so now from for second step is using the B plus v and find the data page third step use that data page find the data block is maintain our mapping right data page which data block that's it too so now it can directly goes to that data block load that data block in memory right fifth read the data right so now even if there are hundred thousands of data block it doesn't have to worry it can directly goes to that particular data block okay so this is check it out one more time it would be little bit clearer but this is so many things happens right in the background and we can discuss more if you have doubt in the comment section and even if you want it I can have one live session where we can discuss more okay guys thank you bye
Info
Channel: Concept && Coding - by Shrayansh
Views: 43,421
Rating: undefined out of 5
Keywords: Database Indexing, DBMS INDEXING, dbms indexing, dbms, db indexing, db index, How DBMS Indexing done, data page and data blocks, index pages, clustered index, non clustered index, clustered indexing, B tree usage for indexing, B+ Tree, how B tree is used for indexing, DBMS indexing in hindi, B tree data structure, RDBMS indexing, mysql indexing, MYSQL Index, concept and coding
Id: 6ZquiVH8AGU
Channel Id: undefined
Length: 83min 52sec (5032 seconds)
Published: Wed Sep 27 2023
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.