How Do Databases Store Tables on Disk? Explained both SSD & HDD

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
what is going on guys my name is Hussein and in this video I want to talk about how databases store information in the disk and that was a question asked by one of you guys and it was a very good question because usually we take things for granted but when you really understand is you feel pity for the computer that does the job for you because oh my god right you you appreciate it right when you start washing the dishes you know how hard that job is right stinking dishes and especially if you like cooked an egg and you forgot the plate for a day and then you come the next thing and those the yellow yolk thingy I don't know why they call them English right the yellow thing they get stuck in the plate and you need to work really hard and then you understand how dish washing is really hard same thing with a computer computer is not dish washing per se but kind of similar rambling I know guys the idea here is to understand how databases really store information on disk alright so it is extremely complicated thing and let's talk about it a little bit this I'm no they're not gonna be any graphics or anything just imagine what it'll mean guys see follow my hands because that's what I'm gonna use here I'm gonna make another video detailing where with virtual board and all that jazz but here's the thing guys imagine you have a beautiful table has to be beautiful if it's not a beautiful then you have a problem but if there's a table and let's say for simplicity there's only one field and that field is an integer 32-bit let's say right sorry that's what 8-bit eight bytes bits which is 4 bytes not not eight ok so you have an integer right field so and that's four bytes so if I insert one row any number between one and two billion I guess that's the maximum 32-bit integer you can have right it's gonna take four bytes of space and we can talk about what space really meet so if I insert a row here four bytes insert another row four bytes another row four bytes another four bytes how does it this really stored in disk right and I'm gonna explain this in to fashion how is stored in SSD and how to stored in HDD hard disk drive which is this rotating a thingy all right how do you like this sound effect guys huh so yeah so we have a spinning disk and this is D and there is like a little bit different how they store the information so let's start with the hard disk drive which is the spinning disk the hard disk drive then it is obviously a platter which is a circle right and there is sectors and each sector present information and each sector has a little bit of Block in it and these are tagged and identified uniquely for every single sector that is in the hard drive so you would say okay platter number one track number seven sector number five and block number two and that gives you exactly a block which is a block of disk that corresponds to number of bytes let's say 512 bytes okay so go back to our logical representation of the table if I insert 4 bytes this integer like number 7 which is 4 bytes right that's the beautiful thing here if you're in September 7 you're gonna reserve 4 bytes if you're insert number 2 million you're gonna reserve for byte doesn't matter right because it's all 0 0 1 0 0 0 0 0 it's all four parts and therefore bytes will be stored in that given block ok and cause that block is 512 and you stood for four bytes that kind of waste right you cannot jump to another block and store another four bytes right you have to insert another row and there's what the data is smart the database says wait a second this table is reserving this block so for and it's gonna put the next four by the next integer next to that four bytes so it's gonna be adjacent to it so they can fit in the same block and you keep inserting something until that 512 is completely filled right and you might say wait a second how do I know if I want to pull row number one to go to that block that's how the database stores stuff the database will say okay row number there is a unique identifier for the role and it could be that primary key if that field by happen to be primary key so that particular number points there is metadata in the day way that says hey this I want for example row number one row number one have some information next to it sector seven block eight track three and that information is what the database need to actually go to the disk that's circle oddly mechanical drive and start spinning and reads what does it read it doesn't read four bytes it doesn't read eight bytes and race the whole goddamn block which is 512 bytes of memory and you pour it in memory and then when we got that that is slow because that's an i/o but the moment I have all this block 512 I got row number one and I got no number two as well by for free I got normal three and four and five and six and seven until the last row that is in the block until row number 128 to be specific right what if I want because 128 is the last block that he can't fit 512 bytes right so now what if I won't want roll 129 for example which is happen to be the value one of one and a half million right if you want that oh it's not on the block still on the block it's not on the block that's another IO and they that Erebus knows this information look Oh another IO go to that location that 128 129 will say Oh block number 7 now walking number 8 it's the next to it and then you pull that information okay and you might say are these two blocks near to each other could be could be not based on how fragmented your data is but usually they are next to each other so that's how mechanical Drive works in a nutshell so how this is how they are stored on this comes to beautiful this is these is this these don't have rotating stuff they don't have sectors there is no tracks there is no all the garbage there is however the idea of blocks still and block have a size all right 512 1024 whatever right and these blocks of memory okay have pages and each page has a size but it's the same thing right instead the database is storing to go back to the our representation logical representation is still the database storing that track and sector where the actual stuff is for this is D it just stores the block number and the page number so all row number one is block number 17 and page number eight right and it's just all that information I want to pull that page let's say 512 bytes in that page we have that beautiful page we got row number one and two and three and four and five and six right up until role 128 same thing and we didn't talk about rights per se but I'm gonna mission in and after a while but yeah so that's the same thing you go back and you read it exactly the same however it's hundred two hundred four hundred times faster because it's a random access right instead of spinning wheel it's it's literally you tell that IO controller of SSD go to that particular block but that part of a page and pull it and when you pull a block you usually pull all the pages with it that's so that's what happens in an SSD when you pull a block you pull all the pages with it as well let's go through all right I want to write to page we never write two pages right as users we write two rows it's like oh I'm going to update this row number nine to be four instead of 1.5 million to be 2 million so what what do we do here what does that mean that translate first of all that's raw number nine so pull that row what does it mean where is row number nine or number nine in the database there is metadata so sort row number is block number 302 and then pull that that's the first thing and then I'm gonna update it it's 1.5 million update therefore but to say two million now updated in memory now I'm gonna write it back and here's how is this D works all right I'm gonna talk about how is this D works and I'm gonna talk about how the sector works the the mechanical drive so for the SSD when you want to write that block first of all you write only the pages in the block right so that change that 1.5 million existed in a page number one right but if you write it you change it to a two million you have to always create a new page that's how has this D works so you write a new page and then you flush back that block as a different page to do scribe and this information reading and writing to the same location that block in SSD is extremely Hertz the SSD feeling right SSD have shelf life and this block is block writing and reading the rewriting to be specific has some shelf life and this property in SSD is called endurance literally how many times you can write to the same block before that block is dead and that block is dead we cannot reuse it that means we just lost space and ours is deep ok take that with the hard disk drive which is the physical thing you can literally go back and write to the same location right and you can write to it a long time because it's just magnetic right idea so you can strike write to it without any problem because when you write to it you're gonna write an entire block back you cannot just write one bytes right that's that's not how I or controllers work so so when you do that when you change this place you take that whole block and just flush it back to disk and SSD there's limit to how much how much time you can write to it but for the hard disk drive there is no limit to how many times you can write you can write the same block it's okay sometimes sectors get corrupt but that's rarely right it happens compared to SSD right so this is this faster reading and writing but have low shelf life depends on their endurance so this is deep however the hard disk drive they are a little bit slower there so because they are literally abiding by the laws of physics right it's not electromagnetic stop magnetics and light and electricity right it's just using literally physical stuff to seek back to allocation and then well literally write all these bytes 1 0 0 0 1 0 writes all this information to disk right there was a little bit of if introduction about how databases store information to desk guys hardest drive are different than SSDs you really sometimes you need to understand how these internal works there are papers and people taking PhDs just discussing how can we squeeze as much performance from SSDs because those are the future obviously those no those guys are not dead the mechanical drive we're still using but these are the performance stuff but we have noticed with use the data structures that we have built b-trees to be specific they kill those these if they are not configured properly because what do be trees do be trees when you keep inserting stuff yeah and such thing I'll SSD love new inserts if you're inserting new stuff that's the best thing you can do because insert the new stuff you're always taking new blocks and you're writing to new blocks you're not overriding an existing block so you're not reading something changing and manipulating and writing to the same block this is these don't like that because you can only change it so much when I say to them so much I'm talking about 500 700 and that's how it is is the cost go up as well right the endurance of that is deep right so be trees when we designed this thing we didn't fall we didn't we designed a way back in the 60s and 70s right I wasn't born back then while those data structures didn't really think about inserts versus update it's like okay we'll just insert stuff right and here's where the problem of B 3 B trees and again if it's configured incorrectly if you keep inserting in your tables you have a B 3 index that B 3 index has to be large first not just one integer primary key could be composite of strings and all that stuff then as you insert the P tree that tree will try to grow right when it tries to win it about to get imbalanced the tree will shuffle itself and rebalance itself and that's there where it kills this is the when you rebalance what does it mean yes you are inserting into your logical table you're just appending the data but the index is restructuring itself right was this what is this all right so yeah so the the the index is reuptake I'll oh this is no longer the root node this is now the root node oh the child or the child have too many nodes update itself so now it's this is the location it's point two so it really burns and seven rebalancing is nothing but updates we need update in place in SSDs oh my god that is the worst thing you can do right and you can do this multiple times and your sisters gone and when I say a gun yeah that block is gone and when that block is gone you disk essentially gets smaller because that can no longer use that block it needs to use a new block in memory in desk and that can slow things down guys so that's the idea of SSDs hard disk drive and how databases store information there people are still researching this stuff so they came up with another data structure Facebook primarily I believe they didn't come up with this data structure they said Petrie's are may well this is DS right so they came up with their own database engine they called it rocks DB okay and they said we have a lot of use cases where we're writing staff all the time and those rebalancing B trees are killing our is this route shelf-life and and or throwing is these like we don't like there's no tomorrow so what do we do so they used an existing data structure called log structured merge tree or L is M right LS m trees are append only they are great for appends right so they it's it's a bit it works and as levels right I'm not gonna go into details but that's essentially the nancial so if you levels it works at the memory love on the level 0 level 1 level 3 and level 4 and these levels or keep you always inserts so the index even if you build an L symmetry index space then it always just inserts at the end and that's really nice for us this is the love that obviously you have to have a large assisted you if you were inserting a lot of refraction but alright guys that was a little bit a long video talking about how databases store information aim to the desk right I like to talk without graphics and stuff like that because I might use this as a podcast and check my podcast guys if you're interesting if you want to listen to this while you learn in a train or get back on your working out you can listen to this information without actually looking at and expecting something graphical in the videos I do make other videos where I use a virtual board but sometimes I make this more casual chatty kind of videos as well alright guys like this video if you liked it dislike it if you don't like it I'm gonna see you on the next one you guys stay awesome
Info
Channel: Hussein Nasser
Views: 26,375
Rating: undefined out of 5
Keywords: databases, how database store in disk, database tables on disk, database engineering, software engineering, backend engineering, btree vs lsm, lsm, b+tree, log structured merge treee, binary treee, database engines, relational database, nosql vs sql
Id: DbxddGtHl70
Channel Id: undefined
Length: 18min 56sec (1136 seconds)
Published: Sun May 31 2020
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.