A Deep Dive into How Slow SELECT * Is

Captions
SELECT * FROM table is universal SQL syntax that returns all the columns in the table you're selecting from. It's often said that SELECT * is slow and should be avoided, but I want to dive deep into the reasons why. Some of these reasons you might already know: returning 50 columns is of course slower than returning one. But some of them you might not have thought about, because they involve understanding the entire stack and backend engineering fundamentals: the network specifically, but also programming logic, memory management, and how the OS works. All of this affects the performance of your queries, and selecting all the fields, while convenient because you don't have to enumerate the fields you want, can really impact performance. I wrote an article on Medium that I'm going to review in this video, adding my comments as we go through the reasons. How about we jump into it?

How slow is SELECT *? Let's read a bit of the first paragraph and then start discussing. "In a row-store database engine, rows are stored in units called pages." Boy, don't get me started; we're already at the first sentence and there's so much here. "Page" may be the most overloaded term in software engineering. You have no idea how often that word shows up across the stack, from the database to the file system to the operating system to the SSD (or drive) itself, and all of those pages are different sizes. Here we're talking about database pages, which are fixed size; every database I know of uses fixed-size pages. Think of a page as a structure with headers, where the content is mostly rows. In a column store it would be columns instead (the column values for the first row, then for the second row, and so on, depending on the implementation); in a document database it would be documents; in a graph database, graphs. The details change a bit between an index page and a heap page, but that's the gist of it: everything lives in a page.

Here's an example of what a Postgres page looks like. There are some headers, followed by what are called tuple pointers (item pointers), which record where each tuple starts and how long it is: this tuple starts at byte 134 and has a length of 20 bytes; the second tuple starts at byte 155 and has a length of 10; and all of this lives in page zero.

So technically speaking, a row is stored in the page with all of its columns. Which raises the first question: when I read a page, I get all the rows in that page, and with them, all the columns. So SELECT * should be cheap, right? I already have all the columns "inline", quote unquote, in that page, because that's how a row store stores things. (We're dealing with a transactional workload in this case, so we'll assume a row store throughout.) But then why do people tell us that SELECT * is slow? That's what we need to understand here.
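To make that layout concrete, here's a minimal sketch of a fixed-size heap page with item pointers. It loosely follows the Postgres layout (8 KB pages, 1-based item indexes), but the type names and fields here are my own simplification, not the real on-disk format.

```typescript
// Illustrative model of a fixed-size heap page (loosely Postgres-style).
const PAGE_SIZE = 8192; // 8 KB, the default Postgres page size

interface ItemPointer {
  offset: number; // where the tuple's bytes start within the page
  length: number; // how many bytes the tuple occupies
}

interface HeapPage {
  pageNumber: number;   // position of this page within the table file
  items: ItemPointer[]; // one pointer per tuple stored in the page
  raw: Uint8Array;      // the page's raw bytes (header + pointers + tuples)
}

// A tuple is addressed by (page number, item index): (0, 1) is the first
// tuple in page zero. Resolving it is just pointer arithmetic:
function tupleBytes(page: HeapPage, itemIndex: number): Uint8Array {
  const ptr = page.items[itemIndex - 1]; // item indexes are 1-based
  return page.raw.subarray(ptr.offset, ptr.offset + ptr.length);
}

// The pointers from the example above: one 20-byte tuple at byte 134,
// one 10-byte tuple at byte 155, both living in page zero.
const page0: HeapPage = {
  pageNumber: 0,
  items: [
    { offset: 134, length: 20 },
    { offset: 155, length: 10 },
  ],
  raw: new Uint8Array(PAGE_SIZE),
};

console.log(tupleBytes(page0, 1).length); // 20
```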
First of all, let's explain how we read something. I want to read a row: SELECT * FROM table WHERE id = 1. Say this is a student ID. Assuming there is no index, the database says: okay, we're going to do a full table scan. Your table is literally one file, and it's organized as an array of the fixed-size pages we just talked about. A full table scan means scanning the pages one by one in that file. So: read page 0 from the file. How do I get page 0? Remember, when you read from disk you have a specific set of operations. You specify the file descriptor, the starting position you want to read from, and how many bytes you want to read. That's it. (The "how many" is capped at around 2 GB per read call, on Linux at least.)

So how does that translate to pages? Once you understand the fundamentals, this is all simple stuff. Page 0 starts at the zeroth position in the file, so the offset is 0, and the page size is fixed (8K in Postgres, 16K in MySQL's InnoDB), so the length is 8K. You might ask: how do I read page 7, then? Page 7 starts right after the seven pages that came before it, so its offset is 7 times the page size; you start at the next byte after those and read 8K. And so on. That's basically how you read pages.

And "reading" itself deserves a closer look; there's more to it than just saying "read". I don't like to gloss over these things anymore; I like to understand how they work, and there are layers and layers underneath that read operation. The file system is involved: the byte range is converted into file system blocks, those blocks are mapped to SSD blocks (or sectors on a spinning drive), and those blocks are physically retrieved. The offset-and-length API doesn't carry all the way down to the storage device; down there we don't read bytes, we read blocks, assuming this is block storage, which is what we mostly deal with. Everything is a block, so if you want to read a single byte, you read at least the minimum block size the file system allows, which is 4K in most cases. I'll maybe talk about this in another video, so I don't go off track here.

Once that page is read from the SSD and transferred up through the file system as blocks, the operating system has the raw bytes hot in memory. The database then takes those bytes and places them into something called the shared buffers, the database's own cache, where those pages live in the hope that someone will soon query something in the same page and it can be served straight from that shared memory. It's shared memory because most databases spin up multiple processes, all executing multiple transactions, and they all need a unified view of the pages. That's where the shared buffers come in. So that's what we do.
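Here's a minimal sketch of that offset arithmetic in Node.js. The heap file name is made up for illustration, but reading a fixed-size page really does boil down to one positioned read: position = page number × page size.

```typescript
import * as fs from "node:fs";

const PAGE_SIZE = 8192; // Postgres-style 8 KB pages (InnoDB would be 16 KB)

// Read page N of a heap file: position = pageNumber * PAGE_SIZE.
function readPage(fd: number, pageNumber: number): Buffer {
  const buf = Buffer.alloc(PAGE_SIZE);
  // readSync(fd, buffer, offsetInBuffer, bytesToRead, positionInFile)
  fs.readSync(fd, buf, 0, PAGE_SIZE, pageNumber * PAGE_SIZE);
  return buf;
}

// A full table scan walks the file page by page.
// "students.heap" is a hypothetical file name.
const fd = fs.openSync("students.heap", "r");
const page0 = readPage(fd, 0); // bytes 0..8191
const page7 = readPage(fd, 7); // bytes 57344..65535 (7 * 8192)
fs.closeSync(fd);
```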
Now, when I do that SELECT with id = 1: is it in page 0? We don't really know; we just scan. We read page 0 and go row by row: is this id 1? Is this id 1? If we find it, great; if not, we read the next page, page 1, then page 2, and so on, until we find that ID. And once we find it, we really do have everything, because the page has all the columns inline. I'm going to put an asterisk on that. So that's how things work. Now let's go through the reasons, because if everything is right there, why is SELECT * slow?

The first case: index-only scans. Goodbye. First of all, you're right: if you're actually reading the heap, which is the table data, then with SELECT * you do have everything. But here's what you miss out on. Let's go back to the students example. I have a students table: there's an ID field, a name, a grade, and a bunch of other columns. Now say you have a grade index, an index on the grade field, and I run a query: give me all the student IDs (I'm only interested in the IDs) of students who scored more than 90, a mark of 90 or higher. Let's assume this is MySQL, or SQL Server; I pick those two because they store secondary indexes a little differently than Postgres does. We'll come to Postgres; it's very similar.
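As a concrete setup for this example, here's roughly the schema, the index, and the query, issued through node-postgres. The table, column, and index names are made up for illustration.

```typescript
import { Client } from "pg";

const client = new Client(); // connection settings come from the environment
await client.connect();

// Hypothetical students table from the example.
await client.query(`
  CREATE TABLE students (
    id    bigint PRIMARY KEY,
    name  text,
    dob   date,
    grade int
    -- ...plus a bunch of other columns
  )
`);

// A plain secondary index on grade.
await client.query(`CREATE INDEX students_grade_idx ON students (grade)`);

// The query we care about: only the IDs of students who scored above 90.
const { rows } = await client.query(
  `SELECT id FROM students WHERE grade > 90`
);

await client.end();
```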
Because I'm filtering on grade > 90, I'm definitely going to use the index that's there. Scanning the index means going through the B-tree: the B-tree has levels, those levels store keys, the keys are stored in pages, and the pages are stored on disk. It's very similar to the heap; the data structure is different, but the content is still pages, and you're still reading pages. So we read through this beautiful index, and because the index is ordered on grade, once I find the first 90 I find another 90, and another, then a 91, a 92; everything is in order. And because the value a secondary index stores in its leaf page (in MySQL, or in Oracle if you have an index-organized table) is the primary key, you technically have the ID right there in the leaf page. Your work is done: you found the grade, and right next to it you have the primary key, the ID, which is exactly what you asked for. This is what we call a covering index scan, or in Postgres speak, an index-only scan. You only need to scan the index, because you didn't ask for anything else.

But the moment you do SELECT *, that optimization is just screwed. The database goes: you really had to ask for all the fields? You really did have to do that? Damn it. Now it needs not just the ID but also the name, the date of birth, the documents, whatever else. So it collects all these IDs, which are primary keys, and has to turn around and do another lookup, an index seek, on the primary key: I found that ID 7 scored more than 90, and 11, and 1007, and 1008, and 200007. These have to be random lookups, because you have no idea that the students who scored more than 90 happen to be stored in order. You'd be lucky; it's never the case. These are called random reads, and random reads are the worst. With SSDs, with reading from disk, with databases, random anything is the worst; you want to avoid it, but sometimes you can't. So now you have a collection of these puppies, you turn around and do another scan, or a seek, on the primary index for each one, scattershot all over the index, which eventually causes many I/Os to disk. Why? Because you chose to do SELECT * instead of asking for what you want. Really, really slow stuff.

Now Postgres. In Postgres, by default, when you just create an index on grade, the database always has to go back to the table, even if you only ask for the ID, because Postgres secondary indexes (and in Postgres everything is a secondary index; there are no clustered primary indexes) store a value called the tuple ID, which is exactly what we explained earlier: a TID is literally a pair of the page number and the index of the tuple, the row, within that page. In our example, (0, 1) is the first tuple in page zero, (0, 2) is the second tuple, and so on. That's what Postgres stores in the index, so it has to go back to the heap.

But Postgres also lets you create covering indexes, where you create an index and INCLUDE columns from the heap, from the table, in the index itself, so that the index carries the ID or anything else you want. With such an index you get that beautiful index-only scan. But if your front-end application (and in this case the "front end" to the database is actually your backend server, the thing executing the query) does SELECT *, it will not utilize this beautiful index-only scan. So that's the first case, index-only scans: you basically kiss them goodbye when you do SELECT *.
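A quick way to see the difference is EXPLAIN. This sketch continues the hypothetical schema above; the plan shapes noted in the comments are the typical outcomes, though the planner's actual choice depends on table statistics and the visibility map.

```typescript
import { Client } from "pg";

const client = new Client();
await client.connect();

// Postgres covering index: INCLUDE stashes extra columns in the leaf pages
// so a query touching only (grade, id) can skip the heap entirely.
await client.query(
  `CREATE INDEX students_grade_covering ON students (grade) INCLUDE (id)`
);

// Asks only for covered columns: typically plans as an Index Only Scan
// (when the visibility map says the relevant pages are all-visible).
const covered = await client.query(
  `EXPLAIN SELECT id FROM students WHERE grade > 90`
);

// SELECT * needs every column, so no index can cover it: the plan must
// visit the heap (an Index Scan or Bitmap Heap Scan, for example).
const uncovered = await client.query(
  `EXPLAIN SELECT * FROM students WHERE grade > 90`
);

console.log(covered.rows, uncovered.rows);
await client.end();
```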
You might say: okay, I don't care; the random reads, fine, that's really bad. Okay. Let's talk about deserialization cost. Correct: when you SELECT *, the columns live in the page. But getting them out and into the protocol that will eventually be delivered to the app requires something called deserialization, because at this point they are just raw bytes in the page, and the page, once pulled in, is effectively just a chunk of memory, just a bunch of bytes. So now you have to actually parse: here's this row; this is the first column, this is the second column, this is the third column. And once you get the content of a column, you have to coerce it, deserialize it from bytes into the data type of whatever this thing is. If it's an integer, you cast it to an integer; if it's a long, a long; if it's a double, a double; if it's a custom data type, you cast it to that custom type; if it's a string, a string. That has a cost. Not much, but it can add up if you have a lot of columns, and with SELECT * you don't really know how many fields exist in advance; you have to do all of that work for every one of them. And to be clear, we're talking about the database level here: converting the page's raw bytes into data structures, which will then be serialized again down to the network eventually. Turning bytes into a number or a value takes work, and the result has to be stored in its own form, as a variable, a data structure. So if you think about it, you're also allocating more memory just to build those row objects. That's deserialization cost. Not much, but it can add up.
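Here's a minimal sketch of that per-column work, assuming a made-up tuple layout (int32 id, float64 grade, length-prefixed UTF-8 name). Real engines have their own formats, null bitmaps, alignment rules, and so on; the point is just that every column costs a read, a coercion, and an allocation.

```typescript
// Hypothetical on-page tuple layout, purely for illustration:
//   bytes 0..3    int32   id
//   bytes 4..11   float64 grade
//   bytes 12..15  int32   name length, then that many UTF-8 bytes
interface StudentRow {
  id: number;
  grade: number;
  name: string;
}

function deserializeTuple(raw: Buffer): StudentRow {
  const id = raw.readInt32LE(0);       // bytes -> integer
  const grade = raw.readDoubleLE(4);   // bytes -> double
  const nameLen = raw.readInt32LE(12); // bytes -> length prefix
  const name = raw.toString("utf8", 16, 16 + nameLen); // bytes -> string
  // SELECT * pays this for every column; SELECT id would decode one field.
  return { id, grade, name };
}

// Usage with a hand-built tuple:
const buf = Buffer.alloc(16 + 5);
buf.writeInt32LE(7, 0);
buf.writeDoubleLE(93.5, 4);
buf.writeInt32LE(5, 12);
buf.write("Aliya", 16, "utf8");
console.log(deserializeTuple(buf)); // { id: 7, grade: 93.5, name: 'Aliya' }
```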
Okay: not all columns are inline. First of all, what do I mean by "inline"? In a row store, the rows are stored literally one after the other: first row with all its columns, then the second row with all its columns, then the third, one after the other. The columns that physically appear inside the row, in the same page, are what I'll call inline columns. You'll see this word used all the time, like inline functions in compilers, where inlining sometimes gives surprising performance behavior.

But here's the problem. Remember we said pages are fixed size? I forgot to mention one thing: rows cannot really span multiple pages. If you have a really big row, with, I don't know, 20,000 columns, it can't happen: you'd have one page holding a single row, the row wouldn't fit because it's so big, and you'd have to spin up another page to complete it. Databases never allow that, because the complexity of finding the row becomes exponentially harder. As far as I'm aware, no database allows rows to span multiple pages, purely because of that complexity.

So rows cannot span pages. Then, Hussein, what do I do if I have one string field that contains the entire works of Shakespeare? Here's what databases do. The same goes for blobs, large binary objects stored as a column, and JSON documents are rarely stored inline either; in Postgres and other databases they often can't be, because they won't fit in a page. What happens is that value gets assigned a pointer. And when I say pointer, don't think of a C pointer; it's a unique identifier that points back to where that blob actually lives. A marker is created, and the value itself is placed in an external table. MySQL has this mechanism; Postgres has it too, where it's called TOAST (The Oversized-Attribute Storage Technique). They have to have it; there's no way out. You need some external place to store stuff that cannot fit into the page.

If you think about it, this is also what limits the number of columns. Why don't databases allow an unlimited number of columns? Because of this. A non-variable, fixed-size type like an integer (four bytes or eight, 32-bit or 64-bit) can't sensibly be moved out of line; it's just an integer. So you're limited to however many columns fit in a page, based on a calculation you can do yourself.

So what databases do is take that long string, that long document, that long blob, that long JSON, put it in another table, give it an ID, and store that ID, which is tiny, a few bytes, inline. The column itself holds only the pointer, the identifier of the external storage; the large value is never stored inline. (It does depend on size: with, say, 128 characters, the database may shrug and keep it inline; there's a threshold that decides. But mostly, large stuff is stored outside.) And guess what: out there, it's often compressed, too.

So there is a real cost when you say: give me that JSON, give me that document, give me that string. You assumed it was inline; the database says: no, I've got the pointer, but now I have to do another query against another table, the TOAST table in Postgres. And that's an additional I/O, because the page where that TOAST data lives might not be in the shared buffers, so you just caused another I/O. Look at the work you're doing. Look at the pain you're causing. An I/O call goes to the disk, and the disk suffers every time you hit it with an I/O. Next time you do SELECT *, think about the suffering you're causing all this equipment. (I tried to say that with a straight face, but I couldn't.)
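You can see this external storage for yourself in Postgres. A sketch, still using the hypothetical students table, with a text column added as a TOAST candidate; pg_class.reltoastrelid and pg_column_size are real catalog facilities, and the exact sizes will vary.

```typescript
import { Client } from "pg";

const client = new Client();
await client.connect();

// Add a column that can grow far beyond a single 8 KB page.
await client.query(`ALTER TABLE students ADD COLUMN doc text`);

// Every table with potentially-oversized columns gets a companion TOAST
// table; pg_class exposes it via reltoastrelid.
const toast = await client.query(`
  SELECT reltoastrelid::regclass AS toast_table
  FROM pg_class WHERE relname = 'students'
`);
console.log(toast.rows[0]); // e.g. { toast_table: 'pg_toast.pg_toast_<oid>' }

// Store a large, repetitive value: Postgres compresses it and/or moves it
// out of line rather than letting the row outgrow its page.
await client.query(`
  INSERT INTO students (id, name, grade, doc)
  VALUES (1, 'Aliya', 95, repeat('a', 100000))
`);

// pg_column_size reports the stored (possibly compressed/toasted) size:
// expect far fewer bytes than the 100000 characters the value contains.
const size = await client.query(`
  SELECT pg_column_size(doc) AS stored_bytes, length(doc) AS chars
  FROM students WHERE id = 1
`);
console.log(size.rows[0]);
await client.end();
```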
These external values are often compressed, and you can disable compression if you want, but that will just bloat your storage. If you don't care about storage, sure; but then not only are you pulling the value with an extra I/O from external storage, there's now also a CPU involved. You're making the CPU work for you, decompressing stuff, and decompression has to happen in memory: Huffman coding, gzip-style algorithms, whatever it is, all of that is running now. Geometry values too; I deal with geometry all day, GIS is basically my nine-to-five, and this stuff is really expensive. So by doing SELECT *, you're silently paying for all of it. And the sad part: you asked the database to do all this work, to pull all these blobs and strings, and then on the client you say, "I really just needed the ID; I don't need the rest of this stuff." It's like asking someone to bring you a whole dish of food and then taking one bite and saying, "oh, I don't want that anymore." How rude. How rude, really. Okay: not all columns are inline. Be careful. Be very careful.

Network cost. When was the last time your backend ran at the same time, in the same place, as the database? Never. You never put the database in the same place as the backend; they never run in the same process on the same host. Sure, you could run them on the same physical host in separate containers (I don't know why you would), or you might have a virtualization setup, but you still separate them, and that means networking is involved. Of course you want to keep them as close as possible so the latency is low: you don't put the database in us-east and the backend application in us-west and expect queries to be fast. Put them in the same network zone if you can, and if you can't, create replicas and keep the replicas close by. I think I made a video about this at some point: keep your servers close, and your database closer. I thought that was clever. Did I just call myself clever? Kids these days would call me cringe.

Right, network cost. Let's recap what you've paid so far. You selected all the fields, so you lost that beautiful index-only scan. You took the hit of the database deserializing from the page into data structures. And, what else, yes: you told me to get all those text fields, blob fields, and JSON fields, so I had to go to the TOAST table, an external table, decompress them, and also put them into data structures. Double trouble. And now I'm ready to send it all back across the network.

Each database has its own networking protocol, usually some database-specific binary representation. I've never seen a unified one, and that's something I talk about on my channel sometimes: a unified database protocol just doesn't exist. Unlike the web, where we have HTTP, a unified protocol whose use cases went well beyond the web to become the de facto standard everywhere, every database just does its own thing. Some people have brought HTTP down to the database; I don't know how well that performs, to be honest.
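One way to feel this network cost directly is to time a narrow projection against SELECT * over a remote connection. A rough sketch, assuming the hypothetical students table is populated and the server is not on localhost; the absolute numbers depend on latency, bandwidth, row count, and column sizes.

```typescript
import { Client } from "pg";

async function timeQuery(client: Client, sql: string): Promise<number> {
  const start = performance.now();
  await client.query(sql);
  return performance.now() - start;
}

const client = new Client(); // ideally pointed at a remote server
await client.connect();

const narrowMs = await timeQuery(
  client,
  `SELECT id FROM students WHERE grade > 90`
);
const wideMs = await timeQuery(
  client,
  `SELECT * FROM students WHERE grade > 90` // drags name, doc, etc. over the wire
);

console.log({ narrowMs: narrowMs.toFixed(1), wideMs: wideMs.toFixed(1) });
await client.end();
```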
So every database does its own thing: SQL Server has its own protocol, Postgres has its own, Redis has its own; everybody building a database goes, "ah, I want to build my own." So now the data structures you built have to be serialized down to bytes in that protocol language, the application-level protocol (the database's wire protocol is the application layer in this case). And then that goes into the transport protocol, which is TCP; that's all we have. It goes down into TCP segments, then IP packets, and those IP packets are shipped.

Now, if you're sending really, really large columns, the response from that protocol, the database's SQL result, will be so large that you will feel the network transmission, especially if you have higher latency. The maximum I can put on the wire at a time is the MTU, the maximum transmission unit, defined at the IP layer (layer 3) and discovered via path MTU discovery, the mechanism that finds the weakest link along the network path: what's the largest packet I can send end to end? The default on the internet is 1500 bytes. You can go lower (some very old routers support something like 500), and there are routers with jumbo frames that support around 9000 bytes and even larger, but 1500 is the standard. So you can only send about 1500 bytes at a time at the IP layer, and the usable payload is even lower at the TCP segment layer, around 1460 bytes. I know I'm going into deep detail, but that's the point of all my videos. I don't make one-minute videos; I like to explore these things because I'm passionate about this stuff. I apologize if you don't like this kind of content, but I really do like to go into the details and remove the black boxes.

So if you have a lot of data, even if you compress it (which is additional CPU overhead on the database), you still have to transmit it. The congestion-control algorithms, with slow start, allow you to send multiple of these 1500-byte (1460-byte payload) segments without waiting for an acknowledgment. Initially you send about 10 segments at a time; Google pushed the initial congestion window up to 10, I suppose on the theory that there's no point starting smaller, the internet is really fast these days. Ten times roughly 1500 is about 15K, so you can push around 15K at the start. But not every network supports that, so TCP always starts slow and then increases incrementally. As the transfer proceeds, the client has to start acknowledging what it receives, and it only acknowledges what it actually received. If there's data loss, the client never receives those segments, the server's retransmission timeout fires, and the data is retransmitted. You'll feel all of this, and the more network latency you have, the more you'll feel it.
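Here's a back-of-the-envelope sketch of that effect: how many round trips slow start needs to push a response of a given size, assuming a 1460-byte payload per segment, an initial congestion window of 10 segments, and a window that doubles every round trip. Real TCP is messier (ACK clocking, ssthresh, loss, receive windows); this is only an estimate.

```typescript
const MSS = 1460;        // TCP payload per segment with a 1500-byte MTU
const INITIAL_CWND = 10; // segments per round trip to start (modern default)

// One window's worth of segments goes out per round trip, and the window
// doubles each time, until the whole response has been sent.
function roundTripsToSend(responseBytes: number): number {
  let segmentsLeft = Math.ceil(responseBytes / MSS);
  let cwnd = INITIAL_CWND;
  let rtts = 0;
  while (segmentsLeft > 0) {
    segmentsLeft -= cwnd; // send one congestion window
    cwnd *= 2;            // exponential growth during slow start
    rtts += 1;
  }
  return rtts;
}

// A 14 KB result fits in the first window; a 10 MB SELECT * result needs
// around ten round trips before the last byte even leaves the server.
console.log(roundTripsToSend(14 * 1024));        // 1
console.log(roundTripsToSend(10 * 1024 * 1024)); // 10
```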
That's why I always feel that if you want to really test an application's performance, put the database and the backend really far apart. I know that's a bad idea in production, but it will force you to optimize: you're going to send as little as possible and receive as little as possible, and you'll catch yourself thinking, why am I sending this much? Let me send less. It's like putting yourself in slow mode so you feel the pain of these things. So: returning all columns means serializing and transmitting large columns, such as strings and blobs, that the client will never use. That's the sad part. You're incurring all that transmission cost for nothing. Nothing at all.

And then we finally get to client deserialization. The client has now received all these raw TCP segments: the NIC transferred them to the OS through DMA, they became raw bytes, and the OS worked out which listener they belong to (this destination IP, this port, that socket) and started delivering the data to the client app. If you're on Node.js, that's your Node.js app, and specifically the database client library you use; if it's a Postgres client, that library gets called, and you're at the mercy of how that library is authored: how it parses the protocol, how it interprets this stuff. That's why lazy parsing is also a thing on the client side: wait until the client actually consumes something, so you can avoid the up-front hit of client deserialization and of building all those objects.

Fetch does the same thing, by the way. When you call the fetch API, it gives you the response with the headers, but the body is just raw bytes; you can't actually do anything with the body until you call the body's .json() method, and that's when you take the hit of deserializing. The reason fetch works this way is that you might only need the headers, or just the status code, or the content length. The client consumes what it needs, and only when it actually needs the body's content does it call .json() or .text() to move it from binary into a data structure; and parsing JSON is expensive (I'm assuming the response is JSON in this case). So deserialization is costly, and if you can do it lazily, that's even better. But if you can avoid sending data that's never used at all, that's even better for the backend, isn't it?

So yeah, guys, that's all I have for you today. We talked about why SELECT * is really, really, really slow. I might have missed a reason or two; let me know in those comments down below. Hope you enjoyed this video. I'm going to see you in the next one. You guys stay awesome. Goodbye.
Info
Channel: Hussein Nasser
Views: 26,190
Keywords: hussein nasser, backend engineering, database engineering, select * from, postgres database, mysql, sql server
Id: wybjsKtA9hI
Length: 39min 24sec (2364 seconds)
Published: Mon May 01 2023