How does PostgreSQL actually work? By Postgres Global Development Core-Team Member

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
without further ado I'd like to introduce our esteemed speaker of this evening he has been a member a major contributor Postgres since 1999 by a Cheyenne sir who's even been working in this IT sector since that long she's a pretty healthy number he's been a developer consultant for second quadrant as well as a min member of the Postgres global development core team please welcome Peter Eisen child Wow thank you nice turnout next time together come earlier today so yes knowledge baby thank you for coming out boom ooh here's Postgres who's already okay okay that's by half okay cool so yes Erik introduce me I'm Peter I I live in Pennsylvania on the dusty eastern side just across the country so I have some good nature about here and I you know want to talk about it how how does post-production work now you know maybe those of you who didn't come tonight don't want to know but I I think it's it's useful to look a little bit into the internals because if you you know if you're probably a DBA or developer or someone who works with Postgres and is somehow responsible for or Postgres database perhaps and you know if you have issues or these don't go the way you want them to go sometimes useful to know how how things actually work a little bit under under the surface just to get some context for you problems and helps you understand better as opposed to just you know it doesn't work that's what I want so I'm gonna talk about my employer a little bit because they're sponsoring tonight's so I work for a company called second quadrant we are basically a huntress development and services shop we we've been in business for more than ten years and we have a local company where we have more than probably would need quite a minute so we have you know quite a bit more than 50 staff in 25 countries probably now they're just hired a couple people yes and we also the production support 24/7 so whenever you have a problem even in the middle of the night we have someone maybe on bail so but you lose the way it can help you in need of consulting and we need to contribute to your post post at all in this my one of my main jobs to actually write questions code so that's what we do we have a conference coming up so if you want to more after tonight it's just--that's here in New York and we have a second chapter in Chicago just just over a month so if you want just look at take this it's done and it's going to be the New York one it's just gonna it's gonna get the New Yorker Hotel just couple blocks in their cities really easily better so that's going to be one day of training on the 6th and the 7th just one day of questions hopefully we'll have more chairs passable so my simple whimsical subtitle of this it's referred to this come back again I guess so right ebrary things happen they hid the disk so volume good here with this thing just kicked on let me know so here's a query and if she can't see in the bags it's just it doesn't matter what this yeah this is the kind of thing you do all time you have this one's a little bit store sample database reverses so type that didn't run it and when I find it to 15 milliseconds well stop out there so the other day post rest has now almost 950 thousand lines of code so not quite a value but will probably get them next year so what what just happened so you normally it started with using PSP ah let's see the commands itself at the core should really narrow it down this is pls it's huge actually but it's really it takes a it takes princeton prompt and type something game your present item and then it checks if it's not a vanity thing that it just passes it on to this pqx send it off it doesn't know the client doesn't know so what happens next PS girl commander I know my sister psql these are the libraries at PSO this will different obviously with a different system but it's more like the idea so the first library discos later this nifty cutest posters my great neck so that is what it used communicated initial model stuff get thanks library a full message translation refine my reefa little prompting and editing and she use another system like you spell it basically just have offices in libraries and communicate question so the next step in the chain is look at you sometimes people ask why isn't flippy q-and-a fortunes so that's a pony ancient thing it's in the 4sp I believe his name posters posters had a Korean language from post-coital perform SQL was accepted as a mainstream center village so first well is where the peak you came from so there's a lot of really old interfaces in Postgres them somehow PQ but that's one of the information so that's like you is like most about PQ connect you excellent so curiously the turn starts of that's always so what little Q does is is an extract network network extraction little basically you have these kinds of commands connected database from the backside to run a query and what the PQ basically does is turn us into soldier operation socket some error checking it takes a little you did up here or passage for Matt Santos a large waves until the stuff comes back and then it sort of copies that it takes a little space and formats a little bit back again look you doesn't know about your database or what tables there or all it doesn't taste something from the colony application sends it off the to the network gets whatever back to the Johnson and I investigated the really interesting thing that the Duke University is a protocol so postcodes uses a custom protocol to communicate between clients and servers and the reason for that is again ancient basically posters have been around since the 1980s and one about this protocol is just invented back then and we've just been extending it and ever since so you know nowadays but that's what we have so this is kind of how that looks on the left I kind of Illustrated clients and servers and these are kind of how these messages are constructed the first letter the first bytes in the messages all the metalwork and explaining the messages in the first example the Cubist's hurry so you have a message that sends for each of the background and then there's a number of attention that shows how long the message is so then you just have a string that's it so it's xq the length byte a length word I guess and then the string that happens a lot of stuff happens and stuff comes back and the first thing that comes back is T message t is a tuple descriptor yes it describes the format the format of the result rows tell us how many you wrote many columns are there what are the types and means and then the data comes back those those D rows the rows one comes back many rows and just keep setting these zeros and then the job of making q basically is to take the T message and interpret it and remember it and then use that information decode the dealers all right so it knows that okay there's so many columns that are coming right now and then I know how to take part with Eros and copy all that stuff in the local memory put into the c structs and that is what if it gets returned out all right so that's basically the Java would do too and then once the result set is done you get the Z message ready for the next query then you can send another cute message Judy one another query or feed on you just want to create the client supposed to Senator Menendez at that CX and maybe if you've seen that if your client is dies you get a a message in the server log it says client end of client connect unexpected in a slide connection that's like this and that they do just means that the client is sent this text message to exchange that's not really a big problem it just kind of tells you that if I died unexpectedly there's something wrong in your client so if you wanna learn more about careful there's protocol first of all of document there's other it's not are committed because they have that kind also trade secret what stress documentation is a simple way formation of a protocol and if you want to play around with a little bits every so often in my job is so sort of works consultants don't work I get a TCP dump or more extreme cases water shark and then just look around the nice thing brother why are charges that actually knows the post Chris protocol so if you select one of these many options it has it can do decode protocol and then it basically shows you that kind of stuff fly by and then you can kind of see okay I run my query that's oh it's that way than maybe for some reason comes back only a few things doesn't happen all that often but it does happen if people have funny Network configurations and things like that those are good tools to kind of handy so then the next thing that happens is once the server receives the person it's just a big string I'd select this and this and that's one side it has a partner parcel and that this is spam if you have ever horse neighing maybe but compile theory it's a standard lexical analysis and parsing with the likes of my son those are the sort of automation of lexan yeah that used for parsing that's not what you do that every user likes tool and then it's always flex wasn't by so much parser generator and again it's kind of hand waving capacity here because it just it doesn't you don't need to do this is standard stuff the grammar description which is pretty complicated SQL is a huge language compared to any other programming languages usually only has a few things as important as grammar there's a function call this expression is in to find something that's it SPL is baby be all these all the statements they all have slightly different like there's no common sense so every statement has its own chemistry a great table drop table all the table already and all these things have to be in this grammar file and select update us like this it's huge really you just pass it to by someone and it was often by some if you use a very old tub of math it just works so you feed in tokens and now that the other end comes across the parse trees basically in memory representation of a bunch of seats looks like the top level you have very complement of this because you put in multiple cities could be select something semicolon select something else tech turbulence range select statement and select a house on these things in Party of those from Cosworth laws provide any sort something maybe maybe few other things the party list has a bunch of entries and the front claws and out phase of the join expression and enjoin expression has tables this is not terribly and so then you have this parse tree memory and the partially there is something that kinda goes it so if you type nonsense at this point the possible fit this doesn't have to do but at this point you don't know this party makes sense it tables in this so that's nice if you want out what about this and this is kind of interesting I got started this might really need to know anything about databases to take a look this is all to the standard standard tools and these source files are kind of self-contained so if you give me the documentation of Iceland like simply the documentation and these four files and you will see how so after that we go into what's called a personality this is kind of the remaining part of the partial is not directly and this is where all the are ten hands so it goes through this parse tree it just checks everything does anything here make sense right it checks all these tables even there are these problems there it does this column belong to that table do all these function does exist it's tired this is allowed to put you know aggregates your window functions there relative to group by that you put in and all that stuff what's up everything consistent catalogs and just there's a whole liar so if you per if you get if you get an error from you created inside like syntax power this is not it probably then if the is all that busy parts treater that's been slightly modified chat and then so here's a bit of an example product or abstract of that little bit but if we have this clause and all careers where customers not HD nation so first it would look what is customer at AG to mean that could even mean a bunch of different things and then it looks up the age how long has to be true and first we look up leave each type need to good idea boy any sort of the posters of travel I didn't turns out be 21 that we have to do have an operator keep it between itself is not operated right between me it's just care that'll be for less than equals its out it looks at each operator its code back because the name of the procedure that is an operator just connotation and it looks at the beach prom without a teacher and s source code and then I see what posters is all this is kind of user space to mean you can extend it you can as you probably know if you use posters before you to build extensions and that means you can all this is extensible questions and if this happens during Marcin else's when it goes through our streets what what else so once you're not with a parcel you go to something called be driver and this is sort of an old academic partner posters in the way it was one of the some research projects in the late eighties that the nowadays who would do this trailers but back then this was research interviews rewrite rules we could based on these rules rewrite it for a to something else this turned out to be quite buggy and it's not really use anymore but what we do use this force to expand use so if you have a view it's kind of a macro expansion in the way it just goes through the parse tree basically says no here it says table five but that's actually the view so we're gonna take the new definition and just plug that in and then you have to fix up all these references a little complicated than general but that's roughly speaking that's what is born out of days they also does things like it plucks in default sort of command you don't specify all the columns if it falls off less than and so I just kind of face and then gets into leaders so this is really the like why do you even use a relational database one reasons is probably you in doing like all this transaction stuff like the acid protection blocks and commit them and what make sure is that kind of stuff that's the one that yelling is that you just type an abstract description of the query and it actually figures out how to do this is sort of in the the ladies 1980s 1990s they watch this sort of idea of fourth generation this was sort of the idea of the first generation with machine code and the second generation was assembly language and third generations will you basically describe this kind of and the fourth generation what's that basically I'm just going to tell you what result I want to do figure it out and do it and this is really knew how databases database management systems their money you just say I want this when that table filtered this way and then the flagger figure Saturday okay how might you do that how what do I can first what do I do next and how do I make it fast that's what the flowers it's kind of the terminology is sometimes mixed up sometimes called a plan or sometimes for the optimizer the source directory is called the optimizer that I think you follow so that's one mine that I've described really the meat of the code like it takes the parse tree that we just figured out and better than make sure that makes sense and then does it's magic thinking and then how did the other comes the executable and so for this bird looks like this it's basically a plan like how are we going to do this I got a career attended aggregation and adjoin there's like that and in this case okay there's many options probably could do this but here I decided at the top it's low-key and underneath the sword I'm gonna do an aggregation and there's different ways of aggravation some of this nature Ashley that's equal to that good like we had two joints over this case it shows to do ash and I want to do the hatchet along as two table that's do toward one it's just a plain sequential skin then the other one is the ash part of the action so it actually does the sequential skin that most actually came out of that memory don't talk about how this is pretty standard stuff two different entities different joint is exotics it's not something that Coast residents so how does it do that we don't have time my words tonight there's other talks and literature about that but in basic need takes two kinds of inputs one is statistics from your tables that's what vacuuming analyze quite so so they kind of know like how many how many rows are in these tables how big on these tables one of the most common bellies in these tables if that's one kind of information and then the other type of information is cost parameters which supposedly model the hardware like how fast is it to go to this versus go into memory so it knows it's kind of caching behavior that's the hardware our fastest and to do sequential scan versus a random scan and this could be different depending what kind of hard way it up frankly that most of these numbers are kind of bogus but so it's not really like big models to harbor it just you kind of tweak those numbers like totally because it's really hard to model Hardware really well like maybe in the really old days when they came up with this you know computers were simpler but now there's so much cash and going on an additional memory and even the disk now it is not really this anymore right they're just cashing layers of it somewhere else right basically so it is really hard to model it correctly but it's at least that's a general idea so in principle put for a complex grade like this there's a lot of possible blacks right he could do sorts you put in the animation and in different enjoins they need to choose it if you choose the merge-join here it's already sorted so you don't have to be the sort we have to figure out that she's the arch going and save the sorting later but how much more expensive submerged and how much memory you have available and then you convince you mean it like that but we won't memory inspector this is all that's happening right so and again then I could be able to talk old one day training session about how this works this is really where the other points problem this kind of before you start Posca source code you know has useful to read they're really quite something the State Department source code default everything possibly them it's a there's a lot of good information there they're not just kind of city directory kind of things so so do look those up if you're kind of interested in diving into this more so to put this together a little bit and how we are so far so the normal way is just in the middle from running selects similar tastes like parse it first and you need to be writing maybe and you have the planner and then we'll use an execution power button X and then you should back but you can run some of these other commands and they kind of interact with this so if you use prepared statements we start with a parsing but then when you have the plan you stop there you save the plan usually the plan is that saved in that memory and gets like a handle back to your or whatever program they gave the IET us men come back neighbor and pass that handle to executed so you this this is what explain for those of you explain the plan we have a plan tree if you run explain analyze it will go on to the next execution but instead of actually producing it real results people instrument the execution and collect some data through the execution ship that way so that's a little bit more complicated way but that's basically goes so execution is back the executor reading on there and it basically just goes through the plan that has been produce and executes it from top to bottom and these are a function that actually was like that there's a big switch or if statement somewhere in there says I think what plan do is go out like first plan it was what I call tech support yeah question was the sort of the thought of it actually you're right but actually starts at top in the first function tulips calls the next time except making this kind of ghost example pool system the way there's a give me one and of course the sword can't actually but the basic idea is that there's sort of a driving any wonder that these functions are basically responsible for producing that tuple somehow and then you end up a lot of know that excellency Spanish is been actually going to go to the disk and fetch one two and in this case I'm going to do the cold sleep spend then the exit - built the hash table and there's some ash going to psychological the hash table and then the other sleep runs and how they some of these things can run a parallel way and the ash donation so it's not super efficient and that's actually going on implementation going on that have changed this model just this idea and we want to bowl and it could be the next to a bone then it's cool those two easy steps and those might not be society people theoretically it's not very good in terms of CPU efficiency so so you know maybe a few years down this might look quite different in that there's some experiments now the justifying compilation of LVM and that kind of stuff that's sort of ongoing ideas now but this is how is the market the last 20 years or so okay so it basically just go through this execution plan that has been produced and each note it has a special function that actually viva la and so then the lowly executors we have any quality access methods that's actually where you do this it's actually not quite where you do this access but it's where you actually been through the sequential scans and where you do the index scans and so our gates get sequential scan that's fairly simple people slow it I keep open it scan and then some cases because he's in there filtering in a certain way and then you have this fall to leave that next step basic give me one - will it mean the next - I kind of do that when it starts the sequential scan opens and for every tuple we call anyone anyone not a super-efficient batch processing let's have the way to support right now so this is all understanding accents each index method you have some directory there this is that what a smarts or relation how does it each reaction or pencil is the executor doesn't really know how the indexes work it's just you know it's been told like you this index hey open the index it will go ahead and then the index go does is your executor can just drive that through the plant so here there's quite a few non poster instead of reaching the hatch just in space which is this type of specialty you space it so they all have their own implementation of what I was eating that's actually supposed to do autonomy and the API has had the same to open the next thanks to so that way out of that and if you really want to go to eat the Dakota this is have to be like others Ichiko next and then this how to see this type of specialist and there's there's oh this in the data directory it's kind of nice to restructure base and number which is the base ID in the next table ID manager said apply them and each files is divided up into these 8k blocks so it did every time it has to read a troop election with a fashion fire log we can't just take one to board it has to be bought in any case which it pulls in to share buffers then it kind of goes endemic decodes or instituted it's sorted it's very specific internal wait Kade blocks are constructed to think all the tuples in there and when you update a to blow as well to a mission in turtle 50 things going on there but basically if you have the sequential scan just give me one block tico-tico decode send all the tuples back give me an X block some cash management going on so you release that market so the next guy can look at that so just you know each scan just basically takes the boxing over if it is an indexed skin these blocks have different roles they would actually case of a be treated no it's not quite that simple in the case of a hash index different locks earlier we talked about doing this with the insurer they did a quiz there was the question probably big kind of table heat this kind has to do with that that basically saying how large and I don't know but it's basically what is the size of the block number times the process because it has to address this somehow right what the reason block numbers have to exist is that indexes point to the heat so the e-license have links to the blocks of the feet so they have to have a certain size of the block and then the penny make applause let's meet the aka couldn't change like this it's partition we do that question yeah yeah so speaking of partitioning or sharding yeah like how does this algorithm differ if you're talking about data that's distributed in different partitions or sharps how does it sort with partitioning is the question basically the way partitioning is important now these entirely tables so if you use the partitioning system that is no question send you get the time you did why why why those entirely different tables on this just in a higher layer they kind of stitch together so you're not affected by that and this kind of a whole lot of problems come out of that which is why you can't currently can't have indexes that go across the partition table because then you would still need a different way of addressing blocks we're thinking about different ways to do that but it's kind of hard from how you address the table in that case partitioning is not what not transmitting doesn't alliterative is you need to write the code separately this question is this partition transferring the new partitioning post which 10 is more like what you might be expired deporting to like if you if you're using the currently released posters 9.6 or earlier you cannot distinction together yourself by the posters tenders but Aminu partitioning this so if you want to learn about this kind of storage and that's kind of useful to know if you have any kind of production issues it doesn't happen but it could be boxing posters storage is fragile memory is sometimes fragile mistakes happen so it is kind of useful from time to time to do or rush in storage and there's a couple of tools use one is the tool called piece of file down which is an external tool and legacy time which basically you do it pointing to it blocking each other's innovation of this kind of happens when two walks a little written by the junk system this is what each one of them have been contributing spec is a similar is a module it can tell you listen to page and interpretations that can also tell you though if it's a it's a b-tree page what is in the meat uhm he thinks of various things are in there hash index page buckets whatnot it's an advocate can decode that information a little bit so this can go quite deep in terms of spit button but this is all sort of good tools to have handy so this is you know along with you know the first thing to push this out to reading plans with for each evening and things like that but I think this might be sore the next thing in terms of being able to handle corruption and that kind of stuff so that's kind of the Riveter and then there's sort of a thing at the side and we are we just had a select greater so this doesn't actually happen but if you have an update for example so instead of keep you know technics and do sequential scan your colon and first we get the to hole and then it would you apply to change the memory and then talk about it which actually just writes back to this but before that to happen we have to do right at one we had to recover get some cash right this is the principle right ed arguing with journaling I said sometimes called another system so this is kind of how this happened these are again some function calls that if you update the what actually does its thing called Milwaukee birthday which composes a description of the change so that fight encoded description of what it's doing and then absolute altitude iPhone user and everywhere in the cohort you may change and the change is whether this this happens there's strengthens yesterday there's dozens and dozens dozens of exponents or calls all over places everywhere the change is being written and updated rip moon copy command all these places said where any time to write to disk you have to do an echo to write your hunt and the right data looks kind of doing father so yeah that's kind of that's kind of how this works this is how our various deposits kind of starts this way we have psql which takes you in foot passes a lot so difficuit let's let the protocol so it gets ended with a protocol it's a party going on execute our own execute I have all these site functionality those like sorting asking where the types are access methods they know how to write to the eat for your eyes to the right and all this replication stuff also happens log if you write a log to the distance that contains the description of your changes you can just take that description of the changes and be placed somewhere else so that's all that replication side and then below that it's something that's not often talked about is something called a Storage Manager and there's probably only one or two manager and at some point in the 1980s they probably thought that should be different there's always implement office or maybe another store where does it really fit what is the level of the correct level of furniture okay so again if you want to try a few things so the best thing is to study a sorceress hacker as you don't quite have time for that explain explain Amazon's a super useful tools for any kind of performance problems as for each time you have a performance problem you get any kind of help viewed on the mailing list or an IRC or professional sometimes they always been asked first with that's what it's a story I understand that the network type but each part of them and you all dumb to understand many file issues education and then studying extensions this useful if you will just want to know how the externals work a little bit so there are extensions that define new data types or new functions you just take the hallways and try to understand how they work they kind of smaller than the whole first resourceful guy that they kind of find a lot of these things just grab an extension and see how that works and because these things for together so right so we have changed Zn is also New York Post person New York City PostgreSQL user group even more that to meet up it's also April gonna be a big conference in April City come to us so a loss of course cause content is still around the city missouri so this check those out almost eight o'clock I guess we can take one or two questions if there are yeah a barnacle yes the oldest mostly and it looked had a few good points and we're trying to yes sir but was also wrong in this puncher point so that was not that it's not that if you somewhat yeah that's a problem if you if you have if you update the truly an update of the index I want to talk to some optimizations that could be fun a lot of the other things that article that were correct sir there are some follow-ups and rebuttals to that Savitri desk but we talk about some bits of the process involved like code rich in the 1918 so how does the project manage technical things I was a project me to take the left code with an eighties yeah that's that's not good all my feel you just have to we actually try to clean this out but we try to run on offer existence to its with nine mainstream operating systems so you have to make the incompatibility and that's that's a mental struggles like developing consensus like what operating system to what can be abandoned again better lately but yeah you gotta be really aggressive and there's like no glory really of deleting old code there should be more visito and sending up codes and yeah that's not hard that's a struggle yeah but you gotta kind of question there must must be plication is it possible - Jew bastard - replication boss because yes it's possible not than the built-in stuff but there's an exceptional VTR just about that my colleagues in the tech department it does work just be careful with the use cases it's not really a che it's more distributed like geographically distributed things so anyone wants to know more about the details I got a few business cards here get in touch with me if you want to know but vdr replication stuff about prosperous conclude you Eric do you want to say my words here or other good more questions okay I don't know what the time constraints this yeah then we'll take all the questions that's interesting any particular features feminine positive interesting yes I think 10 was kind of a milestone in the way that we have now a lot of groundwork for more things to come like parallelism there's a lot more stuff that could be parallel partitioning could be a lot better in terms of your global index it's not a stock kind of a lot more like that logical implication that I mentioned that could be improved in all kinds of ways including integrating multi-master functionality that's going to external so there might not be like brand new super novel feature in the next year or two but certainly they're expanding on what's coming I recently put in so I think I've sold my expectation paramore hi kids and the bowl to put your cards in is right where that guy yeah if you could just pass it on through one more time because once we're done with QA we're giving that away so last chance to put your business card in there if you don't have a business card just big one on a piece of paper and let's put it in that bowl yeah and with regards to distributed systems how does post press out of the box handle conflict resolution blisters by itself is not a distributed system so it doesn't do that there are you know the VTR management which will do multi master replication has you can configure what to do to the conflicts and there's a couple of options that you know my time standby origin and you can pampers there's also something of course Chris Excel which is silicon I will describe that it's a bunch of notes it appears one logical note so I guess they're not really big conflicts in that case because everything the pesto logo transaction awareness so there's a couple different sorts of different things this post Chris excel web scale what do you think Mason Sawyer did he just leave my stuff saw a moment ago we should've asked him now bus bus excel is is quite cool it's not so mainstream but it it does it's really cool right you just give it more nodes and it runs by that amount faster if you can distributed it on that way but I guess it's mostly political processing it's not really necessary what kinds actually but you could do okay more questions to get students to them where first goes to and to or from watching tonight I don't know that is interesting because it's basically a database I don't know anything I have not heard of no use should we do the last one yes it's to be it's pretty confusing in a presentation I don't understand in the time I believe in this system MPP databases are due and you know everything which is happening who really cares about the old playing national database but even even if people do why to expose all of this home written in C style and eighties or nineties what why why do you choose that thing it's like the same thing and somebody from IBM will come and show us the code for see us that they developed in 69 and look at this but you know how is it relevant to contact queries that is a good question yes there is so I can ask one place I don't really know but people do certain way to do the other way is you can think about you know enumerate you know compared to the system X how is relational database of advantage and you know I like things like strong typing data integrity [Music] perhaps ease of operation strong transaction guarantees not just you know it is a transaction transaction like full I've actually realized the building for example and that's actual integrity here applications like that so there are a lot of things going for it and in a way you know businesses it's been meddled in our so people seem to agree but I know this is an ongoing discussion and right Oh somebody okay thank you thank you all for coming here is your for 101 questions feel free to come up now and ask
Info
Channel: Database Month: SQL NYC, NoSQL & NewSQL Data Group
Views: 15,631
Rating: 4.6949153 out of 5
Keywords: PostgreSQL, Postgres, SQL, Peter Eisentraut, Core-Team, NewSQL, libpq, WAL, 2ndQuadrant, PGSQL
Id: zkEhX0yOL3k
Channel Id: undefined
Length: 63min 16sec (3796 seconds)
Published: Mon Oct 09 2017
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.