PostgresOpen 2019 Data Modeling Normalization And Denormalization

Video Statistics and Information

Captions
We're going to talk about data modeling, normalization and denormalization, so let's get started. Most of the content of this presentation is extracted from my book, and I'll tell you more about it. I've been using PostgreSQL since last century, actually. I'm part of the contributor team for Postgres, and I was working at Citus Data when we got acquired by Microsoft, so if you didn't know, Citus Data is Microsoft nowadays. I guess you've heard about Microsoft.

One of the projects keeping me busy these days is pg_auto_failover. The idea is to implement the simplest possible way to have exactly what the name says: automated failover, business continuity. You have two servers, as usual, with streaming replication, and our tooling will do the whole setup for you. If you don't know how to do it, just use it: in about five minutes you have it running. If you know what you're doing, maybe you don't need it, and that's fine too.

Another project I've been spending a lot of time on is pgloader. It's meant to load data into PostgreSQL, and you can actually do migrations, for example from MySQL to PostgreSQL, in a single command line. It's a one-command migration: you say the data used to be in MySQL but now you got smart about it and want to use PostgreSQL instead, and pgloader will discover the schema, reproduce all the tables, do the data type casting for you, fill the tables with the data you have in MySQL, and then recreate all the indexes in parallel. The goal of the project is that if you're not using PostgreSQL, you don't have any excuse anymore: it's just one command line. Sorry, SQL Server, MySQL, SQLite and some others.

I told you I wrote a book, and I have a special deal for the conference: I brought some paperback copies downstairs. If you go to the website of the same name, theartofpostgresql.com, you will see there are four different packages, and today if you buy the full edition you get upgraded to the dead tree edition for free, because I have the dead trees with me. I only have about twelve of them, so if you're interested, let me know just after the talk.

Let's get into data modeling. This quote is from "Notes on Programming in C"; it's written for developers, and even though it's decades old I think it is still relevant today and still applies to our field, databases. Because it's worth reading in full, I'll read it: "If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming." When you do database modeling, all you do is data structures, really. The idea is that if you're not careful about the data structures in your database, everything in your life is going to be complex; every little thing is going to be hard, and then developers come to us database people and say SQL is so bad, there is nothing they can do, it's always complex. The design is crucial to having a powerful SQL story to tell.

The first things we think about in terms of modeling are of course the data types. PostgreSQL comes with a very rich suite of data types that embed not only the representation of the values but also their behavior. If you do integer plus integer the answer is an integer, but if you do date plus interval, PostgreSQL knows how to apply, say, an interval of one month to a particular date: the month might be 28, 29, 30 or 31 days long, and PostgreSQL knows about that. All the machinery around operators and types in PostgreSQL is dynamic; it's not hard-coded.
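As a quick illustration of that kind of type-aware arithmetic (my own example, not a slide from the talk):

    -- month arithmetic respects the real length of each month
    SELECT date '2019-01-31' + interval '1 month';   -- 2019-02-28 00:00:00
    SELECT date '2019-03-31' + interval '1 month';   -- 2019-04-30 00:00:00
    SELECT date '2019-02-01' + interval '1 month';   -- 2019-03-01 00:00:00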
So please consider using the proper data type for what you're doing; it will make your life so much simpler. We're used to constraints and primary keys, and we'll get a little deeper into primary keys: I think it might be a surprise for some of you, even if you think you know what the deal is all about, and I wanted to show some things that maybe some people don't know.

This first example is buggy; it doesn't work. Maybe you know why, maybe you don't, so just have a look, figure it out, and we'll get back to the example later. Just know that it's a bug, and you don't do that in production.

Something that most people don't know about PostgreSQL is that you can have an index with a WHERE clause; we call that a partial index, and a partial index can be unique too. The example here is feature toggles: you have new features that you're deploying to production in the code, but you don't want them active yet. The model lets you enable or disable a feature, it looks like a toggle in the UI, and you enforce unicity only while the feature is enabled. That's an easy trick with Postgres, but I've run into application developers who tried to do the same thing in the application, and it's very complex there because of concurrency; with a partial unique index it's solved for you very easily.
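A minimal sketch of that feature-toggle idea, with table and column names of my own choosing rather than the exact schema on the slide:

    CREATE TABLE feature_toggle
     (
       name    text not null,
       enabled boolean not null default false
     );

    -- a partial (WHERE clause) index that is also UNIQUE: any number of
    -- disabled rows per feature is fine, only one enabled row at a time
    CREATE UNIQUE INDEX feature_toggle_unique_when_enabled
        ON feature_toggle (name)
     WHERE enabled;

Trying to enable the same feature twice now fails with a unique violation, whatever the concurrency in the application.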
Another thing: we say "constraint", and most people, in particular here where I've been told the US is the land of the free, don't want constraints. You want to be free, you want to do whatever you want, so when you talk about constraints it sounds boring: "I don't want to implement constraints, free data!" Another way to say it is that constraints are business guarantees. You're asking PostgreSQL to guarantee things that stay true across your whole data set, whatever the transactional velocity, whatever you do with the application.

This one, for example: who knows how to read EXCLUDE USING gist and so on? Does it ring a bell for anybody here? Some of you, but not most, so it's not very well known. It's an exclusion constraint. Basically it's like a primary key, except that what you express is what you refuse: you exclude rows if they have the same currency, because this part reads "equal", but an overlapping validity, because this part reads "overlaps". So you refuse rows that have the same currency and an overlapping validity period. Good luck implementing that in the application. It's easy enough, unless you have two inserts at the same time that don't respect the rule: the application only sees one of them at a time, but when two happen at the same time, PostgreSQL will happily pick one and refuse the other. So you can have pretty complex guarantees implemented as database constraints, with an exclusion constraint.
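Here is a sketch of such an exclusion constraint, assuming a currency-rates table along the lines of what the slide describes (the names are mine):

    -- btree_gist lets us mix = and && in the same GiST-backed constraint
    CREATE EXTENSION IF NOT EXISTS btree_gist;

    CREATE TABLE rates
     (
       currency text,
       validity daterange,
       rate     numeric,

       EXCLUDE USING gist (currency WITH =, validity WITH &&)
     );

Two rows for the same currency are accepted only if their validity ranges do not overlap; a conflicting concurrent insert gets an error from PostgreSQL itself.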
Why do we implement constraints, or guarantees? The goal is protection against anomalies. The whole discussion around normalization and normal forms, and we'll get to normal forms in a second, revolves around avoiding anomalies.

So what is a database anomaly? You have three of them, basically. First, the update anomaly: this schema is wrong. You have the employee id, their address, and some skills you can assign to the employee, but you can see that for each skill the address is repeated. So after an update, look, this person is living at two different places at the same time. That doesn't look good; it's called an update anomaly, and it's caused by the schema. If the schema is right, it cannot happen; this schema is wrong, so it may happen, and if it may happen it will. Murphy's law, everybody knows that. Second, the insertion anomaly: here it's teachers in a faculty, and they have courses, but the only way to enter a teacher in the system is when they already have a course assigned. What if you recruit someone who doesn't have a course assigned yet, because you hired them over the summer and courses only start in September? That's an insertion anomaly, and again the schema is not right; it's obvious once you see it. And then there is the deletion anomaly, of course: insert, update, delete, those are the anomalies. Here, if the person is removed from the course, you lose every piece of information you have about them; maybe you're supposed to keep the record and only unassign the course. That's very basic, but that's the problem we're going to solve with normalization.

Before we get to normalization proper, another quote from the 70s; all of this is based on work done in the 70s. I'll read it aloud. It's from Fred Brooks: "Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowcharts; they'll be obvious." What it doesn't say is that this only applies if you did a proper job of normalizing your data set. Most of the time as a consultant, reading through a customer's schema, I would think: what is your business even about? I don't understand anything. But if the schema is done right, you look at it and everything is obvious. So if the business you're implementing is not obvious from looking at the schema, maybe the schema is wrong, maybe the business is very complex, maybe the design is not so good.

That leads to a question I get often: what kind of tooling would you use to model a database? In my case I use psql, the command-line client for Postgres, with the following trick. You type BEGIN, then you create your schema, your tables, whatever you want. The trick is that PostgreSQL is, I think, unique in the market in having transactional DDL. If you use Oracle, for example, not to name names but too late, and you do CREATE TABLE or DROP INDEX or something, it does what I call an implicit commit; then you say "oh, by the way, ROLLBACK", and it answers that there is no transaction in progress, which is the scariest error message you can read in a log. Review your logs, and if you see that, that's scary; I've seen it a lot at customers.

So the trick is: you BEGIN, you do whatever you want, and then you ROLLBACK. What I usually do in between is try to implement maybe two, five, ten different use cases for the business: queries everybody is going to want to run, maybe a financial report, maybe an activity report, maybe just the basic inserts and updates, something like half a dozen to a dozen cases. If writing those SQL queries feels right, then you COMMIT rather than ROLLBACK. Of course, the first time you do it you're going to miss things, the queries feel hard to write and it's not working very well, but there's a ROLLBACK at the end so you don't care, and you can repeat it as many times as you need. You can be wrong as many times as you want; PostgreSQL will not blame you, you just roll back and start again as if the schema had never existed. There was a comment that you can implement test cases in that script, and that's spot on: usually when I do this it's at the design stage, but if it's a rollout for something already in production, yes, put test cases in the script too. So you keep doing rollbacks, you add new tables, new constraints, and when you're happy about it at the end, you commit. That's my trick, that's my tooling for modeling.
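A minimal example of what such a psql session can look like, with invented tables rather than the ones from the talk:

    BEGIN;

    CREATE TABLE customer
     (
       id    bigint generated always as identity primary key,
       name  text not null,
       email text not null unique
     );

    CREATE TABLE invoice
     (
       id        bigint generated always as identity primary key,
       customer  bigint not null references customer(id),
       issued_at date not null default current_date,
       total     numeric(12,2) not null check (total >= 0)
     );

    -- play the application's role: write the everyday queries and see how they feel
    INSERT INTO customer (name, email) VALUES ('Ada', 'ada@example.com');

    SELECT c.name, count(i.id) AS invoices, coalesce(sum(i.total), 0) AS billed
      FROM customer c LEFT JOIN invoice i ON i.customer = c.id
     GROUP BY c.name;

    -- not happy yet? nothing was committed, so being wrong is free
    ROLLBACK;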
I think I'm going to be a little time-constrained, so I'm not sure I'll have time to explain everything behind this next point; feel free to come see me after the talk. Basically, people who use ORMs sometimes use them to design the schema too: you make an object in your programming language, Python, Java, whatever, and then you tell it that this object should be a table. That's wrong; never do that. In object-relational mapping, the R is for relation, and a relation in SQL is the result of a query. One of those queries is TABLE: if you do TABLE foo, the result is the content of the foo table. That's nice, not very interesting, but it exists. Any query you run defines a new relation, so ORMs should map onto the results of queries rather than onto the base tables; when they map onto the base tables, everything becomes complex, and then people tell you ORMs are such a hard problem to solve. Don't solve it. Anyway, this talk is not meant to help you work on your database schema if you're doing it through an ORM; sorry about that.

So we saw database constraints, we saw the database anomalies we want to avoid, and we said we want a proper design. What is a proper design? Is it possible to express a quick set of rules such that, if you follow them, you know your design is good? I'm glad you asked, because that's what they did in the 70s. The 70s gave us the Unix inventions and also the database inventions, so you get some of the same principles and philosophy; it was the same world at the time. In modern terms you would say "don't repeat yourself", the DRY principle, and applied to databases it looks like database normalization. Who knows about the normal forms? Who has actually applied them, exactly, in a database schema? Let's get to that.

The first normal form looks easy, but I think a few of you are going to be surprised in a couple of slides, because most of the schemas I've seen don't follow it. First normal form says: there are no duplicated rows in the table, and that's the tricky one; each cell is single-valued, which is easy to do, yet many people put CSV in a text field, so don't do that, obvious as it sounds; and all entries in a column are of the same kind, which PostgreSQL gives you for free, you declare a data type and that's it, except of course if you're doing EAV, entity-attribute-value, where a text column ends up containing integers and dates and something else. Don't do that either; it's not first normal form.

The second normal form: every rule builds on top of the previous one, so to comply with the second normal form you first need to comply with the first. On top of that, you must have no partial dependencies, meaning no non-key attribute depends on only part of a composite key. Basically it means your primary key is broken: it's easy to have something that looks like a primary key but actually carries extra data, and if you do that you get update anomalies, insertion anomalies and so on, like we saw before. The rule is just another way of saying: if you don't respect it, it leads to anomalies.

There is more: the third normal form and the Boyce-Codd normal form. Usually when I design a schema, I try to reach BCNF first and then think about whether to go further. I won't take the time to go into the details; it's well covered in the literature. The goal of my talk today is that you remember these things exist and know where to look them up: you have the terms and you know what to do with them. It's easy to find lots of examples, and most of them are controversial anyway, so good luck with that, but that's another talk. BCNF is not the end of it either: there is the fourth normal form, no multivalued dependencies; then the fifth, project-join normal form; and then DKNF, domain-key normal form, at the top.
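Going back to the first normal form point about multi-valued cells, here is a small before-and-after of my own, not a slide from the talk:

    -- not in first normal form: several values hiding in one text cell
    CREATE TABLE employee_skills_csv
     (
       name   text primary key,
       skills text          -- 'sql,python,go' ... please don't
     );

    -- first normal form: one value per cell, one row per fact
    CREATE TABLE employee_skill
     (
       name  text not null,
       skill text not null,
       primary key (name, skill)
     );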
Now the real question is: which normal form do you target? I have an example for that, and I hope it works for you too. Say you're designing a database that needs the physical address of customers. If you think about it, city names repeat: every state in the US has cities with the same names, every city has streets with the same names as other cities, and in every city the streets carry numbers, always the same numbers. If you want to do it right, with no duplicates, a street name exists only once, and then a mapping table says which streets exist in which cities, and for each mapping maybe the street numbers for every building; each time a building is renovated or split in two, you need to update your tables. Then you have zip codes, which at least in France are a postal service construct: they're organized around the number of people in an area, like a barycenter, and when the population moves, the zip code boundaries move. They're meant to optimize mail distribution, that's all they are, but people keep using them for other things, and they change over time, so it gets complex. If you want a properly normalized design for addresses, you'll easily need a dozen tables.

Would you do that? If all you're going to do with the address is print it on a label on an envelope, a text field is all you need. Maybe you're going to run some statistics, the marketing department wants a distribution per state, per district: OK, so maybe you want a couple of fields. But say you're a cable company or an electricity company, and you use the address to schedule interventions at people's houses, to repair or install physical cables, electrical or data. Then maybe you want every detail you can possibly have about the address, and maybe it doesn't stop at the building number: maybe you want each floor of the building and each apartment, with metadata about them, so that when the technician shows up he has everything needed to actually reach whatever he needs to reach. So normalization depends: is this the core of your business, or just some information you gather to put on an envelope and be done with it? For me, normalization is very good when it's the core of your business, because it prevents data bugs and anomalies in production; but maybe you don't need it for everything you put in a database. That's where you need to know what you're doing, and that's where, as database people, we need to understand the business we're supporting. Any questions so far? OK, I'll continue; just interrupt me if you have more questions.

When you do this design, one of the tools you're going to use is constraints, or guarantees. Let's get back to this bug. Does anybody see it? In PostgreSQL, bigserial actually creates a sequence for you and attaches it to the table; it's the same as if you had typed CREATE SEQUENCE yourself. What we have in this schema is a surrogate key: it looks like a primary key, but it's not; it's a substitute for a natural key. If you look at it, whether the article has an id or not doesn't change the nature of the article; nobody cares about the id of an article. It's a property you add to the schema to make rows easy to track, but it's not part of the definition of an article, so it's not a natural key. What that means is that you can perfectly well insert two rows into that table with the same publication date, the same category and the same title, and yet you're claiming you have a primary key; the rows just don't have the same id. For all practical purposes, you have duplicate information in the database. Would you agree that this is a bug?
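Reconstructed from the description in the talk (the column names are my guess, not the literal slide), the buggy schema and the duplicates it happily accepts look like this:

    CREATE TABLE article
     (
       id       bigserial primary key,
       category integer not null,
       pubdate  timestamptz not null,
       title    text not null
     );

    INSERT INTO article (category, pubdate, title)
         VALUES (1, '2019-09-19', 'PostgresOpen 2019'),
                (1, '2019-09-19', 'PostgresOpen 2019');
    -- both rows are accepted: different id, same everything else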
So how do you fix it? The table is not compliant with the first normal form, because you can have duplicates, and the way to make duplicates impossible is to use a natural primary key. The natural primary key here might be category, publication date and title. Then you can have a discussion with the journalists: are you allowed to reuse a title from the past in a new publication? Because history is a cycle, I guess yes, so that's how I modeled it, but maybe they don't want that; you can have a chat about it and understand the business better. Now that we have this primary key, we're actually preventing duplicates, so that's better.

But what's next? If it's an online journal with comments, for example, then the comments table needs to refer to the article being commented on, and that foreign key reference needs all three columns, otherwise there is no way to reference a single article, because our primary key is a composite key over three columns. Now every table that references articles carries three extra columns, and if you have billions of articles, maybe it's not so good to spread all that extra data around your schema. Can you fix that? Here's how: another way to spell bigserial and its sequence is GENERATED ALWAYS AS IDENTITY. You do that, you declare the natural primary key, and by the way you also declare the id as UNIQUE, so you can use it in foreign keys: you can perfectly well have a foreign key that targets the surrogate key, the id generated from a sequence, because it's not null and unique. So you can still use the surrogate key, which is very practical for references, and protect your data with a natural primary key, and then we've solved the bug.
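Again as my own reconstruction rather than the literal slide, the fixed schema could look like this:

    DROP TABLE IF EXISTS article;   -- replacing the buggy definition above

    CREATE TABLE article
     (
       id       bigint generated always as identity,
       category integer not null,
       pubdate  timestamptz not null,
       title    text not null,

       primary key (category, pubdate, title),   -- the natural key prevents duplicates
       unique (id)                               -- the surrogate key stays referenceable
     );

    CREATE TABLE comments
     (
       id      bigint generated always as identity primary key,
       article bigint not null references article(id),
       body    text not null
     );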
Questions so far? Yes: the question is about adding the id to the primary key. But then, if you have two rows with exactly the same content except for the id, the system is happy with them, and that is a duplicate, so you're back to square one. Other questions? I don't see any, OK.

So that was the primary key; everybody thinks they know how it works, and we see it's not that easy. We have many other constraints in PostgreSQL: foreign keys, not null, check constraints, domains, exclusion constraints, and even constraint triggers if you need to go that way. Again, the goal of this talk is not to dive into every one of them; it's more to remind you that they exist, and for a good reason: they are what you use to normalize your schema and to guarantee that things look exactly the way the business says they should. And do remember that trick with the date range for the validity; it's pretty useful.
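As a tiny hedged example of two items from that list, here is what a domain and a check constraint can look like (my own example, not from the slides):

    CREATE DOMAIN positive_amount AS numeric(12,2) CHECK (VALUE > 0);

    CREATE TABLE payment
     (
       id     bigint generated always as identity primary key,
       amount positive_amount not null,
       status text not null CHECK (status IN ('pending', 'paid', 'refunded'))
     );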
So that was data modeling and normalization, and the title also said we'd talk about denormalization. The first thing to say is: don't do it. I've seen so many schemas that weren't really normalized, something odd was happening, and the answer was "well, we needed to optimize, so we did a little denormalization". Usually when people say that, it means they have no idea what a normal form is or how to apply one to a database schema, but they're happy to tell you they're smart enough to optimize anyway. And sometimes, surprise, you have bugs in production because of that. What you should do is first normalize your schema up to a level you're comfortable with in your context, then try things, and if you have a production problem, usually a performance problem caused by the normalization, then you can think about optimizing it away by denormalizing bits of it. If it has never run in production, you don't know that you have a problem; that's called premature optimization. Everybody knows the quote from Knuth, well, maybe not the whole thing. Everybody can recite "premature optimization is the root of all evil", but that's not really what he wrote, so let me read at least part of it: "Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered." It's exactly the same with a database schema: if you think you're going to be smart and optimize it a little, every single SQL query you write against that schema becomes more complex than it should be, slower, and harder to maintain. The quote continues: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%." That's a little different from what people usually make of it. So the real trick is: 97% of the time, just don't do it; it's not in production yet, there's no data in the tables, you don't have a problem, so just normalize. What about the 3%, and how do you recognize it? That's called experience. You've been wrong before, so you know the smell: I did this thing before, now I know I need to act before it gets bad in production. If it has never been a problem for you in production, you don't have that feeling for the 3%. You need to be wrong sometimes, and then you can fix it.

So how do you fix it once you've been wrong and you know it's a problem? The first trick I want to show you is a cache. It's the same in every context: when it's too slow, it's because you compute the same thing again and again, so rather than computing it every time someone asks, you can implement a cache. For example, a dashboard; I've seen this in production at customers. When people log in, the dashboard runs maybe six hundred queries, real queries against months and years of data, to draw nice-looking graphs on the web. The dashboard gets slower and slower because there's more and more data to crunch, and in the end nobody actually uses it: they wait for it to load and immediately click somewhere else, because it's so slow nobody wants to look at it. One thing you can do is pre-compute the dashboard and maybe update it nightly; if the dashboard covers a year of data, a nightly update probably cuts it. But as soon as you implement a cache, remember the hardest problems in computing: naming things, cache invalidation, and off-by-one errors. Cache invalidation is something you'll have to handle, so let's see what that means in terms of PostgreSQL.

Here is a query; the example is Formula One races, people driving cars as fast as possible, maybe winning, collecting points, so you can build a nice dashboard with cumulative points across the season. You take all the races from the current season, and by the way this is a psql script, so you can set a variable to a value and use it later in the query, right in psql; that's pretty useful if you didn't know about it. With that query you get the dashboard for the drivers, and there is a GROUPING SETS over the driver surname and the constructor name, so you get both the points per driver and the points per constructor in the same query. Did you know you could do that? Have a look, it's pretty awesome. I also limit to the top sums of points, because otherwise the result wouldn't fit on the slide; maybe you have other business rules, but my business rule for slides is simple: if it fits on the slide, it's good.

So either you run that query every time someone logs in, or you denormalize it into a cache. What I do for caching is create a view first, because as you know, with PostgreSQL a plain view still computes the data every time you use it; then I materialize it, so that the materialization command is always SELECT * FROM the view. If you do it that way, then when you wonder what's new that isn't in your cache, you take SELECT * FROM the view EXCEPT SELECT * FROM the cache, and you have the diff: here is what happened since the last time you updated the cache. Pretty easy. CREATE MATERIALIZED VIEW means it's a view with the data attached to it, and the data is the data set that existed when you ran the command, frozen in time. You can even index it, because there's a physical data set attached to the view. That's your cache, and maybe every night you refresh it; for a Formula One dashboard, at the end of each race you need to update the cache, and in Postgres that means running REFRESH MATERIALIZED VIEW, that's it. So after each race you run that command, and you have a dashboard backed by a cached result that is very fast, because you don't need to run through the whole history of the season every time; you just read whatever is current.
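Putting the caching pattern together as a sketch; the results table, its columns and the 2019 filter are assumptions of mine, not the actual Formula One schema from the book:

    CREATE TABLE results
     (
       season      integer,
       driver      text,
       constructor text,
       points      integer
     );

    -- the query we want to cache, with a grouping set giving both
    -- points per driver and points per constructor in one go
    CREATE VIEW v_season_points AS
      SELECT driver, constructor, sum(points) AS points
        FROM results
       WHERE season = 2019
    GROUP BY GROUPING SETS ((driver), (constructor));

    -- the cache: always materialized as "select * from the view"
    CREATE MATERIALIZED VIEW season_points_cache AS
      SELECT * FROM v_season_points;

    CREATE INDEX ON season_points_cache (driver);

    -- what changed since the last refresh? the diff is view minus cache
    SELECT * FROM v_season_points
    EXCEPT
    SELECT * FROM season_points_cache;

    -- after each race, refresh the cache
    REFRESH MATERIALIZED VIEW season_points_cache;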
that that you might want to implement where it's basically impossible to be normalized is audit trails because when you change the schema of the application and it will happen the only way that your schema is not changing is because nobody is using your application it's dead okay if it's alive it's going to change because life is changing always and it's going to change in ways that you didn't you know think about that at first and it's it's normal it's it's life okay so your tables are going to have new definitions over time which means that if you have an audit trail of everything that happened well either you have the audit trail unmodified or you have a difference of schema in between the audit trail and the prediction because the schemas are going to change so the technique I've been using is storing that in a JSON be it used to be H store but they Santino it is easier it's integrated so you can create an archive schema with a table or even create a ennum table and even type with insert ibly delete because that's basically what's going to happen to your data right and you can have all their versions of any row like from this table at this date we did this action and that that's the data attached to the action so if it's delete that's the data that was just before the delete if it's an update before the update or things like that you can even store before and after you update depends on your needs for an audit trail and when you do that because it's Jason B you can change your schema on prediction without having to impact the archive right so if a column it is dropped from a table you still have the values of the column in your kite that's pretty useful questions about that trick yes so that so yeah good question so the data are here you put its Jason B so the system doesn't know what it is really that's what it's schema left Jason B is the schema s so it doesn't know what it is so you need to define it it's cool because it's very flexible you can do whatever you want to but also it's not specified by the schema it's specified in your code instead yes yes it's very easy to write a trigger it's not included in the slides but it would fit in a slide yeah with the included functions in passwords one trick is let's say you have a table named foo you can select proof for who is also the data type of the table so you can select foo from who and the result is a row type okay and then you can select foo cast to Jason B and that's it you have a Jason B that is the same as the table content that's as easy as that another denormalization trick that I like is validity periods we had a glimpse of it before so it's the thing where you exclude using gist's with equality and overlapping which means that for a given currently you can only have one right that is what that is that exists in a given validity date range so for example here I've just finished an example and in tequila question here for example the the right is your the currency is euro and that that's the date we are interested in to for the value so maybe it's the current value that you want so that would be current einkorn date and maybe you're interested into something in the past because you're reviewing last year financial auditing situation something like that and so you can have the rate of the you currency at the time of the invoicing for example and so you can update your currencies and still have the woolie story of what was the rate at the time of the invoice and then you can use that this operator is you is pronounced contained 
Another denormalization trick that I like is validity periods; we had a glimpse of it before. It's the one with EXCLUDE USING gist, equality on the currency and overlap on the validity, which means that for a given currency only one rate can exist for a given validity date range. Here, for example, the currency is euro, and then there is the date you're interested in for the value: maybe it's the current value you want, so that's the current date, or maybe you're looking at something in the past because you're reviewing last year's financials, an audit, something like that, and you want the rate of the currency at the time of invoicing. So you can keep updating your currencies and still have the whole history of what the rate was at the time of each invoice. The operator is @>, pronounced "contains": validity contains the date, and you can even use it in a join. Someone asked whether it's built in: I can't remember exactly which release added range types, but you should already have them, and it's all covered in the docs. Have a look at the PostgreSQL documentation, it's very good; there are a couple of tricks, like needing CREATE EXTENSION btree_gist for that kind of exclusion constraint, but it's covered there.

I'm going to be really fast now because we're about to run out of time. Other denormalization tricks you can use: composite types, even though under the first normal form you're not supposed to have composite values in your database, sometimes they make your life easier; or arrays, or jsonb. All of those are helpers for denormalizing a schema. If you don't know what they are, the slides will be published, so look at the slide and then at the documentation; everything is covered in the PostgreSQL documentation except HLL, HyperLogLog, which is an external extension with its own site, easy to find.

Very quickly, another case of denormalization that people like is partitioning, and I'd like to remind you that it is a kind of denormalization. Up to and including PostgreSQL 10 the support wasn't really there yet; PostgreSQL 11 makes it very nice, and Postgres 12 is going to improve it a lot, so have a look at the "what's new in PG 12" posts about it. You can actually use it nowadays, but remember it's a kind of denormalization: it's not always easy to have foreign keys pointing into a partitioning tree, things like that, so you don't get the same kind of constraint guarantees you would have without it.

With PostgreSQL you can also do "not only SQL"; that's schemaless with Postgres, did you know you could do that with JSON? For those of you who were not in the JSONB talk earlier today: the contains operator that works for validity date ranges also works for looking inside a jsonb document. You write a template and you say, I want all the cars matching this template. That's pretty powerful; you can do things with it.
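Two quick sketches of that same containment idea; the rates and invoice tables are the ones from the earlier sketches, and the cars table here is invented:

    -- which rate was valid on the day the invoice was issued?
    SELECT rate
      FROM rates
     WHERE currency = 'USD'
       AND validity @> date '2019-07-01';

    -- the same operator in a join, converting each invoice at its own rate
    SELECT i.id, i.total * r.rate AS total_usd
      FROM invoice i
      JOIN rates r ON r.currency = 'USD'
                  AND r.validity @> i.issued_at;

    -- and the jsonb flavour: rows whose document contains this "template"
    CREATE TABLE cars (name text, specs jsonb);

    SELECT name
      FROM cars
     WHERE specs @> '{"fuel": "electric", "seats": 4}';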
These are quick examples, because denormalization is mostly about the schema, but sometimes people denormalize for performance reasons, and one of the main performance costs in PostgreSQL that might be a problem for you is durability: it takes resources to implement durability, but you can turn it off. What people usually don't realize is that you can set synchronous_commit to different values in each transaction within the same PostgreSQL system. In some other systems you're either synchronous or you're not; in PostgreSQL, every transaction has that property, so in the same system at the same time you can have things that are of high value to you set to remote_apply, meaning the commit needs to be replicated and visible on the standby, and other things set to off, meaning if we crash we lose it, no problem, maybe it's an HTTP session or something like that. All of that coexists in the same database at the same time, and you can use ALTER USER to attach such profiles to different connection strings, so it's easy to manage from the application. You can even decide it in a trigger: if the amount is more than a thousand, then yes, I want to wait for the standby, otherwise no. Not too bad.

I'm working at Citus, we do horizontal scaling, and I don't have time to explain it, but on this graph the flat red line is the query time staying constant while the size of the data set grows; that's what horizontal scaling means. If you're interested, all of this comes from the book, so have a look: theartofpostgresql.com.

Do we have time for questions? OK, so the slide is wrong, sorry, one question. The question is: what if you want to denormalize because you don't want a join in the query? That's premature optimization: at this point you have no idea whether the join is going to be a problem or not, so just live with the join. Joins are not a problem with PostgreSQL; there are many different ways to implement joins in PostgreSQL's optimization layers, so Postgres is usually pretty good with normalized schemas. There are many things you can do to optimize queries before denormalizing, so I would advise going through those other steps first. Denormalization is pretty advanced and pretty final; it's very hard to get back to a normalized schema after you've denormalized, so I only do it when there's nothing else left to try. That would be my advice. Thank you very much. [Applause]
Info
Channel: Postgres Open
Views: 1,492
Rating: 4.891892 out of 5
Id: 7Ea14-RGehE
Length: 46min 45sec (2805 seconds)
Published: Thu Sep 19 2019