PostgresOpen 2019 The Art Of PostgreSQL

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hi everybody so time to start apparently so let's go so we're going to talk about the art of PostgreSQL the idea of this presentation is that it's mostly or oriented for application developers so who are you guys actually writing code for an application and using PostgreSQL yeah some of you well the other don't be too sad about it you will have some good content too so I've been a contributor for PostgreSQL for a long time now like last century working at situs data and we got acquired by Microsoft so nowadays it's a easier database for PostgreSQL hyperscale sytrus something like that one of the projects I'm working on currently is named PG Auto failover and the idea is just exactly as the name says in PostgreSQL we tend to like boring names right so you read the name and you know what it is about usually and so it's business continuity like it's automating the failover so that that's all about it so it's on github it's a completely open source you can open issues you can send bug fixes if you want to even new features if you fancy that so go have a look it's a it's a project we did to simplify setting up ha4 postgresql another project I've been working on a lot the past years is a migration project for going from something else to PostgreSQL the idea is that you don't have an excuse anymore to be using for example my sequel just use PostgreSQL instead it's much better but it's not always easy to implement that and so with PG loader in a single command line you you give it the source connection string and the target connection string the target should be PostgreSQL and then it's going to read all the catalogs from the source database decide what our table the columns the attributes the types do the type mapping for you and then load the data and then create the indexes in parallel that's right stirrer so it's one common line and then your database is running on progress cool now so no excuse just do it it reports my sequel sequel light sequel server and some other input kinds another project I've been working on is this book the art of PostgreSQL we have some copies left maybe the last one or something added at the booth so if you want if you're interested so show up there later and the slides that we're going to go through are mostly extracted from the book okay so it's kind of the same content so let's get started now so the the first thing is that for me is important as an application developer is why are you using PostgreSQL so often when you ask that question I used to be a consultant before and when you get around this question lots of them are they don't really know why it's just that you know it's in the stack it's been deployed already they joined the project and there you have to use it some other they're like oh I know why it's because it's solving this problem that is quite hard to solve for your application and we are using PostgreSQL to do that but often enough I heard that PostgreSQL is used to solve storage which is surprisingly surprising as an answer because it's so wrong it's like it's not true like storage in the sixties it's it was easy because at the time with the computer we had if you would unplug it from the from the power circuit then the anything that was in memory would stay exactly the way it was and you could reply get like a couple weeks later and it would be exactly like it was to two weeks before and in the 70s we switch to other technology where it was not true anymore but being able to serialize something that you had in memory to disk has never been such a problem in computing science it's easy to do everybody knows that you do it you don't need PostgreSQL to do that if you're a Java shop you just serialize your object in XML and read them back and that's it so if storage is the problem you you use for example on the cloud blob storage from Azure or maybe s3 for me from AWS or something else so that's storage so PostgreSQL is not about storage PostgreSQL is about concurrency and isolation the idea is that what happens when you have more than one person trying to do the same thing like to update concurrently and the image is obviously the difference between theory and practice so in theory is the same but not in practice so the the main thing around concurrency in iteration within the context of database like relational database management systems are DBMS is that we say that we provide acid guarantees so I guess everybody knows what as it is it looks a little boring yeah but that's let's let's see a little more about that atomic basically it means that if you have many things to do in the same transaction and something goes wrong you can roll back and if you did two inserts and one update and then you roll back and everything is consoled you don't have like one of the inserts went through but not the other one and though no your database doesn't make sense anymore so that that's pretty cool but usually we don't type in rollback well sometimes we do when we are testing interactively but in the application have you ever implemented a transaction that will do a rollback in your application maybe not what happens is maybe you did a file system is full I know it's 2019 but we still have that program in production sometimes so system is full what's next well with an atomic system the transaction is rolled back and never happened that's it so you're safe we're prosperous world is something that almost no other system is able to do is it supports transactions 4dd else so if you have a application script to be great or move from one version to the next version of the schema you add some columns a new table maybe a new index something like that well then what happens if file system is full in the middle of the script if you're not using PostgreSQL then you add version 1 in production the script was to go from version one to version two and it failed in the middle so now you have a version that nobody ever so know where no developer ever so that version that is now in production if you don't have transactions for the deals with PostgreSQL file system is full yeah okay rollback you still on version one don't deploy the app yet that's it simple done so that's one of the reason why you would use PostgreSQL of course the C of a CD it means consistency consistent means there are some business rules that you know about and that you can share with PostgreSQL you can explain to PostgreSQL here is what's important for me to keep in mind for the rule that I set that we are going to manage and you can have caused gradual implement as guarantees for you so the first step for the consistency is the schema and the data types so here it's a very simple table with two columns but anything that goes in to those columns here for example the ID is an integer so I like sometimes to say bad things about other products if you use my sequel and you have an integer columns and you insert into it the string banana then it will happily take it and it if you select from it it's going to say zero but no errors whatsoever it's happy to work with that so maybe it's not consistent I don't know with PostgreSQL we don't do that so if you try to insert to insert banana into integer column PostgreSQL will tell you I don't know what that is what doesn't fit your model please be careful and then we have constraints like check not nil foreign key primary key relations and we'll get back to that relations are the central concept of sequel basically and some people think it's because we have foreign keys but it's it's not true a relation is just a mathematical concept where you have a set of elements that all share the same properties it's called attribute domains in the in the relational jargon but basically it means that it all looks same like the same here it's an integer on the text anything that is in the table foo is going to have an integer on the text that's it all of them are the same that's that's what is a relation so consistency is pretty important then the I for a seed is oscillation it's the other side of atomicity it's a little more complex to understand sometimes but isolation means that while you are doing your queries are you allowed to see what what is happening concurrently in the rest of the system so if you want to take a consistent backup for example you need to make it so that even if PG dump is going to run for several hours because you have terabytes of data it needs to be a consistent snapshot of the production right so if during the backup someone else is doing insert and update and something else you don't want to do those to be in the backup because you want something consistent you want a snapshot that doesn't move you don't want to see everything that's new so PT dump will typically use an isolation mode where you don't see the changes from the other transactions you can also do that in your application and maybe it could be the default repeatable read or even sir as a serializable it's different but repeatable rate might be what you expect from the database but it's not the default the default is read committed so maybe you want to look into that but anyway every transaction in Postgres can have a different isolation level PT dump will be serializable while the rest of the system is repeatable read or read committed depending by default again read committed so that's the ezel Asian so you see that that that's very important that's that's very hard to implement at the application level and so maybe that's why you're using PostgreSQL actually and then of course it's durable so do you know the little test to do with the power socket plug so basically you write a little client application that will only do inserts for example and you count how many times you got the commit message back from Buzz grass when / that when you say commit maybe the answer is going to be roll back okay because there was a problem cause Louisville was not in a position where it could actually implement the commit like fight system is full is the easiest example to have in mind so you say commit maybe it's rollback so you count how many times when you said commit it was committed actually and then while the test is running you unplug the poor circuit from the server in the middle of the test and then you plug again and you count what you have on the server and what you have on the client it's not the same there is a bug somewhere it's not durable durability means that anything that has been known to be committed by the client should be still there when you do that if it's not maybe the hardware is faulty maybe they are the BIOS configuration or maybe the kernel OS configuration is wrong maybe you did EV sync off in PostgreSQL or maybe you're not using PostgreSQL and then yeah don't do that so that's like that's the the basics around why would you use PostgreSQL so to recap because you have transactions and transactions is a short way to say you are compliant with a seed but be careful because some some systems are naming themselves databases nowadays there's no sequel systems in particular where as a developer if you think about them as the devices you might be in trouble because they are not as acid compliant all of the no sequel systems that you will find are going to implement some trade-offs the only that is obvious is that they they are not implementing sequel it's no sequel okay but also they don't implement acid usually take MongoDB for example it's a scheme Alice that's a feature it means do you not have consistency so you lose the C of a seed it doesn't have transactions so you don't have the a nor the eye of a seed no atomicity no isolation remains the D of AC durability it used to not implement that apparently they fix it nowadays but for a long time you wouldn't have the G of a seed so maybe it's fine to you anyway in your application because it fits your use case but as a developer if you think of the data of a database or something that is a seed compliant because that's all we are taught about databases usually and the system you use is actually not a set compliant it means that all of those guarantees that you don't have either you don't need them let's go or if you need them you need to implement them yourself so that for me the main kicker of using PostgreSQL is that you get every everything for free and it just works and it's reliable and you can just you know care about the application and other good reasons to use it are written there and we're going to see about them so it's we're going to see about why I say it's object-oriented we have extension in PostgreSQL we're going to see a couple examples rich data types that you can do actually data processing in sequel and we're going to see what I mean with that and etc any questions so far maybe not okay so who likes writing sequel and think cycle sequel is nice because we're going to see some of it the rest of the talk is basically sequel queries in the slides so let's go for it and so as the developer that's the main thing you do with was gross class right because you know it's acid so you don't have to care about that so what you need to care about is implementing your business in the application so you do sequel here is a very simple use case that I found because the data was open data at the time but it's not easy to download anymore the links exchanged and there is a subscription and I need to change the example later but I didn't do that yet so when you don't know the data you have a CSV file that looks like this with the year the date the number of shares exchanged and the stock exchange that day the number of trades that exams does share and the amount in dollars that represent and what you can see here is that I'm picking text as the datatype for those columns where maybe obviously it's numeric you know it's it's money it's but it's with a comma separate there is a comma to separate the thousands there is a dollar sign so if you just say to progress quell AIDS numbers it will be not happy with that then we use copy on the client side you know with so you know the backslash copy it's because it runs in a sequel and so the fire the source file is going to be required on the client side of the connection so you can actually use that over a connection to a server you don't need to upload the file to the server before you do that so that's why it's a back slash copy rather than and copy and copy is actually not just a command in Postgres it's also a protocol it's a streaming protocol so if you have a huge set of data it's going to be streamed so that's pretty fast pretty efficient and also if you write your application code in Python or drea or something as soon as you have more than a couple lines to insert into the database in the same table you should use copy okay it's it's way faster even for like the threshold I usually have in mind is like ten lines if you're uploading ten lines of data into the same table use copy and it's available from the it's a protocol that is implemented your driver already so just use copy from the driver but then we don't want to use text we want to use integers and numeric because it's it's money here you use numeric right you knew that right you don't use float for money otherwise you lose money so don't do it and then the thing that I want you to show you here is who knew that you can do in a single sequel command that is going to make a single table rewrite you can actually alter three different columns in the same statement and use text processing functions to transform the data and make it suitable for the new data type we knew you could do that yeah so another way to do it would have been for me to use copy from program and you could run a like a set or hook or python filter to filter the data out and make it look like what PostgreSQL is going to expect but if I were if I hadn't done that then I couldn't have shown you that you can do it in a single statement the three columns at the time altar table okay something that many developers have been when I was consultant or even though when I do meetups and things like that that many application developers they don't like sequel because it's black box and they're like yeah but we don't even know the algorithm that is going to be used right we write a query and what is going to happen in the server nobody knows so we prefer to write the code our own way so here is an example on top of the data we saw already it's the the list of the 10 days in the data set where you have the most dollars exchanged that day in the new york stock exchange okay so it's a top 10 it's easy right so you run through the data and at the end of it you sort the data and at the end of it you just I would put the 10 the top 10 lines when you have to write that in so the example is in Python here when you want to write that in Python maybe if you have like a couple of millions lines in the database maybe you don't want to have all that in memory at the same time because all you want at the end is the top 10 so the number of lines you want in memory at any time is 10 not to millions not to billion so to do that in Python you use each queue and there is an API like you divide the list of 10 and then you keep pushpop and it will if it's greater than what you have already it replaces the country and if not it's a it's not included that's it and at the end of it so this data structure is going to be worth of 10 lines for the world processing of your script so that's easy to control and it's nice okay how do you do that in in Postgres well order by limit 10 that's it maybe it's simpler I don't know I don't know what you think that if I had to write and maintain these or these and I think I would do the sequel version but then the question I have is how do you know that this guy here is going to actually be using that the same thinking we did before only ten lines in memory at all time rather than using all the memory for all the rows and if you have one billion rows it's a problem so how do you figure it out in Postgres well you use explain the my favorite version variant of explain is using analyze variables and buffers options because that's where you have the most information you need to be careful though because analyze with will actually run the query so if you have for example an update that is slow because of a foreign key that doesn't have an index or things like that it it happens and you want to explain the update you can you kind of explain an update maybe you want to begin explain analyze the update and then roll back so that they didn't happen when you you know work around it so explain analyze is running the query and what we what do we see here sort method top and each sort using this amount of memory I think we covered its PostgreSQL implements exactly the same algorithm as what we did manually in Python before so rather than writing all the code that is doing exactly that but client side which means you need to fetch two million lines before you do anything and then you solve them out maybe you can sort them out and send only ten lines to the client and it's way faster and way more efficient and it's only four lines of sequel yes question yes okay so I repeat the question for the the make some other systems like sequel server are going to give you ins about maybe where you should look at because the query is not as fast as you expected it to be PostgreSQL itself will not do that but we have several open source tooling around it like website called explain that depeche comm or a pev which is a visual explained tool that will do that for you so you will have a big red one of those lines will be in red splashy and that's where you should focus you have actually all the information you need if you know how to read it but it takes some practice so using something visual to get there is is a good use of your time other questions okay I'll continue that usually when you have this kind of data and you're working in an application in a team one of the first thing people are going to ask you either marketing department or maybe the finance department or maybe a contents I don't know some of your colleagues are going to be interested in to some kind of monthly reports so how do you write a monthly report in sequel it's pretty easy here I'm showing off several things like I'm using P sequel in that example and the set command is setting a value to a variable in PC cool and then I can use the variable value here so whatever the month is I can change that and just run the same query again and again and I can also pass the value of that at the comment line of the script at the P sequel command line rather than in the file itself so just so you know and then we're using two char to do some pretty formatting and then oh that's something you should pay attention to we don't use between for the date do you know why I could have said between start and end but I don't do that because between is going to be inclusive for both the start and the end which means that your monthly report if it here I say the last day is the first day plus one month and PostgreSQL we'll figure that out remember I told you it's Obrecht object-oriented plus if it's two integers the result is an integer or maybe a big int if it's a date and an interval then it's something completely different but it's still a plus that you're using it's like operator overloading in C++ except that it's very simple to use and it just works and so interval one month because it attached to a date then PostgreSQL will look up into the calendar and decide how many days that months well of course the example I've picked is a February so it will have to decide how many days you had that here in February but PostgreSQL can do that for you and then you if you use between and this kind of expression then you might count March the first twice once in your February report and once a new marks report so that's a bug and then you have the a contents running to you and trying to explain the data with you because they will never have the same result as you have and you're like I don't understand your problem and you're okay that's a bug ear so you use greater than or equal and less than strict and you don't have that bug so that's a small trick maybe you knew about it already but I've run through many developer many developers enough that I would you know take some time on it so here is the result without surprise okay it's some data we don't we don't know the numbers so we don't focus on them but there is a bug can you spot the bag it's hmm missing days exactly Congrats so we we have 28 days that year in February but only 19 only 19 rows output maybe marketing department or accounting I don't know won't be very happy with that maybe they want to have the days without activity so let's fix it it's easy you run the same query right and then you do a lookup in the calendar in your application here it's in Python but it could be anything else and if the day is in the if the day of the calendar as some data associated with it then you add to that data but otherwise you output zeros okay and here is the result fixed well is it really so where is that code going to be used actually so maybe it's only used in one place and okay it's fixed good enough but maybe you have an application with different kind of people using it like front-end people back office people maybe you have a little thing starting in the finance department that was you know a quick PHP PHP script where your back-end used to be in Java I know it's we writin in Jungle or think you know real life happens so so maybe it's not that easy to say to everybody I just use the calendar lookup because mostly what happens is that the calendar default implementation in Python is going to be surprisingly different from the one in Java maybe because of the way they owned all the default timezone and maybe you have some other bugs that I don't know it can get pretty complex but the thing that is common for all of those is they're they're talking to our beloved for SQL database so maybe you want to solve the days with no activity in sequel so we think they know how to implement that in sequel like you know just a gut feeling ok so let's see there is one thing in PostgreSQL that is very useful it's called the generate series and as I said before we like boring names in PostgreSQL so can you guess what generates arrays is going to do it's going to generate a series and it's going to begin with the first date and then it's going to go up to the next date but it's inclusive like between is so we say we go up to the to one month plus one month minutes one day so that we don't step over to the next month because that would be a problem and we do die one day at a time we step one day it could be one hour or it could be thirteen hours if that's your business requirement but here it's one day at a time okay that's easy so we generate a series of dates that's our calendar and then we do a left join for each day of the calendar we're going to look if we have data or not in the table that's easy that's a exactly the lookup that was implementing in Python before it's exactly the same thing but it's done with PostgreSQL on the database side so we generate some values here and we do a left join and here is the result fixed in sequel easy right and then the next question arise yes question yeah I use queries good question so the question is about how how come you have 0 0 where does that come from because you didn't exist in the data set so we invented zeroes we didn't have any data left join is going to provide you with new values and koalas is going to say to take the first argument of the function here that is not new yes so yeah so the question is about using koalas or is new or nvl or some other spellings of it I encourage you to look at Postgres documentation to have more details about it but your koalas is going to return the first argument that is not new and you can have as many as you want yeah then you use a case expression or if neil expression or something else it's going to be too long for this talk so let's have a chat about it later and and meanwhile have a look at the documentation it's pretty well covered it's something that in some communities when you tell people yeah go look at the documentation maybe it's a little rude to do that you know the RTFM situation yeah in our case we take pride in the PostgreSQL documentation that it's being very well maintained so if you do a patch for PostgreSQL that is technically very good like something awesome but not documented it's going to be rejected ok any patch that goes to for SQL needs to maintain the documentation that goes with it in the same patch so we are very proud of our documentation system so when we say please read the docs it's because we've been spending time writing it the good way we think and we would be happy that it's useful so in the Postgres community it's polite so don't take it the wrong way and so here is the result and it's fixed and they're usually the next step for marketing is going to be okay I want like we can make evolution as a percentage in the same result set so that is a copy-paste from the result because it's easier to explain it that way and I've tried to change some colors there is a color code so basically you want from this Wednesday I think it's a Wednesday it's a Monday so the Wednesday is here you want to know that it was 10% less dollars exchange that previous Wednesday okay you understand the result set just a percentage of difference with the previous day of the of the the previous weeks and a week before so who knows how to implement that in sequel okay do you like the others I think you don't know sequel I'm sorry we will get back to that it's a strong statement I know so to implement it we use a city year and on the left maybe you recognize it it's exactly the same query as before right the generate series with the left join and there is a little different let's get back to that later so we compute that and we use computed data here as a front close for the next part of the query and we compute the difference of dollars between the doors from last week and the door hours from this week but how do we get those dollars from last week is because we use window functions and window functions allows us in one query so the the projection operator or the select select Clause is a projection operator it works one row at a time right and here we want to see the values from another row so we are going to say it's a peer row with friend so we have many friends now and with window functions we can go and see the other values from our friends rose and whose are going to be our friends here well it depends on the clothes you put in the over clothes so partition by and order by partition by its you can think about it a little like goodbye so it's going to say our friends are going to be the other rows who have the same value as us for the partition bike we're close so we partitioned by extract either though from date either though know what it means do W its day of Wieck and of course in some countries Sunday is one the first some other countries it's Monday and then in some countries we begin at zero that at one so Sunday is the room and there is one and well it's crazy so of course there is an ISO standard about it so we use the ISO definition for the day of week and so anything that is a Wednesday it's going to be three and anything that is three is going to to be rows that we are allowed to see from the current one we're going to see to be able to see any row that has the same day of week as the current row and we are going to see them in an order that we specify by date and we're going to lag of one so we're going to see to be able to see the one that is one week before us question where is the other two sharra yeah because the two-tire with the D Y is going to be like Wednesday or maybe using the locally that you have on your computer etcetera and this is a number so maybe it's easier to reason about numbers in that close and also it's those glories are four slides and showing of things so and so that's window function so who's been using window functions before nice it was invented in a 92 like you know around utf-8 and maybe ipv6 like things that everybody uses everyday no maybe not but if you so that's equal from 92 so if you are not comfortable with it and it's 2019 maybe it's time to learn sequel again so the the copy/paste from the result of the query is what we saw before and the sequel standard the current version is 2016 and any new version of the standard is duplicating every previous one so there is only one current version of the sequel standard so 92 is an interesting for historic purposes if you do historic research that's very interesting if you do prediction the disease or application development you don't care about that the only one that you care about currently is 2016 so how to think in a sequel because what I'm trying to convey here is that sequel is powerful and maybe you should be doing more sequel and less application code or maybe you should consider that part of your application code is actually going to be written in sequel and the goal of this tolk is so that we realize what it's possible to do in sequel and then there is the question of is it advisable that you would do it as much of the implementation in sequel as I'm going to show you but that the answer to that is more complex and requires more understanding of the company dynamics and the skill levels of the developers and things like that so I'm not going to answer that I'm just going to try and show you how much so far you can go in sequel so what do you think in sequel sequel is a structured query language it's the narrative which means that in most programming languages it's imperative or object-oriented but it's still imperative what you do is when you have a problem you think are about how to reach to a solution and then you write the code that is implementing what you think is going to be the solution and usually when there is a bug is because your thinking was wrong and what you thought would be the solution is actually not the solution okay declarative programming language is different you explain to the computer the result that you want so the problem you have not so much the solution so when you do sequel queries you need to be able to think in terms of what is the problem I want to solve and you need to explain to PostgreSQL the result set you are interested in to not how to build it but what it is you need to define the result set that's what you do in sequel so that's why I think for sequel beginners it's a little weird and strange and not very easy to do but as everything the more you practiced and the easier it gets so another useful tip I think for thinking in sequel is unique says everything is a file so if you want to you know print something on the printer you just cut your file to the printer device and some pages are getting out if you want to talk to the network connect to a network you just open the circuit file and then you you know write to a file and it's actually a socket and it goes through the network everything is a file java says everything is an object and actually the first chapter is a some other data types and the second chapter is objects oh no no no integers are not object but but everything is an object in Python you have many thing to think about and tools to work with and in sequel you have relations so what about it what is it a sequel relation and it's basically a collection of data that all share the same query attribute domains which means a given number of columns and each column is of known data type and every row for this column will have the same data type implemented in it that's that's a relation and any sequel that you are going to write is going to define a new relation based on order relation so you are when you're writing sequel you're composing relations to build another one and so the firm closed for example introduces relations which is why you could use a sub-query in a front clause because the query is defining a relation and of course you can select from any relation including the sub query right so thinking in sequel means understanding relations and not to play with them and compose them and to compose them you can use joins or set operators like Union accept and intersect and in terms of join we have inner and outer draw in cross draw and natural join it's all some of them are since syntax differences some of them are semantically very different we even have lateral joins who's good with lateral joins again that's something quite new in sequel but it's it's part of the toolset that you have so if you don't know what it is maybe sometimes you will need it and rather than writing a very efficient query that then you can just you know use the result of you're going to have to implement it in your application code and retrieve so many data when you need it at me that much so here is an example with a lateral job and I try to put in a bold that the keywords that are related to the relation to the relation concepts so with is a city common table expression it in it produces a new relation that you can reuse later in the query select is the projection operator oh by the way this query is based on a formula Y formula one database with all the races and the drivers and the results and it's going to give you the top three Pilar pilots in number of points that they accumulated in a season but by decade it's let's see the result so in the 50s it was fun geo so if someone told you that you're driving like a Fangio it's a common French thing to say some people take it as a compliment because he he was number one but it is also the guy who broke the most number of cars to get there so I don't know and maybe you can see we had French guy there first and then Schumacher of course and extra so for each decade the 50 is the 60s 70s etc you have the top three drivers and that's where you use a lateral join because you can have a sub-query and you can inject things into the sub-query thanks to it being on the lateral side of a join and then so you inject a we're close that comes from the decade and then you can group by and limit to get the top three of them so it's it's currently the best way to write a top end query usually lateral joy so when you don't have it it's very complex to write when you know about it is like oh I know what to do it done okay any questions so far you can interview interrupt me whenever you want to so let's continue what I'm trying to say now is that cycle is code so if it's code maybe you need some tooling that is going to look like what you have usually when you write code I've put sequel query year again just for perspective my role is that if it fits if the query fits on the slide it's a simple query right okay in that case I had to use a very small font that nobody can read so maybe maybe it's not simple enough but it's the kind of queries you're going to write and oh something I didn't say before well what is that a comment because sequel is code so of course you have commands maybe you colleagues they don't know generate series they've never seen that so if you use that and then you have of course you have a review process you don't push code directly to production right so someone is going to have to review your code and they never seen a generate series before and they're going to say no I don't want to accept this it's foreign to me I don't understand it it's going to be impossible to maintain in the future please do it in pattern or anything so you had a nice comment to your colleague and he's going to be the Sabrina I said you don't know that thank you so you know comments of course and then what you can do with the integrating sequence into your application code is the best way to do it that I found is that you actually use dot sequel files because then you have syntax highlighting it's actually code the day you are the production program in production with the curries then you can just you know send a link to your maybe you have a gig front-end a get lab or github or something you send the link to the DBA and is going to copy page the query play around it and then send your PR that fixes the query because it just a sequel file that's what we do right if you're not doing it that way maybe you're losing out on some of the process around sequel in new code I don't know and so how did to do that we are going to see about that and maybe you want also you need an regression testing around the queries so let's see about that for you guys you are using an ORM in your code I'm running short on time so I will not explain Morgan about it but just don't do it it's not worth it do that instead so those projects began with a closure implementation it's called a SQL and basically it looks like this you write queries in a sequel file and with uh some comments and it's going to expose the query here you load the queries and then in all the queries are going to expose as function calls and the result is going to be a native object in your programming language okay so the integration in the code is going to be so easy and also the query is this a sequel file that you are able to share very easily with your DBA friends because know they're friends because it's easy for you to work with them and also to review when you do code reviews I'm running short of time so I think questions you are going to be after a little time sorry about that and then you can implement regulation testing on top of it so I wrote regress QL in go because I wanted to see if I did like go or not but it took me like I think it was less than a day so if you want to have the same thing in Java Python Ruby whatever you're using my guess is it's going to take you maximum two days to get it so don't focus too much on this implementation the idea is very simple you have the sequel file that those are files in your file system right in your code as we saw just before and so what you do is you create query plans like you associate for each query some parameters and then you run the queries and you keep the result of them as an out file like you know copy from the query and that's a CSV file to store that that's the expected result and then when you change the query you run the tests again and you compare the output the current output with what you expected that's unit testing right it's very simple you do a diff in between the two files and you know if there is something that changed so when your DBA is providing you with a optimized version of a query well you integrate it into your system and you run the regression test and you know if it's a still behaving exactly as before or maybe not so it's easy to do and so that's the main parts of the presentation about sequel and why I think that sequel is code and how you make it so that actually in your development process sequel is code and then before we wrap up I wanted to show you some powerful queries that you can do with a PostgreSQL extension because you know it was maybe not so much boring but not very I didn't see much sparkles so maybe I can do that first query is going to be geolocation maybe you know about geo light they provide you with a data set of like geolocation is easy you have an input an IP address and in output you have a location on earth but then you need someone to collect all those things and max mind is the company behind geo right and they say that you can use it for free as as soon as we're as long as you say where it comes from so I just did so I can use it for free so July yeah that's the license you know so that's your night and you have a location ID and you have two tables with the things and here we use an IP range so it's provided just the type that is provided IP for our is an extension that you can find easily and it provides you with the IP ranges of addresses and this operator maybe you don't know to read it yet but think about it the first time you saw this equal operator you didn't know how to read it either so you need to learn about that so this this one reads contains so the app IP range contains this address that's easy and also we can have it PostgreSQL we can have exclusion constraints so you can build your blocks table in a way that you guarantee that any range has no overlapping runs anywhere else in the table so when you when you give it one IP address in the lookup you're going to look up either zero or one answer that's it because there is no overlapping okay and PostgreSQL is going to guarantee that which means that now you can write this query which normally the wouldn't pass could review but I didn't comment on that because I wanted it to fit in the slide okay sorry about it but here there is a comma which means it's a cross join don't do that never but we have a warranty that this general log table relation the result of this query is going to be zero or one row it's a strong guarantee because we have the constraint okay so the cross join is going to be against zero on one row so maybe it's okay and I was lazy so that's the way I did it and then we have an order by enemy ten it's called the K and and K here is ten okay ki is ten and then we're going to have the ten closest rows and that's the IP address of a previous conference for PostgreSQL that happened to be in Dublin and that was the 10 pubs nearby the conference centre because you taught for an hour and then you're thirsty that's what happens so and it's a very simple query so if you want to implement geolocation into your application maybe it's like a ten line queries to implement it it's like I don't know alfe an hour of work and done well I don't do the visuals and the mapping so you do that but you have the query that's easy and I'm running out of time so this one was interesting it was bashing on MongoDB but so anyway it's you you'll have this light go read over there and this one is I don't have time to explain it but it's basically it's a Model View controller in a single sequel query and the result of the query copy pasted from the console right is that so that's the kind of thing you can do with a sequel and Postgres any questions no so I'm going to be at the booth later so if you have questions that come up in your later please just show up there and that's it [Applause]
Info
Channel: Postgres Open
Views: 7,255
Rating: 4.9714284 out of 5
Keywords:
Id: q9IXCdy_mtY
Channel Id: undefined
Length: 50min 36sec (3036 seconds)
Published: Thu Sep 19 2019
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.