Scaling Postgres Episode 184 Postgres 14 RC1 | Little Things | Index Advisor | A Million Inserts

Video Statistics and Information

Captions
In this episode of Scaling Postgres, we talk about Postgres 14 Release Candidate 1, little things, an index advisor, and a million inserts. I'm Creston Jameson, and this is Scaling Postgres episode 184.

All right, I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is "PostgreSQL 14 RC1 Released". This is from postgresql.org, and this is the first release candidate of Postgres 14, so it should be the release prior to the finalized version that is expected to be released next Thursday, September 30th. There aren't a lot of changes since the third beta, although one item of note is that, quote, "the feature where autovacuum analyzes partitioned tables is reverted and will not be available in PostgreSQL 14". I wasn't really familiar with this, so I looked into it a little bit: basically, the ANALYZE portion doesn't get run on partitioned tables by autovacuum, and apparently this patch was an attempt to start doing that, but unfortunately it didn't make it in. If you want to learn more about the release candidate, you can check out this content.

The next piece of content is "Postgres 14: It's The Little Things". This is from crunchydata.com, and he's talking about a few little features that he definitely appreciates in Postgres 14. The first one is the JSON syntax, where you can now use JSON subscripting: much like you can with hashes in other languages, you can use brackets with a quoted key to identify a piece of nested JSON that you want, and I definitely prefer this syntax to the existing operator syntax. The next area he mentions is a read-only role: it's now quite easy to add someone to some predefined roles, the one he's talking about being pg_read_all_data, and there's also pg_database_owner. Those are great additions to have. He also mentions a few enhancements to psql, the command line client for Postgres, and then the enhancement of query pipelining. This is basically a way to communicate with Postgres in an asynchronous fashion: you can send it a query and then send another query before you receive the results back, so you can queue up multiple queries to send to the database, and it can go ahead and execute them and send the results back asynchronously. They're calling this a pipeline mode, and you can click on this link to read more about pipeline mode in Postgres. Mostly it has to do with the client being able to handle it: it's implemented in the libpq client, which is what a lot of adapters for application frameworks use, but the application framework needs to handle it itself. So if I wanted to use this in Ruby on Rails, the postgres gem would have to have support for it, and the Ruby on Rails framework would have to utilize that as well. But it's definitely an interesting performance improvement that I hope people are able to leverage.
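As a quick illustration of two of the features mentioned above, here is a minimal sketch of the new JSON subscripting syntax and the pg_read_all_data role in Postgres 14. The table, data, and role names are hypothetical, not taken from the post.

```sql
-- Postgres 14: JSON subscripting on a jsonb column
-- (hypothetical table and data, just to show the syntax)
CREATE TABLE shipments (id int PRIMARY KEY, details jsonb);

INSERT INTO shipments VALUES
  (1, '{"customer": {"name": "Acme"}, "items": [{"sku": "A1", "qty": 2}]}');

-- Read nested values with subscripts instead of the -> / ->> operators
SELECT details['customer']['name'] AS customer_name,
       details['items'][0]['qty']  AS first_item_qty
FROM shipments;

-- Subscripting also works for updates (the new value must be valid JSON)
UPDATE shipments SET details['customer']['name'] = '"Acme Corp"' WHERE id = 1;

-- Postgres 14: predefined read-only role
-- (reporting_user is a hypothetical role name)
CREATE ROLE reporting_user LOGIN PASSWORD 'secret';
GRANT pg_read_all_data TO reporting_user;
```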
The next piece of content is "A better way to index your Postgres database: pganalyze Index Advisor". This is from pganalyze.com, and they're talking about a way to determine what indexes should be placed on a table. They were thinking about different ways of doing this. One way they considered was machine learning, and they actually tried it using GitHub Copilot, which is an AI-based helper for developers, but they really weren't able to get anything effective out of it. So they thought, hmm, how else can we do this? They thought about using the Postgres planner itself and asked the question: what if we use the Postgres planner to tell us which index it would like to see for a given query? So basically, ask the question, quote, "what's the perfect index for this query?" You present a query, Postgres says this would be the ideal index for it, and that is probably the index you would want to create. Based upon that logic, they created the pganalyze Index Advisor. This is a tool within the pganalyze product, but they also made a standalone version that's entirely free, and you can just click on this link to get to it. Basically, you paste your schema into the top field, place your query in the bottom field, and it gives you index recommendations. Now, I tried this out, and it does work for relatively simple indexes, and they say it only works for B-tree indexes at this point. When I tried some operators, like greater than or less than on a date, I really didn't see it respond to that, so this is probably still early days, but it's an interesting concept. You may want to check out this post and this new tool.

The next piece of content is "When you have millions of insert statements in a file". This is from dev.to in the Yugabyte section, and he's talking about the situation where you have a file that just contains multiple INSERT statements in a row, and how to run, say, a million of those to load up a table. Now, as the post mentions, this is probably a very inefficient way to do it: a more efficient way would be to use the COPY command and just have the raw data, and another approach that might work well is to batch multiple rows per INSERT statement, so more data gets inserted per statement. But assuming this is just how you have the data, how could you make it execute quickly? The problem is that, because of Postgres's autocommit, a commit happens after each statement is run, and running the file that way took 21 minutes. However, when he used an explicit transaction, beginning a transaction and then running through the same process, it ran in a little over a minute. So just by avoiding all that per-statement commit overhead, he drastically improved the performance. That's something to keep in mind. The latter part of the post does talk about YugabyteDB, but the Postgres-related portion is near the top, so if you want to learn more, you can check out this post.
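To make the autocommit point concrete, here is a minimal sketch assuming a hypothetical file of single-row INSERT statements and a placeholder table named events; the timings above are from the post, while the names here are invented for illustration.

```sql
-- Hypothetical target table for the sketch
CREATE TABLE IF NOT EXISTS events (id int PRIMARY KEY, payload text);

-- With autocommit, every INSERT in the file is its own transaction,
-- so each one pays commit (and WAL flush) overhead: psql -f inserts.sql

-- Wrapping the same statements in one explicit transaction commits once:
BEGIN;
INSERT INTO events (id, payload) VALUES (1, 'a');
INSERT INTO events (id, payload) VALUES (2, 'b');
-- ... the remaining statements from the file ...
COMMIT;
```

With psql you can also get the same effect without editing the file by running it with the single-transaction flag, psql -1 -f inserts.sql, and if you control the data format, COPY or multi-row INSERTs will be faster still.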
The next piece of content is "Restarting a sequence: how hard could it be? (PostgreSQL and Oracle)". This is from fluca1978.github.io, and he's talking about sequences in Postgres. This is how you get auto-incrementing IDs for your primary keys: generally you create an integer or bigint column and apply a sequence to it, and it's the sequence that provides what the next value should be. You can create a sequence by doing CREATE SEQUENCE, giving it a name, setting the increment to whatever value you want (typically one), and choosing where you want it to start. He then runs through calling nextval for the sequence he created a hundred times, and you can see that the current value is then at one hundred. In terms of resetting it, you can do SELECT setval with the sequence and the number you want to set it to, although personally I prefer the ALTER SEQUENCE command, just because everything else is ALTER TABLE, ALTER INDEX, and so on. With ALTER SEQUENCE you give it a name and then say RESTART: by default it will restart at the beginning, which would be one if you set it up with START WITH 1, or you can tell it to restart at a particular value with RESTART WITH and a given number. So it's pretty simple, but he says the thing to keep in mind is that when you're checking the value, SELECT currval only tells you what the current value is, not what the next one is going to be; for that you have to do SELECT nextval with the sequence. Then he goes into comparisons with Oracle, and interestingly, it wasn't until Oracle 18 that their ALTER SEQUENCE could actually restart a sequence, which I wasn't familiar with. If you want to learn more about sequences, you can check out this blog post.
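Here is a minimal sketch of the sequence commands being discussed, using a hypothetical sequence name; the exact definitions in the post may differ.

```sql
-- Create a sequence that increments by 1 and starts at 1
CREATE SEQUENCE my_seq INCREMENT BY 1 START WITH 1;

-- Advance it a few times; nextval returns and consumes the next value
SELECT nextval('my_seq');
SELECT nextval('my_seq');

-- currval shows the value most recently returned in this session,
-- not what the next value will be
SELECT currval('my_seq');

-- Two ways to reset it:
SELECT setval('my_seq', 1);              -- set the current value directly
ALTER SEQUENCE my_seq RESTART;           -- restart at its START WITH value
ALTER SEQUENCE my_seq RESTART WITH 50;   -- or restart at a specific value
```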
The next piece of content is "Building an Oracle to PostgreSQL migrator: lessons learned". This is from cybertec-postgresql.com, and they're talking about a tool they developed, an Oracle to Postgres migrator, and some of the best practices this tool takes into account. The first one is that transaction length matters, particularly for Oracle. When you're trying to transfer, say, terabytes of data, doing something like change data capture to take an initial copy of the data and then keep it up to date before switching over to Postgres, you can sometimes get "snapshot too old" errors or "can't serialize access for this transaction" from Oracle, so there are some considerations you need to take into account on the Oracle side, and apparently the tool supports that. They also mention that the Oracle catalog is very slow compared to Postgres, so they had to do things like having the GUI avoid reading from the Oracle catalog, or even caching the data that's in the Oracle catalog, which seems a bit surprising to me. The next one is that it's easy to have invalid objects coming in, so you need to be prepared to handle those when you're doing a migration, and also be aware of null bytes and broken table content that could possibly appear. Then the post talks about ways to efficiently load data into Postgres: because you're writing to the WAL as well as the data files themselves, if you're doing the data transfer using a tool such as COPY you would probably want to use COPY with FREEZE, and also be aware of hint bits, to try to minimize the number of writes that have to happen to get data into Postgres. Then they give some final advice about making sure your data types line up correctly when transferring data, and as they say, quote, "migration is a chance to do a cleanup". Generally you're only doing this if you're migrating from one database to another, but it's an interesting blog post, and if you're planning on moving from Oracle to Postgres, maybe you would like to check it out.

The next piece of content is "Pgpool-II configuration parameters: reserved_connections". This is from b-peng.blogspot.com, and it talks about Pgpool-II's reserved_connections setting, which you can use to have it send the message "Sorry, too many clients already" instead of blocking connections. If you're interested in that, you can check out this blog post. Another post, "Pgpool-II configuration parameters: enable_shared_relcache", is also from b-peng.blogspot.com, and it talks about the relation cache that exists in Pgpool-II and how it works, so you can check that out as well.

The next piece of content: the PostgreSQL person of the week is Michael Goldberg. If you want to learn more about Michael and his contributions to Postgres, you can check out this blog post.

And the last piece of content: we had another episode of The Rubber Duck Dev Show this past week, on background job processing, so if you're looking for two developers discussing how to handle background jobs, you can check out that episode. This coming week we're going to be covering working with UUIDs, so if you want to hear a discussion on how best to use UUIDs, you can check us out on Wednesday at 8pm Eastern Standard Time.

That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.
Info
Channel: Scaling Postgres
Views: 208
Rating: 5 out of 5
Keywords: PostgreSQL, Scaling PostgreSQL, PostgreSQL Performance
Id: kt6_kW6gNlw
Length: 12min 38sec (758 seconds)
Published: Sun Sep 26 2021