PostgresOpen 2019: Look It Up! Practical PostgreSQL Indexing

Captions
We'll get started. Thank you very much for coming to the talk and for coming to PostgresOpen. We're talking today about Postgres indexes, a very fast survey of Postgres indexing techniques. Feel free to ask questions during the talk; I may hold some for the end, depending on the complexity of the question and how we're doing on time. Just really quickly: my name is Christophe Pettus. I'm the CEO and lead consultant for PostgreSQL Experts, a boutique Postgres consultancy based near San Francisco, California; if anyone knows where Alameda is, that's where we are. That's my email address, that's the URL of my personal blog (the slides will be up there within the next week), and there's my Twitter account.

So, indexes. Strictly speaking we don't need indexes, because an index by definition cannot change the results of a query. You can be 100% compliant with the SQL standard and not implement indexes. So why bother? Indexes are a lot of work; let's just skip that part. The problem is the order-of-operations problem: operations that take O(n) are bad. Not as bad as O(n squared), but bad, because without indexes all queries are at best sequential: the database has to pick up every row, examine it, see if it matches the predicates, and throw it away if it doesn't. That is really bad; imagine a billion-row table. Even worse, imagine a join between two tables that are each a billion rows: if that were an O(n times m) operation, you can see it would get out of control very fast. The whole point of an index is to turn that O(n) into something better than n, ideally O(log n) or O(1) or something like that.

Just a reminder: possibly the most common question we get about query optimization is "why isn't it using my index?", and very frequently the answer is that using the index would be a bad idea. Indexes are essential for database performance, but they do not result in speed improvements in all cases, so it's important to match the index to the particular queries, the workload, and what you're doing. Getting some ideas about how to handle that is what we're talking about today.
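A minimal sketch of that "why isn't it using my index?" check (not from the slides; the table and column names here are hypothetical). EXPLAIN is the quickest way to see whether the planner chose an index or a sequential scan:

    -- Hypothetical example: a large orders table with an index on customer_id.
    CREATE TABLE orders (
        id          bigserial   PRIMARY KEY,
        customer_id bigint      NOT NULL,
        created_at  timestamptz NOT NULL DEFAULT now()
    );
    CREATE INDEX orders_customer_id_idx ON orders (customer_id);

    -- The plan shows whether the planner picked the index or a sequential scan.
    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;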
So let's look at Postgres's amazing indexes. Postgres has lots of different index types: there's B-tree (everybody knows about B-tree), there are hash indexes, GiST, GIN, SP-GiST, BRIN (a relatively new addition), and bloom indexes. We'll talk about each one and what it's good for. We have this really wide range of index types, and each has a range of query types, workloads, and data types it works well for. But how do you know which one to use? Somebody should give a talk on that, so here we are.

OK, B-tree indexes. This is one of the most important data structures in computer science, and no one really knows why it's called that. Is the B for balanced, or broad, or Boeing, or bushy? Is it the algorithm that came after A-tree indexes? We don't actually know why they're called B-trees. They're old enough to be a lot of people's parents: the first paper about them was published in 1972. That's where the idea that the B is for Boeing comes from, because the original inventors worked for Boeing. It's the default index type in Postgres, and pretty much every other database everywhere implements B-trees, because they work really well for a large number of cases.

Everyone's seen the classic picture of how B-trees work. As the word tree might imply, it's a tree structure with bins: it partitions the total range of values into buckets, which then fan out downward. This node says everything from the start through 7 goes into this one, everything from 7 to 16 (inclusive or exclusive, I don't remember) goes into that one, and everything higher goes into the node starting at 18.

B-trees have a lot of advantages. They tend to be very shallow structures compared to other tree structures like a binary tree, which means you have to pull fewer disk pages to complete a walk of the tree. They provide O(log n) access to leaf nodes, which is in the good range of orders. And they're totally ordered, so you can walk them in order in both directions, which means they can help with ORDER BY, merge joins, and other operations that require sorting.

The original B-tree paper had a fixed number of keys in each bucket. That doesn't work for Postgres, because Postgres has lots and lots of different data types with highly variable sizes, so Postgres B-trees have a variable number of keys per node, and the entire index key is copied into the index, because it needs to do comparisons on the keys. Which means that with larger values, say a UUID versus a 32-bit int, you'll have fewer keys per node (the nodes themselves are fixed size), so you'll get deeper indexes, and they will perform not quite as well if you're indexing larger data types.

OK, we're done, we've solved the indexing problem forever! Not so fast. Problem one: the entire key is copied into the index, and if you have character strings that are hundreds of kilobytes long, that isn't so great. Who knows what TOAST is in Postgres? OK, the short version: Postgres has a hard limit on how wide each row can be in bytes. It's 8K by default (8K minus some overhead); you can recompile Postgres to change that, but no one ever does. The problem is, of course, that we often want to store things bigger than 8K. TOAST is an auxiliary table that is effectively a spillover table for large values. Indexes can use TOAST, but you generally don't want that, because it significantly slows down index access: you walk down to a point in the index and then hop over to the TOAST table to get the full value. Problem two: a B-tree requires a totally ordered type, basically a type that supports equality, greater-than, and less-than for all values. Many data types, including very important data types in Postgres, are not totally ordered and thus don't really work in B-trees.
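A minimal sketch of the default case, reusing the hypothetical orders table from above: a plain B-tree supports equality, range predicates, and ordering with the same index.

    -- B-tree is the default; USING btree is implied.
    CREATE INDEX orders_created_at_idx ON orders (created_at);

    -- Because B-trees are totally ordered, one index serves ranges and ORDER BY.
    SELECT *
      FROM orders
     WHERE created_at >= now() - interval '1 day'
     ORDER BY created_at;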
OK, so that's B-tree; put that aside and we'll go on to hash. Hash indexes were broken in Postgres for a long time; they just didn't work very well. They weren't WAL-logged, so they weren't crash-safe; they were a forgotten feature, which is a shame. As of Postgres 10 they got fixed, and they work just fine now. A hash index takes the input value and converts it to a 32-bit hash code, with a hash table that points to buckets of row pointers, kind of like every hash table in the universe. It only supports one operator, equality, but that's a pretty important operator; you use equality a lot in databases. The indexes are much smaller than B-tree indexes, especially for very large types, because the index only has to store the hash code, not the full value; it goes out to the table to get the actual value. And if there are few collisions, index access can be faster than a B-tree.

Hash is really good for long values where the primary thing you're doing with them is comparing equality, and that's a very large set of things: long character strings like URLs, things like that, even UUIDs sometimes. It's sort of ironic that you end up hashing something that's already a hash, but for example, if you have SHA-256 hash values, you may not want a B-tree on them; since greater-than or less-than doesn't mean anything for those, you may want to just hash them again and put them in a hash index. [Audience question: how long is "long"?] It's a good question. I would say around 2,000 bytes. Don't make me justify that; under pain of torture I can't, but 2,000 bytes is about where my gut tells me it's time to start thinking about other index types, because that's about the point where things start getting TOASTed; in fact that's a compiled-in constant in Postgres, about 2,000 bytes.

GiST indexes. First of all, GiST isn't a specific index type; it's a framework that lets you write your own indexes, a generalized framework, and in fact that's what the name stands for. Postgres is always adding new types of all kinds, and sometimes you want to write your own types. You can go through all the trouble of writing a full-on index access method, which is a pain in the neck (don't do that), or GiST makes it easier for you. What a GiST index does depends on the particular type being indexed, so it's a little abstract, but the most common use case is this: we have a point, we have a rectangle, and we have another point, and we want to index these. What does that mean? If we're talking about B-tree indexes, a rectangle isn't greater than or less than a point. Equality makes sense if they happen to be exactly the same kind of object, but that's a fairly restrictive operation, and "is this point greater than this point" doesn't really mean anything. But inclusion makes total sense: this point is inside this box (it's a really big point). That's a completely reasonable operation, and in a geometric database an extremely common one. So GiST stands for Generalized Search Tree (talk about your forced acronyms), and it's generally used for any type where containment or proximity is a meaningful operation; in fact, standard total ordering can be considered a special case of proximity (proof left to the reader). That covers a lot of things: ranges, geometric types, text trigrams, all the kinds of things where it makes perfect sense to ask "is this inside this?" or "is this close to this?" in some kind of geometric space, and Postgres makes heavy use of it. It's not as efficient for classic scalar types like int with total ordering, or for simple equality operations; the generality of GiST does come with a price. OK, so that's the first part of the survey; we'll talk about when exactly to use each one in a bit.
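A hedged sketch of that point-in-box case, with a hypothetical table and the built-in point and box types:

    CREATE TABLE places (
        name text,
        loc  point
    );
    CREATE INDEX places_loc_gist ON places USING gist (loc);

    -- Containment: which points fall inside this box?
    SELECT name FROM places WHERE loc <@ box '((0,0),(10,10))';

    -- Proximity: GiST also supports nearest-neighbour ordering by distance.
    SELECT name FROM places ORDER BY loc <-> point '(5,5)' LIMIT 10;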
GIN indexes, which is short for Generalized Inverted Index. The problem is that both B-tree and GiST tend to perform poorly where there are lots and lots of identical keys. However, full-text search, which is the classic case, has exactly that situation: you have a relatively small universe of words compared to the gigantic amount of text you want to index. Typical conversation uses maybe 5,000 to 6,000 English words all the time; there are a lot more English words than that, as you'll see if you walk into a library and look around. Thus GIN indexes. GIN organizes the keys themselves into a B-tree, and the leaves of that B-tree are pointers to lists, or to further B-trees, of the rows that hold those keys. This scales really efficiently for a very large number of identical keys. So: full-text search; indexing an array, where you have a bunch of arrays and want to say "find me all the rows where this array field contains 5", which is a very efficient use; JSON objects, where you're indexing the keys; things like that.

SP-GiST. Who has ever used an SP-GiST index? You'll notice that I'm a consultant who does Postgres all the time and my hand is down... no, I take it back, I've used one. Yay, that's wonderful. It's a shame, because it looks really cool, but it has fewer use cases. It's similar to GiST in concept: it's a framework for building indexes, but it has a different range of algorithms for how it partitions up the space to build its index than classic GiST. It's mainly designed for situations where a classic GiST index would be highly unbalanced, and we'll talk about one use case for it later.

BRIN. This is possibly the neatest alternative index type that people don't use yet, but it's really powerful and wonderful. BRIN is short for Block Range Index. B-tree indexes can be very large; it's not unusual for a single B-tree index to be as large as the table it's indexing, and the indexes on a table, in aggregate, are often very much larger than the table itself. That's because B-trees are completely general: they assume there is no correlation between what the key value happens to be and where the row physically is on disk. The rows can be randomly ordered; it doesn't matter, B-trees work fine. But often we do kind of know. For example, there's the very common Postgres idiom where you have a created_at timestamp, so every time you insert a row it gets stamped with when you put it into the database. Insert-heavy tables often have monotonically increasing keys like this: serial primary keys, timestamps, things like that. And if the table isn't update-heavy, say it's a log where you really never change a row once it goes in, the key will tend to be strongly correlated with the position of the row in the table: the further you go into the table, the higher the key. BRIN takes advantage of that correlation. Instead of a tree of keys, it records ranges of keys and the block ranges that probably contain them, so it says "keys 1 through a billion are probably in this block range; go into those blocks and find them." It's much, much smaller than a B-tree index, and by much smaller I mean that, assuming the data works well with BRIN, a B-tree index that is in the gigabytes can go down to kilobytes as a BRIN index; it's multiple orders of magnitude smaller. And if the correlation assumption is true, it can be much faster for retrieving ranges, like "get me all of last year's orders", than a B-tree. But it's not good for heavily updated tables, because every time you update a row in Postgres it moves somewhere else; it's not good for small tables, because you don't get much traction out of these ranges; and if there's no monotonically increasing key, BRIN has nothing to grab onto.
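A minimal sketch of that created_at idiom, reusing the hypothetical orders table and shown as an alternative to the plain B-tree sketched earlier:

    -- BRIN on an insert-only, monotonically increasing timestamp column.
    CREATE INDEX orders_created_at_brin ON orders USING brin (created_at);

    -- Range scans like "all of last year's orders" are the sweet spot.
    SELECT *
      FROM orders
     WHERE created_at >= '2018-01-01' AND created_at < '2019-01-01';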
The last one is bloom indexes, which are an extension; this is in contrib. It actually has a very powerful, reasonably common use case. It's hash-based, but it works quite differently from a hash index. It's most useful for indexing multiple columns at once: it's very fast for multi-column searches, such as when you have a bunch of attributes on a thing, each expressed as its own column, and you index them all together with a bloom index. It can be very fast, it's very small compared to creating a B-tree index on each column individually, and it's potentially faster for retrieval when there are a very large number of attributes. [Audience question] Yes, it takes all the values and hashes them together. It's also insensitive to the order of attributes: unlike a B-tree, where a multi-column index is very sensitive to which order you put the columns in, bloom indexes don't care.

So now that we have our toolbox, how do we use it? The first question we should ask is: do we need an index at all? Indexes are expensive. They slow down updates, they increase the disk footprint, they slow down backups and restores; all sorts of things about indexes are bad. As a very rough rule of thumb, an index will only help if less than 15 to 20% of the table will be returned by the query. Past that, it's often cheaper to just do a sequential scan, because walking an index is expensive compared to simply scanning the table. The usual reason the planner isn't using an index is that it thinks the index will be more expensive than a sequential scan because of the number of rows coming back. Sometimes it's wrong, and then it's time to do some debugging.

You have to have good statistics for proper index usage, so make sure the tables are getting analyzed and vacuumed properly. Consider increasing the statistics target for specific columns, not globally: just the specific columns that have a lot of distinct values and more distribution than the default 100 buckets can capture, like UUIDs, hex hash values, or text strings where the entropy is in the tail (strings that start out nearly constant and only become random far into the string). Don't jack up the statistics across the whole table; you won't get that much advantage out of it, and it will significantly slow down how long it takes to analyze the database.

Who's on RDS? Who's done a push-button upgrade on RDS from one major version to another? Who had their database lose its mind immediately after that? Right. When you do an upgrade or a restore, you need to run a database-wide ANALYZE to rebuild the statistics. In community Postgres, when you use the pg_upgrade tool, it writes out a little note saying "oh, by the way, here's a script, please run this." It does that on RDS too, but RDS doesn't tell you and doesn't let you run the script, so you have to go do it yourself manually. This has happened to three separate clients of ours in the last two weeks, so it's not an edge case.
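A minimal sketch of those last two points, with hypothetical table and column names (1000 is an arbitrary per-column target):

    -- Raise the statistics target for one high-cardinality column, then re-analyze.
    ALTER TABLE readings ALTER COLUMN sensor_id SET STATISTICS 1000;
    ANALYZE readings;

    -- After a major-version upgrade or a restore, rebuild statistics database-wide.
    ANALYZE;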
Even so, you can have really bad statistics while doing everything right. Let's say you have a hundred million rows, one hundred buckets (the default), and the field is not unique, but there are only 25,000 unique values; this is not uncommon for a foreign key field, for example. Then you say, "select everything from t where the sensor ID is this particular UUID." The planner may think a million rows are coming back, because the values are pretty much randomly distributed, they're landing in a hundred buckets, and it extrapolates from that to what it thinks is going to happen. And so it says, "well, maybe I don't want an index here; maybe it'll be cheaper to just do a sequential scan," and it's wrong. This is a great case where bumping up the statistics target gets you a better plan.

Next, indexes and multi-version concurrency control. Indexes store entries for every version of a tuple until vacuum cleans up the dead ones, so if you've updated a row 50,000 times, you're going to have 50,000 dead tuples both in the heap and in the index. There is an optimization in Postgres called HOT (heap-only tuples) which mitigates this to a certain extent, but not completely. And it means that, in the normal case, an index scan has to go back out to the heap, to the table itself, to determine whether a tuple is one of the live ones it can see. This can really slow down index scans on a really heavily updated table, and really heavily updated tables happen more often than you might think: a job queue, for example, something that's being inserted into, updated, and deleted from all the time, counts, and the index bloat can be quite significant there. So you have to be aware of this. Make sure you stay on top of your vacuuming, and sometimes it makes sense to just rebuild the indexes. You don't necessarily have to do a REINDEX operation: you can do a CREATE INDEX CONCURRENTLY for an identical index and then drop the old one, for example. So if you have that kind of heavily updated table and queries seem to be slowly falling apart, take a look and see how bad the index bloat is.

Postgres does have an optimization that helps here: if we know that every tuple on a page is visible to the current transaction, we can skip going to the heap. Postgres uses the visibility map to determine this, which is a very big optimization, and if the planner thinks enough pages are all-visible, it will generate an index-only scan. This is nothing you have to do; the planner just does it. The only thing you have to do is make sure you keep up on your vacuuming, so always leave autovacuum on and make sure the tables are being vacuumed frequently.

Index scans come in two types: lossy and lossless. Some index scans are lossy, in the sense that the index says "there's probably something matching on this page on disk, but I don't know exactly which rows match, so I'm going to fetch the whole page and filter it." That means it has to retrieve the pages, scan them again, and throw away any rows that don't match. If you're looking at an EXPLAIN plan, the most common version of this is a Bitmap Index Scan paired with a Bitmap Heap Scan: the bitmap index scan builds the list of pages, and the bitmap heap scan goes out, grabs the pages, and sees what matches and what doesn't. Some index types, like GIN and GiST, are inherently lossy.
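Going back to the rebuild point a couple of paragraphs up, a minimal sketch (hypothetical index names) of replacing a bloated index without the locking that a plain REINDEX takes:

    -- Build an identical index without blocking writes, then swap it in.
    CREATE INDEX CONCURRENTLY orders_customer_id_idx_new ON orders (customer_id);
    DROP INDEX CONCURRENTLY orders_customer_id_idx;
    ALTER INDEX orders_customer_id_idx_new RENAME TO orders_customer_id_idx;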
Covering indexes are a relatively new feature in Postgres, which is pretty cool; other databases like Oracle have had them for a while. Queries often return columns that aren't in the index predicates; frequently you're grabbing other fields out of the row when you do a query, which is pretty common. Traditionally, Postgres had to go out to the heap, grab the page, and get those extra columns, because they aren't stored in the index, so it had no other place to get them from. A feature was added in Postgres 11 where you can add non-key columns to the index (INCLUDE), and they're retrieved directly when the index is scanned. It doesn't help on anything but index-only scans, so make sure your vacuuming is good, and remember that you're increasing the size of the index because you're putting these things in there, so don't go crazy and slam every column into the index. [Audience question] It doesn't help on size; it does help on insert timing, because it doesn't have to do the sorting on those columns. That's basically the only difference.

GIN indexes are very, very fast to query, but they're much slower to update than any other kind of index, because you have all these B-trees you have to mess around with, and sometimes you have to make decisions about splitting pages and all that kind of stuff. It's slow enough that Postgres doesn't actually update the index immediately when you do an insert: it stores the new entries in a separate pending area and folds them into the index at vacuum time, or on demand. A very frequent situation we see is somebody with a very heavily updated GIN index whose system suddenly goes kind of crazy at vacuum time, and what's going on is that it's resolving this pending list. One thing you can do is tell it when to clear the pending list by calling a function, so you may want to do that; for example, if you do a giant batch update to a table that has a GIN index, you may want to call that function immediately afterward, just to resolve the pending list.

B-tree is currently the only index type that supports unique indexes. It's perfectly fine to do optimistic insertion against a unique index and recover on a conflict: you can say, "I'm going to insert a bunch of things, maybe some already exist, I'm just going to try, and if I get an error, well, it existed already, that's fine." That's a fine strategy, and Postgres has INSERT ... ON CONFLICT DO NOTHING (or do other things), which makes it much easier. One thing to be aware of is that it can be a concurrency bottleneck, because Postgres has to go lock pages to make sure that, if two separate processes or sessions are trying to insert the same value and both try to commit, only one succeeds. So if, for example, you're collecting sensor data, which is a very common thing, and for some reason you have a unique index on the table you're inserting into, that's maybe not the best idea: if you have a thousand inserts per second and a hundred processes doing them, you're going to get some serialization and bottlenecking on this locking, and it will definitely be visible. So be aware of that.

There's also a generalization of unique constraints called exclusion constraints: only one value that passes a given comparison is allowed in the table. This is very handy for things like ranges, the hotel room reservation problem: you don't want somebody to be able to insert two rows into a table where both have the same room number and overlapping date ranges. There's no way of doing that with a unique index, but you can do it with an exclusion constraint by using the overlaps operator on the date ranges. That's very nice.
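A hedged sketch of that reservation example, with a hypothetical table; the btree_gist extension supplies the GiST equality operator for the room number:

    CREATE EXTENSION IF NOT EXISTS btree_gist;

    CREATE TABLE reservations (
        room   int       NOT NULL,
        during daterange NOT NULL,
        -- At most one row may have the same room AND an overlapping date range.
        EXCLUDE USING gist (room WITH =, during WITH &&)
    );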
OK, so now we have these zillion index types; which one should we use? How do we decide which index to use in a particular situation? First, gather some information about the situation: the typical queries you're going to be running, the columns and their data types, the operators you're using in those queries (including joins), and how many rows the queries typically return. Have all of that in your head when you're making these decisions.

Does the query typically return a large percentage of the table, like 50% or more? This includes hidden row fetches such as count(*), which in effect has to pick up every row in the table. If so, an index probably won't help; figure out a different way of solving the problem. Refactor the query, or consider summary tables or other techniques, before just throwing an index at the problem that won't actually help. Very small tables that fit in memory probably don't need indexes at all, except to enforce constraints: if you have a table that maps two-digit state codes to state names, you probably don't need an index on it.

If you have more than one predicate, "where this equals this and that equals that", which columns are being used? Always start with the most selective one, the one that cuts down the result set the most. If the predicates individually don't cut the results down much, but together they cut them down a lot, so the intersection is very small, that's a good sign you want a multi-column index: if a = 1 returns a lot, and b = 2 returns a lot, but a = 1 and b = 2 together returns 1%, you probably want a multi-column index. [Audience question] Yes, and we will talk about that. But first, let's consider the single-column case.

Is the column a small scalar? By that I mean things like int, bigint, float, UUID, timestamptz, inet; the character types we'll talk about later. Is the value a primary key or otherwise unique? Then use a B-tree; that's your only choice right now. Is it monotonically increasing, on a large, rarely updated table, and the queries are doing range operations? That's a pretty common pattern for things like summary tables and data warehouses; they do this kind of thing all the time. If so, consider a BRIN index first; otherwise a B-tree is the right index type for this kind of thing. If you're creating it primarily to support ORDER BY ... DESC, then create a descending index, otherwise ascending. Postgres can walk an ascending index backwards, but it's not quite as efficient as walking a descending index forwards. You can index NULL values, and WHERE ... IS NULL can use the index, so you should just consider them like any other value, unless you exclude them specifically with a predicate like WHERE ... IS NOT NULL, which is a very common operation.

Is the column a text field, like varchar, text, or char if you're weird? (There's really no reason to use char; I really wish Postgres had a true fixed-width character type, but it doesn't.) Are you doing full-text search, trigrams, or other fuzzy search techniques? That's a trick question; we'll talk about it later. Is the data prefix-heavy, and are you typically doing prefix searches? URLs are a good example, where you're searching primarily on the host portion of the URL. Then SP-GiST actually works really well; this is what SP-GiST is designed for, these prefix-heavy searches where there's not a lot of entropy at the beginning of the string.
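A hedged sketch of that prefix-heavy case, with a hypothetical table; SP-GiST has a built-in operator class for text, though whether a particular prefix query can use it depends on how the query is written:

    CREATE TABLE urls (
        url text NOT NULL
    );
    CREATE INDEX urls_url_spgist ON urls USING spgist (url);

    -- Prefix-anchored searches are the target workload for this kind of index.
    SELECT count(*) FROM urls WHERE url LIKE 'https://example.com/%';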
If it's a small value, let's say under 200 characters (I said 2,000 earlier and 200 here; you can tell I don't have a fixed opinion), or if you require total ordering, because for some reason you have to ask greater-than or less-than on these values, use a B-tree. Otherwise, consider a hash index; this is what hash is good at.

Is the column a bytea, a big byte blob? Why are you indexing a bytea? Please don't index byteas. If you must, use a hash index, or calculate hash values separately, store them, and run through the decision tree again based on the characteristics of the hash value. Postgres will let you create this index, by the way, but why? And if you're doing "this bytea is greater than that bytea", I would love to hear about your use case.

Is the column a range or a geometric type? This is what GiST is there for; this is why we have GiST. PostGIS, the geographic information system extension for Postgres, is all GiST-based. If you need nearest-neighbor searching, you want to use GiST for sure. This is what's sometimes called the Starbucks problem in geographic information systems: the canonical use case for a geometric search is "get me my nearest Starbucks", but if all you have is containment, that's actually a really hard problem. If I'm sitting here in this hotel and I say "get me the nearest Starbucks", there are probably a lot of them within a mile, probably a lot within a quarter of a mile; if I'm out in the wilderness in Montana, it's a much bigger search. So how do we decide what that initial bounding box is? This is a hard problem in geographic information systems. Postgres has KNN (nearest-neighbor) support in GiST that specifically solves this problem without you having to pick a bounding box, which is really handy. If you have a very skewed distribution, where the data points are not particularly evenly spread, SP-GiST is sometimes a better choice, so experiment with both and see which gives you better performance, assuming SP-GiST supports your data type.

Are you indexing inets, the generalized internet address type for both IPv4 and IPv6? If equality is the only thing you're doing, B-tree is probably the best choice; you can try hash, but it probably won't be better. Are you doing prefix searches, searching on CIDR subnets and things like that? Consider SP-GiST. This is actually kind of the canonical use case for SP-GiST, because IPv4 and IPv6 addresses are most definitely not evenly distributed across the range of possible n-bit values.

Are you indexing an array or JSONB? If you're just doing equality, hash works really well. If you're searching for keys or values within the object, GIN is there for you. (The slide says "keys at the top level", but that's an error; it should just say key values, period. You can index any depth. And where the slide says to use expression B-tree indexes for keys and values, you can now just use GIN; that's obsolete information, my apologies.)
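A minimal sketch of the JSONB case, with a hypothetical table; the default GIN operator class for jsonb supports containment and key-existence queries:

    CREATE TABLE events (
        payload jsonb NOT NULL
    );
    CREATE INDEX events_payload_gin ON events USING gin (payload);

    -- Containment: rows whose payload includes this key/value pair.
    SELECT count(*) FROM events WHERE payload @> '{"type": "login"}';

    -- Key existence.
    SELECT count(*) FROM events WHERE payload ? 'user_id';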
Is the column JSON, with no B? Postgres has two different JSON types, JSONB and plain JSON. Why is the column plain JSON? It's essentially obsolete; you really shouldn't be using it. Your only real option is an expression index, because while you can create a B-tree index on it, the only thing you can do with that is equality. There are a couple of edge-case reasons to use plain JSON, and I'll just mention them. One is if it's insert-only: you're just logging raw values and you're basically never, or rarely, going to search them. Plain JSON is a little faster to insert than JSONB, because JSONB has to parse it and do all the encoding. The other is that there is broken JSON out there in the wild, for example documents that have the same key twice at the same level of an object; JSONB doesn't handle that, but plain JSON does. If you need real, powerful indexing, though, it's far better to convert it to JSONB, because JSONB works much better.

Are you doing full-text or fuzzy search? If you're doing full-text search, create a tsvector for the text and create a GIN index on that, rather than indexing the text directly. The tsvector is the encoded form: it runs the text through a dictionary, tokenizes it, and does all that work for you. You can either store it as a separate column or use an expression index. Frequently you'll want to store it as a separate column, because you're actually going to smash multiple columns together into the tsvector: for example, you might have an author, a title, and a body with different weights, and you calculate one tsvector from all of those. If you're doing fuzzy search, like with trigrams, create an index on the column using gist_trgm_ops, which is part of the pg_trgm package. This is really nice for things like autosuggest from a drop-down list, where you're doing a search every time somebody types a key, because it's really, really fast. OK, that's important.
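A hedged sketch of both of those, with hypothetical table and column names; 'english' is a built-in text search configuration, and pg_trgm is the contrib extension mentioned above. The tsvector column here is populated with a one-off UPDATE; in practice it would be kept in sync by application code, a trigger, or (on newer versions) a generated column.

    -- Full-text search: a stored tsvector column with a GIN index on it.
    CREATE TABLE documents (
        title  text,
        body   text,
        search tsvector
    );
    UPDATE documents
       SET search = setweight(to_tsvector('english', coalesce(title, '')), 'A')
                 || setweight(to_tsvector('english', coalesce(body,  '')), 'B');
    CREATE INDEX documents_search_gin ON documents USING gin (search);

    SELECT title FROM documents WHERE search @@ to_tsquery('english', 'index & postgres');

    -- Trigram index for fuzzy, autosuggest-style matching.
    CREATE EXTENSION IF NOT EXISTS pg_trgm;
    CREATE INDEX documents_title_trgm ON documents USING gist (title gist_trgm_ops);
    SELECT title FROM documents ORDER BY title <-> 'postgres indexing' LIMIT 10;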
Is there more than one column in the predicate? Consider creating a multi-column index if the predicates together are highly selective. Remember that if you create an index on (a, b), Postgres will almost never use it for a search on just b. It's funny: the day before I was working on these slides I had "never" there, and then I saw a query that used it for just b. It was a really bad plan and I'm not sure why it did it, but it did, so I can't say never. So find the right index type for each column individually, and create the index with the most selective column first. One catch here: if some of the columns need a GiST index, the whole thing has to be a GiST index; you can't mix and match index types within a single index. Fortunately, you can install the btree_gist package, which gives you GiST operator classes for things like int and other basic types, so if, for example, you have an integer and a range, you can create a GiST index that uses both.

If the query pattern is an arbitrary combination of comparisons on the various columns, consider a bloom index. This is actually a more common use case than you might expect: for example, you have a GUI search screen with 12 different predicates, and the user gets to pick whichever ones they want and type things in. That's a really hard query to index, and bloom is probably your best bet there. If the predicates are individually selective, you might instead just create an index on each one: Postgres can AND and OR index results together, but you want the individual indexes to be selective in that case.

Does the query contain an expression? Consider creating an expression index, for example an index on lower(name) instead of computing it in the query; that can be faster. Sometimes it's better to create a separate column that handles it, like a case-insensitive text column for the lowercased part. Just make sure you're actually using that particular expression in queries a lot, because it's going to be expensive to build and maintain this index, so make sure you're getting good value out of it. If you index on a user-written function (now we're getting into some pretty advanced stuff, but you can do it), make sure it really is immutable, and that you didn't just declare it immutable because you don't know what immutable means. Roughly, an immutable function is a pure function: it can only depend on its inputs, not on the state of the database or the state of the system. You can fall out of immutability in surprising ways, for example by doing time zone operations, because time zone definitions are not immutable.

Is one predicate highly selective? Say only 10 percent of the orders are active: consider creating a partial index with a WHERE clause like that, because it will only index rows where the predicate passes. This can really speed up operations, especially if the predicate evaluates to a very small subset of the rows, because you're searching a much smaller index at that point.

OK, how do we answer the question "do we need an index"? Look at pg_stat_user_tables. Tables with a significant number of sequential scans are a good place to start. Not all sequential scans are bad: look at the particular queries, look at their execution plans, and decide what you're going to do. pg_stat_statements, the text logs, and pgBadger are your friends here. Once you've created an index, is it actually being used? Look at pg_stat_user_indexes and see how many times it's being used. Drop indexes that aren't benefiting you; indexes are expensive to maintain, so don't have any more than you need.

Are your indexes bloated? Indexes can suffer from bloat just like tables can, sometimes worse, because vacuum can't always reclaim index space as efficiently: indexes have structure, unlike the heap, which is just a big bag of tuples. Sometimes you do want to rebuild indexes. You don't have to hand-write the check; everybody has their favorite bloat-check query for indexes. [Audience question] I generally don't get upset about less than 50% bloat; more than 50% and I start to think it's getting past acceptable. It depends a little bit: 50% bloat on a gigantic index is a little worse than 50% on a smaller one, but really it depends on query execution time. If you start noticing slowdowns in query execution that you can't explain any other way, that's a better guide. [Audience question] In this case I'm defining bloat as both dead space and free space, because it's a little hard to tease those two apart. I'd have to look at the particular bloat query to see exactly how it works, but that's why I say it's really more of a performance-based thing: if the queries are running fast, I don't really care how much dead space there is.

Indexes do get corrupted sometimes. It doesn't happen very often, and it happens a lot less than it used to, which is great. Sometimes you'll get an error during a query saying something is bad. Postgres 10 and up has amcheck as part of contrib, which lets you actually check the structure of the index. Fortunately, if you're starting to get errors, it's easy to fix: just drop and recreate the index.
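Going back to the monitoring point above, a minimal sketch of looking for indexes that may not be earning their keep (the ordering and limit are arbitrary):

    -- Indexes that are never or rarely scanned, largest first.
    SELECT schemaname, relname, indexrelname, idx_scan,
           pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
      FROM pg_stat_user_indexes
     ORDER BY idx_scan ASC, pg_relation_size(indexrelid) DESC
     LIMIT 20;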
And to conclude: indexes are great, yay indexes. Just remember they're an optimization. Always create them in response to a particular query situation; don't just throw indexes at a table hoping they'll help. Experiment: test different types of indexes, see what works best, and don't just go with B-tree by default. Pick the right index type for the data; Postgres has all these great index types, use the right one. And monitor usage and size; keep the database healthy and trim. Thank you very much. Any further questions? [Applause]

[Audience question about indexes on replicas] Yeah, the schema has to be 100% identical on a binary replica. Something to think about: if this is a big problem for you, consider moving to a logical replica instead of a binary one, because a logical replica can have any schema you want, as long as the inserts can still be applied. A lot of the problems that used to come with binary replicas in Postgres now have the answer "think about firing up a logical replica instead of a binary one": a lot of data warehousing, query cancellation issues on heavily read replicas, some kinds of replication lag. Logical replication has intrinsically slightly longer replication lag, but on heavily queried databases it's a bit more controllable: you don't get the query cancellations, and you don't get that trade-off between replication lag and query volume that you do on a binary replica. So there are a lot of good things about it; a logical replica does increase your sysadmin overhead a little bit. Any other questions?

Oh, right, what should I say about text_pattern_ops here? Mr. Conway will tell you about text_pattern_ops... right. The default operator class, the one you get if you create a B-tree on a text type, doesn't support LIKE-type operations, regexes, or the LIKE percent operators. There's a separate operator class, text_pattern_ops (or varchar_pattern_ops for varchar), that does support those. There's really no reason not to use it, actually, because it supports equality and the rest just fine. There was a time when you had to create both if you wanted both the standard ordering operators and pattern searching (it's not fuzzy searching, it's pattern searching), but that's no longer true. So thank you for the reminder, and thank you very much.
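A hedged sketch of that last point, with a hypothetical table; this matters when the database collation is not C:

    -- A B-tree with text_pattern_ops can serve anchored LIKE prefix searches.
    CREATE TABLE users (email text NOT NULL);
    CREATE INDEX users_email_pattern_idx ON users (email text_pattern_ops);

    SELECT * FROM users WHERE email LIKE 'christophe%';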
Info
Channel: Postgres Open
Views: 1,623
Rating: 4.8490567 out of 5
Id: Xv0NFozBIbM
Length: 45min 35sec (2735 seconds)
Published: Thu Sep 19 2019