Postgres Open 2016 - Flexible Indexing with Postgres

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hello everyone great to have you hope you enjoy your lunch I certainly did desserts were quite good actually I must say so my name is Bruce mom Jin I'm one of the Postgres core team members I've been working with Postgres for 20 years now seems like a long time work for enterprise DB one of the sort of poet companies supporting Postgres Simon made there was that point this morning about companies supporting the community so it's it's exciting to to be here this is I guess as Steven said my sixth time as well because I've been to all of them and this is certainly I think one of the best we've had of a PG open I love the venue I think it's just a really great set up the way we have everything working here and obviously the organizers are doing a really great job so today we're here to talk about Postgres indexing specifically flexible indexing specifically non-traditional indexing and I think it's a very interesting topic we have a good number of slides and a lot of sort of sort of brain twister kind of concepts I'm going to throw at you the next 50 minutes but you know I have I have to start out with a with a disappointment I'm actually giving a talk tomorrow afternoon called non-relational Postgres and you might heard simon mention it this morning in a lot of ways this presentation goes with that presentation so I'm going to be alluding to a couple things as I go through this presentation that that may not be clear to you and and it might be clear if you were also able to attend tomorrow's talk about non-relational Postgres the last talk of the day tomorrow and it goes very very well with this in fact ideally I should have had them move that talk today and then I could have done the indexing afterward but I realized that this morning so it's a little too late to be switching talk orders around but but this concept really a flexible index thing I think is one of the powers of Postgres it's similar to the non-relational ideas of Postgres the idea and Simon did a great job this morning talking about it of taking on traditional non-traditional workloads of taking on problems that typically aren't solved in the relational space the relational database space but Postgres solves things like GIS things like full-text search things like very complicated analysis of rays and all that other stuff it's really cool that Postgres does that and actually a lot of this indexing has come from oleg and his team in russia so it's good to have him he's sitting right there it's good to have him in the audience as well to back me up so let's talk about what we're going to talk what we're gonna cover today first we're gonna talk about traditional indexing this is sort of the bread and butter kind of stuff that that all relational systems do to one extent or another and then we're going to branch out a bit a little bit we're going to talk about expression indexes if you you might have heard those before you might not be sure what they are they're actually really powerful and I'm going to show you some examples of that then we're gonna talk about partial indexes which also sounds kind of bizarre but I have some really cool use cases we're gonna talk about well what would a bitmap skin is which is sort of a hybrid of a traditional index scheme I'll explain why that's important then we're going to kind of kind of go out out in the space go away from even traditional storage of indexes and talk about some of the non b-tree index types that Postgres supports we'll talk about the data types that need that support and then we'll finally talk about the use of indexes and kind of how to analyze them and then wrap it all up and again if you're able to attend my talk tomorrow we get a lot more concrete about the specific data types that Postgres supports the specific workloads worthy a lot of these indexes become useful I will take questions during my speech so again if you want to have a question just raise your hand I'll repeat the question and be able to answer it for you pretty easily so let's talk about traditionally indexing this is what pretty much all the relational systems do and and butter index of course for relational systems is b-tree short for binary tree it's sort of has everyone's used to the standard chart that has you know a little tree at the top the root and then it caught it goes all down the B trees is been around for a long time it has a lot of use cases it's a very generic handle it doesn't handle every case as I'll show you later but it handles a lot of them but it's really ideal for looking up unique values and also maintaining uniqueness in an index so if you're looking for a specific order number or a specific customer number or a specific part number if we think of the traditional OLTP cases B tree is just great it takes a huge problem space a huge number of rows breaks it down gets you to that that particular row you're looking for as fast as possible now there's a lot of use cases that don't have that kind of requirement but that's your core of B trees great for it our implementation is it's actually a hybrid of B tree called NB tree it has very high concurrency it handles some of the traditional problems that old or very much older or B tree implementations have in terms of that keeping the tree balanced and so forth those are all handled very well it effectively stores a key and a pointer to a row in the index so if you think of what's actually in that index it's basically just the key that the value the part number the order number the customer number that you might happen to be indexing and then where is it where is it in the heap where what where is the row that matches that value and it also can support things like order by if you ask for like for example the five highest customer numbers or the five earliest employee numbers we can actually use the index for that we can use it for merged Jones and we can use it for the inner side of a nested loop which again is getting beyond what I normally cover but effectively the optimizer can use these indexes as well to get you query so even if you aren't specifically identifying a column that you want to you know you don't you know you're not saying order ID equals 12 right which you hope is going to use an index if you have an index or an order ID you're hoping that's going to be used if it's reasonable but in fact even internally if you don't even mention the order ID column equals a constant you do a joint on the order ID column the the optimizer can make use of that of that B tree index as well okay but you might say okay that's cool that's what everyone else that's what all the other databases do what more do you have quite a bit we can use we can index things like expressions and functions I'll show you it's some example of that the we we sometimes need a better control of the rows sometimes we need a small lightweight index and B trees are pretty heavy if you imagine you're storing every key and then appointed every he theta he B trees kind of yeah I may need a lighter one you might want closest match searches not exact match but I want the things that's closest to that you may have an index with many duplicates and B trees not really great if you have a lot of duplicates think of a think of a a column that has true or false okay B trees really not gonna help you there right a male or female whatever it's just not gonna help you because you don't have really that kind of unique I'm looking for one row you're looking for you know half the table is true or false obviously the B trees not going to help you there and some of the multivalued fields also don't index very well with B tree and I'm through this talk I'm going to talk about some of the examples of actually how are non-traditional indexes make that possible any questions okay great okay so first thing that I would say is nontraditional it's a little hard to get your head too wrapped around but if they're cold expression indexes okay you know I've been around databases for I guess thirty years now and you know they're traditional well maybe not thirty maybe twenty-five but the traditional way that you you know index something is you index a column will you index a couple columns right it's a single column or multi column index it's pretty pretty straightforward okay however some a lot of times you're doing things in your query that don't match a single column and here's an example where I'm looking for the name Andy that could be upper case could be lower case I don't know so I'm basically gonna say I want any case in this for a cop query where the name the lower case of name equals Andy lower case now if I create an index as you can see here in the second query on the name column and I actually actually doesn't work because effectively what's happening is if I do the cop query it's saying well you didn't really ask for name equals Andy you asked for lower of named equals Andy and your index doesn't have that you're Nix just has the name unadulterated so the solution as you can see in the last query is to create an expression index now I'm using a function call here I could have used arithmetic I could have used concatenation I could have used random and or whatever okay I could have used almost anything I can use like two car and then a type zip you know uh some kind of timestamp column and I want to index just the year or whatever I could have done a whole bunch of cool things but I could index the day of the week of the particulars and again as long as I use that but they can same function in my query it'll work so that's an example a very simple example I'll show you some more complex ones a little minute so here's an example of a customer feel there's example of a an actual run we create a customer table we put a thousand customers in there so pretty straightforward and then we ask for customer oops sorry Billy is for customer nine nine nine okay and that works it matches the first index refer to first index being our customer name okay and it works fun but we try the second one on this table and we notice that lower customer doesn't work it does the sequential scan it doesn't use the index that's because it's a brand new table we only have that one index on it now if I create a an index on the lower of the name then when I do this query right here you can notice I'm getting an index again at the bottom okay so I'm not saying you're going to use these expression indexes day one but when you start to write queries that have comparisons that are more complicated than a single column name expression indexes can get you out of a lot of jams in fact a lot of people won't the tools will take a whole bunch of columns that kind of mush them together like maybe only an index like the first three letters of the field and then I want some other part of some other field concatenate all these crazy things that people do it actually allows you to kind of make things indexable that traditionally were not indexable okay so you can use your student find functions in these expression indexes you can concatenate you can use math expressions the function has to be immutable that's a requirement because you are indexing a particular value that can't change every time you call the function or effectively the index would be useless and also there can be casting issues you might need to look at okay any questions okay okay so second one partial indexes this is a little different than the expression indexes this is a case where you only want to index part of a table because remember when I normally say create index it indexes every row okay but that may not be what you need to do okay what if I were only going to be looking up a subset my data why should i index every single row if I'm only going to look up a couple of them for example when I used to do accounting we used to have like an accounts receivable table like a whole bunch of transactions and there was a boolean flag that said zero balance true or false okay and honestly every time you ran an accounting report you only wanted to see the non zero balance entries you did once it's zero balanced or old you didn't want to see them so I could have created I didn't have Postgres at the time but I could have created to an index that just indexed the ones that didn't have zero balances because those are the ones I want to show my report didn't make sense 99% of the entries had zero balances nobody wants to see it those write calcium port wants to show what's receivable not what zeroed so it'll allow you to kind of slice up your data and index only the parts you want to that means that you have less the index is smaller more of it's going to fit into into RAM and also you don't have the overhead of maintaining an index for rows you're never going to be looking up so get smaller more shallow less update and insert overhead and you can still do a sequential scan if you don't use the index so the data is still there just isn't as indexed so here's an example we're gonna add a state column to the customer table okay unfortunately I mean I'm in Dallas so Arizona is not that far well if Dallas its Texas it's really far but I mean in the map it doesn't appear as far as those from Philadelphia right so we're gonna we're gonna make customers who begin with nine be in Arizona so we had a thousand customers ten percent of our customers we've assigned them to Arizona and what I can do is if I really care a lot about my Arizona customers I don't know why maybe there's a tax reason maybe I have I have to run a special report just one Arizona customers whatever I think create a special index now this is not an expression index this is to say a partial index and you can see here I've used - where clause I say create index on customers think column where state equals Arizona so I've only indexed my Arizona customers I can still query the other customers but I can't use this index to find them okay and if I run a query and I say give me all the customers live in Pennsylvania Pennsylvania I get a sequential scan so the data still error I just can't use the index but if they ask for Arizona boom I get the index okay again not something you're gonna use in day one but as you go forward you starts to be like oh yeah this would really really cool why am i indexing everything this would be a great way of doing a whole bunch of cool stuff okay now that exhibit was a little contrived because I actually I'll back up I ended up creating an index on the state column and my where clause also had the state column see what I'm saying like my restriction is the state column atom index in the state it really but there's no reason I have to do it that way so I could this is a lot Ussuri were useful I can index the name column and I can restrict it to Arizona so now I kind of have the best of both worlds right I can go right to the index and know that every person in that index is from Arizona by default because of the where clause then I can pick something else so if I look at the query at the bottom that one doesn't do anything because I haven't restricted it to Arizona right there's there's no way it's like well I could give you some of there were people who are customer 9th I said but I can't give you all of them because I don't know which ones are in Arizona but if I do a query like the top one I ask give me everyone in Arizona and that customer number it all of a sudden uses the index and what's interesting if you look at the ethic at the index condition it's checking the name it doesn't even mention the state because it already knows it's there all in Arizona right so in a way I've used the India twice I've used the index brokers by the fall I know they're all in Arizona and now all he really need to do is to go and look at the name okay kind of crafty I think uh and if you look at the bottom you can still look up everybody in Arizona just fine just go look in Arizona and boom it'll still look him up even if even even though I don't ask for the name there I didn't mention the name at all but it was good okay any questions yes sir does a partial index use a bitmap skin I am going to answer that question in a minute so thank you next question okay great okay bitmap skin hey well you line me right up here so what is a bitmap scan again we're looking at some of the non-traditional aspects of Postgres indexing because normally when you use an index what we do is for every index match we go directly to the heap so every index match we go we get the match we go to the heap we make the match go to the you can't go to the heap if there's a lot of index matches you're going to that heap a lot of times so a normal OLTP that's not a problem you only have one customer so you're only gonna go to the index once you have one order number I'm gonna be indexed once there's a lot of cases where that doesn't work particularly when you're getting a lot of index hits a lot of index matches so when you're in getting and you see this a lot with all that with with data warehouse type queries which are saying you know give me all the blue shirts and all of a sudden you get you know tons of shirts right when you're starting to get a whole bunch of hits that idea that you'd go right from the B tree to the heap is kind of inefficient so we came up with this idea of a bitmap index scan it's not a bitmap index just to clarify bitmap index is a very rigid index there's little bits for each one and it's hard to add and it's hard to update and keep consistent that's not this this is an in-memory bitmap that we create when we expect a lot of matches and this gentleman pointed out that most of the examples from this match from this these from these partial indexes actually had a lot of matches we know there's a hundred people in Arizona right in fact it tells me or actually tells me it thinks there's five but anyway it it knows there's more than one so it starts to use a bitmap so again it's a it's a way of having a reducing a lot of the overhead when we have a lot of matches it also allows us to use multiple indexes for the same query I'll show you example of that so it basically creates a bitmap in memory either at the row or block-level granularity and then it allows us to visit the heap once rather than multiple times in case we have multiple hits on the same page okay we can merge bitmaps together and it's automatically enabled you don't have to do anything to do this so here's an example of a multi column index of using multiple bitmaps what we do is we have a restriction on the index one it creates a bitmap we happen to also have a restriction on the next one column two separate index and we can create two bitmaps and merge them together using that we can we can actually take multiple indexes use them in this same query on the same table which again very common for data warehouse but again you don't have to be using a data whereas query it could be any query optimizer will automatically select these if it finds it's useful okay any questions good great good now now we're gonna again head to space that's why I use the term before we're gonna go for non b-tree index types and again it looks kind of like this picture it's kind of one kind of craziness let's look at the first one this one is fairly new India it was introduced I believe in nine five and it is bring indexes or block range indexes I mentioned earlier when I talked about that there's a whole bunch of cases where you'd want like a lightweight index one that doesn't take a lot of RAM one that doesn't a lot of heavy expensive to update this is your baby right here okay very very popular for data warehouse applications where you may have indexed every column in the table I'm not saying you can't put a b-tree on every column in a table but you probably should think twice before you do that okay b3 indexes have can be very large and if you index every column in table your b3 indexes can be much larger than the table itself an aside from storage you have a lot of overhead every time you do an insert every time you can update you've got to maintain those b3 indexes okay brin eliminates that a lot and gives us a way of creating this lightweight index on columns that probably don't need a b-tree but I'd still like to have an index on them that's kind of the logic behind it so the index is very small it basically stores a minimum maximum value for a range of blocks typically one Meg so you've got very fully got two values for every Meg of data that's pretty small okay you can scale it basically allows you to read through a big table and skip very large sections of your data it's like it's ideal for naturally ordered data but it's not a requirement the index is 0.003 percent of the heap size typically okay and that's percent so it's not one percent it's not point one percent its point oh oh three percent and you can get a again a sort of an under scene because in many cases I'm going to show the example later sometimes the B tree can be bigger than the heap for a very narrow table be true can be bigger this is 0.03 percent the size we ain't going completely different direction indices are inexpensive the update there is a you can actually index every column and very little cost and lookups it is a little slower so you have to have to accept that I want to show you an example in a minute but right now I'm just going to go through the index types and we'll show you some examples the next one is gin this stands for generalized inverted index I know the letters don't quite work but pretend the N is like one of them uh yeah one like one of the letters f2 by basically it's best for indexing values with many keys or values for example text documents a lot of words a lot of lexemes we call them tokens you know in a document so you want to index all of them that's a case we have a lot of values in a document JSON you're gonna index keys and values they're again a lot of values for a particular document a particularly column multi-dimensional arrays monkey don't real data are raised again many values in a single column gin is very ideal for that type of storage very good for cases with a lot of duplicates optimized for multi Road matches and I'll show you some examples in a minute key is stored only once so if you have you know the word Fred in your in your in your document we're only gonna store Fred once the index I don't care if it's in 10,000 collar rows there's only gonna be one word Fred in the index I'm gonna show you how that works in a minute but again we can have a lot of duplicates or you've got a lot of documents with a lot of duplicate values inside a column this really shots index is restored with the key and then for any values of the rows I'll show you example index updates are batched because usually this is a very expensive update operation we have a very very slick way of doing that thanks to the Russians and we've had some major improvements in 94 for this a particularly compression I think thanks dollars and it cut off compression of row pointer lists and optimized multi key filtering again in 94 if you're running that or later you'll find that they're much faster than previous releases final one when actually the final one of them generalized search tree this is as it sounds generalized so it's not like gin which is primarily looking for duplicate cases or and it's not like b-tree where you're gonna binary search everything gin is sort of for everything else that doesn't fit that's how I think of it it's it's very good for things like geometric types range types age store of trigrams these are cases where b-tree doesn't work jin doesn't work the jist allows you to basically create your own and thanks to the Russians they've thought of a bunch of cool ways to index these things and we are obviously the beneficiaries of that over the work they've done in the past 15 or 17 years and we wouldn't we wouldn't be the same without these without these features that's for sure supports things like distance operator nearest neighbor so you can do things like give me all the clothes five closest pizza shops to a particular location that's usually very hard to do in SQL it actually works very well in Postgres and it's highly optimized which i think is exciting again we're and I'm going to talk about this more on tomorrow but effectively the concept you start getting is that Postgres is taking the relational sort of standard and then sort of reaching for the stars right reaching way beyond sort of the stayed sort of capabilities that everyone as soon as a relational system can do and now starting to do things that no one else does and you have people adopting Postgres now because it does things that no one else does and I think that's going to get more and more common as we as we go forward SP just this is a different version of this just it's space partitioned which sounds really cool effectively allows you to sort of work around some of the limitations of gist for particular types of data there are some some data layouts that are particularly hard to represent efficiently exist and space partitioning kind of allows us to sort of reduce the search space and and kind of it stores the data it's sort of in a different format I'm not going to go into huge detail here but again it is it is another option hash indexes I'm leaving this for last for a reason I'm going to get at a minute they're very traditionally used in relational systems Postgres hash indexes are not that good our b-tree is so good that the hash indexes are really not that much of a benefit anymore so we don't encourage their use they are not crashed safe so effectively if server crashes you'll actually potentially get an index corruption when you bring the server back up so as of 9 6 or 9 5 when you try and create hash index we will issue a warning when you create the hash index to say you are you sure you want to do this basically because it's just it's just it's embarrassing we do have some people working on it we haven't ripped it out although maybe we should have years ago somebody is working on making them crash safe they're not replicated they can't be restored with point time recovery their performance is not good their concurrency characteristics are not good just the big boo-boo right just go yes sir why not rip it out so the reason we haven't ripped it out is because every time we go to rip it out somebody who's really smart and it's usually a different person each time shows up and says I will fix it and they yeah yeah and the and they technically do ok they technically make them better in some dimension of better so they'll fix the concurrency problem they'll fix the you know the make them faster to build they'll make them faster to look up right still they're not crushy b-tree but they're still like a little bit if you look the right way you can see it being faster and and that's why we haven't gotten rid of it we do if somebody is seriously looking at making a crash safe and I think if we can get through that like thing we but there is no way we would have accepted this had it come in brand-new there's no way but but I I was it took me years to get the warning to get the warning so when you try and create it you get a so that it is what it is I'm trying the best I can let me ask yes sir so the question is if we had a hash index that was really implemented well would it be a value and the answer is we don't know because the b-tree code has been so optimized right that we don't really know what if we had a hash that everyone was working on if we could get it to pass better than b-tree okay so it's kind of like it's this we don't know problem that we you know it's possible somebody can sprinkle some fairy dust on it and it would be faster but until we actually see it we don't know and it's it's hard it's it's difficult to justify ripping something out when when theoretically it should be faster if that makes any sense yeah but there's also concurrency problems which are a little harder to work around because the way b-tree set up it spreads it out so far that you don't have as many concurrency problems there's a lot of research of how to do that in concurrent way hash has some of that but the structure itself makes it harder to do some of those cool things what's but multiple rows within see hash bucket would cause contention we have made I think we've made Murli and I think we've made hash improvements probably every two years for the past 10 years or 12 years yeah so why don't we incur an internally convert to be tree I think hope springs eternal basically you know there's this there's a sense that so and honestly the people who come to resurrect hash these guys are we and when these are really smart people and I just can't go up against them and say well you're wrong because I'm probably wrong right and and maybe there's gonna be some magic pixie dust out there somewhere but we keep waiting and every time somebody proves it like they do improve it if you look at the release notes we think in nine six we increase the number of default hash buckets so all of a sudden we get a lot less contention and we had a super huge super small number of hash buckets and I think now it automatically doubles the size of hash buckets as you it does a whole bunch of crazy stuff that's actually getting faster so I don't know it's a it's sort of a philosophical issue at this point I would say a question over here I'm sorry okay right so the question is should we just take it out well my point is that it took me years of suggesting to get the warning so it's hard to imagine I'm gonna get an approval to rip it out I just don't it's not gonna happen particularly because we have sites that are using it for particular use cases and all of a sudden if we just took it out then the indexes would just not work anymore and then you know we have to put it away and obviously people do care about it cuz they keep improving it so I guess as long as I'm making little baby steps and the warnings there hopefully you know it doesn't really catch most people it's usually the novices people come in brand-new from other database you're like oh I want to hash index on that I'm never gonna be doing greater than you know I mean either they they assume from other systems that Postgres is just the same and then reuse it at least they're gonna get a warning out but I'll also say we don't get many complaints about it either which is odd like we don't get very complaints of Oh Mike I crashed on my indexes how she mixes corruptor I don't even get complaints that it's not replicated I just doesn't seem to I just don't hear them so I guess if I had a lot of complaints we would do something but in terms of general complaints about it I don't I don't hear it and I guess it's I don't know I don't know what yeah yeah just the boos log yeah so this is you know this is dirty laundry well then that means at least we're communicating that's good yes sir what is you I don't know there is another use case there's no you people let me put it this way people smarter than me say just stay around so it's staying around so if the if the key is very long then you're the hash would be smaller yeah that's good point yes sir right so it so the question is it would be it would be similar to be if it was done properly it would be someone to beat rebut have some more disk size that's also true the build time off is often smaller should be potentially smaller although I think right now it might not be true so again you know I I have no formal computer science training I've learned from read a lot of books so some of the guys you were really you know though guys they don't want to take it out so it's not going anywhere yeah I see right so the argument is that it should for large values it should auto hash it and in fact in fact we I want to go to the next slide I can't talk anymore about hashes but I'll say that we actually have an index a Gini index which hashes the JSON is it Auto exactly what you're saying in Auto hashes okay let's move forward I'm not making this up if you do that query right there those are your index types okay built into Postgres don't impose Crestor okay so let's take a look b-trees great for unique lookups Brin is ideal for indexing many columns small lightweight okay jins' I deal with many duplicates as we just ideal for indexes use keys have many duplicate prefixes okay and just for everything else that's how I kind of think I'm even mentioning hash okay so let's take a look at data types data type support again covering this more in detail tomorrow here are all the data types that support b-tree that's a lot of them okay you know you see integer var car text no text is in there it's it's it's actually BP car over there let's see here timestamp date/time float integer you know roll a lot of them pretty much everything not everything but a lot almost everything if we look at bring little smaller but still very impressive notice the word min max because again it stores the min max for a range of blocks so many of the of the ops have those names ginned pretty small all right what is that for that's we got okay but it really kicks on those so I don't don't sneeze at it if you have that data type this is what you're going to want to use extensions I don't I don't think I have listed here because I didn't install everything when I ran it it's a good point and just these are the other ones and what you can see here are the geometric types I had not installed post GIS but G post GIS would also use these and some of the text search once can use it although we probably recommend Jin in these cases okay SP just a little smaller again related to just some specific data types so index type examples bee tree this is the standard layout okay of the way B trees normally stored you've got a route 1 internal wand local leaf ones and then they point to the heap here's an example of bring I'm creating a table here with a hundred thousand rows and 100 million rows I'm sorry and I'm gonna create a b-tree index and I'm gonna create a Brin index and we look at the bottom the table itself is three point four gigabytes the b-tree index is 2.1 gigabytes ouch okay again it's only a single column so there's no data there but it's still really Hulk and big the Brin index is 140 kilobytes that's accurate that's kilobytes there so the ones that ones in multiple gigabytes the other one is a hundred four kilobytes and that's obviously dramatic Jen again I said Jin is very good for duplicates so I'm going to use a text search example here what I basically got is a I created TS vector column I put the Fox is sick and how sick is this if I look at the actual table it has basically taken the string converted them to lexemes and then store those in the table okay and if I look at the index the Gini index effectively what I have is the word Fox this how and the are once and the word is and SiC are twice remember I said if you have the word Fred it would only show up once no matter how many rows had the word Fred well Fox SiC is here twice okay SiC is here twice as in two rows but I only store sick once in the Gini index because it's the it's the key and then all the matches okay and obviously very very well for optimizing things like full-text search and and so forth okay integer arrays also or index the same way you would index you know here's a JSON example using gin and the other basic the other big use for gin indexes so I've I've got a bill true a bill active and Jack also active if I look at the the JSON at the bottom there it's there's Bill and Jack but when I actually create an index on it in this case a JSON B index we opt index refused to default what I have is I have bill and Jack at the bottom of hearing once I've got active name and true appearing multiple times I only store those once and I store all of the matches so again same concept I went to before where you're you're doing gin and you're only doing one key and moldable matches for that okay there also is a different type of index you can create for je for JSON on gin it's called JSON beat path ops what it effectively does is to hash the entire path of the JSON path so in this case I am hashing named Bill named Jack which are unique and the the sort of path active true actually appears twice okay I only record it once there are some advantages were using one or the other but effectively you can read the docs it has to do with whether you think you're gonna look up a whole path or whether you look need to look up individual keys that may exist at different places within a document okay just is again loosely coupled values also things like geometric types range types IP network types here's a linear example this is a linear example it's very easy to do for b-tree because like if you think like a b-tree it just orders everything right just you know from smallest to biggest it's easy to figure out where everything goes try and do that in a graph way this is trying to give you some of the challenge of what you need to do here so like like I could do like what if I have all these points how do I like do that how do I get it into b-tree cuz like what would I do X is greater than two but but there's no representation of the y part right so I'm only indexing one component that's why be tree just doesn't work effectively with with gist and geometric types you create things called bounding boxes so you have a big box that goes to smaller boxes which goes to smaller boxes similar to the way B tree works right where it gets big and big ranges and smaller and smaller ranges but instead of the range is being linear through now boxes did you get that so when you do B tree it's always a linear one one you know on one axis effectively they're doing kind of the same thing as be tree but it's boxes that are getting smaller and more detailed as you go down okay again amazing work done by Russians originally a code came from Berkeley but it's been drastically redesigned over the years and again it's sort of an our tree representation you can now imagine if I'm looking for a certain point I go for the big box it has the point then I go for the smaller box it has a point then I go for the smaller box as a point then I can find my points okay very good for two-dimensional stuff obviously box circle I met obviously GIS uses as well for ranges also very good with gist if you've ever dealt with range types I'm not going to cover that although I will cover it tomorrow so stay tuned if you're gonna be around but it allows you to sort of do the same kind of complexity as we did with with our tree boxes with bounding boxes but now the bounding boxes are ranges okay so think of it the bounding box is kind of instead of being XY it's now start and stop and we now box's ranges where certain ranges fit certain boxes very very interesting again hard for me to kind of conceptualize a lot but if I think about enough like I could figure it out SP just a little different this is a case where we're actually indexing prefixes so in this example I've got URLs which start with HTTP in a lot of cases start with FTP in a couple other cases actually these are you are eyes technically and what I can now do is I can kind of segment my strings now based on the prefixes that are common among a whole group of rows okay very good for long values that we're gonna chop up it's not use that much I think a lot of people don't understand it sometimes but I'm trying to correct that here but there are some definitely cool use cases for it again these are cases where traditional beetrice or traditionally than the the normal just wouldn't would work well you can imagine that quad point ops is used it's just again a different type of bounding box uses four corner points in a square of decreasing sizes so it's just a little different it tries to avoid some of the problems for a particular data a certain data layouts have trial have trouble with with gist so the quad point ops helps with that Katy points what's only one dimension I'm not gonna go into that we do have extension support for these index types which I talked about earlier we already mentioned PT trigram actually store supported by just ang in post is of course H store in array I'll treat cube and so forth these all support the native you know super cool indexed types that we have any questions Wow yes sir so why are some of this index types and extensions and not so it's it's actually the these index types are in Postgres it's merely support for the data types that's in the extension C so gin and gesture in Postgres but we have to write if we add a custom data type we have to teach gin and just how to store those data types and that's why they look like that and the reason their extensions is because not everyone wants opposed to I asked their database not everyone wants cube in their database or H store I think that's a good reason yes sir yes so beat region b3 just could they be in court in fact they could we do have somebody in Russia who's working on it's a woman who works your Postgres Pro and she's working on sort of optimizing some of the beat recode to handle cases with a lot of duplicates and if we did that that we probably wouldn't need beat region which as you imagine is really designed to handle duplicates really well in a traditional beat reformat we can get rid of it okay okay so the gentleman space is saying he had to use beat region once when he was he had to mix gin with another column that also had gin and he kind of pulled them together I think so it yeah there are some crazy cases where you still would need it that's a good point so let's finish up with index usage summary again this is a lot less technical than kind of the earlier part when do you create indexes when you have a lot of sequential scans maybe maybe take a look at the PG set user tables for that you can use explain to see which indexes are being used sequential scans are not only always bad but again you have to use a little wisdom when you get in there PG set user indexes if the value is low for index game the index not might not be used if the value is zero it's never being used so keep an eye on that you know index use creating indexes it's great for moving it unused indexes it's also great so don't don't think always adding is good removing is also good and then unnecessary indexes you storage and slowed you down when you're evaluating index type you have a couple things to consider the time to build it and the overhead of maintaining it to store the sizes the size the index the access speed and then the flexibility how useful is that index and that is my last slide this is actually a parlor in in Japan I have ran out of time so unless there's any final questions I'll be glad to be up here and answer them I appreciate you asking questions while I spoke and I hope you have a great conference so thank you
Info
Channel: Postgres Open
Views: 2,147
Rating: 4.7333331 out of 5
Keywords:
Id: Xhi6u--9860
Channel Id: undefined
Length: 50min 56sec (3056 seconds)
Published: Wed Oct 12 2016
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.