PostgresOpen 2019 JSONB Tricks

Captions
Good morning, everyone — can you all hear me okay? Good, just making sure. My name is Colton Shepherd, as you can see here. I'm a former Citus Data employee; I made it to the Microsoft buyout, and now I'm part of Microsoft. Today I'm here to talk to you about JSONB and a few different parts of it. It's sort of a mysterious topic, so I thought I'd start by giving a quick overview of what it is, some good use cases for it, and some bad use cases as well — because we see a few of those every now and then, and it's always good to know when not to use it. I've also done some research on indexing it, and there's a quick teaser of what's coming in Postgres 12 as well. Without further ado, here we go.

First things first, actually: how many of us use JSON? Good — it's an awesome thing. For those of us not familiar with it, JSON stands for the JavaScript Object Notation data interchange format, established in RFC 7159. Despite the name, it doesn't have much to do with JavaScript in the modern day; it's a platform-agnostic, language-agnostic, lightweight, more or less human-readable data format. We've got an example here on the slide of a basic ice-cream sort of schema that I've set up — a single JSONB column, a really basic example — and I'll be using it in a lot of my queries across the presentation to demonstrate everything.

There are two Postgres JSON data types. JSON is a string format: a simple string containing the exact JSON you entered, which you can parse in a few different ways. JSONB, which is what I'm going to be focusing on here, is not a string format; instead it's decomposed and stored in a binary format. This makes it obviously a lot faster to access, more indexable, and easier to compare values that were entered in a slightly different order — it's more order-agnostic — but it does come at the cost of slightly higher disk usage, which I'll be going into in more depth later.
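The slides themselves aren't reproduced in this transcript, so here is a small sketch of the kind of ice-cream schema the talk describes, plus the json-versus-jsonb order difference just mentioned. The table and column names (`orders`, `customer_info`, `product`) are assumptions for illustration, not the speaker's actual slide code.

```sql
-- Hypothetical reconstruction of the talk's ice-cream schema.
CREATE TABLE orders (
    id            bigserial PRIMARY KEY,
    customer_info jsonb,
    product       jsonb
);

-- jsonb is decomposed on input, so key order and extra whitespace
-- don't affect equality:
SELECT '{"a": 1, "b": 2}'::jsonb = '{"b":2,"a":1}'::jsonb;   -- true

-- Plain json keeps the input string verbatim (it has no equality
-- operator of its own, so we compare the stored text):
SELECT '{"a": 1, "b": 2}'::json::text = '{"b":2,"a":1}'::json::text;  -- false
```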
So, use cases. There are a few use cases where JSONB is really useful and well worth using. The biggest one we see most often is unpredictable data structures: if you've got data coming in from a customer where you can't fully control or predict schema changes — say, a telemetry setup where occasionally new devices are added and they have different fields you can't predict in advance — and you don't want to worry about schema changes in the future, JSONB is a great place to start looking.

Another good use case is a lot of attributes that are rarely used. It's a good alternative to the entity-attribute-value model tables that are often seen, where a table has something like "here's the ID of the thing, here's the field, here's the value." You can replace that with a single JSONB column holding one or two fields, or dozens of fields, all with matching values.

Another good use case is arbitrary fields with dynamic values. One thing we see pretty often is analytics use cases where users can attach tags to events, write their own tags, and give each tag its own value. Instead of trying to model that in a complicated column format with messy joins or messy look-up tables, you can just use JSONB and have those fields put in arbitrarily.

For a good example of a good use case, I'm going to mention something Microsoft did — it actually started before the Citus acquisition: the VeniceDB project, which was mentioned earlier in Sunil's slides, for Windows telemetry data. The basic use case is that a lot of Windows 10 machines send in a lot of metrics. They come in through an aggregation system in a JSONB-formatted fashion, and there's an aggregation pass that does some processing.
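To make the entity-attribute-value comparison above concrete, here is a minimal sketch — the table, column, and key names are invented for illustration:

```sql
-- Classic EAV layout: one row per (entity, attribute) pair.
CREATE TABLE item_attributes (
    item_id   bigint,
    attribute text,
    value     text
);

-- The jsonb alternative: one row per entity, all attributes in one column.
CREATE TABLE items (
    item_id    bigint PRIMARY KEY,
    attributes jsonb
);

INSERT INTO items VALUES (1, '{"color": "red", "size": "large"}');

-- No joins or lookup tables needed to read an attribute back:
SELECT attributes->>'color' FROM items WHERE item_id = 1;
```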
That pass does some sums to reduce the volume a little bit and get some more useful data out of it. Some JSONB survives this pass — a lot of it is just aggregated into sums and the like — but JSONB is used in both incoming and processed data for further querying. One of the big advantages they found here was that it was easy to add new metrics without major database changes: instead of trying to add a column on a multi-billion-row table, there's just a JSONB field, and we can query that JSONB field. At the moment this is processing five terabytes of data a day on a 2,800-core Citus cluster — so that's cheating a little bit, and probably won't be representative of single-node Postgres — but it's a great use case. They can add metrics arbitrarily, don't need to worry about big schema changes, and don't need to worry about keeping all the different teams in sync on big schema changes. If a team ships a new metric and forgets to tell anybody, it's in the database; it's not an issue, it can be queried, nothing needs to be worried about.

So let's go into the less good use cases. I've got an entire slide for this first one, and it's statistics gathering issues. For those of you who aren't familiar with Postgres statistics gathering: Postgres gathers statistics on the contents of columns — the distribution of values, the data density, the average placement of different values, that sort of thing. When you run a query, behind the scenes, as I'm sure most of us know, Postgres looks at those statistics to figure out how it's going to run the query: what sort of scan to do, what's going to be most efficient, where the data lives, how to get it to you the fastest. Unfortunately, statistics don't work on individual values in a JSONB column. Postgres will keep statistics on the column as a whole, so for a strict match on the entire value it'll be just fine, but if you're looking up a value that's only present in a few rows,
it's going to have to guess at the density, and its default guesses aren't always accurate. This tends to happen in particular with values that are relatively infrequent: it will tend to guess them as being a little more prevalent than they actually are, though that isn't always the case — it varies a lot. So in some use cases, statistics gathering can lead to the planner being imperfect, and I've got a customer story coming up with a bit more detail on one particular case where that became relevant.

As for other use cases where JSONB might not be ideal: fields that almost always exist. This is partially a function of indexing, because it isn't always easy to index JSONB — I'll go into more detail on that later — and partially because of the storage model of JSONB. For every field that's stored, the key and the value are both stored in the column. So if you have a million rows and 80% of them have the same JSONB field, the field identifier — the key — is going to be present in storage for each and every one of them. Obviously this is not very good for disk space. The other use case where it might not be as good an option involves indexing; as I mentioned, indexing is a bit of a hard problem here, and I'll be going into the limitations and capabilities later.

Another customer story I've got here is mostly a good use case, but there were a few things that we and the customer learned, where we had to adjust the usage of JSONB and reduce it a little. It comes from our biggest customer, Heap Analytics. Their use case is one of the ones I mentioned earlier, with custom tags: they've got an event stream coming in where end users can attach arbitrary tags to events, as many as they like. Any given event might have no tags, it might have 20 tags — I'm actually not sure of the upper bound. So this is mostly a good
use case. As I said, JSONB is great for arbitrary tags: they don't need to do anything fancy in data modeling, and because they're using JSONB, each event that occurs only needs one row, so they can use partitioning more easily and have a lot of flexibility in how this is done without too much confusion. Unfortunately, as I mentioned, the keys are being stored in full as well as the values, so the disk usage was getting magnified. Heap did an analysis of their most commonly used JSONB values and ended up taking the top 45 or so keys — I think on average these were used in at least 60% of the entries in their database, but I'm not certain on those numbers — and turned those top 45 keys into proper columns. That was a 30% disk savings, just by reducing this overhead, and considering they were at about a petabyte on a Citus cluster at the time, that was a lot of disk space saved. I've got a link to their blog at the bottom with a bit more detail; it's a pretty fascinating story.

My next customer story is about a use case that just turned out to be bad for us. This was a prospective customer — whom I will not name, since it didn't end up working out, precisely because the use case just didn't fit, so they'll remain anonymous. They had a tag-based search engine. The big idea was that users would submit data of various types, and there was fuzzy matching of fields and values. There was a timestamp, several different metadata columns, and a single JSONB column that contained all the tags users had submitted — just an array of key-value pairs. The queries as written were using the LIKE operator against both keys and values, and we couldn't get it to index; there's just no good way of indexing more than a specific key in there. So we were doing full table scans on pretty much every query they had. We did a pilot with the same data set
where we converted it to column-based storage, and it was significantly better — but they didn't want to make that big an application change, so they went to a different database instead of Postgres, one where you could index the keys in more depth, and it seems to have worked out for them. That's something to keep in mind: sometimes indexing limitations mean that JSONB is not going to work out for you.

Now we'll go on to a quick overview of the JSONB operators, for those of us who aren't too familiar with them. I'll start by explaining what an operator is — this one's really easy: they're just stock functions for manipulating and accessing data. There are a few obvious examples on the slide; we probably all know what they are, I just want to make sure we're all on the same page.

The first JSONB operator I'll briefly mention — I'm actually not sure what it's called, now that I think about it — just gets a JSONB object field. (I forgot to set up the highlight button here, but I'm not worried about that.) You can see we can get the JSONB object field by key: in this case we take our customer_info column and go to the loyalty_info value in there, which has a loyalty program identifier in it. With the single arrow, as you can see from the quotation marks, we get the result as a JSON object — in this case, because it's just a text value, it's quote-wrapped. If you use the two-arrow version instead, you get it back as plain text. So you can either get it as a JSONB object for further manipulation, or return it as text. A very simple operator; I use it a lot.

The next operator I'll briefly mention is the question mark: does this string exist as a top-level key within the JSONB field? In this case we're just looking for regular ice cream cones in our order history, and I've got two people in my dataset that apparently ordered that, so we get some very basic results. This is just a boolean result.
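Against the hypothetical orders table sketched earlier, the two operators just described might look like this — the key names and values are assumptions, not the slide's actual data:

```sql
-- -> returns jsonb (note the quote-wrapping in the output),
-- ->> returns plain text:
SELECT customer_info->'loyalty_info'->'loyalty_program'  FROM orders;
SELECT customer_info->'loyalty_info'->>'loyalty_program' FROM orders;

-- ? checks whether a string exists as a top-level key:
SELECT id FROM orders WHERE product ? 'regular cone';
```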
Mostly, the use cases I've seen for it center on filters for key existence; there's not much more to say — it's very straightforward and easy to use. The operator does have a couple of variants, though. If you want to check whether every element of a given array of strings exists as a top-level key, you can use question-mark-ampersand, as we see here. And if you want an OR operation — where you're looking for, for lack of a better term, "any of these" — you'd use question-mark-pipe. We can see an example of that here: I'm doing the same query, but this time I'm looking for regular cones or waffle cones, and I get three results this time, because there's an order for a waffle cone in there. Not the world's largest data set, but that's what it can do, and it comes in handy a lot.

We also have the option of combining operators, obviously. Say I want to look at the loyalty card information for my purchasers: in this case I drill down to the loyalty_info level and look for anything that has a loyalty number in that field, and it looks like I've got two of them.

The next operator I'll cover is the containment operator: does this JSONB element exist at the top level within this other JSONB element? It goes in either direction — I've got two examples of it here. With the arrows pointing right, we're checking whether this name-equals-John-Smith element exists within the customer_info field; with the arrows pointing left, it goes the other direction. Very straightforward. You can see that JSONB is not the easiest thing to type in, because of the quote escaping and quote wrapping and everything, so in practice I've mostly seen this used for comparing JSONB that was generated by other queries or that exists in other columns — but there's nothing stopping you from directly inputting the JSON. And because this is JSONB and not JSON, order doesn't matter.
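The existence variants and the containment operator described above can be sketched like this (again using the hypothetical orders schema):

```sql
-- ?| : does ANY of these strings exist as a top-level key?
SELECT id FROM orders
 WHERE product ?| array['regular cone', 'waffle cone'];

-- ?& : do ALL of these strings exist as top-level keys?
SELECT id FROM orders
 WHERE product ?& array['regular cone', 'waffle cone'];

-- @> / <@ : containment, in either direction; key order in the
-- literal doesn't matter because jsonb is order-agnostic.
SELECT id FROM orders
 WHERE customer_info @> '{"name": "John Smith"}';
SELECT id FROM orders
 WHERE '{"name": "John Smith"}'::jsonb <@ customer_info;
```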
As long as it exists in there, you're going to be all set. The next one is path operations: get the values at this path as a JSON object. For this one I'm feeding it a path — loyalty_info, comma, loyalty_program. How this works is that the first element takes us to the top-level key of loyalty_info, and in there we have the loyalty program, and there it is. It's just a list of keys, for lack of a better term — sort of a nested-dictionary approach, if you want to take the Python way of seeing it. And we have the option of either getting our values as JSON objects, as you can see here, or as text instead, depending on how many arrows we put into the operator.

Now that we've gone over the basics of the operators and can dive a little more deeply into everything, let's get into indexing — and this is where it gets interesting. Who here has tried to index a JSONB column? Not a lot of people. Not easy to get right, is it? If you've looked into this — and it looks like only some of us have — not all indexes get particularly good results here. Most of them are very limited in what sort of operators can use an index scan, and some of them don't work at all. And unfortunately, one of the things I found is that a lot of the advice you can find online for operators and indexing applies to older Postgres versions: either it doesn't cover the newest functionality or there are better ways of doing it now. So I decided to go into a pretty deep dive on this to make up for that.

Let's start with everyone's favorite, B-tree indexes — that's everyone's favorite, right? Good. A B-tree index on a JSONB column only handles equality operators. You can see I've got an example here of an index scan on my table, where I've created a B-tree index on my product column and didn't do anything further. This is about the only sort of query where we can use an index scan: a direct equality comparison.
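The path operators and the whole-column B-tree limitation just described might look like this in practice (the jsonb literal is an invented example):

```sql
-- #> walks a path and returns jsonb; #>> returns text.
SELECT customer_info #>  '{loyalty_info,loyalty_program}' FROM orders;
SELECT customer_info #>> '{loyalty_info,loyalty_program}' FROM orders;

-- A plain b-tree index on the jsonb column...
CREATE INDEX orders_product_btree ON orders (product);

-- ...can only serve whole-value equality comparisons like this:
SELECT * FROM orders
 WHERE product = '{"regular cone": [{"scoops": 1, "ice cream": "vanilla"}]}'::jsonb;
```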
It compares the entire field — and order doesn't matter, as I mentioned earlier, because of that decomposed binary format — but that's about all the index will be used for if we index the entire column. Fortunately, we can use function indexes with B-tree. Let's say I only care about this name key in customer_info: I can create a B-tree index on just that, and when I run this SELECT, using the same operator to get the name out of my customer_info field, we're actually using an index scan. Unfortunately, there's one limitation: a single index can only cover one of these keys. If you have one or two keys you'll be accessing frequently, you can create a function index on each of them and get great results. But if you're not case-sensitive on these and might be getting different cases coming in, or if you've got, say, 30 different keys that you want to query on and have be fast and indexed, you're not going to get a really good result — you'll either end up with bloated indexes or won't really be able to use them at all. And we still can't use LIKE operators here either.

So let's go to hash indexes — another popular one. But it's the same thing: equality only. You can index a function or operator result, and LIKE doesn't work, which we probably expected, but once again we can compare JSONB directly. Here I've created a hash index on the same column, I've got the same query running, and I've got the same results: a successful index scan when comparing against a bit of JSONB that I just wrote in directly. Once again: limited, but situationally useful. And here, once again, we can do function indexes: I've created a hash index on my customer_info field, you'll see I've got an explain plan right here for just comparing the name value, and I decided to show what happens when we try to use LIKE.
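The expression ("function") index approach described above might be sketched like this; the index names are invented:

```sql
-- Expression index on one extracted key:
CREATE INDEX orders_customer_name_btree
    ON orders ((customer_info->>'name'));

-- Equality on exactly that expression can use the index:
SELECT * FROM orders WHERE customer_info->>'name' = 'John Smith';

-- Case-insensitive matching needs its own expression index:
CREATE INDEX orders_customer_name_lower
    ON orders (lower(customer_info->>'name'));
SELECT * FROM orders WHERE lower(customer_info->>'name') = 'john smith';
```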
Even with enable_seqscan disabled — as you can see from this ridiculous cost — we're still using a sequential scan.

The next three indexes I'm going to cover in just one slide: GiST, SP-GiST, and BRIN. No — I tried, but there's no default operator class for them out of the box. I didn't get time to look into third-party operator classes, unfortunately, so there may be a way to accomplish this, but in stock Postgres, while we can use function indexes just fine, we can't meaningfully index the entire column with them.

So now we get to GIN indexes, and you can probably tell this one's going to be a little longer than the previous ones. When you looked into your indexes, did you end up with a GIN index on a JSONB column? Oh — never mind, okay, a partial index like I was showing with the B-tree, okay. If you've looked into GIN indexes, you've probably run into this: this is about half the documentation on what you can do with GIN indexes, and it's not really obvious what some of it means, unfortunately. GIN has a concept of operator classes that limit what operations GIN can support in an index, and for JSONB we have two different approaches. There's jsonb_ops, which is the default; it can index several operators — you can see we've got our different question-mark operators for checking existence, and we've got our containment operator, and that's it. And you'll see we have a second option, jsonb_path_ops, which only handles containment. That doesn't look like a lot, and I'll admit this isn't the easiest documentation to read, but it gives us something to work with. So let's look into some examples of what this actually does. Here we have a nice wall of text showing what happens when I create a jsonb_path_ops index.
As I mentioned before, jsonb_path_ops is the most limited of our two options: it only handles containment. Here I've created an index USING gin with jsonb_path_ops — the same syntax you'd normally expect for creating a GIN index — and I'm running the explain plan for a query looking for orders where there's a regular cone with an array of something in it. The query plan looks like what you'd expect from a GIN index: we've got our bitmap index scan and then a quick heap scan to recheck, and that's what we want to see. If I get a little more complicated and put something into that array, we've got the same thing — we're still using the index. We've suddenly gained a lot more capability: we can check for containment of different fields, we can use this at a lot of different depths, and we're getting to the point where we can query our JSONB field in depth while taking advantage of the index. Unfortunately, this only handles containment, so if you're going to use jsonb_path_ops you'll either need to build the JSONB you compare against on the fly, or have another column whose values you want to compare with. That's often the case, but not always.

So let's look at the default, jsonb_ops. When you just create an index on a column USING gin, you get jsonb_ops — it's the default, you don't need to specify it, but I am specifying it here for the purposes of being thorough. And here's what you get: we have our explain plan, where we're selecting rows whose product contains a regular cone at the top level, and we're actually using our index. We can also use the array version of the existence operator to look for any of several keys.
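The two GIN operator classes being compared here can be sketched as follows — index names and jsonb literals are hypothetical:

```sql
-- Default opclass (jsonb_ops): supports ?, ?|, ?& and @>.
CREATE INDEX orders_product_gin ON orders USING gin (product);

-- jsonb_path_ops: smaller and faster to build, but containment only.
CREATE INDEX orders_product_gin_path
    ON orders USING gin (product jsonb_path_ops);

-- Either index can serve a containment query:
SELECT * FROM orders
 WHERE product @> '{"regular cone": [{"scoops": 1}]}';

-- Only the jsonb_ops index can serve key-existence queries:
SELECT * FROM orders
 WHERE product ?| array['regular cone', 'pine cone'];
```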
In this case we're looking for people who got their ice cream on a regular cone or a pine cone — obviously only one of those is going to get valid results — and we're still using the index, even though, unlike our earlier indexes, we're looking at multiple keys, multiple fields. So if you're going to be indexing JSONB, this is probably the operator class you want: jsonb_ops, the default, gives us the most flexibility, and for the use cases I've been describing where indexing worked out — the good customer use cases — that's usually what we ended up with. It obviously won't handle LIKE operations particularly well, and you have to be careful about disk usage, because GIN has never been slim, but it's usually pretty reasonable and maintainable.

The reason you would use jsonb_path_ops instead is size and speed: jsonb_path_ops has a much smaller scope in what it indexes, so lookups are going to be faster, inserts are going to be faster, build times are going to be faster, that sort of thing. As for the use cases I mentioned earlier: I believe Heap is using jsonb_ops, because they have enough tags that they can't create a function index per tag and expect good results on any one tag — they need to query against them all — and as I understand it they don't have any fuzzy matching, so they don't need to worry about LIKE and the like. And the VeniceDB project is pretty lightly indexed, but I believe they're using a GIN index on their JSONB columns as well, so they can pull out the metrics they need at short notice; they focus a little more on the roll-up query use case, though, so it's a little less relevant there.

Any questions about this slide, by the way? Yes — so you're asking how JSONB handles null values. Are you referring to null keys, like somebody enters a key of, in this case, "regular cone": null, or just an empty field, or what? Yes — I'd have to check that in detail, but as I understand it that is covered here; yes,
that will be indexed appropriately. Anyone else? It should, for the most part — there was a little bit of recursion in there, with nested tags, that it might not play well with, but the basic use case, yeah, that should see some improvements. They actually ended up going to SQL Server, so they're using exactly what you described there, and it should work well enough in this case. Oh, sorry — I forgot to repeat the question so everybody could hear it: it was asking whether the upcoming Postgres 12 features would change this, and there's your answer.

All right, so, just to go into this in a little more detail on the different use cases — sorry, I lost my train of thought. Anyway, moving along, I've got a special mention for a GIN trigram-ops function index. You might have noticed I've mentioned LIKE operations a lot in this talk. A pretty common request we get, because of how unstructured and fuzzy JSONB content is, is that you often won't know exactly what you're looking for. Sometimes you've got a customer giving you data where you're not entirely sure they'll be consistent about the fields they include; sometimes you've got end users not really remembering exactly what tags they've attached to various events; and sometimes it's just part of the use case that not all data is fully predictable, and you need LIKE operators. So one of the things we've done for these specific use cases — and this is actually what we tried for that anonymous company I mentioned, with some good results for the most heavily used keys — is gin_trgm_ops. If you're not familiar with it, it's an index that looks at trigrams — three-letter combinations — to index LIKE matching against text fields. So suppose we create a function index with gin_trgm_ops on the name field of our customer_info column.
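The trigram approach just described might be set up like this; the index name is invented, and pg_trgm must be enabled first:

```sql
-- pg_trgm ships with Postgres but must be enabled per database:
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigram GIN index over the extracted text of one key:
CREATE INDEX orders_customer_name_trgm
    ON orders USING gin ((customer_info->>'name') gin_trgm_ops);

-- LIKE (and ILIKE) on that same expression can now use the index:
SELECT * FROM orders WHERE customer_info->>'name' LIKE '%Smith%';
```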
With that in place, we can actually use the LIKE operator successfully — you can see here that we've got a successful index scan on this LIKE operator, using this function index on the field. In the anonymous customer use case I was talking about, for the keys they wanted to run LIKE against, we got at least a 10x performance boost, and we've seen that once or twice with similar use cases as well. Unfortunately, there are two limitations: you can't do a LIKE operation on the keys themselves this way, and you have to create one of these indexes for each key you want covered. That being said, for the right use cases — if you've got a few keys you absolutely need to compare and cannot turn into columns, or if the distribution is sparse enough that the index stays fairly compact despite being somewhat redundant — it's well worth considering. Even though this isn't, strictly speaking, an index on a JSONB column, it gets an honorable mention in this talk for solving a fairly common pain point in an effective way.

Yes — the question was whether Postgres, in an upcoming version, will include pg_trgm by default, and unfortunately I don't know the answer to that. Anyone care to comment on that one? All right, looks like none of us do, sorry. I can follow up with you afterwards if you like.

So, speaking of Postgres 12: Postgres 12 has a lot of new features, and there's one in particular that's very relevant to us. There's an entire new data type being added, the jsonpath data type. What it provides is a binary representation of a parsed SQL/JSON path expression — which means we've got an entire new language, basically, for querying JSONB content. This is unfortunately a very complex topic, so I'll have to go back and forth on the slides a
little bit, but here are the basics. It's very JavaScript-like, in that you can use the dot operator to return a member of an object: you can put json_element.key in a JSON path to get the value at that key, and it will just return that — I'll be showing examples of each of these. You can use dot-asterisk to return all object members at the current level, and you can put two asterisks there to return all object members at the current level and below, so you can basically dump an entire JSON value. I found that one particularly useful for really complex nested JSON that isn't canonically formatted and that you'd otherwise have to recurse over in some fashion. You can also specify the levels at which you wish to dump your JSON elements with this dot-asterisk syntax; there's a subscript syntax for getting values at different array locations; and you can get all array elements. This slide is a bit of a text dump, so I'm going to go straight into the examples — it's mostly here for future reference, for when people go back and want to see what to look into. We've also got a handful of new operators and functions to take advantage of these.

The first of these I'm going to work with is jsonb_path_query. jsonb_path_query lets us get what's at a specific path, and I'll be focusing on it a lot so we can go into depth on all the different options in our JSON paths. Here I've got a basic example using a jsonpath expression with jsonb_path_query, compared against the arrow operator — I still don't know what to call that operator, exactly — where we're trying to get the name from a customer_info column. Here's the traditional syntax, customer_info arrow name, and here's the jsonb_path_query equivalent. You can see we're giving it two arguments: one of them is just customer_info, which is our column
name. The second argument is our first JSON path expression: just, in quotes, dollar-sign dot name. It's quote-wrapped because we're feeding it a string; the dollar sign means the full value on the left side of the equation — that's how we reference it; the dot gets us the key at this location; and name is the actual name of our field. And here are the results: in either case it's exactly the same — we're just getting the name from our customer_info field. This is about the simplest use case I could find for getting up to speed on jsonb_path_query.

Now that we've gone over the very simple use case, I've got a bit of a messy slide — I couldn't find a better way to present this — showing jsonb_path_query with our asterisk operators. Here's the column data: we've got "regular cone" at the top level, containing an array with a bunch of elements going into detail about the different scoops that were put on. This person just got a regular cone with a scoop of vanilla ice cream; it's about as simple as it gets. So here's the first query, which is jsonb_path_query on product with just dot-asterisk as input, where id equals 1, and it returned just the element in question: this entire array of one element. Let's do that again with two asterisks. In this case it's a little harder to read, because we've got a few things coming back. The first returned value is the entire field — the very top-level thing, in its entirety, exactly as it comes out of the column. The second: we recurse through each key at the top level and return what it contains — in this case, that array: scoops equals one, ice cream is vanilla, that entire array. The next thing it returns is every element of the array, in order.
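The jsonpath queries being walked through here might look like this (Postgres 12 or later; the data is the hypothetical orders table):

```sql
-- Equivalent ways to pull the name field:
SELECT customer_info->'name'                     FROM orders;
SELECT jsonb_path_query(customer_info, '$.name') FROM orders;

-- All members at the top level only:
SELECT jsonb_path_query(product, '$.*')  FROM orders WHERE id = 1;

-- Recursively return the value itself plus every member
-- at every level below it:
SELECT jsonb_path_query(product, '$.**') FROM orders WHERE id = 1;

-- Only members exactly two levels down:
SELECT jsonb_path_query(product, '$.**{2}') FROM orders;
```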
Since this array only has one element, the next value is just that single JSON object. Then it recurses into that and gives us the values at scoops and ice cream, 1 and vanilla, and then we're done. As I mentioned, this probably isn't something we'll use in every single JSONB query we write, because it's not the most manageable, but if you need to dump everything in there, or compare it, or you're not sure of the nesting level, and so on, it's going to come in very handy. We have one more option here: say we want to see everything at a given depth in the JSON document. Here I've got the same syntax, but with a depth of two specified, and as you can see from the earlier example, depth two should be the individual elements of the array. Sure enough, that's what we get: for id = 1 we get scoops = 1, ice cream = vanilla; for id = 2 we get two different scoops returned on separate rows; and so on, you get the idea. You can also specify a range of levels, for example 2 to 3, but that just produces too much text to show effectively on these slides, so I decided to omit it. For arrays this gets a little easier. Say I want to look at just the first element of my array every time: I specify standard array indexing, starting at zero, in my JSON path, and I get the first element of the array for every row. And instead of the asterisk you can specify a specific key rather than getting them all, so we can look at just the first scoop, or the type of ice cream, or the number of scoops that were put on.
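Hedged sketches of the depth-limited and array-subscript forms just described, against the same assumed orders table:

```sql
-- Only values exactly two levels down:
SELECT jsonb_path_query(product, '$.**{2}') FROM orders;

-- A range of levels also works, e.g. levels two through three:
SELECT jsonb_path_query(product, '$.**{2 to 3}') FROM orders;

-- First element of each array (zero-based subscript), under any top-level key:
SELECT jsonb_path_query(product, '$.*[0]') FROM orders;

-- A specific key of that first element; DISTINCT collapses repeated values:
SELECT DISTINCT jsonb_path_query(product, '$.*[0].scoops') FROM orders;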
Looking at the number of scoops that were put onto every waffle cone, in this case it looks like somebody wanted seven scoops. (The question was whether we can use DISTINCT here.) These are not actually duplicates; they come from separate orders, so I'd consider them non-duplicates, but you could put a DISTINCT on there and get each value returned only once if you prefer. Yes, the DISTINCT keyword would be appropriate here. The next function I'm going to touch on is jsonb_path_exists. This one just returns a boolean telling me whether or not the path I specified exists. It's probably best chained into more complex queries, but it can definitely come in handy for a few different things. Here I've got a basic query over the products that were ordered, checking whether we've got this "waffle cone" key at the top level. I've included the actual JSONB content so you have something to compare against, and you get a true or false result showing whether the path exists; we only get true where the waffle cone does exist. Very simple, very straightforward, and not something we can easily do at the moment without getting into really complicated SQL. The next example is closer to the sort of use case I expect this to be relevant to: finding my products where a waffle cone was ordered. I've got jsonb_path_exists on my product column in the WHERE clause, looking for a waffle cone at the top level, and we get exactly the result we should. The next function I'm going to go over is jsonb_path_match, also coming in Postgres 12, and this is where we can supply some fairly complex logic.
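Backing up a moment, the two jsonb_path_exists queries just described might look like this; the table and data are my assumed stand-ins for the talk's schema:

```sql
-- Boolean column: does a "waffle cone" key exist at the top level?
SELECT id, product,
       jsonb_path_exists(product, '$."waffle cone"') AS has_waffle_cone
FROM orders;

-- The same test used as a filter:
SELECT id, product FROM orders
WHERE jsonb_path_exists(product, '$."waffle cone"');
```

The operator form, product @? '$."waffle cone"', is equivalent and, unlike the function call, can make use of a GIN index.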
In this case I'm looking for somebody who ordered two scoops. Unfortunately I entered the number of scoops as text rather than integer, so I couldn't use a fancier comparison operator, but here's what it would look like. We've got jsonb_path_match on our product column; we don't care what type of cone it was, so we put an asterisk at the first level, but we want its elements to have a scoops of two. We need this equality operator because we have to give it an expression that, when parsed, returns a boolean value; it won't understand a bare integer value or anything like that. It's basically saying: give us any row where whatever we feed in matches the logic. Here I get the entire row wherever we've got two scoops, one scoop of this, one scoop of that. You can get significantly more complicated; there's about a half dozen different constructs you can put in here, but I'm going to defer to Oleg's talk on that. He wrote it, so he can go into a lot more detail than I can; this is just a high-level idea of what's possible. I believe I have two more functions to cover. The next is jsonb_path_query_array. It's very similar to what we saw with jsonb_path_query, but we get an array instead: where jsonb_path_query would return all these elements on different rows, here they're recombined into a single array, so we can parse them without getting confused about which row produced which output. Similar to something we've already seen, but new functionality that makes it simpler to work with. On the same front we've also got jsonb_path_query_first, which returns just the first matching element. I've only seen one or two use cases for this one.
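Sketches of the three functions just covered; note that, unlike the speaker's demo data, this assumes scoops was stored as a JSON number, so a numeric comparison works:

```sql
-- jsonb_path_match: the path must be a predicate yielding a boolean.
-- Lax mode unwraps the array, so this matches any cone with a 2-scoop entry:
SELECT id, product FROM orders
WHERE jsonb_path_match(product, '$.*.scoops == 2');

-- jsonb_path_query_array: the same matches as jsonb_path_query, folded
-- into one array per row instead of one row per match:
SELECT jsonb_path_query_array(product, '$.**{2}') FROM orders;

-- jsonb_path_query_first: just the first match, or NULL if none:
SELECT jsonb_path_query_first(product, '$.**{2}') FROM orders;
```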
Sometimes you just want a result, it doesn't really matter which one, and this would be appropriate there. That's everything I wanted to cover in the upcoming Postgres. I'm sure I missed something, because it's basically an entire new language we've got to play with, but that's a high-level idea of what's coming and the sort of capabilities we'll be able to express. So, here's my conclusion. In conclusion, JSONB is awesome. It's great for working with unstructured and semi-structured data: if you have arbitrary fields your users have entered that you want to be able to query in depth, and you don't want to worry about complex database changes, you should give it a try. It's got a lot of flexibility, it can handle some really weird use cases, and it's very extensible. However, you should be careful of disk usage; you can get some pretty big savings by going to traditional columns in some use cases, especially when your fields are almost always present. And planner issues, as I mentioned with statistics gathering, might be a problem: in Heap's case they ended up changing which plan types the planner could even use (they turned off nested loops, that was it, thank you) to get the planner to be a little smarter about what it was doing, and that worked well for them. Also in conclusion, JSONB indexing is a little bit tricky, but there are some pretty good options once you start digging into how it works. And for my last conclusion: Postgres 12 is going to fix everything, for a certain value of everything. That's it, thanks for coming; I've included for reference a few different articles, videos, that sort of thing, that may come in handy. Any questions? The question is: what use cases does JSON path simplify or allow that the standard operators don't? I have some speculation there, but we haven't had people using Postgres 12 in production yet.
So I don't want to commit to an answer; I'll give the general answer that somewhat more complicated logic, and more deeply nested JSON, should be a lot easier to work with. Anyone else? It was pointed out that it conforms better to the SQL standard, which is an excellent point I probably should have addressed in my talk, thank you. I'm sorry, I don't know how that works, but I will find some way of making these slides available to the public. Oh, the jsonb_path_query_first function? Yes, as I understand it, it finds one match and then stops, but you might want to see Oleg's talk on this; I don't know the internals as well as he does. All right, I guess that's it, thanks for coming everyone.
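For reference, the indexing options alluded to in the conclusion look roughly like this (index names are made up, against the same assumed orders table):

```sql
-- Default jsonb GIN operator class: supports @>, ?, ?| and ?&, plus the
-- jsonpath operators @? and @@ in Postgres 12:
CREATE INDEX orders_product_gin ON orders USING GIN (product);

-- jsonb_path_ops: smaller and usually faster, but supports only @>
-- and the jsonpath operators:
CREATE INDEX orders_product_gin_path
    ON orders USING GIN (product jsonb_path_ops);

-- An indexable jsonpath search; the jsonb_path_exists() function call
-- itself is not index-accelerated, but the @? operator is:
SELECT id FROM orders WHERE product @? '$."waffle cone"';
```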
Info
Channel: Postgres Open
Views: 6,485
Rating: 4.9055119 out of 5
Id: p9RItyeKbLQ
Length: 47min 41sec (2861 seconds)
Published: Thu Sep 19 2019