Practical JSON in MySQL 5.7 and beyond - Ike Walker - Percona Live 2017

Captions
Thank you all for coming to my talk. It's been a long week, but I appreciate everybody who showed up. What we're going to talk about today is practical JSON in MySQL 5.7 and beyond.

My name is Ike Walker. I work at GitHub; I'm a database architect on the platform data team. I first gave this talk in Amsterdam at the Percona Live conference over there in October, so this is pretty similar to that; I've made a couple of changes, mostly incorporating the 8.0 content.

JSON is the big, hot new feature in 5.7, but what I want to go over in this talk is some of the history, because it didn't come out of nowhere. It's not like native JSON support is the first time we've ever stored JSON in MySQL; personally, I've been storing JSON in MySQL for ten years. So one of the things I'm going to go over is a timeline of how people have used JSON in MySQL in the past. I'll zip through that timeline to get to the present, because the really interesting stuff is what we can do now that there's native JSON support, and I'll delve into that. I'm not going to try to do a survey and cover every single JSON function; I'm going to do a bit of a deep dive into the JSON functions I find most useful. I'll go through a couple of use cases of ways I've used JSON in the past and problems you can solve with it, and I'll try to throw up as much code as I can. I won't go into it in great depth, but I think it's useful to see the actual queries and the output, and that will all be available on the slides. If I'm going over anything too fast, just raise your hand or shout out; interrupt me any time. We don't need to wait until the end for questions. There's a lot of content on the slides, so as I zip through I'm happy to stop if anybody has questions or wants to go into anything in more detail.

So let's get underway. What is this talk all about? The purpose is pretty simple: we're talking about using JSON in MySQL. And what is this talk not about? It's not about whether you should use JSON in MySQL. There are use cases where JSON is great and use cases where it's not a good fit. Everybody's application is different, everybody's use case is different, so I'm not going to tell you that this is the way you have to use it, or that you have to use it at all. But I think it's useful to go through some specific examples of ways I've used it in the past.

Like I said, I want to go through a brief history of JSON support in MySQL, because this isn't something that happened overnight, where nobody was allowed to store JSON in their database until 5.7 came along. So let's talk about MySQL and JSON going way back. Here's a timeline. MySQL was born in 1995. Seven years later, JSON came out, so technically the first time people might have put JSON in a MySQL database would be 2002. In 2011, Shlomi Noach's wonderful common_schema was released, which had a couple of nice little JSON functions you could use. In 2013, some great UDFs for parsing JSON came out in a MySQL labs release. I found those incredibly useful for doing some of the same JSON parsing I had been doing with common_schema, but instead of a stored routine I could do it with a UDF, so it was much, much faster; I'll show a little bit about that later.
And then since 2015, technically, we've had native JSON. For those of you who are already on 5.7, or working on getting up to 5.7, we now have a JSON data type and native functions we can use to deal with JSON. That's pretty cool; it gives us a lot more flexibility and options. One of the themes I'll push through here, with the practical focus, is that we have legacy applications storing JSON as text, because before 5.7 you had to store it as text. It's not like you flip a switch overnight, convert everything to JSON, and it magically works; there are some gotchas and intricacies in how you go from the past of storing JSON as text to the future where we store it as JSON, and I'll try to touch on that in this talk as well.

So there's the broad timeline. Let's step through each of these milestones and talk about what it meant for us as application developers and what we could do with JSON in MySQL.

The first nine-year period I'll call "there's an app for that." What I mean is that if you wanted to use JSON and persist it in MySQL before 2011, you were really doing everything in the app layer. You had to store JSON as text; that was your only option. You had to rewrite the full string every time, because there was no way to do in-place updates of your JSON, and the parsing all happened in the app layer. Or maybe you wrote some of your own stored routines to deal with it, so not the app layer per se, but as the app developer you were still the one responsible for writing all the code that manipulated the JSON.

Then in 2011 a new era started that I call the "standard procedures" era. You were still storing JSON as text, and still rewriting the full string every time, but now you could do simple parsing of JSON inside the database. Personally, I used common_schema for that. When Shlomi released common_schema, it had a bunch of friendly functions for extracting a value out of JSON; I'll show some examples. A couple of specific functions I used a lot from common_schema (there were others) were get_option and extract_json_value. I found both useful for simple JSON parsing: nothing fancy, no deeply nested JSON, but powerful stuff that could work pretty well, at least when you could afford the performance overhead of running these stored routines.

Here's an example of a query I wrote with common_schema several years back; I blogged about it at the time, and I copied this example from that post. What I had was an event table storing a JSON string and a count of events, basically a count of events for relatively low-cardinality JSON strings. I stored the data as JSON so I could have a flexible schema for the attributes I was tracking, but when I wanted to run a query it was very simple to create dynamic columns. In this example I'm extracting the age and gender from the JSON data, grouping by them, and summing the events.
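A minimal sketch of that kind of rollup query, assuming common_schema is installed and using its XPath-style path syntax; the event_fact table and its column names are my stand-ins, not from the slides:

    -- dynamic columns extracted from a JSON text column via common_schema
    SELECT
      common_schema.extract_json_value(event_data, '/age')    AS age,
      common_schema.extract_json_value(event_data, '/gender') AS gender,
      SUM(event_count)                                        AS events
    FROM event_fact
    GROUP BY age, gender;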
This was pretty cool: I could easily run a report on my fact table that was using JSON, with the help of common_schema. Thank you, Shlomi.

Now let's move on to the next great era of JSON in MySQL: the "lab experiments" era, when Oracle gave us some UDFs in a labs release that we could use to do similar JSON parsing. So what did the UDFs change? We were still storing JSON as text; that didn't change, and there was still no JSON data type. But now we could do much faster parsing of the JSON, because the UDFs are compiled instead of being interpreted; in some of my tests they were 100 times faster. So if you're dealing with larger quantities of data, the performance was much better. We also got something new: some of the UDFs let us do in-place updates of our JSON objects, rather than having to read the whole thing out, do the surgery, and write it back.

Let's look at what some of those queries look like. Again, this is one I took from a blog post I wrote, using the same fact table as before. I'm extracting the age and gender, and I threw state in as well, to generate this report, summing events by those attributes. It looks very similar; for the syntax, you just give it the column name and then the attribute name you want to extract, and you get your report. The benefit was that running the UDFs was much, much faster (100 times faster, or whatever) compared to the stored routines.
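A sketch of the UDF-era version of the same report, assuming the labs json_extract UDF took the column followed by a plain key name, as described above; the table and column names are again my stand-ins. Note the UDF shares its name with the later 5.7 native function, which becomes a gotcha later on:

    -- labs UDF: plain key name rather than a path expression
    SELECT
      json_extract(event_data, 'age')    AS age,
      json_extract(event_data, 'gender') AS gender,
      json_extract(event_data, 'state')  AS state,
      SUM(event_count)                   AS events
    FROM event_fact
    GROUP BY age, gender, state;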
Now, I say 2015 because that's when 5.7 went GA, though obviously it all depends on how long it takes you from that point until you actually get 5.7 into production in your environment. But in the present day we've got native JSON support. We've gone through the history; what can we do now, going forward?

We've got options. We can store the JSON as text the way we always have (you can still do that, and the native JSON functions will accept either text or JSON as input), or we can use the native JSON data type. We'll talk about this a little more, but there are definitely benefits to that data type. The JSON data type is a little different from storing text: it stores the document as binary, and the keys are sorted, which has some implications we'll get into later. You get faster access to the embedded data, because the keys are sorted and the data structure is more optimized for it, so if you're dealing with slightly larger JSON objects, or even just making a lot of function calls, you'll get better performance extracting values. You can also update those values with native functions, and there's a convenience component to that which is really great. The performance is not totally there yet, though; I'll touch on some of the differences between 5.7 and 8.0 later. The high-level thing to know about the current implementation is that if you update a value in a JSON column, it has to rewrite the entire blob; it's not optimized for in-place updates yet. There's work underway for that in 8.0, but currently, whether you're running 5.7 or playing around with the DMR of 8.0.1, the write performance, specifically the update performance, is not as good as it could be, because it's not doing in-place updates; it's rewriting the whole blob.

Another great thing is that we now have a whole bunch of native functions. We've gone from the small set I used from common_schema, to the somewhat larger set in the UDFs, to something like 20 JSON functions in 5.7, so you've got a lot of options. I'm not going to go through every one of those functions; like I said, I'll go through some of my favorites, the ones I've found useful for specific use cases. But I recommend you go into the reference manual and play around, because there's a lot of really cool stuff in there.

All right, now let's get to some meaty examples. I've got a couple of functions I want to talk about, with very simple use cases (let me know if you have questions or if anything isn't clear), just to illustrate, as a beginner's guide to getting started with native JSON, some of the functions you might want to start with.

First is JSON_EXTRACT. That name might look familiar: it was one of the UDF functions, the one I used in my blog post and in the examples I showed before. This is a similar example query: I've got my fact table with the event_data JSON string, and I want to extract an element from it, group by it, and sum up my events. In this case there's a field called country that I want to extract. The output looks pretty familiar; everything here looks familiar except the path. (And that's a smart quote on the slide, sorry; it should just be a regular single quote.) The path syntax is different in the 5.7 native JSON functions: with the UDFs or common_schema you would basically just pass in the attribute name you wanted, but here we use the dollar-sign-dot notation. You'll see that as a theme with the native functions; there are some slight changes in syntax. But this is pretty cool: we can do a similar report to before, but now it's all native. You don't need to worry about installing UDFs and running lab code that said "don't use in production"; now we can use 5.7 native functions and feel a little more confident about it.
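The native version of the report, sketched against the same hypothetical table; note the new dollar-dot path syntax:

    -- 5.7 native function with the $.country path expression
    SELECT
      JSON_EXTRACT(event_data, '$.country') AS country,
      SUM(event_count)                      AS events
    FROM event_fact
    GROUP BY country;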
JSON_SEARCH is a neat function. Say you don't really know the schema of your JSON data, because it's a very flexible schema and you're not imposing anything rigid. There's some data element you know is in there, but you can't quite remember where; maybe you have multiple application developers working and you don't really know the structure of the data. You're like: I know we're storing the value "Android" somewhere; can you help me figure out the path to it, so I can use that path in other JSON functions? So I take the same schema and say: I know that somewhere in the event data I'm storing the string "Android" as a value; show me the key path. There's an option for whether you want just one match or all of them; here I'm saying I just want one, and just give me the first row. The first place it finds it is under a key called OS that has "Android" in it. That makes sense; that's cool.

Another, more synthetic example, using 'all' instead of 'one': if I look for a string that exists at a few different paths in a JSON object (it's a value at the top level under one key, and it also appears twice in an array stored under another key), then JSON_SEARCH spits all of them out: the top-level match, plus the first and second positions in the array. That's just to show the difference between the two ways you can use JSON_SEARCH. It's a neat way to introspect your own dynamic JSON schema and then use the output for something like JSON_EXTRACT, to run your report or do whatever you need to do to extract your data.

JSON_REPLACE is another handy function; this touches on what I said earlier about doing in-place updates of your JSON. For example, I create a very simple key-value pair of foo and bar and cast it as JSON, so my user-defined variable is storing the JSON data type. Then I update the variable: I want to replace whatever value is stored at the foo key (I don't know what it is, I just want to replace whatever is there) with "updated". When I read the value back, I see the update happened. That's pretty cool; you couldn't do that before, when you had to do everything in the app.

JSON_ARRAY is sort of a helper function: if you have a few string values, or other native JSON values, and you want to build them into an array, this function does that. I start with the same example object, the simple foo/bar pair, and now I replace whatever is at that key with an array. I just call JSON_ARRAY, which takes an arbitrary number of strings or other objects; I pass in three different strings, and it builds an array with those three strings. A handy way to build arrays.

Likewise, there's a function called JSON_OBJECT that you can use to build an object rather than an array. You pass in an even number of arguments, because it builds key-value pairs from them. In this case I'm making a nice James Bond JSON object: his id is 007, his name is James Bond, and he's got more than one car, obviously, because he's James Bond, so we need an array for that. When we read it back, we see it's been built up as a nice JSON object for us.

Another thing to point out: I mentioned before that the keys are sorted in the native JSON type, and you can see the sorting here. You write the keys in an arbitrary order and read them back in sorted order. We wrote id, then name, then cars, but when we read it back, cars comes before name, because it's been sorted. You might also notice that id didn't get sorted; id should come after cars and before name, right? Well, no, because id isn't like any other key. Every JSON object has to have an id; id is special, it gets to go first no matter what, and I'm sure James Bond wouldn't have it any other way. (The real reason, for what it's worth, is that the binary format sorts keys by length first and then lexicographically, so the two-character "id" sorts ahead of "cars" and "name".)
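A sketch of those four functions together; the foo/bar pair, the "Android" search, and the 007 object follow the talk, while the car names and other literals are my stand-ins:

    -- find the path to a known value: returns "$.OS" here
    SELECT JSON_SEARCH(event_data, 'one', 'Android') FROM event_fact LIMIT 1;

    -- replace whatever is at a key with JSON_REPLACE
    SET @j = CAST('{"foo": "bar"}' AS JSON);
    SET @j = JSON_REPLACE(@j, '$.foo', 'updated');
    SELECT @j;                                        -- {"foo": "updated"}

    -- swap the scalar for an array built with JSON_ARRAY
    SET @j = JSON_REPLACE(@j, '$.foo', JSON_ARRAY('a', 'b', 'c'));
    SELECT @j;                                        -- {"foo": ["a", "b", "c"]}

    -- build an object; note the sorted key order on the way back out
    SELECT JSON_OBJECT('id', '007', 'name', 'James Bond',
                       'cars', JSON_ARRAY('Aston Martin DB5', 'Lotus Esprit'));
    -- {"id": "007", "cars": ["Aston Martin DB5", "Lotus Esprit"], "name": "James Bond"}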
All right, here's another subtle thing that I think is important to know about the way JSON works: the simple comparators, like the equals comparator, work a little differently with JSON in MySQL than you might expect, because MySQL tries to take values that are equivalent and treat them as equivalent in an equality comparison, even across data types. If I have the plain string 'hello' and compare it to a JSON value that is just the string "hello", you might think those are different, but the JSON comparator, doing an equality comparison, is going to say: for all intents and purposes these values are equivalent, and it returns true. Likewise, if you take a number, cast it as JSON, and compare it to a regular number, that returns true; the same goes for boolean values and string values. And two different JSON objects with the same contents are equivalent but not equal (they're separate objects), yet the comparator will still say true. This is important to know once you start building queries and want to understand how the JSON data type treats these values: 42 as an INT or a TINYINT versus 42 as JSON is all the same thing.

That just scratches the surface. Like I said, this isn't a survey course of all the JSON functions; I just wanted to hit some quick ones to get you started, and once you start using them you won't be able to stop. The reference manual is great; you can click through and get all of them. One thing I will point out is a nice little bit of syntactic sugar: the arrow operator. Extracting values with JSON_EXTRACT, like we've shown, is a very common pattern, so there's a shortcut: you just use the arrow, ->. Those of you familiar with JSON in Postgres will recognize this; Postgres has it as well. And, I think I glossed over this before, the value returned by JSON_EXTRACT is a JSON value. For example, if you have a string and call JSON_EXTRACT, it comes back with the quotes; it's a quoted string, and often you don't want the quotes. There's another function called JSON_UNQUOTE for that, and wrapping a JSON_EXTRACT in a JSON_UNQUOTE is such a common pattern, and such verbose syntax, that there's sugar for it too: if you just want JSON_EXTRACT, you use the arrow ->; if you want the wrapped JSON_UNQUOTE(JSON_EXTRACT(...)) call, you use the double-headed arrow, ->>. A nice little piece of syntactic sugar that makes your code a little shorter and more readable. I definitely use it; I use a mixture in the example queries here, but it's something to be aware of before you take a deep dive into all of the functions.
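A few of those comparisons and the arrow shorthand, sketched out; the 'hello' and 42 examples follow the talk, and the table reference is my stand-in:

    -- equivalent values compare as equal, even across types
    SELECT CAST('"hello"' AS JSON) = 'hello';              -- 1 (true)
    SELECT CAST('42' AS JSON) = 42;                        -- 1
    SELECT CAST('{"a": 1, "b": 2}' AS JSON)
         = CAST('{"b": 2, "a": 1}' AS JSON);               -- 1: equivalent, though separate objects

    -- the arrow operators: -> is JSON_EXTRACT, ->> adds JSON_UNQUOTE
    SELECT event_data->'$.age'  FROM event_fact LIMIT 1;   -- "21"  (quoted JSON string)
    SELECT event_data->>'$.age' FROM event_fact LIMIT 1;   -- 21    (unquoted text)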
All right, so we've gone through the history of JSON in MySQL and some of the specifics of what we can do with native JSON now. Next I want to get into a couple of use cases. I said before that JSON is not for everyone, but here are a couple of specific problems I've solved in the past, or that you could hypothetically solve, using JSON. I think it will be illustrative to go through these examples; again, just raise your hand or shout out if you have questions about any of them.

Use case number one is flexible rollups. This will look familiar, because most of the sample queries I used to show what I did with common_schema and the UDFs were using this flexible-rollup pattern. What's the goal? I want to support different fields for multiple customers with a single dimension in my fact table. This example is from when I worked at an ad tech company: we were running ad campaigns for different customers, and different customers wanted to track different metrics, but I wanted a generic schema that could track all the customers in the same table even though they had slightly different pieces of contextual data for tracking their ad campaigns. The one requirement was to use relatively low-cardinality fields, so the JSON field I was rolling up on didn't cause an explosion of rows that made it not worth rolling up at all. As a simple example of the different kinds of data advertisers were tracking: some wanted to track ad campaigns by age and gender, which we can represent very easily with a JSON string, whereas other customers, or even different ad campaigns for the same customer, wanted to track by country and operating system. Also a simple string.

So what does it look like when we try to create a schema for this? First, a naive example. I want to track all the events that happened for a specific ad on a specific date, based on that JSON string, so I'll create a rollup table. I'll use the JSON data type, because I'm on 5.7, so why not push it all into native JSON? And since it's a rollup table, I want a unique index on all my dimensions, so I don't insert the same data more than once; so I put a primary key on the date, the ad id, and the JSON string. Seems like a good idea, but no, that's not going to work. This is the first gotcha we've hit: JSON columns can't be put in an index. As we'll see later, you can extract individual attributes out of the JSON using another great 5.7 feature, generated columns, but if you try to index the entire JSON string using the native JSON type, MySQL is not going to let you.

That's okay; there's always a way in MySQL. Just because the first thing you try doesn't work doesn't mean there isn't some way to make it work. So what can we do? Well, why not just use text? That's what I'd been doing for the last ten years, storing JSON as text, so no big deal. Those strings didn't look too long, but I'll use VARCHAR(750) just to make sure I have enough space, and I'll stick that in my primary key. No problem. That's one way to do it.
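Here's a sketch of the naive attempt and the text workaround, with hypothetical table and column names; the exact error text may vary by version:

    -- naive attempt: the JSON column in the primary key is rejected
    CREATE TABLE event_rollup (
      event_date  DATE            NOT NULL,
      ad_id       INT UNSIGNED    NOT NULL,
      event_data  JSON            NOT NULL,
      event_count BIGINT UNSIGNED NOT NULL,
      PRIMARY KEY (event_date, ad_id, event_data)
    );
    -- ERROR 3152 (42000): JSON column 'event_data' cannot be used in key specification.

    -- workaround 1: store the JSON as text, like we did before 5.7
    CREATE TABLE event_rollup (
      event_date  DATE            NOT NULL,
      ad_id       INT UNSIGNED    NOT NULL,
      event_data  VARCHAR(750)    NOT NULL,
      event_count BIGINT UNSIGNED NOT NULL,
      PRIMARY KEY (event_date, ad_id, event_data)
    );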
Another way to do it is with generated columns. I really want to store my JSON in the new data type; going back to text feels like a step in the wrong direction. So here's what I'm going to do: I'll store the event data as JSON, then create a generated column that converts that JSON to text, and once it's text I can put it in my primary key. There you go: with a generated-column hack, I can stick it in my primary key. It's the same data that was there before, but casting it as a string lets me index it, as long as the string length is short enough not to exceed the maximum index length, which of course will vary depending on things like innodb_large_prefix and which row format you're using. So that's another hack.

What does the query look like if we use one of those two hacks? We can extract age from the event data just like before, but instead of spelling out JSON_EXTRACT I'm using the syntactic sugar. I run my report, and it looks very similar to what I did before; the one difference you'll notice is that the strings are quoted, because I used the single-headed arrow rather than the double-headed arrow. If I wrap it in JSON_UNQUOTE, or use the double-headed arrow, I get the raw strings without the quotes. Likewise I can do it by country; that one shows the JSON_UNQUOTE way to get rid of the quotes, which is more likely how I would do it, but it's the same thing; with the syntactic sugar I could use the double-headed arrow the same way.
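A sketch of the generated-column variant and the two report styles; note the generated column has to be STORED to live in a primary key, and the names are again my stand-ins:

    -- workaround 2: keep the JSON column, index a text projection of it
    CREATE TABLE event_rollup (
      event_date  DATE            NOT NULL,
      ad_id       INT UNSIGNED    NOT NULL,
      event_data  JSON            NOT NULL,
      event_text  VARCHAR(750) GENERATED ALWAYS AS (CAST(event_data AS CHAR(750))) STORED NOT NULL,
      event_count BIGINT UNSIGNED NOT NULL,
      PRIMARY KEY (event_date, ad_id, event_text)
    );

    -- single arrow: values come back as quoted JSON strings
    SELECT event_data->'$.age' AS age, SUM(event_count) AS events
    FROM event_rollup GROUP BY age;              -- "21", "22", ...

    -- JSON_UNQUOTE (or ->>): raw strings without quotes
    SELECT JSON_UNQUOTE(event_data->'$.country') AS country, SUM(event_count) AS events
    FROM event_rollup GROUP BY country;          -- US, CA, ...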
All right, next use case: configuration data. This is also from a previous job, at an ad serving company. The goal was to store extensible configuration data of arbitrary depth for every ad. This let customers configure their ads, and let developers add new features with new configuration parameters, without a lot of frequent schema changes. A schema for this might look like: a primary key on the table with your ad id, then name-value pairs for all your configuration parameters, with the config data stored in the JSON data type. Why not? Here's an example from back then of what some of this configuration data looked like: we've got multiple objects embedded in here under hex ids, each with a type, a UUID, and so on; and similarly I've got paths and actions, I've got arrays; all sorts of great nested JSON data.

So how do I query that? Again we go back to JSON_EXTRACT, the most common one. Say I want the sub-paths for one specific path: I pass that in using the new path syntax, and it returns an array with those three sub-paths. Next I do an update using JSON_REPLACE, another function we saw recently: I put two values in there, read it back, and instead of three we've got two. So there's a case where you read the data, update it, read it back, and see the changes.

Likewise, here's a simpler select. Rather than reading out a specific path to get values, I'm using a wildcard. Jumping back to the example: these three objects sit under arbitrary hex keys that I don't care about, but I know that each one has a type, so show me the type. The wildcard gives me a simple way to do that: show me the type under any key, and it returns all three of them as a JSON array. That's pretty cool.

And JSON_SEARCH, another function we used before: here I've got some string somewhere in my JSON, I don't know exactly where, and I want JSON_SEARCH to give me a little help finding it so I can get the path. It tells me exactly what the path is, including the index in the array. Pretty cool.
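A sketch of those configuration queries; the structure follows the talk (objects under arbitrary hex keys, each with a type), but every table name, path, and value here is hypothetical:

    CREATE TABLE ad_config (
      ad_id       INT UNSIGNED NOT NULL PRIMARY KEY,
      config_data JSON         NOT NULL
    );

    -- read the sub-paths at a specific path: returns an array
    SELECT JSON_EXTRACT(config_data, '$.paths.subpaths') FROM ad_config WHERE ad_id = 1;

    -- replace the three-element array with a two-element one, then re-read
    UPDATE ad_config
    SET config_data = JSON_REPLACE(config_data, '$.paths.subpaths', JSON_ARRAY('/a', '/b'))
    WHERE ad_id = 1;

    -- wildcard: each object under an arbitrary hex key has a "type"
    SELECT JSON_EXTRACT(config_data, '$.actions.*.type') FROM ad_config WHERE ad_id = 1;
    -- ["banner", "video", "tracker"]

    -- find the full path to a known string, array index included
    SELECT JSON_SEARCH(config_data, 'one', 'some-known-value') FROM ad_config WHERE ad_id = 1;
    -- "$.paths.subpaths[1]"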
Use case number three is the EAV antidote. This isn't a specific way I've used JSON in MySQL; it's more about taking a common anti-pattern and looking at a possible way to use JSON to solve it. The goal is to store entity-attribute-value-style data while avoiding the anti-pattern. What anti-pattern am I talking about? Bill Karwin wrote a great book on SQL anti-patterns, and one of the ones he covered is the entity-attribute-value problem. I'm not going to go into great depth on it; read the book, it's great, totally worth it. But I will use an example he had in the book, which was storing bugs and features in a single table. In Bill's example there's an issue table, and issues have attributes that are just stored as key-value pairs. Obviously that's not a great schema, so let's look at some ways to fix it.

One way is to cram all the dynamic attributes into a text field: since this is a polymorphic table, we put the type in a column and cram the attributes into text. That's one way you could have done it in the past. But now that we have JSON, you don't put it in text; you put it in a JSON column. So we've got an attributes column that stores JSON, holding all the dynamic attributes for every bug and every feature. And here we see, for the first time, generated columns being used to extract specific attributes out of your JSON. I don't necessarily know up front what every attribute will be, but I know there are a few specific ones that many of my rows will have: a bug will always have a severity, which feature requests don't necessarily have; a bug will probably have a version affected; and a feature request will have a sponsor. So I can expose these as individual generated columns by extracting them from the JSON. I could even take it one step further: if one of these attributes was going to be there for every bug and every feature request, I could define the generated column as NOT NULL and let it enforce that piece of the schema on my semi-structured data. In this specific case we expect these columns to be null sometimes, because some are only for bugs and some are only for feature requests, but that's another layer you could put on top, and another cool way to use generated columns.
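A sketch of that issues table, following Bill Karwin's bugs-and-features example as described in the talk; the column sizes and exact attribute names are my guesses:

    CREATE TABLE issues (
      issue_id    BIGINT UNSIGNED NOT NULL PRIMARY KEY,
      issue_type  VARCHAR(10)     NOT NULL,    -- 'bug' or 'feature'
      attributes  JSON            NOT NULL,
      -- expose the attributes most rows will have; NULL when absent
      severity         VARCHAR(20) GENERATED ALWAYS AS (attributes->>'$.severity')         VIRTUAL,
      version_affected VARCHAR(20) GENERATED ALWAYS AS (attributes->>'$.version_affected') VIRTUAL,
      sponsor          VARCHAR(50) GENERATED ALWAYS AS (attributes->>'$.sponsor')          VIRTUAL
    );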
Speaking of JSON and generated columns, let's look at some of the ways they work together, because these are two new features in 5.7 that play off each other really well, and there are some powerful things you can do with them. What can you do with generated columns and JSON together? You can expose one or more of your JSON fields as a table column, and you can index them. We saw before that you can't index JSON data directly, but you can index specific fields from your JSON using generated columns. A general rule of thumb is that you can use virtual generated columns instead of stored generated columns except for a few specific use cases: I think a primary key, full-text, or GIS index needs a stored column, but otherwise the virtual variant works just fine.

Let's look at some examples, still riffing on the same schema. We've got our event table again, and we've been extracting age from it, so we'll create a generated column that extracts age from event_data. What does this buy us? The SQL we run to generate a report is much simpler: I just say SELECT age, because it's a column in the table now. I can select it, group by it, and get the same report without the verbose JSON function syntax in my query.

Example two: let's take it to the next level and add an index, because that's another cool thing we can do. I add a gender column, parsed out of the event data, and index it. Why would you want the index? Because now I can put the column in the WHERE clause and avoid a full table scan. Let's get all the events where gender equals 'female': that's going to be a fast query, because I built that index.
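A sketch of both examples against the same hypothetical fact table; virtual columns are enough here, since we only need a secondary index:

    -- example 1: expose a JSON field as a plain column
    ALTER TABLE event_fact
      ADD COLUMN age VARCHAR(3) GENERATED ALWAYS AS (event_data->>'$.age') VIRTUAL;

    SELECT age, SUM(event_count) AS events
    FROM event_fact
    GROUP BY age;                        -- no JSON syntax needed anymore

    -- example 2: index a generated column to avoid full scans
    ALTER TABLE event_fact
      ADD COLUMN gender VARCHAR(6) GENERATED ALWAYS AS (event_data->>'$.gender') VIRTUAL,
      ADD INDEX idx_gender (gender);

    SELECT SUM(event_count) AS events
    FROM event_fact
    WHERE gender = 'female';             -- uses idx_gender instead of scanning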
All right, I've touched a few times on the fact that you can store JSON as text or use the JSON data type, so what are some reasons to use one or the other? The general recommendation going forward is: use the JSON data type whenever you can. But there are a few exceptions, specific problems I've run into in the past. One: if the column has to be in a primary key, use text. Obviously this is a rare use case; usually you wouldn't have JSON data in a primary key, but for some of the rollup tables I showed before, where it's a dimension, it kind of makes sense, and there you can use text or one of the workarounds I talked about earlier. Two: if you need to store heterogeneous data, where some of your rows are strings that aren't technically fully formed JSON and some are JSON, then you're going to need text. I find this particularly relevant for legacy systems, where you've already persisted a bunch of data and been treating it as JSON, kind of, and now you want to change the column type to JSON: if there's even a single value in there that's not legal JSON, the ALTER TABLE will fail. So if you've got that heterogeneous use case, at least transitionally until you can go in and scrub your data, it makes sense to keep it as text. Like I said before, the new JSON functions accept either text or JSON as input, but there will probably be at least some cases during your upgrade where you need to keep text. And three, another limit I've hit: if your JSON data has a depth greater than 100, you need to use text, because if you try to query that data back using the JSON data type, it's not going to work. At least last I checked, the maximum supported depth was 100, which should be more than enough for most people, but I happened to have some data that violated it, so I've hit errors there; you'll see that again in a couple of slides.

Text versus JSON: as we covered before, the JSON data type sorts the keys, so you write in an arbitrary order and read back in sorted order. Some other things to consider: there's a helper function called JSON_VALID that you can use to check whether all of the existing data in a text column is valid JSON before you convert it, which is a really good idea. And one caveat to be aware of: the JSON data type automatically uses the utf8mb4 character set. Obviously we should all be using utf8mb4, right? But not everybody does, so it's something to be aware of; Morgan had a good blog post about that.
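A sketch of that pre-conversion check, with a hypothetical legacy table; checking for invalid rows first avoids an ALTER that fails partway through:

    -- find any rows that would make the conversion fail
    SELECT id
    FROM legacy_events
    WHERE event_data IS NOT NULL
      AND NOT JSON_VALID(event_data);

    -- once everything is valid (and within the depth limit), convert the column
    ALTER TABLE legacy_events MODIFY event_data JSON;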
Okay, real quick, a couple of other considerations for using JSON. First, read/write balance. The new JSON data type, as currently implemented, is really read-optimized. As I mentioned before, every update requires rewriting the full object; it can't do in-place updates within the blob. So it's read-optimized, and write performance is not as good as it could be. Thankfully, there are performance improvements underway; there are a couple of worklog links Morgan gave showing the work being done, so you can follow along, and I know a lot of that work is already in progress for 8.0. I think this will get better in 8.0; in 5.7 it's currently something to consider.

Disk storage implications: this one's a bit of a red herring. The truth is that the binary storage type and the text type are about the same; overall, you're not going to see a big difference in disk footprint whether you use JSON or text. The one thing I will mention is that, regardless of which way you store it, given the repetitive nature of JSON, with the key names repeated over and over for every row, JSON lends itself very well to compression. So if compression is an option for you, it's worth considering for JSON columns.

All right, let's go through some gotchas quickly: things that can get you when you start using native JSON. This one specifically hit me when upgrading from 5.6 to 5.7, having installed some of the JSON UDFs in the past: namespace collisions. If you had installed the json_extract UDF on 5.6 and then want to upgrade to 5.7, where there's a native function with that name, you can't; you've got to drop the UDF before you can upgrade. Not relevant to everybody; if you're not using the UDFs it doesn't matter, but if you are, it's something to definitely be aware of.

Another thing to consider is that the native functions are much more strict about valid JSON format than the old ones. For example, on a 5.6 machine with the UDF, I can say: hey, extract foo from this empty string. And the UDF says: no problem, it's not there; that's not valid JSON, but I don't care, I get what you're asking. The native function is much stricter: if I make the exact same call, here's an empty string, extract the foo key, it gives me an invalid JSON text error. So if you've been storing JSON as text and now want to use the native functions, be aware they're much stricter, which is consistent with the way MySQL has been getting stricter over time, mostly in a good way.

Another thing: the output of the native functions is JSON instead of text. With the old functions, you'd extract foo from {"foo": "bar"} and get just the bare string bar; with the native functions you get the quoted string. That takes some getting used to. You can use JSON_UNQUOTE to deal with it, so it's not a big deal, but it does mean you can't reuse the exact same function calls you had with the UDFs; you have to rewrite them a little. A bit of a pain, but something to be aware of. Likewise, the path syntax changed: I used to just pass in a key name, and now there's the new dollar-dot syntax, so get used to that. And, as we've mentioned several times, you can't index JSON columns directly, but generated columns are a great way around that.

Here's a kind of weird catch-22 I hit with one specific table. I wanted to use the JSON_VALID function to see if all my JSON was valid, but some of my documents exceeded the maximum depth, which like I said is 100, so I couldn't call JSON_VALID on every row; some rows were too deep. And I couldn't call JSON_DEPTH on all the rows to find out which ones were over 100, because some of them were invalid JSON. So I had a catch-22: I needed to validate two different things in my data, and I couldn't run either check across the whole table, because they were mutually exclusive. It's a pain, but it's something you're unlikely to hit; just worth pointing out, since I specifically ran into it.
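To make the strictness and quoting gotchas concrete, a quick sketch; the 5.6 UDF behavior is as described in the talk, and the error shown is the one 5.7 raises for invalid JSON input:

    -- 5.6 labs UDF: json_extract('', 'foo') quietly returned nothing
    -- 5.7 native function: the same input is an error
    SELECT JSON_EXTRACT('', '$.foo');
    -- ERROR 3141 (22032): Invalid JSON text in argument 1 to function json_extract ...

    -- native output is JSON, so strings come back quoted
    SELECT JSON_EXTRACT('{"foo": "bar"}', '$.foo');               -- "bar"
    SELECT JSON_UNQUOTE(JSON_EXTRACT('{"foo": "bar"}', '$.foo')); -- bar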
Very quickly, some of the great new stuff we can expect in MySQL 8.0. We've got new JSON aggregate functions; I'm not going to go into great detail, and I've got links to all of this so you can play with them, but there's JSON_ARRAYAGG and JSON_OBJECTAGG. Basically, for a query that returns multiple values, rather than getting a vertical result set of those values, you can aggregate them into a single JSON array or object as one return value, which is very handy. There's a JSON_PRETTY function that can take one very long line of JSON and prettify it, handy for readability at least. There are some nice new functions to check how much storage your JSON is taking up and how much free space you have. And there's the JSON in-place update work I mentioned before: from what I understand (I talked to Morgan about this yesterday), the optimizer piece is done and the blob refactoring is done, so, I won't say we're most of the way there, but we're getting there with 8.0 and more optimized write performance.

And now is your chance to help shape the future of JSON in MySQL. Morgan recently wrote a blog post about a proposal to change the behavior of a couple of things in MySQL's JSON handling; I recommend you read it, put a comment on there, and say what you think. The high-level version: JSON_MERGE, a function I haven't talked about before, takes two JSON objects and merges them together. The question is what the behavior should be when you have overlapping keys: if you've got two different JSON strings, each with a key called a, what do you do with those two values when you merge? The current behavior is to combine both values into an array, so there's no data loss, but that may not be what you'd expect. There's a proposal to change it; go to the blog post and comment on what you think it should do.

Likewise, there's the handling of duplicate keys. Apparently this is a bit of a gray area in the JSON spec: it says duplicate keys shouldn't exist, but it doesn't say they can't exist, and it doesn't dictate what to do when you hit them. The current behavior, at least the way MySQL implemented it, is that the first key wins. But if you look at how some programming languages handle it (PHP was an example Morgan used), their behavior is that the last key wins. Those are obviously very different behaviors, and maybe MySQL should be modified to be more consistent with other programming languages; I don't know. I also don't know what Postgres does in this situation; is there anybody in the room who knows Postgres JSON well enough to answer? No? So I don't know what Postgres does there. But anyway, take a look at that blog post, make your voice heard, and help shape the future of JSON in MySQL.

And that is the end of my talk. I have five minutes for questions. Please go rate my talk on the app before you forget; I know this is the last slot and we're all going to scatter to the winds, but if you could take some time to rate my talk, and any others you went to over these three days, I would appreciate it, and Percona would appreciate it. You can follow me on Twitter. I've got some GitHub swag here, some t-shirts and stickers. And, very importantly, my team is hiring: if you want to come work with me on the platform data team at GitHub, come talk to me, or if you're too shy to talk to me, follow the link and apply online.

So, any questions? Yes. The question is: what's the performance like on large tables? All of the examples I used were from a past job at an ad tech company; we don't really use a lot of JSON at GitHub yet. There's certainly some overhead. Like I said, it's read-optimized, so reads are pretty fast, but if you want to ingest a large quantity of data very quickly, it will definitely be slower with JSON than with text. It's not perfectly optimized for performance; this is really more of a developer-friendly feature. So if you're considering it for a large-volume situation, you'd definitely want to do some testing and benchmark it for your particular application.

Yes, Shlomi. The question is about going completely schemaless, where every table is just a primary key and a big JSON blob, and you impose the schema with generated columns. I think that's definitely a use case, though I don't see it being the majority of what people do with MySQL in the future. I will say that's the way the schema works for document store; if you want to know about document store, they were over in ballroom F, so they'd know better than I would. But it's certainly an interesting direction, and maybe it solves some of the NoSQL use cases within SQL. Schemaless has some benefits; the idea of not having to do schema changes is appealing, because schema changes are hard. So it's an interesting direction; I wouldn't guess it's going to get huge traction, but I don't really know.

Yes, Jeff. Good question. The question is about what happens if you extract a key that doesn't exist from a valid JSON object, versus trying to extract anything from an invalid JSON string. If you ask for a key that doesn't exist, it just returns NULL, as I remember it. It's not going to be an empty string; the concept of a missing value in JSON maps to NULL, so it returns NULL.

Yes; well, I knew the level, I just didn't know the intermediate key name. Sure. The question is about the example where I had the star wildcard in the path, and how far you can extend that. It's a very good question. That example was really about not knowing the intermediate key names: I knew the depth I was going to, but not the exact keys. You can extend that to whatever depth you want, but it has to be a specific depth. I can't say "give me this key all the way down"; I have to know the general shape of the path, even if I don't know the intermediate key names along it.

Any more questions? All right, we did it. Thank you very much, everybody.

[Music] [Applause]
Info
Channel: Percona
Views: 7,003
Rating: 4.6404495 out of 5
Keywords: Ike Walker, Percona Live, percona, mysql 5.7, update mysql 5.7 to 8.0, update mysql 8, json tutorial, json array in mysql, json array datatables, json array tutorial, mysql database, mysql db, mysql db tutorial, mysql json performance, mysql json array, json mysql query
Id: AWkKK95P2qE
Length: 49min 59sec (2999 seconds)
Published: Wed Nov 15 2017