Are You Getting the Best Out of Your MySQL Indexes?

Video Statistics and Information

Video

Captions Word Cloud

Reddit Comments

Captions

so this talk is are you getting the best out of your mysql indexes um if you want the slides they're online right now and you can actually go to it um my name is sheree cabral i am a senior database admin architect at mozilla so you'll see this logo on every slide because for you amazon is kind of awesome um you can tweet at me i've already turned my phone on to silence so uh you can tweet at me and try to interrupt me but it won't work and that's my url which apparently now the database is down on so let's talk about what is an index right an index is an implementation detail okay if you go to the relational calculus or the relational algebra which is a lot of fun because i'm a math geek um you might not be math geek it might not be fun but it's a lot of set theory and it's all sorts of fun venn diagrams joining things great what's not in that theory is indexes right so if you go and talk to cj date and you're like what's the best indexing thing and he'll be like that's implementation detail who cares and of course that's the kind of thing that's actually going to make your database it's going to make or break your database how fast or slow your database is how fast or slow you can access the data and so while it may not totally matter to um to theoretical theoretical people it actually matters to people actually um programming stuff so what's an index it's a separate data structure in which stuff is put into data is put into so that you can easily access things now in mysql um it is a separate data structure and we have to say you want that separate data structure it's not just going to make it for everything because there's overhead whenever you do a write you do an insert an update or a delete it's going to change that index because you've changed data so if you have an index on first name and you add another name right it's going to put something else in that index on first name who here is old enough to have used a paper dictionary sweet okay so i don't have to explain what that is now remember back when you used to use the paper dictionary and you used to have like a word that begins with um s like surfing right you would like open it to the middle and be like okay that's m and so i have to go further and you do whatever or you you know kind of open it to two-thirds because that's where s was but you would use the guides at the top remember the words at the top to tell you what was what was the first word over here in the last word over here those are indexes and you can actually think of mysql's indexes like those it's not exactly how they are but it's a pretty good um it's a pretty good start to get a sense of how they are so i have a little rant about terminology there you know what's the difference between a key and an index in mysql they're somewhat used interchangeably there are some places where you could say either key or index there's some places where you have to use key and you have to use index well what's the difference well first of all an index comes from you know you have an index and you're looking it up that that might seem obvious to as to where index comes from so where does key come from key comes from the fact that something is a key constraint a very important constraint and that's actually from the relation algebra relational calculus you actually have those in those that theory you don't have indexes in that theory those are implementation detail you do have keys so a key is a key constraint um you can have a primary key constraint a unique key constraint or a foreign key constraint and what those are the only three and so what those do is a primary key says this is how you identify the data right if you're a us citizen it would be your social security number that's how the government identifies you if you have a bank account it would be your account number if you have a credit card it'd be a credit card number usually it's numbers because if you start to do names and stuff my husband's name is antonio cabral which is pretty much portuguese for john smith um so if you tried to look him up you know if he gets he got stopped for a speeding ticket right they would like if they looked up antonio cabral they'd find a million people around where we live because there's a big portuguese population including like his dad who has the same name and the same middle name right so it's not a really good unique identifier um but his driver's license number is unique a unique key is like a primary key but it's not the the first identifying one it's also another way to identify um identify it something so you might have something that's a primary key that just has a number but then there might be some set of like four or five values that make it up unique as well you know some kind of you know sales order and time stamp or something for example uh foreign keys uh we'll talk about a little bit later in detail but basically they're kind of a parent-child relationship um so you could say uh in one table let's say you have a customer you don't want to keep duplicating the information you would say well this customer id refers over to the customers table and it has to be in the customer table otherwise you can't make a sale to that customer and everything else other than keys is an implementation detail so i have a little rant sometimes where people you know interchange key and index don't worry about it it doesn't matter right use key use index people are going to know what you mean but if you want to know the technical terminology right a key means that it's a constraint on the data an index doesn't constrain your data you can put anything you want an index but if you have a primary key or a unique key you can you can only have that piece of data once right so in your in your table if you have a primary key on social security number you can only put one row with that social security number same with unique with foreign it has to refer to something in another table if it's not in that other table you cannot put it in your child table so it actually constrains your data it limits it indexes don't limit your data so what's the plural index anyone i heard indices any any of anybody want to go with anything other than indices okay no it's indexes actually it's actually literally indexes and it's funny because i used to work uh like way back in college i worked in the library and in in a library system it's officially indexes as well so forget your latin it's indexes deal with it okay is it linux is it linux whatever just is it mysql or is it my sql you know what it doesn't really matter some people will say well technically it's my sql because my sql that's really the microsoft sql server wanted people to call it sql and blah blah blah and you know what i say to that is that if you look if you really want to be tacular it's me sql because it's named you know monty dennis named after his daughter and his daughter name isn't is it my it's me so if you really want to get technical don't be that guy so this is actually a cute little sheldon comic um you know where some geek comes over me like i think you mean database like nope i have three datum bases right so don't always assume that uh that you know what you're talking about when it comes to plurals or whatever like really it's not worth a fight um so let's talk a little bit about more lingo what is a simple index or a simple key a simple index is one that has one field in it that's it right so an index on last name would be a simple index compare and contrast with composite a composite index has more than one field right so if you have an index on last name comma first name that's a composite index composite indexes are totally loud they're allowed in in primary keys and unique keys and i think there are also a lot of foreign keys i haven't done much with foreign keys and composite because it gets a little complicated um so let's talk a little about data structures in mysql the data files are b trees in mysql right unless you're using a memory table and then it's you know hash so by default indexes also in mysql are b trees for enodb and for my isom which are the two main main used ones um i'll go i'll do a little bit about b trees it can also be hash for a memory table and you can also specify hash specifically you have to specifically say when you create the index you want to be a hash for you know db so this is what a b2 looks like thanks wikipedia for the image um and what you have here is you basically have this this tree and you can see that going two nodes deep you can find any uh any number that's in this tree right we don't have all the numbers because these these might represent rows in our table or rows in our index right so we've 1 2 5 6 7 9 12 16 18 21 and how it works is you know if you want the number 12 you would go to uh to here you'd say okay is it you start at seven say is it greater than seven or less than seven if it's less than seven you go here if it's greater than seven you go here um if it's greater than 16 you would go here right so anything less than seven is here greater than 16 is here between 7 and 16 is here and you can have more than two two rows or whatever this gives you an o of n implementation right with with maximum two lookups right where you look this up and then you go here you can find any node in this tree so what are b trees really really good for b trees are excellent for range search okay and somebody asked last time about hierarchical data and there's this model the model of nested uh hierarchical data actually kind of uses this kind of concept so if you want to search between five and ten you find five and you just walk the tree right so you start at five so you have to find five so you go to seven it's less than seven oh i found five or i found you know where five should be let's say it was four you would say oh it has to be here and then you just walk the tree right so five and then you go next the next one right because it's all pointers actually so this points to six and this points to seven and this points to nine and this point is twelve oh twelve is too high that's it so it's really really easy it doesn't have to look up five six seven eight nine and ten that would be five different look ups it only has to look up one thing and then stop when it gets bigger so ranges are great for b trees um and by the way because of the same idea mysql was built on uh magnetic disks or as i like to call them spinning disks right not flash right so flash and b trees don't go as well together as spinning discs and b trees because you're talking about like where the head goes and so when you're talking about ranges right if your tables are if your rows are in order on disk and you're just going one to another to another you're minimizing your seek time uh really good for an equality match right you just look it up and you get it and you go um they're actually pretty good for a few equality matches so if you want to get 5 10 and 11 now what my score will usually do is it'll change that to a range search so we'll look for everything between 5 and 11 and just filter out what's not 5 10 and 11. um composite indexes so let's say even index on last name first name how does that work um if you think about it when you had your dictionary back in the day it was really easy to find the words beginning with s or s u right surfing um really hard to find all of the words ending in g there's no index for that so remember i said you could think of my scale indexes like the dictionary index that's kind of how it works it's really easy to find use find words beginning with say the letter g so if you want to find all last names right that begin with g you could do that if you wanted to find all first names that begin with g well that's farther into the index that's like finding all the words that have a middle letter of something it's this index won't help you do that and that's that's how you kind of have to think of the indexes in mysql and so you can't use you can't um find words ending with g so you can't find words with the first name because it's not the first thing in the index right um but it's okay if you want because you might say well what if i want to look up a last name and a first name my last name is cabral which as i said before is like the smith of the portuguese world right so people oh your last name is cabral do you know cabral and like that is literally john carroll and you know i'm like yeah i have six cousins named john cabral and there's probably a million other people um so you might say well what if i want to optimize on that last name first name search but i also want to search on first name because your first name is shiri and that's pretty uncommon um it's like me and sherry rapport who was on nypd blue and csi and whatever and and that's it and i own cherry.com because it's you know unique and whatever um the way that i specifically that i spell it so you might want to say well i don't want to have to bother with cabral i just want to look up sheary well you might say i want an index in my table on last name and first name but also just on first name so i could look up either by last name or by first name or by last name and first name and like i said it's like a dictionary index so what does mysql use indexes for when would you use it why why is it useful to have indexes at all right so you'd use it when you're matching a where clause this is called filtering you're filtering stuff out okay because if you're filling stuff out you don't want to have to do like a full table scan and be like okay now let me start throwing stuff away eliminating rows resolving min or maximum values so if you want to get the maximum value here you just go to the right of the index you can go to the right of the tree but the tree might be bigger so it's easier to go to the right of the index i'm eliminating sorting right so here it's in an order right this index if this isn't your primary key that's not how it looks like on your data b tree so this is how it looks like in your index b-tree so it's already sorted in order um you can also help with grouping if you're grouping stuff together having an index can kind of help if you're having an index on that grouping grouped field because they're already grouped together in order right all the words that begin with a are already grouped at the beginning of the index and then if you have a query that can be resolved by only looking at the index that's called a covering index and that's also pretty good because it doesn't have to go to the to the um the data itself it just your query can completely be resolved in the index so mysql what does mysql not use indexes for and this is really great to review when you have that kind of like okay i didn't explain how come it's not doing what i think it's doing it should be due it should do this and it's not why not well it ignores indexes for functions so for example if you have a timestamp column and you want to find all the sales you made on a certain day you might say well let me find everything where the date of the timestamp column is you know august 11 2012. i want to find all the sales then well my school doesn't do that because it doesn't really know what date is doing right you really want to use where the where the ts column is between midnight on the 11th and 11 59 p.m on the 11th 11 59 59 or 59 seconds whatever that's what you want to use it ignores indexes for joins if the fields aren't similar and what do i mean by the fields aren't similar i mean if they're not the same data type okay now mysql is very helpful um you know somebody asked last session what's the difference between myspal and postgres and the answer was like my skill is easy to use and it really is user-friendly right you can read like a a three-page web tutorial like literally three screens full and be up and running with mysql with the oracle database you need like a week-long class just to learn how to log in or at least i did um you know it's not it's not so easy you know if you don't know how to query how do you find out what your databases are in oracle you have to learn how to do a select query and get from the metadata in mysql you do show databases it's that easy you don't need to know sql the problem is that mysql tries to make things a little too easy sometimes so if you wanted to join two fields and one was a date and one was a timestamp it would just convert that date to a timestamp it would say oh you said august 11th and you're comparing it to august 11th at 10 a.m well what you really meant was august 11th at midnight i'll i'll forgive you but it doesn't really tell you it's doing that implicit cast but that's just like doing a function so for example if you say a date column is equal to a time stamp it does an explicit it does an implicit conversion and that's just like doing a function what else does it ignore indexes for if you have a query with multiple where clauses it's really one where clause but a lot of joining um so multiple clauses not all using the same index is joined by or i'll give you an example let's say we have a test table with an index on last name first name we've done that a lot so here's your here's your test table so queries that will use this index select star from test for last name equals cabral select star from test where last name is kabral and first name is shiri because again i will look at the last name it will find cabral and then it will go to the s's of cabral so you'll find seraphina cabral and chirica brawl which you won't find antonella cabral because it begins with an a select star from tests where last name goes brawl and first name is tony or first name is antonio because again it goes to cabral and then it will find you know things beginning with t o and y and things also finding you know it'll do two kind of range searches there but here's a query that doesn't use the index it's like star from tests where first name equals shiri right we already talked about that that if you don't have the last name you can't do the first name just like trying to find a word where the middle or the end is is a certain letter select star from test where last name equals kabral or first name equals series this is what i was talking about a couple slides ago where i was saying that it's multiple where clauses right it's two different things you're trying to filter out but they don't use the same the same index because first name by itself doesn't use it so last name by itself we'll use that index first name by itself if this said and it would use it but because it says or it won't use it so let's talk a little about composite indexes um let's say you have an index on last name first and middle name okay this is equivalent to having the index on last name first name middle name having index on last name first name right so if you wanted to search on middle name this wouldn't help but if you wanted to search on last name first name and and middle name it would work if you just had the last name and first name it would work awesome so it does this thing called prefix indexing and it's also the same as last name and this is why i said if you really want to look up both on last name and first name where you might have any combination of last name and first name what you might want to do is last name first name as one composite index and then first name as a simple index now some of you might be wondering why do you do your composite indexes last name first name and your simple indexes first name why not have your simple index be last name and your composite be first name last name and the answer is you could totally do that but like your dictionary right if you're trying to find all the words beginning with s that's a lot of words if you're trying to find all the words beginning with q that's fewer words or x a lot fewer words right that section's a lot smaller so what you really want is your most limiting field first because that will really help optimize your queries so uh what else does mysql and indexes for this is this is probably the number one source of why isn't mysql using this index it's there it's got the right fields i know i'm doing the query right i'm using you know have last name and first name and it's an index on last name why isn't using that last name index if you're trying to scan for too much data mysql will just do a full table scan for example if you wanted to look at 7 9 and 12 and that's six lookups right you go to the seven and then let's say you're you're doing a select star you're getting some data that's not in the index right so here's the index and here's the data so you find seven and you say okay i found seven now let me go to the data row for seven and get what i need there then i'm going to nine and i'm getting the data where i need from there and then i'm getting 12 and i'm going to the data row i need from there well that's six lookups right you go to the index for seven the data for seven the index for nine the data for nine the index for twelve the data for twelve versus you just start at the data tree forget the index tree and just do you know one to two to five to six to seven oh seven is what i want okay to nine oh nine's what i want twelve okay great 16 18 21 right it's a whole lot easier to walk that 10 note tree than it would be to do those six lookups because you're going back and forth between the data and indexes which is why the spinning test right the comment is so this is why the spinning disc impacts it as well exactly um so what is too much data too much data is about 15 to 25 so think about that if you have a field and you have um you know a yes no question it's about 50 50 yes no um you know did you like shari's presentation would be 50 50 right it wouldn't be a hundred zero i i would hope it would be you know better but if you have a yes no question it's about 50 50 and you say okay well i'm searching for everyone who said yes and it doesn't use an index well but i have an index on that field well that's why because it's easier to actually do a full table scan so here's the thing mysql usually does what makes the most sense so if you're if you're like hey how come it's not using what i think it should be chances are it's your expectations that aren't following like it's not that my skill is doing something wrong although sometimes it is but usually it's you have to recheck your expectations are you looking for the needle in the haystack or you're looking for the whole haystack right where the whole haystack is 15 to 25 or more um so it can be it can be difficult to really really optimize your indexes um and you may you may just say hey you know what the best index is really no index because it's not going to use it anyway so why bother having the overhead of every time i do a write and insert an update delete to have to update not just the data but the index when it's not even going to use the index um and i didn't even talk about duplicate indexes right so if you had an index on last name first name and then also an index on last name you might say what's the harm it'll only use one of them but the last name first might name one pretty much contains the last name one so why would you do it you can actually you can actually define more than one of the same exact index in mysql it lets you do it why because mine feels very friendly it's very wasteful to have two of the same index because you're not getting any extra power out of it you're just getting extra writes because mysql has to update all of the indexes when it does it right so how is that 15 to 25 calculated okay you and i might know that hey my data is exactly 10 every there's 10 values it's 10 is everything how come it's not doing what i think it should be doing while it's calculated by metadata yay metadata come on data geeks you know you love metadata and here's why you love metadata because it's what's used in these kinds of calculations right so in my isom the metadata is exact and this is why if you've heard that my item isn't so good for transactions or whatever what happens is when you do it right to my isom it updates information it could update a table checksum if you turn that on right which is actually pretty cool because if it checks on the table and comes back right away because it's already metadata but like number of rows in my isom the size of the data in my isom and a lot of the index metadata is exact so you know it's going to do the right thing with the query in fact my iso was designed years and years ago for what we might call now data warehousing it was really designed for reporting queries so that's why it does this right because when you're doing a whole bunch of analysis and reporting queries you want mysql the optimizer to make the best decisions it does about queries right but a lot of us are doing transactional stuff where we just we want the sale to go through we don't want we don't care about exact or not inexact and in enodb you have approximate metadata so if you look at the number of rows in a table in mysql by looking at the metadata you do a show table status it's not necessarily 100 exact if you do a count star you'll find a different number count star takes more time in my icing count star just brings back the metadata because it knows it's the same so about indexes mysql has this concept called a value group explain what that is in about a second um and it uses the average value group size to make determinations about what it should use in indexes so your average value group size is that 15 to 25 on what it will use and it's used for the approximate rows um that it's going to read for when it does a filter or a join so if you have a table and you're like i want to join on first name where the first name begins with s it's going to kind of use this metadata to figure out well approximately how many things are with s because if every first name is with s i'm not going to use an index i'm going to do a full table scan right if you have an index on first name but if not then it will go to the index so how does it figure out how to do that it uses this average value group size so let's talk about what this actually is so let's say we have a group called body parts two eyes ten fingers is the normative standard and so that's what we'll use so if you have a table of body parts and you say okay sherry has two eyes and she has ten fingers two rows whatever so the average value group size here is six because you have 12 things and two different groups 12 values two groups average value group size is six okay which means on average you have six eyes and six fingers right which is about as good as saying the average person has like point nine testicles and point nine breasts right like the average pre that's not valid for anyone like i don't know anyone with six eyes or six fingers six fingers is probably more possible um but you know it's certainly not the average that you would find among the population right so averages can be a little tricky so this is not a really good optimization for either eyes or fingers right because eyes is way too too way too big for eyes i don't know why i said longer and shorter it's bigger and smaller um but the estimate is is is way off for both so be careful when you're talking about average value group size right if you do um if you do the people in this room you're going to find um the average value group size right there's 50 people in the room and there's uh 40 men and 10 women the average value group size is 25 right for the genders there's two different genders male female um and we'll we'll just go with that reality for now um and if you split it down the middle is 25 but if you searched for the women it would actually be a better to use the index right because there's fewer women but if you search for men it should do a full table scan but in both it will probably do a full table scan because it's saying okay half the data is male half the day is female um enodb does do some approximations too it does some page dives to look at the information and so this is really important when when you're like why is my spell not using my index it could be because of this average value group size problem um a composite index you can think of it like a sorted array when we were talking about uh dictionary first name last name it kind of sourced the last name or last name first name right it's like a sorted array where you have the last names all sorted and the first name is all sorted um it can be ascending or descending you can see right you can read from 1 2 5 6 7 9 12 16 18 21 now imagine this was like one and then over here like you know kind of inside it's got one one one two one five one six right because the first value is one right so all the last names between with a and the first names are different they might be the first name of john right so it's all sorted in either ascending order if you read it this way from left to right or descending order if you read it from left right to left you actually physically cannot write an index in mysql that sorts something where the first thing is sorted in ascending order and the second thing is sorting and descending number um you can't actually define a natives like that it won't work if you do an order buy where you order one thing ascending and one thing descending it won't use the index for both it'll probably use the index for the first part ascending but it can't cannot optimize for that descending part it has to do a manual sort on that because it won't come exactly from that let's talk about nulls so dave in the last talk talk about not using nulls one of the reasons to not use nulls is that you have to do an extra pass through the data when you're doing an index right because nulls are stored differently nulls are not they're not a real value they're this kind of empty random thing and you really have to worry about nulls and equality so why is this so in theory in sql theory and practice null equals x is not true for any value of x okay if you want to test if a value is null you have to do the is null not equals null nothing equals null in fact null does not even equal null you laugh it's going to get a little more complicated in a second so if a referenced value in inequality is null mysql immediately returns false because nothing can equal null so if you say this really complicated query with 100 joins where x equals null it's going to go real fast because because my scale knows nothing can equal null haha i've got you nothing will well no i'm gonna return it fast right away i'm now if you're using sub queries or whatever it might have to actually evaluate the subquery um once for every row in the outer query so it could still be pretty slow but pretty much you know um if you're not getting the right answer this might be one of the reasons you might have done equals null instead of is null so the null safe operator is this thing so if you want to say if something equals if a value equals 10 but if that value is null right it might be false whatever so the null save operator is that there is no null safe inequality operator you would have to do it like this so here this is another reason not to do equality so for example um i just ran into this problem a couple weeks ago i have a uh i have monitoring set up so that we do checksums on the master and they replicate to the slave so you can look on the slave and say does the slave have the same checksum as the master meaning is the data at the same on the slave and the master because the whole point of having a slave you want it to have the exact same data as the master they can get out of sync it can get weird um and so in our monitoring shift we say does the master check some equal the slave check sum and you get problems when uh the master checks somewhere this life checks them as null so we actually have to say like when we say show me all of the um the rows that aren't equal instead of saying where the checksum of the master does not equal right bang equals the checksum of the slave we actually should say this because if the master is null it actually won't return because it's not unequal it's it's weird and we actually ran into a situation where we had a false positive on this monitoring check where we went to the thing and we're like why isn't it paging oh because it's a null and again you have to remember that too much data feature slash problem if you have more than 15 to 45 nulls right you might run into this kind of problem because of that but let's talk about nulls and value groups so who who here thinks that um all the nulls are grouped together in a value group so if you have 10 nulls it's one value group who here thinks it's ten value groups okay whoever thinks it's no value groups okay who has no idea pretty much everyone again here's the cool thing it could be all of those but wait there's more you can actually change it on a storage engine basis okay so if you want all of your there's there's there's variables called you know to be stats method and my isom stats method and these variables are how you control how these work so by default both of them are set to nulls equal and what that means is that all of the nulls form one value group so you have 10 nulls 10 value groups okay i'm sorry one value group right so all the nulls are equal um and and so let's say you have um you know 10 men 10 women and 10 you don't know the answer so you put 10 null um there are actually three value groups men women none no nulls unequal means they're not equal so you have 10 women 10 men and 10 and all you now have 12 value groups okay why is this important it's important because remember your average value group size it's gonna be really skewed one way or another if you think your average value group size is should be three right men women null versus twelve right your average is really way off whether you divide by three or divided by 12. the other one is nulls are ignored you might say you know what i don't care about nulls i'm usually searching for non-null values so what i want is i want for men women and null i want two value groups men and women so you can do that too the default is null is equal so you can change that now unfortunately you can't change this on like a per table base wouldn't be great if you could change it on a per index basis well wouldn't it be great um we can so primary keys are row identifiers i know i'm telling you this now i probably told it to you 20 minutes ago they actually can't be null this is probably why i'm telling you here they cannot be null and one of the reasons is because energy orders these on disk in primary key order so where would you put null at the beginning at the end i don't know so primary keys can't be now but unique keys are also identifiers and they can be null so you need keys you can actually have more than one null value because the nulls don't equal each other both the primary and unique can be composite indexes that's fine foreign keys we talk a little bit about foreign keys fourier keys are a parent-child relationship so for example a customer id you might in order to get a payment somebody has to pay you right you might want to always know who who gave me that payment even if the customer is long gone from your from your company you may want to keep that customer's information around because you you need to keep the payment information so you need to know you know what was that person's address when they made that payment or whatever um you can actually do cascading updates and deletes in mysql um so what you can say is for example if you delete the customer id delete all the payments you know just in case you start needing to start shredding documents you can do this cascading kind of delete um which can be useful if you're if it's something you don't actually need to keep you just delete the customer and all their stuff goes away and then you never have to worry about it again as opposed to if you delete the customer and then the the child tables are still trying to refer to the customer can get a little complicated so um let's talk about foreign keys i'm a database administrator which means people call me up and i have to go to the database and look at stuff when things break so when i see a table like this this customer id and status id it makes me want to cry because i have no idea what customer id 121 and status id one means does that mean they're a free user or paid user disabled i don't have no idea what that means like are they administrative user i don't know i what status number one now maybe if there's only three statuses i kind of know um but you know it can be difficult so um this is one way to do a foreign key right or status id on the left hand table refers to statistic in the right hand table and if it doesn't exist in the right hand table i can't put it in the left hand table so for example if i tried to insert customer id 126 with status id 4 and there was a foreign key it would say nope there's no there's no parent id idea for so you can't do that now what i prefer is something like this right we have a table right oh isn't this so much more readable isn't that great if somebody says okay customer one two one two two uh what's their status and i'd be like oh status 2 right people would be like the customer service guys would be like are you that's not helpful right but here so i would have to do a table join i have to think about and what's the status table and hopefully i named the status table something like status but sometimes it's it's you know labeled like field one you never know um people people are weird with things like that if you do something like this we have a foreign key to a table where it's just characters right it does have to look it up um and stuff so the lookup takes a little longer because you're looking up something that's four bytes instead of one byte right with a number the number could be one byte if you have a tiny end um and this you know is four characters which might be four bytes it might be uh 12 bytes if you're using a three if using etf eight um so it can be you know it's a little long for lookup but you never have to do a join to get the information so it saves a lot when you're reading the information when you're writing the information does a little overhead and so this you know again it's kind of like using your index as well you might say well this index is a little bigger um but it's actually more useful because you're reading more than you're writing question that is great yeah so the question is what's the difference between using that and an enum using an enum is actually easier um so an enum is an enumerated list and with an enum actually you get rid of this status table on the right hand side okay so with an enum i'm putting my hand over the right hand side which you can't see i'm real clever um so with an enum you'd only have the the left side table and status instead of being like a var car 20 or something you would define it as an enumerated list enum and then you give it a list of enum quote free quote paid quote disabled and um so old school my school here's the thing about about you know either being a first an early adopter or something or something is that like you get used to how things used to be and then things change and you're like you forget you don't realize that they've changed because you've always done it this way with an enumerated list it used to be that if you wanted to change the list it was a completely offline operation so if you have a yes or no question and you have or you have the status right and you want to now you have free paid and disabled now you want to add an administrative user you want some of the super user right some some computer some customer service guy go and look at everybody's records because that's their job um you would have to take the entire table offline while it rebuilds that's not the case anymore and i say that's not the case anymore i mean like you know for like five years that hasn't been the case so or maybe three or four but you know certainly if you download a version of mysql right now whether it's five five or five six what you can do is if you wanna add one enumerated item to the end of the list right so when i say enumerated under the hood what this actually does is uh the the free status the first one you put is actually index number one and the second one you put is index number two and this the third one you put is in edge number three um so if you want to add something to the end of the list right which would be index number four um it actually will do that in an online way you don't have to rebuild the whole table the only time you need to rebuild the whole table with an enumerated list is um is if you're changing the order right because you actually have to change the rows when you're just changing whether or not new records have to look up a new value there's no work that needs to be done the existing records are fine um and so the reason that i don't mention enumerated list is that it doesn't really have anything to do with like the foreign keys and this is kind of a section on foreign keys so if you are going to use foreign keys for something like this where it's actually in another table like a status might be in another table because there might be a description or there might be some other things tied to the status maybe there's more attributes that it has then you might want to think about doing kind of human readable stuff in the database prefix indexing in mysql if you have a text field you may not want to index the whole thing do you really want to index the whole a whole url you know that's tons of characters long do you want to index an email address like the whole thing how do you want to do that um well it's for strings um and so what um what mysql has actually limits on how much you can index because otherwise there's only so much you can store in the tree before it has to go to separate data structures on disk right and the more if you have a separate data structure on the disk then it's going to take more time to look it up my scholarship has a limit for tables and the limit on tables for strings for indexes is 767 bytes on eodb and a thousand bytes on my iso i don't know why they're different but there you go now when you say bytes you have to be careful because these are strings and one character may or may not equal one bite if using latin one i think dave said this before if you lose that one one character is one bite and that's the default but if you start to use different characters um you know non-ascii you know you might want utf-8 which is three bytes and i say three bytes and you're like no no sherry you don't know what's strong about utf-8 is four bytes well i know that and you know that um but the developers a long time ago at mysql i decided to make this a little easier because who uses who uses like chinese characters really they're not that many people that use it right um of course not a utf-8 is uh is actually three bytes and if you want the full four bytes you have to go to something like utf-8 utf-16 or utf-32 and yeah it's pain so i have to be aware of the character set is what i was saying because if you have let's say you have your text field i know little html right text input equals you know whatever your text field and you have 100 characters because you want a long url um that might be up to 300 400 or more bytes so you have to be careful you have more than one of these in a table and the good news is that myself is pretty good about the errors if you do an alter table and you try to add an index that's too big it will give you an error that says you can only have up to 767 bytes of an index and you might be like but i'm only using 200 characters what's the problem and then you have to remember its characters versus that full text indexing so i did mention primary foreign keys whatever indexing there's also full text indexing full text indexing doesn't do prefix indexing what it actually does is it takes like all the words and document and says okay if you're looking for the word sherry it's in document one five and seven um and you know everything else doesn't have it um so it actually just indexes the words themselves um it's only for car varchar and text fields and it was my isom only until my spell 5.6 which just came out like three weeks ago as general availability so question is there is there a word length limit yes there are both minimum and maximum word life limits which you can change in mysql the defaults for the minimum word limit is four so if you have a lot of uh t tlas three letter acronyms you may want to change that um the other thing is is that uh mysql as a database system is is pretty good a lot of things there are better ways to do full text searching elastic search you know things like that if you have you know or document stores right they were made for documents um mysql was not made for documents so it has this kind of stuff but it's not the necessarily the best way to do it if you have a blog if you have uh let's say reddit right you're running some kind of a blog or forums where people are searching for words you know when you have heavy full text searching you may want to think about something other than mysql for that group buy and sorting i said before that mysql uses indexes for helping and grouping and sorting by default if you do a group by it also sorts okay that makes sense right you're you have sales by state and you want uh alabama to show first and then alaska that makes sense right but it actually does an extra pass-through to sort the data if you do that by default it does extra work for you personally that wouldn't be my way of designing i wouldn't design it such that it does an extra sort i would design it such that if you really wanted it sorted you would have to say group by x order by x as well mysql does that by default so if you're looking at your explain and you see in your extra field you say you see file sort which by the way doesn't necessarily use a file file sorts the algorithm so if anybody took you know algorithms the bubble sort versus file sorter that's what that means so it's not necessarily using a file it's not the worst thing in the world because you're going to disk um but it might go to a file there's another thing in explain which will tell you if it's going to a file or not but it may cause cause a file store and it is doing extra pass through the data which does make it a little slower so if you don't want that you can use order by null you in your group by queries if you don't care about the order it comes back in maybe you're maybe you don't care about the sorting maybe you're um your application is doing some kind of you know parsing and sorting too you can actually use order by null to do this um and the interesting thing is i probably went 10 years before learning about this okay so the question is if if you want to do a different ordering than the groupings you want to group by state but order by uh the sum of the sales right so you want to know which state's the top producer or something well yeah okay well the question was would it do two sorts and the answer is no it would only do a sore if you specifically did an order by it would only order by that one it wouldn't do the order by the group so knowing all this use explain um and uh if you're not getting the index that you expect or the method you suspect you expect check your expectations frankly i i've banged my head a lot about my scale i'm like all right that's because of average value group size so you do have to check your expectations and say why is mysql doing this now that being said sometimes there are bugs so don't always distrust yourself um in the worst case scenario you make a bug and they say no you know give can you give me some of the sample data or whatever and you give it to them they're like oh no that's actually how it's supposed to work see this manual page that's like the worst case scenario so now people might say well how do i use explain um well i'm glad you asked if you really want there i've done this i've done an explain talk before so bit.ly slash explain video all lowercase all one word right so bit.ly slash explain slides if you just want the slides or if you want the slides to follow along that's why they are uh the slides are a pdf so take it home with you read it on the plane it's you know whatever i don't i don't do slideshare because i like to kind of put everything in a directory that i kind of control over and whatever and also it's it's good for offline viewing does slideshare do offline viewing i don't know and now i'm going to talk a little about index merge because if you do look at the um explain slides i talk about index merge now mysql 5.6 has changed how index merge works so i'm going to tell you about how it works in five five and then tell you how it works in five six so on the surface index merge looks really good so what index merges is you can use more than one index let's say you have an index only on last name simple index and a simple index on first name and you want to search first name and lastly you'd be like oh my skill i can use index merch i can do both of them it can it can do an intersection or a union of them that'd be great that'd be awesome i don't have to worry about it ever again i will index um every field i will just put one index simple index on every field and i'm good and this really works really well in oracle so when i see tables design like this i'm like you're an oracle developer aren't you because oracle actually really optimizes that kind of thing really well you just have to put an oracle the oracle database not oracle the mysql database um so on the surface it looks really good it can just do it um but it uses more than one index um but it didn't really work very well and usually so in the explained talk it will say if you have it it's pretty much an indication that you need if you have an index merge and says oh i'm using um the index on first name and then it's on the last name it's usually an indication that you need to have a composite index of the both of them together and it will actually work better if you do that so it was usually like if you see this you can make it better it's a warning flag but you can make it better by by changing the index um it's actually better in mythkill 5.6 it's no longer like a warning sign now it's actually useful and good um so before mysql 5.6 as i said indicates you can make a better index and it was not always used even when it could have been used um and so if a range scan was possible it would use that instead of of index merging right so even index on on something here and something here um it would it would do a range here and nothing here as opposed to you know merging the two indexes um and it may have merged more indexes than necessary based on the order it was doing it so in five six it's better and it uses it when it should use it and it doesn't merge more indexes than necessary i don't know the implementation details because it's complicated it just has to do with the order in which it actually merged the indexes now someone here asked uh last session um where can i go to find like challenges on how to do stuff like i know how to do queries and i know how to join tables um where can i learn more where can i you know get like hard problems to do and there's actually a site called artfulsoftware.com um that has some really good um you might call it like cookbook recipes and stuff really complicated like how do i know if a certain date is a monday tell me a february 22nd 2213 is going to be what day of the week is it going to be and there's actually some things that you can do in my sql to see that there's also aggregation how do i find you know highest ranked things or whatever and it's the artful software is actually a book about mysql and it's actually a virtual book so it's already updated with mysql 5.6 and everything and you can there's stuff up for free and there's stuff you can buy and this stuff is free um it's really awesome because if you're like i want to do this really complex thing you can kind of look up the recipe so if you want to challenge yourself artfulsoftware.com infotree queries.php all lowercase kind of all one word there's no underscores or anything and it's really really good for those kind of challenges you know certainly feedback whatever the my skill administrative bible is uh it's books about three years old and stuff it's for if you're new to mysql and kind of teaches you concepts and terminology um doesn't like teach you how to query things like that planet mysql is a bunch of blogs and stuff if you're totally new it's kind of like reading a magazine for hobbyists like you might not get everything when you first start reading it that's okay this last link here mysqlmaronite.com i see this book here this o'reilly book learning mysql the butterfly book we're actually taking that book and going through chapter by chapter and learning and this actually will teach you how to install mysql like chapter 2 is installing mysql so we're doing it basically one chapter a week for 12 weeks we're in the middle of it now i think we're going to start a new one up soon so if you want to go to mysqlmarinade.com it'll just redirect you to a meetup group if you want to join the group you'll learn when we're doing the net when we're starting the next one um and it's actually really awesome if you if you really want to learn like what's the left join what's the right join that's the chapter we're on this week so um so it's really good to uh to do it and you can do it at your own pace if you want to just do it and start it now i'm not going to be like no you can't do it now you know the homework is submitted by github and there's instructions on how to use that too so that you can really kind of learn how to how to really it's i don't know i think it's great um and it's a great learning resource for you guys i think um and i think that's all i have yep that's all i have any questions or feedback i have a question sure you you talked about the index utilization of your choice right you said that columns have a similar enough for the index to be utilized right okay so the question is i said before that have to be similar enough so what's similar enough is a tiny and similar to a long end is a um is a var car similar to a car like of our car 64 to a car 20. now the rule of thumb is um well first of all you could easily test it right make a table give it about maybe a thousand rows and take one of the rows and see if it uses it you know do an index where one of the rows is unique see if he uses that index so that's that's an easy way to do and i really advocate testing it because you never know what version is going to change something so i hate to say like here's how it is but the rule of thumb that i use is is it does it have to do a conversion right so for a date to a date time those are different data types right card you know var car to car it can compare they're not really different data types right um but you know something like an inch to a decimal is different and i don't know about an int to a long end if you're comparing to like if you have a tiny inch and a big end that's actually what's called a long end if you have a tiny big end and you're comparing the two um if they're both under two under 128 or something would it would it compare and that's a great question and i don't know the answers but i would just say do you know get some sample data and test it yourself because that's that that will be the right answer for your version all right awesome thanks guys

Info

Channel: tcation

Views: 5,551

Rating: 5 out of 5

Keywords: mysql, index, primary key, keys, unique key, foreign key, average value group, metadata, performance, scale, sheeri, sheeri cabral

Id: mijHsfLxAuA

Channel Id: undefined

Length: 52min 30sec (3150 seconds)

Published: Thu Aug 22 2013