Statistics Lecture 3.4: Finding Z-Score, Percentiles and Quartiles, and Comparing Standard Deviation

Captions
So yesterday we talked about how we can't really compare standard deviations themselves to determine whether one set has more variation than another set, and I think the last thing I gave you was the coefficient of variation. That's one way to do it. It's not a very practical way, because we don't use it for anything else; it basically just changes the standard deviation into a percentage of the mean and says how much your standard deviation varies compared with another set.

What we're going to do today builds on the last thing we did on standard deviation, which was when I gave you an example and said, find out what percent of the data falls within these two numbers. Remember, we found the distance and we divided by the standard deviation. That's it. Hey, you are three standard deviations away, therefore 99.7 percent of your data falls within that range. That's what we did last time. That process right there is a very good process; it lets us calculate what's called a z-score. What a z-score will do is allow us to compare two data sets directly to see which one has more variation, and that's an important thing for us to do. So that's our idea for section 3.4.

So we're on 3.4. This is going to be called measures of relative standing, and what we're going to be doing is comparing measures between or within data sets. That's what relative standing means. To do this we calculate what's called the z-score. This is going to be really familiar to you because I already previewed this on Wednesday. Last time we actually calculated the z-score; I just didn't call it a z-score. I said, let's find out how many standard deviations away from the mean we are. That idea is a z-score. So when I ask you on your first test, which I will, "What is a z-score?", you are going to tell me a z-score is the number of standard deviations away from the mean, or more specifically the
number of standard deviations a particular data value is away from the mean. Are you okay with that? Don't say I didn't tell you; it's on the video, so now I can prove it. That's going to be on all your tests. I'm going to ask you what a z-score is, and you go, "It's the number of standard deviations away from the mean a data value is." So I'm going to write it right here: the z-score is the number of standard deviations that a specific data value is away from the mean.

By the way, what letter do we use to represent our data values in this class? n would be the number of data values we have; I'm talking about the particular data values themselves. So it's the number of standard deviations a data value, that's our x's here, is away from the mean.

Now, I can't give you one specific mean, because we can deal with either a sample or a population. In this case, unlike the standard deviation, you calculate the z-score exactly the same way for the sample as for the population. So it's not like you have to do anything different. The standard deviation is what has the major difference: for this class you divide by N for the population standard deviation and by n minus 1 for the sample. You got that? For a z-score it really doesn't make any difference. The number of standard deviations away from the mean gives us the z-score. For our sample and for our population it's going to look a little bit different, but the way you calculate it is identical.

So for our sample we'll have z equals, well, here's how a z-score works. What we did last time is we found the distance between a data value and the mean. How can you find the distance between two numbers? Say it louder: you subtract them. Do you remember doing that? I don't remember the numbers exactly, but we subtracted the large value minus the mean and got that distance. That was 24 or something, that was
from last time. So we're going to subtract: take your data value and subtract your mean. By the way, for the sample the mean is going to be x-bar. In our example from last time, well, let's do that again so we can see it. Can you tell me what the range was? 58 was the upper value and 10 was the lower; 24 was the distance on each side, and your two numbers here were 34 and 58, with 34 the mean.

When I gave you this example last time we had a range of numbers from 10 to 58, and the question I asked you (this will answer your question from last time as well) was: how do you find out what percentage of data falls within that range? The first thing you have to do is consider your mean and your standard deviation. Your mean was 34, and your standard deviation was how much? Eight, okay. We said we want to find out how many standard deviations fit in this range, and how many standard deviations fit in this range. There's an easy way to do that. Instead of just adding, going 8 plus 8 plus 8, which you don't want to do because it takes too much time and you're not going to be accurate if it doesn't go in exactly. What if the value had been 59? You'd do 8 plus 8 plus 8 plus 1, and that doesn't really work out well; we want at least a decimal.

So instead of doing that, we go, all right, let's figure out how far this is from that. How do you figure out how far 34 is from 58? That's what we're doing here: we're taking the x value, which in this case is 58, and subtracting our mean, which in this case is 34. That gives us 24. That's the distance between those two numbers. Now, how do you calculate how many standard deviations fit in there? That's a division problem. This is the distance, and the standard deviation is 8. How many times does 8 go into 24? That gives me the number of standard deviations it is away. How
many times does that fit in here? So we're going to divide by what, do you suppose? Not 3, because 3 is just the answer in this specific case. The standard deviation, that's right. So we divide by 8, and we get 3. So here we'll divide by our standard deviation. Again we'll use the letter s because we have a sample, but you calculate the population version exactly the same way. You'll still have a z, and you'll still have an x, because the data value doesn't change its variable. But instead of x-bar, what are you going to have for a population? Mu, that's right. And instead of s, what are you going to have? Sigma, a lowercase sigma.

So we take our distance, that's x minus x-bar, and we divide by the standard deviation. In this case it was 8. I think I said 3 earlier, but I meant 8. You divide by 8 because that tells you how many standard deviations fit in that range of numbers. So we divide by the standard deviation in each case.

Now notice, with a z-score, does it have to go in exactly for you to get an answer? No. What if this had been 59? You'd get 25 here. It doesn't go in evenly, but you will get the decimal, and that's nice; we can use that. We can't really use "it adds up three times and a bit"; that doesn't help us statistically.

So this z-score idea, we've actually already done. I know we did this last time. It's just knowing the name for it: it's called the z-score, and it gives you the number of standard deviations that a data value is away from the mean. You do it the same way for a population and a sample, just with different letters. What this does is allow you to really easily compare the variation of two samples or two different populations. So this allows a comparison of the variation in two different samples or two different populations, and
instead of two different samples or two different populations, does that mean you have to compare a population to a population, or can you compare a population to a sample? Yeah, you can, if you know the z-scores for the sample and for the population. It would be interesting: let's say Merced College was the population and this class was a sample. It would be very interesting to consider the z-score of you guys for some characteristic and the z-score for the population, if you had all that data, and see if they were the same or not. If you were a random sample it should be pretty close, right? We'll talk about sample size later on, but if our sample size is big enough we'd hopefully get that. It's not going to be exact, but it should be pretty close. We'll also do some other things statistically that will allow us to make that transition between samples and populations more easily, where we can say, okay, even though we have a sample, that sample represents the population very well. So I spoke a long time to answer the question, but yes, you can. We'll do a lot more with that later.

Let me give you an example of doing this. We're going to compare the heights of two people. Now, this data is old. You know the Miami Heat, right? I don't like basketball, but I know a little bit about it because I used to play, until my coach cut me from the school team in sixth grade. The man goes, "You got a lot of heart, kid, but man, you suck." Sixth grade. Anyway. So we're going to compare the heights of two different people, though we're not going to compare the heights directly. We're going to compare the heights of Shaquille O'Neal, you know who he is? Yes. He played
for the Heat, he did, unless I'm mistaken. Who does he play for now? Boston? You seriously know all about this; that's pretty good. But he used to play for the Heat for sure, right? Okay, good. And I know this one's true: Lyndon B. Johnson used to be a president. So we're going to compare the heights of those two people.

Now, if you compared the heights of Lyndon B. Johnson and Shaquille O'Neal, you can definitely say that Shaquille O'Neal is taller, right? He was 85 inches tall; that's pretty tall. Lyndon B. Johnson was only 75 inches tall. So clearly Shaquille O'Neal was taller than him. But what I want to do is compare them in relation to their respective populations. We're going to use populations because we're considering all the players on the Miami Heat during that year as a population, and all the presidents of the United States throughout history. So we'll be considering the entire group of people. We're not sampling; we're taking all of the Miami players and all of the presidents of the United States. So we'll be using this one, the population formula. Are you with me on that?

So we're going to compare these two guys. Lyndon B. Johnson, let's say LBJ, 75 inches tall. How tall is that in feet? Because we don't say "I'm 64 inches tall"; we say "I'm six-one" or "six-four". Let me change that to 76: he was six foot four, the same height as Abraham Lincoln. They were equal; did you know that? Now you know. I found that out today because I wanted to make sure my stats were right, so I looked it up on Google. I'm sure they're right; they always are. So anyway, 76 inches is six foot four. Abraham Lincoln, six foot four. We'll stick with LBJ because he's more recent, but you could change that to Lincoln if you want; same idea. So he was 76 inches tall. That would be six foot
four: you divide by 12, and the remainder is your inches. So we've got that. Now the mean for presidents: you take all the presidents' heights over history, add them all up, and divide by the number of presidents. How many are there? More than four, right? Maybe 40, 50 something? It's horrible that I don't know this, but it's a number, right? I really don't know exactly how many there are; we're close to 50. The mean height for presidents is 71.5 inches. So on average they're a little under six feet tall, about five-eleven and a half. Some are shorter, some are taller. The shortest was five-four; that was Madison, I'm pretty sure. I just looked it up. So on average they're almost six feet, and the standard deviation for these presidents is 2.1.

That's all the information we need to calculate what's called the z-score, to calculate how far Lyndon B. Johnson is from the presidents' average height. That's what we're going to do, because we want to figure out: is this rare, or is this normal? Is it usual? Then we can compare his z-score, how far away he is, to Shaq's, and see which one is relatively taller.

Clearly Shaquille O'Neal is going to be absolutely taller. Maybe you haven't heard of absolute and relative. Absolute means considering all values: the absolute max is just the biggest, and the absolute minimum is just the smallest thing that happens. A relative comparison looks within a smaller group. So relatively, we're going to compare each person to his respective mean: for LBJ, the other presidents, and for Shaquille O'Neal, the other Miami Heat players. Are you with me on this? So absolutely, Shaquille O'Neal is taller, but relatively
speaking, compared to their own small groups, who's taller? We can do that with a z-score. Get your calculator; I drop mine on the ground at least twice a week. Okay, Shaq. You seem to know his height. It says 85 inches, right? Oh, he was seven-one? Then let's make it 86: seven-two, for sure, is 86 inches. Now, we're considering old data here, so maybe he's shrunk; I've already shrunk like an inch. The camera takes away like four inches too, so really I'm like six-five. Anyway, Shaq's height is 86; we're considering old data.

So we go to the Miami Heat. I don't know if it's accurate or not, but let's say the mean for the Heat (I think they had some relatively tall players at one point; they might have been the tallest basketball team at one time, which I think is where this came from) was 80 inches, and they had a standard deviation of 3.3 inches. The question is: who is relatively taller? Of course, as we just said, Shaquille O'Neal is absolutely taller: he's 86 inches compared to 76 inches, for sure. But relatively, compared to their own individual populations, who's taller?

By the way, I said this earlier, but are we going to be using the z-score for samples or for populations here? Populations, because we're considering all the presidents and all the Miami Heat players. That's why we're using populations. So we're going to use the z-score for populations. I'm going to help you out with this first z-score, and then you can do the next one on your own. It's really not a hard calculation; it's just a subtraction and then a division problem.

Here's what you need to make sure of. I hope you're listening, because even though it's easy, you can make mistakes on it. Two things. First thing: make sure you're always doing this in the correct order. You will get negatives here in some cases. The reason why you would
get negatives is if you have a value that's less than the mean. In our case we have two values that are greater than the mean. You see what I'm talking about? But you must do it in the correct order. It is never the mean minus the x value; it's always the value minus the mean. That's going to give you a positive if the value is greater than the mean and a negative if it's less than the mean. If you do it the other way, it's reversed, and that's not going to work for you. And you can't switch it back and forth and go, "I always get a positive z-score, yay." That doesn't work. So we always stick with the same routine: you take the data value itself minus the mean.

Second thing: try not to round that number too much, because remember you're going to have to divide, and you'll probably round after that. If you round twice, and you rounded too much the first time, it's just going to be a bigger problem.

So here's how you calculate the z-score for LBJ. The z-score says you take x minus mu, over sigma. In our case, can you please tell me what the x is? 76, absolutely; that's the data value, that's his height. Minus the mean for all the presidents, which in this case is 71.5. And lastly we divide by the standard deviation for those presidents, and that's given to you also: 2.1. So just like order of operations told you a long time ago, you do the subtraction problem first, which isn't a very hard one, and after you get that answer you divide by 2.1. You can round it to two decimal places, I would say, because you're given some decimals here; that's the rounding rule we'll use most of the time. So we subtract and we get how much? 4.5, sounds good. Then 4.5 divided by 2.1, and we get? Great, we got a number: 2.14. This is useless if you don't know what that
number means. Okay, we got 2.14, great, but I don't want you to skip through this class just using numbers; I want you to understand what we just calculated. Anyone have any idea what we just calculated? The distance away from the mean in standard deviations. That's exactly what we did. We were not a whole number of standard deviations away. Look at it this way: if you added 2.1 to 71.5, and you did it 2.14 times, you'd get to 76. So we're 2.14 standard deviations away from the mean. It's like one, two, and then a little bit more.

It's all going to be there. All you have to do is be able to calculate the z-score, interpret it, tell me what it means, and then use it. That's all I want. I want you to understand it, but it's not hard math-wise. Statistics is really not hard math-wise. You have a calculator, and all this is is punching numbers into formulas. I give you the formulas; you have a formula sheet for the test. But you have to understand it, because if you don't understand it you will not be able to apply it to the problems.

Do you have your book? There's a pull-out sheet in your book. It looks like this. I feel like I'm doing show and tell. You may use this on your tests, and nothing else. If you need extra tables, I will give you a printout of extra tables attached to your test. Do not write on this, because I will not let you use it if you write on it. I will check this every time we have a test. With no writing on it, you can use this. It has all the formulas you will ever need; if there's more than that, I'll write it on the board. Tables are on the back. We'll be using these funny-looking tables. Aren't you excited for school? Deep down
I'm excited. So that's what you can use. Okay, back to the math stuff. We've calculated a z-score. Again, what does the z-score mean, everybody? It's the number of standard deviations a data value is away from the mean. It's exactly that, every single time. So when you calculate 2.14, it says that data value is 2.14 standard deviations away from the mean. That tells you how far it is; it tells you it's varying that much from the mean. By the way, using the rule of thumb, within two standard deviations was considered usual. Is this usual or unusual? It's considered unusual, because he's outside of the two.

Go ahead and do Shaq's height. "Do the Shaq" sounds like a dance move. I do have to do a choreographed dance for the Relay for Life someday, so maybe I'll put the Shaq in the mix. Okay, go ahead and do that.

Did you do Shaq's height? It'd be cool to have a name like "the Shaq", or a single name like Seal. Anyway, so we calculate this one. What's our x in this case? 86, minus our mean, 80. Do you notice how we're using a different mean, not the same as that one? Because he wasn't a president, that I know of. We have to use the appropriate data, because you're talking about a different population now. So we have completely different values, because again you're basing it on his own group. That's why these are measures of relative standing: how he relates to his own group. Divided by 3.3. So our z-score here is 6 divided by 3.3. What is that? Is that a 2 at the end? 1.82? Got it, good. I didn't do this math; I'm trusting you. What a mistake. Just kidding. Okay, so now we can answer the question: who is taller, Shaq? Who is
relatively taller? Look at the z-scores. Compared to their own populations, LBJ was relatively taller than Shaq, because Shaq, when compared to the Miami Heat players, wasn't that tall at all; all of them were very tall guys. So he's only 1.82 standard deviations away from the mean. He's taller than the mean, right, because it's positive. I hope you're catching this; I'm saying a lot of important things here. It's positive, which means he was taller than the average. So a positive z-score means you're taller than average, but he wasn't as much taller than his players as LBJ was than his presidents. They're both taller than average; LBJ was further above average, relatively speaking. How many understood that idea? Good for you.

Now I'm going to answer a question: how do you tell what is usual and what is unusual for z-scores? Well, think about it. We're really not doing anything different from what we did on Wednesday, when we found out how many standard deviations were in between two numbers. We're just doing it data piece by data piece. Do you remember the empirical rule? I hope so; I'm not going to draw it again. The empirical rule said what percentage of data fell within one standard deviation, then within two, then within three. Within one standard deviation you get 68 percent, within two standard deviations you get 95, and within three standard deviations you get 99.7.

What does the z-score give you again? Listen carefully, please. If the z-score is 1, that means you're one standard deviation away. If it's negative 1, you're one standard deviation away in the other direction. What that means is if you are between a z-score of negative 1 and positive 1, you're within one standard deviation of the mean. Do you follow me on that? That means 68 percent of your data is going to be between a z-score of negative 1 and a z-score of 1. It's the same information.
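The two calculations above can be sketched in a few lines of Python. This is only an illustration using the numbers quoted in the lecture (76 and 86 inches, means of 71.5 and 80, standard deviations of 2.1 and 3.3); the function name `z_score` is made up for the example and is not from the course materials.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean.

    Always data value minus mean, so values below the mean come out negative.
    """
    return (x - mean) / std_dev


# LBJ compared with all presidents; Shaq compared with the Miami Heat roster.
lbj = z_score(76, 71.5, 2.1)
shaq = z_score(86, 80, 3.3)

print(f"LBJ:  {lbj:.2f}")   # 2.14
print(f"Shaq: {shaq:.2f}")  # 1.82
print("Relatively taller:", "LBJ" if lbj > shaq else "Shaq")
```

Shaq is taller in inches, but LBJ has the larger z-score, so LBJ stands further above his own group.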
It's just given in a slightly different way. By the way, what's the z-score if I give you the value of the mean itself? Look at this formula. What if LBJ had been 71.5? You'd get 71.5 minus 71.5. Are you with me on that? What would you get there? Zero. So the z-score at the mean is zero. In the middle of our data, at the mean, we have zero. If you go over one standard deviation, that's a z-score of 1 or negative 1. If you go over two standard deviations, that's a z-score of 2 or negative 2, and for three standard deviations, of course, we get 3 and negative 3. This looks kind of familiar, I hope. What percentage of data falls within this range? 68 percent is between z-scores of negative 1 and 1, 95 percent is between z-scores of negative 2 and 2, and 99.7 percent is between negative 3 and 3.

What was considered usual? The 95 percent range was considered usual. So this range in here is usual. What that implies, because it's the same information, is that any z-score (listen carefully) between negative 2 and 2 is going to be considered usual. Any z-score outside of that range would be considered unusual. It's the rule of thumb, just applied to z-scores. We haven't changed the rule; we just now know the word z-score. Are you seeing the crossover there? So I'll write that out for you: a z-score between negative 2 and 2 is considered usual. A z-score outside of the range of negative 2 to 2 is unusual. I don't know if I need to write the next statement, because it's kind of just a corollary of this: if it's between negative 2 and 2, it's usual; outside of negative 2 and 2 would be unusual. Unusual is to be less than negative 2 or greater than 2. That's outside that range of numbers right
there. So since our z-score tells us how many standard deviations away from the mean we are, and that's all it does, it tells us how usual or unusual a data value is. That's what the z-score is used for. So let's take a look at Shaq and LBJ again. Would you say that Shaq has a usual height or an unusual height? Why would you say usual? Yeah, it's less than 2, so usual. Is it at the close end of being unusual? It's fairly close; it's close to 2, so he's getting up there, but it's still considered usual. How about LBJ? He crosses over, not by much though; he's not that unusual. Let's say we had a z-score of 4. Usual or unusual? Very unusual; that's way out there. Even 3 would be very unusual. Past 3, at 4, oh my gosh, he'd be like a giant. It'd be like Shaq being president. That would be very unusual, height-wise. I mean, clearly he could be president, he's qualified, I'm joking, but height-wise he would be very tall for a president. That's the idea. So because the z-score tells us the number of standard deviations away from the mean, it can also tell us how rare a piece of data is. We're going to use this to our advantage later on to determine whether or not hypotheses are true or false, just using this information. It's kind of nice. We'll get there.

Let's do one more. What if LBJ, is he dead? He's probably dead, right? Suppose he were resurrected and played for the Miami Heat. Using statistics, would it be weird, height-wise, if he played for the Miami Heat? It's probably weird anyway, because presidents really don't play professionally. So let's find out whether LBJ playing for the Heat would be usual or unusual. Make sure you do this right. Because you're putting LBJ on the Heat's team, what's your x value? Great.
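The rule of thumb described above (a z-score between negative 2 and 2 is usual, anything outside is unusual) can be written as a small Python check. The sketch below also recomputes the 68, 95, and 99.7 percent figures, under the added assumption that the data follow a bell-shaped normal distribution; the lecture quotes the percentages without naming a distribution. The names `coverage_within` and `classify` are invented for this illustration.

```python
import math


def coverage_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean.

    Assumes normally distributed data (an assumption of this sketch).
    """
    return math.erf(k / math.sqrt(2))


def classify(z, cutoff=2.0):
    """Rule of thumb: usual if the z-score is between -cutoff and +cutoff."""
    return "usual" if abs(z) <= cutoff else "unusual"


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {coverage_within(k):.1%}")

# z-scores discussed so far: the mean itself, Shaq, LBJ, and a hypothetical 4.
for z in (0.0, 1.82, 2.14, 4.0):
    print(f"z = {z:5.2f} -> {classify(z)}")
```

Using the absolute value means the same test handles negative z-scores, so a value far below the mean is flagged just like one far above it.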
That's still his height; that doesn't change. But what mean should you be using, now that he's on the Heat? We're using the mean for the Heat. So we'll have 76, that's him, minus 80, since he's now on that team. I'm not going to be able to calculate the new average with him on it, because clearly I don't have that information, so we're going to stick with the same average. Divided by the same standard deviation of 3.3. Has anyone done that already? What do you get? Negative. What does negative mean again? It means he's below; he's less than the mean. Notice how you have to do this in this order; otherwise you'd get a positive, implying LBJ would be taller than average, but he's not, for the Heat. So here we get negative 1.21.

My question is: would it be usual or unusual for someone of LBJ's height to play for the Miami Heat? Usual; it's within that range of 2. So looking back at this: it would be unusual to have a president that tall, since that's greater than 2. It would be usual for someone of Shaq's height to play for the Miami Heat. And it would be usual, common, or at least not rare enough to call unusual, for someone of LBJ's height to be playing for the Miami Heat. How about Mr.
Leonard's height? I shrunk, I swear I did. I used to be right at six feet; now I'm just a little short of it. I do my hair like this so it looks taller. It's just a lie to make me feel good. Let's do 72. So 72 minus, what is that, 80, over 3.3. Do you suppose this is going to come out positive or negative? I'm less than the average; not below average, but less than the average height, so I'm going to have a negative z-score here. And remember, you have to do this in the same order, x minus the mean, all the time. Populations or samples, it doesn't matter: take your data value minus the mean for the population you're dealing with. So here we get negative 8 divided by 3.3. Did you get that as well? Negative 8 divided by 3.3, and you get negative 2.42. You're all together on that? Pretty good. Negative 2.42. Am I usual or unusual? Unusual. My hopes are dead, dang it.

Now, this doesn't mean it can't happen, does it? Height-wise, it just means it's going to be a little more rare than you'd typically find. Take that guy Muggsy Bogues. Have you heard of him? How small was he? He played NBA basketball, and he was so short he could practically run between people's legs. Seriously. Maybe not between the legs, but you know, stuff like that. He was really, really short. So it's not saying it doesn't happen; it's just saying it's a rare case. The further you go, standard-deviation-wise, away from the mean, the more rare a piece of data is. Are you getting the concept? What that means for z-scores is: the larger, absolute-value-wise, a z-score is (that means as you go more negative also), the more rare your data value is. So write that one little statement and we'll call it a day: the larger
the z-score, and we're talking in terms of absolute value, meaning just distance away from the mean, so a negative 4 is, absolute-value-wise, a 4, and that would be a large z-score. The larger the z-score in terms of absolute value, the rarer the piece of data. Do you know how to calculate a z-score? Do you know the relationship between the rareness of data and the z-score? The bigger the z-score, rarer or less rare? Rarer. Okay, good for today.

Okay, so continuing on with 3.4. We already concluded the z-score talk, and we have no questions on that, right? So we are going to talk about quartiles and percentiles. They're very similar; I'll show you the similarities as we go through this. First let's talk about quartiles. The key part of "quartiles" is "quart": quarter. A quart is a quarter of a gallon, right? So that's the key word, that quarter idea. What quartiles do is really just break your data up into quarters. That's all. It's very similar to the idea of a median. A median breaks up your data right in the middle. The quartiles split it up at every quarter. Same basic idea; there are just three of them instead of one. I'll talk about that in just a second.

So here we go. First quartile: we have a little abbreviation for that, we write Q1. The first quartile is going to represent the bottom 25 percent (25 percent, of course, is a quarter) of our data. Now, what does that imply? If I say the bottom 25 percent, does the data have to be ordered, or can it be unordered? What do you think? If it's like the median, it had better be in order, right? So the bottom 25 percent would be the lowest 25 percent of the values. That's what we mean by the first quartile. So Q1 is going to be the data value that represents
the bottom 25% of sorted (that means in-order) data. Okay, what do you think the second quartile is going to be? Don't all speak at once; it's really annoying when y'all start talking at the same time. What do you suppose the second quartile is going to be? The first one is the bottom 25%. What's the second one? Yeah, the bottom 50%. Then can you tell me what other value we already have that represents the bottom 50% of a data set that's in order? The median. Q2 and the median are the same thing. Notice that the median is right in the middle, right? 50% to the right, 50% to the left. Q2 is right in the middle, 50% to the left and to the right. So those values are the same. Q2 and the median are one and the same; they both represent the bottom 50% of our data. So I'll write Q2, but you're never going to see that. You're going to see "median". Okay, so we have Q1, which is the bottom 25%; Q2, or the median, which is the bottom 50%; Q3, what do you think that should be, what percentage? The bottom 75% of the data, that's right. Now there's no Q4. Why do you suppose there's no Q4? It's everything, right. Why would you need to categorize that? If you talk about the fourth quartile, you mean the very top of your data; everything is below it. So we don't have a Q4. There are only three quartiles because we're separating this into four sections. I think of it as cutting bread. If you cut a loaf of bread with one cut, how many slices do you have? Two. So if we're cutting our one hundred percent into four quarters, we only need three cuts to do that. Does that make sense? So we have all of our data, and we're going up first quarter, second quarter, third quarter, and that inherently makes four 25% sections. So we only need three: Q1, Q2 (the median), and Q3. Now, if you don't remember this, go back and look at those videos. By the way, do you look at the videos? Have you watched any of them? If you haven't, try it. They're kind of fun. I make stupid jokes
and everything, and some of you laugh, and you can hear the laughs. And then sometimes I make a joke and you won't laugh, like you're doing now, and that doesn't really come across on the video, so I just look like an idiot. But anyway, I actually lost my train of thought there for a second. Oh yeah: if you don't remember this, your calculator will find the median for you. Remember that? You just plug in the data and you press one-variable statistics, the only thing I've taught you so far on the calculator. Further down the list it will show you Q1 and Q3 as well. It won't show Q2 because it just calls it the median. So along with the mean and standard deviation and all those nice things, it'll calculate all this stuff for you too. If you don't remember that, go back to the day that I did the calculator stuff and review it, or come see me after class and I can show you how to do it. So your calculator will figure this out very quickly. You don't even have to put the data in order in the calculator. Now we're going to show you how to do this by hand. It's not a hard thing; you'll see it's very similar to finding the median. Would you like to see an example of how to do this? Now, one thing before I go any further: different programs sometimes calculate quartiles differently. I know that's weird, but it happens. Excel, MATLAB, a couple of the stat programs (I forget the names), and your calculator: some of them will do it differently. What we're going to stick with is what the calculator does, which is the way you would do it by hand, because if you do it by hand a certain way, the calculator should give you the same thing. The other ones can differ slightly, so I just need to point that out. So let's do an example to find out how we can calculate all these quartiles. Okay, what's the first thing we need to check for if we're looking at a data set that we're trying to
find this information on? What do you think? Okay: is it sorted, is it in order? Great. We need to check that, because if it's not (we already covered this with the median) it's not going to work out right. So in order to find these quartiles, here's how you're going to do it. If you're doing this by hand, the first thing you do is find the median, because that's automatically one of the quartiles, right? That's Q2. So go ahead on your data set right now and see if you can find the median. Remember, if the data set has an odd number of members, the median is just the middle value; if it has an even number of members, you average the two middle values. So find me the median. Good. The median is going to be somewhere in the middle, of course. We have an even number of data points, so we're going to look right here at the two middle ones, the 10 and the 15. I'm going to pick the number right in the middle, or in other words the arithmetic average, the mean, of these two numbers. So you add them together, you get 25; you divide by 2, you get 12.5. Great, so our median is 12.5. How many of you were able to find 12.5? Good. Here's the nice thing: after you find the median, that has literally broken up your data set into two groups, the top 50%, which you can identify since it's sorted, and the bottom 50%. So notice, this is why we said the bottom 50%: if you're going from the left, here's 50% of your data. The bottom 50% is what Q2, or the median, marks off for you. Now, because it does that, and I hope you're with me on this, quartiles are just 25% of the data, right? If you broke it up into 50 and 50, you just divide those in half and you automatically have the other two quartiles. So basically, if you know how to find the median, you know how to find your quartiles. You find the median of the whole thing, and that separates it into two groups. Find the median here and you're going to
have Q1; find the median here and you're going to have Q3. Good so far? Are you sure? Okay. One thing that you need to know: you include all the data values in each half, unless one data value is the median itself; that one doesn't count. Here, because the median falls right in the middle between two values, that's great, this is what we want: all four of these are going to be the data points we use to calculate Q1. So we look at this, there are four of them, and we're going to find the number right in the middle. Of course there is no exact middle number, so we take the 3 and the 6 and average them, and what are you going to get out of that? 4.5. So you find the median here, only this time we call it Q1. You can't have more than one median, so you have the median, and then the median of each 50% is going to be either Q1 or Q3, respectively. How about at the far end? We still have these four data points. Yeah, that's our Q3. Nod your head if you're okay on finding those. Feel okay with that? Do you see where the 4.5 is coming from? It's not magical, right? You're kind of pretending the other half doesn't exist for now, and you're using these four items to find a median, and you just call that Q1. So you find that one by averaging the 3 and the 6, and you find this one by averaging the 21 and the 28. Question? Okay, that's a great question. The question was, what happened to Q2? Can someone else answer that for me? The median. Okay, so Q2 and the median are the same thing. When we're looking at the bottom 50%, there are two ways to describe that: Q2, or the median itself. Good question, thank you. Any other questions? No? Okay. So in this case our Q1, our median, and our Q3 aren't even values in our data set. They're just averages, because we don't have an actual in-between number; we don't have an individual piece of data sitting there by itself. Let's see what would happen if I add one more data point on this,
so if I keep the same numbers and just add the 39 at the very end. Let's do this one together; I'm going to show you a couple of things that happen. All right, firstly let's go ahead and find the median. Take a look up there: it's already sorted for you, and that's what you need. What is the median here? 15, great. Were you all able to identify 15? And this one's kind of nice, you don't have to average; the median is 15. Now here's an interesting part about this. When you're calculating the quartiles now, you don't include the median when you find the next quartiles. I know that's kind of strange: even though we added another number, when you calculate the quartiles next you don't include the 15 here and the 15 there. You'd get too much crossover, so you exclude that 15. You go, oh, okay, I can't include this 15. So basically it's kind of like pretending the 15 is not even there; it just separates your data into 50 percents. So the bottom half would be exactly the same as what you just did. We have this data and that data. Oh yeah, that one is going to change: the bottom one didn't change, the top one does. What's the top one? Let's verify that this does come out right on your calculators. Do you have the numbers in? Let's see if we get the same thing: 1, 3, 6, and so on. And sure enough, if you look at that, on your calculator you get the exact same thing. Here we have the 4.5 and then the 32 as our Q1 and our Q3. If you eliminate the 39, like we had before, the Q1 would be the same thing, the 4.5. So even though we added the other data value, notice how it's not affecting the first part of our list. How many of you feel okay about these quartile things? That's a very mathematical term, right, "these quartile things". So you figure you can do this on the
calculator really quickly, you don't have to think about it, or if you're forced to you can do it by hand. Well, if you feel all right with quartiles, let's talk a little bit about percentiles now. They're really similar concepts. I want you to think about what a percent does. What does a percent do? Say you have a 63%. What does that mean? What's it out of? By the way, that's a D. Okay, let's make it more positive: you got a 91%. Say this is a test and you have a 91%. What does 91% mean? What's it out of? 100. So what a percent means is parts out of 100. That's basically the definition, actually: a percent means a part out of 100. So a percentile separates the data set into hundredths, parts out of 100, so each one is one percent of the data. That's what we're talking about with a percentile. You've seen percentiles before, haven't you? When you go and take, what do they call that test in high school? Yeah, the SAT. When you take your SAT, you score in the 50th percentile, right? Or the 70th percentile, or the 90th percentile, or the 99th percentile for some really, really smart people. Okay, what does that mean? Does it mean that you scored 99 percent on the test, or 50 percent on it? If you score in the 50th percentile, does that mean you scored only 50 percent on your test? Man, I wouldn't feel pretty good about that. I missed half? I worked really hard; I know I got more than that. See, we're going to find out what percentile actually means right now. So let's take a look at that. A percentile is very much like a quartile, but it separates the data into 100 parts instead of just 4 parts. Using the principle that we just talked about, the bread-cutting principle, if you're separating your data into 100 parts, how many percentiles are there? Say that again? 99, that's right, because you get one more part than how many cuts you have. And so if we want
100 different parts, we cut that data up 99 times. It's like if we want 4 quarters, we cut it up 3 times. So there are going to be 99 percentiles, not 100 percentiles, which raises the question: can you ever score in the 100th percentile? The answer is no, you can't, because that means you would be outside of the data set. You'd be above it, and that inherently cannot happen, because if you scored on the test you have to be within the data set, right? You can't be above 100 percent of people if you're part of that group. Does that make sense? So the highest you can score is in the 99th percentile. That's it. So a percentile separates our data into 100 parts, and what this means is there are 99 percentiles. Okay, so how in the world do you calculate the actual percentile? It's really not that bad. And you know what's funny about percentiles? It actually doesn't matter what score you got on the test. The way a percentile works is that it compares where you are to everyone else who took the test. So when you take an SAT and you get the 50th percentile, that doesn't mean you scored 50% on the test. What that means is that out of everybody who took it, you scored better than 50% of people, half the people who took it. Does that make sense to you? So if lots and lots of people did really, really well, you might be at the 50th percentile and still have done excellently on that test. It's a way of comparing you to everyone else who took it. So if we did this class by percentiles, some of you wouldn't like it if I gave you that grade. Like, if you get the 50th percentile, you get a D or an F. That probably wouldn't be too good for you, would it? Because if everybody scored somewhere from 90 percent through 99 percent, and you got like a 95, you'd be right in the middle; you'd be in the 50th percentile. It's just comparing you to everyone else. Does that make sense? Here's how you compare yourself to everyone else. The percentile of x, where x stands for your data value, is really just a
ratio. Here's how you calculate it. You take the number of values that are less than x. So for instance, let's say that you scored something on a test and you want to figure out your percentile for the class. You'd find out how many people did worse than you. It's kind of optimistic, isn't it? How many people did worse than me? You should feel pretty good about that. So to find out how well you did compared to everybody else, you calculate how many people did worse than you. So for the percentile of x, you count the number of data values less than x. Then, to compare to everyone else, we divide by the total number of people who took that test, or in this case the total number of data values. Now if you take this ratio, of course our numerator should be smaller than our denominator, so we're going to get a decimal out of this. What we do after you figure out your decimal is multiply by 100; that will change it into a percentile. So finish off by multiplying by 100. Let's give this a try. Let's say that you scored 87 out of 100 on the test. What is your percentage? You got 87 percent on this test, right? Are you clear on that? 87 percent; it's 87 out of a hundred. That's pretty easy. Now let's calculate a percentile. After the test, comparing yourself with everyone else, you all got together and said, that darn Mr.
Leonard, his tests are so freaking hard. How'd you do? How'd you do? When really you did pretty well, so you kind of hold your cards close. But you want to find out how many people did worse than you, and you figure out that 39 people in the class scored worse than you, and let's say there are 54 people in your class. What I want to figure out is what percentile you scored in. The first question is this: are you necessarily scoring in the 87th percentile? Can you look at that 87 and say, I'm in the 87th percentile? Does that tell you what percentile you're in? No, that's your percentage. I know they're very similar words, right? They both mean out of 100. One compares you to other people; that's percentile. One is just the score on your test; that's percentage. So you scored 87 percent, sure. In order to figure out a percentile, what you have to do is compare your data value to other people, to the rest of the group. So that's what we're going to be doing now. Really, it's not even based on the 87; the 87 doesn't even enter the computation. It's just how you did compared to everyone else. So, all right, the percentile, well, really it's going to be the percentile of the score of 87. I'm going to put it in quotation marks so you don't think I mean the 87th percentile; I mean the score of 87 from right here. And how we calculate that is we look at how many people scored worse than you. How many did worse? How many people took this test? Okay, could you figure out how many people did better than you? Sure. If 39 people did worse than you and there are 54 people in the class, then you're the 40th person, right? You don't count as doing better than yourself, so you'd subtract these, but then subtract one from that as well, because you count. You count, guys; you're important. So that's what you do: 39 over 54. And then of course that's going to give me a decimal, so we're going to multiply that by 100. What's 39 divided by 54, times 100? What you get
is 72.2222 and so on, so you scored in the 72nd percentile. The decimal is 0.72222, repeating forever, and we multiply by a hundred, which moves the decimal point two places for us. So did you score in the 87th percentile? No, you scored 87%. Did you score 72 on the test? No, you scored in the 72nd percentile. Notice how the 87 and the 72 really had nothing to do with each other. The 87 was your placement on the test; the percentile calculates your placement in the class. So percentage is your score; percentile is your placement in the class. That's the difference between those things. Also, you can go back and forth between this count and your percentile. If I gave you the percentile, say your percentile is the 73rd or the 75th or whatever, and I told you there are like 80 people in the class, you could figure out how many people you did better than, couldn't you? You can go back and forth; this is an equation, so it works. Oh, by the way, the shorthand way to write this: you'd say that a score of 87 out of 100 would be P72. That's the 72nd percentile. It doesn't mean you scored a 72 out of 100; it means you scored better than 72% of the people who took this test. So on the SAT, when you took your SAT you got a percentile. If you scored in the 95th percentile, that means you scored better than 95% of the people who took it. That's a lot, right? That's pretty good. If you scored in the 70th percentile, it means you scored better than 70%. So even though we equate 70% with a C, in that case you might have done pretty well: you scored better than 70% of the people who took it. The average would be the 50th percentile, if it's normally distributed, which it probably is, since the SAT is a huge test. Okay, can you tell me how much this equals in terms of something we just covered in this class: the 25th percentile. What's the 25th percentile? Do you see that this
represents 25%, the bottom 25% of the data, right? What else that we just covered represents the bottom 25%? Q1, that's it. Yeah. With that in mind, how much is P50, the 50th percentile? Yeah, that's definitely our median, or Q2. Same idea, and lastly we have Q3; that's the 75th percentile. One more little definition I need to give you before we go on to an example to find some of this stuff and talk about a box plot: I do need to talk about what the IQR is. IQR stands for the interquartile range. It's not a hard thing to figure out. Notice that when we talk about quartiles, we really only label two of them as quartiles. Did you notice that? For the median we don't say Q2, we just say the median. So the interquartile range is just the difference between the quartiles, how much space is between them. So when you hear IQR, which you're going to be using in maybe five or ten minutes, it means Q3 minus Q1, just the difference between them. I'll write that up for you. So the interquartile range, IQR, is just the difference between the two quartiles that we talked about. It's Q3 minus Q1, and what it represents is the middle 50% of your data, the middle half. Hey, how many people in here have ever seen a box plot before? You might call it a box-and-whiskers plot or something like that. Have you seen those? Do you remember how to make them? They look like this. Have you seen something like that before? You have, you haven't. To me it looks like a car with tires that are really far out; I have a good imagination. Or it's a bit Star Wars. Otherwise, this is a box plot, or you might have heard "box and whiskers" before. Here's what this is: it's just a graphic representation of what we like to call the five-number summary. So I've got to tell you what a five-number summary is. It's not a hard thing; you already know how to calculate it, and this is just a picture of that. So the five-number
summary, here's what it is. In order to do a five-number summary you need, well, five numbers, five specific numbers. You first need the minimum value and you need the maximum value, and then we take a look at our quartiles, and that all together is the five-number summary: the minimum, the maximum, the median, which inherently is Q2, and then we just have to find Q1 and Q3, which bound the middle 50% of our data. So: the minimum, Q1, Q2, Q3, and the maximum. If we put that on our graph, here's how this looks. This box, from Q1 to Q3, represents the middle 50% of our data, or the IQR between Q1 and Q3. So we put the Q1 value here, on a number line, and draw the box around to Q3, which goes here. The median goes there. The location of the median in the box tells you where most of the data is lying, whether it's closer to this 25% or closer to that 25%, so that's kind of interesting. The minimum is going to go out here at this far point, and the maximum goes over here. So if we can find those five numbers, we just put them on a number line, we put a box around the middle, and that's basically it. Would you like to try an example of this? Okay, let's do that. Right now, on your own (you should all be able to do this at this point), I would like you to calculate the five-number summary, telling me the minimum, the maximum, the median, Q1, and Q3. It's already ordered for you. So find the median, use that median to find Q1 and Q3, or use your calculator, which will give you all three of them. Well, a couple of these things are pretty easy to find. The minimum and the maximum are already listed for you, so 1 and 21. That's nice. For the median, we want to find the middle data value, so it's one, two, three, four, five, six, seven, eight, nine, ten, eleven. Oh, I made it nice, eleven of them; that means five on each side. I'm getting 9. Is it 9? Once you find the median, that separates your data into two sets: you have this bottom 50
percent and the top 50 percent. That's this 1 through 7 and this 12 through 21. Since I have one, two, three, four, five data values here, your Q1 should be the 5. Did you find Q1 appropriately? And since we have one, two, three, four, five, I also have five over here, naturally, and so I have this 13 as my Q3. Were you able to find the five-number summary? Good. Now, where we go from here to a box plot: essentially all you do is make a number line and make it to scale. On the left-hand side you put 1; on the right-hand side you put 21. We're going to do the median first, so basically the same order that we found these numbers in. Is 9 closer to the 21 or the 1? The 1, so we're going to put it about here, just slightly over, not that much closer. Now we're going to place the Q1 and the Q3; we'll put them on here as well. How about the 5? Is the 5 closer to the 1 or the 9? Right in the middle; it's four away from each one, right? And then we'll do the Q3, 13. 13 is definitely closer to the 9. You want it as close to scale as possible. Okay, so draw your whole line first, bottom to top, minimum to maximum. Then scale it out and put things where they're appropriate, where they are relative to each other. Are you with me on this? After that you're almost done. Just make the box around it and you have yourself a nice picture of the five-number summary. Now what it says is that the middle fifty percent is right here, and it looks to me like that 21 is a little bit outside of what's normal. Does that make sense to you? It's a little bit out there. That's why you draw it to scale. Now that idea, a little bit out there, is called an outlier. We have talked a little bit about it, but I haven't given you a way to determine an outlier. The question is, how do you find an outlier? We used to just look at the data set and say, oh, that's an outlier, that's way away from the normal data, or not. So here's a question: is 21 an outlier? Is it far enough away from normal to be considered an outlier? What do you
think? Some people said yeah, some people are going, I don't know, it's not too far away. What if I made it a hundred and twenty-one? Would that be far enough away? Definitely. What if I made it nineteen? Is nineteen an outlier? Probably not. When we ask that question it's getting a little subjective, right? What do you consider far enough away from normal? There's a mathematical way to do this. Give me a couple of minutes to explain it, but it's kind of tricky for some people, so you might want to watch this again or really stick with it right now. Okay, in order to find an outlier you just have to do two things. The first thing is find the IQR. In our example, can you find me the IQR? What is the IQR? 8. The next thing you do, and this is where the math comes in, so watch very carefully: you multiply 1.5 times the IQR. Can you tell me how much 1.5 times 8 is? 12. Okay, are you still with me, folks? I need head nods if you are. So what we did so far: we found the IQR, which is just the difference between the quartiles, 13 minus 5, and that gives us 8. We multiplied that by 1.5, and that gives us 12. This is basic math so far, right? Here's the part that's not tricky, but you have to remember what to do. What you do now is look at your Q1 and your Q3. You take this number, you subtract it from Q1, and you add it to Q3. If anything is outside that range, that is mathematically an outlier. Does that make sense to you? So what you're doing is this: you take Q1 minus 1.5 times the IQR, and you take Q3 plus 1.5 times the IQR. I'm going to draw that again without the box around it. Okay, so in the last couple of minutes that we have here: we find the IQR, that's 8. We multiply that by 1.5, that's 12. Watch carefully on the next thing you do. You take the 12 that you just found, and you take Q1 minus that. Here's Q1, here's Q3. Q1 minus that: what's 5 minus 12? Negative 7. Then you take Q3 plus that. How much
is Q3 plus 12? 25. Listen, what you do now is look at this and consider your range of numbers. The range of numbers we're looking at, the border, is negative 7 to positive 25. Do you have anything in your data set that is less than negative 7? No. Do you have anything in your data set that is greater than 25? No. Then you have no outliers. Look at what would happen if I changed the 21 into something like maybe a 55, or a 32 or something. So instead of 21, I now have 32. Now do I have an outlier? Yes. You look at this range of numbers and you'd say all of these are good; they're within this range of negative 7 to 25. But I would have a number outside it, and any number outside of that range would mathematically be considered an outlier. Raise your hand if you understood that. Okay. You're going to have to do this on the test, so be prepared to go through this example again and tell me mathematically which values are outliers and which are not. Okay? Does that make sense to you?
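The by-hand steps from this lecture (median first, then the median of each half for Q1 and Q3, then the 1.5 times IQR rule for outliers, plus the percentile ratio) can be sketched in Python roughly as follows. This is only an illustration, not anything shown in the video. The function names are made up here. The eight-value list is inferred from the numbers spoken in the lecture (the value 36 is never said aloud; it is worked backwards from Q3 = 32), and the eleven-value list is hypothetical, chosen only so that it matches the stated summary of 1, 5, 9, 13, 21.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves method.

    Assumes at least two values. With an odd count, the middle value
    is left out of both halves, as described in the lecture.
    """
    values = sorted(data)          # quartiles only make sense on sorted data
    n = len(values)
    half = n // 2
    lower = values[:half]
    upper = values[half + 1:] if n % 2 else values[half:]
    return median(lower), median(values), median(upper)


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, med, q3 = quartiles(data)
    return min(data), q1, med, q3, max(data)


def outlier_fences(data):
    """Return (Q1 - 1.5 * IQR, Q3 + 1.5 * IQR)."""
    q1, _, q3 = quartiles(data)
    step = 1.5 * (q3 - q1)         # IQR is Q3 minus Q1
    return q1 - step, q3 + step


def find_outliers(data):
    """Return the values that fall outside the two fences."""
    low, high = outlier_fences(data)
    return [x for x in data if x < low or x > high]


def percentile_of(count_below, total):
    """Percentile = (number of values below x) / (total values) * 100."""
    return 100 * count_below / total


# Inferred from the lecture's spoken results (36 is an assumption).
eight_values = [1, 3, 6, 10, 15, 21, 28, 36]
nine_values = eight_values + [39]

# Hypothetical values matching the stated five-number summary.
eleven_values = [1, 3, 5, 6, 7, 9, 12, 12, 13, 17, 21]

print(quartiles(eight_values))
print(quartiles(nine_values))
print(five_number_summary(eleven_values))
print(outlier_fences(eleven_values))
print(find_outliers(eleven_values[:-1] + [32]))
print(round(percentile_of(39, 54), 1))
```

As the lecture notes, other programs may use a different quartile convention, so a library routine is not guaranteed to agree with this by-hand method on every data set.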
Info
Channel: Professor Leonard
Views: 223,869
Rating: 4.9273963 out of 5
Keywords: Professor, Leonard, Standard Deviation, Standard Score, Percentile, Statistics (Field Of Study), Quartile, Math
Id: lPe-rQA_afU
Length: 91min 51sec (5511 seconds)
Published: Fri Dec 09 2011