Statistics 101: Standard Deviation

Video Statistics and Information

Captions
Hello, and welcome to the next video in my series on basic statistics. Now, two reminders before we get going. Number one: this video is geared towards individuals who are relatively new to statistics, so maybe an undergraduate in a 300-level stats class, or a graduate student who's just being introduced to quantitative methods for the first time, or maybe someone out in the general population, in business or whatever else, who might need a little brush-up on statistical techniques. Number two: if you are watching this video on YouTube, please refer to the description below the video. There you will find a link to the corresponding blog post for this video. On the blog post you can download the very Excel file I'm going to use in the video, and therefore you can follow along with me step by step. So without further ado, let's go ahead and dive right in.

In this video we're going to be talking about measures of variability, namely variance and standard deviation. Now, I just want to do a little side note here. As someone who has studied more advanced statistics, let me tell you: if you can really get a grasp on variability, so variance and standard deviation, that is such a fundamental building block of more advanced statistics as you go through your studies or whatever else it is you do. A lot of advanced statistical techniques make assumptions about the variability in data, and there are many other things that have this concept at their foundation. So if you get a really good grasp on variance, standard deviation, and things of that nature, you're really setting yourself up to be successful later on in your work.

Okay, so let's go ahead and get going. Like in all my videos, I set us up with a very simple problem; it's an inquiry way of teaching. So here's our problem. We have two classes, and each class has five students. That would be nice, wouldn't it? Class 1 has five students, and they took an exam, and the scores were 85, 95, 75, 80, and 90. Now that
comes out to a mean, or average, score of 85. Just for the record, I'm no longer going to use the word average. In stats class, use the word mean, or someone might throw an eraser at your head. Now in class 2 they took the same exam. The scores were 88, 79, 91, 85, and 82. That also had a mean score of 85. So on the face of it, it would appear that these classes did about the same on the exam; they both had a mean score of 85. But even though they had the same mean, does that really tell the whole story about our data?

So here what I've done is I've gone ahead and graphed the scores on two number lines. At the top you can see class 1, so their scores were 75, 80, 85, 90, and 95. Those blue stars represent the actual scores on the exam, and of course I've circled 85 as the mean. Now look at class 2. It had the exact same mean of 85, but look at where the stars are for this class. The stars represent their scores, so 79, 82, 85, and so on. Same mean, but look at the data points. So what do the number lines tell us about the variability, or the spread, or the distribution of the exam scores between these two classes?

Now, while both have the same mean, 85, class 1 has greater variability: its scores are more spread out. So the mean does not tell you everything. Actually, the mean tells you very little, in my opinion. You have to know more about your data to get a picture of what's going on. If you look at class 1, the highest score was 95, while in class 2 the high score was only, I think, 91. But also in class 1 the lowest score was 75, and in class 2 the lowest score was, I think, 79. Same mean, but variability in their actual results.

One way to think of this problem, and we're going to talk about sports here in a little bit, is this: as a coach, would you rather have an athlete that has more potential on the high end but more risk on the low end of their performance? That would be like class 1. Or would you rather have an athlete that may not be as, you
know, high up as some people but also won't be as low as some people on whatever measure they're doing, but is more consistent? So which would you rather have? Well, it probably depends on what sport you're talking about and things like that, but you can think of it in those terms. The students in class 2 are probably more consistent in their scores, because they're closer to the mean, while students in class 1 might have more extremes in their scores, even though the means are the same.

So what exactly are we talking about here when we talk about variance and standard deviation? What we're talking about is how far each data point is from the mean. And I actually want you to think physical distance, actual distance from the mean. That's the question that variance, standard deviation, and some other measures help us answer: how far are the data points from the mean of all those points put together?

Now, the standard deviation is just the positive square root of the variance, and we will calculate it here in a few minutes. So once you have the variance you have the standard deviation; you just take the square root. And if you have the standard deviation, you can just square it and that'll give you the variance. But most of the time we use the standard deviation.

Now, if most data points are close to the mean, kind of clustered around the mean, then the variance and standard deviation will be lower than for data points that are more spread out from the mean, say in a different data set. A good example is the two classes we just looked at. In class 2 the green stars were bunched in closer to the mean, whereas in class 1 they were more spread out from the mean.

The mean, standard deviation, and variance are most useful when comparing data sets against each other, or when testing a data set against a theoretical value, and we'll talk about that in later videos as well. We could actually do an experiment where we hypothesize that the average
height of the American adult female is 5 foot 10 inches. That's our hypothesis. Then we could go out and measure the height of 30 or 50 fully grown adult females, then we could average those heights together, and then statistically test whether or not the mean that we calculate is within what we would expect given our theoretical value. So there are ways of doing that, and again, that depends on the variation, that depends on the standard deviation and things like that. We'll talk about that in another video, but of course these bits of information can be informative on their own.

Now, we do want to learn the appropriate symbols for these. The mean is represented as an X with a line over it, so we call that X bar. So if you're in stats class and you want to sound really smart, you can say "hey, you know, X bar, blah blah blah," and you might get an extra point. The variance is sigma squared, that's the Greek letter sigma, squared. And since the standard deviation is the square root of that, and the square root of sigma squared is just sigma, the standard deviation is represented as sigma. But I'm using the sample variance in this video, not the population variance. There is a difference between those two things, but most of the time in our work we use the sample variance, because we are working with a sample of a larger population. Sometimes we might use the population variance. Now, probably most of you watching this video don't care, so just don't pay attention to anything I just said about that note.

So here is the formula. For the standard deviation, that's sigma, we take the difference between each data value and the mean, so that's X minus X bar, square all those, and then add them up. So sigma is the square root of the sum of the squared differences between the data and the mean, with that sum divided by the number of observations we have minus one. So actually none of this should be past algebra, and none of this should be, you know, freaking
you out. The capital sigma just means "the sum of." So when we subtract the mean from our data point and then square it, we're going to add all those up, and that's what goes on top. The sigma just means add all those up, and we're going to do some examples here in a minute. Then divide by the number of things you have minus one, then take the square root of it, and that's your standard deviation.

So obviously we're going to work some examples. Let's figure out the variance and the standard deviation for our classes. There are only five students, so it doesn't take but a couple of minutes. Here are the scores for class 1: 85, 95, 75, 80, and 90. Now if you look at our formula, that's our X. So the plain X in the formula, that's the score, and that's the left column. Now we have to subtract the mean from each one of those, so X minus X bar is the score minus the mean. That's the inside of the parentheses in the formula. So we're going to take 85 minus 85, 95 minus 85, 75 minus 85, et cetera, and then we get the score minus the mean. X minus X bar, that's exactly what the third column is.

Now as you see, we have to square that; we have the squared sign next to our parentheses down there. So now we have to square whatever is in that third column. So 0 squared is 0, 10 squared is 100, negative 10 squared is also 100, negative 5 squared is 25, and 5 squared is 25. So that's the majority, if not almost all, of the top of the fraction in our formula.

Now, the sigma means we have to add all those together, so sum all those up. If you can see over there in the red at the bottom, to figure out the variance, and remember that's sigma squared, that's like a precursor to the standard deviation, we're going to add all those up: 0 plus 100 plus 100 plus 25 plus 25, divided by n minus 1. In this case we have 5 students, so 5 minus 1 is 4. So we add all those up to get 250, divide by 4, and that's actually our variance: 62.5 is our variance. Now to
figure out the standard deviation, we just take the square root of that variance. So technically what we're doing algebraically is taking the square root of both sides. We take the square root of sigma squared, which is sigma, then we take the square root of 62.5 as well, because you have to do it to both sides, and we end up with 7.90. So the standard deviation for our class 1 test scores is 7.90.

Now that may not mean a whole lot on its own, but in the next slide I have class 2 all combined. Before I do that, remember the line graph: which class had scores closer together? It was class 2. So should the standard deviation be higher or lower than class 1? Well, it should be lower, because the data points are closer to the mean.

So here is class 2, and again I made it more statistically appropriate in terms of the symbols and everything like that. We have our scores on the left-hand side. The mean is still 85. We subtract those to get the difference, then we square that difference, so we have 9, 36, 36, 0, and 9. Again we add all those up to get 90, divide by n minus 1, which is 4, and that gives us a variance of 22.5. Take the square root of both sides and we have sigma equals 4.74. So our standard deviation for class 2 is 4.74.

So there's our graph again, and there are the standard deviations. For class 1 our standard deviation was 7.90, and for class 2 our standard deviation was 4.74. And that should make sense now, because the data points in class 2 are closer in to the mean, and in class 1 they're more spread out, even though the means are the same.

Okay, there's one more statistic I want to tell you about. It's not used as much, at least in my experience, but I still think it can be pretty informative in telling you more about data. The coefficient of variation is a very simple calculation: you just take the standard deviation of your data set, divide it by the mean, and then multiply it by a
hundred to get a percentage. The coefficient of variation is a relative measure of variability. It's usually expressed as a percentage, and it measures the standard deviation relative to the mean. So it answers the question: how large is the standard deviation relative to the mean, or as a proportion or percentage of the mean? Since it is a ratio, a percentage, it's helpful for comparing data sets having different means and standard deviations. We can tell by the coefficient of variation how one data set's standard deviation relates to its mean as compared to another and its mean, and you'll see how that works here in a second.

But also, since it is a percentage, the coefficient of variation is unit independent. It does not matter what units our original data is in when comparing data sets, and here's what I mean by that. We can take some weather data outside: we can measure the high temperature for a month, and of course depending on where you are that might be in degrees Fahrenheit or degrees Celsius. I could also go down to the intersection here on the highway and count the number of cars that go through the intersection between five and six p.m. over the course of a month. Completely different data. However, each data set will have its own mean and its own standard deviation. Using the coefficient of variation, I can determine which is more variable, because it's just a percentage of its own mean. So that's what I mean by unit independent: you can compare different data sets regardless of their units.

But of course in our exams, our fictional exams, we are using the same units, points out of a hundred. So to calculate this, it's very simple. We take our standard deviations, divided by each mean, which are both 85, times 100. So for class 1 we had 7.90 divided by 85, times 100, and that equals 9.29 percent. So our standard deviation as a proportion is 9.29 percent of our mean. Now if we go to class 2 we do the same calculation, and it's five point
should be five eight, so 5.58 percent; I got a digit in there mixed up. No big deal, but it's significantly less than class 1, which would make sense because the standard deviation is much lower than class 1's was. Now of course we have the same mean, so it'll definitely be lower, because it's a much smaller proportion of that 85 mean. So even though our means are the same, the coefficient of variation confirms our previous conclusions: the standard deviation is much smaller relative to the mean in class 2.

So now here is the big portion of our video. We're going to be doing some field goal analysis for the NFL, which is the National Football League, an American football league here in the United States. Here's what we're going to do. We're going to collect data for the longest regular season field goal by each kicker having at least ten field goal attempts during the 2010 and 2011 seasons. I've already collected that data, so for every kicker that attempted at least ten field goals, I have recorded the longest kick during the regular season. We're going to generate a graphical tool for visualizing our data using a football field diagram. Then we'll compute field goal ball placement, so where the ball is actually kicked from, and the line of scrimmage, where the ball is actually snapped from, from the field goal length data using Excel. Then we're going to compute the mean, standard deviation, and coefficient of variation for the line of scrimmage for each year. So for each year there's going to be a mean line of scrimmage where the ball is snapped from for the longest field goals during that year, and we're going to figure out what that is.

It's kind of like a coach saying, "Okay, we moved the ball down to the X yard line during the last play. Should we try a field goal to win the game?" Well, we're going to figure out where sort of the limit is, where you can expect an NFL field goal kicker to kick that game-winning field goal. So we're going to figure out actually what that is, and
then we're actually going to compare both years, so it'll be pretty cool. Let's go ahead and get into Excel and do that analysis.
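
The class-score calculations worked through in the captions can be reproduced with a short Python sketch. It assumes the sample (n minus 1) denominator the video uses; the function name `describe` and the variable names are illustrative and do not come from the video or its Excel file. One notation caveat: the video writes the sample standard deviation as sigma, whereas the usual convention is s for a sample and sigma for a population.

```python
import math
import statistics


def describe(scores):
    """Return mean, sample variance, sample standard deviation, and CV (%)."""
    n = len(scores)
    mean = sum(scores) / n
    # Squared distance of each score from the mean: (x - x_bar)^2.
    squared_diffs = [(x - mean) ** 2 for x in scores]
    # Sample variance divides the sum of squares by n - 1.
    variance = sum(squared_diffs) / (n - 1)
    # Standard deviation is the positive square root of the variance.
    std_dev = math.sqrt(variance)
    # Coefficient of variation: standard deviation as a percentage of the mean.
    cv_percent = std_dev / mean * 100
    return mean, variance, std_dev, cv_percent


class_1 = [85, 95, 75, 80, 90]
class_2 = [88, 79, 91, 85, 82]

for name, scores in (("Class 1", class_1), ("Class 2", class_2)):
    mean, variance, std_dev, cv = describe(scores)
    # Cross-check the hand-rolled steps against the standard library,
    # whose variance() and stdev() also use the n - 1 denominator.
    assert math.isclose(variance, statistics.variance(scores))
    assert math.isclose(std_dev, statistics.stdev(scores))
    print(f"{name}: mean={mean:.2f} variance={variance:.2f} "
          f"sd={std_dev:.2f} cv={cv:.2f}%")
```

Carried at full precision, the class 1 standard deviation rounds to 7.91 and its coefficient of variation to 9.30 percent; the video's 7.90 and 9.29 percent come from truncating the square root of 62.5 before dividing by the mean. Class 2 comes out to 4.74 and 5.58 percent either way.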
Info
Channel: Brandon Foltz
Views: 62,249
Rating: 4.9472079 out of 5
Keywords: statistics 101 standard deviation, brandon foltz standard deviation, standard deviation brandon foltz, stats 101 standard deviation, standard deviation brandon, brandon c foltz, brandon c. foltz, brandon foltz, brandon foltz statistics 101, Excel, standard deviation, statistics standard deviation, variance, coefficient of variance, statistics 101, NFL, standard deviation statistics, anova, linear regression, logistic regression, football, field goal, standard, deviation, statistics
Id: JIIXQaMXBVM
Length: 20min 4sec (1204 seconds)
Published: Fri Sep 28 2012