Statistics 101: A Tour of the Normal Distribution

Video Statistics and Information

Captions
a thumbs up; that does encourage me to keep making them, since I know they are beneficial. On the flip side, if you think there is something I can do better, please leave a constructive comment below the video on YouTube and I will try to incorporate those comments into future videos. And finally, just keep in mind that these videos are geared towards individuals who are relatively new to stats, so I'll just be going over basic concepts, and I will be doing so in a very slow, deliberate manner. Not only do I want you to understand what's going on, but why. So, all that being said, let's go ahead and get started.

This video is the next in a series of several videos I have done on different types of probability distributions. We've talked about the binomial distribution, we talked about the Poisson distribution, and this one is probably the most important for your use, because we're going to talk about the normal distribution. If you've had stats or you're currently in a stats class, I'm sure you have heard about the normal distribution. You've probably heard of the idea of the bell curve, and this is the same idea. The normal distribution is probably the one we use the most in first-year stats classes, so it is by far the most important thing to understand as you get into high-level statistics or more advanced topics, even in that first year, because so much is based on understanding the concept of the normal distribution. So let's go ahead and talk about it.

Now, a quick review about continuous probability distributions. In my previous video we talked about uniform continuous probability distributions. We know that the total area of the distribution has to add up to one, and for this continuous uniform distribution any point beyond the edges is zero, so all of our area, all of our probability, is in this rectangular shape. Now, in a continuous uniform probability distribution like this one, we call one end a, there on the left, and we call the other end b, there on the right, and then we talk about
the width of this rectangle. Well, the width of the rectangle is b minus a, of course: the distance to b from the axis, minus the distance from the axis to a, leaves us with the distance between a and b. So that is the width of this rectangle. Now, we know the area is 1. Using some very simple math, we know that the area of a rectangle is its height times its width, and in this case it equals 1. We know two of those things, so we can solve for the third, which is the height. We use some simple algebra to come up with the height of our distribution here: the area is 1, the width is b minus a, therefore the height is 1 divided by (b minus a).

Now, with this continuous uniform distribution we talked about a very important concept: the probability of any specific outcome in a continuous distribution is zero, and we'll review why that is here in a second. All we can do in a continuous distribution is find the probability over an interval of outcomes, or a range of outcomes. So think about this. We know that for this entire rectangle the probability has to equal 1. Now say it's divided in half, right down the middle: one half is going to have a probability of 0.5 and the other half will have a probability of 0.5. That's 1 over 2; it's very simple. Now what if we divide it into 4 equal bars, up and down vertically? That's 1 over 4, so each bar would be 0.25. Now let's say we divide it into 10 bars: that's 1 over 10, so each bar would be 0.1. Let's say we divide it into a hundred bars: each bar now has a probability of 0.01. Divide it into a thousand bars, divide it into a million bars; you see what's going to happen here. As we divide it up into more and more vertical bars that get more and more narrow, the probability goes down, down, down. And as that happens, as we divide it up into an infinite number of bars, the probability of
any one of them approaches zero. One way to think about this in real life, in terms of continuous data: maybe you had a chemistry class, and you go over to the scale and weigh a chemical in the chemistry lab. That scale may have three or four or maybe even five digits past the decimal point. It's the same concept here: in continuous data the probability of any specific outcome is 0, because the decimals could keep going on forever. That's the difference between a coin flip or a die roll, which have 2 and 6 outcomes respectively, and continuous data, which has an infinite number of outcomes and is only limited by the precision of whatever measuring instrument you are using. Okay, so the idea that the probability of any given outcome is 0, and that therefore we can only find the probability over a range of outcomes, is fundamental to understanding the normal distribution. That's why I spent 4 minutes explaining it, in case you didn't watch the video this slide is in, on continuous uniform distributions.

So, you've seen this shape before, I bet; maybe something like this in the front or back of your stats book. There's a whole bunch of crazy numbers all over it: it has the bell curve there, and it has percentages and deviations and z-scores and T scores and stanines and all kinds of crazy stuff. But oftentimes you don't really understand where this stuff comes from. So the idea of this video is to try to explain that. We're not going to explain everything in this graphic right here, but we're going to explain several things so that you can solve problems using the normal distribution. But I'm sure you've seen this before.

So what about the normal distribution? Oftentimes data is described as being normal in the statistical sense. What does that mean? Let's talk about the frequency with which some events occur, both natural and man-made. So, some natural things: human height,
the height of an adult human, the temperature outside, a person's blood pressure, etc. Those are events that occur in nature that tend to follow a normal distribution. Or man-made: machined products, maybe in quality control situations, financial data, sales data, and things like that in business. For these measures the average, or the mean, tends to be very frequent, while measures away from the mean are less and less frequent. So the data in these situations tends to clump around the mean.

So let's take a look at the normal distribution and learn more about its properties. Here it is. Now, at least from my view, in an aesthetic sense, in terms of something being sort of attractive, I guess you would say, I think the normal curve is a very aesthetic shape. But there are some things about the shape we should know. On the ends, we call these the tails, so we have the lower tail and the upper tail; sometimes you'll just hear "the tails". Underneath the curve is the probability area, so everything under the curve is what we're interested in. We're not interested in anything outside of this curved shape, just the area underneath it. Another characteristic of the normal curve is that it is symmetric: it has symmetry down the middle, so the left half is the exact same as the right half when we're talking about the perfect theoretical normal distribution. Now, the high point, or the top of the normal curve, is the mean, the median, and the mode; at the top of the curve the mean, the median, and the mode are the same value, right there. Now, the mean down at the bottom tells you where the position of your normal curve is. As far as the shape goes, that's largely affected, or largely influenced, by the standard deviation. We will talk about each one of those here in a minute. So the two parameters that really give the normal distribution its position and shape are the mean, mu, and the standard deviation, sigma.

So let's talk about the mean. I've used just the standard normal curve
here, so I've set the mean to 0. Of course there's no axis at the bottom to tell you that, but let's say the mean of this curve is 0. Now, I could slide it over to the left, so a mean could be negative 12.3; that's fine. That mean just slides it over to the left. Or we could have a mean of 10, and that would slide it over to the right. So the mean here can be any numerical value, and what it does is slide the entire distribution side to side, left to right. This is very, very important later on when you are trying to determine if two samples or two populations have statistically different means. Maybe you are doing an experiment and you have one group that gets the experimental treatment, where you're expecting something to increase, and then maybe you have a control group where you're expecting no increase. You would be interested in whether the means in these two groups are statistically different, or have significant differences, statistically speaking. Well, that's what we're talking about here. Now, it would also have to do with the variance, but if the curves are very far apart, if the means are far apart, then you might have some indication that whatever experimental treatment you did affected that group. Okay, so the mean just slides the distribution side to side.

Now, standard deviation in many ways is more interesting, at least I think so. What it does is give the curve its overall shape. So this is sort of our normal curve. This shape, as you can see, is taller in the middle and narrower on the sides. If our standard deviation is smaller, it makes the curve more narrow and taller. You can see the grayish-blue curve is below the orange one at the bottom and above it at the top. And keep in mind that whatever area difference there is on the low side, it's the same as the difference there at the top, where it goes above. So a smaller standard deviation will make the curve, assuming the same mean, narrower
and taller. Now, the opposite is also true: if the standard deviation is larger, it will flatten and widen our distribution, because our deviations on average are further away from the mean. So the standard deviation gives us the shape, in terms of how short and wide or narrow and tall the distribution is, and of course the mean slides it from side to side.

Let's talk about a few more things here. Here is our standard normal curve, and I've gone ahead and highlighted the area underneath it all in red. This is a specific type of curve called the Z distribution. It's called the standard normal curve, and that just means we standardize our distribution by setting the mean to 0 and the standard deviation to 1. So again, you'll often see it called the Z distribution. Down below we can see we have 0 as the mean, and then we have negative 1 and 1; that's one standard deviation away from the mean. We have negative 2 and 2, that's two deviations from the mean, then 3, and so on. And what we know is that the curve in theory extends all the way to infinity. The blue line here never actually touches the axis; in theory it goes on forever. It's asymptotic; it never touches. So we have to keep that in mind when we're doing these kinds of drawings.

Now, just like the uniform continuous distribution, the area under the curve here adds up to 1. But of course it's a curved shape, so it presents us a little more of a challenge in finding certain probabilities underneath it. But the area under the curve is still 1. Now, if the area under the curve is 1 and we take exactly half of the area under the curve, that means that probability has to be 0.5, or 50% of the distribution. That's just a simple concept: if the entire thing is 1, half of it has to be 0.5, or 50%. Now, we call this here on the right the upper bound of the cumulative distribution. So this represents the area all the way from negative infinity up to a Z score, or a Z, of 0, which is our mean. So we could
write it something like this: negative infinity is less than or equal to Z, which is less than or equal to 0, is 0.5. That's just an interval in our normal curve.

How can you find that in Excel? Well, if you go to Excel and do the Insert Function command there at the top, you can find the NORMDIST function, and you can use this to find the cumulative probability in the normal distribution; that's why it's called NORMDIST. In this case X is our upper bound, so our upper bound in this case is 0. Our mean is also 0, because we're using the standard normal curve where the mean is 0. The standard deviation is 1, again because we're using the standard normal curve. Cumulative is TRUE or FALSE; in this case we're selecting TRUE, because we are interested in the cumulative probability up to 0. Now if you look, that gives us an answer of 0.5; you can see in the lower left there, the formula result is 0.5. So we can use the NORMDIST function in Excel to give us the cumulative probability in any part of the normal curve.

On the TI-83, which is the calculator I use, you can do the same thing. There's a function called normalcdf under the distribution menu, which is under the VARS button; it's one of the top few there. The "c" means cumulative in that function. Now, it requires 4 parameters. The first one is the lower boundary. In this case the lower boundary is negative infinity. There is no negative infinity button on the calculator, so we use negative 1E99, using the E button there in the middle of your calculator; that gives us a very large negative number. Then comma 0, which is the upper bound, comma the mean, which is 0, comma 1, which is the standard deviation. So it goes lower bound, upper bound, mean, standard deviation. Then we evaluate that in the calculator, and again we come out with 0.50000, etc. So in Excel and on the TI-83 and 84 calculators, and I would assume probably even the 85 as well, you can get the same information. I wanted to show you how to find the
cumulative probability of the standard normal curve using both.

So what about this cumulative probability? Our upper bound is a Z of negative 1, so Z is from negative infinity up to negative 1. How do we find that? Well, we go into Excel and use our same NORMDIST. This time our X is negative 1, because that's the upper bound we're interested in. The mean of course is still 0, the standard deviation is 1, and cumulative is TRUE. When we evaluate that in Excel we have a result of 0.1586, etc. So the probability there in the red is 0.1587, or 15.87% of the total area. We can do that on the TI-83 and we'll get the same thing: normalcdf, then we have our lower bound of negative infinity, our upper bound, which is what we're interested in, is negative 1, a mean of 0, and a standard deviation of 1. Lower bound, upper bound, mean, standard deviation. We evaluate that in the calculator and again we get 0.1586, etc. So using both we got the same answer: the probability up to a Z of negative 1 is 0.1587.

Now, if we do that for all of the Z's on our graph here, and again I'm stopping at 3, we get all of our cumulative probabilities. If we start at a Z of negative 3, the probability from negative infinity up to that point is 0.0013. If we go on up to negative 2 we have 0.0228, and again this is cumulative. We go to negative 1, which we just found: 0.1587. Then we go to 0; we know that's the halfway point, so obviously it's 0.5. Then we move to the other side of the mean. At 1 it's 0.8413, so it's all the area up to 0.5 and then the next section of the curve. Then we keep going: we have 0.9772 for 2 standard deviations above, and then 0.9987 for 3 standard deviations above. So this is cumulative; this is from negative infinity up to each vertical section in our distribution.

Now, what if we want to find the area of the interval between each section? We found the cumulative up to each
point; now we want to find the probability between each section. Well, that's just simple subtraction. If we want to find the area in each vertical section here, we just subtract the lower cumulative value from the higher one. For example, if we wanted to find the probability in this area between negative 1 and 0, we just take 0.5 minus 0.1587, and that gives us the probability in there. So the probability of Z between negative 1 and 0 is 0.3413. It's very easy to find the probability of each section in our distribution, the intervals between the deviations.

That allows us to do several important things. You'll see this time and time again in your class: what is the probability that Z is between minus one standard deviation and positive one standard deviation? So this middle section here. Well, that is 0.6826. So a Z from negative 1 to positive 1 is 0.6826. And if you remember, at the beginning of the video I showed you the sort of front page that's common to a lot of stats books. Well, this is where that 68.2% comes from; we just did it. Now how about two standard deviations? So negative 2 on the low end, positive 2 on the high end. When we add all those up, or do some subtraction, we have 0.9544. So the probability of Z between negative 2 and 2 is 0.9544, or 95.4%. This is a very important one that's often used, because we often use 95% confidence in problems. So instead of 2 and negative 2, we often use negative 1.96 and positive 1.96, which gives us almost exactly 95%. But between plus or minus 2 deviations is 0.9544. And finally, plus or minus 3 deviations on either side is 0.9974. So that captures almost all the probability in our normal curve: plus or minus three standard deviations. Now notice it does not add up to 1. There is some residual probability beyond 3, okay? It does exist, and actually it exists forever in each direction, but for almost all the things we're going to do, we're interested in
this plus or minus three standard deviation area.

Okay, so have you seen these shapes before? Well, by now you definitely have. So what we did is, just using some basic formulas or some basic functions in X
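
As a rough cross-check of the numbers in the captions above, the same cumulative and interval probabilities can be sketched in Python instead of Excel or the TI-83. This is not from the video: it assumes Python 3.8 or later and the standard library's `statistics.NormalDist`, and the helper names `cumulative` and `between` are made up for illustration.

```python
from statistics import NormalDist

# Standard normal (Z) distribution: mean 0, standard deviation 1.
# (Assumption: NormalDist stands in for Excel's NORMDIST / the TI-83's normalcdf.)
z = NormalDist(mu=0.0, sigma=1.0)


def cumulative(upper):
    """Area under the curve from negative infinity up to `upper`."""
    return z.cdf(upper)


def between(lower, upper):
    """Area between two Z values: higher cumulative area minus the lower one."""
    return z.cdf(upper) - z.cdf(lower)


if __name__ == "__main__":
    # Cumulative probability up to each whole-number Z from -3 to 3.
    for score in range(-3, 4):
        print(f"P(Z <= {score:+d}) = {cumulative(score):.4f}")

    # Probability within plus or minus 1, 2, and 3 standard deviations.
    for k in (1, 2, 3):
        print(f"P({-k} <= Z <= {k}) = {between(-k, k):.4f}")

    # The interval commonly used for 95% confidence.
    print(f"P(-1.96 <= Z <= 1.96) = {between(-1.96, 1.96):.4f}")
```

Because this subtracts unrounded values, a few results differ in the last digit from the table-based figures in the captions: 0.6827 rather than 0.6826, 0.9545 rather than 0.9544, and 0.9973 rather than 0.9974.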
Info
Channel: Brandon Foltz
Views: 262,933
Rating: 4.960053 out of 5
Keywords: brandon c foltz, brandon c. foltz, brandon foltz statistics 101, brandon foltz, statistics 101 brandon foltz, normal distribution, the normal distribution statistics, normal distribution statistics, statistics normal distribution, statistics 101, z distribution, stats 101, linear regression, logistic regression, multiple regression, normal curve, bell curve, normal distribution probability, gaussian distribution, statistics for data science, The normal distribution
Id: 772_n15Ke9Q
Length: 24min 27sec (1467 seconds)
Published: Thu Dec 27 2012