03 - The Normal Probability Distribution

Video Statistics and Information

Captions
Hello, welcome to this lesson of Mastering Statistics. I hope you're enjoying the course so far; I'm enjoying teaching it and enjoying making these concepts a little bit easier to understand for you. Here we're going to talk about something called the normal probability distribution, or the normal distribution. I think I hinted in the previous section, or maybe it was in the first section, that we were going to end up talking about something called the normal distribution. I cannot stress enough how important the normal distribution is to statistics. There are some things that are kind of cute to learn and understand, and then there are some things that are super important, and one of them is the normal probability distribution. Almost all the time, when they take these presidential surveys of small samples and then try to extrapolate from them, there's some sort of normal distribution involved. And usually when you're trying to look at the way in which things are distributed in their length or their weight, the IQ distribution of people in the country, grades, I mean there are so many examples, the normal probability distribution pops up everywhere. It's kind of ingrained into the fabric of the world we live in; that's the way you need to think about it. So I'm going to outline what it is with a practical example and draw a picture. I'm not going to draw much on the board; it's mostly gonna be a lecture. I'll draw a picture to show you what it looks like, and then we'll compare and contrast it to what we learned in the last section, where we also got introduced to these probability distributions. So let's say you go to the supermarket and you pick a watermelon. Now there's a whole giant shelf of watermelons, or maybe there's a giant fenced-off area where they've got like 200 watermelons, and you randomly pick one. You don't really look at it; you just close your eyes and you grab one. All right, now you know that there
are watermelons of all different shapes and sizes, but you also know that watermelons can't be as big as a truck, right? And watermelons, unless they're baby watermelons, are not gonna be as small as a ping-pong ball. I'm not talking about babies; I'm talking about adult, mature watermelons that are in the grocery store ready for you to pick, so what we call mature watermelons. Let's say you pick one of these guys at random, and then you get a tape measure and you measure its length from end to end along its longest direction, and you record that number. And let's say you go back and do that again and get a second watermelon, and then you get another watermelon, all randomly. Each time you do it randomly you get a result, which is basically its length. So you could say the random variable here that we care about is the length of the watermelon, because the experiment I'm doing every time is randomly grabbing a watermelon and measuring its length. The outcome of the experiment, which we're calling a random variable X, is how long that thing is. Now we know that we're almost never gonna get the exact same length; even if they look like they're pretty close, they're probably gonna differ by a fraction of an inch or some very small amount. But we also know that if we look at hundreds or thousands or maybe millions of watermelons, there will be some trends. We're gonna figure out that most of the watermelons are gonna have a length around a certain average value, right? And we know that some watermelons are gonna be a little bit bigger than that and some watermelons are gonna be a little bit smaller than that, but still there'll be an average value, and most of the watermelons will cluster around that. Now you may have the crazy watermelon that's giant compared to the average, and you may also have the crazy watermelon that's a puny little watermelon compared to the average, but still there'll be an average value, much like grades in a
room: there's an average value of the grades. Some kids are gonna get hundreds or ninety-fives, some kids are gonna get sixty-sixes, but there will be an average value. But getting back to our watermelons, if we want to represent the spread of these watermelons, or I should say the probability of drawing a watermelon with a certain length, we can draw it in terms of what we call a normal distribution. We say that the lengths of these watermelons are normally distributed. So in a statistics problem, if you see that something is normally distributed, you'll automatically know it's gonna have the shape that I'm gonna draw on the board, so let me show you that right now. This is called the normal distribution, and it has such a powerful name, normal, because it's the one that's so common everywhere. In fact, it's really a normal probability distribution. So let's go ahead and draw that. Here we have an axis, all right, and I will draw a vertical guy here. I'll do my best to draw, but I'm not a perfect artist. Now for our watermelons, we're going to say that if we took a sample of a million watermelons, a million perfectly ripe, farm-fresh watermelons, the average of the length is 20 inches. So we'll put 20 there. Now obviously some watermelons are gonna be bigger than that, which is gonna be this direction, so I'll say this is the length in inches. Some watermelons will be longer than that and some will be shorter than that, so we might have some that are 25 inches and some that are 30 inches, and we also might have some that are 15 inches and some that are 10 inches, and we may have some really dwarf watermelons that are even smaller. But most of them are gonna be right around the average; that's what the average value is. So I'll put here that this is the average, or as we call it, the mean. We've been talking about the average value; in statistics we
call that the mean of the population of watermelons in the world. Okay, so then we want to see what this looks like. The normal probability distribution looks something like this. It goes like this, and notice that as it comes down it does not touch the axis; it kind of goes like an S, like this. It gets closer and closer and closer to the axis but it never really touches it. I'll do my best to be symmetric, but I'm probably not gonna do a great job. So this goes down like this, and this goes down like this. That's actually not a bad job. What this is supposed to show you is a perfectly symmetrical distribution; I kind of screwed it up a little bit up there, but you get the basic idea. So what you are looking at whenever you look at a normal distribution like this is, here's the mean value of the length of the watermelon, and you can think of this axis right here as being sort of like the probability. So you can look at this and say, hey, what's the probability of getting a 20? Well, it seems to be a maximum. So by looking at that normal distribution I can see that if I randomly draw a watermelon, I'm most likely gonna get somewhere around 20. Now as I get a little bit farther away from 20, I've still got a pretty high probability, but as I get farther and farther away from 20 the probability falls off as this bell curve. You may have heard of a bell curve; that's what it is. So when you see normal distribution, or normally distributed, or bell-shaped, or bell curve, it's all referring to the same thing; it's all referring to this graph right here, basically. So you can see a few things about this. This curve represents, for lack of a better word, the probability of selecting these different values. The highest probability is always going to happen around the mean, because by definition, if it's the mean value, then you know a large part of your population is gonna be right
around that value of the mean. So the probability of getting a watermelon right around this is pretty high; anywhere right around the mean is pretty high. As you get farther from the mean the probability gets lower and lower. Eventually, if you get far enough away, this graph goes so close to the axis that the probability effectively becomes very close to zero. If I get 55 or 85 inches, way down off the chart, the probability is gonna get super close to zero. All right, so that's the deal with statistics. If I get really far this way, less than 10 inches let's say, the probability is gonna get really close to zero. So there are a couple of things I want to make sure I hit, as bullet points here, as we talk about this. The first one is obvious: this is called the normal probability distribution. Number two, this is called continuous. I'm gonna write that down: this is a continuous distribution. Notice the difference between this and what we had on the board in the last lesson. In the last lesson the experiment was totally different: we threw coins, and we only had certain kinds of results that could happen, zero heads, one head, two heads, or three heads. There is no in-between. But with watermelons, we all know it's possible that they could be ten or fifteen or twenty or twenty-five or thirty-five inches, but they could also be any length in between. The possibility is there for a watermelon to have any length, so it's not discrete anymore; it's called continuous. And in real life almost everything is a continuous probability distribution that's going to look like this kind of curve, and that's why we study it so much in statistics. All right, number three: the random variable X is the length of the watermelon. So that's what we call the random variable X. We already kind of said that: we do the experiment, pull the watermelon, and the random variable is what we're talking about here. The graph, the normal curve that we have on here, is symmetric about this axis. I
haven't drawn it perfectly symmetric, but if you were to fold a sheet of paper and draw the other side, it should look absolutely symmetric on both sides. And it's bell-shaped; we talked about that. Number five here is extremely important: this graph is completely defined by the mean of the population of watermelons and the standard deviation of the lengths of those watermelons. So I'm going to write that down here: the normal distribution is completely defined by the mean, which we talked a lot about in Volume One, if you remember, and which here means the average length of these watermelons, and by the standard deviation of the length of these watermelons, which we represent as lowercase sigma, like that. What I'm trying to say is that all of the normal distributions look bell-shaped, but some of them are gonna be fatter, some of them are gonna be taller and skinnier, and so on. Also, the reason this one is centered around 20 is because the average value, we're just saying in this particular example, is around 20. But I may have cantaloupes or something else, and the average value may not be 20 for anything else. If we were looking at cucumbers, the average length of cucumbers might be more like 14 inches or something. So the shape and where it's placed are going to be completely defined by the mean, which is going to show you where the peak is, and the standard deviation, which is going to show you how fat this curve is. We're going to get to a little bit more about how that works in the next section, but I want to just tell you that you pick a value of the mean and a value of the standard deviation and you completely lock down what your bell curve, or your normal distribution, looks like. A very important thing I want to show you here for the next guy, and this should blow your mind a little bit: the total area under the curve, meaning under this normal distribution curve, is equal to, if you had to guess, what do you think it
would be equal to? It would be equal to one. If you think about it, we talked about the discrete probability distributions before, and we said we covered all outcomes, so when you add all the probabilities up you should get one, because there's only a certain number of ways in which that experiment can end or unfold. So by showing that they add up to one, we've covered all outcomes. Well, this is representing all possible outcomes of pulling watermelons. Sometimes we'll get 20; sometimes we'll get 15, which is less likely; sometimes we'll get 30, which is less likely. We're covering the probability distribution of all possible lengths. So if the height of this curve is sort of representing the relative probability of getting a watermelon, then finding the area under the curve is like adding up all of the little heights. If you think about it, the height under this curve plus the next one plus the next one plus the next one, you're getting the area under this curve. We're getting into a little bit of calculus, but the idea is that the area under this curve is like adding up the vertical length touching this graph, and the one next to it, and the one next to it, and the one next to it. You add all those together, all those probabilities, and you're getting the area under the curve. So in probability or in statistics, when you have any kind of probability distribution that's continuous like this, the area under the curve is always, always, always going to add up to 1, because we're trying to cover all possible outcomes, all possible lengths. Everything we can get from watermelon land is going to sum up and be covered by this curve, so it has to be equal to 1, because this represents the probability, and when we add everything up it has to be equal to 1. Next thing I want to show you: this curve, I mentioned it before, does not touch the x-axis. It approaches it asymptotically, which means it almost touches but it never
quite gets down to that axis. So that is a probability distribution, the most important one that you'll ever study and the one that governs 90% of everything you'll do in statistics. Now I want to show you something. We've drawn it, we've talked about it, and I think you kind of have an idea of what it's for. We haven't done any problems yet, but by this point you should understand what it represents and what it means, even if you haven't worked with it yet. Now, we said that probability distributions can be represented in table form, like we did before, in graph form, and also in terms of a formula. This graph, just like any graph from algebra, can be written down in terms of a formula, and I want to show you what that is. So this is the formula, or the equation, for the normal distribution, and it's pretty cool. We're not going to use it much, and I'll explain why in a minute, but I want to show it to you. So f of x, in other words the graph here, is equal to 1 over the quantity sigma, which is the standard deviation, times the square root of 2 pi, all times e, that special number, raised to the power of minus the quantity x minus the mean, that's mu, squared, divided by 2 sigma squared. This is an exact representation of what this is a drawing of on the board. All right, so this is the formula for the normal distribution. Depending on what statistics class you're taking, you might study this in detail and use it a lot, or you may not use it at all, but I want to illustrate it for you so that you at least know that this graph here doesn't just come out of nowhere. It's represented by a very special equation, which is this right here. e is that special number, 2.71 and a bunch of decimals after it; it's related to the logarithm on your calculator, if you've studied logarithms in my classes. I guess what I'm trying to tell you is that if you stick this equation into a calculator or a computer and you plot it, you will get a shape that looks like this bell curve,
right? And it's not obvious at first. You can't look at this and just know that. I mean, if you're looking at that and you don't understand it, or if you're like, I have no idea how that looks like a bell curve, that's cool. I don't expect you to look at this and just know that; I'm just telling you that. Now let me show you here: even though it looks very complicated, I want to make sure you understand that 2 is just a number, pi is just a number, and taking the square root of those things is just going to give you a number, so this thing right here is just a number. e is just a number, and you're raising it to a negative power. Inside the exponent, what you have is sigma, which is your standard deviation, and then you have the mean, and then you have x here, and you have the standard deviation down here. So notice I told you the normal distribution is completely defined by the mean and the standard deviation. What I'm trying to tell you is that this represents the normal distribution: if you put a standard deviation in here and here, and put the mean in here, everything else in this equation is just a number. I mean, think about it: if I tell you a standard deviation of two and a mean of five, those are just numbers. So if I knew that, then there's a number, there's a number, there's a number, and there's a number. Once I lock down the mean and the standard deviation, everything in this entire equation is just a number except for x, right? And that's because you're plotting it. So the way you would do that is you would dump this in your calculator. All of this stuff would be numbers except, of course, for x, which is what you're plotting against, and then if you plot that guy, you get a bell curve. So for those of you who have a nice graphing calculator or a computer that you like to use, I actually encourage you to do that. Just take this equation and pick anything you want: pick a standard
deviation of one and pick a mean, which is this one, of one, that's easy, right? Or you can pick a mean of zero if you want. And then what's gonna happen is everything in this equation is going to reduce to a number except for this, and you plot it for all values of x, and what you're going to get is a graph that looks like this. All right, we don't use this very much in statistics, though, or at least not in intro statistics, because in the back of your textbook, whatever book you're using, you have a table of values that tabulates the different values along this bell curve. Notice that I said before that usually you're interested in the area under the curve, and I told you the area under the curve is equal to one if you add up all the area under the curve. Well, it turns out that in later sections we're gonna be very interested in finding the area under this curve, not just for the whole curve; we might want to find the area between two points or whatever. You'll see as we get there that we're going to be very interested in calculating the area under the normal curve; it's going to be how we calculate answers to our problems. If we didn't have any tables, we would have to calculate the area using the actual formula, and the way you have to do that is with calculus. I'm not going to get into that right now, because a lot of you taking statistics have not taken calculus. But for those of you that have, if you remember, about half of calculus is all about learning how to find the area under a curve, right? And there are ways to do that in calculus. So if you were in a more advanced statistics class, or just depending on your professor, you might use calculus methods to calculate the area using this guy. But for everybody else in the world, and in 99% of statistics classes, what you're gonna end up doing is using the table of values in the back of your textbook to find the area under these bell curves. So I'm showing you this equation mostly to kind of open your
eyes and show you that this does come from someplace. I also want to show you that the shape of the curve really is locked down by the values of the mean and the standard deviation. If you change those values you change the shape of the curve, but it always does look bell-shaped. That's the other reason I'm showing you this. I'm also just trying to broaden you and show you that, hey, sometimes if you're using more advanced techniques you might actually plug this in and use some calculus to find the answers. But in almost all cases we don't need to do that, because in the back of your textbook it's tabulated what the areas under these curves really are, and so that's what we're gonna use 99% of the time. So as we move forward, we're not gonna look at this very much at all, but what we are gonna do is use the concept of, hey, we have this normal distribution, it's very important, the area under the curve is very important for finding the probability of outcomes, and then we're going to use the tables in the back of the textbook to find those answers. So this is an introduction to the normal distribution. Follow me on to the next section, where I'll draw a few more pictures and show you how changing the mean and the standard deviation changes the shape of the normal distribution, but no matter what values you choose, it always looks bell-shaped like it does here.
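The captions describe the formula f(x) = 1 / (sigma * sqrt(2 * pi)) * e^(-(x - mu)^2 / (2 * sigma^2)) and suggest plugging it into a computer. The following is a minimal sketch of that idea, not something shown in the video. The mean of 20 inches comes from the lecture; the standard deviation of 3 inches, the function names `normal_pdf` and `area_under_curve`, and the trapezoid-rule approach are all assumptions made here for illustration. It evaluates the curve at a couple of lengths and adds up thin strips under it to check that the total area comes out close to 1.

```python
import math


def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x, for mean mu and standard deviation sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


def area_under_curve(a, b, mu, sigma, strips=20000):
    """Approximate the area between a and b by adding up thin strips (trapezoid rule)."""
    width = (b - a) / strips
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    for i in range(1, strips):
        total += normal_pdf(a + i * width, mu, sigma)
    return total * width


mu = 20.0    # mean watermelon length in inches, as in the lecture
sigma = 3.0  # assumed standard deviation; the lecture does not give one

# "Whole" curve: 10 standard deviations either side leaves negligible tails.
total_area = area_under_curve(mu - 10 * sigma, mu + 10 * sigma, mu, sigma)

# Area between two lengths, the kind of value a textbook table would supply.
area_15_to_25 = area_under_curve(15.0, 25.0, mu, sigma)

print(f"height at the mean (20 in): {normal_pdf(20.0, mu, sigma):.4f}")
print(f"height at 30 in:            {normal_pdf(30.0, mu, sigma):.4f}")
print(f"total area under the curve: {total_area:.6f}")
print(f"area between 15 and 25 in:  {area_15_to_25:.4f}")
```

Changing `mu` moves the peak and changing `sigma` makes the curve fatter or skinnier, but the total area should stay close to 1 either way, which matches the point made in the captions.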
Info
Channel: Math and Science
Views: 367,558
Keywords: normal distribution, probability distribution, normal probability distribution, statistics, statistics normal distribution, z chart, continuous probability distribution, math, probability, normal, distribution, z scores, standard, probabilities, introductory statistics, stats help, stats tutor, ap statistics, intro stats videos, standard deviation, normal distribution example, normal distribution table, normal distribution graph
Id: gI5y3RZe9fk
Length: 20min 26sec (1226 seconds)
Published: Wed Aug 16 2017