Statistics 101: Sample Mean Proximity to Population Mean

Captions
Hello, thank you for watching, and welcome to the next video in my series on basic statistics. Now, a few things before we get started. Number one: if you're watching this video because you are struggling in a class right now, I want you to stay positive and keep your head up. If you're watching this, it means you've accomplished quite a bit already. You're very smart and talented, and you may have just hit a temporary rough patch. I know that with the right amount of hard work, practice, and patience, you can get through it. I have faith in you, many other people around you have faith in you, so, so should you. Number two: please feel free to follow me here on YouTube, on Twitter, or on LinkedIn; that way, when I upload a new video, you know about it. On the topic of the video, if you like it, please give it a thumbs up, share it with classmates or colleagues, or put it on a playlist, because that does encourage me to keep making them. On the flip side, if you think there is something I can do better, please leave a constructive comment below the video, and I will try to take those ideas into account when I make new ones. And finally, just keep in mind that these videos are meant for individuals who are relatively new or brand-new to stats, so I will be going over the basic concepts, and I will be doing so in a slow, deliberate manner. Not only do I want you to understand what's going on, but also why. All that being said, let's go ahead and get started.

So this video is the next in our series on inferential statistics. Again, if you look at the beginning of that word, inferential, it has the word infer. When we infer something, we're making a conclusion or a claim about something without necessarily having all the information we'd really like to have to make it. Inferential statistics is basically the same thing in the area of stats. In the previous videos we talked about things like point estimators. We often do not know the population mean, but we use the
sample mean to estimate it; that is an estimator. We often do not know the population standard deviation, but we do have the sample standard deviation, and we can use that as a point estimator. In the video right before this one, we talked about the standard error of the mean, and the standard error of the mean is the standard deviation of the sampling distribution: we take many samples and find their means, then we put those means into a distribution, and the standard error of the mean is the standard deviation of that distribution. We can use that concept to take a sample from a population and, depending on the sample size and the population standard deviation, figure out the probability that the sample mean is close to the actual population mean. That is what this video is about. When we use these inferential point estimators, they have to be good, so the idea is to take the sample mean and figure out just how good an approximation it is to the unknown population mean. I know that's a lot before we get started, but it's going to lay the groundwork for what we're doing.

I'm not going to go through the whole problem we were using in the previous couple of videos; suffice it to say that there's an asphalt company that does quality control measurements on its product, and they've determined that the population mean is 3,200 and the population standard deviation is 150. They're measuring the viscosity, or flow, of their asphalt, their road paving product. Now, I want to point out that we seldom know these two things; if we knew them, why would we be trying to estimate them? The reason we're doing it this way first is that it's easier to learn these concepts if we know the population mean, mu, on the left, and the population standard deviation, sigma, 150, there on the right. Eventually we'll get to the point where we do not know these things and we will have to
estimate them based on the sample, but here is the information we're working with, and we will use it to develop this concept.

What we talked about in the previous video is that we took several samples from the batch of asphalt this company is doing the quality control on. We took many samples, took the means of those samples, and placed those means into their own distribution, and it looks something like this. The question I ask you now is: is there anything about this shape that looks familiar? I think this one represents nine samples, but if we kept taking more and more samples, maybe a thousand samples over a long period of time, and put their means into a distribution, it would look very much like the normal curve. That's one of the precepts of one version of the central limit theorem: if we take many samples from a population of any shape and make a distribution of those sample means, that distribution will be approximately normal, and closer to normal the larger each sample is. So again, this is just a distribution of many sample means from our asphalt data.

One question we were asking ourselves last time was: what is the standard deviation of this sampling distribution? Remember, this distribution is not about the original overall batch of asphalt; it is a distribution of sample means, and we want to know the standard deviation of this distribution. That is called the standard error of the mean. The standard error of the mean is just a different name for the standard deviation of the sampling distribution, and it has this formula: sigma sub x-bar, the standard deviation of the sampling distribution, equals sigma, the population standard deviation, divided by the square root of the sample size, n (σ_x̄ = σ / √n). I went ahead and labeled those there. Now again, we don't always have, and actually rarely have, sigma; we usually have to estimate it. But in this problem we have it to work with. We will do examples in future videos where we do not know sigma
and have to estimate it, but for this case we do, and we'll look at that in another video.

We talked briefly about the influence of sample size in the last video, and I'll just go through it real quickly here. If we had a sample size of 15, the standard error of the mean for this problem would be 38.7, because the population standard deviation, 150, is divided by the square root of the sample size, in this case the square root of 15, and that's 38.7. Now what if we up the sample size to 135? Look what happens to the standard error: now it's 12.9. Let's up it again: how about a sample size of 500? Now the standard error of the mean is 6.71. So as the sample size increases, what happens to the standard error of the mean? It decreases. A better way to think about this is that it narrows, because remember, the standard error of the mean is a standard deviation, so when that standard deviation decreases, the distribution literally narrows, or squishes inward around the mean. Sample size is important because increasing it, up to a certain point, decreases the standard error; it decreases the standard deviation of the sampling distribution. If we put these in normal-curve form, we can see that the curve for a sample size of 15, the green curve, is lower in the middle and wider at the ends; the curve for 135 is the orange one in the middle; and the curve for a sample size of 500 is much taller and narrower, pulled in towards the mean in the middle. That's the idea of the standard deviation decreasing as the sample size goes up, and physically I want you to think about the sampling distribution moving inward towards that mean. So a larger sample size decreases, or narrows, the standard error, so the values of the sampling distribution will have less variation and therefore, and this is the important
part, will be closer to the population mean. In this case we know the population mean, but in most cases we do not, so the larger sample size helps us reduce the error and get closer to what the population mean actually is. A larger sample size, up to a point, generates a better approximation of the underlying population because it minimizes the sampling error, and that's the basic concept of the relationship between the standard error of the mean and sample size.

Much of that was what we did in the previous video, changed a little bit for this topic, but now let's get to the heart of what this video is about: the sample mean's proximity to the population mean. The question we're really asking in this video is: how close is the sample mean to the population mean? The important thing to keep in mind, as I said before, is that the standard error of the mean is really a standard deviation; it's just the standard deviation of the sampling distribution. Again, we take many samples, put those sample means into their own distribution, and the standard error of the mean is the standard deviation of that distribution. Since the standard error of the mean is a standard deviation like any other, we can use that fact to draw a normal distribution using z-scores. I'll mention this later, but it's very important to point out that we can only use z-scores if we know the population standard deviation, which in this case we do. If we do not know the population standard deviation, then we use t-scores, or the t distribution, which I'm sure you've heard of, but in this video that doesn't apply: we know the population standard deviation is 150, so we're going to be using z-scores. Now, the standard error of the mean, of course, is
influenced by sample size. They are inversely related (strictly, the standard error is inversely proportional to the square root of the sample size), and we'll talk about a little caveat to that on a later slide, but basically, as sample size goes up, the standard error of the mean goes down.

So let's actually take a look at some curves and the data we were working with earlier. Here is our standard normal distribution: our mean is in the middle, and we have z-scores of 1, 2, 3, and 4 and -1, -2, -3, and -4. This is the sampling distribution of the mean. 3,200 is what we are told the population mean is, so we'll put that down there. Believe it or not, for this it's not that important, but I'm putting it there to show you the center of this distribution. In all of our cases, the population standard deviation, sigma, there at the top is 150; that will not change as we compute these standard errors of the mean, because it's part of the population distribution. The sample size, n, will be changing, so we're going to look at what happens to this distribution as the sample size changes. In this case n is 15, and when we work that out, we find that the standard error of the mean for a sample size of 15 equals 38.7. So the standard error depends on the sample size, while the population standard deviation stays the same there at the top. Because this is a normal curve and we're using the z distribution, we can actually fill in the values at the z-scores: if the standard error of the mean is 38.7, then down here at the bottom we just add 38.7 to our middle point, 3,200. You can see that one standard deviation above the mean is 3,200 plus 38.7, which is 3,238.7. Then we add 38.7 to that to get two standard deviations above, and add 38.7 again to get three above, and we do the same thing on the left side, except we're subtracting. Again, since the standard error of the mean is a standard deviation, we can treat it just like any other standard deviation to get our values there on the x-axis.

We do the same thing for our other samples. Here we have a sample size of 135. I'll abbreviate the standard error of the mean as SEM; for a sample size of 135 it's now 12.9. Again, we put our numbers down on the bottom by adding standard errors to the right and subtracting them to the left. Notice that the extremes, plus or minus three standard deviations, now span a smaller range, and that's important; we're going to talk about it later. The drawn shape of the distribution isn't really changing; it's the values along the x-axis that change. So even though this curve doesn't physically look narrower, it actually is narrowing, because the numbers along the bottom are coming in closer to the middle. Now we have a sample size of 500, and the SEM here is 6.71, much smaller. Again we put our numbers down at the bottom by adding and subtracting 6.71 on either side (I rounded it to 6.7, which is fine), and you can see that the numbers are really close in to the mean because the standard error is so small. See how this works? Each sample size has a distribution based on a different standard error of the mean. We're going to do all kinds of things with these curves in a second; I just wanted to show you how the numbers on the bottom relate to the SEM at the top right. Now, if I do a chart to kind of
summarize everything, let's take a look at this. On the left-hand side we have our sample sizes, n, of 15, 135, and 500. The next column is the standard error of the mean we just figured out: 38.7 and so forth. On the right-hand side of this chart, I just transferred over the values at each standard deviation, or z-score. For a sample size of 15 we had a standard error of the mean of 38.7, and when we did the addition and subtraction we came up with these numbers; then we did the same thing for the sample size of 135 and the sample size of 500. Take a quick look at minus three standard deviations from the mean: for a sample size of 15 it was 3,083.9; for the larger sample size of 135 it was 3,161.3. Is it moving closer to the mean or further away? It's moving closer, because 3,161.3 is closer to 3,200 than 3,083.9 is. Now look at the one below that: 3,179.9. That's even closer to the mean, because we had a larger sample size. So you can see how this works: as the sample size goes up, the distribution actually comes inward towards the mean, and that's the point. We can think of these as the mean, 3,200, plus or minus three standard errors; that's exactly what we did in the previous slides and in this chart, so don't let those symbols freak you out; we're just figuring out the range at the end of this table. For the sample size of 15, the distance from the mean out to three standard deviations is 116.1: three standard deviations above the mean for a sample size of 15 is 3,316.1, and if you subtract 3,200 from that you get 116.1. So for a sample size of 15 we have the mean plus or minus 116.1; that's the three standard
deviation range. For a sample size of 135, that goes down to plus or minus 38.7, so again the distribution is narrowing in towards the mean, and finally, for a sample size of 500, the plus or minus is 20.1. The mean is 3,200; three standard deviations above that for the sample size of 500 is 3,220.1; subtract those and we have a range of 20.1. So this distribution is narrowing in towards the mean, and you can actually see that through the values at the z-scores.

Here are the curves again for our sample sizes of 15, 135, and 500. These curves are approximate, so they're not exactly precise, but they give you the idea. For the sample size of 15 we had plus or minus 116.1, so that curve is wider and extends further away from the mean. For the sample size of 135 it was plus or minus 38.7, so you can see how the orange distribution is closer in to the mean, because we went from plus or minus 116.1 to plus or minus 38.7. And for the sample size of 500 it's even closer in, at plus or minus 20.1. So each curve gets closer and closer to the mean. In the normal curves we looked at before, the numbers along the bottom were what was changing; in this graphic, we're moving the curve in, and the numbers along the bottom would remain constant. Don't let that confuse you; they both represent the same thing. In one graph we change the numbers along the bottom and the curve stays the same; in this graph the numbers on the bottom stay the same and we move the curve in.

Another question we can ask, and this is really important because it gets at one of the most important things in inferential statistics, confidence intervals, is: for each of our sample sizes, what is the probability that the sample mean is within 15 of the population mean? So what is
the probability that the sample mean is within 15 of the population mean, which is 3,200? We can write it like this: what is the probability that the sample mean, x-bar, is between 3,185 and 3,215? That's 15 below and 15 above the population mean.

Let's go ahead and find our first probability. On the upper left-hand side we have a population standard deviation of 150, which will be the same for all of these, and in this case our sample size is 15. At the top right we have our standard error of the mean for a sample size of 15, which was 38.7. The curve here is the same one we used on the previous slide for the sample size of 15, so in the middle we have 3,200, our population mean, and then the standard deviations out from that. Remember, the question is: what is the probability that the sample mean for a sample size of 15 is between 3,185 and 3,215? I'm going to assume you know how to find the probabilities of regions of a normal curve; I covered that in great depth in previous videos, so if you're still not sure how to do it, please look at my previous videos or at your class notes. I do it very simply on a TI-83 calculator under the DISTR menu. The question we're asking ourselves is: what is the probability that our sample mean is within the region defined here on the right? We have those numbers in our probability statement on the right, and on the bottom we have the scale of this distribution. Remember, because the sample size is small, our distribution is going to be relatively wide. So let's plot 3,185 and 3,215 onto this normal distribution. If you go left a little from 3,200, the next mark is 3,161.3, so 3,185 is about right there. I eyeballed it, but it's not important to be exact. Now 3,215: it's about right there. So again, all I
did here was put the end points of our probability statement onto the graph at the bottom, in the brown oval shapes, and draw lines up to show what region that is. The blue arrow in the middle represents the region we're interested in: we want the probability, or the area under the curve, for the region marked by the blue arrow. When we compute it, we get a probability of 0.302. The way to interpret this, in your class or at your place of work, is: the probability that the sample mean for a sample size of 15 is within plus or minus 15 of the population mean is 0.302, a little over 30%. That's not a very high probability, and the question to ask yourself is why. Why is the probability that our sample mean for a sample size of 15 lands within 15 of the population mean only about 30%?

Now let's compare it to the sample size of 135. Look at where our endpoints are: when we graph 3,185 and 3,215 from our probability statement onto the distribution for a sample size of 135, they're beyond one standard deviation from the mean. When we figure out that probability, it's 0.755. So the probability that the sample mean for a sample size of 135 is within plus or minus 15 of the population mean is 0.755. Again, why is this the case? We have lower error: when we increase the sample size, we decrease our standard error of the mean, and that smaller standard deviation squishes the distribution in towards the mean. Because the distribution is squished in towards the mean, when we graph the 3,185 and 3,215 points they appear further out, but remember, they really haven't changed. What's
changed is that the underlying distribution has come inward. We're again interested in this region in the middle, so by increasing the sample size, and thus lowering the error, we increase the probability that our sample mean is within 15 of the population mean.

Now let's look at our sample size of 500. We plot our endpoints again, 3,185 and 3,215, and look how far out they are from 3,200. Why is that? It's not that the points have moved outward; it's that the distribution has moved inward. The points are still the same, still 3,185 and 3,215, but the distribution has squished inward towards the mean. Remember the graphic with the three different colored normal curves; that's what's happening here. On this graph it looks like the points are moving outward, but they're not; the points are stationary and the curve is moving inward towards the mean. The probability for all of this area is 0.975. That's a really high probability: the probability that the sample mean for a sample size of 500 is within 15 of the population mean is 0.975, or 97.5%, which reflects a very low error. Why is that? Because our sample size was large. The large sample size decreased the standard error of the mean, which is another way of saying it decreased the standard deviation of this distribution, and that pushed the curve in closer to the middle, making the points appear further out, although, again, they are stationary. It all has to do with the sample size and how it affects the standard error of the mean. So the probability that our sample mean falls somewhere in that part of the curve is 0.975.

The last thing, which I promised in my previous video, so I'm going to go ahead and show it to
you, is sample size diminishing returns. Some people, when they get into statistics and start learning about sampling and sample size, tend to think that a larger sample size must automatically be better, so if we can get a sample size of a hundred thousand, that's somehow better. The reality is that's not necessarily true, and here's why. Remember, the expression for the standard error of the mean is this: the standard error of the mean equals the population standard deviation divided by the square root of the sample size. In this case the population standard deviation was given to us; it was 150, as we've seen several times. We can express this as a function and then graph it. If we treat the sample size, n, as a variable, we can write the expression as f(x) = 150 / √x, where x is a variable we can graph like any other graph you've done on a graphing calculator or anywhere else. That way we can look at the standard error of the mean across the entire range of sample sizes, and that's the cool thing. When we do that, this is what we get. Look at this graph for a minute. Remember, this is for a population standard deviation of 150, so this graph only really applies to that population standard deviation. Look at the shape: what is it telling us? Let's look at some of our points. Remember our first sample size of 15: the sample size is along the bottom, so that's our x value. If we go over to 15, go up to where the red dot is, and then go over to the left, the y-axis gives our standard error of the mean, which was 38.7, and that's where the red dot is relative to the y-axis. What about our other one? We had a sample size of 135, so if we graph that and put a red dot
where the sample size is 135 and go over to the y-axis, it's about 12.9. So this function represents the standard error of the mean for any sample size. The important thing to note, and this is the idea of diminishing returns, is that as the sample size on the x-axis increases, the curve flattens out. The standard error of the mean is very high for small sample sizes, which is why the curve is very high at the beginning of the x-axis. As the sample size increases, the standard error of the mean decreases pretty rapidly at first, but after a while the curve really flattens out. That means each additional increase in sample size reduces the standard error less and less. Once we increase the sample size past, in this case, maybe 125 or so (it's not an exact science), the standard error of the mean doesn't really decrease much more, because the curve flattens out, and that's the idea of sample size diminishing returns. So when you're doing your research or whatever else, large sample sizes are good because they reduce our error, but the idea isn't necessarily to get the largest sample size you possibly can. Sample size also depends on how much you cut up and slice your data into subgroups and things like that, but an infinitely large sample is not the goal, because you have diminishing returns relative to your standard error.

So, a summary. Remember, the standard error of the mean is merely the standard deviation of the sampling distribution for a given sample size, and the standard error of the mean is exactly the same for every sample of that size. The standard error of the mean for a sample size of 15 in this problem is the same for any other sample of 15 in this asphalt problem. So the standard error depends only on the sample size and the population standard
deviation you are given. We can use the standard error of the mean to find the probability that the sample mean is within a given range of the population mean; that's what we did in the last three normal curve graphs. A larger sample size reduces the standard error of the mean up to a point; then diminishing returns kick in, and we'll talk about that break point in a later video. These examples do assume we know the population standard deviation, sigma. Most often we do not, and therefore we have to estimate it; we'll talk about that in another video as well. Because in this case we know the population standard deviation, sigma, it's okay for us to use the standard normal z distribution; if we didn't know it, we would use the t distribution, and again we'll talk about that in a future video. Now we're ready to begin constructing what are called confidence intervals, which are a mainstay and a very important foundation of any introductory stats class.

That wraps up this video on the sample mean's proximity to the population mean. Again, this comes from the idea that we often do not know the parameters of the population, so we have to use samples to approximate them, but of course our samples will have some error built in, because they're not perfect, and we discussed that at length in the previous video. We can still make quantitative evaluations about these samples using the techniques we used here, because every sample size has its own distribution of the sample mean, and that has to do with the standard error of the mean. So we can know, within a certain probability, whether or not our sample mean is likely to be within a certain range of the population mean. When we do confidence intervals this will become even more concrete, but this is the basic idea leading up to them.

So, a few things and then we'll wrap up. If you're watching this video because you are struggling in a class right now, I want you
to stay positive and keep your head up. You're smart and talented. I have faith in you, many other people around you have faith in you, so, so should you. Feel free to follow me here on YouTube, on Twitter, or on LinkedIn; that way, when I upload a new video, you'll know about it. If you liked the video, please give it a thumbs up, share it with classmates or colleagues, or put it on a playlist; that does encourage me and other screencasters here on YouTube to keep making our videos. If you think there is something I can do better, please leave a constructive comment below the video, and I will try to take those ideas into account when I make new videos. And finally, just keep in mind that the fact that you're on here trying to learn, trying to be a better student, trying to improve yourself as an employee, or what have you, is what really matters. I firmly believe that if you have the right learning process in place, the results will take care of themselves. So thank you very much for watching. I wish you the best of luck in your studies and your work, and I look forward to seeing you again next time.
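The arithmetic walked through in this video (standard errors, the plus-or-minus three standard error ranges, the probability of landing within 15 of the mean, and diminishing returns) can be reproduced with a short script. This is a minimal sketch in Python, not anything shown in the video; it assumes, as the video does, that the population standard deviation is known (sigma = 150, mu = 3,200) and that the sampling distribution of the mean is normal. It uses the standard library's `statistics.NormalDist` in place of the TI-83 calculation, and the function names are illustrative only.

```python
import math
from statistics import NormalDist

# Values stated in the video's asphalt viscosity example (sigma is assumed known).
MU = 3200.0     # population mean
SIGMA = 150.0   # population standard deviation


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def three_se_range(mu: float, se: float) -> tuple:
    """Values three standard errors below and above the mean."""
    return mu - 3 * se, mu + 3 * se


def prob_within(mu: float, sigma: float, n: int, margin: float) -> float:
    """P(mu - margin <= x_bar <= mu + margin), treating x_bar as Normal(mu, SE)."""
    sampling_dist = NormalDist(mu, standard_error(sigma, n))
    return sampling_dist.cdf(mu + margin) - sampling_dist.cdf(mu - margin)


def se_reduction(sigma: float, n: int, extra: int) -> float:
    """How much the SE shrinks when `extra` observations are added to a sample of size n."""
    return standard_error(sigma, n) - standard_error(sigma, n + extra)


if __name__ == "__main__":
    for n in (15, 135, 500):
        se = standard_error(SIGMA, n)
        low, high = three_se_range(MU, se)
        p = prob_within(MU, SIGMA, n, margin=15)
        print(f"n={n:>3}  SE={se:6.2f}  mu +/- 3 SE = ({low:.1f}, {high:.1f})  "
              f"P(within 15 of mu) = {p:.3f}")

    # Diminishing returns: each extra 100 observations buys a smaller SE reduction.
    for n in (100, 200, 300, 400, 500):
        print(f"n={n} -> {n + 100}: SE falls by {se_reduction(SIGMA, n, 100):.2f}")
```

The first loop prints the standard error, the three standard error range, and the probability for n = 15, 135, and 500; the last digit may differ slightly from the figures in the video, which rounds the standard error (for example to 38.7) before computing. The second loop shows diminishing returns numerically: because the standard error falls with the square root of n, each additional 100 observations shrinks it by less than the previous 100 did.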
Info
Channel: Brandon Foltz
Views: 60,602
Rating: 4.9663582 out of 5
Keywords: central limit theorem brandon foltz, statistics 101 central limit theorem, sample mean and population mean, population mean and sample mean, population mean, population mean statistics, sample mean vs population mean, sample mean, sample mean statistics, mean, interval estimation statistics, sample size, brandon foltz, statistics 101, anova, brandon c foltz, estimation statistics, sample, confidence interval, population, interval estimation, statistics, standard error
Id: QbGRTWLjp7c
Length: 38min 59sec (2339 seconds)
Published: Fri Feb 01 2013