Statistics 101: Confidence Intervals for the Variance

Video Statistics and Information

Captions
Hello, thanks for watching, and welcome to the next video in my series on basic statistics. Now, as usual, a few things before we get started. Number one: if you're watching this video because you are struggling in a class right now, I want you to stay positive and keep your head up. If you're watching this, it means you've accomplished quite a bit already. You're very smart and talented, but you may have just hit a temporary rough patch. Now, I know that with the right amount of hard work, practice, and patience, you can work through it. I have faith in you, many other people around you have faith in you, and so should you. Number two: please feel free to follow me here on YouTube, on Twitter, on Google+, or on LinkedIn. That way, when I upload a new video, you know about it, and it's always nice to connect with my viewers online. I feel that life is much too short and the world is much too large for us to miss the chance to connect when we can. Number three: if you like the video, please give it a thumbs up, share it with classmates or colleagues, or put it on a playlist. That does encourage me to keep making them for you. On the flip side, if you think there's something I can do better, please leave a constructive comment below the video, and I will take those ideas into account when I make new ones. And finally, just keep in mind that these videos are meant for individuals who are relatively new to stats, so I'm just going over basic concepts, and I will be doing so in a slow, deliberate manner. Not only do I want you to know what is going on, but also why and how to apply it. So, all that being said, let's go ahead and get started.
So this video is the next in our series about analyzing variance. In our last video we discussed the sampling distribution of the variance. We use the sample variance as the point estimator of the population variance, which is unknown. The sampling distribution of the variance follows the chi-square distribution, not the normal distribution; therefore we examined the characteristics
of the chi-square distribution. Now, this video is about confidence intervals, but not for the mean, which is what we have been doing. Here we will learn how to construct confidence intervals for the variance. So, based on our sample data, we can construct a confidence interval for the variance using the chi-square distribution. Variance plays an important role in fields such as quality assurance and operations management. Ever heard of Six Sigma? Well, you probably have. Well, what is sigma? Remember, sigma is the symbol for the population standard deviation, which of course is the square root of the variance. Now, central to Six Sigma is the monitoring and reduction of process variance. For example, a fuel station pump should dispense 1 liter of fuel when the dial says 1 liter. It's not good enough to average 1 liter; the amount should be almost exactly 1 liter all the time, meaning not only the correct measurement but consistency: a very low variance, near zero.
Now, I do want to be honest with you: this topic is not easy. In fact, all my stats videos will become more complex from this point forward. For this video you will need to be comfortable with algebra. We will have to rearrange a couple of expressions using basic algebra to isolate the term we want to solve for, which is the unknown population variance. Now, we'll go slowly and use a concrete real-world example to guide you along, but you may need to pause the video for a minute to let what you're learning sink in as we go. So let's go ahead and get to work.
So we will start off with a real-world example, and that is in the area of finance, so we'll look at stock returns. When investing in stocks there is usually a trade-off: risk versus reward. In financial terms, risk is often another word for a stock's variance. Some stocks are steady and low-risk but offer lower potential returns, for example GE, or General Electric. Others swing wildly, with higher risk, but offer more potential upside, Apple for example. So let's say we purchased a share of each
stock, GE and Apple, on the first trading morning of 2012. We then held each stock all the way through the year. So think of each stock as a ride in an airplane: which flight was more likely to make you sick to your stomach due to the up-and-down turbulence of the stock? Well, one way to find out is to go ahead and look at their stock charts. So here we have a chart: GE is in the blue and Apple is in the green. In this chart is the percentage change from where we started at the beginning of the year. Now, if you look at GE, it has a lower return, so it kind of hovers around, you know, 5, 10, even up to 20 percent there at one point, but it's a very narrow range. But now look at Apple. It shoots up and then comes back down, and kind of goes up again and down a bit, then up again, then all the way down, pops up, and then goes all the way down: big swings from month to month, usually, over the course of the year. So which appears, just by looking at this chart, to have the greater variance? If each one of these lines were the flight path for an airplane, which one is more likely to make you sick to your stomach because it's going up and down all the time? Well, I would tend to guess that Apple has the greater variance, there in the green, while GE is a little bit more steady throughout the year. But that's just a conjecture from looking at the chart. We're going to learn how to actually find the confidence intervals for the variance of each one of these stocks.
Now remember, when we take many samples of the same size from a normal population and then find those sample variances, those sample variances do not follow the normal curve when placed in their own distribution. They follow the chi-square distribution with n minus 1 degrees of freedom, and the chi-square distribution looks like this. Remember, there is no one chi-square distribution; there will be a different chi-square distribution for each degrees of freedom. So here in the top, that looks like a degrees of freedom of around 4, somewhere around there. Then in the bottom
you can see we have several curves on the same graph. So the chi-square distribution's shape depends on the sample size, which determines the degrees of freedom. So here's our curve; it's a generic one. Remember, there is no one chi-square, like I said before; there is one for each degrees of freedom. It's similar to the t distribution in that respect. Now, the area, or the probability, under the curve does add up to 1. The curve is asymptotic, so it never touches the x-axis; it just becomes infinitely small as it proceeds to the right. In the chi-square, 1 is at the left side and 0 is on the right side, so this means that the cumulative probability runs right to left. So remember, in the normal distribution it runs left to right, but in the chi-square the cumulative probability runs right to left. Now, the probabilities are found in the chi-square table in the same manner as for normal curves, so we could pick a point along this curve and then find the cumulative probability up to that point using the chi-square table. So in some ways it's similar to the normal distribution, in some ways it's like the t distribution, but one of the things that makes it very distinct is that the cumulative probability runs right to left.
So here is a chart of the monthly returns for General Electric and Apple during 2012. We have each month there on the left; the next two columns are in decimal form, and the last two columns are in percentage form. So the second and third and the fourth and fifth columns are really the same thing; it's just that the last two are percentages. If you take a look at these real quick, you can see that both stocks had a pretty good month in January, also in February and March, and then in April there was a downturn in the market, so they both went down a couple of percentage points and then back up in June, and then from there they kind of went different ways at times. So you can see that each one kind of has its natural range as you look down the list. Now at the bottom you can
see the mean, the variance, and the standard deviation. So for GE it was 1.61 for the mean, and for Apple it was 2.35 for the mean. So what is the mean monthly return? GE, again, was 1.61 percent and Apple was 2.35 percent. What about the monthly return variance? For GE that is 25.89, and for Apple that was 73.30. I just calculated these in Excel using some simple formulas, but I won't go into all that right now. So what about the monthly return standard deviation? For GE it was 5.09 percent, and for Apple it was 8.56 percent. Now, a few things here to point out. If you notice, the variance numbers are not in percentages. That's because the variance is technically a squared measure, so we're going to leave off any unit for that. Now, the standard deviation is simply the square root of the variance, so the square root of 25.89 for GE is 5.09 percent. For Apple, the square root of 73.30 is 8.56, so that is that standard deviation. So based on this information: we have the mean monthly return, and we can see that Apple is a bit higher; we have the monthly return variance, and again you can see that Apple has a higher variance; and we have the monthly return standard deviation, where again you can see that Apple has the higher standard deviation. So it seems at first here that our idea of risk versus reward seems to be true in this case.
I will point out that we have 12 measurements, so our sample size is 12; therefore our degrees of freedom will be one less than that, which is 11. So here is our chi-square distribution with our degrees of freedom of 11: 12 months, so 12 minus 1 is 11 degrees of freedom. We denote the confidence interval in the same way we did for the normal distribution, so we have the middle 95%, but notice it's not symmetric, so there is more probability here on the left-hand side. So
we have 2.5% in the upper tail and 2.5% in the lower tail. Again, this is all very similar to when we were doing the confidence intervals for the mean. Now we represent this upper point, this critical value, as chi-square sub .025, and remember that's because the tail beyond it is 2.5%. On the left-hand side we have a chi-square of .975; the probability is 0.975. So this is very similar, again, to the normal distribution, except it's flipped, because 0 is on the right in this case and 1 is on the left, so we're going right to left. So this is just how we set up our confidence interval using the distribution and the chi-square symbols.
So how do we actually find the chi-square values at those critical points? We're going to use the chi-square table. So here's our distribution from the previous slide. What we need to do is find these values in the top of the chart, and we can see that we have .975 and .025 there on the top. Now, what that represents is the cumulative probability starting at the right and going to the left, like we have on our chart. Now we need to find our degrees of freedom, so here we have degrees of freedom of 11, and now we just find where those intersect. For .975 we have a chi-square value, a critical value, of 3.816. For .025 we have a chi-square value, a critical value on the right side, of 21.920. So those are the actual chi-square values in our distribution. Let's go ahead and put those on our graph.
So here is how we represent our interval: our chi-square is in between chi-square .975 and chi-square .025. That's just the way we write the expression based on our graph below. Now, we found those numbers in our chi-square table, so we can say that the chi-square in this case is between 3.82 and 21.92, and those just come from the table we had in the previous slide. So we can go ahead and label these: on the left
it's 3.82 and on the right it's 21.92. Now, of course these will change with every different degrees of freedom, and of course if you choose a different confidence interval, say 90% or 99%. So these chi-square numbers, these critical values, are affected by the sample size, which determines the degrees of freedom, and by the confidence interval you're interested in. We're going to stick with 95%.
Now, we know that the sampling distribution of the variance follows the chi-square distribution; that's what we just spent several slides talking about. But how does that relate to the actual data we collect? What we know is that whenever a random sample of size n is selected from a normal population, the sampling distribution follows this expression: chi-square equals n minus 1 times s squared, all divided by sigma squared. So remember that n is the sample size, s squared is the sample variance, and sigma squared is the population variance, which is unknown. So we know this has a chi-square distribution with n minus 1 degrees of freedom.
So let's talk about the actual interval estimate. Now, this is the heavy algebra slide, but we're going to go very slow so you can see exactly how we proceed through finding what we need to find. So this is our interval expression we had before: chi-square is between chi-square .975 and chi-square .025; those are our endpoints on our distribution. Now, we know that this chi-square follows n minus 1 times s squared divided by sigma squared; that was in the previous slide. So on the top we have what we learned from the distribution, and here below that we have the actual expression that follows the chi-square. Now what we can do is substitute. So where we have chi-square up there at the top, in the middle, we can substitute this expression, n minus 1 times s squared divided by sigma squared, in where that was. So now up here on the right we have sort of a combination of those two, just through substitution. Now,
to get rid of the fraction, we can multiply everything by sigma squared, so that makes our middle term just n minus 1 times s squared, and then we multiply each chi-square value on the ends by sigma squared. Again, that's just simple, basic algebra. Now what we'll do is split this into two different parts. First let's look at the left half. We're going to take the left half of this expression and solve for sigma squared, which is what we are actually trying to find in these types of problems. What we'll do is divide both sides by chi-square at .975. We do that, and now we have: sigma squared is less than or equal to n minus 1 times s squared, divided by the chi-square value at a probability of 0.975. So all we did here was algebra. We began by substituting in for our chi-square, multiplied everything by sigma squared, and then split it apart here the first time to solve for sigma squared. Now it's the same thing for the other half. So now we'll split it here, and again we'll solve for sigma squared. We rearrange everything, and now we have n minus 1 times s squared, all divided by the chi-square at a probability of 0.025, is less than or equal to sigma squared. So all we did was split it up to solve for sigma squared, because that's what we're trying to find in this problem: the population variance. Now we can recombine everything and put those two back together, and we're left with this. We have this expression that says our sigma squared, our population variance, is between, on the left-hand side, n minus 1 times s squared divided by the chi-square at a probability of 0.025, and on the right-hand side, n minus 1 times s squared, all divided by the chi-square value at 0.975. So all we did through this whole process is isolate our sigma squared, because that's what we're interested in; sigma squared is the population variance.
So the cool thing is, now that we've done this, we have everything else we need to actually find our interval. Remember, the monthly return variance for General Electric was 25.89; that's the s squared for GE. So what we can do is go ahead and substitute everything into our interval up at the top. Our sample size was 12, so n minus 1 is 12 minus 1, times s squared, and the sample variance s squared for GE was 25.89. Then we divide by the chi-square value at the .025 probability. Now if you look up here in the upper right, you will see that that corresponds with a value of 21.92; remember from our curve, that was the upper end of our interval on the distribution. So then we go and substitute that in: 21.92. On the right-hand side of the interval we do the exact same thing: 12 minus 1, which is n minus 1, times s squared, same thing, 25.89 in this case, divided by 3.82, because remember 3.82 is the chi-square value at a probability of 0.975. And again, those come from the distribution we looked at a few slides ago. So now we can go ahead and just do the arithmetic. We do that, and we have our interval for sigma squared: our 95% confidence interval for the population variance is 12.99 to 74.55. That is our first 95% confidence interval for the variance, in this case for General Electric. Now, when we take the square root of that, we get the standard deviation. The square root of 12.99 is 3.60 and the square root of 74.55 is 8.63, so the interval for the standard deviation is 3.6 percent to 8.63 percent.
Now let's go ahead and do the same thing for Apple. Everything here is similar. We have 12 minus 1, which is the same; the 12 is the number of months. 73.30 is the s squared, or the sample variance, from our data, then divided by 21.92. Notice that doesn't change, because we're using the
same 11-degrees-of-freedom distribution. On the right-hand side everything is the same, except we're dividing by 3.82, with 73.30 on the top. We go ahead and do that arithmetic, and now we have a 95% confidence interval for the population variance, sigma squared, that runs from 36.78 all the way to 211.07. We take the square root of that and we have the interval for the standard deviation: the square root of 36.78 is 6.06 and the square root of 211.07 is 14.53. So that is the interval for the standard deviation for Apple.
Now, before we go on, just look at what we came up with. Sometimes it's easier to look at the standard deviation, because that's what we're used to. The standard deviation interval for GE starts at 3.6 percent and goes up to 8.63 percent. Remember, this is the square root of the variance, so it's about the variability, or the spread, of our data. Now look at Apple: it begins at 6 percent and has a high end of 14 and a half percent. So what is this telling us about these two stocks? Well, just based on sort of a cursory look, we can see that the variance in Apple is much higher than it is for GE, and that makes sense; that's what we would expect to happen. So let's continue.
So here is our chi-square distribution with 11 degrees of freedom, the same one we've been using. Now we know the population variance confidence interval, the 95% confidence interval, runs from 12.99 all the way up to 74.55, so those go on the critical values of our distribution. Remember, due to the shape of the chi-square distribution, more variance frequency is located at this lower end; that's why it has the hump, that's why it has the skewness there on this left-hand side. Now remember, we can take the square root of that interval to find the standard deviation interval, so again
that's 3.6 up to 8.63. So this is how we actually place our variance numbers on the actual chi-square distribution. For Apple, same thing; notice we're using the exact same chi-square distribution, degrees of freedom of 11. Now in this case our variance confidence interval runs from 36.78 up to 211.07, so those are actually the endpoints of our 95% confidence interval. Again, due to the shape, we expect more variance to be down here on the lower side, on the left-hand side. Take the square root and we have our standard deviation interval: 6.06 all the way up to 14.53.
So let's actually visualize this relative variance. What I did is I scaled these so that they are relative to each other. Here is our General Electric variance, which runs from 12.99 all the way up to 74.55, so that is this as compared to this. Those are scaled lines, as they correspond to the numbers over here on the left. Now what about for Apple? Here we have 36.78 as compared to 211.07. Again, these are just scaled versions of those numbers over there on the left. So what's actually going on here? Well, remember, 12.99 is the lower end on the chi-square distribution and 74.55 is the upper end, so it looks just like this. Now, what this is telling us is that we expect the majority of our variance, or a larger sum because of the skewness, to be towards the lower end. Same thing for Apple. So these just represent the endpoints on the actual chi-square distribution.
A note about the standard deviation: when I look at standard deviations, or compare them to each other, I kind of consider them waistlines, like for your pants. Let's go ahead and visualize those. For General Electric it was 3.6 up to 8.63, so we can go ahead and scale those, so it looks like
that. What about for Apple? It was 6 percent, approximately, up to 14.53, so these are those standard deviations. You can see them relative to each other; again, these lines are scaled in size to be relative to each other. You can actually see that the Apple variance and standard deviation are much greater. The bottom end of the Apple standard deviation is wider, and also the upper end is wider. So you can think of this as the waistline in a pair of pants.
So, a few things. Remember, in our sample the GE sample variance was 25.89, so that should be inside of the interval we found; if not, we did something very wrong. But you can see that 25.89 fits snugly in our interval of 12.99 all the way up to 74.55. Now remember, based on the distribution, we would probably expect that variance of 25.89 to be a bit closer to the 12.99 end, and it is. What about for Apple? Remember, the sample variance was 73.30, so that should fit snugly inside our interval of 36.78 up to 211.07, and it does. And you will notice that it also is a bit closer to the lower end, which again we would expect. So what about the standard deviation? For GE it was 5.09, and if you notice, that fits snugly in our interval for the standard deviation of 3.6 up to 8.63. What about Apple? The standard deviation in our sample was 8.56, and what happens? It fits snugly right inside the interval for the standard deviation. Now, that makes sense: if the variance is correct, then the standard deviation is going to be correct, because the standard deviation is the square root of it. But my point here is to show you that our actual sample data, which we took from Yahoo Finance, fits right inside the confidence intervals we calculated for this problem. So not only is it a way to check our work, but we can
actually see, sort of in the real world, how this is working.
So just a quick point about the interval estimate for s, which is the standard deviation: what does this mean in real life? Remember, GE had an average return of 1.61 and Apple had an average return of 2.35, but Apple had a much higher variance and standard deviation. This, what's called volatility, is one way of interpreting risk. It's the classic trade-off of risk versus reward. So GE had a lower monthly return, but its variance and standard deviation were also lower: lower reward with lower risk. Apple had a higher average monthly return, but its variance and standard deviation were much higher: higher return, higher risk. So this problem actually ended up confirming a basic understanding of the stock market, and that's the idea of risk versus reward. Remember, the return is the actual mean; the variance, or standard deviation, is the measure of the volatility, or the consistency.
Now, I do want to take one extension and show you sort of what happens when we change a certain part of this type of problem. Remember, our first sample size was 12, our s squared for GE, our sample variance, was 25.89, our interval was 12.99 to 74.55, and our standard deviation interval was 3.6 up to 8.63. So we know that already. Now, the point of this slide is: what happens to our interval if we keep everything the same but the sample size? In this case we selected 12 months, but what if we select 15 months, or 25 months, or 50 months, or 100 months? What happens to the interval if everything else stays the same but that sample size? Let's go and take a quick look. For a sample size of 15, our interval for the variance is from 13.9 up to 64.38. Now compare that to our variance interval up at the top; it was 12.99 to 74.55. What happened? It became
narrower; the interval contracted. Look at the standard deviation: 3.72 to 8.02. Compare that to what we have up top, 3.6 up to 8.63. What happened? The interval contracted, of course, because those are related. So by increasing the sample size from 12 up to 15, keeping everything else constant, the interval got smaller. What about 25? With a sample size of 25, now look at the variance interval: 15.79 up to 50.11. Compare that to the other two. What happened? It contracted; the interval became narrower. What about the standard deviation? Same thing: the interval contracted, it became narrower. So what general principle are we trying to learn from this? Well, notice that as the sample size increases, everything else remaining constant, the interval contracts, or it narrows; the confidence interval narrows. So what about a sample size of 51? I use 51 because my chi-square table has degrees of freedom of 50, which makes it easier to do. Look at our variance interval: 18.1 up to 40. Compare that to the other ones; you notice that it contracted even further. Same thing for the standard deviation. What about a sample size of 101? Now the variance interval is 19.98 up to 34.88; it narrowed even further. Now look at the standard deviation: 4.47 percent up to 5.91 percent. That interval is getting very, very narrow. As we increase the sample size, everything else remaining constant, the confidence interval estimate narrows; it contracts. And if you notice, it seems to be contracting around a central number, but we won't get into all that, because the interval will always be based on some sample, so it will never be perfect.
So let's go ahead and look at this graphically. What I did was graph the changes in the lower boundary and the upper boundary of our confidence interval as the sample size increased. For the lower boundary, you can see it started at about
three and a half and then moved up to around four and a half. For the upper boundary, it started out around nine or so, almost nine, and then decreased; it came inward to below six. So what is this telling us? All else remaining constant, increasing the sample size will narrow the interval estimate for the variance and the standard deviation. This graph shows the upper and lower boundaries of the standard deviation intervals for different sample sizes, assuming the sample variance is constant. Now, it's not going to be constant in practice, because each sample has its own unique variance, but I just want to show you, by isolating the sample size, what happens to the interval as the sample size increases. Now, the last point I will make about this slide is an important one. Notice that as the sample size increases, the effect on the interval becomes less and less; it's diminishing returns. It actually goes way back to when we were talking about sample size and figuring out different information about the mean: there's a point where increasing the sample size does you no more good. In this case that's similar. You can see that as we get up to a sample size of around 100, the interval stops narrowing significantly. So again, it's just a little bit of a lesson in the diminishing returns of sample size.
So, a few takeaways and a review, and then we're done. Now, the first thing I do want to point out, because it is a possibility, is that if the sample variance is zero, then the interval estimate for the population variance is also zero. And why is that? Well, if you remember our expression for the interval, s squared was in the numerator. If s squared is zero, that makes each side zero, and everything zero. That's another reason we use the chi-square distribution, because it actually allows for this possibility; remember, the probability runs from right to left, so zero is an actual point.
So, a few caveats and warnings. Interval estimates for the variance are very sensitive to the overall
population being normally distributed, so you want to check your data first. Confidence intervals for the mean can withstand some violation of that, but the variance cannot. Now, strange things happen at very small sample sizes and degrees of freedom in the chi-square distribution; try to keep your sample sizes at least ten, ideally even larger. The interval estimate for the variance, again, follows the chi-square distribution; the interval estimate for the standard deviation does not. Finding sigma, or the standard deviation, is sort of an extra step. So remember, it's the variance that follows the chi-square distribution, not the standard deviation; we just go ahead and find the standard deviation interval because it's easier to understand. As the sample size increases, the interval estimate narrows, all else remaining constant. The distribution of the variances becomes more and more like the normal distribution in its shape. It does not become the normal distribution, but it becomes more normal-distribution-like; that's a chi-square with a large sample size. Now, as compared to the 95% confidence interval, a 99% confidence interval will widen the interval and a 90% confidence interval will narrow it. So think of them as the size of the net needed to catch variances; that's the metaphor I used in the means example. A 95% confidence interval will be the middle 95%; a 99% confidence interval will be a wider area, therefore the interval will widen; and 90% is a smaller area in the middle, therefore it will narrow that interval. So again, think of it as the size of the net needed to catch those variances. And finally, remember we are estimating the spread of the distribution; I like to think of it as estimating the waistline of the population distribution.
Okay, so that wraps up our video on how to estimate the population variance using confidence intervals, similar to how we did for the mean. But remember, in this case we're using the chi-square distribution and not the normal distribution. So just a few
reminders and then we'll wrap this up. If you're watching this video because you're struggling in a class, stay positive and keep your head up. You're a smart, amazing, talented person; never let anyone tell you differently, including yourself. Feel free to follow me here on YouTube, on Twitter, on Google+, or on LinkedIn; it's always nice to hear from you. And finally, just keep in mind that the fact that you're on here trying to learn, trying to improve yourself as a student or as a business person or what have you, that is what really matters. I firmly believe that if you have the right learning process in place, the results will take care of themselves. So thank you very much for watching. I wish you the best of luck in your work and in your studies, and I look forward to seeing you again next time.
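The interval arithmetic walked through in the captions can be sketched in a few lines of Python. This is only an illustration, not something shown in the video: the function name `variance_interval` is hypothetical, and the two critical values are hardcoded as read from the chi-square table in the captions (3.82 and 21.92 for 11 degrees of freedom) rather than computed.

```python
import math


def variance_interval(n, sample_variance, chi2_975, chi2_025):
    """Return (lower, upper) bounds for the population variance.

    Uses the rearranged expression from the captions:
    (n - 1) * s^2 / chi2_025  <=  sigma^2  <=  (n - 1) * s^2 / chi2_975
    where chi2_025 is the larger (right-hand) critical value and
    chi2_975 is the smaller (left-hand) one.
    """
    numerator = (n - 1) * sample_variance
    return numerator / chi2_025, numerator / chi2_975


# Assumption: critical values for 11 degrees of freedom at 95% confidence,
# rounded the same way as in the captions' table lookup.
CHI2_975 = 3.82
CHI2_025 = 21.92

# Sample variances of the 12 monthly returns quoted in the captions.
for name, s2 in [("GE", 25.89), ("Apple", 73.30)]:
    low, high = variance_interval(12, s2, CHI2_975, CHI2_025)
    print(
        f"{name}: variance {low:.2f} to {high:.2f}, "
        f"standard deviation {math.sqrt(low):.2f} to {math.sqrt(high):.2f}"
    )
```

Swapping in the critical values for a different number of degrees of freedom or a different confidence level changes only the two constants; the rearranged expression stays the same.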
Info
Channel: Brandon Foltz
Views: 42,551
Rating: 4.980247 out of 5
Keywords: brandon foltz confidence interval, statistics 101 variance, variance confidence interval, statistics 101 confidence interval, confidence interval for variance, confidence interval variance, confidence interval for population variance, six sigma, quality assurance, brandon foltz, brandon c foltz, statistics 101, confidence interval, chi-square distribution, anova, linear regression, logistic regression, variance, chi square, standard deviation, statistics, confidence intervals
Id: rXK1UGc58g0
Length: 40min 40sec (2440 seconds)
Published: Wed Apr 10 2013