Statistics 101: Understanding Covariance

Video Statistics and Information

Captions
Hello and welcome to the next video in my series on basic statistics. Now, a few things before we get started. Number one: if you are watching this video because you are struggling in a class right now, I want you to stay positive and keep your head up. If you're watching this, it means you've accomplished quite a bit in your educational career up to this point. You're very smart, and you may have just hit a temporary rough patch. I know that with the right amount of hard work, practice, and patience you can get through it. I have faith in you, and many other people around you have faith in you, so so should you. Number two: please feel free to follow me here on YouTube and/or on Twitter; that way, when I upload a new video, you know about it. And on the topic of the video, if you like it, please give it a thumbs up, share it with classmates or colleagues, or put it on a playlist, because it does encourage me to keep making them. On the flip side, if you think there is something I can do better, please leave a constructive comment below the video and I will try to take those ideas into account when I make new ones. And finally, just keep in mind that these videos are meant for individuals who are relatively new to stats, so I'll just be going over the basic concepts, and I will be doing so in a slow, deliberate manner. Not only do I want you to understand what's going on, but also why. So, all that being said, let's go ahead and get started.

So this video is the first in a series on what are called bivariate relationships. Let's break that down for a second. What do we mean by bivariate? Well, "bi" means two, like a bicycle has two wheels, and then "variate," so variables. So these involve two variables and, more specifically, certain aspects of the relationships between two variables. The first concept we're going to explore is called covariance. Unfortunately, covariance is often left in the shadows of its cousin, correlation, but covariance is extremely important, especially in finance, and in higher-level statistics the
covariance between variables is extremely important. So my goal is to really break down what covariance is and what it isn't, so that when you study correlation and linear regression and things of that nature, you know where it's coming from. Let's go ahead and get started.

The first thing we're going to do is look at a good example. Before we actually define what covariance is, we're going to look at a real-world example. What I'm going to show you next is a scatter plot of the returns for the S&P 500, which is a stock index here in the United States, versus the Dow Jones Industrial Average, which is another stock index here in the United States. It's a graph of the monthly returns for 2012, so it will have 12 data points, and here it is. What we have on the left is a scatter plot of the monthly returns for the S&P 500 vs. the Dow Jones Industrial Average. Down at the bottom you can see those are decimal percentages, so 0% is right there in the middle on the x-axis and 0% is the horizontal line on the y-axis, and of course 2% would be over two decimal places (0.02), and 4%, etc. What this is telling us is: if the S&P 500 goes up a certain percentage, what happens to the Dow Jones Industrial Average for that same month? So again, this is just a bivariate scatter plot of the monthly returns for each index. On the right-hand side you can see the actual data points. For the first month of returns, we can see that the S&P 500 had about a 3.9%, or roughly a 4%, return, and the Dow Jones had about a 2.5% return for that month, and so on and so forth. So you can see that the points in the graph correspond to the actual data points over there on the right. It's a very simple concept.

Now the question is: how would you describe the shape or the pattern of the data points in this graph over here on the left? Well, they seem to follow a linear pattern. They're not exactly linear, but they have a fairly strong linear shape, starting at the lower left and sloping up to the upper right. When
one stock index rises (here I have a blue arrow showing you the S&P 500 rising), what happens to the other? Well, the Dow Jones tends to rise as well. So when one variable increases, the other tends to do the same; it follows a linear pattern. And in this specific case that had better be the case. Why is that? Well, the S&P 500 and the Dow Jones Industrial Average claim to measure basically the exact same thing, which is the performance of the overall stock market. Now, they do that in different ways in terms of how the stocks are weighted in the index and things of that nature. I'm not going to go into that, but just keep in mind that these are supposed to be measuring the same thing, which is the overall performance of the stock market. So if the S&P 500 goes up significantly over a month but the Dow Jones Industrial Average goes down over that same month, something is wrong. These had better have a somewhat or a very strong linear relationship, because they are claiming to measure the exact same thing.

So what we say here is that these two variables show a positive linear relationship: when one variable moves a certain direction, say up, the other tends to move in the same direction, also up. Now, you can see down here on the lower left we have a point where both indexes went down over 6%; I think that was last April. So when one went down about 6%, or over 6%, so did the other. One goes up, the other goes up; one goes down, the other goes down. They move in concert with each other. This is called, not surprisingly, the covariance. Again, there are other statistical measures that look very similar, but the covariance has specific properties, which is obviously the point of this video. The idea here is that they co-vary; literally, they co-vary. It's how they change together. If one goes up, what happens to the other? If one goes down, what happens to the other? If one stays constant or, in this case, has a return of near zero, what happens to the other?
So it's how they change or don't change together. As far as linear relationships go, covariance is one of a family of statistical measures used to analyze the linear relationship between two variables. The question is: how do these variables behave as a pair? Now, there are other measures that do similar things. Right now we're talking about covariance, but I'm also convinced you've heard of correlation, and you've probably heard of linear regression, and the reality is these are all very closely related. Covariance and correlation are closely related, linear regression is related to correlation, which is related to covariance, and things of that nature. So all these measures are analyzing or looking at the linear relationship between two variables, and of course we will be doing videos on those other measures.

So what about covariance? Basically, it's a descriptive measure of the linear association between two variables, and it's very simple to interpret. A positive value for the covariance indicates a direct or increasing linear relationship: when one thing goes up the other goes up, and when one goes down the other goes down. A negative value indicates a decreasing relationship: if one goes up the other goes down, and if one goes down the other goes up. Again, we're going to look at that in more detail here in a second. The concept here is the direction, or the sign, of the covariance, whether it's positive or negative, and for the most part that's all we're really interested in. When interpreting the covariance, we are not making any comment on the strength of that relationship, merely its direction. Now of course, if you've had stats before and you've already talked about correlation, you will realize that it's the correlation that talks about the strength of the relationship. And again, the covariance and correlation are very, very closely related, so that should not be surprising. But for covariance, we're only interested in the sign, or the direction, of whatever our
value is.

Now let's talk about exactly what it means, or how we come to understand the covariance. We're going to go back to algebra. We have our coordinate plane here, so we have the x-axis, the horizontal axis, and the y-axis is the vertical axis. Now, if you remember, maybe from way back when, this plane is divided into four quadrants: quadrants one, two, three, and four. And if you remember, each quadrant has its own property in terms of the signs of the x and y values. Here in the first quadrant, both values are positive, so x is positive and y is positive, and any point in that quadrant has that characteristic. Down here in quadrant three, in the lower left, both values are negative, so x is negative and y is negative. In quadrant two, x is negative because it's over on the left, but y is positive because it's above the x-axis. Then in quadrant four, x is positive because it's over on the right, but y is negative because it's down below the x-axis. So again, any point in those quadrants, aside from being on the axes themselves, which is always a special case, has signs: positive-positive or negative-negative if it's in quadrants one or three, and negative-positive or positive-negative if it's in quadrants two or four. That is important for understanding how we interpret, calculate, and graph covariance.

Just take this example. I've highlighted quadrant one and quadrant three, and that's where the signs are the same for the x and y values. Now if I put a rough line in there, you'll see that it looks like this. So what does that mean in the case of covariance? It means that the variables tend to move in the same direction. If x goes up, y goes up, and our point might be there in quadrant one; if x goes down, then y goes down, so our point may be in quadrant three, down there in the lower left. So when one variable makes a movement, the other variable tends to make the same movement, and what that means is that our covariance is positive, because the slope of
this line, this orange line, is positive. It starts at the lower left and goes up to the right, and remember from algebra that that's the characteristic of a line that has a positive slope.

Now let's talk about the opposite of that. In this case x goes down, or decreases, but y increases, so our variables are exhibiting opposite behavior. Or x can increase and then y could decrease; that's our point over here on the lower right. So the variables move in opposite directions. In this case the covariance is negative, because the slope of this line is negative: it starts at the upper left and goes down to the lower right.

Another case is a graph that looks something like this. It's often said to look like a shotgun blast; I don't like to use violent metaphors like that, but I guess it works. The points don't seem to have any pattern, so the variables seem to have no linear relationship. They're not grouped up along a line that's positive or a line that's negative; they're kind of just a blob of points. In this case the covariance will probably be near or equal to zero, because they do not exhibit any sort of linear pattern.

So, our monthly stock returns, the S&P 500 and the Dow Jones: when the S&P went up during any given month, the Dow Jones Industrial Average tended to do the same thing. That's why this has a positive slope and therefore a positive covariance.

So now we've learned what the covariance is conceptually, and I'll go ahead and show you the formulas. Whenever I show formulas I always say do not freak out, because it's really not that hard; oftentimes just understanding the nomenclature is the hardest part. But we're actually going to calculate it, so you can see exactly how it works. So don't freak out when you see the formulas. This is the sample covariance, and I'll talk about exactly what this formula does here in a second, and then of course we have a formula for the population covariance.
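
[Editor's note: the formulas shown on screen are not captured in the captions. Going by the spoken description (each deviation from the mean for x multiplied by the matching deviation for y, the products summed, then divided by n minus 1 for a sample or by the population size for a population), they are presumably the standard definitions below. The symbols \sigma_{xy}, \mu_x, \mu_y, and N are conventional notation assumed here, not taken from the video.]

```latex
% Sample covariance: sum of the deviation products, divided by n - 1
s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}

% Population covariance: same structure with population means, divided by N
\sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}
```
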
So what's the difference? Well, it depends, of course, on whether you're looking at a sample from a population or at the entire population. That will often be evident from whether you're using sample means or population means; it just depends on the problem you're working with. The only real difference is the denominator in the equation up there: for sample covariance it's n minus 1, and for population covariance it's just N. So again, it'll just depend on the problem you are facing.

Now, we are going to work an example, but I want to tell you real quickly what we're doing here. If you look at, say, that sample covariance formula, let's just reconstruct what we're looking at. On the top of that fraction, we're taking each data point for x, our first variable, and subtracting the mean, or x-bar, from it. So in the first parenthesis it's just each data point for x minus the mean for x; in the second parenthesis it's each data point for y minus the mean for y. Since those are in adjacent parentheses, after we do the subtraction we're going to multiply those two figures together. Then the summation sign tells us that after we multiply them all together, we're going to add them up, and after we add them all up, we're going to divide by n minus 1, so however many samples we have minus one. And that's it; it's not very complicated. And of course we are going to work through an example problem here in a second.

Okay, so here's our example problem. I took this from a book I have, which is sourced underneath the video on YouTube; I want to give proper credit where credit is due. It's a very simple example of how to calculate and understand covariance. Here is our question: Rising Hills Manufacturing wishes to study the relationship between the number of workers, which we'll call x, and the number of tables produced, which we'll call y, in its plant. To do so, it obtained 10 samples, each one hour in length, from
the production floor. So they looked at 10 hours, so 10 samples, and they noted the number of workers on the floor at that time and the number of tables they produced during that time. So x is the number of workers and y is the number of tables produced. When they collected the data, here's what they had. For the first measurement, we have 12 workers and 20 tables produced, so that's 12 and 20. In the second sample they had 30 workers and 60 tables produced, then 15 workers and 27 tables produced, etc. So those are our samples. At the bottom we have the mean for each variable. It appears that over the course of these 10 samples there were on average about 21 workers on the floor, and on average they were producing about 41 tables during that hour. It's very simple stuff: just a bivariate measurement of workers and tables, and then the average for each variable down there at the bottom. And when we graph it, it looks like this. Now, how would you describe this relationship? In your mind, what kind of covariance are you expecting? Are you expecting a covariance that's zero, a covariance that's positive, or a covariance that's negative?

Let's go ahead and start talking about how we calculate this; it's actually fairly straightforward. The first calculation we do, remember, is take each point for x and subtract the mean. You can see I've labeled each x value in green and the mean for x in red. So 12 minus 21.3 is negative 9.3, 30 minus 21.3 is 8.7, 15 minus 21.3 is negative 6.3, and on down the list. It's just each x value minus the mean for x. The next step is the same process for y: each value in y minus the mean for y, down there in the blue. So 20 minus 41.2 is negative 21.2, 60 minus 41.2 is 18.8, and so on. So again, just each y value
minus the mean for y. The last thing we do is multiply those two things together. So negative 9.3 times negative 21.2 equals 197.16. Go on down to the next one: 8.7 times 18.8 is 163.56. So we're just multiplying those deviations, as they are called, together to get our right-hand column. Once we multiply those together, we sum them up. At the bottom you can see that the sum is 962.4. Now I'll go ahead and condense the chart into the x and y values and the sum of our products over here, and actually we're almost done figuring out our covariance. The covariance of x and y, sometimes written as s sub xy (you might see it both ways; it means the same thing), is 962.4, that's our sum down there in the lower right, divided by n minus 1, which here is 9. Remember, the n minus 1 is in the formula we looked at before. When we go ahead and do that division, our covariance for x and y is 106.93.

So what does that mean? How would we describe this relationship? Well, it's linear; look at that line. The covariance of x and y is 106.93, so it is a positive linear relationship. And remember, for the covariance, that number, the 106.93, doesn't really mean anything by itself. It's the sign of the covariance, positive, negative, or around zero, that we're interested in. So again, it's a positive covariance, so we expect the linear relationship to be increasing, which it is, as you can tell from the graph.

Okay, quick review. Just remember that the covariance is simply a descriptive measure of the linear association between two variables. A positive value indicates a direct, increasing linear relationship. And again, not all covariances are going to be as obvious as the stock market data we looked at and the table production data, because those data points were pretty much along a straight line. Most if not all, you know,
most of the covariances you're going to look at are not going to look that obvious. But a positive value is an increasing linear relationship, and a negative value indicates a decreasing linear relationship, so it might look like that. Focus on the direction, which is the same as the sign. You can think of this as the slope of that line: if the covariance is positive, then the slope of that line will be up and to the right; if the sign is negative, the slope will be down and to the right. So look at the sign. Or it could look like this: it could have no discernible pattern whatsoever, and in that case the covariance might be around zero. Now, keep in mind the covariance indicates nothing about the strength of the relationship, only its direction. The strength of the relationship is dealt with when we talk about correlation. But again, the covariance is part of the correlation; the formula for the correlation involves knowing what the covariance is. That's how closely they're related. But again, just direction and sign, and then we'll worry about the strength of the relationship later.

Okay, so that wraps up our first video on bivariate relationships. Again, when we say bivariate, we're talking about the relationship between two variables; more specifically, when one variable moves a certain direction, what happens to the other variable? If one increases, what happens to the other? If one decreases, what happens to the other? If one stays flat and doesn't move a whole lot, what happens to the other? So we're interested in how these variables move together. What we learned here will help us understand other linear relationships like correlation and linear regression, because they're all related.

Now, just keep in mind, if you are watching this video because you are struggling in a class right now, I want you to stay positive and keep your head up. You're smart and talented, and everyone around you has faith in you, so so should you. If you liked the video,
please give it a thumbs up, share it with colleagues or classmates, or put it on a playlist; that does encourage me to keep making them. If you think there's something I can do better, please leave a constructive comment below the video and I'll try to incorporate those ideas into future ones. Just keep in mind that the fact that you're on here learning, trying to improve yourself, trying to be a better student, trying to be a better business person, or whatever it might be, is what's really important. So again, thank you very much for watching. I wish you all the best of luck in your studies and in your work, and I look forward to seeing you again next time.
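
[Editor's note: the arithmetic from the table-production example can be sketched in Python roughly as follows. This is a minimal sketch, not the presenter's own material. Only the figures read aloud in the video are used: the first three (workers, tables) pairs, the means 21.3 and 41.2, the sum of products 962.4, and n = 10. The remaining seven data points are not in the captions, so they are not reproduced. The function names `deviation_product` and `covariance`, and the tiny data set in the last lines, are hypothetical and for illustration only.]

```python
def deviation_product(x, y, x_mean, y_mean):
    """One term of the numerator: (x - mean of x) * (y - mean of y)."""
    return (x - x_mean) * (y - y_mean)


def covariance(xs, ys, sample=True):
    """Covariance of paired data; divides by n - 1 for a sample, n otherwise."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    total = sum(deviation_product(x, y, x_mean, y_mean) for x, y in zip(xs, ys))
    return total / (n - 1) if sample else total / n


# Figures stated in the video.
n = 10                      # ten one-hour samples
x_bar, y_bar = 21.3, 41.2   # mean workers, mean tables
sum_of_products = 962.4     # sum of all ten deviation products

# Only these three rows are read out; check their deviation products.
stated_rows = [(12, 20), (30, 60), (15, 27)]
row_products = [round(deviation_product(x, y, x_bar, y_bar), 2)
                for x, y in stated_rows]

# Final step from the video: divide the sum by n - 1.
sample_cov = sum_of_products / (n - 1)
print(round(sample_cov, 2))  # 106.93

# Hypothetical data (not from the video) to exercise the general function.
toy_cov = covariance([1, 2, 3], [2, 4, 6])
```

The first two entries of `row_products` correspond to the 197.16 and 163.56 worked out in the video. The sign of `sample_cov` is the part the video asks you to interpret: it is positive, matching the upward-sloping scatter plot.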
Info
Channel: Brandon Foltz
Views: 330,245
Rating: 4.9673195 out of 5
Keywords: statistics 101 understanding covariance, statistics 101 covariance, brandon foltz covariance, statistics covariance, covariance statistics, analysis of covariance, what is covariance in statistics, covariance, covarience, covariance explained, covariance and correlation, what is covariance, brandon foltz, statistics 101, regression analysis, linear regression, brandon c foltz, covariance and correlation coefficient, statistics for data science, Machine learning, stats 101
Id: xGbpuFNR1ME
Length: 26min 22sec (1582 seconds)
Published: Wed Jan 16 2013