Statistics 101: The Covariance Matrix

Captions
Hello, and welcome to the next video in my series on basic statistics. Now, a few things before we get started. Number one: if you're watching this video because you are struggling in a class right now, I want you to stay positive and keep your head up. If you're watching this, it means you've accomplished quite a bit in your educational career up to this point. You're very smart, and you may have just hit a temporary rough patch. Now, I know that with the right amount of hard work, practice, and patience you can get through it. I have faith in you, many other people around you have faith in you, and so should you. Number two: if you like the video, please give it a thumbs up, share it with classmates or colleagues, or put it on a playlist, because that does encourage me to keep making them. On the flip side, if you think there is something I can do better, please leave a constructive comment below the video and I will try to incorporate those ideas into future ones. And finally, just keep in mind that these videos are meant for individuals who are relatively new to stats, so I am just going over basic concepts, and I will be doing so in a very slow, deliberate manner. Not only do I want you to understand what's going on, but also why. So, all that being said, let's go ahead and get started.

This video is the next in our series on bivariate relationships, and remember that bivariate literally means two variables: bi-variate. In the previous video we talked in great depth about what covariance is, and what it isn't for that matter, and we hand-worked a covariance problem. We actually did the calculations by hand so you can see where all the terms fit in and sort of its structure. Now, of course, statistical packages and even Microsoft Excel can calculate a covariance matrix for you, but there is a caveat to that which I will discuss at the end. So basically this video is about the anatomy, or the structure, of a covariance matrix, and the reason I'm doing this video is that the covariance matrix plays an extremely 
important role in disciplines such as finance and statistical process control and things of that nature. So the covariance matrix is extremely important. It's also extremely helpful, because it makes very tedious calculations much, much easier. So let's go ahead and learn about the covariance matrix.

As a quick reminder, covariance itself is just one of a family of statistical measures that we use to analyze the linear relationship between two variables, so we're interested in how two variables behave as a pair. Now, we have covariance, I'm sure you've also heard of correlation, and you may have even heard of linear regression, and the reality is that all these measures are very closely related. The covariance and the correlation are very closely related, and linear regression is closely related to both of those. But right now we're focusing on covariance, because I personally think it's often forgotten in stats classes and it actually comes up in other areas, like I said, so we're going to go ahead and give it some time. Correlation and linear regression are things that people often have heard of, but in all cases it's about the relationship between two variables.

So basically, covariance is simply a descriptive measure of this linear relationship, or it may tell us that there is no linear relationship. We're focusing on the sign of the value, not really the number we get back, unless it's at or close to zero. A positive value indicates an increasing linear relationship, whereas a negative value for the covariance indicates a decreasing linear relationship. So it's all about the sign of the covariance that is calculated or given to you: you're looking for a negative or a positive. Now, a covariance at or around zero indicates that there's probably not a linear relationship between the two. So look for the sign and whether or not it is at or near zero. Just a reminder that the covariance does not tell us anything about the strength of the relationship, only whether or not it's positive, 
negative, or at or around zero. The strength of the linear relationship is dealt with by correlation, which of course we'll get to, probably in the next video.

So I'm going to walk you through just a basic example of what to look for and how to interpret and understand a covariance matrix. In this example I'm going to use data from four variables, and I'm just going to generally call them X1, X2, X3, and X4, and there are 20 measures, or 20 values, in each variable. For now, the statistics of interest that we want to look at are the mean, the variance, and the standard deviation for all four variables. The software package I use to do this is SPSS. Some students are going to use SAS, some students will use Minitab, and some classes will have you use a language or a program called R, but they can all do the same thing for you. I use SPSS because that was sort of what I was raised on in my graduate work, but basically they will all give you the same information.

On the left you can see all four variables, X1, X2, X3, and X4. Our n is 20; that's the number of observations we have for each one. In the middle we have the mean for each variable, so variable X1 has a mean of 9.955, X2 has a mean of 20, and so on and so forth. Then we have the standard error of the mean, which is important in other cases, not really for this one. Then we have the standard deviation of each variable, so for X1 the standard deviation is 1.00393, and then we have the variance for each variable, so again for X1 that's 1.008. Those are important when we're talking about the covariance matrix.

Okay, so let's talk about an extremely valuable tool. A lot of students I work with, and even my classmates, are really gung-ho about getting the data into SPSS or whatever else and just running the numbers, but one of the most important and most helpful things you can do at first when you're looking at bivariate relationships is printing out 
or doing a matrix of scatter plots. So what is that? Well, they look like this. It's a matrix with a scatter plot for each variable pair. One thing I want to point out is that the bottom half, or the top half, depending on how you look at it, is sort of a duplicate of the other, so you only really need to look at one half, above or below the diagonal. So it's a figure that plots each variable against every other variable. If you look there, at the top, along the diagonal, what we have are some summary statistics for each variable itself. The little histograms in there are just representations of the distribution of each individual variable, so X1 with X1, X2 with X2, and so forth. But what we're really interested in are the actual scatter plots. If you look at the intersection of X1, the row, and the column X2, that is a scatter plot, a very generic scatter plot, of the relationship between those two variables. And of course if we go to the row X1 and then over to X3, that is a scatter plot of X1 and X3. So you can kind of see how this works. It's extremely helpful just to eyeball this sort of matrix to see what relationships you might have and what kind of relationships you might be looking out for, to look for linear relationships in this matrix of scatter plots.

Now, based on this, which two variables seem to have an obvious positive linear relationship? Well, if you said X1 and X2, that would be correct. Again, most of the real-world data we work with looks like the other scatter plots, where it's hard to see any discernible pattern. The patterns are there, but they're hard to see. With X1 and X2 it's very obvious that there is a positive linear relationship between those two. But again, when you're dealing with several variables, this is an easy way to take a quick look at your data and see what relationships may or may not exist.

So let's talk about the actual covariance matrix, which of course is the idea of this video. Here is our descriptive statistics chart again. Now in SPSS we 
can generate a covariance matrix, and here it is. So this is the whole point of the video: here is our covariance matrix. Now, I want to point out something very important here that a lot of people don't really get when talking about the covariance matrix. I want to draw your attention to the variance column in the descriptive statistics box. We have a variance of 1.008 for X1, a variance of 0.918 for X2, etc. Now look at the diagonal in the covariance matrix. What do you notice? They're the exact same thing. So the diagonal of a covariance matrix provides the variance of each individual variable. It's basically a variable's covariance with itself, so the diagonal is the same thing as the variance. The off-diagonal entries in the matrix provide the covariance between each individual pair. So if you look at the intersection of X1, the row, and the column X2, you'll see 0.895. Well, that is the covariance between X1 and X2. So the variances are along the diagonal and the covariances are off the diagonal.

Just to make that clear and blow it up nice and big, you can see in the covariance matrix that the variances go along the diagonal, and again that's basically each variable's covariance with itself, which is the same thing as the variance. Now also remember that the standard deviation is simply the square root of the variance, so based on this covariance matrix you can actually calculate the standard deviation as well. So sometimes, to save time, you can actually do a covariance matrix, and there are other things in stats that you can find other information from, but I just wanted to point out that of course the standard deviation is the square root of the variance, so you could get it from this chart. Now, as far as the covariances, those are in the off-diagonals. So we have the variances along the diagonal and then the covariances in the other cells. You can see the intersection of X1 and X2; well, that's the covariance of X1 and X2, with the covariance of X1 and X3 next to that, and so on and so 
forth. And again, I've blacked out the other cells because they are just duplicates across the diagonal: the covariance of X1 and X2 is the same as the covariance of X2 and X1. It's just a duplicate.

Now, this is my Microsoft Excel warning, and this is a famous problem Microsoft Excel has when calculating covariances. On the left I have the output for SPSS; on the right is the output for covariance in Microsoft Excel. What do you notice? The numbers are all different. They're close to each other, but none of them are the same. Well, why is that? The reason is that Microsoft Excel computes covariance using the population covariance formula, which has a denominator of n, instead of the sample covariance, which has a denominator of n minus 1. Again, in the previous video I walked you through those formulas, so if you want to take a look at them, look at the video before this one. But SPSS and other stats packages are going to use a denominator of n minus 1, whereas Excel uses a denominator of n, which is the population covariance.

So what can you do if all you have is Microsoft Excel? To have the Excel output match what I would consider the proper SPSS output, or Minitab, or whatever else it might be, you have to multiply each cell by n divided by n minus 1. Of course that will give you a number slightly over 1; you can think of it as kind of a correction factor. In our case this would be 20, which is n, divided by 19, which is n minus 1, so that's about 1.053. So to make our Microsoft Excel covariance matrix look like the SPSS one, we multiply each cell in the Microsoft Excel version by 1.053. For example, if we take the covariance between X2 and X1, in Microsoft Excel we have 0.850. Now if we multiply that by 1.053, we have the proper covariance, which is 0.895. So again, if you're using Excel and you're freaking out because you're getting the wrong 
answers, that's why.

Okay, so that is our brief look at the covariance matrix, and again it's just an extension of bivariate relationships. I bring it up because the covariance matrix does play a very important role in other areas; finance and statistical process control are two I can think of right off the top of my head, but I know it comes up in other things. Unfortunately, in many stats classes covariance and the covariance matrix are just skipped right over. When they talk about bivariate relationships, the covariance and its matrix are skipped right over and they go on to correlation. So I think it's very important to understand what the covariance matrix is, where it comes from, and how you can learn from it as far as learning about your data.

Okay, so just remember: if you're struggling in a class right now, I want you to stay positive and keep your head up. You're smart and talented, and this just may be a temporary rough patch, so keep having faith in yourself, because I know everyone else around you has faith in you. Number two: if you like the video, please give it a thumbs up, share it with colleagues or classmates, or put it on a playlist. That does encourage me to keep making them, since I know they are beneficial. And finally, if you think there's something I can do better, leave a constructive comment below the video and I'll try to take those ideas into account when I make new ones. So just remember, and this is probably most important to me: if you're on here learning, trying to improve yourself as a student or as an asset at your place of work, that is what really matters. So thank you very much for watching. I wish you all the best in your studies or at your place of work, and I look forward to seeing you again next time.
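The main points of the video can be sketched in a few lines of plain Python: the diagonal of a covariance matrix holds the variances, the matrix is symmetric, and a population-formula matrix (denominator n) can be rescaled to the sample version (denominator n minus 1) by multiplying every cell by n / (n - 1). This is only an illustrative sketch. The data below is made up and is not the 20-observation data set shown in the video, and the helper names `covariance` and `covariance_matrix` are hypothetical, not functions from SPSS, Excel, or any library.

```python
import math


def covariance(xs, ys, sample=True):
    """Covariance of two equal-length lists.

    Divides by n - 1 when sample=True (sample formula),
    or by n when sample=False (population formula).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (n - 1 if sample else n)


def covariance_matrix(columns, sample=True):
    """Square matrix whose (i, j) entry is the covariance of columns i and j."""
    return [[covariance(a, b, sample) for b in columns] for a in columns]


# Hypothetical data (NOT from the video): three variables, five observations each.
x1 = [2.0, 4.0, 6.0, 8.0, 10.0]
x2 = [1.0, 3.0, 2.0, 5.0, 4.0]
x3 = [9.0, 7.0, 8.0, 4.0, 2.0]
columns = [x1, x2, x3]
n = len(x1)

sample_cov = covariance_matrix(columns, sample=True)       # denominator n - 1
population_cov = covariance_matrix(columns, sample=False)  # denominator n

# The diagonal holds each variable's covariance with itself, i.e. its variance,
# and the square root of each variance is that variable's standard deviation.
variances = [sample_cov[i][i] for i in range(len(columns))]
std_devs = [math.sqrt(v) for v in variances]

# Rescale the population-formula matrix to the sample version: n / (n - 1).
correction = n / (n - 1)
rescaled = [[value * correction for value in row] for row in population_cov]

for row in sample_cov:
    print(row)
print("variances:", variances)
print("standard deviations:", std_devs)
print("correction factor:", correction)
```

With this made-up data, the covariance of `x1` and `x2` comes out positive and the covariance of `x1` and `x3` comes out negative, which is the sign reading described in the video. The same correction applies to any matrix computed with the population formula; only the value of n changes.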
Info
Channel: Brandon Foltz
Views: 246,596
Rating: 4.9104047 out of 5
Keywords: covariance matrix, covariance matrices, covariance matricies, covarience matrix, variance covariance matrix, variance-covariance matrix, variance covariance matrices, brandon foltz, covariance, matrix, covariance and correlation, logistic regression, statistics 101, linear regression, anova, correlation, statistics, regression analysis, regression, multiple regression, anova statistics, bivariate, microsoft excel (software), machine learning, Correlation matrix, Correlation matrices
Id: locZabK4Als
Length: 17min 32sec (1052 seconds)
Published: Fri Jan 18 2013