Mod-01 Lec-08 Multivariate descriptive statistics

Good afternoon. Today we formally enter the multivariate domain. The topic is multivariate descriptive statistics, and the content of today's presentation includes multivariate observations, the mean vector and covariance matrix, the correlation matrix, and the sums of squares and cross products matrices. So it will be purely about data; we will be dealing with data using symbols like x. In univariate descriptive statistics you have seen central tendency measures as well as dispersion measures in earlier lectures. Central tendency in the univariate case we measured using the mean, the mode and the median. Under dispersion we used the range, the interquartile range (IQR) and the standard deviation. Now we will see some of their counterparts in the multivariate domain. For example, the mean becomes a mean vector: when you move to the multivariate side, you will find that the mean is a mean vector. The standard deviation, or rather its square, the variance, which is the measure of dispersion, becomes a covariance matrix. In addition, as there is more than one variable, there will be correlation between variables, so we will also look at the correlation matrix. Today's discussion will concentrate on the mean vector, the covariance matrix and the correlation matrix.

Now think for a moment at an abstract level: there is a multivariate population, and that population is characterised by a variable vector known as X. There are p variables characterising the population of interest, so X is a p cross 1 vector. What does that mean? We have created a vector X of order p cross 1, where p stands for the number of variables: variable 1 is X 1, variable 2 is X 2, and so on up to variable p, which is X p. Now suppose you need to collect data on these p variables. I write the first variable as X 1, the second as X 2, and the last as X p, and you will collect observation 1, observation 2, and so on up to n observations. So you are no longer in the univariate domain with a single x taking n values; here i runs from 1 to n and j runs from 1 to p. What does that mean? i = 1 to n indexes the observations and j = 1 to p indexes the variables.

Now consider the situation where you know your population and you have identified the p variables, but you are only planning to collect data; you have not collected it yet. Our nomenclature will be as follows. The general entry will be x i j. What does that mean? x i j is the i-th observation on the j-th variable, which is why i runs from 1 to n (the number of observations) and j from 1 to p (the number of variables). So in the subscript we first write the observation number and then the variable number: observation 1 on variable 1 is x 1 1, observation 2 on variable 1 is x 2 1, and so on up to observation n on variable 1.
My observations on the second variable will be x 1 2, x 2 2, and so on up to x n 2, and continuing in this manner, for variable p you get x 1 p, x 2 p, and so on up to x n p. This is the data matrix, and we will denote it by capital bold X, of order n cross p. This is the data matrix you are planning to collect. If this is the case, then X, consisting of X 1 to X p, is a random vector, because all the variables here are random variables. So x i j, the i-th observation on the j-th variable, is also a random variable; please keep this in mind, we treat it as a random observation. Since you have not collected data yet and are only planning to, any value of x i j can occur depending on the spread of that variable. You will get some value, but you do not know what that value is. That is why we say it is a random observation and x i j is a random variable. You cannot know the value of x i j in advance, but you have some expectation of it. In the univariate case, if x is a random variable then the expected value of x is mu. The same holds here: every variable has an expected value, and that is what we will talk about as the mean vector.

Before coming to the mean vector, there are two other important items to discuss. I told you that the general entry is x i j, the i-th observation on the j-th variable. Similarly, there is a general multivariate observation and a general variable. If I take the i-th multivariate observation, its values are the i-th observation on variable 1, on variable 2, and so on up to variable p. So if x i denotes the i-th multivariate observation, I can write it as x i 1, x i 2, ..., x i j, ..., x i p, which is a p cross 1 vector. When I talk about the i-th multivariate observation, please keep in mind that it is one observation on all the variables. In the univariate case you would get only one value for that observation; as it is multivariate, you get values on all p variables. The other important point is that all the variables occur simultaneously; it is this simultaneous occurrence that makes the data multivariate in nature.

Now if I go variable-wise and take the general variable x j, the entries are observation 1 on variable j, observation 2 on variable j, ..., observation i on variable j, up to observation n on variable j. So the general variable, meaning the n observations on a particular variable, we write as x j: x 1 j, x 2 j, ..., x i j, ..., x n j, which is an n cross 1 vector. Essentially, on one side you have the axis of observations and on the other side you have the axis of variables.
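As a compact restatement of the notation just described (nothing new here, only the lecture's symbols written out; I use a parenthesised subscript for the j-th column purely to distinguish it from the i-th row, since the spoken lecture uses "x j" for both), the data matrix, a general multivariate observation and a general variable look like this:

```latex
X_{n\times p}=
\begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1p}\\
x_{21} & x_{22} & \cdots & x_{2p}\\
\vdots & \vdots & \ddots & \vdots\\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{pmatrix},
\qquad
\mathbf{x}_i=
\begin{pmatrix} x_{i1}\\ x_{i2}\\ \vdots\\ x_{ip}\end{pmatrix}_{p\times 1},
\qquad
\mathbf{x}_{(j)}=
\begin{pmatrix} x_{1j}\\ x_{2j}\\ \vdots\\ x_{nj}\end{pmatrix}_{n\times 1},
\qquad i=1,\dots,n,\quad j=1,\dots,p .
```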
So when you go row-wise, you are talking about different multivariate observations, and when you go column-wise, you are talking about the n observations on the different variables, that is, n observations on variable 1 up to variable p. What have we assumed here? We have assumed that we have not collected the data, we are only planning to collect it, and as a result all the entries in this matrix are random in nature. What happens when you collect the data? You will collect data on the same variables X 1 to X p, and your data matrix will again be n cross p: x 1 1, x 2 1, ..., x n 1; x 1 2, x 2 2, ..., x n 2; up to x 1 p, x 2 p, ..., x n p. Here also the general observation x i, which is x i 1, x i 2, ..., x i j, ..., x i p, is a p cross 1 vector, and the general variable x j is x 1 j, x 2 j, ..., x i j, ..., x n j. Everything remains the same, except that it is after data collection.

Then my question is: what is the difference between the first matrix and the second matrix? The second matrix is after data collection and the first matrix is before data collection. In the second matrix the values are known; from the data-collection point of view they are fixed values. In the first matrix you do not know what values you will get. That is why, for the first matrix, when you do not know anything, you expect some value for each variable (the mean), you also expect some deviation of the different values from the mean (that will be the variance), and you expect that two variables will co-vary (that will be the covariance). That is the population point of view.

Now let us look at this data set. If I ask you what the data matrix X is: n is 12. We are excluding the month for the time being, although month could also be treated as a variable. Excluding month, there are 5 variables, so your data matrix is 12 cross 5; all the variables are measured for 12 observations. So the left-hand-side matrix, the data matrix, is X of order 12 cross 5, where 12 stands for n and 5 stands for p.

First we will look at the expectation of each variable, that is, its expected value, and then we will see what happens when we collect data. Now see this slide: as there are p variables, there are p means. These are population parameters; mu is the vector which contains the p means of the p variables. It is the mean vector, a parameter vector of the population. I have written it as the expected value of X 1, the expected value of X 2, ..., the expected value of X j, ..., the expected value of X p. You have already seen what the expected value of a variable is: if the variable is discrete, you take the sum over all x of x times f(x). Here we have many variables, so we write the expected value of X j, for a particular j-th variable, as the sum over all x j of x j times f(x j), when it is a discrete variable. When it is continuous, what do you write?
You have seen the continuous case: you write the integral from minus infinity to plus infinity of x f(x) dx. So for the continuous case I write the integral from minus infinity to plus infinity of x j f(x j) dx j. In both cases j runs from 1 to p; this is your expected value. So what is mu? mu is a p cross 1 vector, mu 1, mu 2, ..., mu j, ..., mu p, which is nothing but the expected value of X 1, the expected value of X 2, ..., the expected value of X j, ..., the expected value of X p, and the expectation is calculated as above.

So we have two kinds of matrices: one before data collection, with respect to which we developed the mean vector, and the data matrix after collection, like the 12 cross 5 one I have already shown you. After data collection we want to compute the average values, because we have seen in the univariate case that x bar is the estimate of mu. So in our case we write x bar, which is x 1 bar, x 2 bar, ..., x j bar, ..., x p bar, the vector of averages of the p variables. What is x bar in the univariate case? You have written 1 by n times the sum over i = 1 to n of x i. In the same manner I can write the first component here as 1 by n times the sum over i = 1 to n of x i 1, where 1 stands for the variable and i stands for the observation. Similarly, the second component is 1 by n times the sum over i = 1 to n of x i 2, the j-th component is 1 by n times the sum over i = 1 to n of x i j, and the last one is 1 by n times the sum over i = 1 to n of x i p. This is the average vector for the sample collected on the p variables.

We do not want to compute each mean individually; instead we want to do the calculation with matrices and vectors. What do we do? In order to calculate x bar we take the data matrix. Our data matrix X is n cross p: x 1 1, x 2 1, ..., x n 1; x 1 2, x 2 2, ..., x n 2; up to x 1 p, x 2 p, ..., x n p. That is the sample data, of order n cross p. You want to compute x bar from this sample data, and x bar is a p cross 1 vector, x 1 bar, x 2 bar, ..., x j bar, ..., x p bar, and you know the general formula: x j bar is 1 by n times the sum over i = 1 to n of x i j.

Now you want to calculate all of these in one go using matrices. Remember, I have an n cross p matrix and I want a p cross 1 vector. So suppose I create one vector, call it the unit vector, which contains all ones: there are n ones, so it is n cross 1. I want to use this unit vector with the data matrix in such a manner that the computational formula above is applied and I get the p cross 1 vector. From the matrix multiplication point of view, p cross n times n cross 1 gives p cross 1, because the number of columns of the first matrix equals the number of rows of the second.
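In symbols, the population mean vector and the sample averages described so far are (simply restating the lecture's formulas, with f denoting the probability mass or density function of X j):

```latex
\mu=
\begin{pmatrix}\mu_1\\ \vdots\\ \mu_p\end{pmatrix}
=\begin{pmatrix}E(X_1)\\ \vdots\\ E(X_p)\end{pmatrix},
\qquad
E(X_j)=
\begin{cases}
\displaystyle\sum_{\text{all }x_j} x_j\, f(x_j) & \text{(discrete)}\\[10pt]
\displaystyle\int_{-\infty}^{+\infty} x_j\, f(x_j)\,dx_j & \text{(continuous)}
\end{cases}
\qquad
\bar{x}_j=\frac{1}{n}\sum_{i=1}^{n} x_{ij},\quad j=1,\dots,p .
```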
To get that p cross 1 result, I have to transpose the data matrix. I have a matrix X of order n cross p; if I take the transpose, X transpose is p cross n, with rows and columns interchanged. So I take X transpose, which is p cross n, and multiply it with the unit vector, which is n cross 1. The n's cancel out and the resulting matrix is p cross 1.

Let us see what happens. To reduce the complexity of repeating the same calculation, take p = 2 and n = 3. Then my X matrix is 3 cross 2: x 1 1, x 2 1, x 3 1 in the first column and x 1 2, x 2 2, x 3 2 in the second column, as there are 3 observations. Create a unit vector (1, 1, 1); three ones are required because n is 3. I want to multiply X transpose by this unit vector. X transpose has first row x 1 1, x 2 1, x 3 1 and second row x 1 2, x 2 2, x 3 2, because I have transposed X. Multiply this by (1, 1, 1). X transpose is 2 cross 3 and the unit vector is 3 cross 1, so the resulting matrix is 2 cross 1. From the matrix multiplication point of view, the first entry is x 1 1 times 1 plus x 2 1 times 1 plus x 3 1 times 1, which is the sum over i = 1 to 3 of x i 1, and the second entry is the sum over i = 1 to 3 of x i 2. We have seen earlier that x j bar is 1 by n times the sum over i = 1 to n of x i j, so if I divide this result by n, I get the mean vector. That means I can write: x bar equals 1 by n times X transpose times the unit vector.

This formulation is much better, because in the multivariate domain you can forget about hands-on element-by-element calculation. You can use MATLAB for the computational part, and nowadays Excel is also very powerful, so you can use Excel as well. Using Excel you can apply this formulation very easily. I will give you a tutorial and you will have to do this; if you have problems, talk to me, in my chamber also, no problem. So will we be able to compute the mean vector for given data?

Now see this slide. Suppose you take the first 3 values of the first variable and the first 3 values of the second variable and apply this matrix multiplication. We are taking only the 2-variable case with 3 observations: 10, 12, 11 for the first variable and 100, 110, 105 for the second. I want to get x bar, that is, x 1 bar and x 2 bar. You can very easily proceed directly: the total of 10, 12 and 11 is 33, and the total of 100, 110 and 105 is 315. If you divide by 3, the mean of the first variable is 11 and that of the second is 105. So x bar is (11, 105); you can do it that way too. But I am saying use the formulation x bar = 1 by n times X transpose times the unit vector. If you do that, 1 by 3 times X transpose, which has rows (10, 12, 11) and (100, 110, 105), times (1, 1, 1) is 1 by 3 times (33, 315), which is the same thing, (11, 105).

It may seem that there is not much difference in the calculation here; the reason is that the number of observations is small and the number of variables is also small. But when you have a large number of variables with a large number of observations, individual calculation is not practical; go straight for matrix multiplication. We will be using simple matrix multiplication, matrix inversion and related operations throughout these lectures.
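As a quick sketch of this computation (my own illustration, not part of the lecture's materials; it assumes NumPy rather than the MATLAB or Excel route mentioned above), the formula x bar = 1 by n times X transpose times the unit vector, applied to the same 3 cross 2 example, looks like this:

```python
import numpy as np

# Data matrix X (n = 3 observations, p = 2 variables), values taken from the lecture.
X = np.array([[10, 100],
              [12, 110],
              [11, 105]], dtype=float)

n, p = X.shape
ones = np.ones((n, 1))        # the n x 1 unit vector of all ones

x_bar = (X.T @ ones) / n      # p x 1 mean vector: (1/n) X' 1
print(x_bar.ravel())          # -> [ 11. 105.]

# Sanity check against the direct column means.
print(X.mean(axis=0))         # -> [ 11. 105.]
```

The last line confirms that the matrix formulation reproduces the ordinary column averages.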
So that is the mean vector, from the population point of view and from the sample point of view. From the sample point of view, the sample average vector is the estimate of the population mean vector.

Now come to the covariance matrix. You have to understand it now: although the ideas are simple, the physical meaning of each item must be understood, because later on when we talk about the covariance matrix we will not come back to its physical meaning; we will simply say "covariance matrix", and you should be able to grasp immediately what it is so that you can relate it to the discussion at that time.

So we are interested in the covariance matrix. Let us discuss it from the population point of view first, that is, the population covariance matrix. If X is a random variable in the univariate case and I ask you what the variance of X is, you will say it is the expected value of (X minus mu) squared. You have also seen that for the discrete case it is the sum over all x of (x minus mu) squared times f(x), and for the continuous case it is the integral from minus infinity to plus infinity of (x minus mu) squared times f(x) dx. This is the variance component, sigma squared.

Now I make a simple alteration: instead of X I write X j, meaning I want to know the variance of X j. Then you will write the expected value of (X j minus mu j) squared; only the subscript j is added everywhere. For the discrete case you write the sum over all x j of (x j minus mu j) squared times f(x j), and for the continuous case the integral from minus infinity to plus infinity of (x j minus mu j) squared times f(x j) dx j. If you put j = 1, 2, ..., p in this formulation, whether discrete or continuous, then for j = 1 you get sigma 1 squared, for j = 2 you get sigma 2 squared, and so on up to sigma p squared. So the variance components of all the variables come from this equation.

But we have p variables, and we are also assuming that these p variables are not independent of each other. If X 1 depends on X 2, or X 1 and X 2 are not independent, what happens? For example, height versus weight of people: their covariance is nonzero, which means they are correlated. If someone's height is more than another's, the weight is naturally also usually more; there are other factors that govern weight, but this is generally the case. When the variables are not independent, they are dependent, which means that when X 1 varies, X 2 also varies. This simultaneous variability is what we mean when we say X 1 and X 2 co-vary: they vary together. Covariance means an association between the realised values of X 1 and X 2.

Now I write again: the variance of X j is the expected value of (X j minus mu j) squared. With a small manipulation I can write a related formulation, the covariance between X j and X k, which is the expected value of (X j minus mu j) times (X k minus mu k).
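Written side by side in the lecture's notation, the two definitions just stated are:

```latex
\operatorname{Var}(X_j)=\sigma_j^{2}
=E\!\left[(X_j-\mu_j)^{2}\right]
=E\!\left[(X_j-\mu_j)(X_j-\mu_j)\right],
\qquad
\operatorname{Cov}(X_j,X_k)=E\!\left[(X_j-\mu_j)(X_k-\mu_k)\right].
```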
You can see the similarity between the two. When I talk about the variance of X j, I am writing (X j minus mu j) times (X j minus mu j); that is where the square comes from. In other words, it is the same variable paired with itself, as if we had two copies of the same variable, X 1 with X 1. The covariance is the general case: as I said, if X j varies there is a chance that X k also varies, and that is why I want to know the association between the two. For the discrete case we can write the sum over all pairs of values of x j and x k of (x j minus mu j) times (x k minus mu k) times the probability function. What do we write for the probability function: f of x j and x k separately, or the joint probability? You have to use the joint probability here, and in the continuous case you have to write a double integral, over all x j and all x k.

What is the notation for this? We will use sigma j k. We use sigma j for the standard deviation, sigma j j for the variance, and sigma j k for the covariance. Be careful about the notation: sigma j squared equals sigma j j. Later on we will be using sigma j j; sigma 1 1 is the variance component, so sigma 1 1 is sigma 1 squared. So sigma j j is the variance of X j, and sigma j k is the covariance between X j and X k.

So you have sigma j squared, you have sigma j k, and you have p variables. Can we not now write down the population covariance matrix? We are now in a position to do so. As there are p variables, there is a covariance for every pair of variables. How many elements will there be? Let us write it down: we create a p cross p matrix, where p stands for the number of variables, with X 1, X 2, ..., X p labelling the rows and again X 1, X 2, ..., X p labelling the columns. For X 1 with X 1, the variability of a variable with itself is the variance, so this entry is sigma 1 1. For X 2 with X 2 it is sigma 2 2, and so on up to X p with X p, which is sigma p p. All the elements on the diagonal are variance components; that is the variance part of each variable, with sigma 1 1 equal to sigma 1 squared, sigma 2 2 equal to sigma 2 squared, and so on.

The off-diagonal elements are the covariances: sigma 1 2, ..., sigma 1 p, sigma 2 p, and so on. Note that I write sigma 1 2 instead of sigma 2 1 because we assume sigma j k equals sigma k j: it involves only the two variables, the j-th and the k-th, just taken in a different order. So the off-diagonal elements are the covariance part and the diagonal elements are the variance part. This resulting matrix we will denote by capital sigma; keep in mind that whenever we use capital sigma, it is the population covariance matrix. So the population covariance matrix looks like this: sigma 1 1, sigma 1 2, ..., sigma 1 j, ..., sigma 1 p in the first row; sigma 1 2, sigma 2 2, ..., sigma 2 j, ..., sigma 2 p in the second row; and so on. If there are p variables, there are p cross p elements; the size of the matrix is p cross p.
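In symbols, the covariance element and the population covariance matrix just described are:

```latex
\sigma_{jk}=E\!\left[(X_j-\mu_j)(X_k-\mu_k)\right]=\sigma_{kj},
\qquad \sigma_{jj}=\sigma_j^{2},
\qquad
\Sigma_{p\times p}=
\begin{pmatrix}
\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p}\\
\sigma_{12} & \sigma_{22} & \cdots & \sigma_{2p}\\
\vdots & \vdots & \ddots & \vdots\\
\sigma_{1p} & \sigma_{2p} & \cdots & \sigma_{pp}
\end{pmatrix}.
```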
Now we need to know the sample covariance matrix. The population covariance matrix and the sample covariance matrix are very vital components of multivariate statistics: the covariance matrix for the population and for the sample.

Come to the sample part. In the sample case we denote the sample covariance matrix by S, which is also a p cross p matrix. Its elements I can write as s 1 1, s 1 2, ..., s 1 p in the first row; s 1 2, s 2 2, ..., s 2 p in the second row; and so on up to s 1 p, s 2 p, ..., s p p. The diagonal elements are the variance part and the off-diagonal elements are the covariance part. How do we calculate s 1 1, s 1 2 and all the other elements of this matrix? The general diagonal element is s j j, and a general off-diagonal element is s j k (or s k j). You can proceed in the same manner as the variance computation you developed in the univariate case.

For s j j, recall the univariate formula: 1 by (n minus 1) times the sum over i = 1 to n of (x i minus x bar) squared. In the population version you would subtract mu, but here mu is not available, so in the sample case we always subtract the sample average, and that is why we divide by n minus 1 instead of n (if we used mu, we would divide by n). So for the j-th variable I can write s j j as 1 by (n minus 1) times the sum over i = 1 to n of (x i j minus x j bar) times (x i j minus x j bar), which is the same thing as the square. Using this, I want to write s j k: s j k is 1 by (n minus 1) times the sum over i = 1 to n of, first, the j-th variable term as it is, (x i j minus x j bar), and second, the k-th variable term, (x i k minus x k bar).

What is happening here? If you look at the covariance or variance formulas, the original data matrix is being transformed, and we will capture that idea. You see x i j minus x j bar: for the j-th variable, every element is reduced by its average, and for the k-th variable also every element is reduced by its average. If this is the case, can I not write the data matrix in the following form? My original data matrix is X: x 1 1, x 2 1, ..., x n 1; x 1 2, x 2 2, ..., x n 2; ...; x 1 j, x 2 j, ..., x i j, ..., x n j; up to x 1 p, x 2 p, ..., x n p. You have computed x 1 bar, x 2 bar, ..., x j bar, ..., x p bar, and then a conversion takes place: the subtraction of the means, so that every observation is reduced by its corresponding mean value. So instead of writing "this minus this" every time, I write a matrix X star: x star 1 1, x star 2 1, ..., x star n 1, and in the same manner up to x star 1 p, x star 2 p, ..., x star n p, with general element x star i j, where x star i j is x i j minus x j bar. The mean-subtracted data matrix and X star are the same thing. Now if I use this in the formula, x i j minus x j bar becomes x star i j and the other term becomes x star i k, so the resulting element is 1 by (n minus 1) times the sum over i = 1 to n of x star i j times x star i k.
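The element-wise formula just derived corresponds, in matrix form, to S = 1 by (n minus 1) times X star transpose times X star, which is the matrix version the next class will build up. As a quick numerical check (my own sketch, not the lecture's; it assumes NumPy and reuses the 3 cross 2 data from the mean-vector example):

```python
import numpy as np

X = np.array([[10, 100],
              [12, 110],
              [11, 105]], dtype=float)
n, p = X.shape

x_bar = X.mean(axis=0)          # mean vector [11, 105]
Xc = X - x_bar                  # centred matrix X*: x*_ij = x_ij - x_j_bar

S = (Xc.T @ Xc) / (n - 1)       # p x p sample covariance matrix
print(S)
# [[ 1.  5.]
#  [ 5. 25.]]

# Cross-check with NumPy's covariance routine (rows = observations; divides by n-1 by default).
print(np.cov(X, rowvar=False))
```

Working by hand with the same data gives s 1 1 = 1 and s 2 2 = 25 for the variances and s 1 2 = s 2 1 = 5 for the covariance, which matches the output above.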
So this centering conversion takes place, and with a little more mathematics we will see how the whole covariance matrix comes out of it. For now, calculate this using this formulation. Take the same data points, the first 3 values of the 2 variables I have given you. You have already calculated the mean values; now calculate the variance and covariance part. Because there are only 2 variables, there will be only 1 covariance. In the next class we will see how, using the matrix multiplication formula, we can calculate the covariance matrix in one go, and then the correlation matrix and related quantities. Thank you.