Mod-01 Lec-10 Multivariate normal distribution

Today we discuss the multivariate normal distribution. Last class we saw the univariate normal distribution: if x is a random variable, its pdf is f(x) = (1/√(2πσ²)) e^(−½((x−μ)/σ)²), −∞ < x < ∞. This univariate pdf is characterized by the population parameters μ and σ. We want its counterpart in the multivariate domain, where the univariate x becomes the vector X = (X1, X2, …, Xp)', the univariate μ becomes the mean vector (μ1, μ2, …, μp)', and the univariate σ² becomes a p × p covariance matrix Σ. So for the multivariate normal distribution we want a density f(x) expressed in terms of the number of variables p, the mean vector μ, and the covariance matrix Σ; how to get there is today's discussion. We will cover the bivariate normal density function, the multivariate normal density function, and the properties of the multivariate normal density; if time permits, we will go on to statistical distance and constant density contours. To visualize the multivariate counterpart of the univariate pdf, consider the bivariate case. In this slide there are two variables, X1 and X2, and their joint density f(x1, x2). The figure shows a bell shape, but in three dimensions: the first two dimensions carry the values of the two variables and the third dimension carries the density values. If you take one more dimension, say three variables plus the density, you cannot visualize it pictorially.
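The bell-shaped joint density just described can be sketched numerically. This is a minimal illustration, assuming NumPy and independent standard normal marginals (μ1 = μ2 = 0, σ1 = σ2 = 1, ρ = 0; these values are assumed, not from the lecture):

```python
import numpy as np

# Bivariate normal density with independent standard-normal marginals.
def phi(x, mu=0.0, sigma=1.0):
    """Univariate normal pdf."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x1 = np.linspace(-5, 5, 201)
x2 = np.linspace(-5, 5, 201)
X1, X2 = np.meshgrid(x1, x2)

# With rho = 0 the joint density factors into the product of the marginals.
density = phi(X1) * phi(X2)

# The surface peaks at the mean and the grid sum approximates total mass 1.
peak = density.max()
total = density.sum() * (x1[1] - x1[0]) * (x2[1] - x2[0])
print(round(peak, 4), round(total, 4))
```

The peak value 1/(2π) ≈ 0.1592 is exactly the constant part of the bivariate density at the mean.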
Now, as I told you, the objective of the first part of today's lecture is to develop this multivariate density. To do so we will follow a systematic but simple path. Suppose you have two variables X1 and X2, the bivariate case, so X = (X1, X2)', the mean vector is μ = (μ1, μ2)', and the covariance matrix is the 2 × 2 matrix with entries σ11, σ12, σ12, σ22; I hope there is no problem with this nomenclature. Now look at the top figure in the slide, which is a scatter plot. What can you say about X1 and X2 from the scatter plot: are they correlated or not, does the cloud look like a circle or an ellipse? It is an ellipse-type cloud, but with no correlation pattern. We want to simplify the derivation by assuming no correlation, that is, σ12 = 0, or equivalently ρ12 = 0. If there is no correlation, the joint density of X1 and X2 is the product of the marginal densities of the two. All of us know that f(x1) = (1/√(2πσ1²)) e^(−½((x1−μ1)/σ1)²), and similarly for the second variable f(x2) = (1/√(2πσ2²)) e^(−½((x2−μ2)/σ2)²), with −∞ < xj < ∞ for j = 1, 2. Multiplying the two, the constant part you get is 1/((2π)^(2/2) (σ1² σ2²)^(1/2)): one factor of 2π from each marginal gives (2π)^(2/2), and the two variances give (σ1² σ2²)^(1/2).
Then, coming to the exponent part: since e^a · e^b = e^(a+b), the joint exponent is −½[((x1−μ1)/σ1)² + ((x2−μ2)/σ2)²]. So what is happening is that when we go from the univariate normal to the bivariate normal with no dependence structure, the density has two components, a constant part and an exponent part (exponent meaning e to the power of something). The joint distribution also has these same two parts; the general structure of the multivariate normal remains the same as the univariate, and only the contents and values of the two components differ. So we have found that if X1 and X2 are independent, the structure is as above. Now let us derive both the constant part and the exponent part from the population parameters. We have μ = (μ1, μ2)' and Σ with entries σ11, σ12, σ12, σ22, where σ11 = σ1² and σ22 = σ2². Take the determinant of Σ: for the 2 × 2 matrix with entries σ1², σ12, σ12, σ2², the determinant is this cross this minus this cross this, so |Σ| = σ1² σ2² − σ12². Now recall what we assumed in the earlier demonstration: σ12 = 0, taken purely for simplicity. If σ12 = 0, then |Σ| is nothing but σ1² σ2², and the square root of the determinant is |Σ|^(1/2) = σ1 σ2.
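The factorization just used, the joint density as the product of the two marginals when σ12 = 0, can be checked numerically. A small sketch, with parameter values assumed purely for illustration:

```python
import numpy as np

# Check that, for independent variables (sigma12 = 0), the product of the
# two univariate normal pdfs equals the bivariate normal density written
# with the (diagonal) covariance matrix.  Parameter values are assumed.
mu1, mu2 = 1.0, -2.0
s1, s2 = 1.5, 0.5

def f1(x):   # marginal of X1
    return np.exp(-0.5 * ((x - mu1) / s1) ** 2) / np.sqrt(2 * np.pi * s1**2)

def f2(x):   # marginal of X2
    return np.exp(-0.5 * ((x - mu2) / s2) ** 2) / np.sqrt(2 * np.pi * s2**2)

def f_joint(x1, x2):
    """Bivariate normal density with covariance diag(s1^2, s2^2)."""
    const = 1.0 / ((2 * np.pi) ** (2 / 2) * np.sqrt(s1**2 * s2**2))
    expo = -0.5 * (((x1 - mu1) / s1) ** 2 + ((x2 - mu2) / s2) ** 2)
    return const * np.exp(expo)

# The two expressions agree at arbitrary points.
for x1, x2 in [(0.0, 0.0), (1.0, -2.0), (2.5, -1.0)]:
    assert np.isclose(f1(x1) * f2(x2), f_joint(x1, x2))
print("product of marginals matches joint density")
```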
Then the constant part: in our independent bivariate density function we found the constant part is 1/((2π)^(2/2) (σ1² σ2²)^(1/2)), which I can now write as 1/((2π)^(2/2) |Σ|^(1/2)), using the determinant of the covariance matrix. Now suppose you have one more variable, three variables X = (X1, X2, X3)', with Σ = diag(σ1², σ2², σ3²), assuming all the variables are independent. Then again f(x1, x2, x3) = f(x1) f(x2) f(x3). Multiplying in the same way, the constant term is 1/((2π)^(3/2) (σ1² σ2² σ3²)^(1/2)), and if you take the determinant of this diagonal Σ you get σ1² σ2² σ3², so |Σ|^(1/2) = (σ1² σ2² σ3²)^(1/2). If you now increase to p variables, the dimension changes accordingly and |Σ|^(1/2) takes care of the variance part of the constant. You see that with two variables the power of 2π is 2/2 and with three variables it is 3/2, so with p variables it will be p/2, and whether two or three or p variables, the product of variances is replaced by |Σ|^(1/2). So the constant part is 1/((2π)^(p/2) |Σ|^(1/2)); with the one assumption of independent variables we have shown this is the case. Now what happens to the exponent term? In the two-variable case we found the exponent term is −½[((x1−μ1)/σ1)² + ((x2−μ2)/σ2)²].
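The claim that |Σ|^(1/2) replaces the product of variances can be verified for a diagonal covariance matrix. A quick check, with variances assumed for illustration:

```python
import numpy as np

# For a diagonal covariance matrix the determinant is the product of the
# variances, so the constant 1/((2*pi)^(p/2) * |Sigma|^(1/2)) reduces to
# 1/((2*pi)^(p/2) * (s1^2 * ... * sp^2)^(1/2)).  Variances are assumed.
variances = np.array([1.5, 0.8, 2.0]) ** 2       # p = 3 independent variables
Sigma = np.diag(variances)

p = len(variances)
const_from_det = 1.0 / ((2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma)))
const_from_prod = 1.0 / ((2 * np.pi) ** (p / 2) * np.sqrt(variances.prod()))

assert np.isclose(const_from_det, const_from_prod)
print(round(const_from_det, 6))
```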
So this is the exponent term: each variable has its mean subtracted and is divided by its standard deviation, and the result is squared. Let me explain from the univariate case, that will be better. In the univariate case f(x) = (1/√(2πσ²)) e^(−½((x−μ)/σ)²), so the exponent can be written as −½ (x−μ)(σ²)^(−1)(x−μ): you see, the squared term (x−μ)² divided by σ² is the same as (x−μ) times (σ²)^(−1) times (x−μ). Now go to the multivariate case: x is replaced by the vector X, μ by the vector μ, and σ² by Σ. In matrix multiplication the square becomes a quadratic form, transpose times the vector, so the exponent in the multivariate domain can be written as −½ (x−μ)' Σ^(−1) (x−μ), with (σ²)^(−1) replaced by Σ^(−1). From the univariate normal distribution we have taken the exponent part and claimed that going in the same manner to the multivariate case gives this quantity; is it so? Let us check for X = (X1, X2)', μ = (μ1, μ2)' and Σ = diag(σ1², σ2²), the independent case, because under this condition we already know what the bivariate normal density function is. So write down −½ (x−μ)' Σ^(−1) (x−μ): here x−μ is the 2 × 1 vector (X1−μ1, X2−μ2)', and its transpose is 1 × 2.
So you have the 1 × 2 row (X1−μ1, X2−μ2), then the inverse of diag(σ1², σ2²), which is 2 × 2, then the 2 × 1 column (X1−μ1, X2−μ2)'. The resultant dimensions: 1 × 2 times 2 × 2 gives 1 × 2, and 1 × 2 times 2 × 1 gives 1 × 1, a scalar, so raising e to this power and multiplying by the constant gives a density value. Now, how do you calculate the inverse? If A is a 2 × 2 matrix with entries a11, a12, a21, a22, then A^(−1) = adj(A)/|A|, where the adjoint is the transpose of the matrix of cofactors of A. In our case A is nothing but Σ = diag(σ1², σ2²), and we have already seen its determinant is σ1² σ2², the product of the two diagonal entries. What is a cofactor? To get the cofactor of an entry, you cross out its row and column, and whatever is left, with a sign convention, is the cofactor: for entry aij the cofactor is (−1)^(i+j) times the determinant of the remaining part of the matrix. Since we have a 2 × 2 matrix, crossing out one row and one column leaves only a single entry. So the cofactor of σ1² is (−1)^(1+1) σ2² = σ2², the cofactors of the two zero entries are 0, and the cofactor of σ2² is σ1².
So the cofactor matrix is [σ2², 0; 0, σ1²], and its transpose is the same, because the off-diagonal elements are 0, so interchanging rows and columns changes nothing. Then the inverse is the adjoint divided by the determinant: Σ^(−1) = (1/(σ1² σ2²)) [σ2², 0; 0, σ1²]. Now calculate the exponent: −½ (X1−μ1, X2−μ2) · (1/(σ1² σ2²)) [σ2², 0; 0, σ1²] · (X1−μ1, X2−μ2)'. Doing the first multiplication, the 1 × 2 row times the 2 × 2 matrix, you get (1/(σ1² σ2²)) times the 1 × 2 row (σ2²(X1−μ1), σ1²(X2−μ2)): this into this plus this into this, with the zeros dropping out. Keeping the factor 1/(σ1² σ2²) aside to manipulate later, multiply this 1 × 2 row by the 2 × 1 column to get a 1 × 1 quantity: σ2²(X1−μ1)² + σ1²(X2−μ2)², because (X1−μ1) times (X1−μ1) gives the square. Now if you divide this by σ1² σ2², what do you get? You get −½[((X1−μ1)/σ1)² + ((X2−μ2)/σ2)²], exactly what we found earlier from f(x1) × f(x2).
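The adjoint-over-determinant inversion and the resulting exponent can be checked numerically. A sketch with values of σ1, σ2, μ and x assumed for illustration:

```python
import numpy as np

# Invert the diagonal 2x2 covariance matrix by the adjoint (transpose of
# cofactors over determinant), then check that the quadratic form
# (x - mu)' Sigma^{-1} (x - mu) reduces to the sum of squared z-scores.
s1, s2 = 2.0, 0.5
mu = np.array([1.0, -1.0])
Sigma = np.array([[s1**2, 0.0],
                  [0.0, s2**2]])

det = s1**2 * s2**2                        # |Sigma| for the diagonal case
adj = np.array([[s2**2, 0.0],              # transpose of the cofactor matrix
                [0.0, s1**2]])
Sigma_inv = adj / det

assert np.allclose(Sigma_inv, np.linalg.inv(Sigma))

x = np.array([3.0, 0.0])
quad = (x - mu) @ Sigma_inv @ (x - mu)     # (x-mu)' Sigma^{-1} (x-mu)
by_hand = ((x[0] - mu[0]) / s1) ** 2 + ((x[1] - mu[1]) / s2) ** 2
assert np.isclose(quad, by_hand)
print(round(quad, 4))
```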
Just check against what we got earlier when we multiplied the two marginals: 1/((2π)^(2/2) (σ1² σ2²)^(1/2)) times e^(−½[((x1−μ1)/σ1)² + ((x2−μ2)/σ2)²]). Here you are getting the same thing. So although this is not a derivation, it is a proof the other way around: for the bivariate case I can write the bivariate normal distribution in the matrix form. If this carries over to the multivariate case, then for X = (X1, X2, …, Xp)' it becomes f(x) = (1/((2π)^(p/2) |Σ|^(1/2))) e^(−½ (x−μ)' Σ^(−1) (x−μ)), with −∞ < xj < ∞ for j = 1, 2, …, p. This is how the multivariate normal density function is defined. Now, what is the bivariate normal density when the covariance matrix is [σ1², σ12; σ12, σ2²], that is, when there is covariance? Can you not find out? The determinant is |Σ| = σ1² σ2² − σ12², and since covariance is correlation times the standard deviations, σ12 = ρ12 σ1 σ2, this can be written as σ1² σ2² − (ρ12 σ1 σ2)² = σ1² σ2² (1 − ρ12²). And the inverse is again 1 over the determinant times the transpose of the cofactor matrix: Σ^(−1) = (1/(σ1² σ2² − σ12²)) [σ2², −σ12; −σ12, σ1²]. Then the exponent part is −½ (x−μ)' Σ^(−1) (x−μ).
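The general p-variate density just written can be implemented directly; as a sanity check, here it is integrated numerically over a grid for a correlated bivariate case (the μ, σ and ρ values are assumed for illustration):

```python
import numpy as np

# Direct implementation of the p-variate normal density
#   f(x) = (2*pi)^(-p/2) |Sigma|^(-1/2) exp(-1/2 (x-mu)' Sigma^{-1} (x-mu)),
# checked by numerically integrating a correlated bivariate case to ~1.
def mvn_pdf(x, mu, Sigma):
    p = len(mu)
    d = x - mu
    const = (2 * np.pi) ** (-p / 2) * np.linalg.det(Sigma) ** (-0.5)
    return const * np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d)

mu = np.array([0.0, 0.0])
s1, s2, rho = 1.0, 1.5, 0.6
Sigma = np.array([[s1**2, rho * s1 * s2],
                  [rho * s1 * s2, s2**2]])

# Riemann-sum integration over a grid wide enough to capture the mass.
xs = np.linspace(-7, 7, 101)
h = xs[1] - xs[0]
total = sum(mvn_pdf(np.array([a, b]), mu, Sigma) * h * h
            for a in xs for b in xs)
print(round(total, 3))
```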
This equals −½ (X1−μ1, X2−μ2) · (1/(σ1² σ2² − σ12²)) [σ2², −σ12; −σ12, σ1²] · (X1−μ1, X2−μ2)', correct? So if you further manipulate, multiply the last two parts first, the 2 × 2 matrix times the 2 × 1 column: the first row gives σ2²(X1−μ1) − σ12(X2−μ2), and the second row gives −σ12(X1−μ1) + σ1²(X2−μ2), so you have a 2 × 1 column. Then, keeping −½ · 1/(σ1² σ2² − σ12²) outside, multiply the 1 × 2 row by this 2 × 1 column to get a 1 × 1 quantity: this into this plus this into this. You get σ2²(X1−μ1)² from the first product, then −σ12(X1−μ1)(X2−μ2) twice, once from each cross term, and σ1²(X2−μ2)² from the last; that is the total. So further manipulation gives the exponent as −½ · (1/(σ1² σ2² − σ12²)) [σ2²(X1−μ1)² − 2σ12(X1−μ1)(X2−μ2) + σ1²(X2−μ2)²].
If you divide the bracketed quantity by σ1² σ2², taking it as a common factor, what happens? You get −½ · (σ1² σ2²/(σ1² σ2² − σ12²)) [((X1−μ1)/σ1)² − 2(σ12/(σ1 σ2))((X1−μ1)/σ1)((X2−μ2)/σ2) + ((X2−μ2)/σ2)²]. Now, what is the quantity σ12/(σ1 σ2)? That is ρ12. And you have already seen that σ12² = σ1² σ2² ρ12², so the factor in front becomes σ1² σ2²/(σ1² σ2²(1 − ρ12²)), and the σ1² σ2² cancels out. So the exponent is −(1/(2(1 − ρ12²))) [((X1−μ1)/σ1)² − 2ρ12 ((X1−μ1)/σ1)((X2−μ2)/σ2) + ((X2−μ2)/σ2)²]. Compare this with the independent case: if I put ρ12 = 0, the cross term becomes 0 and the factor 1/(1 − ρ12²) becomes 1, so the resultant quantity is −½[((X1−μ1)/σ1)² + ((X2−μ2)/σ2)²]. So this is the exponent part, clear? That means in the reverse way we have also proved that this quantity follows the distribution we thought of. Now the question comes: what is the shape of this? It is an ellipse, correct. Now see this diagram, a very important concept.
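The equality between the expanded exponent with ρ12 and the matrix form −½(x−μ)'Σ^(−1)(x−μ) can be checked at random points. A sketch with parameter values assumed for illustration:

```python
import numpy as np

# Verify that the expanded bivariate exponent
#   -1/(2(1-rho^2)) [z1^2 - 2*rho*z1*z2 + z2^2],  z_i = (x_i - mu_i)/s_i,
# equals the matrix form -1/2 (x-mu)' Sigma^{-1} (x-mu).
mu1, mu2 = 2.0, -1.0
s1, s2, rho = 1.2, 0.7, -0.4
Sigma = np.array([[s1**2, rho * s1 * s2],
                  [rho * s1 * s2, s2**2]])
Sigma_inv = np.linalg.inv(Sigma)

rng = np.random.default_rng(0)
for _ in range(100):
    x1, x2 = rng.uniform(-5, 5, size=2)
    z1, z2 = (x1 - mu1) / s1, (x2 - mu2) / s2
    expanded = -(z1**2 - 2 * rho * z1 * z2 + z2**2) / (2 * (1 - rho**2))
    d = np.array([x1 - mu1, x2 - mu2])
    matrix_form = -0.5 * d @ Sigma_inv @ d
    assert np.isclose(expanded, matrix_form)
print("expanded exponent matches matrix form")
```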
Here is my equation, and we started with the scatter plot of X1 and X2, which suggested there is no dependency between the two variables, meaning the covariance is 0: we assumed σ12 = 0. Setting the exponent part equal to a constant, ((x1−μ1)/σ1)² + ((x2−μ2)/σ2)² = c, is exactly what gives this ellipse. So when you plot this in X1 and X2, the exponent part traces out an ellipse, and in fact you always get an ellipse, because the general quadratic form here is the general equation of an ellipse in two dimensions; please keep this in mind. If I plot this, what does the figure look like? It depends on ρ12, and the center is at (μ1, μ2): usually the figure is drawn with the data shifted so that (μ1, μ2) plays the role of the origin. Depending on whether the value of ρ12 is positive, negative, or 0, the picture changes. If ρ12 = 0 the diagram is like this: the variables are independent, so the major and minor axes of the ellipse lie along the original X1 and X2 axes, and here the major axis lies along the X2 axis because the variability along X2 is more and the variability along X1 is less. If instead the variability along X1 were more, still with independence, the ellipse would be elongated along X1; keep in mind they are independent. When ρ12 is greater than 0, X2 increases as X1 increases, so the ellipse is inclined like this.
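That points on such an ellipse all share one density value, a constant density contour, can be confirmed numerically for the independent case. The parameter values below are assumed for illustration:

```python
import numpy as np

# For the independent case, points on the ellipse
#   ((x1-mu1)/s1)^2 + ((x2-mu2)/s2)^2 = c
# all have the same density value: a constant density contour.
mu1, mu2 = 0.0, 0.0
s1, s2 = 1.0, 2.0          # more variability along X2: major axis along X2
c = 1.5

def density(x1, x2):
    const = 1.0 / (2 * np.pi * s1 * s2)
    return const * np.exp(-0.5 * (((x1 - mu1) / s1) ** 2
                                  + ((x2 - mu2) / s2) ** 2))

# Parametrize the ellipse: x1 = mu1 + sqrt(c)*s1*cos(t),
#                          x2 = mu2 + sqrt(c)*s2*sin(t).
t = np.linspace(0, 2 * np.pi, 50)
x1 = mu1 + np.sqrt(c) * s1 * np.cos(t)
x2 = mu2 + np.sqrt(c) * s2 * np.sin(t)

values = density(x1, x2)
assert np.allclose(values, values[0])      # constant along the contour
print(round(values[0], 6))
```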
The ellipse is inclined because its major and minor axes are no longer parallel to the original X1 and X2 axes: as one variable increases, so does the other. The other way, when ρ12 is less than 0, the ellipse tilts in the opposite direction. In one of the slides I think I showed you a picture where the X1 versus X2 data form a circle; this is also a bivariate, independent case, but with σ1² = σ2². Now, the question is, how do you know the axes of this ellipse, first their directions and second their lengths? Let us see some of the slides; I will come back to how to determine the axes and lengths. What I request of all of you is that in order to understand the axes you have to know a little bit of matrix algebra, namely eigenvalues and eigenvectors; I will show you eigenvalues, eigenvectors, and then the axes next class. Now see one example: a process is designed to produce laminar aluminum sheet of length X1 and breadth X2, with the given population parameters; obtain its bivariate normal distribution. This is the answer, and I am sure you will be able to find it: if you start from the beginning in the same manner we have described, you will ultimately end up with this answer. With this we come to the properties. The multivariate normal distribution, MND for short, which we denote N_p(μ, Σ), has many useful properties; I am describing some of them.
Now see the first property: if X is multivariate normal, then all the variables individually are univariate normal. Obviously, when X1 through Xp are simultaneously multivariate normal, then X1 is univariate normal, X2 is univariate normal, and so on up to Xp. That means if X ~ N_p(μ, Σ), where μ = (μ1, μ2, …, μp)' and Σ has diagonal entries σ1², σ2², …, σp² along with the covariance components, then any one Xj is univariate normal with mean μj and variance σj², for j = 1, 2, …, p; that σj² comes from the jth diagonal entry of Σ. This is your first property. What is the second property? If X is multivariate normal, then any subset you take is also multivariate normal. What do you mean by this? Suppose X = (X1, X2, …, Xq, …, Xp)'; partition it into two subsets, big X1 = (X1, …, Xq)', which is q × 1, and big X2, which is (p − q) × 1. Then what we want to say is that X1, the q-variable vector, is multivariate normal in q dimensions. With what μ and Σ? Write μ_q for the first q components of μ, and Σ_q for the leading q × q block of Σ, the sub-matrix with entries σ11, σ12, …, σ1q in the first row down to σq1, …, σqq. So if you take a subset, consider the corresponding subset of the parameters, and its distribution is multivariate normal with those parameters. The third property is very, very useful.
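The first two properties can be illustrated by simulation: sample from a trivariate normal and compare the empirical parameters of a single coordinate, and of a two-variable subset, against the corresponding sub-blocks of μ and Σ. The μ and Σ values below are assumed for illustration:

```python
import numpy as np

# Empirical check of properties 1 and 2 on an assumed trivariate normal.
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 2.0, 0.3],
                  [0.2, 0.3, 1.5]])

rng = np.random.default_rng(42)
X = rng.multivariate_normal(mu, Sigma, size=200_000)

# Property 1: X_2 alone is univariate normal with mean mu_2, variance sigma_22.
assert abs(X[:, 1].mean() - mu[1]) < 0.02
assert abs(X[:, 1].var() - Sigma[1, 1]) < 0.05

# Property 2: the subset (X_1, X_2) has the leading 2x2 block of Sigma.
sub_cov = np.cov(X[:, :2].T)
assert np.allclose(sub_cov, Sigma[:2, :2], atol=0.05)
print("marginal and subset parameters match")
```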
The third property, which can be exploited like anything in your research, says: if X is multivariate normal, any linear combination of the Xj is univariate normal. What does it mean? Create a vector of constants a = (a1, …, ap)'; then a' is 1 × p and X is p × 1, so the linear combination a'X = a1 X1 + a2 X2 + … + ap Xp is a scalar. The property says that the expected value of a'X is E(a'X) = a' E(X) = a'μ, which is nothing but a1 μ1 + a2 μ2 + … + ap μp, and the variance of a'X is Var(a'X) = a'Σa; you can prove it by writing it out, and dimensionally 1 × p times p × p times p × 1 gives 1 × 1. So the linear combination follows a univariate normal with mean a'μ and variance a'Σa. Now the fourth property: instead of one linear combination, what happens if you make q linear combinations? In one linear combination we took a1, a2, …, ap; instead, create a matrix A of constants with entries a11, a12, …, a1p, then a21, a22, …, a2p, down to aq1, aq2, …, aqp, arranged so that A is p × q. Now form A'X: A' is q × p and X is p × 1, so you get a q × 1 vector, whereas a single linear combination a'X is 1 × 1. This means you are ultimately creating the q × 1 vector whose components are a11 X1 + a12 X2 + … + a1p Xp, then a21 X1 + a22 X2 + … + a2p Xp, and so on.
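The third property can be illustrated numerically: compute a'μ and a'Σa exactly, then compare against the sample mean and variance of a'X over many draws. The vectors and matrix below are assumed for illustration:

```python
import numpy as np

# Property 3: for a'X with X ~ N_p(mu, Sigma), the mean is a'mu and the
# variance is a' Sigma a.  Checked empirically by sampling.
mu = np.array([1.0, -1.0, 0.5])
Sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, -0.2],
                  [0.0, -0.2, 0.5]])
a = np.array([1.0, 2.0, -1.0])

mean_theory = a @ mu                 # a' mu
var_theory = a @ Sigma @ a           # a' Sigma a

rng = np.random.default_rng(7)
X = rng.multivariate_normal(mu, Sigma, size=500_000)
y = X @ a                            # the linear combination a'X per sample

assert abs(y.mean() - mean_theory) < 0.02
assert abs(y.var() - var_theory) < 0.05
print(round(mean_theory, 2), round(var_theory, 2))
```

Here a'μ = −1.5 and a'Σa = 8.5, and the sampled moments of a'X land on those values.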
Continuing like this down to aq1 X1 + aq2 X2 + … + aqp Xp: each single combination is univariate normal, and collectively the q combinations are multivariate normal. That means A'X, the vector of q linear combinations, is distributed as N_q(A'μ, A'ΣA). Just check the transpose part against your book; some books may write it the other way around, but the ultimate aim is clear: since A'X is a q-variable vector, its variance component must be of order q × q and its mean component a q × 1 column vector, definitely. These four properties are important. You have seen that you calculate x̄ in the univariate case; x̄ is itself a linear combination of the n observations, (1/n) times their sum, and that is why, the observations being univariate normal, x̄ is normal and the σ²/n comes in there. Here we will see not just that x̄ but a big x̄, the mean vector. Next class I will explain statistical distance. Thank you very much.
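The closing remark about x̄ can be illustrated in the univariate case: the sample mean is the linear combination (1/n, …, 1/n)'X of the n observations, so x̄ ~ N(μ, σ²/n). A simulation sketch with μ, σ and n assumed for illustration:

```python
import numpy as np

# The sample mean x-bar of n iid N(mu, sigma^2) observations is itself
# normal with mean mu and variance sigma^2 / n, since it is the linear
# combination (1/n, ..., 1/n)'X.  Checked over many replications.
mu, sigma, n = 5.0, 2.0, 25
rng = np.random.default_rng(1)

# 100_000 replications of x-bar, each from a fresh sample of size n.
xbars = rng.normal(mu, sigma, size=(100_000, n)).mean(axis=1)

assert abs(xbars.mean() - mu) < 0.01
assert abs(xbars.var() - sigma**2 / n) < 0.01   # sigma^2 / n = 0.16
print(round(xbars.var(), 3))
```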
Info
Channel: nptelhrd
Views: 85,835
Keywords: Multivariate normal distribution
Id: YgExEVji7xs
Length: 57min 32sec (3452 seconds)
Published: Fri May 09 2014