Good morning. Today we will discuss Multivariate Analysis of Variance, which is popularly known as MANOVA. Today's contents are: the conceptual model for MANOVA, assumptions for MANOVA, modelling and estimation of parameters, model adequacy tests, interpretation of results, and references. We will see how much we can complete today; the remaining portion we will complete in the next class.

You have seen this slide in the last class. When l = 2, where l stands for the number of populations, and p = 1, we used the t-test: the difference between two population means is described through H0: mu_1 = mu_2 against H1: mu_1 not equal to mu_2. For p greater than or equal to 2, that is the multivariate case, we again have H0: mu_1 = mu_2 and H1: mu_1 not equal to mu_2, but mu_1 and mu_2 are now in the vector domain (when p = 2 each mean is a 2 x 1 vector), and we used Hotelling's T-square.

I also told you in the last class that when l is greater than 2, that is, three or more populations with one variable, you use ANOVA. We have discussed the one-way, two-way, three-way and multi-way ANOVA concepts. Now, if the number of populations is three or more and the number of variables is also two or more, what will happen? You will find that you require MANOVA, not ANOVA.

So, essentially, when do we go for ANOVA? In ANOVA you have L populations, indexed l = 1, 2, ..., L, and from each population you collect observations indexed i = 1, 2, ..., n. You are interested in the differences in population means across the L populations. You find the sample means x_1 bar, x_2 bar, and so on, and finally the grand mean x bar, as we saw in the last class.
Now, in the case of ANOVA, l is greater than or equal to 3: there are three or more populations with one variable (p = 1), and you go for ANOVA. When l is greater than or equal to 3 but p is two or more, you go for MANOVA. What hypothesis you will propose we will see later on, but before that let us see one example.

This is the example from the last class on ANOVA: process A, process B and process C produce steel washers with a certain quality characteristic, the outer diameter. So there we had one quality variable. Now we add one more quality variable, the inner diameter. That means the steel washers produced by the three processes are measured in terms of two quality characteristics, outer diameter and inner diameter. So here p = 2 and there are three processes, A, B and C. Our example case is therefore L = 3 and p = 2, and this is a case for MANOVA.
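To make the data layout concrete, here is a minimal sketch in Python of how such washer measurements could be arranged: L = 3 processes, 10 washers per process, p = 2 variables (outer and inner diameter). The numbers are randomly generated placeholders, not the lecture's actual measurements.

```python
import numpy as np

# Hypothetical washer data: one (n_l x p) array per process, columns = (OD, ID).
rng = np.random.default_rng(0)
data = {
    "A": rng.normal(loc=[20.0, 10.0], scale=0.3, size=(10, 2)),
    "B": rng.normal(loc=[20.8, 10.0], scale=0.3, size=(10, 2)),
    "C": rng.normal(loc=[20.0, 10.7], scale=0.3, size=(10, 2)),
}
for process, X in data.items():
    print(process, "sample mean vector (OD, ID):", X.mean(axis=0).round(2))
```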
As I told you in the last class, one useful way of looking into the data is the box plot. See the box plots of the two variables, outer and inner diameter, for the three processes A, B and C. You saw in the last class that for the outer diameter the mean differences are quite visible from the box plot. Now look at the lower figure, where the inner diameters are plotted as boxes. You can read off the mean inner diameter for process A, for process B and for process C. Apparently, if we want to describe the differences in means between the three processes, the means of processes A and B are not very different, but process C is different from processes A and B.
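If you want to draw this kind of picture yourself, a small matplotlib sketch like the one below produces side-by-side box plots of each variable per process; it reuses the hypothetical data layout from the earlier snippet, so the values shown are again only placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data, as in the earlier sketch: 3 processes x 10 washers x 2 variables.
rng = np.random.default_rng(0)
data = {
    "A": rng.normal(loc=[20.0, 10.0], scale=0.3, size=(10, 2)),
    "B": rng.normal(loc=[20.8, 10.0], scale=0.3, size=(10, 2)),
    "C": rng.normal(loc=[20.0, 10.7], scale=0.3, size=(10, 2)),
}

fig, axes = plt.subplots(2, 1, sharex=True)
for j, name in enumerate(["Outer diameter", "Inner diameter"]):
    axes[j].boxplot([X[:, j] for X in data.values()], labels=list(data.keys()))
    axes[j].set_ylabel(name)
plt.show()
```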
If you look at the outer diameter, the mean difference between processes A and C may not be significant at all, but process B is different from both A and C. So there are two pictures. From the outer diameter point of view, as we also saw in ANOVA, process B is different from processes A and C; but if we add the inner diameter, process C will perhaps differ from A and B in terms of its mean inner diameter. Now we want to see the difference collectively, in the mean vector, where the vector consists of the mean outer diameter as well as the mean inner diameter. Collectively, are the three processes different or not? That is what we will be testing through MANOVA.
So, we require certain notation to fully understand the application of ANOVA as well as MANOVA. In MANOVA one more dimension is added, because the number of variables is two or more, so p is greater than or equal to 2. In ANOVA the data structure is two-dimensional: one axis is the populations and the other axis is the observations, and for each population you have obtained certain observations. Somewhere in that table there is one observation X_il, the ith observation from the lth population, and it is a scalar quantity.
Now, in the case of MANOVA, what will happen? One side is again the populations and another side is the observations, and then we add one more dimension, the variables. If we denote the population by l, the observation by i and the variable by j, the general data structure becomes three-dimensional. What do we learn from any observation? Suppose this is my X_il. Then X_il is no longer the scalar quantity it was in ANOVA: for this cell, as j runs over 1, 2, 3, and so on up to the number of variables, X_il has that many values. So X_il is a vector, not a scalar. We write X_il = (X_il1, X_il2, ..., X_ilp)', a p x 1 vector with one entry per variable. That is the added complexity: one more dimension is there, and the whole exercise is now three-dimensional rather than two-dimensional.
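As a rough illustration of this three-dimensional structure, the data can be held as a cube indexed by population l, observation i and variable j, so that one cell X_il is a whole p-vector; the array name and placeholder values below are just for illustration.

```python
import numpy as np

L, n, p = 3, 10, 2            # populations, observations per population, variables
X = np.zeros((L, n, p))       # placeholder data cube, entry X[l, i, j]
x_il = X[0, 4, :]             # the 5th observation of the 1st population
print(x_il.shape)             # (2,): the p x 1 vector X_il of the lecture
```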
If this is the case, then since the general observation has p variables, you also have p mean values. The mean vector for population l is mu_l = (mu_l1, mu_l2, ..., mu_lp)', a p x 1 vector. As there are p variables, each population also has one covariance matrix: for the population we write Sigma_l (when we take a sample we will write S_l instead), and it is a p x p matrix. The index l varies from 1 to L, so there are Sigma_1, Sigma_2, ..., Sigma_L, and similarly mu_1, mu_2, ..., mu_L. For mathematical simplicity we will write X_il without spelling out the variable part; the variable part is implicit, because X_il is nothing but this p x 1 general observation vector. This is shown pictorially in the figure, and I think it is very clear. In ANOVA all we have is different populations and different observations; in MANOVA it is populations, observations and variables, and the general observation X_il is the vector just written, so keep this vector part in mind. MANOVA also makes assumptions, and MANOVA also does hypothesis testing, just like ANOVA.
What is the hypothesis here? What was the hypothesis in ANOVA? There we said H0: mu_1 = mu_2 = ... = mu_L, and H1: mu_l not equal to mu_m for at least one pair (l, m), with l = 1, 2, ..., L, m = 1, 2, ..., L and l not equal to m. In the MANOVA case the hypothesis reads the same, H0: mu_1 = mu_2 = ... = mu_L, but please keep in mind that by saying this we are saying

(mu_11, mu_12, ..., mu_1p)' = (mu_21, mu_22, ..., mu_2p)' = ... = (mu_L1, mu_L2, ..., mu_Lp)'.

That is the difference: in ANOVA the means are scalar quantities, in MANOVA they are vector quantities. The alternate hypothesis is mu_l not equal to mu_m, and by saying this you are saying that (mu_l1, mu_l2, ..., mu_lp)' is not equal to (mu_m1, mu_m2, ..., mu_mp)' for at least one pair (l, m). So, like ANOVA, we will also partition the general observation.
What is the general observation here? It is X_il. Let us see the ANOVA case first and the MANOVA case in parallel. In ANOVA, X_il is partitioned as X_il = mu + (mu_l - mu) + (X_il - mu_l), and we write this as X_il = mu + tau_l + epsilon_il, where mu is the grand mean and tau_l is the population effect. No model is perfect; it cannot predict exactly the same value for every observation, so there will be random errors, and epsilon_il is the random error part.

The same partitioning is possible in MANOVA. We write X_il = mu + (mu_l - mu) + (X_il - mu_l), but now every term is a p x 1 vector. Written out component by component,

(X_il1, X_il2, ..., X_ilp)' = (mu_1, mu_2, ..., mu_p)' + (mu_l1 - mu_1, mu_l2 - mu_2, ..., mu_lp - mu_p)' + (X_il1 - mu_l1, X_il2 - mu_l2, ..., X_ilp - mu_lp)'.

Note that the grand mean components are mu_1 to mu_p, indexed by variable, not by population. So the general observation on the left-hand side is partitioned into three components on the right-hand side: the grand mean vector, the population effect vector (mu_l1 - mu_1, ..., mu_lp - mu_p)', and the random error vector.

As before, we can also write this partitioning as X_il = mu + tau_l + epsilon_il: the general observation, the grand mean vector, the population effect vector and the random error vector. Let us write down tau_l: it is the vector (tau_l1, tau_l2, ..., tau_lp)', a p x 1 vector. So when we say the lth population effect, it relates to all the variables considered; here we are considering p variables, so tau_l has p components.
Suppose we frame the hypotheses as H0: mu_1 = mu_2 = ... = mu_L and H1: mu_l not equal to mu_m for at least one pair (l, m). Then, using the tau_l concept, you can write the null hypothesis in another way. What is tau_l? It is mu_l - mu. If H0 is true, all the means are equal; then what is the grand mean? The grand mean is mu = (n_1 mu_1 + n_2 mu_2 + ... + n_L mu_L) / (n_1 + n_2 + ... + n_L). If all the means are equal, every mu_l in the numerator is the same, so the numerator becomes (n_1 + n_2 + ... + n_L) times that common mean, the sample sizes cancel, and each mu_l equals the grand mean mu. So mu_l = mu if H0 is true, which indicates tau_l = mu_l - mu = 0. That means we can state the null hypothesis as tau_l = 0 for l = 1, 2, ..., L, and the alternate hypothesis as tau_l not equal to 0 for at least one l. Whether you test the hypothesis in terms of mu or in terms of tau_l, you are actually doing the same thing.
So, in MANOVA we do the partitioning in the same manner as in ANOVA. We have partitioned the observation; next we will see how to partition the variability. But first, what parameters are you estimating in MANOVA? The parameters are the tau_l as well as the mu_l; you also have to estimate mu, and you have to estimate the error terms. Another issue: if the sample sizes are unequal, the weighted sum of the population effects is zero, and if the sample sizes are equal, the simple sum of the population effects is zero. You can prove the second one yourself; let us see why the first one is zero.

What I mean is that the sum over l = 1 to L of n_l tau_l is zero. What is tau_l? It is mu_l - mu. So the left-hand side is sum_{l=1}^{L} n_l (mu_l - mu) = sum_{l=1}^{L} n_l mu_l - (sum_{l=1}^{L} n_l) mu. You have already seen that the grand mean is the weighted mean of the population means: mu = (n_1 mu_1 + n_2 mu_2 + ... + n_L mu_L) / (n_1 + n_2 + ... + n_L), which means n_1 mu_1 + n_2 mu_2 + ... + n_L mu_L = (n_1 + n_2 + ... + n_L) mu. Substituting this, the two terms cancel and the sum is zero.
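As a quick numerical check of this identity, with made-up mean vectors and unequal sample sizes, the weighted sum of the effects comes out as the zero vector:

```python
import numpy as np

# Hypothetical population mean vectors (p = 2) and unequal sample sizes.
mu_l = np.array([[20.0, 10.0],
                 [20.8, 10.1],
                 [20.2, 10.7]])                      # L x p
n_l = np.array([8, 10, 12])

mu = (n_l[:, None] * mu_l).sum(axis=0) / n_l.sum()   # weighted grand mean vector
tau_l = mu_l - mu                                    # population effect vectors
print((n_l[:, None] * tau_l).sum(axis=0))            # approximately [0. 0.]
```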
Now we will see the assumptions. What are the assumptions? The population covariance matrices are equal, the errors are normally distributed, and the errors are i.i.d.

First, equality of the population covariances: how do we test it? We use Box's M test. The hypotheses are H0: Sigma_1 = Sigma_2 = ... = Sigma_L (the population covariance matrices are equal) against H1: Sigma_l not equal to Sigma_m for at least one pair (l, m). Then we create a statistic, D = (1 - u) M, so you need to know what M is and what u is.

Now see the slide where I have written these things. M = -2 log of the product over l = 1 to L of (|S_l| / |S_pooled|)^((n_l - 1)/2), where |.| is the determinant. All of you know S_pooled: S_pooled = [(n_1 - 1) S_1 + (n_2 - 1) S_2 + ... + (n_L - 1) S_L] / (n_1 + n_2 + ... + n_L - L). You have seen this earlier in the two-population univariate case, where the pooled variance is [(n_1 - 1) S_1 + (n_2 - 1) S_2] / (n_1 + n_2 - 2); just check, it is the same idea.

Now look at the formula again. If Sigma_1 = Sigma_2 = ... = Sigma_L, then S_pooled will be essentially equal to each S_l, so each determinant ratio |S_l| / |S_pooled| will be close to one. Why is the log taken? Because the expression is multiplicative, and taking the log makes it additive. Taking -2 log, the quantity is linearized as M = [sum_l (n_l - 1)] log|S_pooled| - sum_l (n_l - 1) log|S_l|. That is our M value.

And what is u? u = [sum_{l=1}^{L} 1/(n_l - 1) - 1/sum_{l=1}^{L} (n_l - 1)] x (2p^2 + 3p - 1) / [6 (p + 1)(L - 1)]. This is the development by Box. If you put M and u into D, then D follows a chi-square distribution with nu degrees of freedom, where nu = (1/2) p (p + 1)(L - 1). You have to remember this: nu is the degrees of freedom for D. If D is greater than or equal to chi-square(alpha, nu), you reject H0, and the population covariances are not equal. We have calculated this for our data set.
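A minimal sketch of Box's M test as described above, using numpy and scipy; the data are made up (three groups, ten observations, two variables), and only the formulas for S_pooled, M, u, D and the chi-square cutoff follow the lecture.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
groups = [rng.normal(size=(10, 2)) for _ in range(3)]        # hypothetical X_l: n_l x p
n = np.array([g.shape[0] for g in groups])
L, p = len(groups), groups[0].shape[1]

S = [np.cov(g, rowvar=False) for g in groups]                # sample covariances S_l
S_pooled = sum((nl - 1) * Sl for nl, Sl in zip(n, S)) / (n.sum() - L)

# M = [sum (n_l - 1)] ln|S_pooled| - sum (n_l - 1) ln|S_l|
M = (n - 1).sum() * np.log(np.linalg.det(S_pooled)) \
    - sum((nl - 1) * np.log(np.linalg.det(Sl)) for nl, Sl in zip(n, S))

# Box's correction factor u
u = (np.sum(1.0 / (n - 1)) - 1.0 / (n - 1).sum()) \
    * (2 * p**2 + 3 * p - 1) / (6 * (p + 1) * (L - 1))

D = (1 - u) * M
nu = 0.5 * p * (p + 1) * (L - 1)                             # degrees of freedom
critical = chi2.ppf(0.95, df=nu)                             # alpha = 0.05
print(f"D = {D:.3f}, chi-square cutoff = {critical:.2f}, reject H0: {D >= critical}")
```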
Now, to compute the covariance matrices for the given data set: for process A the covariance matrix is S_1, for process B it is S_2, and for process C it is S_3. Can you recall the covariance matrix formula you have used? If the data matrix X is n x p and X bar is the p x 1 mean vector, you form X - 1 X bar', where 1 is the n x 1 vector of ones, and then (X - 1 X bar')' (X - 1 X bar') = (n - 1) S. The same formula is used here: the OD and ID columns for process A give S_1, and by the same formula you get S_2 and S_3.

Then you require S_pooled. Here n_1 = n_2 = n_3 = 10, so S_pooled = [(n_1 - 1) S_1 + (n_2 - 1) S_2 + (n_3 - 1) S_3] / (n_1 + n_2 + n_3 - 3) = (9/27)(S_1 + S_2 + S_3) = (1/3)(S_1 + S_2 + S_3).
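A short numpy sketch of these computations on made-up data with n_1 = n_2 = n_3 = 10, confirming that S_pooled reduces to the simple average of S_1, S_2 and S_3 when the group sizes are equal:

```python
import numpy as np

rng = np.random.default_rng(0)
groups = [rng.normal(size=(10, 2)) for _ in range(3)]    # hypothetical OD/ID data per process
n = np.array([X.shape[0] for X in groups])

S = [np.cov(X, rowvar=False) for X in groups]            # S_1, S_2, S_3 (divisor n_l - 1)
S_pooled = sum((nl - 1) * Sl for nl, Sl in zip(n, S)) / (n.sum() - len(groups))

print(np.allclose(S_pooled, (S[0] + S[1] + S[2]) / 3))   # True, since all n_l are equal
```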
Now look at any one value in S_pooled, say 1.29. How does 1.29 come? The corresponding values are 1.51 in S_1, 1.43 in S_2 and 0.93 in S_3, and (1.51 + 1.43 + 0.93) / 3 gives 1.29, as you have calculated earlier. Then, using that big formula, the calculated value of M is 1.04 and u = 0.11, so D = (1 - u) M = 0.93. What is the degree of freedom for D in this case? As said above, the degrees of freedom are nu = (1/2) p (p + 1)(L - 1) = (1/2) x 2 x (2 + 1) x (3 - 1) = 3 x 2 = 6. So the degrees of freedom for D are 6, and the chi-square value with 6 degrees of freedom at alpha = 0.05 is 12.59, which you get from the chi-square table. Then you compare the computed D value with the tabulated chi-square value. The D value is 0.93, which is much less than 12.59, so we fail to reject the null hypothesis. Accepting the null hypothesis means the population covariances are equal. So we have seen that the equality of the population covariances is satisfied.

If this is satisfied, then we go for MANOVA. The decomposition that follows is simple; it is not as tough as the slide makes it look. It is just like what we have seen in ANOVA.
We said that any collected observation X_il is partitioned; seen from the population point of view, X_il = mu + (mu_l - mu) + (X_il - mu_l). Correct? Now, what is the estimate of mu? It is X bar. What is the estimate of mu_l? As you have seen in ANOVA, it is X_l bar: the estimate of a population mean is the sample mean. So we partition the sample observation X_il as X_il = X bar + (X_l bar - X bar) + (X_il - X_l bar). That is what we have seen earlier in ANOVA, and the same thing is possible in MANOVA. In MANOVA we have seen that the observation vector equals the mu vector plus the (mu_l - mu) vector plus the (X_il - mu_l) vector; that is the formulation, and the estimates follow the same pattern. So we write the vector X_il as X bar, the sample grand mean vector, plus (X_l bar - X bar) plus (X_il - X_l bar).
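A tiny numeric check of this sample partitioning, again on made-up data: every observation vector is rebuilt exactly from the grand mean vector, the estimated population effect and the residual.

```python
import numpy as np

rng = np.random.default_rng(0)
groups = [rng.normal(size=(10, 2)) for _ in range(3)]      # hypothetical data, p = 2

x_bar = np.vstack(groups).mean(axis=0)                     # grand sample mean vector
for X_l in groups:
    x_l_bar = X_l.mean(axis=0)                             # sample mean vector of population l
    rebuilt = x_bar + (x_l_bar - x_bar) + (X_l - x_l_bar)  # mean + effect + residual
    assert np.allclose(rebuilt, X_l)
print("partitioning reproduces every observation")
```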
You have seen this earlier; the difference is that each term is now a vector. The left-hand side is p x 1, and so are the grand mean vector, the difference (X_l bar - X bar) and the difference (X_il - X_l bar). This is the general partitioning of a sample observation. Now we do a little more manipulation. Write X_il - X bar = (X_l bar - X bar) + (X_il - X_l bar). If you take the square, what happens? Since these are vectors, you multiply by the transpose: (X_il - X bar)(X_il - X bar)' = [(X_l bar - X bar) + (X_il - X_l bar)] [(X_l bar - X bar) + (X_il - X_l bar)]'. Correct? X_il - X bar is a p x 1 matrix, its transpose is 1 x p, and the product is a p x p matrix, which is exactly what we want. Now, how many dimensions do we have to sum over? One is i, another is l and the third is j: l = 1, 2, ..., L; i = 1, 2, ..., n_l (we consider unequal sample sizes here); and j = 1, ..., p.
We sum over i first. Summing i = 1 to n_l, the left-hand side becomes sum_{i=1}^{n_l} (X_il - X bar)(X_il - X bar)'. On the right-hand side, multiplying out the product gives four terms. The first is sum_{i=1}^{n_l} (X_l bar - X bar)(X_l bar - X bar)'. Then come two cross terms, sum_{i=1}^{n_l} (X_l bar - X bar)(X_il - X_l bar)' and sum_{i=1}^{n_l} (X_il - X_l bar)(X_l bar - X bar)'. The last term is sum_{i=1}^{n_l} (X_il - X_l bar)(X_il - X_l bar)'.

Look at the two cross terms: in both of them, (X_l bar - X bar) does not depend on i, so the summation over i acts only on (X_il - X_l bar). But sum_{i=1}^{n_l} X_il = n_l X_l bar, and sum_{i=1}^{n_l} X_l bar is also n_l X_l bar, so sum_{i=1}^{n_l} (X_il - X_l bar) = 0. Hence both cross terms are zero and the two middle terms are deleted. The resulting equation is

sum_{i=1}^{n_l} (X_il - X bar)(X_il - X bar)' = sum_{i=1}^{n_l} (X_l bar - X bar)(X_l bar - X bar)' + sum_{i=1}^{n_l} (X_il - X_l bar)(X_il - X_l bar)'.

The first term on the right contains no i, so you can straight away write it as n_l (X_l bar - X bar)(X_l bar - X bar)'; the second term keeps the sum over i, sum_{i=1}^{n_l} (X_il - X_l bar)(X_il - X_l bar)'.
So we have taken the sum over i; now we take the sum over l, putting sum over l = 1 to L in front of each term. Do we require a summation over p as well? No, because we are working in the matrix domain and the vector quantities have already taken care of the number of variables. So no further sum is required.

What is the left-hand side quantity? If you treated X_il and X bar as scalars, it would be a sum of squared deviations over all observations, which is exactly the SST, the total sum of squares, you have seen in ANOVA. But here it is a vector quantity, and multiplying the vector by its transpose in this manner creates a matrix, not a scalar: each of the three terms is a p x p matrix. The diagonal elements carry the variance part and the off-diagonal elements the covariance part, variability and covariability. The left-hand side is therefore the total SSCP (sum of squares and cross products) matrix, SSCP_T. The first term on the right measures the departure of the population mean vectors from the grand mean vector, so it is the between part, SSCP_B; the last term is the error part, SSCP_E. It is not exactly a covariance matrix; a covariance matrix would be this divided by the degrees of freedom. So we can write that the total sum of squares and cross products matrix is divided into two sources of variability, the populations and the errors: SSCP_T = SSCP_B + SSCP_E. This is the big difference from ANOVA, where you get scalar quantities everywhere.

Then what are the degrees of freedom? N - 1 = (L - 1) + (N - L), the same as in ANOVA, where N = sum_{l=1}^{L} n_l, all the observations together.
that is all the observations together. So, in ANOVA we partition S S T into S S B
and S S E, in MANOVA we partition the sum square cross product matrix of the total to
between population and error. Correct? So, when you require to calculate S S C P T, S
S C P B and S S C P E it is really difficult. Let us say that in terms of matrix transpose
then one sum by second sum like this. So, for computation point of view this one S S
C P B is little easier than the other two. So, first you compute S S C P B using this
formula absolutely no problem. S S C P B computation will be like this, l
equal to 1 to L n l X l bar minus X bar and X l bar minus X bar transpose. Correct? Then
for S S C P E there is a formula, which is n 1 minus one S 1 plus n 2 minus one S 2 like
this n l minus 1 S L, you have seen in pooled covariance case this was divided my degree
of freedom, but it is not a covariance on it is basically S S C P matrix. So, that degrees
of freedom is not divided, so it is S 1 is to S L all you can compute very easily. So,
S S C P E will be computed S S C P B will also be computed formula. Then you compute
S S C P T, that is S S C P B plus S S C P E this is the these are the steps, basically
first you compute this, compute this, then compute this.
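Here is a minimal numpy sketch of these three steps on made-up data (equal group sizes for simplicity); it also checks that SSCP_B + SSCP_E equals the total SSCP matrix computed directly from its definition.

```python
import numpy as np

rng = np.random.default_rng(0)
groups = [rng.normal(size=(10, 2)) for _ in range(3)]      # hypothetical data: L = 3, n_l = 10, p = 2
n = np.array([X.shape[0] for X in groups])
all_X = np.vstack(groups)
x_bar = all_X.mean(axis=0)                                 # grand mean vector

# Step 1: SSCP_B = sum_l n_l (x_l_bar - x_bar)(x_l_bar - x_bar)'
SSCP_B = sum(nl * np.outer(X.mean(axis=0) - x_bar, X.mean(axis=0) - x_bar)
             for nl, X in zip(n, groups))

# Step 2: SSCP_E = sum_l (n_l - 1) S_l
SSCP_E = sum((nl - 1) * np.cov(X, rowvar=False) for nl, X in zip(n, groups))

# Step 3: SSCP_T = SSCP_B + SSCP_E, checked against the direct definition
SSCP_T = SSCP_B + SSCP_E
direct = (all_X - x_bar).T @ (all_X - x_bar)
print(np.allclose(SSCP_T, direct))                         # True
```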
So this is our decomposition. Rather than a decomposition of covariance matrices (although the covariance matrices will ultimately come from it in the same way), it is a decomposition of the sum of squares and cross products matrices, so it is better to write it as the SSCP matrix: the total SSCP matrix decomposed into these two quantities. I think we will stop here today, and in the next class I will show you the MANOVA table, then all the tests and how to go for hypothesis testing, then comparisons, pair-wise comparisons and other things.
Thank you very much.