Good afternoon. Today we will formally enter the multivariate domain. The topic is multivariate descriptive statistics, and the content of today's presentation includes multivariate observations, the mean vector and covariance matrix, the correlation matrix, and the sums of squares and cross-products matrices. So, it will be purely about data; we will be dealing with data with symbols like x. In univariate descriptive statistics you have seen central tendency measures as well as dispersion measures; the univariate case you have seen earlier in these lecture classes. Central tendency we have measured in the univariate case using the mean, the mode and the median; these are the measures we have adopted. And under dispersion, we have used the range, the IQR (Interquartile Range), as well as the standard deviation. Now, we will see some of the counterparts in the multivariate domain. For example, the mean will become a mean vector; when you go to the multivariate statistics side, you will find that the mean becomes a mean vector. And the other component, the measure of dispersion, will not be the standard deviation but its square, the variance; the variance will become a covariance matrix. In addition, as there will be more than one variable, there will be correlation between variables, so we will also get to know the correlation matrix. Today's discussion will be concentrated on the mean vector, the covariance matrix and the correlation matrix. Now, think a little bit at the abstraction level
now: there is a multivariate population, and that population is characterised by a variable vector known as x. There are p variables characterizing the population of interest, so it is a p cross 1 vector. What does it mean? We are saying we have created a vector X, which is a p cross 1 vector, where p stands for the number of variables; there is variable 1, which is X 1, variable 2, which is X 2, and so on up to variable p, which is X p. Now, think that you are required to collect data on these p variables; so I am writing here my first variable as X 1, my second variable as X 2, and my last variable as X p. And you are basically collecting data on these p variables, so you will be collecting observation 1, observation 2, and so on up to n observations. So, essentially you are not in the univariate domain; you are in the multivariate domain, and you no longer have only one x with n values. Here your i runs from 1 to n and j runs from 1 to p. What does it mean? i equal to 1 to n indexes the observations, and j equal to 1 to p indexes the variables. Now, think of a situation where you have your population; you know that p variables are there, and you have identified the variables. Now, you are planning to collect data; you have not collected data yet. Then our nomenclature, the way we will write it here, is like this. The general observation will be x i j. What does it mean? x i j is the i th observation on the j th variable; that is why i runs from 1 to n, the number of observations, and j runs from 1 to p, the number of variables: x i j, the i th observation on the j th variable, to be collected.
If this is the case, that means we are first writing the observation number and then the variable number. So, observation 1 on variable 1 is x 1 1, observation 2 on variable 1 is x 2 1, and like this you write up to observation n on variable 1. The observations on the second variable will be x 1 2 (observation 1, variable 2), x 2 2 (observation 2, variable 2), and so on up to x n 2 (observation n, variable 2). In this manner you will go on; then observation 1 on variable p, observation 2 on variable p, and like this you will be getting observation n on variable p. This is the data matrix. We will be denoting it by capital bold X, and its order will be n cross p. I am saying this is a data matrix which you are planning to collect. If this is the case, you all know that X is a random vector, with components X 1 to X p; X is a random vector because all the variables here are random variables. So, x i j, the i th observation on the j th variable, is also a random variable. Please keep in mind: this is also a random variable; we write it as a random observation. As you have not collected the data, you are just planning to collect data, any value of x i j can be found depending on the spread of that variable. You will be getting some value, but you do not know what that value is. That is why we say it will be a random observation, and x i j will be a random variable.
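To make the x i j nomenclature concrete, here is a minimal sketch (my own illustration with numpy, using randomly generated data since at this stage the data are not yet collected; the variable names are mine, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 4, 3                     # n observations, p variables
X = rng.normal(size=(n, p))     # before "collection", every entry is random

# x_ij: the i-th observation on the j-th variable (1-based in the lecture)
i, j = 2, 3
x_ij = X[i - 1, j - 1]          # row index = observation, column index = variable
print(X.shape)                  # (4, 3): an n x p data matrix
```

The point is only the indexing convention: rows are observations, columns are variables, so the data matrix is n cross p.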
So, you cannot say what the value of x i j will be, but you have some expectation about it. That is why you have seen in the univariate case that if x is a random variable then the expected value of x is mu. The same holds for every variable here; each also has some expected value, and that is what we will basically talk about as the mean vector. Before coming to the mean vector, there are another two important issues to be discussed here. Suppose the observation is the i th observation and the variable is x j; I told you the general notation already, that the general observation is x i j. Similarly, there will be a general multivariate observation, and similarly there will be a general variable.
Now, suppose this one is i, the i th multivariate observation. Its values will be observation i on variable 1, observation i on variable 2, and so on: observation i on variable j, up to observation i on variable p. So, if I write x i as the i th multivariate observation, can I not write it like this: x i 1, x i 2, ..., x i j, ..., x i p? This is a p cross 1 vector. Now, when I talk about the i th multivariate observation, please keep in mind that it is one particular observation on all the variables. In the univariate case, what happens? You get only one value for that observation. As it is multivariate, you are getting values on all p variables; and the other important thing is that all the variables are occurring simultaneously, simultaneous occurrence, that is why it is multivariate in nature. Now, if I go by our general variable, which
is x j, then the observations will be: observation 1 on variable j, observation 2 on variable j, observation i on variable j, and similarly observation n on variable j. So, you will be getting a general variable, that means the observations on a particular variable. So, if we write x j as the n observations on variable j, then this is x 1 j, x 2 j, ..., x i j, ..., x n j; it is an n cross 1 vector. So, essentially what is happening? On one hand you have the axis of observations, the number of observations, and on the other hand you have the axis of variables. When you go row wise, you are basically talking about different multivariate observations, and when you go column wise, you are talking about n observations on different variables, that is, n observations on variable 1 through variable p. So, what have we assumed here? We have assumed that we have not collected the data, we are planning to collect data, and as a result all the entries in this particular matrix are random in nature. What happens when you collect the data? When you collect data on the same variables X 1, X 2, ..., X p, your data matrix will also be n cross p. Here also, for i equal to 1 to n, you will be getting x 1 1, x 2 1, ..., x n 1; x 1 2, x 2 2, ..., x n 2; and so on up to x 1 p, x 2 p, ..., x n p. Here also the general observation x i 1, x i 2, ..., x i j, ..., x i p will be there, which we will be able to write as x i, a p cross 1 vector. And also you will be getting one general variable, which we write as x j; if we write it out, you will get x 1 j, x 2 j, ..., x i j, ..., x n j. Everything remains the same, only it is after data collection. Then my question is: what is the difference
between the first matrix and the second matrix? The second matrix is after data collection, and the first matrix is before data collection; what is the difference? In the second matrix the values are known; from the data collection point of view, they are fixed values. In the first matrix you do not know what values you will get. That is why, in the first matrix, when you do not know anything, you expect some value for each of the variables, that is the mean; you also expect some deviation of the different values from the mean, that will be your variance; and you also expect that two variables will co-vary, that will be your covariance. That is from the population point of view. Now, let us look at this data set. If I ask
you what the x matrix, the data matrix, is: what is the n value? n is 12. Months we are excluding for the time being, although month could itself be treated as a variable. For the time being let month be excluded; then there are variables 1, 2, 3, 4, 5, so your data matrix is 12 cross 5; all the variables are measured for 12 observations. So, if you see this, this is the data matrix, 12 cross 5. The left hand side matrix, the data matrix, is X, 12 cross 5; 12 stands for n, and 5 stands for p. Now, first we will see the expectation of
the variable values: for each variable, what is the expected value? Then we will see, when we collect data, what that value is. Now, you see this slide: as there are p variables, there are p means. These are population parameters; this mu is the vector which represents the p means of the p variables. It is the mean vector, and it is a parameter vector for the population. I have written here that it is the expected value of X 1, the expected value of X 2, the expected value of X j, the expected value of X p. You have already seen what the expected value of x is; what we wrote earlier is that if your variable is discrete, you take the summation over all x of x f(x). Here we have several variables, so we write the expected value of X j, for a particular variable, the j th variable, as the sum over all x j of x j f(x j), when it is a discrete variable. When it is continuous, what do you write? You have seen the continuous case: you write the integration from minus infinity to plus infinity of x f(x) dx. So, I am writing here, for the continuous case, the integration from minus infinity to plus infinity of x j f(x j) dx j. For both cases, j runs from 1 to p; this is your expected value. So, when you write down mu, what is mu? mu is a p cross 1 vector, which is mu 1, mu 2, ..., mu j, ..., mu p. This is nothing but, as we discussed, the expected value of X 1, the expected value of X 2, the expected value of X j, the expected value of X p, and the expectation
we will calculate like this. So, we have two sets of matrices: one matrix, we say, is before data collection, and with respect to this we are developing our topic, which is now the mean vector. When you collect data, like the 12 cross 5 example I have already given you, our data matrix is the one after data collection, and we want to compute the average values, because that is what we have seen in the univariate case. What have you seen in the univariate case? We have seen that x bar is the estimate of mu. So, in our case we write x bar, which is nothing but x 1 bar, x 2 bar, ..., x j bar, ..., x p bar; that is the average vector for the p variables. Now, what is your x bar in the univariate case? You have written 1 by n times the sum over i equal to 1 to n of x i. So, I can write the components here in the same manner: the first one is 1 by n times the sum over i equal to 1 to n of x i 1, where 1 stands for the variable and i stands for the observation. Similarly, the second one you can write as 1 by n times the sum over i equal to 1 to n of x i 2.
So, in the same manner, for the j th case you can write 1 by n times the sum over i equal to 1 to n of x i j, and for the last one, 1 by n times the sum over i equal to 1 to n of x i p. This is my average vector for the sample collected on p variables. But we will not go for the individual mean calculations one at a time; instead, we want to do the calculation with vectors and matrices. What will we do here? See, in order to calculate this x bar, we will take the data matrix. Our earlier data matrix was X, n cross p, which is, suppose, x 1 1, x 2 1, ..., x n 1; x 1 2, x 2 2, ..., x n 2; and so on up to x 1 p, x 2 p, ..., x n p. That was my sample data, and it is n cross p. You want to compute x bar from this sample data, and x bar is a p cross 1 vector, that is x 1 bar, x 2 bar, ..., x j bar, ..., x p bar, and you know the general formula: x j bar is 1 by n times the sum over i equal to 1 to n of x i j. Now, in one go, in the matrix domain, you want to calculate all these things. What do you require to do? Please remember, I have an n cross p matrix and I want a p cross 1 vector. So suppose I create one vector, a symbolic one, which is all ones: 1, 1, ..., 1, with n ones, so it is n cross 1. I am creating a unit vector with n elements, where each element is one. Now, I want to use this unit vector with the data matrix in such a manner that I will be able to apply this computational formula and get the p cross 1 vector. So, if I arrange it as p cross n times n cross 1, the result is p cross 1 from the matrix multiplication point of view, row times column. Here the number of columns of the first matrix equals the number of rows of the second; the required equality is there.
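As a quick sanity check on these dimensions, here is a minimal sketch (my own illustration with numpy, using a small synthetic matrix) showing that a p cross n times n cross 1 product does yield a p cross 1 result:

```python
import numpy as np

n, p = 3, 2                                # 3 observations, 2 variables
X = np.arange(1, n * p + 1).reshape(n, p)  # synthetic n x p data matrix
ones = np.ones((n, 1))                     # the n x 1 unit vector of all ones

product = X.T @ ones                       # (p x n) @ (n x 1) -> (p x 1)
print(X.shape, X.T.shape, ones.shape, product.shape)
# (3, 2) (2, 3) (3, 1) (2, 1)
```

Each entry of the product is a column sum of X, which is exactly what the averaging formula needs before dividing by n.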
If I want to do this, that means I have to transpose the data matrix. I have a matrix X which is n cross p; if I take the transpose, X transpose, it will be p cross n, with rows and columns interchanged. So, I take X transpose, which is p cross n, and multiply it by the unit vector, which is n cross 1. In the resultant product the inner dimensions cancel out, and it will be p cross 1. So, what happens with this, you see now. So, your data is like this; I am now taking p equal to 2 and n equal to 3, just to reduce the complexity of repeating the same calculation. Then my X matrix will be 3 cross 2, with x 1 1, x 2 1, x 3 1 in the first column and x 1 2, x 2 2, x 3 2 in the second, as there are 3 observations. So, you create one unit vector, which is 1, 1 and 1; three ones are required because my n is 3. I want to multiply X transpose by this unit vector. So, X transpose has x 1 1, x 2 1, x 3 1 as the first row and x 1 2, x 2 2, x 3 2 as the second row, because I have transposed the X matrix. Then you multiply this by 1, 1 and 1. And we all know that X transpose is 2 cross 3 and the unit vector is 3 cross 1, so we will be getting a resultant matrix which is 2 cross 1. From the matrix multiplication point of view, the first entry is x 1 1 into 1 plus x 2 1 into 1 plus x 3 1 into 1, that is, the sum over i equal to 1 to 3 of x i 1; the second one will be the sum over i equal to 1 to 3 of x i 2. Now, we have seen earlier what x j bar is: 1 by n times the sum over i equal to 1 to n of x i j. So, if I divide this resultant vector by n, I will get the mean
vector. So, that means I can write x bar as 1 by n times X transpose 1. And you will be able to do it very easily; this formulation is much better because in the multivariate domain the hands-on, element-by-element type of calculation is best forgotten. You can use Matlab for understanding the computational part. Nowadays Excel is also very powerful, so you can use Excel as well; using Excel you can apply this formulation very easily. I think I will be giving you a tutorial, and you will have to do this; if you find a problem, talk to me, in my chamber also, that is no problem. So, will we be able to compute the mean vector for given data? Now, see this slide: suppose you take the first 3 values for the first variable and also the first 3 values for the second variable, and use this matrix multiplication; can we not do it? As I said, we are taking only the 2 variable case with 3 observations: 10, 12, 11 and then 100, 110, 105. I want to get the x bar, that is, x 1 bar and x 2 bar. You can very easily go like this: 10 plus 12 plus 11, the total is 33; for the second variable the total is 315. If you divide by 3, the first variable gives 11 and the second one gives 105. So, your x bar is 11 and 105; that you can very easily do here also, but I am saying that you should use this formulation, x bar equal to 1 by n X transpose 1.
If you do it like this: 1 by 3 times X transpose, which has rows 10, 12, 11 and 100, 110, 105, times the unit vector 1, 1, 1; this is nothing but 1 by 3 into (33, 315), which is the same thing, 11 and 105. It seems there is not much difference in calculation between the two approaches here. The reason is that the number of observations is small and the number of variables is also small; but you may have a large number of variables with a large number of observations, and then the individual calculation is not practical; straight away go for matrix multiplication.
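The small example above can be reproduced in a few lines of code (a sketch using numpy; the variable names are mine, not from the lecture):

```python
import numpy as np

# Data matrix X (n = 3 observations, p = 2 variables), as in the example
X = np.array([[10, 100],
              [12, 110],
              [11, 105]])
n = X.shape[0]

ones = np.ones((n, 1))           # n x 1 unit vector of all ones
x_bar = (X.T @ ones) / n         # x bar = (1/n) X' 1, a p x 1 vector
print(x_bar.ravel())             # [ 11. 105.]
```

The same one-liner works unchanged for any n cross p data matrix, which is exactly the point of the matrix formulation.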
We will be using simple matrix multiplication, matrix inversion and similar operations throughout the lectures, I can say. So, this is what the mean vector is, from the population point of view and from the sample point of view; from the sample point of view, the sample average vector is the estimate of the population mean vector. Now, come to the covariance matrix; see, you have to understand it now. Although things are very simple, the physical meaning of each of the items must be understood, because only then will you be able to follow when we talk about the covariance matrix later on. We will not come back to the physical meaning of the covariance matrix afterwards; we will simply say covariance matrix, which means you should be able to grasp immediately what a covariance matrix is, and only then will you be able to relate to the discussion at that time. So we are interested to know the covariance matrix. So, let us discuss from the population point of
view first, that is, the population covariance matrix. If I say my x is a random variable, the univariate case, and I ask you what the variance of x is, then you will say that it is the expected value of (x minus mu) squared. And you have also seen, for the discrete case, that it is the sum over all x of (x minus mu) squared times f(x); we are not writing f(x i) now, let it be the general case, that is why I have written all x. And when you go for the integration, you write the integral from minus infinity to plus infinity of (x minus mu) squared f(x) dx. This is the variance component, which is sigma square. Now, I will make a simple alteration: instead of x I will write x j, meaning I want to know the variance of x j. Then you will write the same thing with x j minus mu j; j is simply added everywhere. So what will you write? You will write the sum over all x j of (x j minus mu j) squared times f(x j); this is for the discrete case. And you write the integral from minus infinity to plus infinity of (x j minus mu j) squared f(x j) dx j; this is your continuous case.
Now, if this is the case, put j equal to 1, 2, up to p in this formulation, whether discrete or continuous. When you put j equal to 1, you get sigma 1 square; when you put j equal to 2, you get sigma 2 square; and like this you will get sigma p square. That means the variance components of all the variables come from this equation. But we have seen that we have p variables, and we are also assuming that these p variables are not independent of each other. If x 1 is dependent on x 2, or x 1 and x 2 are not independent, what will happen? For example, height versus weight of people. The covariance is nonzero, so that means they have correlation; or otherwise I can say that if someone's height is more than another's, naturally the weight is also more. There are other parameters also which govern weight, but naturally this is the case. So, when they are not independent, they are dependent, then what will happen? When x 1 varies, x 2 also varies. Their simultaneous variability is what we want to capture: x 1 and x 2 co-vary, they vary simultaneously. If there is covariance, it means association between the realised values of x 1 and x 2. So, now I will write again the variance of x j:
we have written it as the expected value of (x j minus mu j) squared. If I do a little manipulation, I can relate it to a formulation called the covariance between x j and x k, written with (x j minus mu j) into (x k minus mu k). You see the similarity between the two: when I am talking about the variance of x j, I am saying (x j minus mu j) squared, which I can further write as (x j minus mu j) into (x j minus mu j); that is why the square comes. It is as if the same variable is repeated, as if we created two copies of the same variable, like x 1 with x 1, and did the same thing. So the covariance is this: as I am saying, if x j varies, x k also varies, there is a chance of that, and that is why I am looking at the association between the two. So, in that case we can again write down the same kind of formula: the sum over all x j and x k of (x j minus mu j)(x k minus mu k). What will we write here for the probability density function? Can we write f(x j) and f(x k) separately, or will we write the joint probability?
You have to write the joint probability here, and in the continuous case you have to write it as a double integration; here I have written the region over all x j and x k simultaneously with one symbol, otherwise you have to write all x j and all x k separately. What is the notation for this? We will use the notation sigma j k. We have used sigma j for the standard deviation, sigma j j for the variance, and sigma j k for the covariance; so what I am writing here is sigma j k. Now, be careful about the notation: sigma j square equals sigma j j. Later on we will be using sigma j j; sigma 1 1 is the variance component, which is basically sigma 1 square. Then we will be using sigma j k, where sigma j j is the variance of x j and sigma j k is the covariance between x j and x k. So, you have sigma j square, you have sigma j k, and you have p variables; can we not find the population covariance matrix now? We are now in a position to write down the population covariance matrix. As there are p variables, covariance is defined
for every two variables. So, how many elements will be there? Let us write it down like this: we will create a p cross p matrix, where p stands for the number of variables, with x 1, x 2, ..., x p labelling the rows and again x 1, x 2, ..., x p labelling the columns. For x 1 with x 1, the variability of a variable with itself is the variance, so this entry will be sigma 1 1; for the second case, x 2 with x 2, it will be sigma 2 2. Like this, for the p th variable, x p with x p gives sigma p p. All the elements on the diagonal are variance components; that is what I am saying, this is basically the variance part of the variables. As I told you, sigma 1 1 equals sigma 1 square, sigma 2 2 equals sigma 2 square, and so on. Then the off-diagonal elements will be covariances. So what will these be? Sigma 1 2, ..., sigma 1 p; again, I am writing 1 2 instead of 2 1, and the assumption we are making is that sigma j k equals sigma k j, because there are only the two variables, the j th and the k th, and only their order changes. Then sigma 2 p, and like this, in the last row, you will again get sigma 1 p, sigma 2 p, and so on. So the off-diagonal elements are the covariance part and the diagonal elements are the variance part, and this resultant matrix we will denote in our class by capital Sigma. So keep in mind, whenever we use capital Sigma, this is your population covariance matrix. The population covariance matrix looks like this: sigma 1 1, sigma 1 2, ..., sigma 1 j, ..., sigma 1 p; sigma 1 2, sigma 2 2, ..., sigma 2 j, ..., sigma 2 p; and so on. If there are p variables, there will be p cross p elements; that will be the size of the matrix. Now, we require to know the sample covariance matrix along with this population covariance matrix; the population and sample covariance matrices are very, very vital components of multivariate statistics. Now come to the sample part; in the sample case
we will say the sample covariance matrix is S. We will be denoting the sample covariance matrix by S, and this will also be a p cross p matrix. So, I can write the matrix elements like this: s 1 1, s 1 2, ..., s 1 p; again s 1 2, s 2 2, ..., s 2 p; and so on up to s 1 p, s 2 p, ..., s p p. The diagonal elements are the variance part and the off-diagonal elements are the covariance part. How do we calculate s 1 1, s 1 2, and all the elements of this matrix? The general element on the diagonal is s j j, and somewhere off the diagonal your s k j, or s j k, will be there. You can go in the same manner as you developed the variance computation in the univariate case. So, for s j j, what will you do? We have seen 1 by (n minus 1) times the sum over i equal to 1 to n; in the univariate case you wrote (x i minus x bar) squared. We could use mu, but mu is not available here, and in the sample case we will always subtract the sample average; that is why we divide by n minus 1. If I could use mu here, the divisor would be n. So, bringing j into consideration, I can write s j j as 1 by (n minus 1) times the sum over i equal to 1 to n of (x i j minus x j bar) into (x i j minus x j bar), the same thing as the square.
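As a small illustration of this s j j formula (a sketch with numpy on the first variable of the earlier example; note that `ddof=1` gives the n minus 1 divisor):

```python
import numpy as np

xj = np.array([10.0, 12.0, 11.0])   # n = 3 observations on one variable
n = len(xj)
xj_bar = xj.sum() / n                # sample mean, x_j bar = 11

# s_jj = 1/(n-1) * sum over i of (x_ij - x_j bar)^2
s_jj = ((xj - xj_bar) ** 2).sum() / (n - 1)
print(s_jj)                          # 1.0

# numpy's built-in sample variance agrees when ddof=1
print(np.var(xj, ddof=1))            # 1.0
```

With `ddof=0` numpy would instead divide by n, which corresponds to the case where mu is known.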
Using this, I want to write s j k: s j k is 1 by (n minus 1) times the sum over i equal to 1 to n, where first I keep the j variable as it is, (x i j minus x j bar), and in the second factor I introduce k, (x i k minus x k bar). What is happening here? Actually, if you look at the covariance case or the variance case, the original data matrix is transformed, and we will capture that concept here. You see, x i j minus x j bar means that for the j th variable every element is subtracted by its average, and for the k th variable also every element is subtracted by its average. If this is the case, can I not write down the data matrix in this format? My original data matrix is X, which is x 1 1, x 2 1, ..., x n 1; x 1 2, x 2 2, ..., x n 2; then x 1 j, x 2 j, ..., x i j, ..., x n j; up to x 1 p, x 2 p, ..., x n p. You have computed x 1 bar, x 2 bar, ..., x j bar, ..., x p bar; then you apply a conversion, the subtraction of the mean. What are you getting then? If I subtract the mean, every observation is subtracted by its corresponding column mean.
So, after this subtraction by the corresponding mean, instead of writing the matrix minus the means, I will write x star, which is x star 1 1, x star 2 1, ..., x star n 1, and in the same manner up to x star 1 p, x star 2 p, ..., x star n p; somewhere in between there will be the general element x star i j, where x star i j is x i j minus x j bar. That means these two matrices are the same. Now, if I use this notation, then x i j minus x j bar becomes x star i j and the other factor becomes x star i k. So the resultant formula will be s j k equal to 1 by (n minus 1) times the sum over i equal to 1 to n of x star i j times x star i k. This type of conversion will take place, and ultimately, with a little more mathematics, we will see the full matrix form. Up to this point, can you calculate this using this formulation? Take the same data points, the first 3 values of the two variables I have given you; you have already calculated the mean values. You now have to calculate the variance and covariance parts; because there are only 2 variables, there will be only 1 covariance. Then in the next class we will see how, using the matrix multiplication formula, we will be able to calculate the covariance matrix entirely, and then the correlation matrix and all those things. Thank you.
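For reference, the exercise just described can be sketched in code (an illustrative sketch with numpy, not part of the lecture): center the data matrix column-wise to get x star, then apply S equal to x star transpose times x star divided by (n minus 1).

```python
import numpy as np

# The two-variable, three-observation data used in the lecture
X = np.array([[10.0, 100.0],
              [12.0, 110.0],
              [11.0, 105.0]])
n = X.shape[0]

X_star = X - X.mean(axis=0)        # subtract each column mean: the x*_ij matrix
S = (X_star.T @ X_star) / (n - 1)  # sample covariance matrix, p x p
print(S)
# [[ 1.  5.]
#  [ 5. 25.]]
```

The diagonal entries 1 and 25 are the two sample variances, and the off-diagonal entry 5 is the single covariance; this matches `np.cov(X, rowvar=False)`.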