Good morning. Today we will discuss multiple regression. The contents are as follows: first we will start with a conceptual model, then we will describe the estimation of parameters in terms of the beta vector, then the sampling distribution of the estimated beta, and then the sampling distribution of the estimated error. Then we will take up model fitting, that is, the adequacy of the regression model, followed by tests of the individual regression parameters, that is, all the betas. After that, we will test the assumptions related to regression; then there are certain diagnostic issues; then prediction using multiple linear regression. Finally, a case study will be shown. This, in totality, will be covered under multiple regression.

So we will go first by the model, then estimation of parameters, then the sampling distributions. I think today we will be able to cover the conceptual model and the estimation of parameters; this is about one hour. Then the sampling distribution of the beta estimates and the sampling distribution of the error estimate will need one more hour. Adequacy of the regression model and the tests of individual parameters may be one more hour; tests of assumptions, one hour; diagnostic issues, one hour; prediction, one hour; and the case study, one hour. So 1, 2, 3, 4, 5, 6: altogether six hours we will be discussing multiple regression. Now, let us start with an example.
In the first class, and also in subsequent lectures, I have given one example with the Citycan data. Citycan is a small company working in the local market, and we have seen its data structure. Alternatively, we can start with the important variables for this particular company: profit, then sales volume; then they have measured absenteeism, the percentage of absenteeism of workers or employees; then, related to machines, the monthly machine breakdown hours; and they also have a marketing department whose performance is measured through the M ratio. The company's primary interest is how to improve profit, definitely through maximizing sales, apart from many other things.
The company is interested to know how these three variables, absenteeism, breakdown hours, and M ratio, affect sales volume as well as profit. Under this situation there are two types of variables: one is called the dependent variable, DV; the other set is known as the independent variables, IV. In this case profit is one dependent variable, sales volume is another dependent variable, and there are three independent variables: absenteeism, breakdown hours, and M ratio. Now, what is of interest in this particular case? We want to test whether the independent variables contribute to explaining the dependent variable, taking one dependent variable at a time. If we can find that, yes, the independent variables do influence the dependent variable, then, depending on the measured influence, some actions can be taken to control the dependent variable by controlling the independent variables.
So, pictorially, consider one dependent variable; let it be sales volume, which we denote y. The other variables we denote X1 = percentage absenteeism, X2 = breakdown hours, and X3 = M ratio. We are not considering profit at present; later on we will see how profit can also be included. Under this situation, if I draw y within a circle or ellipse, and X1 within a rectangle, X2 within another rectangle, and X3 within another rectangle, I can draw straight lines with arrowheads terminating at y, and these indicate that the respective IVs influence the dependent variable y.
So X1 influences y; that is why the arrow starts from X1 and terminates at y. Now say X1 has influence β1, X2 has influence β2, and X3 has influence β3. What does this mean? It means that if you change X1 by one unit, there will be β1 units of change in y; if you change X2 by one unit, there will be β2 units of change in y; and one unit of change in X3 causes β3 units of change in y. This is the meaning of these influence parameters.
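Written compactly, each coefficient is the rate of change of the mean response with respect to its own variable, with the other IVs held fixed; a minimal statement of the interpretation above:

```latex
% beta_j is the change in E[y] per unit change in X_j,
% holding the other independent variables constant.
\[
  \beta_j \;=\; \frac{\partial\, E[\,y \mid X_1, X_2, X_3\,]}{\partial X_j},
  \qquad j = 1, 2, 3 .
\]
```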
Now, what will happen if none of them, for example absenteeism, breakdown hours, and M ratio, contributes to y? Will there be no sales? There will still be sales; it is not necessary that they always contribute. So in that case we require some other parameter, which is known as β0, coming through X0, where X0 always takes the value 1, that is, X0 = 1 always. By saying this, what we are trying to say is that even if there is no influence of X1, X2, X3, there will still be some amount of y; the constant term β0 is added for that.

Now, if you take one observation of y, followed by a second, a third, and so on, what will happen? You will find that β0, β1, β2, and β3 together are not able to explain the total variability in y. This means there are some other variables, either controllable or uncontrollable, which are also contributing to the variability of y. As a result we require another term, which is known as the error. That means in the multiple regression case you will get three types of parameters: one is β0, the constant parameter, which is known as the intercept; then the influences of the independent variables, which are known as the influence parameters or regression coefficients (β0 is also a regression coefficient, the intercept coefficient); and then there is a random one, the error, which also contributes.
So essentially, if I want to put the entirety in terms of an equation, I can write y = β0 X0 + β1 X1 + β2 X2 + β3 X3 + ε. You can very well write this: the figure is the pictorial representation of the multiple regression model, which in mathematical terms you write as an equation. Now, as we have said that X0 always takes the value 1, this can be written as y = β0 + β1 X1 + β2 X2 + β3 X3 + ε.
Now, let us generalize: we are not interested in only three variables. We will go for a variable vector with p variables, X = (X1, X2, …, Xp)ᵀ, so there are p variables contributing towards y, and, as we have already seen, there is one intercept parameter related to the constant X0, which we make equal to 1. If you want to include X0 in this vector as well, the vector becomes (p + 1) × 1; but X0 need not be written here, it will be taken care of later. Essentially X1 to Xp are the variables, the IVs, and X0 is the constant through which the intercept term will come. So if we go by X1 to Xp, this is a p × 1 variable vector. Under this situation, what will the beta vector be? There will be an influence for each of the variables, so I can write β = (β1, β2, …, βp)ᵀ, which is p × 1; but there is also one intercept parameter related to X0, which is β0. So you require to estimate p + 1 parameters, the regression coefficients.

Now I can write y = β0 + β1 X1 + β2 X2 + ⋯ + βp Xp + ε. This is the general regression equation; later on we will write it as a matrix equation.
Now, let us assume that you are collecting data: let n data points be collected (or, if they are yet to be collected, the observations will be random). In this case our data matrix is as follows. First is the dependent part: y = (y1, y2, …, yn)ᵀ, the n data points. Then you collect X, which is n × p, where n is the number of observations and p is the number of variables:

\[
X = \begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots &        & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{bmatrix}.
\]

This is related to the IVs; these are the two sets of data you will collect. Now, if you use this data set in the equation given above, then for every data point the equation holds. To take every data point into consideration, write y_i together with (x_{i1}, x_{i2}, …, x_{ip}), the general observation.
Now, if you want to write the equation for this general observation, we write y_i = β0 + β1 x_{i1} + β2 x_{i2} + ⋯ + βp x_{ip} + ε_i. Then what is the quantity β0 + β1 x_{i1} + ⋯ + βp x_{ip}? You see, it is a weighted linear combination of several variables, basically of the random observations. So can we not call it a variate? It is a weighted linear combination, so it is a variate. Then what does this variate represent? In this case it represents the expected value of y_i given x_i, where x_i = (x_{i1}, x_{i2}, …, x_{ip}); without the error term, this is the expected value of y_i. If I know the i-th observation of the individual variables, then this is the value of y_i we expect, that is, the predicted value. As we do not know the IVs for the i-th observation in advance, we accordingly also do not know the y_i value; but through this variate we can say what the expected value of y_i will be. If you collect the i-th observation (x_{i1}, …, x_{ip}), then what happens is that y_i equals this expected value plus ε_i.
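Written out, the two statements of this paragraph are:

```latex
% Conditional mean of y_i, and the error as the leftover part:
\[
  E[\,y_i \mid \mathbf{x}_i\,]
  \;=\; \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}
  \;=\; \hat{y}_i,
  \qquad
  \varepsilon_i \;=\; y_i - \hat{y}_i .
\]
```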
This expected value is nothing but the predicted value, ŷ_i; we will use this later on. So from here I can write ε_i = y_i − ŷ_i. Essentially, then, what have we found here? We have found all the regression coefficients and also the error term, and we have described that this weighted linear combination, the variate, basically tells us the expected value of y under the given conditions.
These are conditional values of y, and in general terms we also write this as μ of y given X, that is, μ_{y|X}; if I say y_i, then it is given x_i, where x_i is the observation. To build a little intuition, suppose there is one X and one y and you are fitting a straight line; take some value x_i, go up to the regression line, and you get a point. What is this value? It is nothing but the mean value of y for the i-th observation, given that observation. Now, why are we talking about a mean value? The reason is: suppose for the i-th observation you collect one sample; you will get some value. If you collect a second, the value may change, so ultimately there will be an error part, and that is taken care of by the concept of error. So any point on this regression line, which we will fit later on, is the mean value of y given x; we will see all this when we fit the model.
Now, we have collected n data points and we also have the regression equation y_i = β0 + β1 x_{i1} + β2 x_{i2} + ⋯ + βp x_{ip} + ε_i. If I write it for all the observations, then y1 = β0 + β1 x_{11} + β2 x_{12} + ⋯ + βp x_{1p} + ε1; for y2 you will write β0 + β1 x_{21} + β2 x_{22} + ⋯ + βp x_{2p} + ε2; and if you go on writing, ultimately, for the n-th term, yn = β0 + β1 x_{n1} + β2 x_{n2} + ⋯ + βp x_{np} + εn.
So if I write it in matrix form, on the left-hand side you can write the n × 1 vector (y1, y2, …, y_i, …, yn)ᵀ, and the whole system is

\[
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
=
\begin{bmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1p} \\
1 & x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{n1} & x_{n2} & \cdots & x_{np}
\end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}.
\]

Here the column of ones corresponds to β0, the column (x_{11}, x_{21}, …, x_{n1})ᵀ to β1, and so on up to βp, so the matrix is n × (p + 1): p variables plus one for the intercept. Multiplying it by the (p + 1) × 1 vector (β0, β1, …, βp)ᵀ gives n × 1; so n × 1 equals n × 1, plus the error terms (ε1, ε2, …, ε_i, …, εn)ᵀ, which is also n × 1.

The resultant can be written as y = Xβ + ε, where y is the n × 1 vector of observations of the dependent variable; X is the n × (p + 1) data matrix of the IVs including the intercept column, which is why we also call it the design matrix; β is the (p + 1) × 1 vector of regression coefficients; and ε is the n × 1 vector of error terms. So this is, in a nutshell, the multiple linear regression equation in matrix form.
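As a quick check of these dimensions, here is a minimal NumPy sketch (with illustrative values of my own) that stacks the intercept column onto a data matrix and forms y = Xβ + ε:

```python
import numpy as np

n, p = 5, 3                        # n observations, p independent variables
rng = np.random.default_rng(1)
X_iv = rng.uniform(0, 10, (n, p))  # the n x p data matrix of IVs

# Design matrix: prepend the column of ones carrying the intercept beta_0
X = np.column_stack([np.ones(n), X_iv])

beta = np.ones(p + 1)              # placeholder (p+1) x 1 coefficient vector
eps = rng.normal(0, 1, n)          # n x 1 error vector

y = X @ beta + eps                 # y = X beta + epsilon
print(X.shape, beta.shape, y.shape)  # (5, 4) (4,) (5,)
```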
Now, we will see some of the assumptions of multiple regression. The first assumption is linearity, that is, linear relationships. The second assumption is homoscedasticity, or equal variance of y across the values of X; in other words, given the observations of the IVs, the variability of y remains constant across X (I will show you how this is checked). Then, third, we have seen that there is an error term: the assumption is uncorrelated error terms, which means the n × 1 vector of errors will not be correlated. And fourth, normality of the error terms. These four assumptions need to be tested later on, after fitting the model; but at this moment I should say something about what each of them is.
So first, understand what linearity is. Consider the relationship of y versus X, for example X1, and plot the data. Suppose you get a straight-line type of relationship; but when you plot against X2 you get a curved type of relationship. In the first case there is linearity; in the second case it is not linear. If the relationship of y versus X is non-linear, then this model is not applicable. What do you require to do? You require to convert it to linearity: you have to transform the data. From the linearity point of view we primarily want to transform the IV; transform it so that the data, the relationship, becomes linear. Your relationship can be negative or positive, no problem, but linearity is the issue.
The second one is homoscedasticity. I told you just a few minutes back: suppose this axis is X and this is y (two-dimensional, so the design matrix is the column of ones plus one X column), and assume a linear fit is possible; you have collected n data points at the X values x1, x2, …, x_i, …, xn, that is, observations i = 1, 2, …, n. We are saying that you will collect one sample, but you may collect several samples as well. Under such conditions, for a particular value of x you may get several values of y; and if you plot these several y values for a particular value of x, then whatever variability of y you observe must be equal across the x values. That means, if the common variance is σ², the condition is

\[
\sigma^2_{y \mid x_1} \;=\; \sigma^2_{y \mid x_2} \;=\; \cdots \;=\; \sigma^2_{y \mid x_n} \;=\; \sigma^2 .
\]

This is what homogeneity means from the point of view of the variability of y.
variability of y point of view, if there is violation of this, for example your plot may
not be like this your plot something this here it is, but here may be it is like this
here may be again small here may be big errors. So, across X values y is not homogeneous from
variability point of view that assumption if violated you require to transform variables,
but in that case you have to transform the y variable. If there is linearity problem,
you transform the variables, you can go for both, but it is preferable that you transform
the X variable. If heterogeneous nature, that heteroscedasticity is there, that mean not
equal variance is satisfied, so in that case you have to transform the y variable.
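Once a model has been fitted, a common way to eyeball this assumption is a residual-versus-fitted plot: roughly constant vertical spread suggests homoscedasticity, while a funnel shape suggests heteroscedasticity. A minimal sketch, assuming a design matrix X, response y, and fitted coefficients beta_hat are already available:

```python
import numpy as np
import matplotlib.pyplot as plt

def residual_plot(X, y, beta_hat):
    """Plot residuals against fitted values to inspect homoscedasticity."""
    y_hat = X @ beta_hat   # fitted values
    resid = y - y_hat      # residuals (estimated errors)
    plt.scatter(y_hat, resid)
    plt.axhline(0.0, linestyle="--")
    plt.xlabel("fitted value")
    plt.ylabel("residual")
    plt.title("Residuals vs fitted: look for constant spread")
    plt.show()
```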
Then, third is uncorrelated error terms. Look again at this equation and figure: when you use the regression equation and predict, you basically predict a point on the line for each x, and the remaining portion is the error. Suppose your original value is y_i, while based on the equation the line gives ŷ_i; then that portion, the gap, is my error. Similarly, everywhere you will be getting an error, so you have ε1, ε2, …, ε_i, …, εn, and these are uncorrelated error terms, because each error is random. Now, the error will follow this type of distribution: ε follows a normal distribution with mean 0. And what will be the variance? The variance will be σ², the y-variance; that is, this σ² is the common value σ²_{y|x1} = σ²_{y|x2} = ⋯ that we wrote above, the equal variance over all observations. This condition is very important, because the variability part is taken care of by the error. So the error follows normality with mean 0 and variance σ², where this σ² is basically the variability of y for each observation of x.
Now, what we are trying to say further is: suppose there are two error terms ε_i and ε_k; then the covariance between these two will be 0, which is what uncorrelated error terms means. As there are n errors, if I look at the covariances between the error terms, I will be getting an n × n matrix whose diagonal elements are the variances and whose off-diagonal elements are 0. That is uncorrelated error terms; and normality of error means this error follows a normal distribution with mean 0 and variance σ².
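Putting the last two assumptions together in matrix form:

```latex
% Errors: zero mean, common variance, zero covariance between any pair.
\[
  \boldsymbol{\varepsilon} \sim N\!\left(\mathbf{0},\; \sigma^2 I_n\right),
  \qquad
  \operatorname{Cov}(\varepsilon_i, \varepsilon_k) =
  \begin{cases}
    \sigma^2 & i = k, \\
    0        & i \neq k .
  \end{cases}
\]
```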
So, these are the assumptions; we will test all of them later on, after we fit the regression equation, as they are required to be tested. Once, on examining the data, you find that the assumptions are reasonably valid, then you will go for fitting, that is, estimation of the model parameters.
Estimation of the model parameters means the beta: we are talking about how to estimate β, where y = Xβ + ε; this is my MLR. If you recall, y_i = β0 + β1 x_{i1} + β2 x_{i2} + ⋯ + βp x_{ip} + ε_i, and with a little modification this gives

\[
\varepsilon_i \;=\; y_i - \sum_{j=0}^{p} \beta_j x_{ij},
\]

where for j = 0 the x_{ij} takes the value 1. If I square it, I get (y_i − Σ_{j=0}^{p} β_j x_{ij})². This is for a particular observation, but we have n observations, so taking the summation over n,

\[
SSE \;=\; \sum_{i=1}^{n} \Bigl( y_i - \sum_{j=0}^{p} \beta_j x_{ij} \Bigr)^{2},
\]

which is the total error sum of squares, SSE. Now we will choose the β_j values in such a manner that SSE is minimum; so optimization is used here: choose β in such a manner that SSE is the minimum. You can do this very easily.
We can also write SSE in another way. What happens is: you take the derivative ∂SSE/∂β_j and put it equal to 0, subject to the second-order condition on ∂²SSE/(∂β_j ∂β_k), which must correspond to a minimum. For the one-variable case this is simply ∂²SSE/∂β_j² > 0; here, with p variables, the cross terms in β_j and β_k also come in.

[Student] Sir, should it be ∂²SSE divided by ∂β_j ∂β_k, or only ∂²SSE/∂β_j²?

When there is only one variable we write it the second way; in the general form, the case k = j takes care of that issue.
It is the general form we have written here; it is basically the Hessian matrix. Actually, what I am trying to say is that as there are p + 1 estimates you are making, you will be getting a matrix, the Hessian matrix, and that matrix must be positive definite; that is what I mean to say, and that is why I have written it like this. Every component will be calculated, a matrix will be formed, and that matrix must be positive definite. By positive definite we mean: suppose A is a square matrix; if for any nonzero vector x you find that xᵀAx > 0, then A is positive definite. So the Hessian matrix that comes out must be positive definite; then it is the minimum condition.
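For this least-squares SSE the Hessian works out to 2XᵀX (a standard fact, though not derived in the lecture), so the minimum condition amounts to XᵀX being positive definite. A small illustrative sketch, using a random design matrix, that checks this numerically both by the xᵀAx criterion and by eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 10, (n, p))])

# For SSE = (y - Xb)'(y - Xb), the Hessian is 2 X'X.
H = 2 * X.T @ X

# Check 1: x'Hx > 0 for a random nonzero vector x
x = rng.normal(size=p + 1)
print(x @ H @ x > 0)                      # True

# Check 2: all eigenvalues positive (holds when X has full column rank)
print(np.all(np.linalg.eigvalsh(H) > 0))  # True
```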
Now, let us write it in matrix terms. We found that ε = (ε1, ε2, …, εn)ᵀ, which is n × 1. For the sum of squared errors, can we not write SSE = εᵀε? This is 1 × n times n × 1, which gives 1 × 1, a scalar quantity. If this is true, you can write SSE = (y − Xβ)ᵀ(y − Xβ), because our regression equation gives ε = y − Xβ. Now if I take the derivative with respect to β, it is nothing but a square term, so ultimately, with matrix compatibility, it comes out as

\[
\frac{\partial\, SSE}{\partial \boldsymbol{\beta}} \;=\; -2\, X^{\mathsf T} (y - X\boldsymbol{\beta}),
\]

and this equation we put equal to 0. I can now write −Xᵀy + XᵀXβ = 0, so XᵀXβ = Xᵀy. If I multiply both sides by the inverse of XᵀX, what happens? In the covariance lecture I told you that the SSCP matrix is XᵀX. You can remember that for data of covariance type this is a square, symmetric matrix, so the inverse times the matrix will be the identity matrix.
So (XᵀX)⁻¹XᵀX is an identity matrix, and I can say that β, which we now write as β̂ (beta cap, because XᵀX is built from the fixed values collected in the sample), becomes

\[
\hat{\boldsymbol{\beta}} \;=\; \bigl(X^{\mathsf T} X\bigr)^{-1} X^{\mathsf T} y .
\]

So this is your formula for estimating the regression coefficients. I will show you one example of the estimation part.
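In code the formula is one line; a minimal NumPy sketch follows. As a practical aside beyond the lecture, forming the explicit inverse is fine for illustration, but numerically one would usually solve the normal equations or call a least-squares routine instead:

```python
import numpy as np

def fit_mlr(X, y):
    """Estimate beta_hat = (X'X)^(-1) X'y for design matrix X and response y."""
    XtX = X.T @ X
    Xty = X.T @ y
    # np.linalg.solve(XtX, Xty) is the numerically preferred equivalent
    return np.linalg.inv(XtX) @ Xty
```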
Let us solve one problem. Suppose y = (10, 20, 30, 40, 50)ᵀ, a 5 × 1 vector, and my design matrix has first column (1, 1, 1, 1, 1)ᵀ, which will always be there, and second column (5, 7, 10, 12, 20)ᵀ; it is 5 × 2, which means one independent variable X, plus the X0 column of ones. What do you require to calculate? You require β̂ = (XᵀX)⁻¹Xᵀy, so step 1 is to find XᵀX:

\[
X^{\mathsf T} X =
\begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 5 & 7 & 10 & 12 & 20 \end{bmatrix}
\begin{bmatrix} 1 & 5 \\ 1 & 7 \\ 1 & 10 \\ 1 & 12 \\ 1 & 20 \end{bmatrix}
=
\begin{bmatrix} 5 & 54 \\ 54 & 718 \end{bmatrix}.
\]

The first factor is Xᵀ, 2 × 5, and the second is X, 5 × 2, so the resultant is a 2 × 2 matrix. The entries: 1 × 1 summed over the five data points gives 5; 5 + 7 + 10 + 12 + 20 = 54, and the same on the other side; and 25 + 49 + 100 + 144 + 400 = 718.
So this is my step 1; second, you require to compute the inverse. Step 2: I want (XᵀX)⁻¹, which is one over the determinant of XᵀX times the adjoint of XᵀX. The determinant is

\[
\det \begin{bmatrix} 5 & 54 \\ 54 & 718 \end{bmatrix}
\;=\; 5 \times 718 - 54^2 \;=\; 3590 - 2916 \;=\; 674 .
\]

The adjoint of the matrix: 718 comes in place of 5, then −54, −54, and 5. So

\[
\bigl(X^{\mathsf T} X\bigr)^{-1}
= \frac{1}{674}\begin{bmatrix} 718 & -54 \\ -54 & 5 \end{bmatrix}
\approx \begin{bmatrix} 1.0653 & -0.0801 \\ -0.0801 & 0.0074 \end{bmatrix}.
\]

If you round too coarsely here, the digits of the final answer will come out wrong, so keep a few decimal places. Then, what do you require to know now? You require Xᵀy.
That is the other thing you have to compute: step 3, compute Xᵀy:

\[
X^{\mathsf T} y =
\begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 5 & 7 & 10 & 12 & 20 \end{bmatrix}
\begin{bmatrix} 10 \\ 20 \\ 30 \\ 40 \\ 50 \end{bmatrix}
=
\begin{bmatrix} 150 \\ 1970 \end{bmatrix}.
\]

This is 2 × 5 into 5 × 1, so the resultant is 2 × 1; everything gets summed: 10 + 20 + 30 + 40 + 50 = 150, and 50 + 140 + 300 + 480 + 1000 = 1970.

So your step 4 is β̂ = (XᵀX)⁻¹Xᵀy, with the (XᵀX)⁻¹ we found out:

\[
\hat{\boldsymbol{\beta}}
= \frac{1}{674}\begin{bmatrix} 718 & -54 \\ -54 & 5 \end{bmatrix}
\begin{bmatrix} 150 \\ 1970 \end{bmatrix}
= \frac{1}{674}\begin{bmatrix} 107700 - 106380 \\ -8100 + 9850 \end{bmatrix}
= \frac{1}{674}\begin{bmatrix} 1320 \\ 1750 \end{bmatrix}
\approx \begin{bmatrix} 1.958 \\ 2.596 \end{bmatrix}.
\]

This is 2 × 2 into 2 × 1, giving 2 × 1: β̂0 ≈ 1.958 and β̂1 ≈ 2.596.
So my regression equation is now y = β0 + β1 X1 + ε ≈ 1.958 + 2.596 X1 + ε; this is your regression equation, and your ŷ is 1.958 + 2.596 X1. If you want to find ε̂, the estimated error term, that is y − ŷ, which we have to find out. Now, you have the y values, 10, 20, 30, 40, 50, and you require to find ŷ. How do you find it? You have the X values also, (1, 1, 1, 1, 1) and (5, 7, 10, 12, 20), so the fitted values are 1.958 + 2.596 × 5, then 1.958 + 2.596 × 7, then 1.958 + 2.596 × 10, 1.958 + 2.596 × 12, and 1.958 + 2.596 × 20. Subtracting these from y gives your error values, a 5 × 1 vector. This is the estimation we have calculated, a very simple problem with only one variable.
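All four steps of this worked example can be verified in a few lines; a minimal sketch reproducing the numbers above:

```python
import numpy as np

y = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
X = np.column_stack([np.ones(5), [5.0, 7.0, 10.0, 12.0, 20.0]])

XtX = X.T @ X                  # [[5, 54], [54, 718]]
print(np.linalg.det(XtX))      # ~674.0
beta_hat = np.linalg.inv(XtX) @ (X.T @ y)
print(beta_hat)                # approx [1.958, 2.596]

y_hat = X @ beta_hat           # fitted values
resid = y - y_hat              # estimated errors, a 5 x 1 vector
print(resid)
```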
So, when you take y as a function of x in the linear mode with only one variable, like y = β0 + β1 X1 + ε, this equation is known as simple regression, simple linear regression; that is the case p = 1, if p is the number of variables, or, if you count the parameters to be estimated, p + 1 = 2, the intercept plus one coefficient. When there are three or more parameters to be estimated, that is p + 1 ≥ 3, it will be multiple regression. In all those cases y is a single DV; we are considering only one DV at a time.
DV y is our single DV only one DV at a time we are considering. So, next class we will see the sampling distribution
of beta the beta you have estimated this beta cap I am I am saying sampling distribution
of beta cap, that estimate basically, so this beta cap you have estimated using a sample.
If you go for several samples, the beta cap value will change and it will become random.
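This sampling variability is easy to see by simulation; a minimal sketch (with illustrative true values β0 = 2, β1 = 3 and a noise level of my own choosing) that refits the same model on many freshly drawn samples:

```python
import numpy as np

rng = np.random.default_rng(3)
beta_true = np.array([2.0, 3.0])   # illustrative population parameters
x = np.array([5.0, 7.0, 10.0, 12.0, 20.0])
X = np.column_stack([np.ones(5), x])

estimates = []
for _ in range(1000):              # many samples from the same population model
    y = X @ beta_true + rng.normal(0, 1.0, 5)
    estimates.append(np.linalg.lstsq(X, y, rcond=None)[0])

estimates = np.array(estimates)
print(estimates.mean(axis=0))      # close to [2, 3]: E[beta_hat] = beta
print(estimates.std(axis=0))       # beta_hat varies from sample to sample
```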
It is a random variable. What about β; is it a random variable, the regression coefficient itself? From the population point of view, you have y = Xβ + ε, which is our regression line for the population, and β is the population parameter. So they are constants, and they are unknown; that is why you obtain β̂, which is the estimate of β, and the expected value of β̂ will be β. This is unbiased estimation. So β is constant and unknown, while β̂ is a random variable, but it is known: when you collect data, you get its value. So we will go for the sampling distribution of β̂ next class.