Mod-01 Lec-21 Multiple Regression -- Introduction

Captions
Good morning. Today we will discuss multiple regression. The contents are as follows: first we will start with a conceptual model, then we will describe the estimation of parameters, then the sampling distribution of the estimated coefficient vector, beta-hat, and the sampling distribution of the estimated error. Then we will go for model fitting, that is, the adequacy of the regression model, followed by tests of the individual regression parameters, the betas. After that we will test the assumptions underlying regression, then certain diagnostic issues, then prediction using multiple linear regression. Finally, a case study will be shown. So this, in totality, will be covered under multiple regression.

I think today we will be able to cover the conceptual model and the estimation of parameters; that is about one hour. The sampling distributions of the beta estimates and of the error estimate will require one more hour. Adequacy of regression and the tests of individual parameters may take one more hour, the tests of assumptions one hour, diagnostic issues one hour, and prediction and the case study one more hour. So, in all, about six hours we will be discussing multiple regression.

Now, let us start with an example. In the first class, and also in subsequent lectures, I have given the example of the Citycan data. Citycan is a small company working in the local market, and we have seen its data structure. We can start with the important variables for this particular company: profit; sales volume; the percentage absenteeism of workers or employees; monthly machine breakdown hours; and the M ratio, through which the performance of the marketing department is measured. The company's primary interest is how to improve profit, definitely through maximizing sales, apart from many other things. So the company is interested to know how these three variables, absenteeism, breakdown hours, and M ratio, affect sales volume as well as profit.

Under this situation there are two types of variables: one is called the dependent variable (DV), and the other set is known as the independent variables (IV). In this case profit is one dependent variable, sales volume is another dependent variable, and there are three independent variables: absenteeism, breakdown hours, and M ratio. What is of interest here is to test whether the independent variables contribute to explaining the dependent variable, taking one dependent variable at a time. If we find that the independent variables do influence the dependent variable, then, depending on the measured influence, actions can be taken to control the dependent variable by controlling the independent variables.

Pictorially, consider one dependent variable, let it be sales volume, which we denote by y. The other variables we denote as X1 = percentage absenteeism, X2 = breakdown hours, and X3 = M ratio.
We are not considering profit at present; later on we will see how profit can also be included. Under this situation, if I draw y within a circle or ellipse and X1, X2, X3 each within a rectangle, I can draw an arrow from each rectangle terminating at y; each arrow indicates that the respective IV influences the dependent variable y. So X1 influences y; that is why the arrow starts from X1 and terminates at y. Now, suppose X1 has influence beta1, X2 has influence beta2, and X3 has influence beta3. What does it mean? It means that if you change X1 by one unit, there will be beta1 units of change in y; similarly, a one-unit change in X2 causes beta2 units of change in y, and a one-unit change in X3 causes beta3 units of change in y. This is the meaning of these influence parameters.

Now, suppose none of them contributes: absenteeism, breakdown hours, and M ratio all have no effect. Will there be no sales? There will be sales also; it is not necessary that they contribute always. In that case we require another parameter, beta0, which comes through X0, where X0 always takes the value 1. By saying this, we mean that even if there is no influence of X1, X2, X3, there will still be some amount of y; the constant term beta0 is added for that.

Now, if you take one observation of y, followed by a second, a third, and so on, what will happen? You will find that beta0, beta1, beta2, and beta3 are not able to explain the total variability in y. This means there are some other variables, controllable or uncontrollable, which also contribute to the variability of y. As a result we require another term, known as the error. So in the multiple regression case you get three types of parameters: beta0, the constant parameter, known as the intercept; the influences of the independent variables, known as the regression coefficients (beta0 is also a regression coefficient, the intercept coefficient); and a random component, the error, which also contributes.

Essentially, then, if I want to put the entirety in terms of an equation, I can write y = beta0 X0 + beta1 X1 + beta2 X2 + beta3 X3 + epsilon. So the figure is the pictorial representation of the multiple regression model, which in mathematical terms you write as an equation. Since X0 always takes the value 1, this can be written as y = beta0 + beta1 X1 + beta2 X2 + beta3 X3 + epsilon. Now, if we generalize, and we are not interested in only three variables, we go for a variable vector with p variables.
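To make the conceptual model concrete, here is a minimal simulation sketch in Python. The variable names follow the Citycan example, but the coefficient values, variable ranges, and noise level are hypothetical, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Hypothetical IVs in the spirit of the Citycan example:
# X1 = percentage absenteeism, X2 = breakdown hours, X3 = M ratio.
X1 = rng.uniform(2, 15, n)
X2 = rng.uniform(10, 60, n)
X3 = rng.uniform(0.5, 2.0, n)

# Hypothetical intercept and influence parameters (betas).
beta0, beta1, beta2, beta3 = 500.0, -8.0, -3.5, 120.0

# Random error: the part of y not explained by X1, X2, X3.
eps = rng.normal(0, 25, n)

# The conceptual model in equation form.
y = beta0 + beta1 * X1 + beta2 * X2 + beta3 * X3 + eps
```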
So I am writing X1, X2, ..., Xp: there are p variables contributing towards y, and, as we have already seen, there is one intercept parameter associated with the constant X0, which we set to 1. If you want to include X0 in this vector as well, it becomes a (p + 1) x 1 vector; but X0 need not be written here, it will be taken care of later. Essentially X1 to Xp are the variables, the IVs, and X0 is the constant for which the intercept term will come. So if we go by X1 to Xp, this is a p x 1 variable vector. Under this situation, what will beta be? There will be an influence for each of the variables, so I can write beta1, beta2, ..., betap, which is p x 1; but there is also one intercept parameter, related to X0, which is beta0. So you require estimating p + 1 parameters, regression coefficients, here.

Now I can write y = beta0 + beta1 X1 + beta2 X2 + ... + betap Xp + epsilon. This is the general regression equation; later on we will write it in terms of a matrix equation.

Now, let us assume you are collecting data: let n data points be collected (or, if they are yet to be collected, all observations will be random). In this case our data matrix is as follows. First the dependent part: y will be y1, y2, y3, ..., yn, the n data points. Then you collect X, which is n x p, where n is the number of observations and p is the number of variables: the first row is x11, x12, ..., x1p; the second row is x21, x22, ..., x2p; and so on, down to xn1, xn2, ..., xnp. This is the data related to the IVs. These two sets of data you will collect.

Now, if you use this data set in the equation given above, the equation becomes true for every data point. To take every data point into consideration, write yi on the left and xi1, xi2, ..., xip for the general i-th observation. Then the equation is yi = beta0 + beta1 xi1 + beta2 xi2 + ... + betap xip + epsiloni. Now, what is the quantity beta0 + beta1 xi1 + ... + betap xip? You see, it is a weighted linear combination of several variables; it is a variate. Then what does this variate represent? It represents the expected value of yi given xi, where xi = (xi1, xi2, ..., xip): without the error term, this is the expected value of yi. If I know the i-th observation of the individual variables, then this gives the expected value of yi, that is, the predicted value. As we do not know the IV values for the i-th observation in advance, we also do not know what the yi value will be; but through this variate we are saying what the expected value of yi will be once you observe xi1 to xip. So what is happening is: yi equals this variate plus epsiloni, and the expected value is nothing but the predicted value; we will use this later on.
So I can write from here that epsiloni = yi − ŷi. Essentially, what have we found here? We have found all the regression coefficients and also the error term, and we have described that this linear combination, the variate, basically gives the expected value of y given these conditions. This is the conditional mean of y; in general terms we write it as mu of y given X, and for the i-th observation, mu of yi given xi, where xi is the observation.

To see this a little further, assume there is just one X and one y and you are fitting a straight line. For a given xi, go up to the fitted line and you get a point; what is this value? It is nothing but the mean value of y for the i-th observation, given that observation. Now, why are we talking about a mean value? The reason is: suppose, for the i-th observation, you collect one sample; you will get some value of y. If you collect a second, the value may change. So ultimately there is an error part, and this is taken care of by the concept of error. Any point on the regression line (which we will fit later on) is the mean value of y given x; we will see all of this when we do the fitting.

Now, we have collected n data points and we have the regression equation yi = beta0 + beta1 xi1 + beta2 xi2 + ... + betap xip + epsiloni. If I write it for all the observations, then:

y1 = beta0 + beta1 x11 + beta2 x12 + ... + betap x1p + epsilon1
y2 = beta0 + beta1 x21 + beta2 x22 + ... + betap x2p + epsilon2
...
yn = beta0 + beta1 xn1 + beta2 xn2 + ... + betap xnp + epsilonn

In matrix form, the left-hand side is the n x 1 vector (y1, y2, ..., yn)'. On the right-hand side, for beta0 there is a column of ones, 1, 1, ..., 1; for beta1 the column x11, x21, ..., xn1; for beta2 the column x12, x22, ..., xn2; and similarly, for betap, the column x1p, x2p, ..., xnp. So this matrix is n x (p + 1): p variable columns plus one column for the intercept. If I multiply it by the (p + 1) x 1 vector (beta0, beta1, ..., betap)', the multiplication gives an n x 1 vector; adding the n x 1 error vector (epsilon1, epsilon2, ..., epsilonn)' makes the right-hand side n x 1, equal to the n x 1 left-hand side.

The result can be written as y = X beta + epsilon, where y is the n x 1 vector of observations of the dependent variable; X is the n x (p + 1) data matrix of IVs including the intercept column, also known as the design matrix; beta is the (p + 1) x 1 vector of regression coefficients; and epsilon is the n x 1 vector of error terms. So this is, in a nutshell, the equation for multiple linear regression in matrix form.
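As a quick check of the dimensions, the stacked system can be reproduced with a design matrix in code. A minimal sketch, with hypothetical data and coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3

X_data = rng.uniform(0, 10, size=(n, p))   # n x p matrix of IV observations
beta = np.array([2.0, 0.5, -1.0, 3.0])     # hypothetical beta0, beta1, ..., betap
eps = rng.normal(0, 1.0, n)                # n x 1 vector of errors

# Prepend the column of ones (X0 = 1) to form the n x (p + 1) design matrix.
X = np.column_stack([np.ones(n), X_data])
print(X.shape, beta.shape, eps.shape)      # (100, 4) (4,) (100,)

# One matrix product reproduces all n per-observation equations at once.
y = X @ beta + eps
print(y.shape)                             # (100,)
```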
Now we will see some of the assumptions of multiple regression. The first assumption is linearity, that is, linear relationships. The second assumption is homoscedasticity, or equal variance of y across the values of x; in other words, given the observations of the IVs, the variability of y remains constant across X (I will show you how this is checked). The third one: we have seen that there is an error term, and the assumption is uncorrelated error terms, meaning the n errors must not be correlated. And the fourth is normality of the error terms. These four assumptions need to be tested later, after fitting the model, but at this moment I should say something about what they are.

First, understand what linearity is. Consider the relationship y versus X, for example X1; suppose that when you plot the data you get a straight-line pattern, while plotting against X2 you may get a curved pattern. In the first case there is linearity; in the second case it is not linear. If the relationship between y and X is non-linear, then this model is not applicable. What do you require to do? You require converting it to linearity: you have to transform the data. From the linearity point of view, we primarily want to transform the IV so that the relationship becomes linear. The relationship can be negative or positive, no problem, but linearity is the issue.

The second one is homoscedasticity. As I told you a few minutes back: suppose this is my X and this is y (two dimensions), and let the regression equation be a straight line; we are assuming a linear fit is possible. You have collected n data points, at x1, x2, ..., xi, ..., xn: this is the first observation, i = 1, this is i = 2, and so on up to i = n. We said you will collect one sample, but you may collect several samples also. Under such conditions, for a particular value of x you may get several values of y; if you plot these several y values for each particular value of x, then the variability of y you observe must be equal across x. That is, if the variance of y at x1 is sigma-squared, then the variance of y at x2, at xi, and at xn should all equal the same sigma-squared. This is what homogeneity means from the point of view of the variability of y. If this is violated, for example the spread of y is small at some x values and large at others, then y is not homogeneous in variability across the X values, and you require transforming variables; in that case you have to transform the y variable. If there is a linearity problem, you transform the X variable; you can go for both, but for linearity it is preferable that you transform the X variable. A rough visual check of these two assumptions is shown in the sketch below.
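The sketch plots y against x (linearity) and shows how a transformation of y can stabilize a growing spread (homoscedasticity). The data here are hypothetical and deliberately heteroscedastic; this is only an illustration, not the lecture's diagnostic procedure:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.uniform(1, 10, 200)

# Hypothetical heteroscedastic data: the spread of y grows with x.
y = 3 + 2 * x + rng.normal(0, 0.5 * x)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(x, y, s=10)
axes[0].set(xlabel="x", ylabel="y", title="Linearity check: y vs x")

# A common remedy when the variance grows with the level of y: transform y.
axes[1].scatter(x, np.log(y), s=10)
axes[1].set(xlabel="x", ylabel="log(y)", title="After transforming y")
plt.tight_layout()
plt.show()
```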
If a heterogeneous nature, heteroscedasticity, is there, meaning equal variance is not satisfied, then you have to transform the y variable.

The third assumption is uncorrelated error terms. See the figure again: when you use the regression equation and you predict, you basically predict a point on the line for each x: for x = 1 this value, for x = 2 this value, and so on. The remaining portion is the error. Suppose your observed value is yi, and based on the equation you say the fitted value is ŷi; then the remaining portion is the error. Similarly, everywhere you will be getting an error: epsilon1, epsilon2, ..., epsiloni, ..., epsilonn. Each error is random, so the error will follow this type of distribution: epsiloni follows a normal distribution with mean 0. And what will be the variance? The variance will be sigma-squared, which is exactly the common variance of y we wrote above: sigma-squared = the variance of y at x1 = the variance of y at x2, equal variance for all observations. This condition is very important, because this variability part is taken care of by the error: the error follows normality with mean 0 and variance sigma-squared, where sigma-squared is basically the variability of y for each observation of x.

Further, suppose there are two error terms, epsiloni and epsilonk; then the covariance between these two will be 0, which is what we mean by uncorrelated error terms. As there are n errors, if I look at the covariance between the error terms, I get an n x n matrix whose diagonal elements are the variance and whose off-diagonal elements are 0. That is uncorrelated error terms; and normality of error means the error follows a normal distribution with mean 0 and variance sigma-squared. We will test all these assumptions later, when we fit the regression equation; it is required to be tested. Once, on examining the data, you find that the assumptions are reasonably valid, you go for fitting, that is, estimation of the model parameters.

Estimation of the model parameters, that is, beta: we are talking about how to estimate beta, where y = X beta + epsilon is our MLR model. If you recall, yi = beta0 + beta1 xi1 + beta2 xi2 + ... + betap xip + epsiloni, and with a little modification, epsiloni = yi − the sum from j = 0 to p of betaj xij, where xij takes the value 1 when j = 0. If I square it, I get (yi − the sum from j = 0 to p of betaj xij)². This is for a particular observation, but we have n observations, so if I take the summation over n, what is this quantity? The total error sum of squares. So this is SSE. Now we will choose the betaj values in such a manner that SSE is minimum; optimization is used here: choose beta in such a manner that SSE is minimized. You can do this very easily: take the derivative of SSE with respect to betaj and set it equal to 0, subject to the condition that the second derivative, del² SSE / del betaj del betak, is greater than 0.
This is greater than zero because it is a minimum case. (A student asks: Sir, should it be del² SSE divided by del betaj del betak, or only del² SSE by del betaj squared?) When there is only one variable, we write del² SSE by del beta squared; the general form written here takes care of that when k = j, because there are many variables here, p of them. It is basically the Hessian matrix. Actually, what I am trying to say is: as there are p + 1 estimates you are making, you will be getting a matrix of second derivatives, the Hessian matrix, and that matrix must be positive definite; that is what I mean, and that is why I have written it like this. Every component will be calculated, a matrix will be formed, and that matrix must be positive definite. By positive definite we mean: suppose A is a square matrix; if for any nonzero vector x you find that x'Ax > 0, then A is positive definite. So the Hessian matrix that comes out must be positive definite; then it is the minimum condition.

Now let us write in terms of matrices. We found that epsilon = (epsilon1, epsilon2, ..., epsilonn)', which is n x 1. Can you not write the sum of squared errors like this: SSE = epsilon' epsilon? This is a 1 x n vector times an n x 1 vector, which gives a 1 x 1 scalar quantity. If this is true, you can write SSE = (y − X beta)'(y − X beta), because our regression equation gives epsilon = y − X beta. Now, if I take the derivative with respect to beta, it is nothing but a square term, so it comes out as −2 X'(y − X beta); the X' comes out for matrix compatibility. We set this equation to 0. I can now write −X'y + X'X beta = 0, so X'X beta = X'y. If I multiply both sides by the inverse of X'X, what happens? In the covariance lecture I told you that X'X is the SSCP matrix; can you remember that it is a square, symmetric matrix? So the inverse times the matrix itself is the identity matrix. So beta, which we now write as beta-hat because X'X is computed from the fixed values collected from the sample, becomes beta-hat = (X'X)⁻¹ X'y. This is your formula for estimating the regression coefficients; I will show you one example for the estimation part.

Let us solve one problem. Suppose y = (10, 20, 30, 40, 50)', a 5 x 1 vector, and my design matrix is: first column 1, 1, 1, 1, 1 (that will always be there), and let the second column be 5, 7, 10, 12, 20. So X is basically 5 x 2: that means one independent variable, X1, plus the X0 column of ones we have taken.
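Both the closed-form estimator and the positive-definiteness condition translate directly into code. A minimal sketch: solving the normal equations X'X b = X'y with np.linalg.solve is numerically safer than forming the inverse explicitly, but it is algebraically the same formula:

```python
import numpy as np

def ols_beta_hat(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares estimate: solve the normal equations X'X b = X'y."""
    XtX = X.T @ X

    # Second-order condition: the Hessian of SSE is 2 X'X, and Cholesky
    # succeeds only for a positive definite matrix (full-column-rank X);
    # otherwise it raises np.linalg.LinAlgError.
    np.linalg.cholesky(2 * XtX)

    return np.linalg.solve(XtX, X.T @ y)
```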
So what do you require to calculate? You require to calculate beta-hat = (X'X)⁻¹ X'y.

Step 1 is to find X'X. X' is the 2 x 5 matrix with rows (1, 1, 1, 1, 1) and (5, 7, 10, 12, 20), and X is the 5 x 2 matrix; the resultant is a 2 x 2 matrix. The (1, 1) element is the sum of the five ones, 5. The (1, 2) element is the sum 5 + 7 + 10 + 12 + 20 = 54, and by symmetry the (2, 1) element is also 54. The (2, 2) element is 25 + 49 + 100 + 144 + 400 = 718. So X'X = [[5, 54], [54, 718]].

Step 2: you require computing the inverse, (X'X)⁻¹ = adj(X'X) / det(X'X). The determinant of X'X is det [[5, 54], [54, 718]] = 5 × 718 − 54² = 3590 − 2916 = 674. The adjoint of [[5, 54], [54, 718]] is [[718, −54], [−54, 5]]. So (X'X)⁻¹ = (1/674) [[718, −54], [−54, 5]], which comes to approximately [[1.0653, −0.0801], [−0.0801, 0.0074]]; if you carry more digits, you will get more accurate values.

Step 3 is to compute X'y. X'y is the 2 x 5 matrix, with rows (1, 1, 1, 1, 1) and (5, 7, 10, 12, 20), times the 5 x 1 vector (10, 20, 30, 40, 50)'; the resultant is 2 x 1. The first element is the sum 10 + 20 + 30 + 40 + 50 = 150; the second element is 50 + 140 + 300 + 480 + 1000 = 1970. So X'y = (150, 1970)'.

Step 4: beta-hat = (X'X)⁻¹ X'y = (1/674) [[718, −54], [−54, 5]] (150, 1970)'. This is 2 x 2 times 2 x 1, so the result is 2 x 1. The first element is (718 × 150 − 54 × 1970) / 674 = 1320 / 674 ≈ 1.958, and the second element is (−54 × 150 + 5 × 1970) / 674 = 1750 / 674 ≈ 2.596.

So my regression equation is now y = beta0 + beta1 X1 + epsilon = 1.958 + 2.596 X1 + epsilon, and ŷ = 1.958 + 2.596 X1. If you want to find epsilon-hat, the error term, that is y − ŷ, which we have to find out. You have the y values, 10, 20, 30, 40, 50, and you require finding ŷ.
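The four steps can be replayed end to end in numpy; carrying full precision reproduces the exact fractions 1320/674 and 1750/674 for beta-hat:

```python
import numpy as np

y = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
X = np.column_stack([np.ones(5), [5.0, 7.0, 10.0, 12.0, 20.0]])

XtX = X.T @ X                    # step 1: [[5, 54], [54, 718]]
XtX_inv = np.linalg.inv(XtX)     # step 2: determinant 674
Xty = X.T @ y                    # step 3: [150, 1970]
beta_hat = XtX_inv @ Xty         # step 4

print(np.linalg.det(XtX))        # 674.0 (up to floating-point rounding)
print(beta_hat)                  # [1.9585..., 2.5964...] = [1320/674, 1750/674]

# Fitted values and residuals, epsilon-hat = y - y-hat:
y_hat = X @ beta_hat
print(y - y_hat)
```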
How do you find ŷ? You have the X values: the column of ones and 5, 7, 10, 12, 20. So ŷ1 = 1.958 + 2.596 × 5, ŷ2 = 1.958 + 2.596 × 7, ŷ3 = 1.958 + 2.596 × 10, ŷ4 = 1.958 + 2.596 × 12, and ŷ5 = 1.958 + 2.596 × 20. Subtracting these from y will give you your error values, a 5 x 1 vector. So this is the estimation we have calculated, a very simple problem with only one variable.

So when you take y as a function of x in the linear mode, like y = beta0 + beta1 X1 + epsilon, with only one variable, this equation is known as simple regression, simple linear regression: that means the p = 1 case, where p is the number of variables (the number of parameters to be estimated is then p + 1 = 2, including the intercept). When p + 1 ≥ 3 parameters are to be estimated, that is, two or more variables, it is multiple regression. In all these cases y is a single DV; we are considering only one DV at a time.

Next class we will see the sampling distribution of the beta you have estimated; I am saying the sampling distribution of beta-hat, the estimate, basically, because this beta-hat you have estimated using one sample. If you go for several samples, the beta-hat value will change, so it becomes random; it is a random variable. What about beta: is the regression coefficient a random variable? From the population point of view, you have y = X beta + epsilon, which is our regression model for the population, and beta is the population parameter. It is constant and unknown; that is why you obtain beta-hat, which is an estimate of beta, and the expected value of beta-hat is beta. This is unbiased estimation. So beta is constant and unknown, while beta-hat is a random variable, but it is known: when you collect data, you get its value. So we will go for the sampling distribution of beta-hat next class.
Info
Channel: nptelhrd
Views: 30,697
Keywords: Multiple Regression -- Introduction
Id: lWlCWbhaem8
Length: 61min 9sec (3669 seconds)
Published: Fri May 09 2014