Multiple Linear Regression (Part A)

Captions
Welcome. We shall be talking about multiple linear regression; this is my first lecture on the topic. The content of today's lecture is the estimation of the model parameters in multiple linear regression and the properties of the least squares estimators, and then, once the model has been fitted, testing for the significance of the regression.

Let me recall the toy problem we discussed earlier. There we had only one regressor variable, the amount of money spent on advertisement. We observed that this regressor explained eighty percent of the total variability in the response variable, the sales amount, and twenty percent of the variability in the response remained unexplained; that unexplained part is the SS residual. Now there could be one more regressor variable which explains a part of that unexplained twenty percent, and one important candidate is the number of salespersons you employ. In most practical situations you will have more than one regressor variable, and in that case we need to move to multiple linear regression.

Let me explain the multiple linear regression model. The situation here is that instead of one regressor we have more than one, say k - 1 regressor variables. The general form of the multiple linear regression model, written for the i-th observation, is

y_i = β0 + β1 x_i1 + β2 x_i2 + ... + β_{k-1} x_{i,k-1} + ε_i,   i = 1, ..., n,

where x_i1 is the i-th observation on the first regressor, x_i2 on the second, and so on, and ε_i is the error. Since we have more than one regressor variable we call it multiple regression, and since the model is linear it is called multiple linear regression. One should be careful here: "linear" means the model is a linear function of the unknown parameters, which are β0, β1, ..., β_{k-1}, so there are k unknown parameters. Only when the model is linear in the unknown parameters is it called a linear model. We assume that each error ε_i follows a normal distribution with mean 0 and variance σ², and that all the ε_i are independent.

Now we define some matrices: y = (y1, y2, ..., yn)' is the vector of the n observations, β = (β0, β1, ..., β_{k-1})' is the k × 1 vector of parameters, and ε = (ε1, ε2, ..., εn)' is the vector of errors. We also define the n × k matrix

X = [ 1  x_11  x_12  ...  x_1,k-1
      1  x_21  x_22  ...  x_2,k-1
      ...
      1  x_n1  x_n2  ...  x_n,k-1 ],

so the first row contains the first observation on each regressor, the second row the second observation, and the last row the n-th observation.
This X is a matrix of known values, because all of its entries come from the data. We are given data of the form (y_i, x_i1, x_i2, ..., x_{i,k-1}) for i = 1, ..., n, and using this set of observations we have to fit the multiple linear regression model; in matrix notation the model can be expressed as

y = Xβ + ε,

where y is the vector of observations, β the vector of parameters and ε the vector of errors. So this is the model we have to fit, and fitting it basically means estimating the parameters.

Now we talk about the estimation of the model parameters. In multiple linear regression there is almost no new concept; everything we need has already been covered in simple linear regression. As in simple linear regression, we estimate the parameters by the method of least squares, that is, the parameters are determined by minimizing the SS residual; the only difference is that instead of just β0 and β1 we now have k unknown parameters. What is the SS residual? It is

SS_res = Σ_{i=1}^n e_i² = Σ_{i=1}^n (y_i - ŷ_i)².

If the fitted model is ŷ = β̂0 + β̂1 x_1 + ... + β̂_{k-1} x_{k-1}, then, since ŷ_i is the i-th fitted value,

SS_res = Σ_{i=1}^n (y_i - β̂0 - β̂1 x_i1 - β̂2 x_i2 - ... - β̂_{k-1} x_{i,k-1})².

We will also represent this SS residual in matrix form. Define the residual vector e = (e1, e2, ..., en)', where e_i is the i-th residual; then e = y - ŷ, where y is the vector of observations and ŷ = Xβ̂ is the vector of fitted values. So

SS_res = Σ e_i² = e'e = (y - Xβ̂)'(y - Xβ̂) = y'y - y'Xβ̂ - β̂'X'y + β̂'X'Xβ̂.

You can check that y'Xβ̂ is a 1 × 1 matrix, that is, a scalar, and similarly β̂'X'y is a scalar; in fact these two quantities are the same, so

SS_res = y'y - 2 β̂'X'y + β̂'X'Xβ̂.

This is the SS residual in matrix form. If the matrix version is not clear, the summation form above is the same quantity, and it is very similar to the simple linear regression case, only with additional terms for the additional regressors.
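To make the setup concrete, here is a minimal numerical sketch (added to this transcript for illustration; all numbers and variable names are made up). It builds the design matrix X with a leading column of ones, simulates y = Xβ + ε, and evaluates the SS residual for an arbitrary trial β̂ using both the summation form and the expanded matrix form:

```python
import numpy as np

# Illustrative data: n observations, k - 1 = 2 regressors, so k = 3 parameters.
rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(10, 100, size=n)      # e.g. advertising spend (made-up values)
x2 = rng.uniform(1, 20, size=n)        # e.g. number of salespersons (made up)

# Design matrix X (n x k): a column of ones for the intercept, then the regressors.
X = np.column_stack([np.ones(n), x1, x2])

# Assumed "true" parameters and N(0, sigma^2) errors, just for the simulation.
beta_true = np.array([5.0, 0.8, 2.5])
eps = rng.normal(0.0, 4.0, size=n)
y = X @ beta_true + eps                # the model y = X beta + epsilon

# SS residual for an arbitrary trial value of beta_hat, computed two ways.
beta_trial = np.array([4.0, 1.0, 2.0])
e = y - X @ beta_trial
ss_res_sum = np.sum(e**2)                                        # sum of e_i^2
ss_res_mat = y @ y - 2 * beta_trial @ X.T @ y + beta_trial @ X.T @ X @ beta_trial
print(ss_res_sum, ss_res_mat)          # the two forms give the same number
```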
So we have two different representations of the SS residual, and now we need to differentiate it with respect to the unknown parameters. There are k unknown parameters, so differentiating with respect to each of them gives k normal equations; with k independent normal equations and k unknowns, solving them gives the estimators of the k unknown parameters. Here is the process of the least squares method, explained both ways in case the matrix representation is not clear. In matrix form,

SS_res = y'y - 2 β̂'X'y + β̂'X'Xβ̂,

and in the usual summation form,

SS_res = Σ_{i=1}^n (y_i - β̂0 - β̂1 x_i1 - ... - β̂_{k-1} x_{i,k-1})².

First I differentiate the SS residual with respect to β̂0 and set the derivative equal to zero; this normal equation gives

Σ_{i=1}^n (y_i - β̂0 - β̂1 x_i1 - ... - β̂_{k-1} x_{i,k-1}) = 0,

and since the term in parentheses is nothing but e_i = y_i - ŷ_i, the first normal equation can also be written as Σ_{i=1}^n e_i = 0. Similarly, differentiating the SS residual with respect to β̂1 gives the normal equation Σ e_i x_i1 = 0, very much like in simple linear regression, and continuing with β̂2 and so on up to β̂_{k-1}, the final normal equation is Σ e_i x_{i,k-1} = 0. So you have k normal equations and k unknown parameters, all the normal equations are independent, and solving them gives the k estimates β̂0, β̂1, ..., β̂_{k-1}. This is the usual approach, the same one we used in the case of simple linear regression.
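As a quick numerical check of these normal equations (again an illustration added to the transcript, with simulated, made-up data), the residuals from any least-squares fit should satisfy Σ e_i = 0 and Σ e_i x_ij = 0 for every regressor j; stacked together, these k equations are just X'e = 0:

```python
import numpy as np

# Simulated data (values are made up, as in the earlier sketch).
rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.uniform(10, 100, n), rng.uniform(1, 20, n)])
y = X @ np.array([5.0, 0.8, 2.5]) + rng.normal(0, 4.0, n)

# Any least-squares routine gives the same fit; lstsq is used here only to get
# the fitted values without assuming the closed form derived next.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta_hat                  # residual vector e = y - y_hat

# X'e stacks the k normal equations: its first entry is sum(e_i), the others
# are sum(e_i * x_ij).  All should be zero up to floating-point error.
print(X.T @ e)
```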
Now let me do the same thing in matrix notation: I differentiate the SS residual, expressed in matrix form, with respect to β̂ and set the derivative equal to zero. The term y'y does not involve β̂, so its derivative is zero; differentiating the second term gives -2X'y, and differentiating the third term gives 2X'Xβ̂. (You can write the matrix form out in detail and differentiate to verify this.) Setting the derivative equal to zero,

-2X'y + 2X'Xβ̂ = 0,

which in fact consists of the same k normal equations written together. Finding β̂ from this matrix form of the normal equations is easy:

β̂ = (X'X)⁻¹ X'y.

This is the least squares estimator of the unknown parameters β0, β1, ..., β_{k-1}; if you solve the k normal equations directly you will get the same thing.

Now we talk about the statistical properties of this least squares estimator. First I am going to prove that the estimator we have obtained, β̂ = (X'X)⁻¹X'y, is an unbiased estimator of β, that is, E(β̂) = β. We have

E(β̂) = E[(X'X)⁻¹X'y].

Now, in matrix notation, y = Xβ + ε, so

E(β̂) = E[(X'X)⁻¹X'(Xβ + ε)] = E[(X'X)⁻¹X'X β] + E[(X'X)⁻¹X'ε].

The quantity (X'X)⁻¹X'X is the identity, so the first term is just β. In the second term, ε is a random vector whose components are normal with mean zero and variance σ², so E(ε) = 0 and the term vanishes. Hence E(β̂) = β, which means the least squares estimator β̂ is an unbiased estimator of β.

Next we derive the variance of this estimator:

Var(β̂) = Var[(X'X)⁻¹X'y].

We know that the variance–covariance matrix of the observation vector y is σ²I, and (X'X)⁻¹X' is a constant matrix; it does not involve any random variable, because X is not random. So

Var(β̂) = (X'X)⁻¹X' (σ²I) X(X'X)⁻¹ = σ² (X'X)⁻¹,

because X'X and (X'X)⁻¹ cancel out.
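Here is a small sketch of the closed-form estimator and its covariance matrix on simulated data (illustration only; the true β and σ are known here only because the data are generated from them):

```python
import numpy as np

# Simulated data with assumed true parameters, for illustration.
rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.uniform(10, 100, n), rng.uniform(1, 20, n)])
beta_true, sigma = np.array([5.0, 0.8, 2.5]), 4.0
y = X @ beta_true + rng.normal(0, sigma, n)

# Least squares estimator and its variance-covariance matrix.
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y              # beta_hat = (X'X)^{-1} X'y
cov_beta_hat = sigma**2 * XtX_inv         # Var(beta_hat) = sigma^2 (X'X)^{-1}

print(beta_hat)                           # close to beta_true (unbiasedness)
print(np.sqrt(np.diag(cov_beta_hat)))     # standard error of each beta_j
```

In numerical work one would typically solve X'Xβ̂ = X'y with np.linalg.solve, or use a QR-based routine, rather than forming (X'X)⁻¹ explicitly; the explicit inverse above simply mirrors the formula from the lecture.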
Next we look at a simpler representation of the SS residual in matrix notation. We derived that

SS_res = y'y - 2 β̂'X'y + β̂'X'Xβ̂.

Now we know that β̂ = (X'X)⁻¹X'y, so I am going to substitute this value into the last term to simplify the expression:

β̂'X'X β̂ = β̂'X'X (X'X)⁻¹X'y = β̂'X'y,

because X'X(X'X)⁻¹ is the identity. So

SS_res = y'y - 2 β̂'X'y + β̂'X'y = y'y - β̂'X'y.

This is the same quantity as Σ e_i²; it is just its matrix representation.

Now let me talk about the degrees of freedom of this SS residual. SS_res is the sum of the n squared residuals, Σ_{i=1}^n e_i², but we have just derived that the residuals satisfy k constraints, namely the k normal equations involving the e_i. So you do not have the freedom of choosing all n of the e_i independently: you can choose n - k of them freely, and the remaining k have to be chosen in such a way that they satisfy those k constraints. In the case of simple linear regression we had two constraints on the e_i, which is why we had the freedom of choosing n - 2 of them independently; here, instead of two constraints, we have k constraints, the k normal equations written above. So we lose k degrees of freedom because of these k constraints on the residuals, and the SS residual has n - k degrees of freedom. Since the errors are normal with mean zero and variance σ², each squared standardized error follows a chi-square distribution with one degree of freedom, and

SS_res / σ² ~ χ²(n - k),

with n - k degrees of freedom rather than n, because of those k constraints.

We can also define the mean square residual, obtained by dividing the SS residual by its degrees of freedom:

MS_res = SS_res / (n - k).

It is not difficult to prove that E(MS_res) = σ², so MS_res is an unbiased estimator of σ² as well.
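To illustrate (again with simulated data, so σ² is known and the comparison is meaningful), the following sketch checks that the two expressions for the SS residual agree and that the mean square residual is close to σ²:

```python
import numpy as np

# Simulated data; sigma is "known" here only because we generate the data.
rng = np.random.default_rng(3)
n, k = 500, 3
X = np.column_stack([np.ones(n), rng.uniform(10, 100, n), rng.uniform(1, 20, n)])
sigma = 4.0
y = X @ np.array([5.0, 0.8, 2.5]) + rng.normal(0, sigma, n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # solves the k normal equations
e = y - X @ beta_hat

ss_res_sum = np.sum(e**2)                      # sum of squared residuals
ss_res_mat = y @ y - beta_hat @ X.T @ y        # y'y - beta_hat'X'y

ms_res = ss_res_sum / (n - k)                  # n - k degrees of freedom
print(ss_res_sum, ss_res_mat)                  # the two forms agree
print(ms_res, sigma**2)                        # MS residual estimates sigma^2
```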
Before moving on to the statistical significance of the regression model, I want to give one more representation of the SS residual; it can be represented in several ways. You can simply write Σ_{i=1}^n e_i², we had the matrix representation above, and now I am going to give another representation in terms of the hat matrix. I have no use for this expression right now, but we will use it in the future.

We know that, in matrix notation, e = y - ŷ, where y is the observation vector and ŷ = Xβ̂. Replacing β̂ by its expression (X'X)⁻¹X'y,

e = y - Xβ̂ = y - X(X'X)⁻¹X'y = (I - X(X'X)⁻¹X') y = (I - H) y,

where the n × n matrix

H = X(X'X)⁻¹X'

is called the hat matrix. It is called the hat matrix because it transforms y into ŷ: Hy is nothing but ŷ. You can prove that H is idempotent, H² = H; that is the special property of this matrix. Now the SS residual can be written as

SS_res = e'e = y'(I - H)'(I - H) y,

and you can check that (I - H)'(I - H) is nothing but I - H, so

SS_res = y'(I - H) y.

This is another way to express the SS residual; as I said, I am not going to use this expression in terms of the hat matrix at the moment, I will use it later.

Next I will move to the ANOVA-type approach to testing the statistical significance of the regression model, and to prepare for that I will first talk about SS total and then SS regression. SS total is nothing but the total variation in the observations, that is, the variation in the response variable:

SS_T = Σ_{i=1}^n (y_i - ȳ)²,

where we have n observations of the form (y_i, x_i1, ..., x_{i,k-1}). This can be written as Σ y_i² - n ȳ², which is not difficult to check. What is the degree of freedom of SS total? It is a sum of n terms, but those terms satisfy the constraint Σ (y_i - ȳ) = 0, so you do not have the freedom of choosing all of y_1 - ȳ, y_2 - ȳ, ..., y_n - ȳ: you can choose n - 1 of them freely, and the last one has to be chosen in such a way that the constraint is satisfied. So SS_T has n - 1 degrees of freedom.
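The following sketch (simulated data, illustration only) verifies the hat-matrix facts numerically: Hy gives the fitted values, H is idempotent, and y'(I - H)y reproduces the SS residual; it also computes SS total:

```python
import numpy as np

# Simulated data for illustration.
rng = np.random.default_rng(4)
n = 30
X = np.column_stack([np.ones(n), rng.uniform(10, 100, n), rng.uniform(1, 20, n)])
y = X @ np.array([5.0, 0.8, 2.5]) + rng.normal(0, 4.0, n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
H = X @ np.linalg.inv(X.T @ X) @ X.T          # the n x n hat matrix

print(np.allclose(H @ H, H))                  # idempotent: H^2 = H
print(np.allclose(H @ y, X @ beta_hat))       # H y = y_hat

I_n = np.eye(n)
ss_res = y @ (I_n - H) @ y                    # y'(I - H)y
ss_total = np.sum((y - y.mean())**2)          # SS total, n - 1 degrees of freedom
print(np.isclose(ss_res, np.sum((y - X @ beta_hat)**2)), ss_total)
```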
Now, what is SS regression? SS regression is equal to SS total minus SS residual. We know that SS_T = Σ y_i² - n ȳ², which in matrix notation is y'y - n ȳ² (ȳ being the mean of the observations), and, if you recall, SS_res = y'y - β̂'X'y in matrix notation. So

SS_reg = SS_T - SS_res = (y'y - n ȳ²) - (y'y - β̂'X'y) = β̂'X'y - n ȳ²,

because the two y'y terms cancel out. So we have expressions for SS total, SS residual and SS regression, and we are left only with the degrees of freedom for SS regression. What we know is that

SS_T = SS_reg + SS_res.

Let me say again what this means: SS_T is the total variability in the response variable, and that variability is partitioned into two parts; the part of the variability in the response that is explained by the regression model is SS regression, and the part that is not explained by the model is SS residual. We want the model to be such that SS regression is maximized, which obviously means SS residual is minimized. Now, SS total has n - 1 degrees of freedom and we know that SS residual has n - k degrees of freedom, so the degrees of freedom for SS regression are (n - 1) - (n - k) = k - 1; SS regression has k - 1 degrees of freedom.

In the next class I will be talking about the statistical significance of the regression model in the case of multiple linear regression. Thank you very much.
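As a closing numerical check (an illustration added to the transcript, using simulated, made-up data), the sketch below verifies the decomposition SS total = SS regression + SS residual and computes the fraction of variability explained:

```python
import numpy as np

# Simulated data for illustration.
rng = np.random.default_rng(5)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.uniform(10, 100, n), rng.uniform(1, 20, n)])
y = X @ np.array([5.0, 0.8, 2.5]) + rng.normal(0, 4.0, n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
ss_total = y @ y - n * y.mean()**2               # sum(y_i^2) - n*ybar^2,   n - 1 d.f.
ss_res   = y @ y - beta_hat @ X.T @ y            # y'y - beta_hat'X'y,      n - k d.f.
ss_reg   = beta_hat @ X.T @ y - n * y.mean()**2  # beta_hat'X'y - n*ybar^2, k - 1 d.f.

print(np.isclose(ss_total, ss_reg + ss_res))     # SS_T = SS_reg + SS_res
print(ss_reg / ss_total)                         # fraction of variability explained
```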
Info
Channel: nptelhrd
Views: 60,292
Keywords: Multiple, Linear, Regression, (PartA)
Id: LhGFXO1NQLk
Length: 56min 3sec (3363 seconds)
Published: Fri Mar 06 2015