Selecting the BEST Regression Model (Part A)

Captions
In this lecture we are in the multiple linear regression setting, where the number of regressor variables is more than one. In most practical problems the number of regressors is large, and with a large number of regressor variables we may wonder whether some of them are irrelevant and can be removed from the regression equation. The basic idea behind finding the best regression model is that we need to find an appropriate subset of regressors that explains the variability in the response variable well; finding this subset of regressor variables is called the variable selection problem.

Let me explain in detail. There are several algorithms for this problem, and they can be classified into two classes: one approach is called all possible regressions, and the other is called sequential selection.

First I will talk about all possible regressions. Here we consider all regression equations involving zero regressors, one regressor, and so on. If k - 1 is the total number of regressors in the multiple linear regression model, then the number of models with zero regressors is C(k-1, 0) = 1, namely y = β₀ + ε. The number of models involving one regressor is C(k-1, 1), the number involving two regressors is C(k-1, 2), and so on, up to the single model involving all k - 1 regressors. In total there are 2^(k-1) regression models, and these equations are evaluated according to some suitable criteria: first R², the coefficient of multiple determination (or coefficient of determination); then the adjusted R²; then the residual mean square MS_Res; and finally Mallows' statistic, denoted C_p.

The other approach is sequential selection, which I will discuss later; there are three algorithms of this type, called forward selection, backward elimination, and stepwise regression. Today we will talk about all possible regressions and how to evaluate the 2^(k-1) regression equations based on these criteria.

We usually denote the number of regressors by k - 1, so that k denotes the number of unknown parameters in the model: with k - 1 regressors there are k - 1 regression coefficients, plus one more unknown parameter, the intercept, for a total of k unknown parameters. So if there are four regressors in the problem, there are 2^4 = 16 possible regression equations; let me list all sixteen, after a small programmatic sketch of the enumeration.
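As a minimal sketch of this enumeration, assuming Python with only the standard library (the names `regressors` and `subsets` are illustrative, not from the lecture):

```python
from itertools import combinations

# Candidate regressors; with k - 1 = 4 of them there are 2**4 = 16 subsets.
regressors = ["x1", "x2", "x3", "x4"]

subsets = []
for r in range(len(regressors) + 1):      # r = 0, 1, ..., k - 1
    # C(k - 1, r) models involve exactly r regressors.
    subsets.extend(combinations(regressors, r))

print(len(subsets))                       # 16 candidate models
for s in subsets:
    print(s if s else "(intercept-only model)")
```

Each subset corresponds to one candidate model y = β₀ + Σ βⱼxⱼ + ε over the chosen regressors.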
So here I am considering a problem with four regressor variables. The model with no regressor variables is y = β₀ + ε, and the number of such models is C(4, 0) = 1. Then there are C(4, 1) = 4 models involving one regressor variable: one involving x1, one involving x2, one involving x3, and one involving x4. Next there are C(4, 2) = 6 models involving two regressor variables: (x1, x2), (x1, x3), (x1, x4), (x2, x3), (x2, x4), and (x3, x4). Then there are C(4, 3) = 4 models involving three regressor variables, such as (x1, x2, x3), and finally the full model, which involves all four regressor variables; there is C(4, 4) = 1 such model. So when the number of regressor variables is four, we have 16 possible regression models, and we need to evaluate them with respect to some criteria.

Notice the complexity of this approach: if you have a problem with k - 1 = 10 regressors, there are 2^10 = 1024 possible regression equations, so the number of models that need to be fitted increases rapidly with the number of regressor variables. In many practical problems the number of regressors could be 20 to 30, but of course you can use a computer to fit all 2^20 models as well.

Next I will talk about the criteria for evaluating subset regression models. The first criterion is the coefficient of multiple determination. I mentioned R² before, where we called it the coefficient of determination; since we are now in a multiple linear regression model, we call it the coefficient of multiple determination, and for a subset model we denote it by R_p². Let R_p² denote the coefficient of multiple determination for a subset regression model with p - 1 regressors and the intercept β₀. The p in R_p² stands for the number of unknown parameters in the model: since there are p - 1 regressors, there are p - 1 coefficients plus the intercept β₀, for a total of p unknown parameters. Then

R_p² = SS_Reg(p) / SS_T = 1 - SS_Res(p) / SS_T,

where SS_Reg(p) and SS_Res(p) denote the regression and residual sums of squares for the subset model with p - 1 regressors. So R_p² is associated with the model containing p - 1 regressors, and it measures the proportion of variability in the response variable that is explained by that model. A small computational sketch of this quantity follows.
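As a sketch, assuming numpy is available (the helper name `r_p_squared` is mine, not the lecture's):

```python
import numpy as np

def r_p_squared(y, X_sub=None):
    """R_p^2 = 1 - SS_Res(p) / SS_T for a least-squares subset model.

    X_sub holds the chosen regressors as columns; None means the
    intercept-only model (p = 1).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    X = np.ones((n, 1))                        # intercept column
    if X_sub is not None:
        X = np.column_stack([X, X_sub])        # add the p - 1 regressors
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = resid @ resid                     # SS_Res(p)
    ss_t = np.sum((y - y.mean()) ** 2)         # SS_T (corrected total)
    return 1.0 - ss_res / ss_t
```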
One observation you can make is that R_p² increases as p increases: look at the definition, R_p² = 1 - SS_Res(p)/SS_T, and SS_Res(p) decreases as p increases, so R_p² increases as p increases. It is maximum when p = k, because p = k means p - 1 = k - 1, that is, the full model; the maximum number of regressor variables possible is k - 1, so when p = k, SS_Res has its minimum value and hence R_p² attains its maximum value.

So what we do is compute the value of R_p² for each model. First we compute R_1², which is the case p = 1, so p - 1 = 0 and the number of regressors in the model is zero: the model y = β₀ + ε. It is not difficult to prove that for this model the coefficient of multiple determination is zero: the least-squares estimate is β̂₀ = ȳ, so every fitted value equals ȳ, SS_Res = SS_T, and R_1² = 1 - SS_T/SS_T = 0. Next we compute R_2², for which p = 2, so p - 1 = 1: R_2² is associated with a model of the form y = β₀ + β₁x₁ + ε, a model with one regressor.

To illustrate all this, I will consider one example using quite famous data called the Hald cement data. Here we have one response variable y, four regressor variables x1, x2, x3, and x4, and 13 observations on the response and the regressors. With four regressor variables you may suspect that not all four are significant for explaining the variability in y; some of them might be irrelevant, and the question is whether some variables can be removed from the model without affecting its predictive power. For that we need to select the regressor variables that are best at explaining the variability in the response variable y; that is the whole purpose of this lecture.

Let me explain the all-possible-regressions approach using this example. There are four regressor variables, so we have the model with no regressors, the four possible models with one regressor, the six possible models with two regressors, the four possible models with three regressors, and the one model with all four. What we need to do is fit each of them; once you have the fitted equation for a given model, say the one involving x1, you can compute SS_Res and SS_T, and from there the coefficient of multiple determination. Let me fit at least one equation, say the model involving x1 alone.
Using the Hald cement data, I will fit a model of the form y = β₀ + β₁x₁ + ε. This is a simple linear regression model, so you know how to fit it: consider only the data for the response variable and for the first regressor x1, and find the least-squares estimates of β₀ and β₁. You can check that the fitted equation is ŷ = 81.5 + 1.87 x1. Once you have the fitted equation you can compute the residuals e_i = y_i - ŷ_i for i = 1, …, 13, and from them SS_Res = Σ e_i², which you can check equals 1265.7. The total sum of squares for this data is SS_T = 2715.8, and hence SS_Reg = 1450.1. I am just trying to give you some idea of how to apply the all-possible-regressions approach in a problem with four or five regressors.

Now we can write the ANOVA table for this model. The total degrees of freedom are 12, because there are 13 observations. For the residual: there are two unknown parameters in the model, so we get two normal equations, which means two constraints on the residuals; hence the residual degrees of freedom are 13 - 2 = 11, and the regression degrees of freedom are 1.

Source       df      SS        MS       F
Regression    1    1450.1    1450.1    12.6
Residual     11    1265.7     115.1
Total        12    2715.8

This ANOVA table is associated with the model y = β₀ + β₁x₁ + ε. Similarly you have to fit the other models involving one regressor variable: y = β₀ + β₂x₂ + ε gives another ANOVA table, and likewise y = β₀ + β₃x₃ + ε and y = β₀ + β₄x₄ + ε each give their own ANOVA table. Altogether there are 16 possible regression models, and for each of them you have to fit the model and find the associated ANOVA table; for convenience you can of course use a computer or a software package such as SAS or S-Plus to do this job. Once you have all the ANOVA tables, that is, the SS_Res and SS_T values for every model, you can compute the coefficient of multiple determination; a computational sketch of this one-regressor fit follows.
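As a sketch, assuming numpy is available; the data values are the standard Hald cement response and first regressor as reproduced in common regression texts:

```python
import numpy as np

# Response y (heat evolved) and first regressor x1 of the Hald cement data.
y = np.array([78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7,
              72.5, 93.1, 115.9, 83.8, 113.3, 109.4])
x1 = np.array([7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10], dtype=float)

n = len(y)
X = np.column_stack([np.ones(n), x1])          # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares estimates
e = y - X @ beta                               # residuals e_i
ss_res = e @ e                                 # SS_Res, approx. 1265.7
ss_t = np.sum((y - y.mean()) ** 2)             # SS_T, approx. 2715.8
ss_reg = ss_t - ss_res                         # SS_Reg, approx. 1450.1

ms_reg = ss_reg / 1                            # regression df = 1
ms_resid = ss_res / (n - 2)                    # residual df = 11, approx. 115.1
print(beta)                                    # approx. [81.48, 1.87]
print(ms_reg / ms_resid)                       # F statistic, approx. 12.6
```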
From the ANOVA table we compute the coefficient of multiple determination. Here p = 2, because there are two unknown parameters, and

R_2² = SS_Reg / SS_T = 1450.1 / 2715.8 = 53.4%.

So this model is not that good: the model involving the regressor variable x1 alone explains only 53.4% of the total variability in the response variable.

Now look at the table of all the models. We have computed the coefficient of multiple determination for this model, 53.4%. Similarly you fit the next one-regressor model, the one involving x2, find the corresponding ANOVA table, and compute its R² value, and you do the same for all the models. Among the two-regressor models you can see that one is particularly good: the one involving x1 and x2, whose coefficient of multiple determination is 97.9%, the maximum in its class. So among the regression equations involving two variables, the best is y = β₀ + β₁x₁ + β₂x₂ + ε, because almost 98% of the total variability in the response variable is explained by this model. Similarly (and this is really a hectic job) you have to estimate all the models involving three regressors and compute their R² values; finally there is the full model, which involves all four regressors and has coefficient of determination 98.2%. A sketch that carries out this whole evaluation follows.
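As a sketch tying the pieces together, again assuming numpy; the full x2, x3, x4 columns are the Hald cement values as reproduced in standard regression texts, and the expected outputs in the comments are approximate:

```python
import numpy as np
from itertools import combinations

y = np.array([78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7,
              72.5, 93.1, 115.9, 83.8, 113.3, 109.4])
cols = {
    "x1": [7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10],
    "x2": [26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68],
    "x3": [6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8],
    "x4": [60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12],
}
n = len(y)
ss_t = np.sum((y - y.mean()) ** 2)

best = {}                                       # best (R_p^2, subset) per p
for r in range(len(cols) + 1):                  # r = 0, ..., 4 regressors
    for combo in combinations(cols, r):
        X = np.column_stack([np.ones(n)] + [cols[name] for name in combo])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        r2 = 1.0 - (resid @ resid) / ss_t
        p = r + 1                               # parameters = regressors + intercept
        if r2 > best.get(p, (-1.0, ()))[0]:
            best[p] = (r2, combo)

for p in sorted(best):
    r2, combo = best[p]
    print(f"p = {p}: best subset {combo or '(none)'}, R^2 = {r2:.3f}")
# Expected (approx.): p=2 -> x4 with 0.675; p=3 -> (x1, x2) with 0.979;
# p=4 and p=5 -> about 0.982.
```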
Now what we want to do is draw a graph, with p, the number of unknown parameters in the model, along the x-axis and the maximum R_p² along the y-axis. I hope you have observed that a higher value of R_p² indicates a better fit; out of the six models involving two regressor variables the one above is the best, and out of the four models involving one regressor variable the best is the one with the maximum coefficient of determination. So in this graph, all possible models with p - 1 regressors are evaluated using the coefficient of multiple determination, and the one giving the greatest R_p² is plotted.

Take p = 1, 2, 3, 4, 5 along the x-axis, with the R_p² scale running from 0 to 100%. For p = 1 there is only one unknown parameter in the model, that is, no regressors, so the maximum R_p² is 0. For p = 2 the number of regressors in the model is 1, and out of those four models the maximum is 67.5%, so we plot 67.5. For p = 3 the number of regressors is 2, and the maximum is 97.9%, so we plot 97.9. For p = 4, that is, three regressors in the model, the maximum is 98.2%, and for p = 5, four regressors, the value is 98.2% again, so we plot 98.2 for both.

What this suggests is an algorithm of the following form: start with one regressor and add regressors to the model up to the point where an additional variable provides only a small increase in R². There is no specific threshold for what counts as a small increase; it is a stopping criterion you choose. Here the model with two variables has coefficient of determination 97.9%, meaning close to 98% of the variability is explained by those two regressor variables. If you go to the best three-variable model, y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + ε, you gain very little, and the four-variable model gains nothing further. So clearly you do not need to go to the four-variable model; you either choose the three-variable model or, according to the coefficient-of-multiple-determination criterion, the two-regressor model, which is also not bad. This is how we evaluate all possible models using one criterion, the coefficient of multiple determination. Next we move to the residual mean square criterion.

We know that SS_Res(p), the residual sum of squares for the model with p - 1 regressors (where k - 1 is the total number of regressor variables), decreases as the number of regressor variables increases. The residual mean square is

MS_Res(p) = SS_Res(p) / (n - p),

where n - p is the residual degrees of freedom for the associated model. One thing I want to mention: SS_Res decreases as p increases, but the same is not true for MS_Res; the residual mean square may increase with p. To see why, write MS_Res(p) = SS_Res(p)/(n - p) and MS_Res(p+1) = SS_Res(p+1)/(n - p - 1). We know SS_Res(p) ≥ SS_Res(p+1), because the residual sum of squares decreases as p increases, yet MS_Res(p+1) can still be larger than MS_Res(p): this happens when the reduction in SS_Res from adding a regressor to the model is not sufficient to compensate for the loss of one degree of freedom in the denominator. A small sketch of this criterion follows.
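As a sketch under the same assumptions as above (numpy; regressors supplied as columns; the helper name `ms_res` is mine):

```python
import numpy as np

def ms_res(y, X_sub=None):
    """MS_Res(p) = SS_Res(p) / (n - p) for a least-squares subset model.

    X_sub holds the chosen regressors as columns; None means the
    intercept-only model (p = 1).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    X = np.ones((n, 1))                        # intercept column
    if X_sub is not None:
        X = np.column_stack([X, X_sub])
    p = X.shape[1]                             # unknown parameters in the model
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return (resid @ resid) / (n - p)           # may rise if SS_Res barely drops
```

Comparing `ms_res` on nested subsets exhibits exactly the behaviour described next: the value rises whenever the drop in SS_Res fails to offset the lost degree of freedom.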
What I want to emphasize is that SS_Res(p+1) is of course smaller than SS_Res(p); even if you add an irrelevant regressor to the model, SS_Res will still decrease. But if the newly added regressor variable is not relevant for the response variable, the reduction in SS_Res from adding that irrelevant regressor tends not to be sufficient to compensate for the one degree of freedom lost in the denominator, and only then does MS_Res increase. We will learn how to evaluate all possible models using the MS_Res criterion when we continue this discussion in the next class. Thank you for your attention.
Info
Channel: nptelhrd
Views: 16,123
Rating: 4.7560978 out of 5
Keywords: Selecting, the, BEST, Regression, Model, (PartA)
Id: eaclT5JyNEI
Length: 55min 22sec (3322 seconds)
Published: Fri Mar 06 2015