Creating a dummy variable for regression

Captions
We're going to predict risk from the independent variables of age, blood pressure, and whether or not a person is a smoker. Smoker is a categorical variable: no, no, no, no, and so on. We want to see what happens to the risk if the person is a smoker. You can see that this column here is categorical; in other words, it takes only two values, yes and no. We want to change that to 1 or 0 depending on whether the person is a smoker, so that it can be included as a dummy variable in a multiple regression.

Now, I'm using Excel, which is perhaps not the most flexible of software, and it happens that all the independent variables have to be grouped together. I want to create a new column of zeros and ones for smoker, and I need to put it over to the right because I'm going to have to copy and paste the other data next to it. So I'm making a new column here called smoker, and then we're going to use the =IF function to change it into a dummy variable.

I know I want to code 1 for yes, so let's pick a row where there is a yes. We could make it more complicated, but this is probably the most straightforward way: =IF. We want yes to equal 1, so click on the cell, then type equals, open double quotation marks, Yes, close quotation marks, comma, 1, comma, 0, close brackets. That gives =IF(D5="Yes",1,0). What's going to happen is that if D5, the cell of interest, is yes (the person is a smoker), it comes back with a 1; if the person is a nonsmoker, it comes back with a 0. Press Enter. Yes, this works, doesn't it? So we've got a zero. Now grab that little corner at the bottom of the cell, the fill handle, and move it up. Does that work? Yes, a nonsmoker is a zero. Then we can copy it all the way down with the fill handle. So now we have a new dummy variable that records a 1 if the person is a smoker and a 0 otherwise.

Now I need to copy and paste the rest of my data, one, two, three columns, and put it in here. So now I have the risk, which is going to be my dependent variable, and three independent variables which will explain how that risk comes about.

We want to do a multiple regression. Go to Data on the ribbon at the top, then Data Analysis, then Regression. Our independent variables are age, pressure and smoker. Our Y range (Y is the dependent variable) is risk, so select down the risk column. Is that correct, E21? Yes, that's right. The X range is one whole group of variables: it's all of these, including that new dummy variable, so F1 top left, H21 bottom right. We have labels (we put in risk, age and so on), so we want to make sure that Labels is checked. I'm going to put the output range right here so that I can see it next to the data. We're not going to examine them in this video, but you should check all these other options; they will come up later and give very useful information for residual analysis. Then click OK.

Here is our output. Our adjusted R-squared is not bad; it's okay at 0.85, rounded. Let's look at the significance of the various predictors. The intercept we don't really care about. Age has a very low p-value, certainly below 0.05; pressure also, and so does the dummy for smoker. So our p-values are very good, and therefore I would say that we could use this for some type of prediction.

Let's write that prediction as our equation. Predicted risk is going to equal the intercept, which is -91.759, plus the coefficient for age, 1.0767, times age, plus the coefficient for pressure, 0.2518 (I'm rounding), times pressure, and then the dummy for smoker: plus 8.739 and so on, times that dummy. In short: predicted risk = -91.759 + 1.0767 × age + 0.2518 × pressure + 8.739 × D. What that means is that when the person is a smoker, the dummy switches on because its value is 1, and so it increases the risk by 8.739. If the person is a nonsmoker, then that D is worth zero, and so that effect is gone completely.

So we can see that the effect of being a smoker is significant, and we could also compare nonsmokers to smokers with this result. The dummy, or binary, variable is a very powerful technique, and we could also have multiple dummy variables for, perhaps, different age groups or types of occupation. Thank you.
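
As a rough illustration of the two steps described in the captions (coding yes/no as 1/0, then letting the dummy switch the smoker effect on or off in the fitted equation), here is a minimal Python sketch. The coefficients are the rounded values read aloud in the video. The function names and the example person (age 60, pressure 150) are hypothetical and do not come from the video's spreadsheet.

```python
# Coefficients as read aloud in the video (rounded), so treat them as approximate.
INTERCEPT = -91.759
B_AGE = 1.0767
B_PRESSURE = 0.2518
B_SMOKER = 8.739  # trailing digits were not read out in the video


def smoker_dummy(answer: str) -> int:
    """Mirror the spreadsheet formula =IF(cell="Yes",1,0)."""
    # Excel's = comparison on text ignores case, so compare case-insensitively.
    return 1 if answer.lower() == "yes" else 0


def predicted_risk(age: float, pressure: float, answer: str) -> float:
    """Evaluate the fitted equation for one person."""
    d = smoker_dummy(answer)  # 1 for a smoker, 0 otherwise
    return INTERCEPT + B_AGE * age + B_PRESSURE * pressure + B_SMOKER * d


# Hypothetical person: same age and pressure, differing only in smoking status.
risk_smoker = predicted_risk(60, 150, "Yes")
risk_nonsmoker = predicted_risk(60, 150, "No")

print(f"smoker:     {risk_smoker:.3f}")
print(f"nonsmoker:  {risk_nonsmoker:.3f}")
print(f"difference: {risk_smoker - risk_nonsmoker:.3f}")
```

With age and pressure held fixed, the gap between the two predictions is the smoker coefficient, which is what the video means by the dummy "switching on".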
Info
Channel: Stephen Peplow
Views: 258,133
Rating: 4.8216867 out of 5
Keywords: Creating, dummy, variable, Regression Analysis, Dummy Variable
Id: TBJsEb2UCPs
Length: 7min 23sec (443 seconds)
Published: Thu Sep 29 2011