Multiple Logistic Regression in SPSS

Captions
In this video we'll run a multiple logistic regression with one categorical independent variable and one continuous independent variable. We're using the Youth Cohort Study of England and Wales 2004 to 2007, Cohort 12, dataset. This dataset is available for free download from the UK Data Service website.

Let's say you want to answer the following question: what are the odds that a young person will not be enrolled in full-time education after secondary school, taking Year 11 placement satisfaction and total GCSE score into consideration? Our variable of interest is s2 q10, a measure of enrollment in full-time education in sweep 2 of the Youth Cohort Study. It's a binary categorical variable, meaning it has two categories. This means we can model it using logistic regression, which requires a binary variable as the outcome, or dependent, variable. Because respondent placement satisfaction may be informed by total GCSE score, which in turn could influence enrollment in full-time education, we can fit a multiple logistic regression model using s2 q10, or full-time educational enrollment, as the dependent variable and both s1 q4, or placement satisfaction, and s1 GCSE points new, or total GCSE score, as the independent variables. We are running a multiple logistic regression because it will allow us to control for the influence of each of the independent variables on our dependent variable.

To run the logistic regression, go to Analyze, Regression, and Binary Logistic. First we will find our dependent variable, s2 q10, and move it to the Dependent text box. Then we'll find our first independent variable, s1 q4, and move that to the Covariates text box. Now, s1 q4 is a categorical variable, so we'll need to tell SPSS that it's categorical. Go up to the Categorical button in the upper right and just move s1 q4 from the Covariates text box on the left to the Categorical Covariates text box on the right. Click Continue. And then finally we'll find s1 GCSE points new, here we are, and move that variable from the variable list to the Covariates box as well, because it is also an independent variable. However, it's not categorical, it's continuous, so that's all we need to do for s1 GCSE points new.

Because we may want to be able to generalize our results to the whole of the population from which the survey data was taken, in this case to all of England, we should also calculate confidence intervals. So we'll click Options and, under Statistics and Plots, we'll select CI for exp(B), the confidence interval for Exp(B). It's already set at 95%, so we can just click Continue and then OK to run the logistic regression.

In the Case Processing Summary output table you can see how many cases were included in the analysis, how many cases were excluded or coded as missing, and how many cases we have in total in our survey. The Case Processing Summary table tells us that we have 14,003 cases in total in this dataset.

In the dependent variable s2 q10, the category yes is coded as one and the category no is coded as two. An answer of no has therefore been arbitrarily given the larger numeric code, as two is greater than one. In logistic regression in SPSS, the variable category coded with the larger number, in this case no, becomes the event for which our regression will predict the odds. In other words, because the outcome no is coded as two in the dataset, the logistic regression will predict the odds of a respondent answering no to the question of whether or not they were enrolled in full-time education. Because we're now predicting the odds of a respondent answering no, this answer becomes the success in our model, or one; an answer of yes is a failure, or zero. This is reflected in the Dependent Variable Encoding table, which shows us that the original values of yes and no are coded as zero and one in this analysis.

The Categorical Variables Codings table shows us the frequencies of respondent satisfaction with their placement in education, work or training. In addition, it also tells us that the three categories in our categorical variable s1 q4 have been recoded in our logistic regression as dummy variables. In logistic regression, just as in linear regression, we are comparing groups to each other. In order to make a comparison, one group has to be omitted from the comparison to serve as the baseline. In our logistic regression, "no" has been selected as the baseline, or constant, dummy variable, to which we will compare the predictions for "yes" and "to some extent". Therefore "no" won't be included in our model. You can see this in the table because it isn't coded with a one in any case. This is because it is the baseline comparison category and has not been added to the model.

The output tables in Block 0: Beginning Block show the full-time education enrollment predictions before the addition of our independent variables s1 GCSE points new and s1 q4 into the model. Block 0 shows the odds of a respondent not being enrolled in full-time education without the influence of GCSE score and placement satisfaction. In the Variables in the Equation table we can see the odds of not being enrolled in full-time education. These odds are presented as the Exp(B) output in this table. Here we see that, without the addition of GCSE score or placement satisfaction, the odds that a respondent would not be enrolled in full-time education are 0.235, relative to being enrolled in full-time education.

In the Variables not in the Equation table we see the predicted significance for the variables s1 GCSE points new and s1 q4. If p is less than 0.05, this table predicts that GCSE score and placement satisfaction will be significant and that the addition of these variables will improve the fit of the model. We can see that the predicted p-values for s1 GCSE points new and s1 q4 are 0.000, meaning that they are all predicted to be significant.

The Block 1 output tables show us the relationship between our dependent variable s2 q10 and our two independent variables, s1 GCSE points new and s1 q4. The Omnibus Tests of Model Coefficients output table shows the results of a chi-square test to determine whether or not the independent variables in our model, taken together, have a statistically significant relationship with enrollment in full-time education. The chi-square here has produced a p-value of 0.000, making our model significant.

We can use the Cox and Snell R squared statistic calculated in the Model Summary output table to gauge how much of the variation in full-time enrollment is explained by this model. In this example the R squared is 0.192. This shows us that 19.2 percent of the variation in enrollment in full-time education is explained by sweep 1 placement satisfaction and total GCSE score in Year 11. We calculated this by just multiplying 0.192 by 100. This suggests that, despite our inclusion of two independent variables in this model, other factors are influencing a respondent's enrollment in full-time education.

Finally, the Variables in the Equation table shows us the significance levels for each of the variables we've included in our model. You can see that the p-values for s1 GCSE points new and both of the dummy variables in s1 q4 are 0.000, meaning that both of the variables we've included in this model have a statistically significant influence on respondent enrollment in full-time education.

Remember that in this model, in s1 q4, "no" was selected as our baseline comparison dummy variable. Because s1 q4(1) is statistically significant, using the information provided for us in the Exp(B) column we can say that a respondent who was happy with their placement in sweep 1 has odds of not being enrolled in full-time education that are 0.232 times the odds of someone who was unhappy with their placement. This means that those who are happy with their placements are more likely than those who are unhappy to be enrolled in full-time education. This is because an odds ratio of less than 1 means that the odds of an event occurring are lower in that category than the odds of the event occurring in the baseline comparison category. Because an odds ratio of 0.232 is less than 1, this means that the odds of not being enrolled in full-time education are lower for those who were happy with their placements than for those who were not.

We can see in the Exp(B) column for s1 GCSE points new that the odds ratio for GCSE score is 0.991. Because s1 GCSE points new is a continuous variable, we can say that with every one-point increase in GCSE score, the odds of not being enrolled in full-time education after secondary school are multiplied by 0.991. Because 0.991 is less than one, any odds being multiplied by 0.991 will necessarily decrease. Therefore, as GCSE score increases, the odds of not being enrolled in full-time education decrease. Young people with higher GCSE scores are more likely to be enrolled in full-time education after secondary school.
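The arithmetic behind these interpretations can be sketched in a few lines of Python. This is only an illustration, not part of the SPSS procedure shown in the video: the three numbers are the Exp(B) values quoted in the captions, while the helper names (`odds_to_probability`, `percent_change_in_odds`) and the ten-point example are assumptions introduced here.

```python
import math

# Exp(B) values quoted in the captions above.
BLOCK0_ODDS_NOT_ENROLLED = 0.235  # Block 0 odds of not being enrolled
OR_HAPPY_VS_UNHAPPY = 0.232       # odds ratio for the s1 q4(1) dummy
OR_PER_GCSE_POINT = 0.991         # odds ratio per one-point rise in GCSE score


def odds_to_probability(odds):
    """Convert odds to a probability (hypothetical helper)."""
    return odds / (1.0 + odds)


def percent_change_in_odds(odds_ratio):
    """Express an odds ratio as a percentage change in the odds (hypothetical helper)."""
    return (odds_ratio - 1.0) * 100.0


# Odds of 0.235 correspond to a probability of 0.235 / 1.235.
p_not_enrolled = odds_to_probability(BLOCK0_ODDS_NOT_ENROLLED)

# Odds ratios below 1 are reductions in the odds of not being enrolled.
change_happy = percent_change_in_odds(OR_HAPPY_VS_UNHAPPY)
change_per_point = percent_change_in_odds(OR_PER_GCSE_POINT)

# For a continuous predictor the ratio compounds: a ten-point rise
# (an arbitrary illustrative gap) multiplies the odds by 0.991 ** 10.
ten_point_multiplier = OR_PER_GCSE_POINT ** 10

# B is the natural log of Exp(B), so this recovers the coefficient.
b_per_point = math.log(OR_PER_GCSE_POINT)

print(f"Block 0 probability of not being enrolled: {p_not_enrolled:.3f}")
print(f"Happy vs. unhappy, change in odds: {change_happy:.1f}%")
print(f"Per GCSE point, change in odds: {change_per_point:.1f}%")
print(f"Multiplier for a ten-point rise: {ten_point_multiplier:.3f}")
print(f"B for GCSE score: {b_per_point:.4f}")
```

Note that the Block 0 odds and the Block 1 odds ratios come from different models, so the sketch deliberately does not multiply them together.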
Info
Channel: Practical Applications of Statistics in the Social Sciences
Views: 144,196
Rating: 4.3083572 out of 5
Id: fwxFz32L-AE
Length: 11min 19sec (679 seconds)
Published: Tue Jun 10 2014