How to Use SPSS: Logistic Regression

Captions

In this video I will demonstrate how to perform a logistic regression, specifically a binary logistic regression. This is a situation in which we have one or more explanatory variables, which could be categorical or quantitative, and we're using them to predict a categorical or binary outcome: in other words, an outcome in which there are only two possibilities, such as pass/fail, has a disease/does not have a disease, survives/doesn't survive. We can use it in ways similar to how we use multiple regression or linear regression, but now the outcome is categorical instead of quantitative.

The data we're going to be looking at here concerns passing or failing a certification exam. This is data on healthcare professionals, gathered on whether they passed or failed their certification exam on the first try. Then we have some predictor variables, or explanatory variables, that we're going to use to try to predict pass or fail. This data was collected during their coursework. We've got a clinical experience evaluation score (the higher the score, the better they did in their clinical experience), an oral comprehensive exam, a written comprehensive exam, their overall GPA or grade point average, and their grade point average within their major. We want to see if any of these variables predict whether the person is going to pass or fail the certification exam on the first attempt.

Now, none of these explanatory variables are categorical. We could use categorical variables to help explain pass/fail, such as gender or geographic location, but in this case we don't have any of those variables; we have quantitative variables as our predictor variables.

Okay, so in order to do the analysis, a first, preliminary step is to look at the potential correlations between our predictor variables and our outcome, to see if there is some
kind of a relationship there. That will give us some idea of what to expect as far as the strength of the prediction model we come up with. It will also give us some idea of which variables we may or may not want to include in the model. So we go to Analyze, Correlate, and we do bivariate correlations on all of these variables. We've got our test result, 0 being a failure of the test and 1 being a pass. We've got an interview score, which was the interview into their academic program. We have the clinical evaluation, the oral and written comprehensive exam scores, and then our two GPAs, and we're going to see how these relate to our outcome, test pass or fail. We make sure all those variables are entered, and then we click OK.

Okay, then we can go ahead and look at the test status and see how it correlates with these variables. We can see that there are positive correlations with all of these variables; in other words, the higher the score on the predictor variable, the more likely they were to pass the exam (remember, 1 being pass and 0 being fail). These variables here at the end, the written comprehensive exam, the two GPAs, and the oral comprehensive exam score, seem to have the best relationship with this outcome. To me that indicates these would be good potential predictor variables, as well as, potentially, the interview. The evaluation doesn't seem to be as strongly correlated, but we could still include it in our model. So that's kind of a first step, just to get an idea of which variables appear to be predicting others.

Now, we could have negative correlations in these predictor variables. We could have a situation where a high value on one variable predicts someone being more likely to fail the exam. Sometimes we want to include those, but they tend to create what's known as a suppressor effect; in other words, they kind of cancel out some of the predictive ability of maybe
a variable that has positive correlations. So those we have to be careful with, because they can really affect how strong the model might be and how good a prediction the model might make.

So let's go ahead and run the actual analysis, now that we've done these preliminaries. We go to the Analyze menu, Regression, and we choose Binary Logistic, because again we have a binary outcome. We want to make sure our outcome variable is entered in the dependent variable box, and that our potential predictor variables are entered in the covariates box. The method we're going to use is known as the Enter method, in which we enter all of the variables at once: we enter them all into the model and see what kind of model they create. We can do hierarchical regression, in which we enter variables in groups or one at a time to see how they might change the model, but that adds quite a bit more complexity to the analysis. So we're going to keep this fairly simple and use the Enter method, in which we use all the variables simultaneously at the beginning.

Now, if we happened to have a categorical predictor variable, we would need to go to this Categorical option and move that predictor variable over into the categorical covariates, so SPSS can identify the fact that one, or maybe several, of these explanatory variables are categorical. So that is an element that would need to be addressed if we had that situation.

Next we go to the Save option and ask for the predicted group membership, so we can get an idea of how well this model is predicting someone's pass or fail on the exam. I make sure that is checked and click Continue. Then we go to the Options button and ask for the Hosmer-Lemeshow goodness-of-fit test. This will give us an idea of how well our model fits the
data, in other words how well it is going to predict an outcome. We can also ask for the confidence intervals for the odds ratios that are going to be produced. We're going to get odds ratios to give us an idea of how likely someone is to pass or fail depending on their level of one of the predictor variables, and this option will also give us the 95% confidence intervals for those odds ratios. So we can report that kind of data to give an idea of the magnitude, or the clinical significance, of our prediction model. Once we've done that, we click Continue and then we click OK.

Okay, so the first thing we'll see is how many cases were included in our analysis. We've got 92 cases. That's a relatively small sample size; a lot of these models tend to work better the larger the sample size. If we have sample sizes in the hundreds, say around 400 or so, that tends to increase the likelihood that we'll have a good model. But this is the data we have, so that's what we'll work with.

The first table we want to look at, which helps us understand how good our model is, is this Block 0, or Beginning Block. This is basically like a null hypothesis: if there were no predictor variables available for us to predict whether someone would pass or fail, this is what the prediction would be. About 54 of the 92 subjects would fail the test, so without any predictor variables we're almost flipping a coin, saying there's roughly a 50/50 chance you're going to pass or fail. The overall predictive ability of this model is around 59 percent correct, so it's going to predict who passes or fails correctly around 59 percent of the time without any predictor variables in the model. So that's kind of the null hypothesis, and what's going to happen next is we're going to do statistical tests that compare what our model with the explanatory variables
is able to do compared to this null hypothesis. Presumably, if our model is a good model, it will be able to increase this percentage correct, hopefully close to 100%.

Okay, the next thing we can look at is the variables that were not in the equation, in other words all the variables that we chose as explanatory variables. They haven't been used in the model yet, and this gives us an idea of how strongly they will be able to create a significant model. All of these variables except for the eval variable have a p-value less than 0.05, so that means they're going to be significant predictors individually; each could have good predictive ability for pass or fail on its own. Now we're going to see how they work together in our prediction model.

So now we go to Block 1, which again is where we've entered all the variables simultaneously. There are a couple of things we can look at. First is the Omnibus Tests of Model Coefficients. It looks at the model and compares it back to that null hypothesis we had, and it produces a chi-square value. When the significance level for this comparison is less than 0.05, that means we have a significant model, one that predicts better than the null model. Here we have a p-value of less than 0.05, so that's good news: it means our predictor variables are going to do a good job of making a pass/fail prediction.

The next thing we can look at is the Model Summary, and here we have Nagelkerke's R-squared. Now, this is very similar to the R-squared we used in linear regression, which gives us an idea of how much of the variance in the dependent variable is explained by our predictor variables. Here about 30 percent of the variance in the outcome is being explained by our predictor variables. So that's decent. Obviously we'd like it to be more, but that is a pretty decent number: about a third of the variance is explained by our
predictors.

The next output we can look at is this Hosmer and Lemeshow test, and this gives us an idea, again, of how good our model is. In this situation we want the p-value to be greater than 0.05. If the p-value is greater than 0.05, that indicates we have a good model; in other words, the model's predictions are not significantly different from the observed outcomes. If we had a p-value less than 0.05, that would mean our model doesn't fit very well. So it's again good news: that means we have a good model here.

Now, this contingency table related to the Hosmer and Lemeshow test will give us an idea, again, of how well the model is predicting certain outcomes. The outcome we're interested in is passing: can we predict who's going to pass and who's not going to pass? What this statistic does is break the subjects up into groups and then compare what our model predicts to the actual outcomes in each group. When we look down here at the bottom, at this last step, what it's showing is that the observed number of passes in a group of subjects was 10, and our model predicted about nine of those people passing. The closer these two numbers are together, the better the model is. So that's again showing we've got a pretty good model here, because we were able to predict about nine out of every ten people who passed the exam, at least in these small clusters of subjects.

Now, the next thing we can look at is the classification table, and this tells us how good our model was at predicting the actual outcomes. You can see here our model is able to predict about 69 percent of the categories, so almost 70 percent of the outcomes were correctly predicted by our model. Again, that's much better than the null hypothesis: the null hypothesis was 58%, and our model is able to predict almost 69 percent. If we can get our model into that 65 to 70 or 75 percent range of correct predictions, that's a pretty good
model. Obviously the higher the better, but once we get above that 65 percent threshold, we think we're doing pretty well with the predictive ability.

In the next table we can look at, here's where we get the actual constants, the actual beta coefficients that we would plug into our regression equation. So again, just like when we do linear regression, we get these beta coefficients and we use them to create our equation. Those are useful for actually trying to make a prediction from known values of the predictors.

What we also get here are odds ratios related to each of these variables. The higher this odds ratio is over one, the more likely someone is to pass the exam if they have a high score. So, for example, for each additional point on the oral comprehensive score, the odds of passing are multiplied by about 1.1. The one variable that seems to really have an effect is the GPA within their major: for each one-point increase in major GPA, the odds of passing that certification exam are about 48 times higher. So it gives you an idea of the magnitude of the effect that each of these variables might have on predicting the outcome. And then we have 95% confidence intervals for each of these odds ratios.

Okay, so that concludes the video as far as how we do logistic regression. So, to summarize: what we're doing with binary logistic regression is that we've got a binary or dichotomous outcome, and we can use multiple variables to try to predict that outcome; they could be categorical or numeric. We then attempt to create a regression model, and we can look at various outputs that give us an idea of how good our model is at making the prediction. We're given some beta statistics as well as some odds ratios to give us an idea of how individual variables contribute to the prediction, and then we can use the beta constants to actually construct our regression equation so that we can make predictions.
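
As a rough illustration of that last step, here is a minimal Python sketch of how the coefficients from the output turn into a prediction and into odds ratios. The intercept, the coefficient values, the variable names, the example student, and the 0.5 cutoff are all hypothetical, chosen only for illustration; they are not the values from the SPSS output in the video. The only numbers taken from the video are the 54 failures out of 92 cases used for the Block 0 baseline.

```python
import math

# Hypothetical intercept and coefficients (B values), for illustration only.
# They are NOT the values from the SPSS output discussed in the video.
INTERCEPT = -12.0
COEFFICIENTS = {
    "oral_comp": 0.10,
    "written_comp": 0.05,
    "gpa_major": 2.0,
}


def predicted_probability(scores):
    """Logistic regression equation: P(pass) = 1 / (1 + exp(-z)),
    where z = intercept + sum of (coefficient * predictor value)."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in scores.items())
    return 1.0 / (1.0 + math.exp(-z))


def odds_ratio(coefficient):
    """exp(B): the factor by which the odds of passing are multiplied
    for a one-unit increase in that predictor."""
    return math.exp(coefficient)


def classify(scores, cutoff=0.5):
    """Predicted group membership: 1 (pass) if P(pass) >= cutoff, else 0 (fail).
    The 0.5 cutoff is an assumption of this sketch."""
    return 1 if predicted_probability(scores) >= cutoff else 0


# Block 0 baseline from the video: with no predictors, predicting the more
# common outcome (fail) for everyone is right for 54 of the 92 cases.
baseline_accuracy = 54 / 92

# Hypothetical student, made up for illustration.
student = {"oral_comp": 40.0, "written_comp": 60.0, "gpa_major": 3.5}

print(f"Baseline accuracy: {baseline_accuracy:.1%}")
print(f"P(pass) for example student: {predicted_probability(student):.3f}")
print(f"Predicted group: {classify(student)}")
for name, b in COEFFICIENTS.items():
    print(f"Odds ratio for {name}: {odds_ratio(b):.2f}")
```

Because the odds ratio is just the exponentiated coefficient, an odds ratio of about 48 like the one reported for major GPA would correspond to a coefficient of roughly ln(48), a little under 4, on that variable.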

Info

Channel: TheRMUoHP Biostatistics Resource Channel

Views: 290,063

Rating: 4.8878145 out of 5

Keywords: statistics

Id: zj15KUXtC7M

Length: 16min 4sec (964 seconds)

Published: Tue Oct 02 2012