Binary Logistic Regression on SPSS With Assumption Checks and APA-Style Write Up

Captions
Hey, welcome to the video. Today we're going to take a look at how to do a binary logistic regression analysis in SPSS. We'll also look at some of the assumptions of that test and at how to report the results.

Unlike linear regression, where we have a continuous outcome variable and we're trying to predict a value on that variable using predictor variables, with binary logistic regression we use predictor variables to try to predict an outcome on a binary categorical variable. What we're going to look at today is whether people have an anxiety diagnosis, yes or no; that's the binary categorical dependent, or outcome, variable. We're going to see whether we can predict whether people do or do not have a diagnosis based on their sex and their age. In this example, zero is going to stand for no diagnosis of anxiety and one for a diagnosis of anxiety; zero is going to represent male and one female; and the age values are just the individuals' ages in years.

If we go over to SPSS, we can start setting this up. I'm in Variable View, in the Name column. I'm going to enter anxiety into the top cell, then sex here, then age here, so that's just naming all the variables. Then we're going to use the Values column to specify what the different levels of these categorical variables are. If we go to Values next to anxiety and click on this little bit here, I'm going to enter a zero. Usually when you're doing a binary logistic regression analysis, you use a zero to represent the absence of something and a one to represent the presence of something. Obviously this doesn't work in all cases; for example, sex isn't really about the presence or absence of anything. Okay, so I've entered a zero here and I'm going to enter no anxiety diagnosis
and then I'll click Add. Then I'll enter one and anxiety diagnosis, click Add, and then OK. I'll do the same thing for sex: zero equals male, Add, and one equals female, Add, then OK. Lastly, I'm going to use the Measure column to indicate that these two, anxiety and sex, are nominal, or categorical, variables, and that age is a scale, or continuous, variable.

Once I've done that, I'm going to go to Data View. I can see that anxiety, sex and age have appeared at the top of these three columns, and since I have these columns in the same order in my Excel file, it's simply a matter of copying that data, clicking this top-left cell and pasting it in. You can see that the numbers I entered previously in the Values column have been replaced in Data View by the names I gave to those numbers. If you don't see the names here, just go up to View, then down to Value Labels, and make sure that this is ticked; your numbers should then be replaced with the names you gave them.

A couple of the assumptions of binary logistic regression analysis are that there is no multicollinearity, which just means that your predictor variables are not too strongly related to each other, and that there are no outliers. We can check for outliers as part of the process of running the analysis, but multicollinearity we need to check beforehand. I'm actually going to use the linear regression procedure to check that. I go to Analyze, then down to Regression and across to Linear, and I move age and sex into the Independent(s) box and anxiety into the Dependent box. Then I click Statistics, tick Collinearity diagnostics, then Continue, then OK.

Okay, so all we're interested in in this output is a
single value, because most of it isn't really relevant. All we're going to look at is this Coefficients table, and in particular the Tolerance column. What I want to see is that the values here are above 0.1. Since this value is 0.97, clearly way above 0.1, that suggests that the assumption that there is no multicollinearity has been met.

Now we can go on to running the binary logistic regression procedure. We go to Analyze again, down to Regression, but this time to Binary Logistic. I'm going to transfer my outcome variable to the Dependent box and the predictor variables to the Covariates box. In this example I'm just using the Enter method, which basically means that all of the predictor variables are looked at at the same time.

I'm also going to click this Categorical button. If you have any categorical variables, you can transfer them over here. Sex is my only categorical variable, so I'll select it, click First and then Change. This just means that SPSS will treat the lowest value as the reference group. In this case zero is allocated to males and one to females, so it's going to use males, who have the lower number, as the reference group. Here it's probably not particularly important to do this step, because it's just as intuitive to compare males to females as it is to compare females to males. But if you had a categorical variable where zero represented the absence of something and one represented its presence, the results would be more intuitive if you used the group that represented the absence as the reference group, so this step would be more important in that case. So all I did was transfer sex over to here, I
clicked First, I clicked Change, and First comes up here, so I'm going to go to Continue. Next I'll click on Options and tick these three here: Classification plots, Hosmer-Lemeshow goodness-of-fit and Casewise listing of residuals. There's also the confidence intervals option, so I'll tick that one, then Continue, and then we should be okay to keep going, so we go to OK.

I'm just going to ignore these first couple of things and go down to where it says Block 0: Beginning Block. This is how well the model predicts the outcome variable when it doesn't have any of the predictor variables in it. In this case we had 16 people with no anxiety diagnosis and 14 with an anxiety diagnosis, so there are more people in the former group than in the latter, and SPSS has just predicted that everybody belongs to the former group. It got 100% correct for people in the no-anxiety group and 0% correct for the people in the anxiety group. So when the model was basically just guessing, it got just 53.3% correct.

We can now go down to the Block 1 bit, which is where the model has incorporated the predictor variables. I suppose we could look at the Classification Table first, since we just looked at it in the context of the model without the predictor variables. If we compare it to the model with the predictor variables, we can see that the predictions have changed. We have 16 people with no anxiety diagnosis; the model has predicted that 12 of them have no anxiety diagnosis, which is correct, but it has also predicted that four of them have an anxiety diagnosis. So it got 75% correct there. If we have a look here, we have five plus nine, so 14 people with an anxiety diagnosis. The model has predicted that
five of those people have no anxiety diagnosis and has correctly predicted that nine do have an anxiety diagnosis, so it got 64.3% correct. That gives an overall percentage of correct classifications of 70%. Obviously the 70% is better than the 53.3% we got when the model didn't contain the predictor variables, but we don't yet know whether that is a significant improvement.

To figure that out, we go to Block 1: Method = Enter and look at the Omnibus Tests of Model Coefficients table. We can see that we have a significant p-value: it's 0.003, which is below 0.05. So we can conclude that the model with the predictor variables was significantly better than the model without them.

If we also take a look at the Model Summary table, this gives us an idea of the amount of variation in the outcome variable that the model predicts. We have 0.315 here and 0.421 here, so we basically have two methods of estimating the amount of variance in the dependent, or outcome, variable predicted by the model. To convert these to percentages, we just move the decimal point over two places, so this is 31.5% of the variance and this is 42.1% of the variance.

This one here is a goodness-of-fit test. It's just asking whether the model provides a good fit to the data, and what we want to see in this case is that the p-value is non-significant. As this number here, 0.506, is above 0.05, this indicates a non-significant result, and in this case that indicates that the model is a good fit to the data.

I'm going to skip this one, since we've already had a look at the Classification Table, and go to Variables in the Equation. We know that the model overall is able to significantly predict the outcome variable, anxiety diagnosis in this case, and we can use this Variables in the Equation table to look at the individual
predictors. If we take a look first at sex, we can see that sex is positively related to anxiety diagnosis. We have to remember that we coded male as zero and female as one, and no anxiety diagnosis as zero and anxiety diagnosis as one. So a positive value here means that there's an increased likelihood of having an anxiety diagnosis if the individual is female. But if we scroll over to the Sig. column, we can see that this value is above 0.05, so sex didn't make a significant contribution to the model.

If we move on to age, we can see that age has a negative B value, which means that increases in age are associated with decreases in the likelihood of an anxiety diagnosis. If we go over to the Sig. value, in this case it's below 0.05, which indicates that age does make a significant contribution to the model. In line with this negative B value, the odds ratio in the Exp(B) column is below one, and this indicates that for every one-year increase in age, the odds of a person having an anxiety diagnosis are about 0.9 times what they were, so a diagnosis becomes less likely. If we look at the sex row again, we can see a positive B value here and a value above one in this column here.

We can also take a quick look at these columns here, the lower and upper limits of the confidence intervals. For sex, this range contains one, which is why we have a non-significant value. Basically, we can't rule out the possibility that the odds of having an anxiety diagnosis don't change with sex, so it's possible that there's just no relationship between sex and anxiety diagnosis. Conversely, if we look at the age row, the range between these two values doesn't contain one, as they're both below one, and that's consistent with the significant p-value.

Okay, so we keep going down. I'm going to skip over this, and in this next one
we just want to take a look at the Casewise List table. This lists cases that don't fit the model very well. In this case, case 12 doesn't fit the model very well, so if we look at row 12 here, it's saying that this person doesn't really fit with what the model is predicting. Generally speaking, if this ZResid value is above 2.5, you might consider removing that individual from the analysis. In the example that I present soon, I'm just going to say that this person was left in. In the report, you could either say that you removed outliers or explain that the analysis is limited because it contains outliers.

Similarly with the multicollinearity assumption: if you did have a problem here, you could consider removing one or more of your predictor variables, because a value below 0.1 here would suggest that some of your predictor variables are kind of measuring the same thing, so it's not really worth having them as separate predictors.

Okay, so that's most of the results. Let's take a look at how to report these. I'm going to start off by saying that we did a binary logistic regression to examine whether age and sex were associated with the likelihood of having an anxiety diagnosis. You could use a sentence like that, but obviously you would need to change the names of the variables to match what you're interested in. The next part refers to the multicollinearity analysis that we did first: a preliminary analysis suggested that the assumption of multicollinearity was met, and I've just reported that tolerance value, 0.97. Just to recap, that comes from this Coefficients table, the Tolerance column there. Then I've referred to the outliers: an inspection of standardized residual values revealed that there was only one outlier. Then I
just reported the value of that standardized residual and said that the outlier was kept in the data set, but as I said before, you might consider actually removing outliers from your data set. The value I reported is this value here from the Casewise List table. So those are the assumptions reported.

Next, I'm going to say that the model was statistically significant, suggesting that it could distinguish between those with and without an anxiety diagnosis. We've got all these stats here, so let's take a look at where they come from. We've got chi-squared equals 11.36, and that comes from the Block 1 section of the output: the chi-squared value is here in the Omnibus Tests of Model Coefficients table. I've just rounded it to two decimal places. You can see the degrees of freedom here, and that's what has been reported in brackets. I've also got N equals 30, which is just the number of cases, or participants. If you're not sure how many cases or participants you have, you can go up to the Case Processing Summary table, which gives you your N. So we've reported the chi-squared, the degrees of freedom and N, and then p equals .003, which again is from the Omnibus Tests of Model Coefficients table in the Block 1 section; it's just the same value here, 0.003.

Next, I'm explaining how much of the variation in the dependent variable the model was able to explain. I've said that the model explained between 31.5%, and I've referred to the Cox & Snell R Square value, which comes from the Model Summary table, still in the Block 1 section. The Cox & Snell R Square is .315, so I've just moved the decimal point over and it's 31.5, and that's what I've reported here. It's the same thing for the Nagelkerke R Square, 42.1%: that comes from the same table, just this column here. Just move that decimal point over two
places, so it's 42.1. I've said that the model correctly classified 70% of cases, and that comes from the Classification Table in the Block 1 section, where we can see that the overall percentage of correctly classified cases is 70%. So that's what has been reported.

Then: as shown in Table 1, age, but not sex, significantly contributed to the model. I haven't reported any stats in that sentence; instead I've just referred the reader to Table 1, because there are lots of stats associated with this sentence, so it's probably easier to report those results in the table.

Let's take a quick look at the table I've produced. The title is Logistic regression predicting the likelihood of an anxiety diagnosis. You could use something like this, but obviously you'd replace anxiety diagnosis with the appropriate name for your dependent, or outcome, variable. There's obviously lots of information here, but it all comes from the Variables in the Equation table, and I've basically just replicated that table, so I won't go through where all these numbers come from. I've got all of the same columns in the same order and the same rows in the same order; the only difference is that I've rounded most of the statistics to two decimal places, with the exception of p-values, which is the APA style.

Just to recap, age, but not sex, significantly contributes to the model, and we're saying that on the basis of these Sig. values. Sex has a Sig. value of 0.7072, which is above 0.05, so there's no significant contribution there, whereas age has a p-value below 0.05, so age does significantly contribute to the model. Lastly, I've said that the age odds ratio of 0.90 suggests that for every one-year increase in age, the odds of having an anxiety diagnosis were multiplied by 0.90, so participants became less likely to have a diagnosis. We're using less likely because this value is below 1.
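The odds-ratio reading above can be checked with a little arithmetic. The following is a minimal Python sketch, not part of the SPSS output: the function names are made up for illustration, and the only value taken from the video is the age odds ratio of 0.90. It relies on the standard facts that Exp(B) is the exponential of B and that an odds ratio multiplies the odds once for each one-unit increase in the predictor.

```python
import math


def b_from_odds_ratio(odds_ratio: float) -> float:
    """Return the logistic coefficient B, using Exp(B) = e**B."""
    return math.log(odds_ratio)


def percent_change_in_odds(odds_ratio: float, units: float = 1.0) -> float:
    """Percentage change in the odds after `units` one-unit increases in the predictor."""
    return (odds_ratio ** units - 1.0) * 100.0


age_or = 0.90  # age odds ratio reported in the video

# B is negative whenever the odds ratio is below 1.
print(round(b_from_odds_ratio(age_or), 3))
# Change in the odds for one extra year of age.
print(round(percent_change_in_odds(age_or), 1))
# Change in the odds over ten years: the ratio compounds, it does not add.
print(round(percent_change_in_odds(age_or, 10), 1))
```

On this reading, an odds ratio of 0.90 corresponds to roughly 10% lower odds of a diagnosis for each additional year of age, and the effect compounds over several years rather than adding up.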
That 0.90 also just comes from this table here: 0.90 is the odds ratio, and we have the same thing here. So that's about all there is to the logistic regression analysis. If you have any questions about anything, just let me know in a comment and I'll get back to you. Thanks very much for watching.
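The classification percentages quoted in the video can be reproduced from the four cell counts of each classification table. This is a small Python sketch for checking that arithmetic; the function and key names are illustrative and are not SPSS terminology, and the counts are the ones read out in the captions.

```python
def classification_summary(true_neg: int, false_pos: int,
                           false_neg: int, true_pos: int) -> dict:
    """Percent correct per observed group and overall for a 2x2 classification table."""
    n_negative = true_neg + false_pos   # observed: no anxiety diagnosis
    n_positive = false_neg + true_pos   # observed: anxiety diagnosis
    total = n_negative + n_positive
    return {
        "no_diagnosis_pct": 100.0 * true_neg / n_negative,
        "diagnosis_pct": 100.0 * true_pos / n_positive,
        "overall_pct": 100.0 * (true_neg + true_pos) / total,
    }


# Block 0: every case is predicted to be in the larger group (no diagnosis).
block0 = classification_summary(true_neg=16, false_pos=0, false_neg=14, true_pos=0)

# Block 1: counts read out in the video (12 and 4, then 5 and 9).
block1 = classification_summary(true_neg=12, false_pos=4, false_neg=5, true_pos=9)

for name, result in (("Block 0", block0), ("Block 1", block1)):
    print(name, {key: round(value, 1) for key, value in result.items()})
```

The same function works for any two-group classification table, which makes it a quick way to confirm the overall percentage before quoting it in a write-up.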
Info
Channel: David Robinson, PhD
Views: 412
Rating: 5 out of 5
Id: sQdbwrTS2Lc
Length: 22min 2sec (1322 seconds)
Published: Fri Aug 06 2021