Proportional odds (ordinal) regression for Likert scales in SPSS

Captions
So you've got your data and you'd like to do a regression, and you find that your dependent variable is on the ordinal scale. So you read around on the internet and you find that in some papers they treat it as if it were continuous and run a normal regression, so you wonder to yourself: okay, is that the thing to do? Well, there is a type of regression that takes into account that your dependent variable is ordinal, and so doing such a regression is probably better, unless there's some reason why you can treat those ordinal variables as if they were continuous. So I'm going to give you one example today. We're going to go through it first in very basic detail, and then we'll go through it again and talk about all these kinds of issues, because, you know, when you're doing some analysis you've got loads of issues. So it's important to go through those issues, otherwise it's kind of just leading you into false hope that it is easy to run.

So let's say I've done an experiment. A sample of 208 people is divided up into four groups, each of 52, and I let them taste a cheese. There are four cheeses, labeled A to D, and then I ask them to rate it on a scale of one to nine, where one means it's really, really poor, up to nine where they think it's brilliant. So if we look at, say, cheese A, seven people rated it as a 4 and 19 rated it as a 7. We see there are some ratings which have zeros in them, so no participant actually registered a one, two or three, for example, on cheese D. That's something to bear in mind later on when we're looking at the output.

Now, a question for our regression might be: okay, I want to see whether there's a difference in the ratings of the cheeses, and can we say something about how much someone prefers, say, cheese A to cheese C? We can answer such a question with a regression, and we note that the rating scale here is ordinal, so we're going to treat it as ordinal, not continuous.

Okay, this is what my dataset looks like. I've got three columns: one for cheese, and you see I've
got nine entries, because I've got nine ratings, you see, so rating is one to nine, and then the third one is the counts. To run the regression I first have to weight it by count, so I go to Data, Weight Cases, and then move the count into the frequency variable, which I've done already.

Okay, now I run the regression, and you do as follows: Analyze, Regression, Ordinal, and then you move the dependent variable, here being rating, to Dependent. I have cheese; now these four types of cheese are a categorical variable, so I move it into Factors rather than into Covariates. If I had a variable that were continuous, I would move that into the Covariates box rather than the Factors box.

Okay, next I go to Options. I'm just going to tell you this: by default SPSS sets the link function to logit, but if we click under there, there are lots of others; we'll talk about this on the second pass. Everything we're doing, we are doing with a logit link. The choice of link affects the interpretation of the parameters, so bear in mind that this is for a logit link, and a model with the logit link is called the proportional odds model, otherwise known as the ordered logit model. Next we click on Output, and we just go down to the bottom and click here, Test of parallel lines, and Continue, and then we click OK.

So we get a load of output; as you know, SPSS spews out a load of output. We go to the second table, called Model Fitting Information. The output here tells us how our model compares to the model with no factors or covariates, in other words just an intercept. This is otherwise known as the null model. What we're interested in looking at is here, Final, and this chi-square value here, which is 148, and we're interested in the p-value for that, which is basically tiny. The null hypothesis is that our model doesn't do any better than the null model; another way of saying it is that none of our factors help to predict the ratings. Here it's
clearly rejected because of the small p-value, so we reject the null; so our model is better than the null. Now, this is not such a great test statistic because, you know, anything is better than doing nothing, so that doesn't really tell us much. More important is how much better our model is than the null model.

Okay, now go to the box underneath, which gives us a goodness-of-fit test. The null hypothesis here is that our model is adequate relative to the perfect model, which perfectly predicts all the cell counts. We can report either the Pearson or the deviance; I think I'll opt for the deviance. They're very similar anyway: it's about 20, and the significance is 0.5, so it's bigger than 0.1, or bigger than 0.05, so there's no evidence to reject the null, the null being that our model is adequate. So that's a good sign.

Okay, the pseudo R-square: I don't think it's common to report this, so we'll just ignore it. Pseudo means it's pretend anyway.

Okay, next is the parameter estimates. This is the whole point of running this regression in our case, so this will address our questions. But before we do that, let's look underneath here. I don't know why they put this right at the end; I'd have preferred it put earlier, at the front. It's the test of parallel lines. Okay, that sounds a bit abstract. The null hypothesis is that this parallel lines assumption holds; it's even written down here what it is. So basically, to go further, we would like this assumption to hold. So do we reject? Look at the General row, look at this chi-square, about 20.3, p-value 0.5, so bigger than 0.05, so there's no evidence to reject it. Indeed it's bigger than 0.1, so there's no evidence to reject the null. So that's good; for our model it is satisfied.

Okay, now this is where you have to roll up your sleeves and really focus: parameter estimates. All right, there's a lot of stuff dumped over here. We have four cheeses, and being
categorical, it means we set up dummy variables. There are four cheeses; SPSS by default sets the one coded the highest as the reference group, so cheese 4, which I believe is cheese A, is the reference. The thresholds are basically my intercepts, one for each cut-point between the ratings. For interpretation, we ignore those. What's more important is these, which are like my slope parameters; this is what I'm interested in interpreting. In normal regression too, we don't place much interpretation on the intercepts. Okay, now here's the thing: we're interested in interpreting the slope parameters, because they tell us something about the ratings of these cheeses compared to cheese 4.

Now, just a warning right away: if you read around, the package SAS deals with the setup of the model slightly differently to SPSS, Stata and R. So if you're going to be reading around and you see a SAS application, know that by default SAS does it slightly differently, so the interpretation is not exactly the same as ours. It will be the same only if you multiply the SAS coefficient by minus one, or if you reorder the response. So just bear that in mind if you're going to read around; don't get confused. We're dealing with SPSS.

Okay, the way SPSS sets it up, these are our betas on the right, one for each of our cheeses. If the beta is bigger than zero, that implies, since we have categorical variables, these being types of cheese, that you're more likely to get a higher response for that cheese compared to the reference category, here being cheese A. So basically a positive beta is like a positive effect: you're more likely to rate the thing higher, because remember, low numbers are for a low rating and high numbers for a high rating, and a positive beta means a more positive rating. Likewise, a negative beta means you're more likely to
rate it lower compared to the reference category, being cheese 4, cheese A. So we'll do it in two steps. First of all, forget about the significance; they're all significant anyway. Look at cheese 1: it's positive. What does that mean? It means that a person is likely to rate cheese 1 more highly than cheese 4. Cheese 2: that's negative, so that means a person is likely to rate it lower than cheese 4. And cheese 3 is also negative, so same thing. So in that respect it's very similar to normal regression: a positive beta means a positive relationship between x and y. But that's only because of the way the model was set up; I'm not writing any equations for you, so I'm just telling you.

Okay, but we can do better than that, because that's just the sign. How about the magnitude, 1.613? How can we interpret that? Well, for the logit link, if you take the exponential of the number, that gives us the cumulative odds ratio for a one-unit increase in that X. Let me give you an example and that will make more sense. Let's take the exponential of this thing here: the odds ratio is about five. This tells us that for cheese 1, which is cheese D, the odds of giving a rating bigger than or equal to any number you like from one to nine are about five times the odds of that for cheese 4, which is cheese A. In other words, the magnitude, that five, gives you an idea of how much more highly someone regards cheese 1 compared to cheese 4.

Okay, let's do cheese 2 as well. That's got minus 1.71, so take the exponential of that; it's about 0.18. So literally this says that for cheese 2, the odds of someone rating it bigger than or equal to some value on the rating scale are 0.18 times the odds for cheese 4. Since this number is less than 1, that's equivalent to saying that the odds of a high rating for cheese 2 are 82% lower than for cheese 4, by doing 1 minus that number there. And we
can do the same for cheese 3. Now, another thing we can look at is the comparative signs, because everything all along is compared to cheese 4. If we look at the order, we can see that cheese D, cheese 1, is the best, because it's bigger than all the other numbers, followed by cheese A (when I say the best, I mean the most preferred), followed by cheese 2, and last is cheese 3, because it's got the most negative value. Okay, so that's the first pass.

So I'm going to go through it again; not in detail, but just to give you a bit more information about some of these things. First we go back to the choice of the link function. We chose the logit link, and now I can tell you the logit link is one of the more popular ones, and it's convenient from an interpretation point of view because, if we take the exponential of the slope parameter, it gives us some kind of odds ratio, it being a cumulative odds ratio. If you use any of these other links, then the interpretation of the beta will not be an odds ratio. Those of you actually learning about ordinal regression might encounter the adjacent-category logit, in which the parameters may be interpreted in terms of log odds, so taking the exponential again gives an interpretation in terms of odds.

So does it matter which of these you use? Well, from the point of view of interpretation, that's why the logit is important. Sometimes the model doesn't fit so well, and so you try these other ones, like trial and error, to find out whether one works or makes your model fit better. And SPSS has a guide on which one to use, saying that it depends on the distribution of, in our case, the ratings: is it skewed to the left, skewed to the right. But okay, we can forget about that anyway; just know that all these are the kinds of links available.

Another thing: this example has been very limited, because my independent variable is just a factor. So I haven't introduced any
covariates, and there could be some other things that impact your rating. So this example has been very limited, but you can think of it like this: you've got other covariates, which could be like age, and factors could be other things like ethnicity. We can also introduce interaction terms as well. For all these things the interpretation has to go into that, but it's too much for one video.

Next I want to go into this goodness-of-fit test. This goodness-of-fit test, as with all your other goodness-of-fit tests, relies on the expected cell counts, with most of them being bigger than five. A rule of thumb is that no more than 20% of your cells should have expected cell counts smaller than five. Here we've got 22.2%, so I should be a bit cautious about the result I've got. How can we deal with something like that? Well, just like with crosstabs, we can combine some of our categories: ratings eight and nine, I could combine those; or if the problem is with the lower ratings, I could combine the first two ratings, one and two, or one, two and three, or something like this. The other thing is it's a large-sample test, so we know that as n increases, the Pearson and the deviance will be approximately the same. So that's why we're seeing them so similar, and that's why I don't really care which one we're using.

All right, so our test passed. How about if it doesn't pass, failing here meaning we reject the null; what would we do then? Well, one reason it could fail is because of our independent variables: it might be because we haven't introduced some important factors or covariates, so we could try that. Another thing we could try is to change the link function, or do both. And if I could just rewind a bit: if we've got a high proportion of low expected cell counts, I said we could combine groups, but we could also compare nested models, so a smaller model against,
like, a bigger model, using the difference of deviances, so that's a bit more robust to low cell counts.

Okay, this parallel lines assumption: let's look at it, because it's a bit weird. What does this mean? Parallel lines makes you think of a graph, doesn't it, with lines, and yes, there is a graphical way to think about what's going on here. What the proportional odds assumption, or parallel lines assumption, says is that the odds ratio is the same no matter how you split the ratings. So let's say we're looking at the odds ratio between cheese D and cheese C. I get an odds ratio for ratings one to three compared to four to nine; then whatever that number is, it will be the same, so long as I've got these two cheeses, as for ratings one to five compared to six to nine. So however you split the ratings into two parts, comparing one part to the other, the odds ratio will be the same, so long as it's the same cheeses. In layman's terms, it's like saying it doesn't matter what value you choose to be the threshold between like and dislike. You might say four downwards is dislike and higher is like; it doesn't matter how you choose that value, it's not going to affect the value of the odds ratio. Can I also say that the parallel lines assumption needs to be tested irrespective of the link function you choose, so it's not just for the logit link.

Now, how about if this assumption doesn't hold? Don't panic. What do we do? Well, one thing to do is to relax the assumption of parallel lines, believe it or not. In other words, allow the odds ratio to change: if I split these into two, like one to three versus four to nine, that might give a different odds ratio than if I did one to five compared to six to nine. SPSS doesn't allow you to do that, but you can use a package like R, which has something where you can run the ordinal regression without the parallel lines assumption.
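
As a rough numerical illustration of the parallel lines idea (this is not something shown in the video): under the logit-link setup that SPSS uses, logit P(rating <= j) = theta_j - beta, the odds ratio between a cheese and the reference works out to exp(beta) at every split of the scale. The sketch below uses the two slopes read out in the video (1.613 for cheese 1 and -1.71 for cheese 2). The threshold values are made up for illustration and are not the SPSS estimates, and `odds_above` is just a helper name chosen here.

```python
import math

# Slopes as read out in the video (reference category: cheese 4, i.e. cheese A).
betas = {"cheese 1": 1.613, "cheese 2": -1.71}

# Made-up thresholds for the 8 splits of a 1-9 scale; NOT the SPSS estimates.
thresholds = [-5.5, -4.4, -3.3, -2.2, -1.0, 0.0, 1.5, 3.1]


def odds_above(theta, beta):
    """Odds of a rating above the split, assuming logit P(Y <= j) = theta_j - beta."""
    p_at_or_below = 1.0 / (1.0 + math.exp(-(theta - beta)))
    return (1.0 - p_at_or_below) / p_at_or_below


# Odds ratio versus the reference cheese (beta = 0) at every split.
odds_ratios = {
    name: [odds_above(t, b) / odds_above(t, 0.0) for t in thresholds]
    for name, b in betas.items()
}

for name, ratios in odds_ratios.items():
    for split, ratio in enumerate(ratios, start=1):
        print(f"{name}: ratings above {split} vs {split} or below, odds ratio {ratio:.3f}")
```

Every split gives the same ratio, exp(1.613), roughly 5, for cheese 1 and exp(-1.71), roughly 0.18, for cheese 2, whatever thresholds are plugged in. That constancy across splits is what the proportional odds assumption amounts to.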
What you can do instead is forget about the ordinal and revert back to a multinomial model, or you could run pairwise binary logistic regressions to compare, in our case, these cheeses one to the other, like D to C, D to B, D to A, and so on. Or you could even revert back to thinking, okay, let's pretend our ordinal variable is actually continuous and see where we go. One thing though: if instead you run a binary logistic and compare the odds, and you see the odds are similar to the ones you get from here, then even if the parallel lines assumption doesn't hold, it means, okay, it was violated, but it didn't affect the results too much.

Okay, so I hope I've given you a flavor of ordinal regression. If you want to find out more, you need to find out about interpreting the model where you've got covariates, that is, continuous X's, and interactions in your model. But if your application has everything that's categorical, then your position
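
The binary logistic check mentioned above can be sketched with simple arithmetic. For a single two-level factor, the odds ratio from a binary logistic regression on a dichotomized rating equals the cross-product ratio of the 2x2 table, so it can be computed directly from counts. The counts below are hypothetical (they are not the data from the video), and `split_odds` and `dichotomized_odds_ratio` are illustrative helper names.

```python
# Hypothetical rating counts (ratings 1..9) for two products, 52 raters each.
# These are NOT the counts from the video.
counts_x = [1, 2, 5, 9, 12, 10, 7, 4, 2]
counts_y = [0, 1, 2, 4, 8, 11, 12, 9, 5]


def split_odds(counts, cut):
    """Odds of a rating above `cut` versus a rating of `cut` or below."""
    at_or_below = sum(counts[:cut])
    above = sum(counts[cut:])
    return above / at_or_below


def dichotomized_odds_ratio(counts_a, counts_b, cut):
    """Odds ratio (a versus b) of a rating above `cut`, from the 2x2 table."""
    return split_odds(counts_a, cut) / split_odds(counts_b, cut)


# Skip cut = 1 and cut = 8, where a cell is empty or nearly empty.
ratios = {cut: dichotomized_odds_ratio(counts_y, counts_x, cut) for cut in range(2, 8)}

for cut, ratio in ratios.items():
    print(f"ratings above {cut} vs {cut} or below: odds ratio {ratio:.2f}")
```

If the ratios across the splits all sit close to the single cumulative odds ratio from the ordinal model, a violation of the parallel lines assumption has probably not distorted the results much.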
Info
Channel: Phil Chan
Views: 61,047
Rating: 4.7260275 out of 5
Keywords: ordinal regression spss, proportional odds model example coefficient interpretation
Id: gs5nvwrzNVw
Length: 19min 35sec (1175 seconds)
Published: Thu Nov 10 2016