How to do Regression Analysis for Likert Scale Data? Ordinal Logistic Regression Analysis

Captions
[Music] Hello, welcome to My Easy Statistics. In this video I am going to discuss ordinal regression. Ordinal regression is used when the dependent variable has an ordinal measurement scale and the independent variable has a nominal or continuous measurement scale.

Let us start the discussion with an example. In this example there are two variables: the first is satisfaction and the second is brand. Satisfaction is ordinal data and brand is categorical data. If you want to find out how brand is associated with satisfaction, you must conduct an ordinal regression with satisfaction as the dependent variable and brand as the independent variable. The dependent variable, satisfaction, is Likert scale data: as you can see here, 1 is highly dissatisfied, up to 5 as highly satisfied. Brand is categorical (nominal) data coded 1 to 4 for brand A, brand B, brand C and brand D, so there are four brands in total. In this research, 100 respondents who use these four brands were asked to rate their satisfaction with the brand they use, from highly dissatisfied to highly satisfied.

Now let us start the ordinal regression analysis. Click Analyze, then Regression, then Ordinal. The dependent variable is satisfaction. Brand is categorical data, so I have placed it under Factor(s). If the independent variable is scale (continuous) data, you must place it under Covariate(s) instead. Now, among these buttons, select Output. By default, Goodness of fit statistics, Summary statistics and Parameter estimates are selected. In addition to these, select Test of parallel lines, click Continue, and then click OK.

This is the output screen. The first thing you see is a warning. The warning says there are 4 cells, that is, 20% of the cells, with zero frequencies. This means that if you look at the cross table
between the dependent and independent variables, you will find the empty cells. I have taken brand as the rows and satisfaction as the columns. This is the cross table between type of brand and satisfaction level. There are 20 cells in total, that is 5 × 4: five satisfaction levels and four brands. Four of these cells, these ones here, 1, 2, 3 and 4, have zero frequencies. So this is the warning we got: 4 cells, that is 20 percent of the cells, have zero frequency. Generally the system expects that there are no zero-frequency cells, so because we have 4 of them it gives us a warning.

The next table is Case Processing Summary. In this table we can see the distribution of satisfaction levels. Most people are dissatisfied: 54% are dissatisfied and 20% are highly dissatisfied. Because the total number of respondents is 100, the counts and the percentages are equal. For brand, brands A, B, C and D each have 25 respondents, an equal proportion: for each brand we selected 25 people who have been using it for the last one year. So this is the case processing summary.

Now let us come to the important table, which is called Model Fitting Information. In this table we must first look at the significance value. The significance value here is 0.000. The significance value in this table must be less than 0.05; then we reject the null hypothesis. What is the null hypothesis? It states that there is no significant difference between the baseline model and the final model. Then what is the baseline model? The baseline model is a model without any independent variables: it has only the intercept. The final model includes all the independent variables. So the null hypothesis says that there is no significant difference
between the baseline model and the final model, and when we reject that null hypothesis it means there is a significant difference between the baseline model and the final model. The conclusion for this table is: if the significance value is less than 0.05, we consider that the final model differs from the baseline model. If the significance value is not less than 0.05, we have no evidence that the final model is better than the intercept-only model. So the first important condition to check is that the significance value for model fit is less than 0.05.

Once this condition is fulfilled, we go to the next table, which is called Goodness-of-Fit. In this table we must look at the significance value in the Pearson row. This significance value must be greater than 0.05. In this case we got 0.146, which is greater than 0.05. What does that mean? The goodness-of-fit test checks whether the observed data are consistent with the fitted model. The null hypothesis is that the observed data fit the model well. If the significance value is less than 0.05 we reject the null hypothesis, and if it is more than 0.05 we do not reject it. Here we do not reject the null hypothesis, so we treat the observed data as consistent with the fitted model. So the second condition is that the Pearson significance value in the goodness-of-fit table is greater than 0.05. If it is not greater than 0.05, the data we are using do not fit the model we are about to discuss. Remember: for Model Fitting Information the significance value must be less than 0.05, but for Goodness-of-Fit the Pearson significance value must be greater than 0.05.

The third table is Pseudo
R-Square. In this table, look at the Nagelkerke value. In this case we got 0.381; I look for a value of about 0.7. What is this pseudo R-square? It is an approximate measure of how much of the variation in the dependent variable is explained by the independent variables in the regression model. We got 0.381, which is less than 0.7. As a rule of thumb I expect around 0.7: if it is more than that, well and good; if it is less than 0.7, more independent variables should be included. In this case I have taken only brand as the independent variable for the dependent variable satisfaction. You could take not only brand but also gender or some other categories and check the model; the Nagelkerke pseudo R-square value of 0.381 may then rise above 0.7. The maximum value it can take is 1. So I have 0.381 here because I have only one independent variable, and if I add more independent variables, the pseudo R-square, that is the proportion of variance explained by the independent variables, is likely to increase. So this is the third table.

The fourth table is Parameter Estimates. In this table we can compare the satisfaction levels of the four brands. Brand 4, that is brand D, is taken as the reference brand, and the satisfaction for every other brand is compared with it. The estimates for brand A, brand B and brand C are positive in this case, which indicates that brand A, brand B and brand C have a higher satisfaction level than brand D. If any brand's estimate were negative, that brand's satisfaction would be lower than brand D's. But in our case all the other brands have positive estimates, so we can conclude that brand
D's satisfaction is lower than that of all the other brands, or equivalently that the other three brands have higher satisfaction levels than brand D. If we calculate the exponential of each estimate, we can interpret the size of the difference as an odds ratio.

In this table we have all four brands and their estimates. Brand D's estimate is zero because it is the reference. As we discussed, positive values mean more satisfaction compared with brand D. These are the exponentiated values. For brand C the exponentiated value is 42.803, which means the odds of a higher satisfaction rating for brand C are 42.803 times those for brand D. In the same way, the exponentiated value for brand B is 4.90, so the odds of higher satisfaction for brand B are about 4.9 times those for brand D, and for brand A the value is about 1.3, so the odds for brand A are about 1.3 times those for brand D. This chart is a graphical representation of the comparisons: as we discussed, compared with brand D, brand C has 42.803 times the odds of higher satisfaction, brand B has 4.905 times, and brand A has only 1.38 times.

OK, back to the table. In Parameter Estimates, please note the significance values. The difference is significant only if the significance value is less than 0.05. For brand 1 the significance value is 0.636, which is not significant. For brand 2 it is 0.011, which is significant, and for brand 3 it is 0.000, which is also significant. So, of the comparisons we made, brand 3 differs significantly from brand 4 and brand 2 differs significantly from brand 4, but compared with brand 4 the difference for brand
1 is not significant, which means brand 1 and brand 4 do not differ significantly in level of satisfaction. This is an important interpretation.

The fifth table is Test of Parallel Lines, and it relates to the proportional odds assumption. You can see the null hypothesis: it states that the location parameters are the same across response categories. What are the location parameters? They are the coefficients for brands 1, 2, 3 and 4. The response categories are the satisfaction levels. So the null hypothesis says that the effect of brand on satisfaction is uniform across the response levels, from highly dissatisfied to highly satisfied. Now let us see whether we reject the null hypothesis or not. If the significance value is less than 0.05, we reject the null hypothesis. The significance value here is 0.089, which is more than 0.05, so we do not reject it: we conclude that the location parameters are the same across response categories. So this is the interpretation of the fifth table.

Now let us summarize all five tables once. The first table is Model Fitting Information: the significance value must be less than 0.05, and in this case it is. In Goodness-of-Fit, the significance value for Pearson must be more than 0.05 for the fit to be considered good; this is also satisfied. In Pseudo R-Square, I look for a Nagelkerke value of more than 0.7; here it is inadequate. Not enough of the variance is explained because we are using only one independent variable, so if we add more independent variables the Nagelkerke R-square value may rise above 0.7. The fourth table is Parameter Estimates: in this table the level of satisfaction can be read with reference to brand 4. In this case the satisfaction of brand
1, brand 2 and brand 3 is higher than that of brand 4, but there is no significant difference between brand 1 and brand 4 because the significance value is greater than 0.05. The fifth table is Test of Parallel Lines: the significance value must be greater than 0.05 for us to conclude that the location parameters are the same across response categories.

So this is how we interpret ordinal regression. I hope this video is informative for you. Please like the video, share the video and subscribe to my channel. Thank you. [Music]
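
For reference, the model behind the Parameter Estimates and Test of Parallel Lines tables can be written out. This is a sketch in standard proportional-odds notation: the symbols $\theta_j$, $\beta_k$ and $x_k$ do not appear in the video, and the minus sign is an assumption about the parameterization, chosen so that a positive estimate points toward higher satisfaction, as the captions describe.

```latex
% Y = satisfaction (1..5); x_k = 1 if the respondent uses brand k (A, B or C), else 0.
% Brand D is the reference category, so its coefficient is fixed at 0.
\[
\ln \frac{P(Y \le j)}{P(Y > j)} = \theta_j - \sum_{k \in \{A,B,C\}} \beta_k x_k ,
\qquad j = 1, \dots, 4
\]
% Odds ratio for brand k versus brand D: odds of a rating above threshold j.
\[
\frac{P(Y > j \mid \text{brand } k) \,/\, P(Y \le j \mid \text{brand } k)}
     {P(Y > j \mid \text{brand } D) \,/\, P(Y \le j \mid \text{brand } D)}
= e^{\beta_k}
\]
```

Under this form, the test of parallel lines asks whether a single $\beta_k$ per brand is adequate for all four thresholds $j$, rather than a separate coefficient at each threshold.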
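
The step from estimates to exponentiated values can be illustrated with a short Python sketch. It uses only the odds ratios and significance values read out in the captions; the log-odds estimates themselves are not quoted in the video, so they are back-computed here, and the variable names are illustrative only.

```python
import math

# Exponentiated estimates (odds ratios versus brand D) as read out in the video.
odds_ratios = {"A": 1.38, "B": 4.905, "C": 42.803}

# Significance values quoted for brands 1-3 (A-C) in the Parameter Estimates table.
sig_values = {"A": 0.636, "B": 0.011, "C": 0.000}

ALPHA = 0.05  # threshold used throughout the video

# Back-compute the log-odds estimates: estimate = ln(odds ratio).
# These are derived values, not figures quoted in the video.
estimates = {brand: math.log(ratio) for brand, ratio in odds_ratios.items()}

for brand in odds_ratios:
    verdict = "significant" if sig_values[brand] < ALPHA else "not significant"
    print(
        f"Brand {brand} vs D: estimate={estimates[brand]:.3f}, "
        f"odds ratio={odds_ratios[brand]:.3f}, "
        f"Sig.={sig_values[brand]:.3f} ({verdict})"
    )
```

An odds ratio above 1 corresponds to a positive estimate, which is why all three brands compare favourably with brand D even though only the brand B and brand C differences pass the 0.05 threshold.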
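
The threshold checks summarized at the end of the captions can also be sketched as a small Python checklist. The numbers are the ones quoted in the video; the function name, dictionary keys and the 0.7 target constant are illustrative assumptions, with 0.7 being the presenter's rule of thumb.

```python
ALPHA = 0.05             # significance threshold used throughout the video
NAGELKERKE_TARGET = 0.7  # the presenter's rule of thumb for pseudo R-square

# Values quoted in the captions for this example.
reported = {
    "model_fit_sig": 0.000,       # Model Fitting Information, Sig.
    "pearson_sig": 0.146,         # Goodness-of-Fit, Pearson Sig.
    "nagelkerke": 0.381,          # Pseudo R-Square, Nagelkerke
    "parallel_lines_sig": 0.089,  # Test of Parallel Lines, Sig.
}


def check_ordinal_output(values, alpha=ALPHA, r2_target=NAGELKERKE_TARGET):
    """Return a pass/fail flag for each threshold check described in the video."""
    return {
        "model_fitting": values["model_fit_sig"] < alpha,
        "goodness_of_fit": values["pearson_sig"] > alpha,
        "pseudo_r_square": values["nagelkerke"] > r2_target,
        "parallel_lines": values["parallel_lines_sig"] > alpha,
    }


results = check_ordinal_output(reported)
for table, passed in results.items():
    print(f"{table}: {'satisfied' if passed else 'not satisfied'}")
```

The Parameter Estimates table is left out of this checklist because it is read coefficient by coefficient rather than against a single threshold.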
Info
Channel: My Easy Statistics
Views: 75,807
Rating: 4.9154186 out of 5
Keywords: G N Satish Kumar, SPSS, How to do Regression Analysis for Likert Scale Data? Ordinal Logistic Regression Analysis, Ordinal Logistic Regression, Ordinal Regression, Test of Parallel Lines, Model fitting Information, Goodness of Fit, Pseudo R-Square, Parameter Estimation, Regression with Likert Scale data, Likert Scale data, Ordinal data, Categorical data, Linear Regression, Probit Regression, Null Hypothesis testing, Proportional Odd, Odd ratio, Regression with Likert Scale
Id: P76sELMvo-I
Length: 19min 14sec (1154 seconds)
Published: Sat Apr 25 2020