Understanding and Identifying Multicollinearity in Regression using SPSS

Captions
Hello, this is Dr. Grande. Welcome to my video on understanding and identifying multicollinearity using SPSS. Multicollinearity is when two or more predictor variables are highly correlated, and oftentimes in counseling research this construct is of interest to us when we are conducting a regression.

Taking a look at the fictitious data I have loaded in the Data Editor in SPSS, you can see I have an ID variable, five predictor variables (depression, anxiety, substance use, panic, and hopelessness), and one dependent variable, functioning. Before we get to the stage of running any analyses, we can take a look at these variables and see potential problems. For example, between anxiety and panic we would expect that there could be a high positive correlation. These two constructs are different: anxiety is distinct from panic, but oftentimes we see overlap. If a participant suffers from anxiety, there's a higher probability that they would suffer from panic, and the opposite is true as well. The same thing could occur with hopelessness. We often think of hopelessness as a symptom of depression, so again, if a participant has a high degree of hopelessness on a measurement, we would think it likely that there'd be a high score on a depression inventory. We'll test for multicollinearity between all the variables, but of particular interest would be anxiety and panic, and depression and hopelessness.

There are two different ways I'm going to look at trying to determine if we have multicollinearity. For the first, I'm going to go to Analyze, then Correlate, and Bivariate, and I'm going to load in just the predictor variables: depression, anxiety, substance use, panic, and hopelessness. I hit OK and run this analysis, and this provides the correlation between all the different pairs of predictor variables. We can see that between anxiety and panic we have .637, and between hopelessness and depression we have .938.

Now, as is the case with many statistics, there's no definitive agreed-upon value when interpreting correlations in terms of what counts as multicollinearity and what does not. Popular cutoff scores would be 0.7, 0.8, and 0.9. Let's say in this case we use .7. Depression and hopelessness are greater than .9; they correlate at .938. The next strongest correlation would be anxiety and panic at .637. So looking at this first table, the most concern we would have would be between depression and hopelessness, and we'd want to keep an eye on the relationship between anxiety and panic.

For the second way, I'm going back to Analyze, Regression, and Linear. Of interest here is not all the output from linear regression, but rather just the multicollinearity. So I'm going to load functioning as the dependent variable and then all the predictor variables into the Independent(s) list box. If I were going to complete a linear regression there are actually a lot of things I would do, but here I'm just going to request the collinearity diagnostics. I want them to be in the output, and we can bring those up from Statistics. I'm going to uncheck Estimates, uncheck Model fit, and just check Collinearity diagnostics. So this is a very narrow view of what's going on with these variables, focused just on potential multicollinearity issues.

I click OK, and we can see we have Collinearity Diagnostics down here in the bottom table, but of the most interest is the Coefficients table. You can see the values are referred to here as collinearity statistics, and there are two of them: one is named tolerance and one is variance inflation factor, or VIF. The variance inflation factor is the reciprocal of tolerance. If I take this table, copy it, move over to Excel, and paste it here, we can see we have the tolerance values and the variance inflation factor values. So I'm going to copy the tolerance values, move over to this table I've built to the left, and under Tolerance I'm going to paste the numbers and formatting. You can see it automatically populated the correct variance inflation factor that corresponds to what we have here in SPSS. If you look at the formula for this, you can see I've included the IFERROR function so that I won't have an error appear when the tolerance is empty, but the point of interest here is that it's 1 divided by the tolerance. So it's the reciprocal.

One of the reasons I'm showing you this is that it's important to understand when looking at the cutoffs for tolerance and variance inflation factor. Just as there's no agreed cutoff for correlation values, there is no one agreed set of cutoff values for tolerance and variance inflation factor. But take some of the more popular ones. For tolerance, a popular cutoff is that the value needs to be greater than 0.1, so anything less than 0.1 would be indicative of multicollinearity. If I enter that in, we can see the corresponding variance inflation factor is equal to 10. In fact, that is one of the guidelines: a tolerance less than 0.1 is worrisome, or a variance inflation factor greater than 10 could be problematic. Another common tolerance cutoff is 0.2, and you can see that changes the variance inflation factor to 5. These are the two most common guidelines: 0.1 and 10, and 0.2 and 5. There are others as well; there are variance inflation factor cutoffs of 4 and 3. But for now I'm just going to take a look at these two guidelines, 0.1 and 10, and 0.2 and 5.

Moving back to the SPSS output, if we evaluate these predictor variables based on what we know now about the potential cutoff values, the anxiety and panic predictor variables do not appear to be problematic. Remember, their correlation was .637. Looking at the tolerance and variance inflation factor, both variance inflation factors here are below 3, and this one for panic is below 2. So whether we're using the cutoff of 10 or 5 for variance inflation factor, we would be good here with these variables. We would continue on and not worry about multicollinearity.

The depression and hopelessness variables are a little bit different. Technically, under the 0.1 and 10 rule, they meet that rule: we have .110 and .112 for the tolerances, and 9 and 8.8 for the variance inflation factors. However, if we take that and add in what we know about the correlation, a strong positive correlation of .938 between the depression and hopelessness variables, then with all that taken together we would probably say that we do have multicollinearity. Under the 0.2 and 5 rule, 0.2 for the tolerance and 5 for the variance inflation factor, these two variables would be problematic. And although we're using .7 as the correlation cutoff, even if we used .9, the depression and hopelessness correlation is too high to meet that rule. So in this case we would probably decide to drop one of these predictor variables, either depression or hopelessness, depending on the structure of our study.

Let's say we decided to leave out the hopelessness variable. We go back to the Data Editor (of course, we can run this analysis from the output viewer as well), go back to Regression, Linear, and remove hopelessness as a variable. All the other settings are saved, so just the collinearity diagnostics are checked. You can see now, with hopelessness not entered as a predictor in the model, that all the variance inflation factors are within acceptable levels, and of course the tolerance levels are as well.

I hope you found this video on understanding and identifying multicollinearity to be useful. As always, if you have any questions or concerns, feel free to contact me. I'll be happy to assist you.
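
For readers who want to reproduce the two checks outside SPSS, here is a minimal sketch in Python using NumPy. It is not the procedure shown in the video. The data are simulated for illustration and are not the video's dataset, and the helper names (`make_fictitious_data`, `collinearity_statistics`, `flag`) are hypothetical. It relies on a standard identity: each predictor's variance inflation factor equals the corresponding diagonal element of the inverse of the predictors' correlation matrix, and tolerance is its reciprocal.

```python
import numpy as np

# Predictor names follow the video; the data below are simulated (an assumption).
PREDICTORS = ["depression", "anxiety", "substance_use", "panic", "hopelessness"]


def make_fictitious_data(n=200, seed=0):
    """Simulate predictors where panic overlaps with anxiety and
    hopelessness overlaps strongly with depression."""
    rng = np.random.default_rng(seed)
    depression = rng.normal(size=n)
    anxiety = rng.normal(size=n)
    substance_use = rng.normal(size=n)
    panic = 0.6 * anxiety + 0.8 * rng.normal(size=n)
    hopelessness = 0.95 * depression + 0.3 * rng.normal(size=n)
    return np.column_stack(
        [depression, anxiety, substance_use, panic, hopelessness]
    )


def collinearity_statistics(X):
    """Return the correlation matrix, tolerance, and VIF for each column of X."""
    corr = np.corrcoef(X, rowvar=False)   # pairwise Pearson correlations
    vif = np.diag(np.linalg.inv(corr))    # VIF_j = 1 / (1 - R_j^2)
    tolerance = 1.0 / vif                 # tolerance is the reciprocal of VIF
    return corr, tolerance, vif


def flag(vif, cutoff):
    """Names of predictors whose VIF exceeds the chosen cutoff."""
    return [name for name, v in zip(PREDICTORS, vif) if v > cutoff]


X = make_fictitious_data()
corr, tolerance, vif = collinearity_statistics(X)

for name, t, v in zip(PREDICTORS, tolerance, vif):
    print(f"{name:14s} tolerance={t:.3f}  VIF={v:.2f}")

# The two guidelines discussed above: tolerance 0.2 / VIF 5 and tolerance 0.1 / VIF 10.
print("VIF > 5: ", flag(vif, 5.0))
print("VIF > 10:", flag(vif, 10.0))
```

Rerunning `collinearity_statistics` on the data with the hopelessness column removed mirrors the last step of the walkthrough, where one of the two overlapping predictors is dropped and the remaining values are rechecked.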
Info
Channel: Dr. Todd Grande
Views: 94,004
Rating: 4.9243088 out of 5
Keywords: SPSS, multicollinearity, variance, tolerance, variance inflation factor, VIF, collinearity diagnostics, regression, predictors, predictor variables, independent variables, dependent variable, linear regression, correlation matrix, excel, correlation, counseling, Grande, Regression Analysis, Statistics (Field Of Study), Microsoft Excel (Software)
Id: pZsOn6wnGSo
Length: 11min 26sec (686 seconds)
Published: Tue Dec 01 2015