How to detect outliers in SPSS

Captions
In this video I'm going to describe and discuss the method SPSS uses to detect outliers. I'll point out at the outset that SPSS does this in two ways. Both are based on something called the interquartile range rule, and they differ in the multiplier used in that rule. I'm also going to discuss whether each approach is really appropriate, and which one I would use if I had to choose.

What I've done is create three variables. One of them does not include an outlier, but the other two might. I'm going to use the utility in SPSS to help me identify whether there are outliers in these three variables. To do that, go to Analyze > Descriptive Statistics > Explore. I already have a variable in there because I ran this analysis before, so I'm just going to look at the first variable in the first instance. You don't have to click on any buttons: SPSS produces the relevant output by default when you press OK. So I'll do that and scroll right to the bottom of the output.

What SPSS has produced here is an analysis to detect outliers, but it hasn't actually identified any. The output is what people call a box-and-whisker plot. The median is the line in the middle of the box. The bottom of the box is the lower quartile, also called the 25th percentile, and the top of the box is the upper quartile, the 75th percentile. Then there is the largest observation in the data, a value of 12, which SPSS has not identified as an outlier (I'll show you in a minute what it does when there is one). At the lower end there is a value of 1, which is also not an outlier. These are just the extreme points, low and high.

Now let's look at the second variable, where all I've done is increase the largest value from 12 to 13. What's going to happen? Let's check it out. I put variable 2 in and click OK. The median and quartiles are the same, but this point, which sits a little above 12.5 on the plot and is really a value of 13 in the data file, has been identified by SPSS as an outlier. SPSS denotes the outlier with a circle and labels it with the number 12, which is the case number: it is case 12 that has the outlying value.

So what algorithm does SPSS use to decide when something becomes an outlier and when it doesn't? When the value was 12 it wasn't an outlier, and now that it's 13 it is. SPSS uses the interquartile range rule with a multiplier of 1.5. It calculates the difference between the upper and lower quartiles (the interquartile range), multiplies that difference by 1.5, and flags values that lie more than that distance above the upper quartile (or below the lower quartile). This value meets that criterion, so based on the 1.5 interquartile range rule, SPSS has identified it as an outlier.

What complicates matters is that research has found the 1.5 rule to be inaccurate: about 50% of the time it will flag an observation as an outlier when it actually isn't one. I don't think there is an outlier in these data myself. If you look at the histogram of this variable, I don't find it particularly disturbing with respect to outliers. There's an observation equal to 8 and then it goes up to 13, and to me that's not really an outlier. Nor is it an outlier according to Hoaglin and Iglewicz, who did research in this context and did not find 1.5 to be a valid multiplier.

What SPSS also does, though, is give you the opportunity to examine outliers based on the interquartile range rule with a multiplier of 3. Here it takes the difference between the 25th and 75th percentile values and multiplies it by 3 to identify a potential outlier. Let's look at variable 3, where the largest value has been increased to 19. Will that be identified as an outlier under the multiplier-of-3 rule? I click OK, and yes, SPSS has identified it as an outlier. To denote it as a more extreme outlier, SPSS puts a star next to the value, and again it tells you the case number to which that value corresponds. In this case, case 12 has a whopping value of 19, and it is identified as an extreme outlier. When I look at the histogram for variable 3, I do become quite convinced that it is an outlier: there is quite a big gap between the next highest value and this value of 19.

The problem is that Hoaglin and Iglewicz, based on this study and a couple of other studies, found that 3 is too extreme; the sweet spot is really 2.2. Unfortunately, you can't tell SPSS to use the interquartile range rule with a particular multiplier. It would be great if you could just type in the multiplier, but instead it uses 1.5 and 3 by default. I would definitely not use 1.5 as a rule to identify outliers. If I had to choose, I would use 3. But if you're willing to do the extra work and calculate the interquartile range rule yourself, you can, and I show how to do that in a video I uploaded some time ago on the "right" way to detect outliers.

A lot of people won't want to do that extra work, so what I recommend is that you treat only the values marked with a star as potential outliers. Do not treat the values marked with a circle as outliers: in my view they are not outliers, nor are they in the view of Hoaglin and Iglewicz. So those are the options you have in SPSS to detect outliers. If you want to do the extra work, I recommend you do so, but if not, use the multiplier of 3.
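The video suggests calculating the interquartile range rule yourself if you want a multiplier other than the built-in 1.5 and 3, such as the 2.2 value attributed to Hoaglin and Iglewicz. The Python sketch below is a minimal illustration under stated assumptions, not SPSS's implementation. It computes quartiles with statistics.quantiles(method="inclusive"), which may not match the quartile definition SPSS uses for its box plots, so values very close to a fence could be classified differently. The function names (iqr_fences, flag_outliers, spss_style_labels) are hypothetical, and the three variables are made-up data, not the data file from the video.

```python
import statistics


def iqr_fences(values, k):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR.

    Assumption: quartiles come from statistics.quantiles(method="inclusive");
    SPSS may define its quartiles differently.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k):
    """Return (case_number, value) pairs lying strictly outside the k*IQR fences.

    Case numbers are 1-based, like SPSS row numbers.
    """
    lower, upper = iqr_fences(values, k)
    return [(case, v) for case, v in enumerate(values, start=1)
            if v < lower or v > upper]


def spss_style_labels(values):
    """Approximate the box-plot markers described in the video:
    'circle' for cases beyond the 1.5*IQR fences but within the 3*IQR fences,
    'star' for cases beyond the 3*IQR fences."""
    extreme_cases = {case for case, _ in flag_outliers(values, 3.0)}
    return {case: ("star" if case in extreme_cases else "circle")
            for case, _ in flag_outliers(values, 1.5)}


# Hypothetical data: identical except for the largest value (case 12).
var1 = [1, 3, 4, 4, 5, 5, 6, 7, 8, 8, 9, 12]
var2 = [1, 3, 4, 4, 5, 5, 6, 7, 8, 8, 9, 15]
var3 = [1, 3, 4, 4, 5, 5, 6, 7, 8, 8, 9, 21]

for name, var in [("var1", var1), ("var2", var2), ("var3", var3)]:
    # Fences at 1.5, SPSS-style markers, and cases flagged with a custom k = 2.2.
    print(name, iqr_fences(var, 1.5), spss_style_labels(var), flag_outliers(var, 2.2))
```

With these made-up values the quartiles stay at 4 and 8 as the largest value changes, much as the median and quartiles stayed the same in the video; only the classification of case 12 changes (nothing, then a circle, then a star). Passing k=2.2 to flag_outliers applies a multiplier that SPSS's Explore box plot does not let you set.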
Info
Channel: how2stats
Views: 221,963
Rating: 4.8657408 out of 5
Keywords: outliers, SPSS, interquartile range rule
Id: qQqF6HZo0Gc
Length: 7min 51sec (471 seconds)
Published: Wed Apr 20 2016