Week 6: Local Spatial Autocorrelation

Captions
Okay, so last week we covered global spatial autocorrelation, four different views of it: the two that use the weight matrix, Moran's I and Geary's C, and then two that are based on distance as a metric, the nonparametric correlogram and the variogram. Today we focus on the local version of this. As I mentioned, the global statistic is good for assessing patterns of non-randomness, but most of the patterns that we deal with are not random, so it's kind of automatic that it's significant. But where are the cores of these patterns? What is interesting? What are the hot spots, what are the cold spots? That's what we start to focus on today. The next three lectures, really this one and the next two, deal with different concepts of clusters. Today we focus on clusters as special locations that stand out in some way; in the next two weeks we'll focus on clusters as groupings of locations that are similar in some way. That's basically the core of the material you will need for your assignment. In the last week I'll talk about odds and ends, but that doesn't have a lab, so it's not part of the assignments; everything you need for the assignment will be covered before Thanksgiving.

Okay, the principle of a local indicator of spatial association. As I mentioned several times, the global measure of spatial autocorrelation does not tell you where the clusters are. It tells you that the pattern in the data as a whole is not something you would expect under spatial randomness, but it doesn't tell you where, or why. As opposed to this, cluster detection really wants to know: where's the action, where are the hot spots, where are the cold spots? In an exploratory sense that's really what it's about, because the next question, once you've identified the patterns, the clusters, the hot spots, is of course: why? Why are these places different from other places in some respect? So today we'll focus on two important aspects of this: what are these locations and where are they, and how can we assess significance? I will spend quite a bit of time on that second part because it's actually very complex, and it's not unique to spatial analysis; it's an issue that has come up more and more as we move into computer age statistical inference. There's a book with that title by Efron and Hastie which, if you have a little bit of statistics background, is a fascinating discussion of the changing perspectives as computers have allowed us to analyze bigger datasets, and also to use computing to construct statistical tests that we couldn't do otherwise. And I have to say, there are many cluster detection methods; it's on my to-do list, one of these years, to have a course just on spatial clusters, but I don't have time yet.

So here's this concept: local indicators of spatial association, or LISA. LISA turned 21 last year, so I have a paper, "LISA 21". It's a local spatial statistic: there's one for each location. And there's another twist to it, which is a bit technical and not something you use that much, but it's useful for diagnostic purposes, just to see how your global statistic comes about: the connection between the local and the global. There's one LISA for each location, so if you add them up, there is a connection to the global statistic. When we talk about the local Moran statistic in a few minutes, we'll see that the global Moran statistic is actually the average of the local ones.
So there's a connection between the global and the local, which then allows us to think of things like leverage and influential points, the kinds of things I mentioned in the context of the Moran scatter plot, but we can do this in much more detail. There are two aspects to this, very importantly: how do we assess the significance of the statistic at each location, and how do we use this information on significance to identify two types of patterns? We've already seen these in our classification of spatial autocorrelation through the Moran scatter plot: the notion of spatial clusters, either high-high or low-low, and the notion of spatial outliers, locations with observations that are very different from their neighbors. We'll try to identify both of these. Then, in terms of the interpretation of significance, we have to worry about whether there is global spatial autocorrelation or not, and since there is a connection between the local statistics and the global statistic, in most situations in practice you have both significant global autocorrelation and significant local autocorrelation. And then the question, a technical question but one that comes up all the time, is: to what extent is your assessment of significance of the local statistic influenced by the presence of significant global autocorrelation? These are technical issues, but as we'll see, in practice I have a very pragmatic approach to dealing with this, which is basically learning from the data by playing around with it, and we'll see how to do that in the lab; you really do get a sense of which things really matter and which are noise.

Technically, mathematically, it's very simple, at least I think it's very simple. We saw that the global statistics we've seen so far, the Moran and the Geary, are double sums: a sum over all the observations i of a sum over j, where j typically runs over the other observations, with a spatial weight. So many statistics can be written globally as a sum of a component for each observation separately. It helps if the statistic can be separated by observation; if it connects different observations in a way that cannot be separated, it doesn't work. So basically there's a separate entity for each i, and that separate entity is the local statistic; we'll see in a few minutes how that works out mathematically, and it's actually much simpler than it sounds. I'm going to start with the local Moran and the local Geary statistics, which are local versions of the ones we saw last week; both are technically LISAs in that they connect to the global statistic. Then I'll cover a slightly different flavor of local statistic, developed by Getis and Ord, which is very similar in nature and yet a little different; it's not technically a LISA because there's no connection with a global statistic. And then I'll talk about some interpretation issues and odds and ends and extensions, depending on how much time I have.

We know from last week that Moran's I is basically a double sum over i and j of w_ij z_i z_j, divided by the variance. The variance doesn't change by observation, so we don't have to worry about it. In essence we have a product of z_i, a constant we don't have to worry about, and the weighted sum of the values at the neighboring observations, the spatial lag. So this is the cross product between the value at an observation and its spatial lag, as compared to the
cross product between the value at a location and the value at another location separated by a given distance, the concept we used in constructing the correlogram. Here it's a very similar idea, but it doesn't look at all the pairs of other locations; it only looks at the neighbors, and it summarizes the information in the neighbors by the spatial lag. The spatial lag is the sum of w_ij times z_j over all j, but we know that w_ij is 0 for most of these j, so only the neighbors count; it's the average of the neighbors. And to keep it simple, there are some technical reasons for this, we use row-standardized weights, so the divisors, the S_0, the sum of all the weights, cancel out. So that's all it is: some constant that doesn't change by location, which for all practical purposes we can ignore, and the cross product between a value and its spatial lag. Remember, in the Moran scatter plot we plot the value z_i on the x axis and the spatial lag on the y axis, so the points in the Moran scatter plot have a direct connection to these cross products, because the regression fit is basically a cross-product statistic; there's a close connection. Then, again from a technical point of view, the sum of the local Moran statistics is n times the global, n being the number of observations, so you can flip this around: the global is the average of the local. What do we use this for? As I said, not very often, but you might be interested in finding out whether there are some influential locations that drive the global statistic. Actually, we'll be more interested in these influential locations in and of themselves, identifying them potentially as cluster cores or spatial outliers, as the case may be.

Inference is a pain. You can do it analytically, but it's an approximation, and it's a terrible approximation. Without getting too technical, in a nutshell: the approximations are based on large sample ideas, the idea that you can work out what happens to a statistic as the dataset conceptually grows and grows. One of the problems with the local statistic is that, because it is a local estimate and the number of neighbors is limited and fairly small, it doesn't actually grow; even though mathematically you can go through all the contortions and derive whatever you want, conceptually this approximation isn't really happening. It would only happen if your neighbors were, say, within a given radius and the radius got populated ever more densely, which is a different view of space than the one we're using here. So for all practical purposes we have to use a computational approach, and conditional permutation is essentially the same as permutation, but repeated for each individual location. Remember, the statistic is the cross product of the value z at location i and its spatial lag, and z at i doesn't change. So think of a pot of values: we take out z_i, and then we reshuffle the other ones. In fact, we don't really reshuffle them: if there are K neighbors, we randomly pick K values out of the pot, without replacement, and compute the spatial lag for those K values, and we do this many, many times. Just as with the global Moran's I, this creates a reference distribution for the statistic at that location. So if for the global Moran's I we did the permutation, say, 999 times, for the local version we do 999 permutations for every location. Very quickly this can get crazy: you have three thousand counties, 999 permutations times three thousand, this gets big very quickly, so you have to worry about computation.
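To make the mechanics concrete, here is a minimal numpy sketch of the local Moran with conditional permutation, assuming a dense, row-standardized weights matrix W; the function and variable names are illustrative, not GeoDa's internals, and in practice GeoDa (or PySAL) does all of this for you:

```python
import numpy as np

def local_moran(x, W, permutations=999, seed=12345):
    """Local Moran I_i with conditional permutation.

    x : (n,) attribute values; W : (n, n) dense, row-standardized weights.
    Returns the observed statistics and the per-location reference
    distribution; pseudo p-values come in a later step.
    """
    rng = np.random.default_rng(seed)
    z = x - x.mean()
    m2 = (z ** 2).mean()               # variance term, constant across locations
    lag = W @ z                        # spatial lag: weighted average of neighbors
    I_obs = z * lag / m2               # cross product of value and its lag

    n = len(z)
    ref = np.empty((permutations, n))
    for i in range(n):
        k = np.count_nonzero(W[i])     # number of neighbors of i (assume k > 0)
        others = np.delete(z, i)       # take z_i "out of the pot"; it stays fixed
        for p in range(permutations):
            draw = rng.choice(others, size=k, replace=False)
            ref[p, i] = z[i] * draw.mean() / m2   # lag under row-standardization
    return I_obs, ref

# Connection to the global statistic: with row-standardized weights the
# average of the local Morans, I_obs.mean(), equals the global Moran's I.
```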
What do we do with all this? Two things. First we identify the locations that are significant, and let's leave aside for the moment what the right p-value should be. Just as we did with the global statistic, we look at the reference distribution and get a pseudo p-value: how many times is the reference distribution equal to or larger than the observed value? If that p-value is less than whatever cutoff we pick, say for the sake of the argument 0.05, we call the location significant; if it's not, we forget about it. This is very important to keep in mind, and I stressed it earlier as well: if it's not significant, it doesn't exist. Don't try to say anything about it being positive or negative; it's not significant. For all practical purposes, locally, focused on that particular location, the layout of the value and its surroundings is spatially random; that's what it means. So the local significance map is essentially a map with green colors that shows you which locations are significant, and everything that's not significant is white: it isn't there. This is in contrast with some other pieces of software that give gradations for every observation, which can be very misleading, because if it's not significant it really isn't there. The other thing this is very useful for, and we'll revisit it when we discuss what p-value to use, is that it gives you a sense of how significant these locations are.

The example I'll be using throughout, except in the lab, is actually kind of a fun dataset. It's based on an essay by the French sociologist Guerry on the moral statistics of France, using data from around 1830: things like crime, literacy, donations, suicides, one of the first quantitative multivariate studies of its kind. It's one of the GeoDa sample datasets; it comes with the software. I just picked one variable, donations, because, even though I say never do this, if you look at the map you can see some structure, call it clusters: these colors are too similar over big areas of the map, and under randomness they should be all over the place. So this is our significance map for the local Moran statistic, and I went a little crazy here: 99999 permutations. It's very fast; GeoDa is completely parallelized for this calculation, so you can actually scale it up fairly well. The reason I did this is that it shows you the gradation of significance: the darker the green, the more significant, in other words the fewer values from the reference distribution were more extreme than the one we observed. We start at 0.05 and go all the way up to 0.00001, the most extreme you can get with that number of permutations; it means that none of the 99999 reference values was larger than the one we observed. So you see some gradation, but by itself that is the perfect lead-in to the next topic: you don't know what the association is about, you just know it's significant.
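Continuing that sketch, the pseudo p-value at each location can be read off the reference distribution along these lines (again illustrative, not GeoDa's exact rule):

```python
import numpy as np

def pseudo_p(I_obs, ref):
    """Pseudo p-value per location: (M + 1) / (R + 1), where M counts the
    reference values at least as extreme as the observed statistic, on the
    side of the reference mean where the observed statistic falls."""
    R = ref.shape[0]
    mu = ref.mean(axis=0)
    M = np.where(I_obs >= mu,
                 (ref >= I_obs).sum(axis=0),
                 (ref <= I_obs).sum(axis=0))
    return (M + 1.0) / (R + 1.0)

# With R = 99999 the smallest attainable pseudo p-value is 1/100000 =
# 0.00001, the darkest category in the significance map described above.
```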
So the cluster map makes the connection between the significance and the Moran scatter plot. The Moran scatter plot centers the observations on the mean and gives you four types of spatial association: the two where a location is surrounded by similar neighbors, which we call clusters, high-high and low-low, and the opposite, which we call spatial outliers, high-low and low-high. What the cluster map does behind the scenes is look up where the significant points are in the Moran scatter plot and give them a color based on where they are; a sketch of this classification logic follows below. So this is the mechanism you can think of as going on. Take all the points in the upper right quadrant, which are above the mean and surrounded by neighbors above the mean, so we call this high-high; then we see in the significance map that some of these are significant but many of them are not. The gray areas: this is the way selection works by default in GeoDa, the transparency is adjusted, so the gray ones are not significant and the green ones are significant. Then, again behind the scenes, we come up with four different colors. In this case the red ones are nothing but the significant ones from the selection we had before, and we see that, I think, nine points out of the original 22 in that quadrant are significant at 0.05; I'll get back to that in a few minutes. So in some sense you look it up in the Moran scatter plot and classify the significant locations into four types of local association; in a minute I'll talk about the difference between clusters and outliers. Basically, this is what it looks like: at 0.05, which is very liberal, there are 29 significant locations. The red ones are high-high clusters; but they're really not clusters, they are cluster cores, because the notion of a cluster is that the location is more similar to its neighbors than would be the case randomly, so the neighbors should really be part of the cluster, and I'll get to that in a second. The dark blue ones are low-low, from the lower left quadrant of the Moran scatter plot; these are also cores of clusters, the cluster itself being the location together with its neighbors. Then there are the two other colors, and the reason I picked donations is that it actually shows all four types, which in practice you almost never get. The low-high ones are the light blue ones, there are two of them, and the high-low ones are the light red, rosy type, there's one of them. What happens when you go from 0.05 to 0.01 is basically that things shrink: you go from 0.05, which includes all the lighter colors, to only the darker colors. Of all the red stuff, only the two at the top are left, and the blue stuff shrinks too; much more remains in the south than up north, where it was mostly significant only at 0.05. As you make the p-value smaller, this will shrink even more.
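The behind-the-scenes classification just described amounts to something like this hypothetical helper, given the centered values, their spatial lags, and the pseudo p-values from the sketches above:

```python
import numpy as np

def moran_cluster_labels(z, lag, p, alpha=0.05):
    """Assign each significant location its Moran scatter plot quadrant."""
    labels = np.full(len(z), "not significant", dtype=object)
    sig = p <= alpha
    labels[sig & (z > 0) & (lag > 0)] = "high-high"   # cluster core
    labels[sig & (z < 0) & (lag < 0)] = "low-low"     # cluster core
    labels[sig & (z < 0) & (lag > 0)] = "low-high"    # spatial outlier
    labels[sig & (z > 0) & (lag < 0)] = "high-low"    # spatial outlier
    return labels
```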
Let's see some more examples for the local Moran, and let's look at the outliers. The outliers are locations that are very different from their surroundings. Here is the high-low spatial outlier from the cluster map, the red one, with its neighbors as defined by the weights, and this is our natural breaks map, where you see a darker color surrounded by lighter colors, so it picks that up; of course the natural breaks map is a simplification into six gradations, basically a histogram, of whether values are at the high end or the low end. And similarly, this is the low-high outlier, in blue, surrounded by its neighbors: you see it's a light brownish color surrounded especially by this really dark brown, so it's an outlier. Since these statistics use the average of the neighbors, extreme values among the neighbors pull everything up, and that turns out significant. So outliers are pretty straightforward: they are what they are, locations that you are interested in. Clusters are a little bit different, because the cluster, as I mentioned, is really the core together with its neighbors. So what do we do with these neighbors? We can select them, and we'll see in the lab how to do that. Let's start on the right-hand side at 0.01: the cores, and then I find their neighbors. The gray ones are neighbors that are not significant; the blue ones are significant. In the example up top there aren't any, but in the blue example some of the blue units are neighbors of each other, so they're not gray but blue, and they're selected. And if you compare that map to the one at 0.05, there is a connection, and the connection really depends on significance: when the associations are fairly strong, you see that the core at, say, 0.01, together with its neighbors, roughly corresponds to the spatial imprint of all the cores at 0.05. For the red ones it's basically the same thing, though it's not the same ones in the center. That's what I mean by learning from the data: assessing how these patterns change as you change the p-values. The bottom line at the end of the story is that something really interesting is going on in the south of France in terms of donations: people don't give money to charity. Now why is that? That's the next question; we don't answer it. It's not something you would expect a priori; it's something you learn from the data, unless of course you're an expert on France in the 1830s, in which case you might know it already, but that's another point. So, the local Moran: I think it's very straightforward. You basically decompose the numerator of the Moran's I statistic, you carry out conditional permutation to find the locations that are significant, and then you classify them. You can't classify them directly, you can't just say positive or negative; you have to combine the result with the information embedded in the Moran scatter plot, which GeoDa does behind the scenes.

The local Geary statistic is similar. We have the global version: Geary's C, instead of a cross-product statistic, is a squared-difference statistic, and the denominator is again some measure of variance used to standardize everything. The local version again has a part that is constant, which we don't have to worry about, so in essence it's this: instead of a cross product, it's a weighted sum of distances in attribute space. You can think of the difference between the value at i and the value at j, squared, as a measure of distance in attribute space: if you think of the values as located on a line, then the squared distance between them is exactly this. We'll extend this notion later to more than one dimension, which is very straightforward, but that's the idea. Now, as we saw last week, Geary's c is not the same as Moran's I. For Moran's I, positive means positive and negative means negative; Geary's c is the other way around, because it works with squared differences: large squared differences mean dissimilarity, which means negative spatial autocorrelation, and small squared differences mean
similarity, therefore positive spatial autocorrelation. As long as you keep that in your head, there's no confusion. One is a cross product, which has to do with linear association; the other is a squared difference, which has to do with distance in attribute space, and there's nothing linear about it: distance works for nonlinear associations as well. That's the main difference between the two. We can do some analytical derivations, fun but not that useful. The mean, the expected value, of the local Geary c under spatial randomness is two. You may recall that the mean for the global Geary C was 1, but the global C had this one-half factor, the 2 in the denominator of the ratio: dividing by 2 S_0 is what made the mean equal 1. When you don't do this, which we don't in the local statistic, you end up with a mean of two; it's basically the same idea. In practice we do computational permutation, exactly as with Moran's I: it's conditional, we take the value at i out of the pot, select K values for the neighbors, compute the squared distances to the neighbors, and take the weighted average; a sketch follows below. How do we interpret this? Very important: get away from linear association. I know it's difficult, but it's all about distances; it's not about slopes, it's about distances. It's a different way of measuring attribute similarity, namely distance in attribute space: smaller values mean similarity, larger values mean dissimilarity, and the statistic itself is nothing but a weighted average of these distances in attribute space. Think of our line again: we have one value here, and say three neighbors, one here, one there, one there. We compute the differences, square them, and take the average; that's the statistic. So it's a summary measure, just as Geary's C was, of the extent to which distance in attribute space is reflected in geographical space, because we don't take the distances between all the pairs, only the distances to the geographic neighbors. The question fundamentally underlying this is: is the distance to these geographic neighbors different from the distance to an arbitrary set of locations? If it is, then locally we have an indication of higher similarity, or higher dissimilarity, between that value and its neighbors than would be the case randomly. So it's the same idea as with the local Moran's I, but applied to a measure of attribute similarity that is a squared difference rather than a cross product. As I said, as long as you keep this straight the interpretation is very straightforward: large is negative, small is positive, and it's all relative to the mean. So with these reference distributions, we do our, whatever, 99999 permutations, we have the mean, and we look at where the local Geary c sits relative to that mean: smaller means greater similarity, larger means greater dissimilarity. It's a one-sided test in that sense.
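In the same illustrative style as the local Moran sketch earlier, a minimal version of the local Geary with conditional permutation, assuming a dense row-standardized W:

```python
import numpy as np

def local_geary(x, W, permutations=999, seed=12345):
    """Local Geary c_i: weighted average of squared attribute distances
    to the neighbors, with conditional permutation."""
    rng = np.random.default_rng(seed)
    z = x - x.mean()
    m2 = (z ** 2).mean()
    # observed statistic: sum_j w_ij * (z_i - z_j)^2, scaled by the variance
    c_obs = (W * (z[:, None] - z[None, :]) ** 2).sum(axis=1) / m2
    n = len(z)
    ref = np.empty((permutations, n))
    for i in range(n):
        k = np.count_nonzero(W[i])
        others = np.delete(z, i)       # z_i stays fixed, the rest is the pot
        for p in range(permutations):
            draw = rng.choice(others, size=k, replace=False)
            ref[p, i] = ((z[i] - draw) ** 2).mean() / m2
    # small c_obs relative to the reference mean (about 2): similarity;
    # large: dissimilarity. Note the reversal compared with Moran's I.
    return c_obs, ref
```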
Then again the question is the same: we have significant locations; what do we do with them, how do we interpret them? This is not as straightforward as with the Moran scatter plot, because in essence the Moran scatter plot uses the same criterion of similarity as the Moran statistic (there's a cross-product rationale behind the linear fit to the points in the scatter plot), so there's a sense of one-to-one correspondence, and that's what we exploited for our cluster map. Here we have squared differences, and there is no such counterpart. We can still look at where the significant locations sit in the Moran scatter plot to interpret them as high-high or low-low, but that's not what Geary's c tells us: it doesn't tell us high-high or low-low. Connecting the Geary c to the Moran scatter plot can give us some insight into high-high and low-low, but not always, and so there's this kind of strange category of "other", where we don't know whether it's high-high or low-low. The values are very similar, but in some sense they cross the mean: one may be above the mean, the other below, yet they're very close together. With Moran's I, because it's a cross product, it's one or the other; here, because it's a different criterion, there are situations where we can conclude that a location is more similar to its neighbors than it would be randomly, but we cannot conclude whether that's because both values are high and similar or both are low and similar. So we have this other category. And with negative spatial autocorrelation we can't distinguish anything, because we're squaring the difference: squaring loses the sign, so it could be high-low or low-high. So, I hate to call it substantive, but the interpretation of the significant locations for the local Geary is not as rich as for the local Moran. What you gain in exchange is that it is not constrained by linearity; it can capture points that are close together in a nonlinear way. In the cluster literature you have all these sample datasets, and some of them are what I call the croissants: you have these arcs, and the arcs are all similar locations, but a linear method can't pick that up, because it just looks for a line through them and can't do anything with it. Methods based on distance can, because they look at closeness, not at linear association. So it's not really either/or; you use both techniques to see whether there are any interesting patterns in the data, that's how I would put it. Here is an example where we are able to draw a conclusion; I'll get to the details in a second. This is the same variable, and I've highlighted the locations at the bottom, low-low locations; they are significant in the significance map, and through the linking mechanism, which we'll get more into in the lab, you can see them in the Moran scatter plot. Really, the logic the software follows is the other way around: it looks at where these points are in the scatter plot. Are they significant? If yes, and they're in the low-low quadrant, they are labeled low-low; if they're in the high-high quadrant and significant, high-high. The other ones that are significant, with a statistic smaller than the mean, are similar, but we don't know what kind. And then the negative ones, with a statistic larger than the mean, are colored as well; in our case we have two of them. This illustrates the "you don't really know" outliers: we just know they are very different from their neighbors, but we don't know whether it's high-low or low-high. That's the quandary with negative association: unlike in the Moran scatter plot, where such points fall either in the upper left or the lower right quadrant, here they could be anywhere. A sketch of the whole categorization follows below.
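A hypothetical sketch of that categorization, with an "other positive" label for significant similarity that crosses the mean and a single undifferentiated "negative" label:

```python
import numpy as np

def geary_cluster_labels(z, lag, c_obs, ref, p, alpha=0.05):
    """Hypothetical categorization of significant local Geary locations."""
    labels = np.full(len(z), "not significant", dtype=object)
    sig = p <= alpha
    mu = ref.mean(axis=0)                    # reference mean, roughly 2
    pos = sig & (c_obs < mu)                 # more similar than random
    labels[pos & (z > 0) & (lag > 0)] = "high-high"
    labels[pos & (z < 0) & (lag < 0)] = "low-low"
    labels[pos & (labels == "not significant")] = "other positive"
    labels[sig & (c_obs > mu)] = "negative"  # squaring loses the sign:
                                             # could be high-low or low-high
    return labels
```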
So then we have this cluster map. I couldn't find an example that shows "other positive", but every once in a while you see it. The dark brownish red is high-high; I'm not good with colors, let's just call the other one orange, that color is low-low; and then we have two blue outliers. As we make the significance more precise, we see again that a whole bunch of these disappear, but we keep ending up with this grouping in the south of France, the same one we had before. We also keep the middle part; if you recall, with the local Moran that middle part disappeared when we went from 0.05 to 0.01, so this is picking up something a little different from the local Moran. Okay, so comparing the two: I am still working on this, and as I try more examples I keep seeing more differences and similarities. It seems to me that for the same p-value cutoff, at least in my experience (I've been working on this for about a year, across many different applications), you tend to pick up more, and different, locations with the Geary c, especially at 0.05. But basically forget about 0.05; 0.05 just gives you a broad set of reference locations, it is by no means a good p-value. As you crank it up they tend to converge to similar locations, but there are always some important differences, and to me these point to potential nonlinearities in the association. If the two were identical there would be no point in picking one over the other; but they're not identical, so each of them contributes something to the insight into the patterns in the data. As long as you remember that it's a different type of attribute similarity, there's no reason they should be the same. Actually, this is not discussed much in the literature, but if you do a lot of applied work you'll run into it: the global Moran's I and the global Geary c by no means always agree either. The quandary here is that these are so-called diffuse tests. In testing hypotheses you have a null hypothesis, in our case spatial randomness, and the alternative hypothesis is no spatial randomness: some spatial patterning of some sort. What kind of spatial patterning can it be? It can be patterning according to a cross-product rationale, or patterning where close locations look alike, a distance rationale. The problem with any diffuse test is that you don't know which one it is: all you're doing is rejecting the null, and you can say positive or negative spatial autocorrelation, but you cannot say it is generated by one specific model. As opposed to a focused test: in regression analysis you all know what a t-test on a coefficient is; that's a very focused test. It says this coefficient is zero or it's not; not any coefficient, this one. Very focused: if you reject the null, you know exactly what's going on. Here, if you reject the null, you only sort of know what's going on: you know it's not spatially random, that there is positive or negative spatial autocorrelation, but you don't know where the autocorrelation comes from, what the underlying model is, what the underlying process is that yielded the pattern. It's about patterns, not about processes, and so we don't know what the alternative is.

The third set of statistics, as I mentioned, is actually a little bit older than
the local Moran and the local Geary. As I mentioned, it's not a LISA in the strict sense: initially it didn't have a connection to a global statistic, although with a little bit of algebra and a different global statistic you can kind of go in reverse. Importantly, it is based not on a cross-product logic or a squared-distance logic, but on a point-pattern logic. We haven't really covered point patterns; I'll talk a little bit about them in the last week, the miscellaneous lecture. But essentially, in point pattern analysis the objective is the same: is the pattern of points that we see spatially random, or is there some structure? In other words, are points closer together than they would be randomly, or further apart? One of the ways you can assess that is by counting how many points fall within a given radius, since you can derive, under spatial randomness, the expected number of points within that radius: many more points means clustering, many fewer means dispersal. That's basically the logic, and Art Getis did a lot of work on point patterns during his career. So Getis and Ord came up with a statistic that is basically a ratio, and there are two versions of it. You can think of it as counting points within a given radius relative to all the points in the map, and then translating counts into actual magnitudes: you have a variable x, and all its values. In the numerator you add up all the x within a given radius, or whatever neighbors are defined by the weights; so it's similar to the point-pattern logic, but instead of just counting the points you add up the values observed at those locations. And again the logic is this: if the ratio is higher than it would be under randomness, we have a clustering of high values; if it's lower than we would expect under randomness, we have the opposite, a clustering of low values. The two statistics are very similar. The G_i statistic excludes the value at the location itself; the G_i* statistic includes it, and I'll come back to that. The G_i statistic is like a donut: you leave out the value in the middle, take the ring, and compare the values in that ring to all the values outside the core. With G_i* you include the value in the middle of the donut in the window, in the numerator, and you take all the values in the denominator; as a result that denominator is the same for all observations, because it's the sum of all the values. Same logic either way: if there is more value in our little window than there would be on average, something is going on there, a clustering of high values; if there is much less, it's the other way around.
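Both ratios are short to write down; here is a minimal sketch with a dense binary weights matrix (illustrative names, not the published Getis-Ord formulation with its means and variances):

```python
import numpy as np

def getis_ord(x, Wb):
    """G_i and G_i* as ratios of the windowed sum to the overall sum.

    Wb : (n, n) binary weights, 1 for neighbors, 0 otherwise, zero diagonal.
    """
    total = x.sum()
    ring = Wb @ x                      # sum of x over each location's neighbors
    G = ring / (total - x)             # G_i: the donut, x_i left out of both sums
    G_star = (ring + x) / total        # G_i*: x_i inside the window; the
                                       # denominator is the same everywhere
    return G, G_star
```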
Let me backtrack now to inference. The analytical approximation is no good; again, conditional permutation to the rescue, same principle, everything is the same. We can do a significance map, we can do a cluster map. The great advantage of the Getis-Ord statistics relative to the local Moran is that you have the interpretation right away: with the local Moran you have the extra step of going through the scatter plot to figure out what is going on, and with the local Geary it's even more complicated. Here positive is positive and negative is negative: positive is called a hot spot, negative a cold spot. The hot spots are the reds, the cold spots the blues. If you have a good visual memory, you will recognize this as exactly the same map we had with the local Moran's I, except for two differences that I'll show you in a second. In my experience there isn't really a lot of difference between the G_i and the G_i* statistics; these maps are exactly the same. I did do the analyses separately, and I will do this in the lab; you can see the weights are different, but it's the exact same pattern. So again, the advantage is hot spots and cold spots, and, as I said before, if it's not significant it doesn't exist. Positive is local clustering of elevated values; take this with a grain of salt, because it's high relative to the mean: if the mean is low, then "high" is also low, it's always relative to the mean. And negative is a cold spot. It doesn't do spatial outliers; that's a drawback. So in comparison with the local Moran, and obviously I'm a little biased, I almost always use the local Moran. When there is no negative spatial autocorrelation, when you don't care about outliers, then as we saw the results are basically the same, so it doesn't really matter which one you use. In the old days it was easier to use the Getis-Ord statistic because you looked at the sign and you knew hot spot or cold spot, whereas with the Moran scatter plot it was a roundabout way; that's the essential difference between the two. As I mentioned before, you typically get the same results in terms of the clusters; remember, these are also cluster cores in the Getis-Ord statistic, so we have to do the same thing with the neighbors, same idea. But as you see, the maps are nearly the same except that part of what is red in the Getis-Ord statistic is identified as an outlier in the local Moran, and the same with the blues: one of the blues in the local Getis-Ord statistic is identified as an outlier in the local Moran, and two of the reds. That's the main difference, and as you crank it up to 0.01 (I don't have it here), there are no more outliers in the local Moran, and then it's identical.

Okay, this is the hard part. There is lots of discussion in the literature about inference, power, and approximations to the distribution; it's very technical, and I just want you to know it's there. I want to spend a little time on the essence of the problem, because it's a generic problem that is becoming really prevalent in modern statistical analysis of larger datasets, particularly the analysis of genomic data: large genomic databases, large numbers of variables, large numbers of observations; the game changes in terms of what a p-value is. We run into this problem as well. Technically, a p-value is for a single statistic: you have a null hypothesis, you compute the statistic from the data, you compare its value to what it would be under the null, and then you reject the null or you don't, with a given Type I error, which is the probability that you reject the null when in fact it holds. I think this is called a false positive, I always get confused, but you know: being too eager to
reject the null when you really shouldn't have. The problem is that once you do more than one test, once you do what are called multiple comparisons, these Type I errors are no longer what you think they are. When you say "I'm using a p-value of 0.05" and you do two tests in a row, it's no longer 0.05; it's larger, so you're more likely to be too eager and get false positives. You think it's 0.05; in fact it's about 0.1 (see the small calculation after this passage). The problem is that it's really, really difficult to figure out what the exact p-value is once you do multiple comparisons, and there are a number of approaches to deal with it, though they're not really solutions. The two most common are Bonferroni bounds and the false discovery rate, and I'll spend a little time on each. None of these is fully satisfactory: they're just bounds. They don't tell you "my p-value is 0.0153"; they say it is most likely within these bounds, but whether you're close to the bound or far from it, you don't know. It's basically a more conservative way of interpreting significance: it makes you reject fewer null hypotheses because of the multiple comparisons. You probably remember from your intro stats class that if you test long enough, eventually something is going to be significant; that's what multiple comparisons are about, and these bounds caution you against it. A lot of these ideas come from the work of Efron and Hastie in the book I mentioned, Computer Age Statistical Inference. The idea is that you want to control the probability of making one false rejection out of all the comparisons you carry out; that target is called the family-wise error rate. Why "family"? Don't ask me; I read an explanation somewhere, it didn't quite make sense to me, but apparently it does make sense. So that's this alpha, our target: not really a target p-value, but a target overall Type I error, the total risk you are willing to take of rejecting the null when in fact it is true. What should the target be? There again, there's no real consistency; based on a lot of large data analyses, Efron and Hastie suggest 0.1. And I want to highlight that this target is not the p-value. You maybe read, a few weeks ago, the opinion piece signed by about seventy statisticians that said forget about 0.05, let's go with 0.005. It's actually interesting: there's a table in Efron and Hastie with Fisher's interpretation of p-values as evidence against the null hypothesis, and 0.05 is only weak evidence. Somehow the literature converged on 0.05, because of the normal distribution and ninety-five percent coverage being about two standard deviations, all that stuff, but really there's no good reason for it, and there is a lot of reason against it, especially in large data analyses, which are basically glorified fishing expeditions. If you don't know what you're looking for, you don't want to reject too easily: you want to make sure that when you find a gene that tells you whether you have cancer or not, it's the right one. That's really where all this comes from: you have thousands and thousands of these things; how do you know that this one is really meaningful?
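Going back to the inflation mentioned above: for m independent tests at level alpha, the chance of at least one false rejection is 1 - (1 - alpha)^m. Local statistics are not independent, so the numbers below are only indicative:

```python
# Chance of at least one false rejection across m independent tests,
# each run at alpha = 0.05:
alpha = 0.05
for m in (1, 2, 10, 85):
    print(m, round(1 - (1 - alpha) ** m, 4))
# -> 0.05, 0.0975 (the "about 0.1" for two tests), 0.4013, 0.9872
```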
"Significant", by the way, is also out the window; we don't really use that term anymore. Well, people do, but Efron and Hastie strongly recommend the term "interesting" observations rather than "significant"; significant is too loaded. Basically, in this spirit of the fishing expedition, we find something and say: aha, this is worth focusing on. That's really the spirit in which this was developed; it's not that we have the definitive answer and the discussion is closed. So, Bonferroni bounds. It's a different way of thinking about it, and the idea is simple: say your target value is 0.05 and you do five comparisons; what should you use in each of these comparisons? 0.01. You divide your target p-value by the number of comparisons. That works really well if you don't do too many of them, but here we do one test for each location. For our example there are only 85 observations, so you have 0.05 divided by 85; that's still a reasonable number. If you have three thousand counties, that's already getting pretty small; if you have forty thousand house prices, forget it, nothing is ever going to be significant. That's the problem with Bonferroni bounds. To put it in concrete terms: if you take 0.01 as the target alpha, then even with 99999 permutations, where the smallest attainable pseudo p-value is 0.00001, once you have more than a thousand observations the bound drops below what the permutations can deliver, and nothing can ever be significant; if you take 0.05, you're out of luck with more than five thousand observations. So keep that in mind: this bound is probably way too tight, and that's the sense in the multiple comparisons literature; it doesn't work well for big data. This is what happens to our locations: at 0.01 there's just one left. But again, in an exploratory exercise this tells you loud and clear that even with this crazy criterion there is something going on in the south of France that does not jive with the rest of what we're observing. So what is it? Something is telling people there not to give to charity; who knows. Okay, the false discovery rate; this is a good one. Anybody familiar with this? FDR is not Roosevelt, it's the false discovery rate. A lot of statisticians working with large datasets and lots of comparisons realized that this Bonferroni business is no good: what good does an analysis do if nothing is ever significant? So this is kind of a heuristic, a procedure that has certain properties, and in a Bayesian context it has a very nice interpretation, but in practice this is what we do. We take all the observations and sort them by their p-values. Then we compute the target p divided by n, that's the Bonferroni bound, multiplied by the sequence number of the observation, and we check whether each observation's p-value is less than this FDR value. Formally, we find the largest sequence number, call it i_max, whose p-value is still less than or equal to the bound, and everything up to i_max is taken to be interesting; a sketch follows below.
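A minimal sketch of both bounds as just described, the fixed Bonferroni cutoff alpha/n and the rank-dependent FDR cutoff i times alpha over n (illustrative, not GeoDa's code):

```python
import numpy as np

def bonferroni_fdr(p, alpha=0.01):
    """Bonferroni bound and false discovery rate, as described above."""
    n = len(p)
    bonf = p <= alpha / n                      # one fixed bound for everyone
    order = np.argsort(p)                      # sort locations by p-value
    bounds = np.arange(1, n + 1) * alpha / n   # i * alpha / n, i = 1..n
    below = np.flatnonzero(p[order] <= bounds)
    i_max = below.max() + 1 if below.size else 0
    fdr = np.zeros(n, dtype=bool)
    fdr[order[:i_max]] = True                  # keep the i_max smallest p-values
    return bonf, fdr
```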
Let me show you how this works; here I've done the analysis, all in GeoDa. We sort the p-values, and the bound is this ratio: the first one is the bound we had for Bonferroni, 0.01 divided by 85; the second one is 2 times 0.01 divided by 85; the third one 3 times, and so on. The first p-value here, 0.00003, means that only three of our 99999 random permutations were more extreme than the value we observed, so that's very, very rare. Then we go up, and the p-value is still less than the bound, and still less, and after the third one it stops. So we end up with three significant locations, at the bottom here, and they are in the south of France. So even now, being super, super careful, there is still evidence that something is going on there that warrants special attention, and that's what a lot of this exploration is about: finding locations that are highly "suspicious" and might provide evidence worth a closer look.

A couple of things to close. This is exploratory: it suggests interesting locations. Remember that we identify pattern, not process: multiple processes can yield the same pattern; apparent contagion and real contagion give the same pattern from different processes. It's also univariate, and there can be interaction between variables, so the univariate autocorrelation can really be due to correlation with something else that we don't include in the analysis. And this is one case where a scale mismatch can give you a potentially very misleading read. One of the analyses I did once was for Orange County, in Southern California, where developers come in and build a thousand houses that are all the same. If you're doing hedonic analysis, which is all about explaining the price of a house by its characteristics, and the characteristics and the price are all the same, you get very high local spatial autocorrelation at the parcel level. But basically what this says is: maybe I shouldn't be analyzing this at the parcel level; there isn't sufficient variability in my data at that level, and if I go to the neighborhood level or something else, I get the variability back, and the insight back. So that is one other way of interpreting the results of a local autocorrelation analysis.

Okay, the extensions: everything you ever wanted to know in ten minutes, that's going to be fun. There's a lot of literature; this paper of mine has citations up the wazoo, it has literally been applied to anything you can imagine: anthropology, archaeology, zoology, forestry, anything, also things I'm afraid to mention. Let me move quickly, just to give you an idea of how we extend this to a multivariate context, because, as it turns out, that's not so simple. If you look at the literature on multivariate spatial autocorrelation, there's basically nothing there. I looked at this way back in the early 80s, and there was one paper, by Wartenberg, who in essence used principal components and included something like spatial lags in the principal components analysis, but for global analysis. Then about ten years ago some French statisticians picked this up again; actually, this is where I got the idea of using this Guerry dataset, because that's the data they used in their papers. Again it's based on the global Moran's I, bringing a spatial lag variable into the analysis. There's one paper that makes a distinction between the correlation part and the spatial part, but it's very tricky, especially for multiple variables: we have to differentiate the similarity of the variables at the same location, in situ, from the similarity with the neighboring locations, and there's a lot of misconception. With the bivariate Moran's I we saw a little bit of that; it's in the notes, maybe we didn't see it in the lab, but it's really difficult to make sure you do the right interpretation. All the things we saw, or didn't, in the bivariate case for the global Moran's I translate directly to the local case, because it's just the same statistic. But there's another way of thinking about this; just to show
you how simple it is, think of two variables, and the distance between two points in a bivariate scatter plot. One observation is a pair (z_1, z_2), a point in two dimensions; the other observation is another pair (z_1, z_2). How similar are they? You compute the distance between them in attribute space: if they're close, they're similar; if they're far, they're not. We saw this indirectly when we did EDA with the 3D scatter plot and the parallel coordinate plot: points that were close together in the 3D scatter plot were very similar in their attributes, and their lines in the PCP were very close together. Now, how do we extend this notion of attribute similarity to some kind of locational similarity? Remember, it's always the combination of those two concepts. One way is this: you have the distance between two points; now take one point, at one location, and find its neighbors in geographical space, not in attribute space. It has neighbors in attribute space that are close to it, but we take the neighbors in geographic space and compute the weighted distance to those neighbors. That is a measure of similarity; in fact, that is exactly what the local Geary is. And because it uses squares, it's additive, and that's the beauty of it. With cross products you cannot really extend cleanly to multiple dimensions without getting matrices involved (that's what multiple regression is), but with a distance measure, distance in two dimensions, fine, you add the pieces up; three dimensions, you add them up. So that's the notion of a multivariate local Geary: simply the sum of the univariate local Gearys along each dimension, each variable. It's incredibly simple. The one easy refinement is that you divide by the number of variables included, so that all the numbers are comparable; that's the easy part. The difficult part is twofold. One is inference: what's the right p-value? We're back to multiple comparisons, but now not only across the observations but, in some sense, across the variables, because each individual local Geary has its individual significance and we are adding them up, so the sum has some conceptual significance of its own; what exactly it is, is too hard, and we're back to square one: what do we pick as a p-value? That's where FDR and Bonferroni come in to help. The other difficult part is interpretation, which is about similarity and dissimilarity, and the way I find easy to think about it is in terms of two concepts of neighbors. One is in attribute space: think of points in the data cube that are very close together in attribute space. The other is the map: points that are close together geographically. Now combine the two: find the points that are close together in geographic space and ask where they are in attribute space. Are they also close there? Bingo, we have a cluster. Are they not? Then we don't. How do we decide whether they're close enough? That's the million-dollar question, and that's where we use the false discovery rate as one way to decide. Mostly we discover clusters; I find very little evidence of dissimilarity, but maybe that's because of the particular datasets I've been looking at. So we have a cluster map again: these locations are similar both in multivariate attribute space and in geographical space.
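Because the multivariate statistic is just the average of univariate local Gearys, the sketch is short (same illustrative conventions as before; inference would still need the conditional permutation machinery and the FDR discussion above):

```python
import numpy as np

def multivariate_local_geary(X, W):
    """Average of the univariate local Gearys over the k variables.

    X : (n, k) array, one column per variable; W : dense row-standardized weights.
    """
    n, k = X.shape
    c = np.zeros(n)
    for v in range(k):
        z = X[:, v] - X[:, v].mean()
        m2 = (z ** 2).mean()
        c += (W * (z[:, None] - z[None, :]) ** 2).sum(axis=1) / m2
    return c / k   # dividing by k keeps values comparable across choices of k
```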
This tension between the two notions of similarity we are going to revisit next week and the week after; it is part and parcel of multivariate analysis, because you have two notions of similarity and have to decide how to combine them. With the univariate approach we had our weights matrix; in the multivariate case it's not that clear-cut. Just one example, very quickly; as I said, I ran out of time, and I have this for the French data but didn't put it in here, and we'll do it in the lab anyway. The reason I took this dataset is actually different: it's one of the very few examples of its kind I've run into. It's a classic dataset, I may have used it for some of the homework: sudden infant death syndrome, crib death, in North Carolina. It's classic because it was used in Noel Cressie's text Statistics for Spatial Data, and it has been used as an example by many people for new test statistics. One of the peculiar things about it is that for one of the years, and I forget which one, the first one I think, the global statistic is not significant. So when Art Getis and Keith Ord first presented their statistic, they said: fine, no significant global statistic, and here are the significant locations; even in the absence of global autocorrelation we can find significant local ones. Here we have the same variable in two periods, and the significant locations as identified by a univariate local Geary at 0.05, just for the sake of it. At the top, and this is strictly not right because there are multiple comparisons, there are two tests, are roughly the locations that match between the two years; at the bottom are the locations that a bivariate local Geary identifies as clusters. So my closing slide: this is difficult to interpret as the number of variables increases. I thought it was going to help a lot; well, it helps a little, it doesn't help that much. You still have to use it in combination with other insights on multivariate similarity, and of course we have our p-value problem again. Where I have found this very useful is in combination with the computation of principal components, and we'll talk about principal components next week. If you combine the two, something that I call spatializing the principal components, applying spatial analysis to the components, then it becomes, I think, actually powerful. You have a bunch of variables; one of my former students did this with 22 health indicators, reduced them to two or three principal components, looked at the multivariate spatial clusters of the three components, and you find very strong patterns.
Info
Channel: GeoDa Software
Views: 4,060
Keywords: geoda
Id: HF25odbiV3U
Length: 77min 8sec (4628 seconds)
Published: Fri Nov 24 2017