Week 5: Global Spatial Autocorrelation

Captions
As I mentioned the other day, spatial autocorrelation is about the combination of attribute similarity and locational similarity. We have seen that already: locational similarity is captured through the spatial weights, and attribute similarity can be measured by a number of different functions, such as the cross product, the squared difference, or the absolute difference. There is a difference between global spatial autocorrelation, which is a property of the pattern as a whole, one number for the whole data set, and local spatial autocorrelation, which we will look at next week and which is about specific locations. In an exploratory sense, local spatial autocorrelation is more useful, because it brings up hot spots, cold spots, those kinds of things. Global spatial autocorrelation is more appropriate in a diagnostic sense, in a regression context, where you look at the residuals of a regression and want to make sure they are not correlated or otherwise systematically related. Nevertheless it is a very important concept to start with, because you cannot really do local spatial autocorrelation without a good handle on what global spatial autocorrelation is about.

There is a lot of material here; this is one of my seventy-slide decks, so I will have to move quickly. I start with two classics, Moran's I and Geary's C, which are the best-known measures of spatial autocorrelation. Then we go into a way to visualize these, the Moran scatter plot, which connects the global statistic to the local statistics we will talk about next week. Then I cover two different perspectives: Moran's I and Geary's C are tightly connected to the weights matrix, and after that we step away from the weights matrix and take distance as the organizing principle. We still have attribute similarity and locational similarity, but now locational similarity is organized by increasing distance between pairs of observations, and the similarity between the observations is either in product form, which gives the correlogram, or in squared-difference form, which gives the variogram, the basis of a lot of geostatistics. These are all, if you wish, different faces of spatial autocorrelation: they all deal with the same problem of how to combine attribute similarity and locational similarity in some kind of summary.

We have seen this a couple of times already: one statistic for the whole pattern, and, very importantly, not clusters but clustering. As we saw last week, the point of departure, the null hypothesis, is the absence of any structure, spatial randomness, so we want a statistic that allows us to reject that null hypothesis in favor of either positive or negative spatial autocorrelation. Moran's I is arguably the most commonly used spatial autocorrelation measure, in my opinion because it has a direct connection to the notion of a linear relationship, as opposed to Geary's C, which we will see in a few minutes and which is based on the squared difference. That, to be honest, confuses a lot of people. It is actually not confusing, it is just the reverse: a product measures similarity, a squared difference measures dissimilarity, so it works the other way around. Whereas for Moran's I a positive value means positive spatial autocorrelation, for Geary's C it is the reverse, and a lot of people get thrown off by that, but once you know it, all you have to do is flip the interpretation and it is the same.
So Moran's I is a cross-product statistic: the measure of attribute similarity is the product of the values at two locations, in deviations from the mean. Why in deviations from the mean? Because these spatial autocorrelation statistics make two very important assumptions. One we get rid of very easily, the assumption of a constant mean: if you express everything in deviations from the mean, the new mean is zero, so it is constant and you do not have to worry about it. The second assumption is constant variance, and that is quite a bit more difficult to deal with. We have already encountered situations where this plays a role, namely when we were making maps for rates or proportions: we saw the intrinsic variance instability that is part of a proportion and how we can deal with it by smoothing the rate. So if we compute spatial autocorrelation of rates, our little red flags should go up: wait a minute, the variance is not constant, this is not really appropriate. We will see in the extra notes that there are ways to deal with that, but strictly speaking the inference for these statistics only works if, one, the mean is constant, which we handle by taking deviations from the mean, and two, the variance is constant.

In the numerator we have a double sum over basically all pairs of observations: the values z_i and z_j are multiplied with each other, but, as I mentioned the other day, they are pre-multiplied by the weights, so only when the weights are not zero do these cross products actually count. When i and j are not neighbors, the product can be whatever it wants; it does not count in the statistic. In the denominator we have a measure of variance, the sum of squared deviations from the mean divided by the number of observations. So this looks very similar to a Pearson cross-product correlation, but it is not, and that is important to know. Looking at it a little more closely, it may seem weird that in the numerator we do not divide by n squared even though we take all pairs of observations; instead we divide by S0, and S0 is exactly what we should be dividing by, because it counts the number of actual neighbor pairs among all possible pairs. If you sum all the elements of the weights matrix, that in essence counts how many pairs of neighbors there are, so we adjust the numerator by that to take the scale out, and we adjust the denominator by n to end up with a measure of variance. The ratio of the two is a measure of correlation; in compact form, I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2.

As I mentioned the other day, a statistic is nothing but a summary that you compute from the data, and because it comes from the data it has randomness in it. The challenge for any statistic is to figure out whether the value you observe in your data is special in some fashion: is it likely that this value was generated by your null hypothesis of spatial randomness? How do we do this with Moran's I? We have a cross product in the numerator and squares in the denominator, and even if we assume nice things like a Gaussian distribution and independence under the null, it is not easy to analytically derive the distribution of a ratio of a cross product to a sum of squares. This was one of the assignments I got when I was in graduate school, and I can tell you it is not easy to do. It is possible to get an exact distribution for the statistic, but it is unwieldy, particularly because you basically have to compute it again for every single data set.
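To make the formula concrete, here is a minimal sketch of Moran's I in Python with NumPy, assuming a small numeric vector y and a binary contiguity matrix W, both invented for illustration; this is not the GeoDa implementation, just the textbook formula.

```python
import numpy as np

def morans_i(y, W):
    """Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z in deviations from the mean."""
    y = np.asarray(y, dtype=float)
    n = y.size
    z = y - y.mean()          # constant-mean assumption handled by centering
    S0 = W.sum()              # sum of all weights: counts the neighbor pairs for binary weights
    num = z @ W @ z           # double sum of w_ij * z_i * z_j (non-neighbors contribute nothing)
    den = z @ z               # sum of squared deviations from the mean
    return (n / S0) * num / den

# toy example: four locations on a line, each neighboring the next
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
y = np.array([10.0, 12.0, 11.0, 30.0])
print(morans_i(y, W))
```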
It is not like a z statistic or a chi-square where you know what the distribution is; here you actually have to figure it out, and in any case the practical implementations of these distributions all end up being approximations. There are two basic approximations. One assumes normality; the other assumes equal probability, that every observation is equally likely to be at any location, which is called the randomization assumption. Analytically you can then derive what the mean and the variance would be under the null; to go further and derive the full distribution is hard, so you throw up your hands and approximate it by a normal distribution, which is what is typically done in practice. That is the analytical approach: the calculation for the mean is simple, the variance gets a little complicated.

Then there is another approach, which is computational, and I have already alluded to it a number of times. We create a set of data sets that conform to the null hypothesis by randomly reshuffling the observations, so we mimic what spatial randomness would be, and for each of these artificial data sets we compute the statistic. That gives us a reference distribution, and we compare what we actually measure in the data to that reference distribution. Very confusingly, this is sometimes also referred to as randomization, which is of course the same word as the randomization assumption in the analytical approach, so preferably call it permutation: you permute the data, and it is a permutation test, part of the family of computationally driven test approaches based on reshuffling the values.

Another thing, and this is very confusing at first, especially if you come from a Pearson-correlation kind of world: the Moran's I statistic as such is not comparable. If you say your Moran's I is 0.5 and somebody else has a Moran's I of 0.3, the one with 0.3 could actually correspond to stronger spatial autocorrelation than the one with 0.5, which is completely counterintuitive. It has to do with the weights: the statistic is about interaction, and interaction combines the weights with the parameter, so the Moran's I coefficient as such does not really mean anything unless you know what the weights are. However, if you have its mean under the null and its variance, or the square root of the variance, which is equivalent, then you can convert the statistic to what we call a z value, a standardized value, and these z values are comparable, because the computation of the variance in particular takes the characteristics of the weights into account; that has been controlled for. Once you turn the original Moran's I, which looks like a correlation coefficient but is not one, into a z value, you can compare z values between different data sets, different variables, and different weights matrices. So it is very important to keep in mind that Moran's I as such is not really comparable.

This is our data set; we will talk about it more in the lab, but just to set the stage, I compute Moran's I for the house prices in our data set, and using the two analytical approaches these are the summary characteristics. The statistic itself is of course the same, because the statistic has nothing to do with these assumptions, and the theoretical mean of the statistic is also the same.
Under the null hypothesis that mean, and again this is kind of strange, is minus 1 over (n minus 1), so it is not zero. The correlation coefficient you are used to is centered on zero and symmetric in both directions; Moran's I is not centered on zero, it is a little bit off to the negative side. Of course, since it is minus 1 over (n minus 1), as n becomes larger and larger this goes to zero for all practical purposes, but in finite samples it starts off a little to the negative of zero. The theoretical mean is the same whether you take the normal assumption or the randomization assumption; that does not matter. The variance does differ between the two, and it is quite interesting: if I remember correctly, the variance under the normal assumption depends only on the characteristics of the weights, whereas the variance under the randomization assumption also depends on the shape of the data, through its kurtosis; we will see in a second whether that is correct. Then we take Moran's I, subtract the mean, divide by the square root of the variance, and we have a z value, and then we can compare them.

Okay, the permutation approach. As I mentioned a couple of times already, it is very simple to implement but requires computation: you reshuffle the data set, recompute the statistic, and make a graph of the reference distribution. In our example, the same example, the blue curve and the histogram behind it are what we get for Moran's I from 999 random permutations. From this we compute something called a pseudo p-value, and it is not a real p-value; it looks like one, but it is not, because it is based on computation. What we do is compare the red line, the value of the statistic in our actual data set, to that distribution. We count the number of times a replicated value is equal to or larger than the red line, in our case zero times; we add one for the value observed in the actual data set, and we divide by the number of replications plus one, 999 plus 1 for the one from the data set. That is why we take 99, 999, or 9999 replications: we get nice-looking p-values, even though the choice is totally arbitrary.

The other thing to remember is that the precision of the pseudo p-value, even though that is not really the right way to put it, is completely artificial, because it depends on the number of replications you carry out. With the same data set and 99 permutations you say the p-value is 0.01; then you do 999 permutations and all of a sudden the p-value is 0.001. Is that more significant than the other one? No, it is the same. So it is important to remember that "pseudo p-value" is really a bad term, because it is not a p-value at all; it is just a summary proportion that tells you how often your randomly reshuffled data gave a statistic as extreme as the one you observed for the actual data set. If you have never done this before, it is easy to fall into the trap of fixating on these pseudo p-values as if they were real p-values, but they are not. Just to give you a sense, go back to the earlier table and look at the p-value from the analytical approach: it has many more zeros. This result is highly, highly non-random spatially, and as I said earlier that is what we expect, because most phenomena in real life are not spatially random, so we expect to reject this null hypothesis.
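A minimal sketch of the permutation approach and the pseudo p-value described above, reusing the hypothetical morans_i, y, and W from the earlier sketch; the 999 permutations and the (count + 1) / (permutations + 1) rule follow the lecture, everything else is illustrative.

```python
import numpy as np

def moran_permutation_test(y, W, permutations=999, seed=12345):
    rng = np.random.default_rng(seed)
    observed = morans_i(y, W)
    # reshuffle the values over the locations to mimic spatial randomness
    reference = np.array([morans_i(rng.permutation(y), W) for _ in range(permutations)])
    # pseudo p-value: replicates at least as extreme as the observed value, plus one
    # for the observed data set, divided by the number of permutations plus one
    extreme = np.sum(reference >= observed)
    pseudo_p = (extreme + 1) / (permutations + 1)
    return observed, reference, pseudo_p

I_obs, reference, pseudo_p = moran_permutation_test(y, W)
print(I_obs, pseudo_p)
```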
Okay, how do we interpret this? A couple of things that are again not standard. First of all, as I mentioned, the theoretical mean is minus 1 over (n minus 1), so it is not zero, but it basically gets close to zero as the data sets get larger and larger. The interpretation of the alternative hypothesis we talked about a little last week: if the statistic is positive and significant, it points to clustering of like values, and as I mentioned the other day, not necessarily high or low, it could be both or some combination. If it is negative and significant: alternating values, a checkerboard pattern, the presence of outliers, spatial heterogeneity, more variability than you would expect under the null. Very important: this is a global property. I know I have said this before, but you would not believe how many times I see it interpreted the wrong way. It is a global characteristic of the data; it is not about clusters, it is about clustering as a property of the data. Also, if it is not significant, then it is not positive or negative. That is another one I see a lot; people write up the results as "Moran's I is positive but it is not significant." If it is not significant it does not have a sign: it means the pattern is spatially random, so for all practical purposes the autocorrelation does not exist. It is not positive or negative, it is nothing.

When you compare results, as I mentioned before, keep in mind that the magnitude of the coefficient depends on the weights. As long as you have the same weights and different variables you can compare them; the moment you have different weights you cannot. As I mentioned, using the mean and the variance you convert them to z values, and it is the variance, not so much the mean, that incorporates the characteristics of the weights, so the z values are actually comparable (a small sketch of this standardization follows below). To illustrate: this is the same data with three different weights. The top row gives Moran's I; they are all pretty much in the same ballpark, a little bit different, and based on the raw value of Moran's I you would think the one given by the k-nearest-neighbor weights with six neighbors is the strongest evidence of spatial autocorrelation. Then we look at the mean, which is the same, and the variance, which is not the same, because that is where the effect of the weights is incorporated, so the z values are not the same either. Looking at the z values, the measure based on the distance-band weights gives the strongest evidence of positive spatial autocorrelation. So Moran's I as such is not comparable; once you turn the statistics into z values you correct, in a sense, for the differences between the weights, and then they are comparable. Very important to keep that in mind when you write up your results.

As I said, it is about clustering, not the location of clusters. I say that so much because I see the mistake all the time, and you had better make a note of it for when you write up your results and do your project: do not refer to clusters when you use a global statistic. You cannot, even though you look at the map and think you see a cluster, and you run Moran's I and say, ah, it is positive and significant, I have my cluster. No, you do not, and you will not have it until next week, when we do local spatial autocorrelation, which is what will get you to clusters.
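As a hedged sketch of the standardization mentioned above: rather than the analytical mean and variance, whose randomization formulas are more involved, one can standardize Moran's I against the permutation reference distribution from the previous sketch; z values obtained this way are the kind of quantity that can be compared across weights.

```python
import numpy as np

def standardized_moran(observed, reference):
    """z value of Moran's I relative to a reference (null) distribution."""
    return (observed - reference.mean()) / reference.std(ddof=1)

print(standardized_moran(I_obs, reference))
```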
Then there are some more philosophical problems. There is this notion of true versus apparent contagion, and it has to do with what in the literature is called the inverse problem, the difference between pattern and process. Process is how something works; pattern is the outcome. One of the problems, especially with cross-sectional analysis, is that we observe a pattern but we have to figure out how we got there, what the process was that got us there. It is the same in a lot of sciences: you have an archaeological dig and you have to figure out how it got there, what the process was. You can measure the pattern and all the correlation you want; that does not tell you how you got there. And one of the problems, especially with cross-sectional data, is that very different types of processes can give you the same outcome. One is what you would like to think of as a process of contagion, mimicking, peer effects, spatial diffusion, spatial interaction, which gets you groups of observations that look alike; that is what your spatial autocorrelation will pick up, and that is the natural, but sometimes incorrect, interpretation, because the same outcome can be the result of spatial heterogeneity, where an underlying factor makes the process different across space. In my other class we were talking about John Snow and the cholera epidemic, and that is an example, he was lucky in a sense, of this tension between apparent and true contagion. One explanation of the pattern would be that somebody who was infected moved into the neighborhood and infected everybody else, so you see a grouping of more disease than you would expect randomly and you reject the null. The other explanation is that everybody drinks the bad water from the Broad Street pump and gets sick: spatial heterogeneity, a different type of explanation. The key point is that with cross-sectional data alone you cannot tell which of the two it is. Again, this matters when you write up and interpret your results: you are dealing with a pattern, not with the process. To get at the process you either need space-time data or multiple cross-sections that can somehow be categorized along different variables; you need more information than a single cross-section provides. This is also a major problem when you deal with cluster detection and try to explain why the clusters are there, what the process is that gets you the cluster.

Okay, Geary's C. Moran's I is a cross product, and the cross product, as we will see in a few minutes, has a direct relation to a regression, to linear association; we are very comfortable with that. Geary's C is focused on dissimilarity, which is not the same as similarity, it is the opposite: when the values are small the two observations are more alike, when they are large they are not alike. Geary's C uses the squared difference as the measure of attribute similarity instead of the cross product, and in that sense it is closely related to the notion of the variogram, which we will see a little later. The main difference between the two is that with the variogram we take distance as the organizing concept for locational similarity, whereas with Geary's C we have a weights matrix. Again, the statistic looks a little complicated but actually is not: in the numerator we see the double sum, we see the weights, which means that only neighbors are counted and the other pairs are not, and we see the squared differences, divided by a factor to rescale things; the 2 in there rescales it such that the mean of Geary's C is 1, not 0. In compact form, C = ((n - 1) / (2 * S0)) * sum_ij w_ij (x_i - x_j)^2 / sum_i (x_i - xbar)^2. The denominator is the same kind of thing as before, except that this time we divide by n minus 1, which makes it an unbiased estimate of the variance. The expression I have on the slide is the textbook one, and if you look at it very carefully you might wonder why it has x, the original variable, in the numerator and z, the deviations from the mean, in the denominator.
The answer is that it does not matter: because you take differences, the means get washed out, so whether you express this in the original variable or in deviations from the mean it is the same, and in the textbooks it is often written like that. Of course, in practice, if you have computed the deviations from the mean you are going to use them in both the numerator and the denominator; it does not make a difference.

So, as I said, Geary's C is the flip side of Moran's I. For Moran's I, positive means positive spatial autocorrelation; for Geary's C it is the other way around. Why? Small dissimilarity means the values are alike, and alike next to alike is positive spatial autocorrelation. The mean of Geary's C, and this is another one of those weird things, is not zero but one. If you have a value smaller than one, below the mean, or, using the standardized form, a z value that is negative, that points to positive spatial autocorrelation, and the reverse points to negative spatial autocorrelation. To me, as long as you know that, it is no more confusing than the other one; it is just a different way of thinking about it. But it is important to appreciate that the two are not the same. A lot of times, at first sight, they give you the same indication, the same conclusion, but not always, and the reason is that they really measure different things. Moran's I is all about linear association; Geary's C does not need the linearity, because, if you think about it, the squared difference is a measure of distance in attribute space. If you think of one variable and you have two observations on a line, one here and one there, how far apart are they? That is what Geary's C measures: the squared difference between their values is what the distance would be in geometry. Later we will see how to extend that to multiple variables, but it is the same concept. So Geary's C is really all about distance in attribute space, whereas Moran's I is about a slope, the slope of a linear association, as we will see in a few minutes.

Inference is the same story: you can do a lot of algebra and analytical derivations and then use approximations, which is a little complicated with Geary's C, or you can just do a permutation approach and reshuffle. Again the same example. Notice that all these values for C are less than one, so they suggest positive spatial autocorrelation. It is not because they are positive that there is positive spatial autocorrelation; if you had 1.2, that would point to negative spatial autocorrelation. It is with reference to one that you have to compare them. You see that the mean is one no matter what assumptions you make, the variance differs, which is where the effect of the weights is taken into account, and then the z values differ as well. And in this case the ranking is as for Moran's I: if you look at the coefficients in the top row, the smallest value is for the k-nearest-neighbor weights with six neighbors, but by the time we get to the z values, the strongest evidence of positive spatial autocorrelation is for the distance-band weights, which is exactly what we found for Moran's I. This is our reference distribution, and as opposed to Moran's I, where our statistic was on the right-hand side of the reference distribution, here it is on the left-hand side: it is much lower than anything else we simulated by reshuffling the data, which translates into it being significant.
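A minimal sketch of Geary's C, using the same toy y and W as in the Moran's I sketch; remember that the interpretation is reversed, with values below 1 pointing to positive spatial autocorrelation.

```python
import numpy as np

def gearys_c(y, W):
    """Geary's C = ((n - 1) / (2 * S0)) * sum_ij w_ij (y_i - y_j)^2 / sum_i (y_i - ybar)^2."""
    y = np.asarray(y, dtype=float)
    n = y.size
    z = y - y.mean()
    S0 = W.sum()
    sq_diff = (y[:, None] - y[None, :]) ** 2   # squared differences for all pairs
    num = (W * sq_diff).sum()                  # only neighbor pairs count
    den = (z * z).sum()
    return ((n - 1) / (2.0 * S0)) * num / den

print(gearys_c(y, W))
```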
As I mentioned, the main difference between the two is that Moran's I is a cross product, so, just like a correlation coefficient is related to a regression coefficient, we are very comfortable with it. But that is not always the best notion to use, because if we have highly nonlinear features, Moran's I may not pick them up, whereas Geary's C, which does not rely on linearity and is all based on distance, might be able to pick them up. So they are not the same.

Moran's I I do not need to recap, we just did it, but if we do a little bit of algebra, especially if we row-standardize the weights, things simplify. Row-standardizing, just to refresh your memory, means that each row of the weights matrix sums to 1, so the sum of all the weights is the sum of the row sums; we have n rows, so the sum of all the weights for row-standardized weights is the number of observations. In the expression above we have n in one place and S0 in the other, and these now cancel, so everything looks a little simpler. And if you remember your algebra for the slope of a simple bivariate regression, that is what Moran's I is: Moran's I is the slope in a regression, but not the regression that comes naturally. The natural regression would be to explain the value at a location by its neighboring locations, so z_i as the dependent variable and the spatial lag, as it is called, as the explanatory variable. You cannot use ordinary least squares to estimate that slope; that slope is biased. But if you flip it around, and in fact given that the squared z values are in the denominator it is flipped around, the dependent variable is the spatial lag and the explanatory variable is the value at the location. Once it is flipped around you have a scatter plot: every pair of a value and its neighbors' average is represented as a point in that plot, and you can fit a linear line through the scatter plot, as we have already seen in the lab a couple of times, and the slope of that line is mathematically Moran's I. Should you do a significance test using regression statistics on this? No. Very important: that is not valid. It is a regression, but it is not the kind of inference we do for the spatial autocorrelation coefficient, so we still need the permutation approach to figure out the significance.

This is very easy to do, and this is the same example again: the linear fit is the line, and the slope of the line is given on top, 0.282, which is Moran's I, the autocorrelation coefficient. A couple of things you can do with this: you cannot just look at traditional t statistics, but you can use the plot as a diagnostic for the extent to which particular locations may be driving the slope. In particular, in the Cleveland example, I do not know if you looked at the data more carefully, but there is this one sale of about half a million that is just totally unrepresentative of the rest of the sales, so that point is way out there. If you think of regression, I do not know how much regression analysis you have had, but there is this notion of leverage points: certain observations that may have a disproportionate effect on the slope of the regression. Using GeoDa, for example, you can take this point out, redo the analysis, and see the effect on the slope of removing it, and you can do other such things to look at the influence of particular observations.
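A minimal sketch of the Moran scatter plot idea with the same toy data: after row-standardizing the weights, the slope of a regression of the spatial lag on the centered variable equals Moran's I. The plotting itself is left out.

```python
import numpy as np

z = y - y.mean()
W_rs = W / W.sum(axis=1, keepdims=True)   # row-standardize: each row sums to one
lag = W_rs @ z                            # spatial lag: average of the neighbors' values

slope = (z @ lag) / (z @ z)               # OLS slope of lag on z (z is centered, so the intercept drops out)
print(slope, morans_i(y, W_rs))           # the two numbers coincide for row-standardized weights
```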
Another classic example comes from international data, some work I did a long time ago on conflict in Africa. At the time Sudan was still one country, so Egypt was Sudan's only neighbor and Sudan was Egypt's only neighbor, which means there was no averaging going on in the computation of Moran's I, just one neighbor, and you will tend to find that such observations are high-leverage points in the computation of the statistic. If you have a lot of observations that all gets washed out, but in smallish data sets it can make a big difference; in the Africa case it did make a big difference.

One thing the Moran scatter plot allows you to do, as I mentioned in the introduction, is to start making the transition from global thinking to local thinking. How do we do that? We do it by centering everything on the mean. These variables are all standardized, so the mean is zero and the axes are in actual standard deviation units; look at this point, it is about eight standard deviation units away from the mean, so by any definition that is an outlier. We center on the mean and then think about categorizing the type of association into four quadrants. Two quadrants, the upper right and the lower left, correspond to positive spatial autocorrelation, and the other two correspond to negative spatial autocorrelation. You can refine this even more: the top right quadrant has high values, in the sense of above the mean, surrounded by neighbors whose spatial lag is also above the mean, so we call that the high-high quadrant. Keep in mind that this is just relative to the mean: if the mean is very low, then high-high is not going to be that high; it is all relative to the mean. This is important when you do this over time: the centering is redone on the mean each time, so even if values go down over the years, high-high is relative to that period's mean, and a location can be labeled high-high even though its value is lower than it was ten years before. Something to keep in mind. So high-high is positive; below the mean with neighbors below the mean is low-low, also positive, and these correspond to notions of clustering. Next week we will specialize this to specific locations in these quadrants, which will be clusters: locations surrounded by other similar locations, above the mean (high-high) or below the mean (low-low). The other quadrants are what we call spatial outliers. In the lower right-hand side the value is above the mean surrounded by neighbors below the mean, so it is a spike surrounded by much lower values; the upper left is the other way around, a dip, a below-the-mean value surrounded by neighbors above the mean. The first we call high-low, the other low-high.

So the visualization of the Moran's I coefficient in the Moran scatter plot has two aspects. One, the slope of the linear fit is the statistic: steeper slope, higher statistic; negative slope, negative statistic; and so on. Two, it allows us to start categorizing the type of spatial autocorrelation, whether it pertains to values above or below the mean surrounded by neighbors above or below the mean, and to flag locations as potential spatial outliers. To summarize, we have the four quadrants: positive spatial autocorrelation among like values, negative autocorrelation, and the outliers, which I call spatial outliers; here they are.
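A minimal sketch of the four-quadrant classification just described, using standardized values and the row-standardized lag from the previous sketch; the labels are relative to the mean, exactly as discussed.

```python
import numpy as np

zs = (y - y.mean()) / y.std()       # standardized variable
lag_zs = W_rs @ zs                  # spatial lag of the standardized variable

quadrant = np.where(zs >= 0,
                    np.where(lag_zs >= 0, "high-high", "high-low"),
                    np.where(lag_zs >= 0, "low-high", "low-low"))
print(list(zip(np.round(zs, 2), np.round(lag_zs, 2), quadrant)))
```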
Given the linking architecture behind software like GeoDa, we can then use linked selection to find out where those locations are, and these are the outliers. We can also do other fun stuff with the Moran scatter plot, basically anything you can do with any scatter plot. The obvious linear fit gives Moran's I, but we might be interested in structural breaks, spatial heterogeneity, especially if these structural breaks have a geographical interpretation. As it turns out, one of the examples I concocted for the lab is perfect: you can really see the difference in the degree of spatial autocorrelation in different subsets of the data. Here it is a little mixed; there is clearly something funky going on, but that could be in part because of the outliers, and even after I removed them there is still an indication of a positive slope in part of the data, while once we get beyond the really high values that are more than one standard deviation above the mean there is much less evidence of positive spatial autocorrelation. Here we would want to see where these sales are and whether there is anything about these transactions that is different from the other ones. In what I would call the classic Columbus example, which I used way back when to illustrate spatial autocorrelation, there is actually a way to divide up the data so that part of the data has spatial autocorrelation and the outlying areas have none at all. It is crime statistics: the core of Columbus has all the crime and is strongly correlated spatially, but once you move to the outlying areas there is no spatial autocorrelation whatsoever. That again points to spatial heterogeneity, which you have to take into account when you build a model for the data. So, in the interactive, dynamic-graphics sense that we have seen before, this is how you use the LOWESS smoother to characterize potential structural breaks in the nature of the spatial autocorrelation.
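A hedged sketch of fitting a LOWESS smoother to a Moran scatter plot to look for structural breaks in the strength of the spatial autocorrelation; it assumes the statsmodels package is available and uses synthetic (value, lag) pairs purely for illustration, not the Cleveland or Columbus data.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(0)
z_demo = rng.normal(size=200)                                              # standardized values (synthetic)
lag_demo = 0.5 * z_demo * (z_demo < 1) + rng.normal(scale=0.5, size=200)   # weaker association above one s.d.

smoothed = lowess(lag_demo, z_demo, frac=0.3)   # columns: sorted z, locally fitted lag
print(smoothed[:5])
```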
So we have seen Moran's I, a cross product, and its visualization in the Moran scatter plot, and we have seen Geary's C, based on squared differences; both of them rely on the weights matrix. A lot of people do not like the weights matrix: they find it counterintuitive and arbitrary, where do these neighbors come from, you can see where I am coming from, and somehow distance is deemed to be superior. Actually it is the same thing: there is no free lunch. You are trying to model interaction and you do not have interaction data, so whether you put in prior structure through the weights or you put in prior structure by sorting everything by distance, there is still a prior structure. One way to approach this in a totally data-driven way is what we call a nonparametric correlogram, and it is nonparametric because there is no model: Moran's I has a cross product and a connection to a regression slope, Geary's C has a squared difference, but here we let the data drive the shape of the association. With Moran's I and Geary's C we get one coefficient for the whole data set; here we are going to fit a curve. So this is a little different: there is no nonparametric autocorrelation coefficient, but there is a nonparametric autocorrelation curve, which we call the spatial correlogram. Unfortunately there is again a lot of confusion about terminology. Sometimes "correlogram" is used to designate a series of autocorrelation coefficients computed for increasing orders of contiguity; you will see this in the literature, say we take our prices and compute Moran's I for first-order contiguity, second-order contiguity, third order, and so on, and if you show these in a graph that is also called a spatial correlogram. The spatial correlogram here is different: it is more closely related to the concept that comes from geostatistics, which we will see in a few minutes, and it uses the actual correlations in the sample.

How do we do this? In general, and typically, we do this for standardized variables, to get rid of the effect of the mean primarily and also, a little bit, the effect of the variance. A cross product of variables that are just in deviations from the mean has to be rescaled by the variance, but if they are in standardized units we can just multiply them and that is an estimate of the correlation. That by itself does not get us very far: it is the same incidental parameter problem I talked about the other day, we cannot really have a different parameter for each possible pair, so we have to organize this somehow. One way to organize it is to make this spatial covariance, or correlation depending on whether the variables have been standardized, a function of distance. Not a specific parametric function like an inverse exponential, but a function dictated by the data. We have already seen this concept a few minutes ago with the LOWESS fit: a local regression applied to the scatter plot fits a different regression to subsets of the data, so it follows the variability in the data much more closely than a linear fit, which is one slope for all. There are a number of ways to do that; local regression is one, kernel regression is another. The essence is that no functional form is pre-specified, the form is driven by the data, and this works well if you have a lot of data and not very well if you do not. In general it just says that the correlation between i and j is some generic function of the distance between i and j. The way we deal with this, and I will show a slightly different version in a few minutes, is to think of a graph with the correlation on the vertical axis and the distance separating the points on the horizontal axis. Every pair of locations is a point in that graph: the higher, the more correlation; the lower, the less; the further to the right, the further apart they are; the further to the left, the closer. Then we take all these points and fit a curve through them, with a LOWESS curve, a kernel, or some other nonparametric fit. That is the principle: you have all these cross products and you summarize them as a function of distance.

One caveat. You are all familiar with the notion that a variance is always positive; there is no such thing as a negative variance. Once you extend this from a univariate context to a context with multiple variables, you have a variance-covariance matrix, and the counterpart of a positive variance is that the variance-covariance matrix has to be positive definite, which in a nutshell means that all of its eigenvalues are positive, not just the individual variances. One of the problems with this nonparametric approach is that it does not guarantee that the variance-covariance structure you end up with is actually positive definite. That is a bit of a technical notion, but it does mean that if you are not careful you can end up with nonsensical results, the counterpart of negative variances.
The other thing is that it does not necessarily respect Tobler's law. Tobler's law says that everything is related to everything else, but near things more so than distant things; translated here, you want to see a decay of the spatial correlation with distance, and because this is nonparametric there is no guarantee that the curve actually goes down. It can go down and up and up and down, as we will see.

One way to implement this, as I mentioned, is to fit a local regression curve, a LOWESS curve or a kernel fit, on all the pairs. A simplified way is needed because, remember, the number of pairs is on the order of n squared: that is all good and well with 40 or 50 observations, but with a hundred thousand that is a lot of pairs, so it becomes unwieldy very quickly. What we do in the GeoDa implementation, and you will see the counterpart of this in geostatistics in a few minutes, is basically divide distance into bins, put together all the pairs that fall within a bin, compute the average spatial correlation for them, and then fit a local regression curve through those points. Every one of the little blue dots is the average of the spatial correlations of all the pairs in the distance bin it corresponds to. And one thing to notice is that this is not a simple distance decay; in fact it goes down and then comes back up. Why would that be? Two reasons, basically related to the same thing. As we move up in the distances that separate the points, the bins get smaller and smaller; that is the nature of the distribution of pairs of points, not all points can be very far away from each other, so for the points that are truly far apart there are very few pairs in those bins, which means the precision of the estimate goes down. These blue dots are just estimates of the correlation: with more pairs in a bin the precision is better, with fewer pairs it gets worse. In other words, whether that blue dot in the last bin is really below the zero line we do not know; for all practical purposes it could be zero. The second reason is related, and it is Tobler's law: as the points get further and further away from each other they really should not be correlated at all, so what we are getting there is an imprecise estimate of something that should be zero. That is why there are a lot of rules of thumb, which we will talk about in the lab, about cutting the distance, adjusting the distance range over which you evaluate these pairs, changing the number of bins, and so on.

This is an example where, as you shorten the distance range, it is easier to see the shape of the curve. What is the most important aspect here? It is not so much the magnitude of the correlation, which is really the focus of Moran's I and Geary's C, but the range of the correlation. For all practical purposes, once we cross the zero line we can say that pairs of points further away from each other than that distance, which we call the critical range, are no longer spatially correlated. Where is this useful? If you want to subsample points that are not spatially correlated, you just make sure they are farther away from each other than this critical distance.
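A minimal sketch of the binned nonparametric correlogram described above: for each distance bin, average the cross products of the standardized values over all pairs that fall in the bin. The coordinates and values are synthetic, purely for illustration; a real analysis would also fit a smooth curve through the bin averages.

```python
import numpy as np

rng = np.random.default_rng(1)
coords = rng.uniform(0, 10, size=(300, 2))
vals = rng.normal(size=300)                       # stand-in for a standardized variable

zs = (vals - vals.mean()) / vals.std()
d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1))   # pairwise distances
prod = np.outer(zs, zs)                           # pairwise cross products

iu = np.triu_indices_from(d, k=1)                 # each pair once, no self-pairs
dist, cp = d[iu], prod[iu]

bins = np.linspace(0, dist.max(), 11)             # ten distance bins
which = np.digitize(dist, bins)
correlogram = [cp[which == b].mean() if np.any(which == b) else np.nan
               for b in range(1, len(bins))]
print(np.round(correlogram, 3))
```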
Subsampling beyond the critical range is useful in a big-data world: if you have lots and lots of data points and you want to use traditional statistical analysis that requires uncorrelated observations, you figure out the range of the spatial autocorrelation, resample points that are far enough from each other that they are in fact uncorrelated, and then go back to your standard techniques. Of course, if you only have 40 observations you cannot do that; this only works in large data settings. As I mentioned, it is an alternative to using spatial weights: we use distance instead. There is a whole bunch of technical issues that I may have time to talk about in the lab, but they are beyond the scope here; they are the generic technical issues of how you fit a local regression curve, what kind of kernel function you use, what kind of bandwidth, the kinds of things we talked about in the context of the LOWESS fit to a scatter plot, and it is all the same idea. I really like this as a quick and dirty sense of the range of interaction we might have in the data, and if you have lots of data, as in typical real estate transactions where you have thousands of data points, and you want to do an analysis that is free of spatial autocorrelation, you figure out the range of interaction, you resample, and you are good to go.

Now we switch gears again. We started with the weights; then we got rid of the weights and used distance, with no functional form but still using cross products; now we go to distance again but without cross products. This could actually be a whole course in its own right, and in fact it typically is, and here I spend fifteen minutes on everything you ever wanted to know about geostatistics. The critical concept in geostatistics is the variogram function, or more precisely the semivariogram, but it does not really matter; I mean, it matters to people in geostatistics, but as we will see it is just a matter of dividing by two: divide by two and you get the semivariogram, the one on top is the full variogram, and as long as you know which is which and what you are talking about, we are all good. The variogram comes out of mining, gold mining actually, and it is really the methodology underlying a particular form of spatial interpolation. Think of the situation where you work for a mining company, or, and this is a big deal in oil exploration, you are going to drill for oil: it costs millions and millions of dollars to do a test drill, so how many test drills are you going to do, and how are you going to estimate how much is under there based on the results? From the sample points you build your estimate by spatial interpolation, typically in reality in three dimensions, but we will not go there; it is the same idea. Basically, how do you get a handle on the function you can use to interpolate spatially, meaning coming up with values for locations that you do not observe? It is the same problem as, say, air quality measurement in a city: you have sensors, the Array of Things is going to deploy a bunch of sensors in the city, and how do you assess air quality for a location that does not have a sensor? Presumably there is something systematic in the correlation between these values across space, and that is the idea behind the variogram, which is actually much more robust than everything we have done so far because, just like Geary's C, it does not use cross products. Remember I mentioned for Geary's C that whether you take out the mean or not does not really matter, because the differencing gets rid of it; the same holds here, so you do not need strong assumptions about the mean.
It is a strange concept at first sight, though actually more intuitive than it looks: it is the variance of the differences across space. So say we have air quality measures, one here and one a mile away; it is not the variance of those measures, it is the variance of the difference between the two measures that are a mile away from each other. That is called the variogram, and the underlying assumption is that this summary of the data shows systematic variability with the distance of separation between the locations to which it pertains. So s denotes a location, s plus h is a move away from that location by a distance h, and the variogram is the variability of the difference between the two measurements, 2 gamma(h) = Var[Z(s) - Z(s + h)]; divide by two and you have the semivariogram. If we do make a constant mean assumption, then the mean of the differences, which is the difference of the means, is zero because the means are the same. And if you remember your stats, the variance of a difference is the mean of the squares minus the square of the mean; the square of the mean is zero, so you end up with the mean of the squared differences, gamma(h) = (1/2) E[(Z(s) - Z(s + h))^2]. Now we are back in business: just like for the correlogram, where we binned the pairs and took the average cross product for all the pairs in a bin, here we bin the pairs and take the average squared difference for all the pairs in the bin, and that is the empirical semivariogram. However, unlike the correlogram, where the correlation gets smaller as we move further away, here we are measuring squared difference, dissimilarity, and dissimilarity goes up as we move further away: the correlogram goes down with distance, the semivariogram goes up.

How can we summarize this? A couple of different ways. GeoDa does not do this; why not? Because we do not want to reinvent the wheel, and there is fantastic software already out there for variogram analysis and geostatistical analysis. If we have time in the lab we will look at gstat, which is a really good package in R, to analyze this, and in any case you have my notes so you can do it on your own. One thing, again, is to just throw everything out there, all the pairs, or basically half of them since they are symmetric, and look at it; this is called a variogram cloud plot. What it looks like is a mess, because there are so many pairs that the bottom part is all clogged up and you do not see anything, and you are not really interested in that bottom part anyway. What you are interested in are outliers, in the following sense: the observations on the left side of the graph are pairs that are very close together, so they should be very similar, and their dissimilarity, the squared difference, should be very small. So if you have a point there with a very small distance but a large squared difference, you say, what is going on here? In gstat and other similar software you can identify these pairs: for example, we click on a point over there, each point in the cloud corresponds to two observations, not one, and then we can identify them in the point map.
We see, for example, that these are points that are very similar and yet very far apart, or very dissimilar and yet close together. Those are the kinds of exploratory things you can do with the variogram cloud plot. The example here, I think, is the house prices, and it identifies a couple of transactions that are way out of line compared to the other ones. Another example I use a lot is air quality measures in Los Angeles. If you are familiar with Los Angeles, it is a basin, that is why they call it the Los Angeles basin; at the edge of the basin are mountain ranges, and when you go over the mountain ranges there is desert. Some of the sensors are on either side of a mountain range, but the air quality is very different on the two sides, so there is basically a structural break in the air quality between stations that are very close together. In the rationale of the variogram, and also the correlogram, similarity should decrease with distance, or dissimilarity should increase with distance, yet these two measurement stations are very close together and show very different measures on each side of the mountain range. That is something you have to take into account in your modeling: there is no way you should use these together, you should somehow model the structural break in whatever you do. That is what you use this for.

Then we have the notion of an empirical variogram, which is very similar to the correlogram we just saw, except that, just like Geary's C relative to Moran's I, one is a cross product and the other is a squared difference. This again is based on the squared difference: we bin the pairs in distance bands and take the average of the squared differences between them, very similar to what we saw before. So now we have these estimates of the average squared difference, and the squared difference goes up as the distance between the pairs increases. As I mentioned before, there are a bunch of rules of thumb that I will talk about in the lab: use half the maximum distance, or one third of the diagonal of the bounding box, have at least 30 pairs per bin, that kind of thing. The bottom line is that these are summary estimates whose precision depends on the number of pairs you take into account; at some point it does not matter anymore, if you have thousands of pairs the difference in precision is not a big deal, but if the densities in the bins are very different, the precision of these values will be very different, and any regular pattern you might expect will be hard to detect because of the imprecision of the measure. A couple of quick things: the curve should not keep going up, it should flatten out, because at some point there is no more effect of increasing distance; if you are not correlated, whether you are 50 miles apart or 200 miles apart, you are not correlated, so it should be flat. This one is not flat, and in practice that tends to point to the presence of a trend in the data. To get the trend out you basically use a regression and then redo the analysis on the residuals of the regression; that is a little too technical for now, but you can see it gets flatter and flatter as we take residuals of a model, in this case a trend surface model. So the idea is to use this empirical variogram to inspire a model for the change of dissimilarity with distance.
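A minimal sketch of the empirical semivariogram: for each distance bin, take half the average squared difference between the paired values. The coordinates and values are again synthetic; the half-the-maximum-distance cutoff follows the rule of thumb mentioned above.

```python
import numpy as np

rng = np.random.default_rng(2)
coords = rng.uniform(0, 10, size=(300, 2))
vals = 0.5 * coords[:, 0] + rng.normal(size=300)    # synthetic surface with a mild trend

d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1))
sq_diff = (vals[:, None] - vals[None, :]) ** 2

iu = np.triu_indices_from(d, k=1)
dist, g = d[iu], sq_diff[iu]

max_d = dist.max() / 2                              # rule of thumb: half the maximum pairwise distance
bins = np.linspace(0, max_d, 11)
which = np.digitize(dist, bins)
semivariogram = [0.5 * g[which == b].mean() if np.any(which == b) else np.nan
                 for b in range(1, len(bins))]
print(np.round(semivariogram, 3))
```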
Before we go there, and you can read the rest of the material on your own, there is a very important assumption, the assumption of isotropy. Isotropy means that the only thing that matters is distance; orientation does not matter. Now, geostatistics is a big deal in the natural sciences, not so much in the social sciences; in the natural sciences it is used in soil science and in meteorology, or climate science, and there the assumption of isotropy does not necessarily hold. Even for air quality it does not hold: there are prevailing wind directions, and in the LA basin, for example, whether the wind comes from the ocean or from the desert makes a big difference. So one of the things, and if we had more time we could explore this in further depth, is to check the extent to which this assumption actually holds, and there are a number of devices that allow you to do this; if we have time we will go over them in the lab. One is to split up the pairs of points not just by distance but also by direction, the angle of separation, take subsets of the pairs, and compute a separate variogram for each of these; if there is no directional effect, they should all look the same. Another way is to go even further and, rather than just taking angular ranges and distances separately, combine the two, so each of these little boxes shows separation in both distance and direction. What you want to see is nice concentric circles, which means there is no directional effect; if you see elliptical bands, that is evidence of directional effects. One problem with this is that these bins get really sparse very quickly, so there are all kinds of tricks, which we can see in the gstat software, where you eliminate the boxes that do not have enough pairs, and here even then there is fairly strong evidence of some form of directional effect.

What is the end point of all of this? A model, and this slide is everything you ever wanted to know about variogram models, with some strange terminology. We already saw the notion of a range: the range is basically the distance beyond which spatial autocorrelation does not matter anymore. The sill, a term that comes from mining, is another word for the variance of the process, the total variability in the process. And the nugget is a totally bizarre one: it is essentially a measurement-error term. There should be no variance at distance zero, but in practice, because of the lack of pairs at very small distances and measurement error, there is always a little bit left over, and that is called the nugget. The ideal theoretical variogram looks something like this: a slow, continuous increase of the dissimilarity with increasing distance up to a point, the range, and then flat. Of course, the empirical variograms you just saw do not look anything like this. So there are a number of models; remember the point about positive variances. Valid variogram models are mathematical expressions as a function of distance that have the right property, in the sense that they yield a legitimate variance-covariance structure; because these are squared differences it is flipped around, the variogram has to be conditionally negative definite, but we will not go there. These are a couple of the most commonly used ones: the spherical, the classic one in the upper left-hand corner; the exponential, which is also used a lot; the Gaussian; and then the wave and other more unusual ones that can account for strange phenomena, usually having to do with different scale effects, where some things work on smaller scales and some on intermediate scales.
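A minimal sketch of the classic spherical variogram model mentioned above, written in terms of a nugget, a partial sill, and a range; the parameter values are illustrative, not fitted to any data.

```python
import numpy as np

def spherical_variogram(h, nugget, psill, a):
    """gamma(h) = nugget + psill * (1.5*h/a - 0.5*(h/a)**3) for h <= a, and nugget + psill beyond the range a.
    Strictly gamma(0) = 0; the nugget is the limit as h approaches zero."""
    h = np.asarray(h, dtype=float)
    inside = nugget + psill * (1.5 * h / a - 0.5 * (h / a) ** 3)
    return np.where(h <= a, inside, nugget + psill)

h = np.linspace(0, 10, 6)
print(np.round(spherical_variogram(h, nugget=0.1, psill=0.9, a=5.0), 3))
```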
So that is just a little bit of terminology. The end result is that in practice you want to fit one of these models to the points in your empirical variogram, and again you could spend a whole course on how to do this and on all the different complications of heterogeneity, precision, and robustness. Once you have this function you are in business, because you can use it to do spatial interpolation: for any distance you can get an estimate of either the squared difference or, related to it with a little bit of math, the covariance, and then you can relate the value at an unknown location to observed values at known locations as a function of the distance to those known locations. That gives you the tool for spatial interpolation. And I did it, seventy slides; I promise I will not do this again.
Info
Channel: GeoDa Software
Views: 4,263
Keywords: geoda
Id: TeKN5SaabPs
Length: 77min 13sec (4633 seconds)
Published: Fri Nov 17 2017