Week 4 - Spatial Autocorrelation

Captions
Okay, so today we're moving a little bit away from pure exploration toward hypothesis testing. We're going to discuss the concept of spatial randomness and spatial autocorrelation, then in general terms what a test for spatial autocorrelation involves, before we get into specific test statistics next week. Then I'll spend some time on the topic of spatial weights. Spatial weights are often something very difficult to grasp for people who do not come from a geographical background, but they're actually very simple, and we'll see why we need them, how we can construct them, and what their properties are. In the lab next Monday we'll be dealing exclusively with spatial weights, so I'll have the lab notes for GeoDa up again, and my notes from last year on how to do this in R will be up there as well.

We'll start with a discussion of the notion of spatial randomness, which we've already seen a couple of times. That's going to be the null hypothesis, the point of departure. The alternative hypothesis is basically what we hope to find, although, as I'll explain, it's not really that simple in a statistical logic: the alternative of either positive or negative spatial autocorrelation, and it's good to have an idea of what each looks like. Then, as I mentioned, a bit about the logic and the structure of a spatial autocorrelation statistic in general terms, and then the spatial weights.

So, spatial randomness. It's actually kind of boring: the null hypothesis is the absence of any pattern. Spatial randomness is where you go home and you can't do anything, because random is random; you can't explain randomness. What we're looking for, of course, is not this randomness. The whole point is to reject the null hypothesis of spatial randomness, and if we reject it, we want to find out what is actually going on, which typically is a spatial structure or a spatial process. But the point of departure is this notion of randomness.

When I say it's kind of boring, that's in the context of exploration: if you want to explore some data and it's random, forget it, you can't do anything. But if you are specifying a regression model and you're checking the residuals of that model, then randomness is actually a good thing, because you don't want anything systematic left in the error terms or residuals. So there are really two different contexts in which to consider this. In this class we focus on the exploratory part; in my spatial regression course we focus on the diagnostics part, where we are looking for randomness because we don't want any systematic patterns. But if you start looking at a data set and there's no rhyme or reason to it, which is what spatial randomness is, then you're stuck.

There are two basic ways of thinking about spatial randomness. They may seem the same, but they're not. One is to think about the pattern as a whole, the pattern being where the particular values end up in space, on the map, and under spatial randomness we have an equal probability of any possible map. In other words, the map as such does not provide you with any information: any arrangement of the values on the map is equally likely. We've already seen this when we reshuffled Milwaukee, and we'll see it again. That's one view; it looks at the pattern as a whole, and you could think of it as a simultaneous view.
The other view is something you could call a conditional view, where you ask: does knowing something about the neighbors affect what we know about a given location? Conditional meaning: knowing what happens for the neighbors, do we know something about the particular location at hand? If that is not the case, then we have spatial randomness. The distinction may seem very subtle, but it's actually pretty fundamental, not so much in an exploratory context but when we go to modeling: whether we take a simultaneous view of the whole pattern at once, or a conditional view of each location conditional on what happens at its neighbors. The bottom line is that under spatial randomness neither of these should carry any information: knowing what happens at the neighbors should not help you in any way in knowing what happens at a particular location, and whether you see one map or any of the possible permutations of the values, it's all the same.

We saw early on that this is a fundamental difference between non-spatial and spatial analysis. In spatial analysis the location of the values has information content; in non-spatial analysis, as we saw, the histograms were the same for the two maps, the box plots were the same for the two maps, it doesn't make any difference.

This will be very important later on when we build inference for test statistics: how do you operationalize spatial randomness? This is very powerful and very computational; it doesn't really fit an analytical framework (it does, sort of, after the fact), but the key is that we use computation to generate what the null hypothesis might look like. If the null hypothesis is that any value can equally likely appear in any location, then we can actually mimic that, we can simulate it, by randomly reallocating the values to locations. That's my Milwaukee example: the map on the left is the real map, and the other one is the exact same values, the exact same census tract data, randomly reshuffled to other locations.

The way we use this, and I'll come back to it later, is to explore the distribution of a statistic under the null hypothesis; in shorthand we call that the distribution under the null. We will construct some kind of statistic. A statistic, and I'll repeat this later, is nothing but some kind of summary of the data, say the mean; the mean wouldn't be a good spatial statistic, because it doesn't tell you anything about where things happen, but it is an example of a statistic. What we're interested in is how that statistic behaves under the null hypothesis, because then we can compare the value that we actually get for our data to what it is likely to be under the null, and if it's very unlikely, we reject the null hypothesis. That's the standard approach in statistics. The difference is that in a lot of textbooks the null distribution of a statistic is derived analytically: you do a bunch of math and you end up with a distribution, typically a normal (Gaussian) distribution or something related to it. Here it's computational: we get our null distribution by letting the random numbers crank, reshuffling the data and recomputing the statistic each time.
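To make the reshuffling idea concrete, here is a minimal sketch in R of how a permutation (reference) distribution can be built for any statistic. This is my own illustration, not GeoDa's code; the data and the placeholder statistic are hypothetical, and next week an actual spatial autocorrelation statistic would take its place.

```r
# A minimal sketch of permutation-based inference (illustration only).
# 'y' holds a variable observed at n locations; 'my_stat' is a stand-in for a
# statistic that depends on the spatial arrangement of the values.

set.seed(12345)
y <- rnorm(100)                          # placeholder data

# Toy arrangement-sensitive statistic: similarity of values that are adjacent
# along a hypothetical ordering of the locations (a crude "neighbor" notion).
my_stat <- function(v) cor(v[-length(v)], v[-1])

obs <- my_stat(y)                        # value for the observed arrangement

# Mimic spatial randomness: reshuffle values across locations and recompute.
nperm <- 999
ref <- replicate(nperm, my_stat(sample(y)))

# Pseudo p-value: how extreme is the observed value relative to the reference
# distribution generated under the null?
p_pseudo <- (sum(ref >= obs) + 1) / (nperm + 1)
p_pseudo
```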
Spatial autocorrelation can really not be discussed without referring to Tobler's first law of geography: everything depends on everything else, but closer things more so. It sounds like a tautology, and it sort of is, but there are two ways of thinking about it. What the law does is structure the dependence in space by stating that distance matters, in the sense that observations closer together, with a smaller distance between them, are more dependent than observations that are further apart. What I like to say is that you can also turn this around and define distance by how much observations are correlated: if observations are strongly correlated they are close, if they are not correlated they are far. It's basically the same thing; it is using a distance metric to structure the dependence.

You can't have a situation where everything depends on everything else equally strongly, because then you have no information; it's like having one data point. But you can extract some knowledge about the structure of the dependence if you posit ahead of time that there is some distance metric that induces a decay in the correlation, a decay in the relationship between the variables. That's the essence of Tobler's law: you need something in there that structures the dependence. One way of doing this is to use an actual distance measure, as we'll see next week; the other is to use a weights matrix, which we'll see later today. They are basically the same thing: ways of imposing structure on the dependence.

So the null is spatial randomness; what is the alternative? There are two, and they are very different. The more natural one is positive spatial autocorrelation, the notion of clustering, and it is very important to make the distinction between clustering, which is a characteristic of the pattern as a whole, and identifying clusters, which are specific locations. Here we're dealing with what I call global spatial autocorrelation, which is concerned with clustering: is the pattern as a whole different from what it would be under spatial randomness? Clusters, or local spatial autocorrelation, is about pinpointing hot spots and cold spots at specific locations on the map, and that's a very different thing. Next week we'll consider global spatial autocorrelation, and the week after local spatial autocorrelation; related things, but different: one has to do with the whole map, the other with specific locations.

In an exploratory setting, clustering is actually not that useful. Sure, we reject the null of spatial randomness, but then what? It just tells us there is some patterning in the data; it doesn't tell us where the patterning is, nor what type of patterning it is. And that is often a misconception, where positive spatial autocorrelation is somehow interpreted as having clusters of high values in particular locations. That is a local thing, not a global thing. Global spatial autocorrelation, the rejection of the null and the finding of positive autocorrelation, says that for the pattern as a whole, like values tend to be in neighboring locations. Very important: like values, not high or low specifically; either one. Positive spatial autocorrelation can be driven by the grouping of high values, by the grouping of low values, or by a combination of both.

Positive spatial autocorrelation, clustering, is kind of an intuitive thing; we think we can visualize it, we think we can imagine it. Negative spatial autocorrelation is the opposite, but what is the opposite of clustering? That's not that easy to conceptualize.
The easiest way for me to describe negative spatial autocorrelation is to think of a checkerboard pattern, where adjoining values tend to be more dissimilar than they would be under the null of spatial randomness: high, low, high, low, systematic patterns like that. Positive spatial autocorrelation, in contrast, is a notion of information loss: when things are positively correlated in space, you actually have less information than if they were independently distributed, your standard randomness, and in regression analysis we deal with correcting for that loss of information. Negative spatial autocorrelation you can think of as having more variability than under randomness. In some sense that would be a good thing, additional variability, which is in some sense additional information, although it doesn't really work that way. Negative spatial autocorrelation is typically associated with heterogeneity, with spatial heterogeneity, and oftentimes also with scale issues.

Remember the first lecture, where I talked about the classic challenges of spatial analysis, and one of them is scale: what is the proper scale of analysis, the modifiable areal unit problem (MAUP) and all that. Oftentimes when you change the scale of analysis, the sign of the spatial autocorrelation changes. Think of huge blocks, say zones, where each block contains parcels that are similar: at the parcel level you tend to find strong positive spatial autocorrelation, because all the parcels within a block are very similar, but at the block level you may find negative spatial autocorrelation: high-value block, low-value block, high-value block, low-value block. So when you see negative spatial autocorrelation, or more precisely when you see a change in the sign of the spatial autocorrelation with the scale of analysis, think MAUP, and think carefully about what it is that you want to find in the data and what you are looking for, because it may be different at different scales of analysis. That is nothing unusual; it is the difference between micro analysis and macro analysis, which are not always consistently aggregated.

Okay, so positive spatial autocorrelation, as I mentioned, gives the impression of clustering: you tend to see clumps of like values. Those clumps would be the clusters, but with global spatial autocorrelation we don't deal with where they are. And it is important to remember: high or low. I've seen examples where somebody does an analysis, finds positive spatial autocorrelation, and concludes that it is due to the clustering of high values, and it doesn't have to be. There is a classic example of the stroke belt in Georgia and part of the southeast of the US: there are elevated rates of stroke and nobody really understands why; is it the soil, the food, is it genetic? It is very complicated. One of the early analyses of this, at the state level, which is no good to begin with, found positive spatial autocorrelation in these stroke rates and concluded that it was due to the high rates in the southeast. In fact, a more careful local analysis showed it was actually due to the prevalence of low rates clustered in the Midwest and Upper Midwest. So this notion of positive spatial autocorrelation has nothing to do with high or low; it has to do with similar, and it could actually be a mixture of the two, where the lows are together and the highs are together, and together you reject the null. It is very difficult to judge by eye.
Every year I say the same thing, that I should run this experiment, but then I don't really have the time to do it. What I'm doing here is taking a 10 by 10 grid, so I have a hundred observations, and I take a standard normal random variate, mean 0 and variance 1, generate a hundred values, and make a choropleth map with six categories of these data. This is what spatial randomness can look like. If I had given each of you a clicker and asked, is this correlated or random, I bet half or more of the people would say it's correlated, and it isn't. Our brains are wired to see patterns even when there are no patterns.

Then I took these hundred independent, uncorrelated standard normal values and put them through a spatial model that induces autocorrelation, and I cranked it up: the other map is generated with a spatial parameter of 0.9, which is very high, and that is significant positive spatial autocorrelation. It is characterized by clumping of like values, but whether that clumping is in the upper left corner or in the lower right corner is not something a global measure concerns itself with; that is for the local analysis, which will find the locations that are more similar than they would be randomly. So that is positive spatial autocorrelation.

A checkerboard-like pattern is really difficult to distinguish from spatial randomness, and again I should run my little experiment (just don't look at the captions). One of these two maps is completely spatially random; the other has a spatial autoregressive coefficient of minus 0.9 from a spatial process. Both of them tend to show alternating patterns, but only one of them tests as significant negative spatial autocorrelation. That one is not your brain seeing things; it is actually not compatible with spatial randomness at a given probability level, which is your p-value or your type I error. You can think of it as a decision process: do I decide that this is really highly unlikely to conform to the null? That is what your p-value gives you.

So, positive and negative spatial autocorrelation: one is a notion of clustering, the other a notion of checkerboard patterns, alternating values; one has to do with information loss, the other with greater variability or heterogeneity. The point of departure is randomness, and we reject randomness in one direction or the other. How are we going to do that?
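Before moving on, here is a rough reconstruction in R of the grid experiment just described. It is a sketch under my own assumptions (a rook-contiguity, row-standardized weights matrix and a simultaneous autoregressive process), not the instructor's actual code.

```r
# Sketch of the 10 x 10 grid experiment (my reconstruction, not the lecture code).
# A simultaneous autoregressive process y = (I - rho*W)^(-1) e is one way to
# turn iid standard normal values into a spatially autocorrelated surface.

set.seed(42)
side <- 10
n <- side * side
row_id <- rep(1:side, each = side)    # grid row of each cell
col_id <- rep(1:side, times = side)   # grid column of each cell

# Binary rook contiguity: cells sharing a side are neighbors.
W <- matrix(0, n, n)
for (i in 1:n) {
  for (j in 1:n) {
    touches <- abs(row_id[i] - row_id[j]) + abs(col_id[i] - col_id[j]) == 1
    if (touches) W[i, j] <- 1
  }
}
W <- W / rowSums(W)                   # row-standardize

e <- rnorm(n)                         # 100 iid standard normals (the "random" map)
rho <- 0.9                            # strong positive autocorrelation; try -0.9 too
y <- solve(diag(n) - rho * W, e)      # spatially autocorrelated values

# image(matrix(e, side, side))        # eyeball the random surface
# image(matrix(y, side, side))        # versus the autocorrelated surface
```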
Yes, dispersal, up to some level; it's really scale dependent, and that's the key issue. I should say first of all that in typical practice you will find much more positive spatial autocorrelation than negative spatial autocorrelation. When you do find negative spatial autocorrelation, it often has to do with scale issues, where you have aggregated so much that you have in a sense cut the data up into pieces that are very different from each other, and that shows up as something like a checkerboard pattern, which is negative spatial autocorrelation. It's not something you see very often. In regression analysis you do see it, and there it is often a problem of additional heterogeneity; in econometrics that is called heteroskedasticity, but that's a different issue. In an exploratory context I don't see it very much.

And I'm glad you mentioned dispersion, because recall the first lecture, where I talked about the different kinds of spatial data and how each comes with different kinds of research questions. What we've been dealing with is what is called lattice data, spatially aggregated data by areal units. If you deal with point patterns, with events, then the null is no spatial structure, positive is grouping, and negative is repulsion: points further away from each other than they would be randomly. There are interesting case studies of this when you look at the location of facilities. Some types of facilities like to be close to each other; you get clusters of stores that sell the same thing. Why do they do that? They basically share the same market and get economies of scale from people coming to a district, like bar districts or theater districts: they cluster together, much closer than they would be randomly. And then you have the opposite, where facilities locate far from each other because they don't want to compete; they want to pull the market away from each other. Think of big-box stores: if you look at stores of the same company, they are not going to put them next to each other. Relative to where they would be randomly, they will be more regular, more dispersed than random, and that is a notion of negative spatial autocorrelation. Okay, questions are good; otherwise I run out of time, because for once I didn't overdo it and I don't have a million slides.

Just to make sure we're all on the same page, because I know many of you know this stuff but maybe you haven't thought about it: what is a statistic? When you say a t statistic, or a correlation statistic, what is that actually? It is something that tries to capture a characteristic of the distribution of the data. It is based on the data, it is calculated from the data, it is not made up. And because your data are, as you typically say, a random sample, they are not exact; therefore whatever you calculate from the data is not going to be exact either, it is going to have some randomness associated with it. The classic example: take the height of the people in a classroom and compute the average. That average is not an exact number; it is actually a random variable, it has some variability associated with it. And the test statistic that we compute from the data has a particular kind of behavior under the null hypothesis, typically, if things are nice and elegant, a Gaussian distribution. You find that if you compute the average, which is a calculation from the data, and the data are random variables, then that average has a normal distribution centered around the true mean of the data, the true mean that we cannot see.

We need to do the same thing to get a handle on spatial autocorrelation: we need to compute something from the data, and then we have to figure out what the distribution of that thing, that statistic, is under the null hypothesis of no spatial autocorrelation. Just like we take the average and figure out, with some proofs and theorems, that it is distributed as a normal random variable centered on the true mean, we will have to figure out how this thing that we calculate from the data is distributed when there is no spatial autocorrelation.
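As a small aside, the point that a statistic computed from random data is itself a random variable is easy to see by simulation. The sketch below is my own illustration, not part of the lecture materials; the "true" mean and sample size are arbitrary.

```r
# Illustration (not from the lecture): the sample mean is itself a random variable.
set.seed(1)
true_mean <- 170                      # hypothetical "true" average height in cm
draws <- replicate(5000, mean(rnorm(30, mean = true_mean, sd = 10)))

mean(draws)                           # centered near the true mean
sd(draws)                             # close to the theoretical 10 / sqrt(30)
hist(draws, main = "Sampling distribution of the mean")
```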
We are of course interested in a statistic that shows a very clear contrast between its value under the null of spatial randomness and the value we find when there is spatial autocorrelation: we want that value to be different from what it would usually be, and different enough, in the sense that it would be rare for a spatially random data set to generate it. That is where the p-value comes in; that is where we make our decision. When the probability that the value we observe in the data was generated by a spatially random process is low enough that we say "no way", we reject the null and conclude positive or negative spatial autocorrelation. And of course we want a statistic that points us in a direction, that doesn't just say "it's not spatially random, now what?", but rather that there is positive spatial autocorrelation or there is negative spatial autocorrelation. So we want a statistic where the sign tells us something about the likely alternative hypothesis, and where the significance tells us whether or not the null holds. That, in a nutshell, is what hypothesis testing is about in a classical world; I won't go Bayesian on you yet.

This is basically the logic. The nice red curve is the distribution of the statistic under the null hypothesis. This is actually a real example; I left it out this year, but last year I talked about something called the join count statistic, which measures clustering of binary values, 0/1, black/white, that kind of thing. This is an actual empirical example where you have the null distribution, and the dark blue line is the observed statistic. Now it is up to us to decide whether what is left in the tail is small enough for us to reject or not, and that depends on the context. There is a lot of debate about this, and I'll revisit it when we talk about local spatial autocorrelation statistics: what is a good p-value, how do you get a good sense of the type I error, which is the probability that you reject the null, saying there is no spatial randomness, when in fact there is spatial randomness. That probability is exactly the tail of the red distribution, because those are the extreme values that could still be generated under the null; they are extreme, so are they extreme enough? If they are, you reject. If the blue line were more toward the middle, you would say there isn't really any evidence in the sample to conclude that the null hypothesis doesn't hold; there is just not enough there.
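Since the join count statistic came up as the example, here is a hedged sketch of what such a statistic looks like for a binary variable: the BB (black-black) join count simply counts how many neighbor pairs are both 1. The weights matrix and data below are placeholders, and inference would use the same permutation approach sketched earlier.

```r
# Sketch of a BB (black-black) join count for a binary variable x and a binary,
# symmetric contiguity matrix W (both placeholders, for illustration only).
bb_join_count <- function(x, W) {
  sum(W * outer(x, x)) / 2            # pairs (i, j) with x_i = x_j = 1; divide by 2
}                                     # because W records both i-j and j-i

# Toy 4-cell "map": cells 1-2, 2-3, 3-4 are neighbors.
W <- matrix(0, 4, 4)
W[cbind(c(1, 2, 2, 3, 3, 4), c(2, 1, 3, 2, 4, 3))] <- 1
x <- c(1, 1, 0, 1)
bb_join_count(x, W)                   # 1: only the (1, 2) pair is black-black
```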
So the key concept, and it is very important to get some intuition for this: a spatial autocorrelation statistic, a statistic in the sense of something calculated from the data that has good power to reject the null of spatial randomness, incorporates two characteristics. One is what we call attribute similarity. Attribute similarity is what a regular correlation coefficient measures: you have two variables, income and crime, and you ask what their linear relationship is; that is a correlation coefficient, and that is attribute similarity. That is one part, because we will be interested in finding out whether like values are located near each other. The second characteristic is what I call locational similarity.

We have to build an index from the data that combines these two things: some measure of attribute similarity, which will be similar to a correlation, but correlation alone is not enough; you have to bring the space in as well, and that is the challenge, and it is not immediately obvious. The easy part is the attribute similarity. Why is it easy? Because we are used to it; in non-spatial statistics we do this all the time. A couple of things are different from your standard textbook, though. You might think, correlation; well, it is autocorrelation, correlation with itself. There are not two variables like crime and income; there is one variable, crime, and we look at crime at one location versus crime at another location. So how do we measure the similarity of this one variable at two different locations? You could say they are equal or not equal, but that is not going to get you very far, so you need something else that summarizes similarity. In general I call this a function f of y, our variable of interest, at location i and y at location j: f(y_i, y_j). There are a couple of these around, but the one you are probably most familiar with is the cross product, the form a correlation coefficient takes.

You can think of a cross-product statistic as being about the pairing of values. In its traditional sense, you have crime and you have income, and if they are not related there would not be any systematic matching of high crime with low income and low crime with high income; it would just be random, no rhyme or reason to it. Similarly with the same variable, autocorrelation: when we take two locations, remember, under spatial randomness location does not matter, so there should not be any systematic patterning in the pairing of the values between any two locations, no systematically large or small cross products. When there are systematically large cross products, that means i and j have similar values. So now we have one part of the picture covered, the attribute similarity; we still have to define when i and j are spatially similar, locationally similar.

The cross product is an easy one. Then there are two others that I actually like, but people tend to have a funny intuition about them, because they work the other way around: they measure dissimilarity rather than similarity. The product is comfortable to us because it has to do with linear association, and we are conditioned (this is changing) by textbooks to think linear, while a lot of reality out there is not linear. These dissimilarity measures are actually better at capturing nonlinear phenomena: the squared difference, (y_i - y_j)^2, and the absolute difference, |y_i - y_j|. They are the opposite: with the product, similar pairs give a higher value; with the dissimilarity measures, similar pairs give a smaller value. If y_i and y_j are very similar, their squared difference or absolute difference (you get rid of the sign) will be very small, and this can happen when y_i and y_j are both very high or both very low, as long as they are similar. So this is the other way of looking at it: rather than focusing on the cross product, a linear association, we look at a difference measure, and the smaller the difference the more similar, the larger the difference the more dissimilar.
Just a heads-up for those of you who have done a bit of GIS or spatial analysis before: the cross product will lead us to Moran's I statistic of spatial autocorrelation; the squared difference will lead us to Geary's c statistic and to the concept of the variogram, or semi-variogram. They are two different ways of looking at the world. It is not that one is better than the other, because we do not know what the true alternative is; these statistics are limited, and basically what they tell us is that the pattern is not spatially random, and that it is either more similar than it would be under randomness or less similar, meaning more dissimilar, more scattered, more checkerboard-like.

So we have attribute similarity covered. Locational similarity: how do we formally express that i and j are locationally similar? Either we say they are close to each other, so we use the distance between i and j, or we use some notion of who the neighbors are, with the underlying assumption that neighbors interact and non-neighbors do not interact. That is the notion of spatial weights we will focus on in a second. The spatial weights matrix is a matrix because it is about the interactions between every i and every j: for every i as an origin, all the other observations can potentially be neighbors that i interacts with. The structure of that matrix gives us, and I hate to use the word prior because that is a loaded term in Bayesian statistics, but it is kind of like a prior: using some criterion, and we will see the types of criteria in a few minutes, we say that certain i and j cannot be neighbors, and the other ones are neighbors. When they are not neighbors, the corresponding elements of the weights matrix are zero; when they are neighbors, they are something else.

So our statistic is in general a sum over all possible pairs (there are n squared of these) of some measure of attribute similarity, our function f (cross product, squared difference, absolute difference), times the matching spatial weights element: something like the sum over i and j of f(y_i, y_j) * w_ij. Running ahead a bit (we will get back to this next week in more detail), look at what happens when i and j are not neighbors: then w_ij is 0, and that pair does not count. So this is a sum over all possible pairwise similarity measures, but only the pairs for which w_ij is not zero actually contribute. That is how spatial autocorrelation test statistics work: they take a measure of similarity for the attribute we are interested in, add it up over all possible pairs, but cut out all the pairs that are not neighbors and only count the ones that are, with the neighbor structure formally expressed in the spatial weights matrix. It is a matrix conceptually; operationally you would not actually want to store the full matrix. If you have a hundred thousand house sales, you do not want a hundred-thousand by hundred-thousand matrix that is mostly zeros, because the neighbor structure is fairly compact, as we will see in a minute.
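To make the general form concrete, here is a sketch of the two statistics just previewed, written directly from their standard textbook definitions; these formulas are really next week's material, and the weights matrix W and variable y below are placeholders.

```r
# Sketch of the two classic instances of sum_ij f(y_i, y_j) * w_ij,
# using the standard textbook formulas (W and y are placeholders).

morans_i <- function(y, W) {
  n <- length(y)
  z <- y - mean(y)                    # deviations from the mean
  s0 <- sum(W)                        # sum of all weights
  (n / s0) * sum(W * outer(z, z)) / sum(z^2)          # cross-product form
}

gearys_c <- function(y, W) {
  n <- length(y)
  z <- y - mean(y)
  s0 <- sum(W)
  d2 <- outer(y, y, FUN = function(a, b) (a - b)^2)   # squared differences
  ((n - 1) / (2 * s0)) * sum(W * d2) / sum(z^2)       # squared-difference form
}

# Tiny example: 4 locations on a line, binary contiguity.
W <- matrix(0, 4, 4); W[cbind(c(1, 2, 2, 3, 3, 4), c(2, 1, 3, 2, 4, 3))] <- 1
y <- c(10, 12, 3, 2)
morans_i(y, W); gearys_c(y, W)

# Inference would follow the permutation logic sketched earlier:
# reshuffle y, recompute the statistic, and compare to the observed value.
```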
So that is where we are. Our point of departure is spatial randomness, the null hypothesis; we want to reject it, in the direction of either positive or negative spatial autocorrelation. To do that we need to construct a statistic from the data that embodies two aspects: attribute similarity and locational similarity. Attribute similarity is easy (cross product, squared difference); locational similarity not so much, and for that we will now take a crack at this notion of a spatial weights matrix.

Why do we need it? There is a lot of discussion about this in the literature. We need it because we do not have enough information. This is changing in the new world of big data, where we actually start to get information on actual interaction, but with a cross-sectional map, meaning data at one point in time, we are kind of stuck. As far as I am concerned, the weights matrix is something that is going to disappear, because less and less are we working with data situations that are pure cross-sections; we are getting more and more information that is both spatial and temporal, that changes all the time, and whenever we can observe changes we can observe actual interaction. For example, with some of the new phone call data you know who calls whom, how far apart they are, how often they call; with movement data from Twitter or Uber analytics you see how often people go from one place to another, how far they travel, what time of day they travel, all that kind of stuff, which you cannot do if you just have a static picture of a cross-section.

So the weights matrix is really a fix to deal with the lack of information, because what we are really interested in is interaction, and interaction is an n-squared phenomenon: interaction is from everybody to everybody else. If n is the number of observation points, interaction is about the whole n-by-n matrix, but we do not have n-squared observations, we only have n. It is like working with all the counties in North Carolina; they happen to be a nice round number, 100. If you take interaction to be symmetric, which it would have to be, you would have roughly 5,000 pairwise interactions (100 x 99 / 2 = 4,950), but you only have a hundred observations, and no matter what magic you use, you cannot estimate 5,000 parameters from 100 observations. There is no way. So you have to impose structure. Either you impose structure in the form of a distribution, or, in this case, we zero out possible interactions with our weights matrix and force the remaining interaction to be the same for everybody; that is going to be our spatial autocorrelation coefficient, one coefficient for the whole pattern. That is an extreme simplification: first we put structure on who can interact with whom, that is our neighbor structure, and then we say the interaction is the same for everybody, no matter how many neighbors they have or how far apart the neighbors are, one parameter. And then we are back in business, because we can estimate one parameter with a hundred observations; we cannot estimate 5,000 parameters with a hundred observations.

So the structure is the solution: we limit the number of parameters that we have to estimate. Technically, and you may not be familiar with this term, what we avoid is called the incidental parameter problem. The incidental parameter problem arises when the number of parameters you are trying to estimate increases with the sample size, so you cannot win. Statistics is all about winning with the sample size over the statistical problem you are trying to solve: if you are trying to estimate, say, a mean and its standard error, the more data points you have, the better and more precise that estimate will be. But if every observation has its own mean, then you are out of luck.
Getting more and more observations just means you have to estimate more and more means; that is a no-win situation. And here it is even worse, because the problem grows with n squared, since it is all about interactions, not just the sample size. The number of interactions is an incidental parameter problem and you cannot solve it, but you can cheat, and any time you run into an incidental parameter problem in statistics you will see that it gets solved by changing the definition of the problem, which is cheating: you solve an easier problem. The easier problem here is imposing structure: first, excluding certain pairs from the interaction, and second, imposing the same parameter on all the remaining interactions. Those are the two things that let us tackle this issue, and they are embodied in our spatial weights.

The spatial weights are a way to express beforehand, not based on the data, whether i and j are likely to interact, and as a point of departure it is really a binary choice: they either do or they don't. That makes sense in a cross-sectional situation. Now, if you observe a phenomenon over time, and the interaction does not change over time, then you do not need a weights matrix; you can estimate the interaction from long time series of pairwise observations. If you are interested, say in macroeconomics, in whether the GDP of certain countries moves together, or whether stock prices move together, you are not going to put in a spatial weights matrix; you have long time series, so you can use those series two at a time, for each i-j pair, to figure out whether they are correlated and how strong the correlation is. You do not need a weights matrix; we need a weights matrix when we do not have that time series. (The weights matrix might again come in handy if the structure of the interaction changes over time, because then you again cannot solve the problem from the data alone; but if the structure is stable over time you can use the time dimension to estimate the interaction pattern.) If you do not have a time series, you are stuck, and the key to the solution is that we have a single parameter.

It is very important to keep in mind, and this is a common trap because it is not immediately intuitive: the spatial autocorrelation coefficient that we will estimate, or rather compute, does not mean anything as such. It only means something in combination with the weight structure. To put it in more econometric terms, you have an identification problem: what you really measure is the product of the two, the product of the autocorrelation coefficient and the weights in the weights matrix; that product is your estimate of the interaction between i and j. If you have large weights in the weights matrix, you get a small autocorrelation coefficient for the same interaction; if you have small weights, you get a large coefficient. Think of the interaction as 100: if your weight is 5, your coefficient will be 20; if your weight were 20, your coefficient would be 5. It is not the 5 or the 20 that matters, it is the product that gives you the 100. So this is an additional wrinkle that gets thrown into the analysis.
The kinds of weights matrices I will talk about are kind of simplistic, and a typical reaction, especially from somebody trained in a substantive social science, is: why don't we use economic weights, or economic distance, or social distance, or network distance? That is all very fine, there is nothing wrong with that, but it changes the interpretation of your autocorrelation coefficient, because it is all about interaction, interaction between i and j, and the coefficient is just a scaling factor applied to the i-j interaction that you put into the weights matrix. That is something to keep in mind, and we will get back to it when we discuss the interpretation of the statistic.

Here is my classic example, which I have used for years: six areal units that are neighbors when they share a common border. How do we define a neighbor structure between them? We can lay out a graph where each polygon is a node, or vertex, and the existence of a common boundary induces a link, or edge, in the graph; all the solid lines correspond to a common boundary between these areal units. So we can go, if you wish, to a graph representation with nodes and edges, and then we can represent the graph by a matrix, and that matrix is the spatial weights matrix. It is n by n, where n is the number of observations. The weights, in general w_ij, the weight between i and j (it is all about pairs), are nonzero for neighbors (I will talk about the value, the magnitude, later), and zero when i and j are not neighbors. If it is zero, remember our statistic f(y_i, y_j) times w_ij: when w_ij is 0, the pair does not count, no matter what the attribute similarity is.

By convention there is no self-neighbor relationship, so w_ii, the weight from i to itself, is 0: I am not my own neighbor. This is somewhat arbitrary, but not totally, and it turns out to be extremely convenient in a lot of the statistics that follow. Let me do a little matrix algebra, I can't help it: the trace of a square matrix is the sum of its diagonal elements, and many manipulations become much easier when the trace is zero, which is the case when all the diagonal elements are 0, than when the trace is n, which it would be if you put a 1 on every diagonal element. n is the size of the data set, and in statistics you do not want anything that grows with the size of the data set; you want things that are constrained, limited, ideally zero. So that is a good reason for not having self-neighbors; every once in a while it makes sense to have them, but those situations tend to be exceptions.

So we have our weights matrix: square, diagonal elements zero, and the other elements indexed by what you can think of as origin and destination. In most situations this will be symmetric. If we define neighbors by having a common boundary, then if A has a border in common with B, B has a border in common with A; if A is within five miles of B, then B is within five miles of A. But nearest neighbors are not necessarily symmetric: B can be the closest neighbor to A while C, not A, is the closest neighbor to B. Some of these definitions are symmetric, some are not, and that will have repercussions for the analysis later. We will primarily focus on simple, geographically based weights.
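As a small illustration of going from the graph to the matrix, here is a sketch that turns an edge list into a binary spatial weights matrix. The edge list itself is hypothetical: only the first row discussed below (0 1 0 1 1 0) is taken from the lecture example, and the remaining edges are assumed for illustration.

```r
# Sketch: from a neighbor (edge) list to a binary spatial weights matrix.
# Edges for unit 1 match the first row (0 1 0 1 1 0) of the six-unit example
# discussed below; the other edges are assumed, for illustration only.

edges <- rbind(c(1, 2), c(1, 4), c(1, 5),   # unit 1 borders 2, 4 and 5
               c(2, 3), c(2, 5),            # assumed
               c(3, 6), c(4, 5), c(5, 6))   # assumed

n <- 6
W <- matrix(0, n, n)
W[edges] <- 1
W[edges[, c(2, 1)]] <- 1              # contiguity is symmetric: add reverse pairs

diag(W)                               # all zero: no self-neighbors, so trace is 0
isSymmetric(W)                        # TRUE for a contiguity-based definition
W[1, ]                                # first row: 0 1 0 1 1 0
```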
Interestingly, you can use the exact same concepts in social networks, where instead of borders between areal units you have network connections between nodes. Social networks are rather sparse: you cannot interact with everybody equally strongly, so you have a close set of friends, those friends have friends, some of those friends are in common and some are not, and so on. You have a network structure, and it is exactly the same idea.

Geographically, the simplest, even simplistic, definition of spatial weights is based on the notion of contiguity, which means having a border in common: if i and j share a border they get a one, if they do not share a border they get a zero, a very simple binary weights matrix. In our example, the first row is for observation one, and in turn it looks at the other observations and determines whether they are neighbors or not: two is a neighbor, three is not, four is a neighbor, five is a neighbor, six is not, so the first row is 0 1 0 1 1 0, zeros and ones. Because contiguity is a symmetric concept, it yields a symmetric matrix.

Now, it is not always that simple. Even in the case of a regular grid, what counts as a neighbor is subject to interpretation. Take this example of a 3 by 3 square grid, where we are interested in defining the neighbors of the central cell. One definition is to say that the neighbors are the grid cells that share a common side: two, four, six and eight are neighbors of five because they share a side. In technical terms this is called the rook criterion, in analogy with the moves on a chessboard: north, south, east, west. Those are the rook neighbors, and there are four of them. We can also define, and in some contexts this makes sense, the neighbors as the cells that share not a border but a corner. This is not used that much, but it is called bishop contiguity, again using the chess analogy. So again we have four neighbors, but they are not the same four, and right away our statistics are going to be messy, because which is it, rook or bishop? It gets even more involved if you combine the two criteria; that is called the queen criterion, again in analogy with chess. Then not only is it the combination of the two, but you have twice as many neighbors, which again affects your statistics: now you have eight neighbors, everybody is a neighbor of five.
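Here is a sketch of the rook, bishop and queen criteria for a regular grid, stated as simple rules on the row and column offsets of two cells; this is my own formulation of the chessboard analogy above, not code from the lecture.

```r
# Sketch: contiguity criteria on a regular grid, expressed in terms of the
# row/column positions of two cells (my own formulation of the chess analogy).
rook_neighbors   <- function(r1, c1, r2, c2) abs(r1 - r2) + abs(c1 - c2) == 1
bishop_neighbors <- function(r1, c1, r2, c2) abs(r1 - r2) == 1 & abs(c1 - c2) == 1
queen_neighbors  <- function(r1, c1, r2, c2)
  rook_neighbors(r1, c1, r2, c2) | bishop_neighbors(r1, c1, r2, c2)

# For the central cell (2, 2) of a 3 x 3 grid:
cells <- expand.grid(r = 1:3, c = 1:3)
sum(rook_neighbors(2, 2, cells$r, cells$c))    # 4 neighbors (side-sharing cells)
sum(bishop_neighbors(2, 2, cells$r, cells$c))  # 4 neighbors (corner-sharing cells)
sum(queen_neighbors(2, 2, cells$r, cells$c))   # 8 neighbors (everything but 5 itself)
```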
Luckily, this is mostly a problem with regular grids; with irregular spatial units it is much less of a problem, rarely a big issue. But I did find one example, and I did not pick it because of the fires that are going on: this is a county in the Bay Area in California, the little polygon that sticks out at the top of the map, with San Francisco down here. That is Solano County. On the top you have the neighbors defined purely by common sides, and on the bottom you have the queen criterion, which gives you one extra neighbor. What is going on here? For those of you familiar with GIS, this is an issue of geographic precision. There are a few true four-corner situations where literally no edges touch and it is just a corner, but that is not the case here. This has to do with the precision of the underlying map, with the simplification and generalization that happen when you turn reality into a map. Think of an example I use in my other class: when you look at a map and you see the border between two countries, it is maybe a millimeter thick, but given the scale of the map that millimeter could be a kilometer, and borders are very well defined, not give or take a kilometer. The same thing happens here: what might have been a small shared edge gets simplified into a single point, just as the border is simplified to a line on the map. So issues with the precision of the base map often create these artifacts, these artificial corner situations. That is why in practice I generally recommend the queen criterion, because it is a little more encompassing; typically the difference between the two is at most a couple of extra neighbors, and queen always gives at least as many neighbors as rook.

So that is the basics: very simple contiguity, a common border. Then we can go to distances. Typically distances are between points, so with point maps we cannot really use common borders (we will see in the lab that with some trickery we can), but we can compute the distance between points. Now we can define neighbors as being close enough to each other, typically expressed through a critical distance, a distance cutoff beyond which two units are not considered neighbors anymore. There is a problem with this, and we will see in the lab what the complications are: when the points are distributed irregularly, the densely packed points will have very small distances to their neighbors, whereas the sparsely distributed points will have much bigger distances, so one size does not fit all. You can have two kinds of problems: if the distance is too short, many of the sparsely distributed points will have no neighbors; if it is too long, the densely distributed points will have too many. The weights themselves are very simplistic: there is a cutoff, and if the distance between two units is less than the critical distance d they are neighbors; otherwise they are not. As I said, two problems, too many or too few neighbors, and observations with no neighbors are called isolates. What we typically do, and what GeoDa uses as a default (you can change it), is something we call a max-min criterion: for each location you take the nearest-neighbor distance, the shortest distance needed to reach a neighbor, and then you take the largest of these, the maximum of the minima. Using that largest value guarantees that every location has at least one neighbor, but when the points are very irregularly distributed it creates problems, as I will illustrate in a second. For example, here is Solano County again: with a 90-mile distance band it has quite a few neighbors. Where do the 90 miles come from? That is the max-min distance needed to ensure that every county has at least one neighbor, where the distance is computed between the central points, the centroids, of the counties. We saw centroids in the lab the other day; that is how we convert the areal units, the polygons, to points so we can compute distances.
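Here is a sketch of distance-band weights with the max-min cutoff just described. It is my own illustration (GeoDa computes this internally), and the centroid coordinates are hypothetical placeholders.

```r
# Sketch of distance-band weights with the max-min cutoff (illustration only).
# 'xy' holds hypothetical centroid coordinates, one row per areal unit.

set.seed(7)
n  <- 20
xy <- cbind(runif(n, 0, 100), runif(n, 0, 100))   # placeholder centroids

d <- as.matrix(dist(xy))              # pairwise centroid distances
diag(d) <- Inf                        # ignore self-distances

nn_dist <- apply(d, 1, min)           # nearest-neighbor distance for each unit
cutoff  <- max(nn_dist)               # max-min criterion: largest of the minima

W <- (d <= cutoff) * 1                # binary distance-band weights
rowSums(W)                            # every unit has at least one neighbor;
                                      # irregular points can still get very many
```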
Here is an example of the problems you can run into, and I actually drove through there this summer: Elko County, Nevada. There is not much there, I can tell you that. With the 90-mile cutoff it has one neighbor. That may seem counterintuitive, but remember that the distances are between the centroids, so when the neighboring counties are bigger, the centroids are further away. With an 80-mile cutoff it has no neighbors at all. That is one of the problems with distance bands.

A solution, though not necessarily an ideal one, is the notion of k-nearest neighbors, where you take the k closest units, pretty much no matter how close or how far they are. Again you can end up with strange situations. First of all, when there are ties you need a tie-breaking rule, and this is not obvious: say you want a five-nearest-neighbor definition, but neighbors four, five and six are all the same distance away; which one do you pick? It is the same problem we encountered with the box map the other day. In GeoDa it is a random pick; other software takes them all, but if you take them all it is not k anymore, since some observations will have five neighbors and others six, and so on. So it is not an easy situation, and sometimes these k-nearest-neighbor sets look a little weird; they may not be what you think they should be, and that is why you have the diagnostics that we will use. Here is our friend Elko County again, with the four nearest neighbors and then the six nearest neighbors, and you see that they do not really form a ring around the county: there are adjacent counties that are not included, in some sense arbitrarily, but that is the strict application of k-nearest neighbors.

Okay, so there are lots of different options, which we will talk about in the lab. A couple of important things. Most weights are not used in their original binary form in the analysis, but are standardized in some fashion. The most commonly used standardization is called row standardization: you make sure all the elements in a row sum to one, so basically you take the number of neighbors and divide each weight by that number. The problem associated with this is that the resulting matrix is no longer symmetric, which will mess up our lives in some respects, but it does make the analysis more comparable, it makes spatial autocorrelation coefficients more comparable than they would be otherwise, and it introduces the concept of a spatial lag, which we will talk about in the lab, and which is essentially an average of the values for the neighbors. In our example, row-standardized, a unit with three neighbors gets weights of 1/3, with two neighbors 1/2, with four neighbors 1/4, and as a result the symmetry is gone: we have a 1/4 in one place and a 1/3 in the mirror position.

Another option, so-called stochastic weights, is double standardization: you divide all the weights by the total sum of the weights. This keeps the symmetry, but it scales everything down quite a bit, and if you have a lot of observations that becomes a problem. Whereas row standardization is a function of the number of neighbors, which is typically bounded, the double or stochastic standardization is a function of the total number of non-zero weights, which is on the order of n times k, and we do not like anything with n in it: for large n, what you divide by becomes a very large number. If n is a hundred thousand and you have only four neighbors on average, you divide every one of those ones by about four hundred thousand, and there is not going to be much left, basically zero. That is something to keep in mind when you see these double standardizations: they can very easily wipe out the effect of the neighbors by dividing by too much.
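A minimal sketch of row standardization and the spatial lag it induces, reusing a small binary contiguity matrix as a placeholder (the lab will do this in GeoDa):

```r
# Sketch: row standardization and the resulting spatial lag (placeholder data).
W <- matrix(0, 4, 4)
W[cbind(c(1, 2, 2, 3, 3, 4), c(2, 1, 3, 2, 4, 3))] <- 1   # binary contiguity

Ws <- W / rowSums(W)                  # row-standardized: each row sums to 1
rowSums(Ws)                           # all 1
isSymmetric(Ws)                       # FALSE: row standardization breaks symmetry

y <- c(10, 20, 30, 40)
lag_y <- Ws %*% y                     # spatial lag: average of each unit's neighbors
lag_y                                 # e.g. unit 2's lag is (10 + 30) / 2 = 20

# Double (stochastic) standardization keeps symmetry but scales by sum(W):
Wd <- W / sum(W)
```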
Okay, another big concept is second-order contiguity, the neighbor of a neighbor: four and three both have five as a neighbor, so they share a neighbor. The problem with this is double counting, and the easiest way to think about it is in terms of the graph. From four to three takes two steps, but in general, if we look at unit one, we can also get from one back to one in two steps, by going to one of its first-order neighbors and back. So all the first-order neighbors are, in this mechanical sense, also second-order neighbors, and then there is a true second-order neighbor, three, which you have to go through five to reach. That is a problem, and it comes up in modeling more than in exploration. Higher-order weights are defined recursively, meaning they are defined in terms of their own definition: in general, a neighbor of order k is a first-order neighbor of a neighbor of order k minus one, so a neighbor of order two is a first-order neighbor of a neighbor of order one, and so on. Then we have to clean out the duplications, and in practice that means there are two ways of measuring this, which we will see in the lab: exclusive of the first order, where the first band is not there, or inclusive of the first order, which is more filled in.

Finally, quickly, the properties of the weights, which we will also see in the lab: the distribution of what we call the neighbor cardinality, how many neighbors an observation has. When you look at the full distribution, you are looking for two major types of problems. One is identifying isolates, or islands, observations that do not have any neighbors, like Elko with the 80-mile cutoff. The other is looking for bimodal distributions, where some observations have very few neighbors and others have very many; that may suggest spatial heterogeneity, where we are mixing apples and oranges, locations with very few neighbors and locations with a lot of neighbors, so the interaction cannot really be the same for all of them. GeoDa produces what I call a connectivity histogram, which gives the distribution of the number of neighbors over the data set. For the US counties these are very nice: the counties have a median number of neighbors of six, so even though there are three-thousand-some counties, on average they only have about six neighbors. That is nice and bounded, so we are never going to be dividing by too big a number. With distance-based weights it is a totally different picture: this is the distribution using the max-min distance of 90 miles, and you go from counties with one neighbor to counties with 85. If we use a fixed distance that is too small, some counties have no neighbors at all, as you see in this one, and then we have isolates. And for k-nearest-neighbor weights the connectivity histogram looks like this, a single bar; why would that be? Because every observation has exactly k neighbors. Okay, that is the end; we will play with this stuff in the lab on Monday.
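As a supplement to the connectivity histogram and higher-order contiguity just described, here is a sketch of how both can be computed from a binary weights matrix; the matrix W is a placeholder, and the second-order step follows the recursive definition given in the lecture.

```r
# Sketch: neighbor cardinalities (connectivity histogram) and second-order
# neighbors from a binary weights matrix W (placeholder 4-unit chain).
W <- matrix(0, 4, 4)
W[cbind(c(1, 2, 2, 3, 3, 4), c(2, 1, 3, 2, 4, 3))] <- 1

cardinality <- rowSums(W)             # number of neighbors per observation
table(cardinality)                    # the connectivity histogram; look for
                                      # isolates (0 neighbors) and bimodal shapes

# Second order: a first-order neighbor of a first-order neighbor (W %*% W),
# cleaning out the diagonal (two-step paths back to oneself) and, for the
# "exclusive" version, the pairs that are already first-order neighbors.
W2 <- (W %*% W > 0) * 1
diag(W2) <- 0
W2_exclusive <- W2 * (W == 0)
W2_exclusive                          # e.g. units 1 and 3 are second-order neighbors
```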
Info
Channel: GeoDa Software
Views: 10,076
Keywords: geoda
Id: M9sMu73ogIE
Length: 77min 26sec (4646 seconds)
Published: Wed Nov 01 2017