Identifying Clusters 3 - Cluster Analysis with Polygon Features in ArcGIS 10.2

Captions
This is the second tutorial on doing cluster analysis within ArcGIS 10.2. In a previous tutorial, we looked at using the Cluster and Outlier Analysis (Anselin Local Moran's I) local statistic to identify clusters within emergency calls in the Fort Worth area. In this example, we're going to look at attribute values attached to polygons to do the same kind of analysis, working with slightly different data. For this tutorial, we're going to be working with map 92 from the data that should be available on the Canvas website, so I'm going to open up map document 92.

Here we're working with census data from Tarrant County, and we're looking at an economic analysis scenario in which we're trying to identify two different types of communities for different reasons. In this case, the Tarrant County Economic Development Office is interested in doing outreach and fundraising work, so they want to target communities with higher median incomes and focus their fundraising there. At the same time, they're looking for communities that have low incomes so that they can target their development efforts there and place more services in those particular communities.

As I said before, the data we're working with is census data, and if you take a look at the attribute table you can see the kind of data that's available. The attribute value we're particularly concerned with in this case is P053001, which holds the median income value for each census tract. We're working with roughly 1,000 census tracts here, so it's quite a bit of data. I'm going to switch over to data view so we can see the data a little more closely.

You can visualize the data pretty easily by simply classifying it according to the attribute value you're interested in. In this case we use the median income value and then pick a symbolization scheme, and we can see the variation in income across
the area, and where incomes are higher. You can already see that there are some areas where groups of census tracts have higher or lower incomes, so there's an apparent pattern. But as we've learned previously, that pattern can be very subjective, because it depends heavily on the way you classify the data. For example, if I go to the layer properties, right now the classification defaults to natural breaks. If I change the classification scheme to something like equal interval, which is probably more intuitively understandable, you'll see a slightly different pattern, and the extreme apparent differences fade quite a bit. It's the same data; the difference is just a function of the way the data are classified.

So how do you identify significant clusters? That's exactly why you turn to a cluster analysis statistic: to find and tease out those areas. Here we're going to use two: the Cluster and Outlier Analysis (Anselin Local Moran's I) tool, and the Hot Spot Analysis (Getis-Ord Gi*) statistic. We open up the Spatial Statistics Tools toolbox, and within the Mapping Clusters toolset we'll find the two tools we're interested in.

We're going to start with Cluster and Outlier Analysis (Anselin Local Moran's I), which we used previously to work with the feature attribute points for the emergency calls. I'm going to open that up, and in the dialog window that pops up we have a few choices to make. The input feature class is the one that contains the income data, and the input field is the attribute field that has the median incomes. The tool produces an output, another feature class that contains the visualization of the clusters and outliers. We're going to call this one income local Moran. It takes a
second to accept that. Okay, there we go.

Next we're asked for the Conceptualization of Spatial Relationships, and this one is a little different. You always have an array of choices for how you specify the spatial relationship. Essentially, this tells the computer how to define the neighborhood around any individual feature: how far it should look out, and what kinds of features qualify as neighbors of any given feature. You can define the neighborhoods within which the analysis identifies hot spots in a couple of basic ways. One is in terms of distance, and you can specify how distance affects the relationship. For example, with inverse distance, things that are closer have a bigger influence than things that are farther apart. That makes a lot of sense for features like points, but with polygons it's a little more complicated, because you have a variety of irregularly sized features. Because their sizes vary, distance isn't as straightforward as it is with points, since you're measuring across polygons.

So one of the typical ways to deal with polygons is to work with contiguity rather than distance. Contiguity essentially says that polygons that touch one another qualify as neighbors, so when the analysis is done, any polygons that touch are counted as neighbors. You have two choices there: edges only, or edges and corners. We'll be generous and choose edges and corners in this case. When you choose that option, you'll see that the Distance Method is grayed out, because it's not relevant here.

Here we're going to be interested in the option for row standardization. Row standardization accounts for the fact that some polygons are going to have more neighbors than others, and
we don't want that to unduly influence the statistic. The number of neighbors is likely to be a function of the size of the polygon, and polygons on the edge of the analysis area will have fewer neighbors, so you don't want that to have too big an impact on the outcome.

Lastly, you have the option to apply a False Discovery Rate (FDR) correction. As I described in the previous tutorial, this corrects for redundancy in the computation of the statistic: many features are being tested at once, and neighbors are counted repeatedly, because the neighbor of one feature is also the neighbor of another.

So this is what we want: we've specified the attribute value, we're going to have an output, we're using contiguity with edges and corners, and we're using row standardization. When I hit OK, it computes the statistic and produces a new feature class. It looks like the original feature class in terms of the polygons, but it also contains the p-values, the z-scores, and the Local Moran's I index values, which you can interpret directly, or you can work off of the automatic symbolization scheme that the tool applies. It might take a second, because it has to compute the statistic for every polygon.

Okay, so it ran. We got a nice clean result, a green checkmark, which means there were no concerns with this data set. You can always click on it to see the results of that process in terms of what it produced, but in this case the results are displayed for us. What we have in the output are the areas that showed significant clustering, as well as outliers. Just like with the point data, gray areas represent areas that weren't significant; they weren't showing any
statistically significant tendency. Or rather, they weren't statistically significantly different from the random arrangement of values you would expect if you were to reassign the attribute values across this area randomly. The black areas show where high values are clustered next to other high values, so if we were targeting our fundraising, these are the high-income clusters of neighborhoods we would want to target. The blue areas represent census tracts where low values are clustered next to low values, which implies these are communities that probably need more resources devoted to them, so that's where you might target your development efforts. As for orange and white areas, there aren't any orange areas here, but we do have a white area right there and one over here as well. The white areas indicate a census tract with a low value surrounded by other census tracts with high values, so again you see a juxtaposition of differing values. That's how you interpret the Local Moran's I statistic for cluster and outlier analysis with polygon data, using contiguity.

We can do the same analysis using the Hot Spot Analysis (Getis-Ord Gi*) statistic, so let's start that up and run it as well. It asks for a few inputs. Again, we put in the census tract data that contains the income, specify the median income field once again, and produce an output. Instead of local Moran, this time we'll call it income Getis-Ord. Then for the Conceptualization of Spatial Relationships, because we're working with polygons it's probably appropriate to use contiguity rather than distance, so we'll use contiguity with edges and corners again, to keep it consistent with the previous analysis. When you do that, you'll again see that the distance
method is grayed out, but you'll also see that Standardization is grayed out. That's because this particular statistic already accounts for each feature's number of neighbors in its computation, so standardizing the weights wouldn't change the result and we don't need to worry about it. There's one other option, Self Potential Field, which we're not going to worry about for the moment. Again, you have the option of applying the False Discovery Rate correction to account for redundancy in the analysis; we'll leave that off for now. We'll hit OK and let that run.

Okay, again a green light on the outcome of the hot spot analysis. In this case the blue areas are cold spots, where low values are clustered around low values, and the red areas are hot spots, where high values are clustered around high values. The beige areas (they may look pink, depending on your screen) are not significant: they didn't qualify as significantly different from random. The nice thing about this particular statistic is that you get different levels of confidence, so depending on the analysis you're doing and the level of confidence you want in rejecting the null hypothesis of randomness, you can look at the data that way. The darker blue areas have the highest confidence and deviate the most from the random assumption, and the same goes for the darker red areas. What you don't get with the Getis-Ord Gi* statistic is outliers; they aren't pulled out in this analysis. If you need them, you would turn to the Anselin Local Moran's I statistic. As you can see, the two statistics generate very similar results. For the most
part, the clusters, or hot spots, occur in essentially the same places, so there isn't much deviation there; you're just given a slightly different presentation of the information. Both get you to the same locations, so you can proceed with your analysis: describing what you're seeing here and doing further analysis to find out why this pattern has arisen.
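The point that the apparent pattern depends on the classification scheme can be made concrete by computing class breaks two ways. This is a minimal sketch on made-up values (not the Tarrant County data): equal interval splits the value range into equal-width classes, while "natural breaks" is implemented here as the Fisher-Jenks optimization, which finds the classes that minimize the total within-class squared deviation (the objective Jenks' method targets). ArcGIS's own Jenks implementation may differ in detail, so treat this as an assumption-based illustration.

```python
"""Equal-interval vs natural-breaks class boundaries (illustrative sketch)."""


def equal_interval_breaks(values, k):
    """Upper bound of each of k equal-width classes."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * (i + 1) for i in range(k)]


def natural_breaks(values, k):
    """Upper bound of each of k classes minimizing the total within-class
    sum of squared deviations (exact Fisher-Jenks dynamic programming)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    xs = sorted(values)
    n = len(xs)
    # Prefix sums give the squared deviation of any slice xs[i:j] in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for idx, x in enumerate(xs):
        s1[idx + 1] = s1[idx] + x
        s2[idx + 1] = s2[idx] + x * x

    def ssd(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: lowest cost of splitting xs[:j] into c non-empty classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    split = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    split[c][j] = i
    # Walk back through the split table to recover each class's upper bound.
    bounds, j = [], n
    for c in range(k, 0, -1):
        bounds.append(xs[j - 1])
        j = split[c][j]
    return bounds[::-1]


if __name__ == "__main__":
    incomes = [1, 2, 3, 10, 11, 12, 20, 21, 22]  # hypothetical values
    print("equal interval:", equal_interval_breaks(incomes, 3))
    print("natural breaks:", natural_breaks(incomes, 3))
```

On these values, equal interval puts the class boundaries at 8, 15, and 22, while natural breaks places them at 3, 12, and 22, following the gaps in the data. Same values, different map.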
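The "edges only" and "edges and corners" contiguity choices, and what row standardization does to them, can be sketched on a toy lattice. Assumption: each polygon is a cell in a rows x cols grid, a hypothetical stand-in for census tracts; the function names here are illustrative, not ArcGIS API. Edges only corresponds to what is often called rook contiguity, edges and corners to queen contiguity.

```python
"""Contiguity neighbors and row standardization on a toy grid (sketch)."""


def contiguity_neighbors(rows, cols, rule="edges_corners"):
    """Return {cell_id: [neighbor ids]} with cell_id = r * cols + c."""
    if rule == "edges_only":        # rook: cells share an edge
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif rule == "edges_corners":   # queen: cells share an edge or a corner
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]
    else:
        raise ValueError(f"unknown rule: {rule}")
    neighbors = {}
    for r in range(rows):
        for c in range(cols):
            # Cells on the edge of the study area simply get fewer neighbors.
            neighbors[r * cols + c] = [
                (r + dr) * cols + (c + dc)
                for dr, dc in offsets
                if 0 <= r + dr < rows and 0 <= c + dc < cols
            ]
    return neighbors


def row_standardize(neighbors):
    """Binary weights rescaled so each feature's weights sum to 1.
    Features with no neighbors get an empty weight set."""
    return {i: ({j: 1.0 / len(nbrs) for j in nbrs} if nbrs else {})
            for i, nbrs in neighbors.items()}


if __name__ == "__main__":
    queen = contiguity_neighbors(3, 3, "edges_corners")
    rook = contiguity_neighbors(3, 3, "edges_only")
    print("center cell, queen:", len(queen[4]), "rook:", len(rook[4]))
    print("corner cell, queen:", len(queen[0]), "rook:", len(rook[0]))
    print("row-standardized corner weights:", row_standardize(queen)[0])
```

On a 3 x 3 grid the center cell has 8 queen neighbors but a corner cell has only 3; after row standardization each of the center's neighbors carries weight 1/8 and each of the corner's carries 1/3, so both features' neighborhoods contribute equally in total.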
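The Local Moran's I classification into high-high and low-low clusters and high-low and low-high outliers can be sketched as follows. This uses the common textbook form of Anselin's local statistic, $I_i = \frac{z_i}{m_2}\sum_j w_{ij} z_j$ with $z_i = x_i - \bar{x}$ and $m_2 = \frac{1}{n}\sum_j z_j^2$, on row-standardized edges-and-corners weights over a hypothetical grid. The pseudo p-value comes from a conditional permutation test; whether the ArcGIS 10.2 tool derives its p-values this way, and its exact variance term, are assumptions, so the numbers won't necessarily match the tool's output.

```python
"""Local Moran's I with permutation p-values on a toy grid (sketch)."""
import random


def queen_neighbors(rows, cols):
    """Edges-and-corners contiguity on a rows x cols grid of cells."""
    return {
        r * cols + c: [
            (r + dr) * cols + (c + dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0) and 0 <= r + dr < rows and 0 <= c + dc < cols
        ]
        for r in range(rows) for c in range(cols)
    }


def local_morans_i(values, neighbors, permutations=999, alpha=0.05, seed=0):
    """Return (I_i, pseudo p-value, label) per feature.

    Row-standardized weights make the spatial lag the mean deviation of a
    feature's neighbors. Labels: HH/LL clusters, HL/LH outliers, NS not
    significant at alpha.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n
    if m2 == 0:  # all values identical: nothing to detect
        return [(0.0, 1.0, "NS")] * n
    rng = random.Random(seed)
    results = []
    for i in range(n):
        nbrs = neighbors[i]
        if not nbrs or z[i] == 0:
            results.append((0.0, 1.0, "NS"))
            continue
        k = len(nbrs)
        lag = sum(z[j] for j in nbrs) / k
        local_i = z[i] * lag / m2
        # Conditional permutation: hold feature i fixed and draw k random
        # "neighbors" from all other features, many times.
        others = z[:i] + z[i + 1:]
        larger = 0
        for _ in range(permutations):
            sim = z[i] * (sum(rng.sample(others, k)) / k) / m2
            if sim >= local_i:
                larger += 1
        # Fold so we count the tail on the side of the observed value.
        larger = min(larger, permutations - larger)
        p = (larger + 1) / (permutations + 1)
        if p >= alpha:
            label = "NS"
        elif z[i] > 0:
            label = "HH" if lag > 0 else "HL"
        else:
            label = "LL" if lag < 0 else "LH"
        results.append((local_i, p, label))
    return results


if __name__ == "__main__":
    # Hypothetical 6 x 6 area: high incomes in the west, low in the east.
    vals = [100.0 if c < 3 else 10.0 for r in range(6) for c in range(6)]
    for idx, (i_val, p, label) in enumerate(local_morans_i(vals, queen_neighbors(6, 6))):
        print(idx, round(i_val, 3), round(p, 3), label)
```

In the video's symbology, the HH features correspond to the black areas, LL to blue, and LH (a low value surrounded by high values) to white.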
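The False Discovery Rate option addresses the fact that the tool runs one significance test per feature, so some features will look significant by chance. The video doesn't describe the tool's exact FDR procedure; as an assumed stand-in, this sketch uses the standard Benjamini-Hochberg step-up procedure on a list of per-feature p-values.

```python
"""Benjamini-Hochberg false discovery rate control (sketch)."""


def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where a feature remains significant
    after controlling the false discovery rate at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0  # largest rank whose p-value falls under its threshold
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            cutoff = rank
    keep = set(order[:cutoff])
    return [i in keep for i in range(m)]


if __name__ == "__main__":
    p = [0.001, 0.03, 0.04, 0.2]  # hypothetical per-feature p-values
    print("plain p < 0.05:", [x < 0.05 for x in p])
    print("with FDR:      ", benjamini_hochberg(p))
```

With a plain 0.05 cutoff, three of these four hypothetical features would be reported as significant; with the FDR correction only the first survives, which is why turning the option on usually shrinks the highlighted clusters.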
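The Getis-Ord Gi* statistic and its confidence levels can be sketched directly from the published Gi* z-score formula, $G_i^* = \dfrac{\sum_j w_{ij}x_j - \bar{X}\sum_j w_{ij}}{S\sqrt{\left[n\sum_j w_{ij}^2 - \left(\sum_j w_{ij}\right)^2\right]/(n-1)}}$ with $S = \sqrt{\sum_j x_j^2/n - \bar{X}^2}$, where the sums include feature $i$ itself. The grid data and the 90/95/99% binning helper are hypothetical illustrations. The sketch also checks why Standardization is grayed out: multiplying all of one feature's weights by the same constant scales the numerator and the denominator equally, so row standardization leaves every Gi* z-score unchanged.

```python
"""Getis-Ord Gi* z-scores on a toy grid (sketch)."""
import math


def queen_neighbors(rows, cols):
    """Edges-and-corners contiguity on a rows x cols grid of cells."""
    return {
        r * cols + c: [
            (r + dr) * cols + (c + dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0) and 0 <= r + dr < rows and 0 <= c + dc < cols
        ]
        for r in range(rows) for c in range(cols)
    }


def gi_star(values, weights):
    """weights: {i: {j: w_ij}}, where each feature includes itself (w_ii > 0)."""
    n = len(values)
    mean = sum(values) / n
    # Guard against tiny negative variance from floating-point rounding.
    s = math.sqrt(max(0.0, sum(v * v for v in values) / n - mean * mean))
    scores = []
    for i in range(n):
        w = weights[i]
        sum_w = sum(w.values())
        sum_w2 = sum(x * x for x in w.values())
        num = sum(wij * values[j] for j, wij in w.items()) - mean * sum_w
        den = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(num / den if den > 0 else 0.0)
    return scores


def confidence_bin(z):
    """+/-3, 2, 1 for 99%, 95%, 90% confidence (two-tailed); 0 otherwise."""
    for level, critical in ((3, 2.576), (2, 1.960), (1, 1.645)):
        if abs(z) >= critical:
            return level if z > 0 else -level
    return 0


if __name__ == "__main__":
    vals = [100.0 if c < 3 else 10.0 for r in range(6) for c in range(6)]
    nb = queen_neighbors(6, 6)
    binary = {i: {j: 1.0 for j in nbrs + [i]} for i, nbrs in nb.items()}
    rowstd = {i: {j: w / len(ws) for j, w in ws.items()} for i, ws in binary.items()}
    for z_b, z_r in zip(gi_star(vals, binary), gi_star(vals, rowstd)):
        print(round(z_b, 4), round(z_r, 4), confidence_bin(z_b))
```

Positive bins correspond to the red hot spots and negative bins to the blue cold spots, with the darkest shades at the 99% level.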
Info
Channel: Marcos Luna
Views: 34,751
Rating: 4.9683795 out of 5
Keywords: ArcGIS, cluster analysis, Salem State University, hot spots
Id: _oyUgfV19sU
Length: 13min 25sec (805 seconds)
Published: Wed Feb 19 2014