Esri 2012 UC Tech Session: Spatial Statistics: Best Practices

Captions
Okay, let's get started. My name is Lauren Scott, this is Lauren Bennett, and we're happy to be here to talk about something we really enjoy talking about: spatial statistics and analysis. What we're going to do today is a little bit different; we have fun with this last workshop because it's an opportunity to go into more detail than we do in our other workshops. This workshop builds on the spatial pattern analysis workshop and the regression analysis workshop. If you missed either or both of those, it's not a problem, please don't leave, but just realize that some things may feel like we're covering them a little quickly. We'll review them without going into a lot of depth, because we're assuming many of you already took those other sessions. We do something a little funny here: we pretend that Lauren is a brand-new GIS analyst who has attended those two workshops, so she's learned some of the same things you have, and she's been tasked with solving a real-world problem. The context for the analysis is a community that's spending a large portion of its public resources responding to 911 emergency calls. In addition, projections tell them that the community is going to almost double in size over the next ten years, so they have some concerns and some questions: Can we be any more efficient in the layout of the police and fire stations that respond to 911 calls? How effective are those locations? We know that some areas of the community get lots of calls and others not so many, so what factors contribute to high 911 call volumes, and is there anything we can do to try to reduce the number of calls we get? And given that population growth is coming, what call volumes can we expect in the future? That is Lauren's task today, and the data she's working with is real data from the Portland, Oregon area. Let's see how this goes.

So Lauren, one of the things this community is interested in is evaluating the existing locations of their fire and police stations. One strategy you might try is to run a hot spot analysis on their 911 call data to see where they're getting lots of calls, and then compare that to the fire and police units that would respond. If you remember from the workshop yesterday, hot spot analysis works by looking at each feature within the context of its neighboring features, searching for statistically significant spatial clusters of high values and statistically significant clusters of low values, the hot spots and the cold spots. It computes a z-score and a p-value for every one of those features to tell you whether the clustering it finds is statistically significant or not. This can be a fun analysis, but there are a couple of things you'll need to watch out for. Hot spot analysis needs an analysis variable, a field to analyze, so the first thing you have to figure out is what your analysis field is going to be. Because we have 911 point data, each incident is just one feature and there isn't really an attribute you can use for this, so you're going to have to come up with a count or a rate for those 911 incident calls.
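As a point of reference, the z-score that Hot Spot Analysis reports for each feature is the Getis-Ord Gi* statistic. A standard formulation, consistent with the tool documentation (here $x_j$ is the analysis field value for feature $j$, $w_{i,j}$ is the spatial weight between features $i$ and $j$, and $n$ is the number of features), is:

$$
G_i^* = \frac{\sum_{j=1}^{n} w_{i,j}\,x_j - \bar{X}\sum_{j=1}^{n} w_{i,j}}
{S\,\sqrt{\dfrac{n\sum_{j=1}^{n} w_{i,j}^2 - \left(\sum_{j=1}^{n} w_{i,j}\right)^2}{n-1}}},
\qquad
\bar{X} = \frac{1}{n}\sum_{j=1}^{n} x_j,
\qquad
S = \sqrt{\frac{1}{n}\sum_{j=1}^{n} x_j^2 - \bar{X}^2}
$$

The Gi* statistic is itself a z-score, so the p-value reported for each feature comes straight from the standard normal distribution.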
A couple of tools that can help you there are the Integrate tool and the Collect Events tool. The Hot Spot Analysis tool is also going to ask you to provide a distance that represents your scale of analysis, and a tool that will help you with that is Incremental Spatial Autocorrelation. So why don't you give it a go?

Okay, so here's my data: 911 calls in an area outside of Portland, Oregon. Like Lauren said, I need to run a hot spot analysis because I want to understand the patterns in the data. I want to find those statistically significant clusters so we can see whether these response stations are in good locations, whether the locations make sense based on the patterns we're seeing in the data. Rather than looking at just points on a map, we want to take it a step further and do a hot spot analysis. We'll start by running Hot Spot Analysis, which is in the Mapping Clusters toolset in the Spatial Statistics toolbox. We open it up, point to our data, and notice that the second parameter we have to fill in is the Input Field. But I've got point incident data, and that incident data doesn't have a value associated with it, so what am I going to do? Well, I'm going to start by doing something kind of crazy: I'm going to look at the help. Unbelievably, the help is sometimes actually quite helpful, including one of the usage tips here, where I see in bold a note about the Input Field telling me that it should contain a variety of values, and that if I want to use the tool to analyze the spatial pattern of incident data, I should consider aggregating that data. I click on the link and it brings me to a whole section about ways to aggregate incident data. One really good option is to use a set of existing geographies or polygons; I can use the Spatial Join tool to get a count within each of those polygons. Another good option, especially if there isn't a meaningful set of polygons in your study area, is to create a fishnet and do the spatial join to the fishnet. The third option, the one Lauren mentioned in her slides before we got started, is the Integrate and Collect Events method. Essentially, that starts by snapping together features that are within a specified distance of each other, and then creates a new feature class with the count of how many features are at each unique location. That's the option we're going to choose, because I trust Lauren pretty implicitly, and there are even some pictures of what those methods look like. Going back here, we're not ready to run our hot spot analysis yet. I'm going to close that for now and start by running the Integrate tool, which I will search for because I don't remember what toolbox it's in, and searching is awesome. Oh, I'm searching ArcGIS Online, of course. So we've got the Integrate tool, the first tool I'm going to run. If you write down one single thing the entire time you're in here today, it is this: you might notice that Integrate does not have an output feature class. That's because there is no output feature class. Integrate actually changes your input data. I repeat: Integrate actually changes your input data. So please, if you care at all about your original data, start by making a copy of it. I now absolve myself of all responsibility.
Okay, I don't want to get any nasty emails that you didn't remember to do this; you can send me a commiserating email and I will definitely feel bad for you, because I have made that mistake myself, but making a copy is very important. So I'm going to use the copy of the 911 call data that I've already made, and then we have to pick an XY tolerance. The way to think about this: I had a great conversation a couple of years ago at the User Conference with a crime analyst. He was saying that the data is being collected out in the field by officers, and let's say we have seven crimes that all happened in the 7-Eleven parking lot, but because of the way they were collected, they're all in slightly different locations; they don't show up as one single unique location. This is where we take the accuracy of our data into consideration and say, okay, if they're within, say, thirty feet of each other, we want them to snap so they have the exact same location. If you didn't have that problem and your points really were already truly coincident, you could go right to Collect Events, because Collect Events finds points that are truly coincident and adds them up. If they're not truly coincident, Integrate is a good way to get them to that same location. So we'll run that, and then we'll run Collect Events, which is also in the Spatial Statistics toolbox, but I really love searching. We point to our copy, since that is the data that has now been changed, and Collect Events is essentially just going to count up the number of coincident points and return those points with the count drawn as graduated symbols. Immediately we can see that we now have anywhere from 1 to 25 points at each of those locations, and it has added that count to our attribute table, so we have all that information and now we're ready to run our hot spot analysis. We go back in, point to our Collect Events data, and use the count field. Then we have some more decisions to make. Fixed distance band is a good option in this case, because we want to look at each feature within its neighboring features using a moving window, so the scale of analysis stays the same and each feature is evaluated in relation to a fixed neighborhood. What we still have to decide is what that fixed distance is, what distance band we're going to use. This is where we decide the scale of our analysis, and Lauren warned us that this was going to be a bit of a challenge, or at least something we would need to think about. I don't really know what a good distance band is for this data; there's no obvious value directly related to the way I want to solve the problem, so I want to let the data show me a good scale for my analysis. To do that, I'm going to use a new tool in 10.1 called Incremental Spatial Autocorrelation. For those of you who weren't here yesterday, a little note: Incremental Spatial Autocorrelation is a new tool in 10.1, but we originally released it as a sample script for ArcGIS 10, so if you have 10 you can get it as a sample script on our resources page, which we will point you to at the end.
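As a rough sketch of the aggregation steps just described, here is what they might look like in Python with arcpy. The workspace, dataset names, and the 30-foot tolerance are placeholders for this example, not the actual data from the session:

import arcpy

arcpy.env.overwriteOutput = True
arcpy.env.workspace = r"C:\Data\Portland.gdb"      # hypothetical workspace

# Integrate modifies its input, so work on a copy of the original 911 calls.
arcpy.CopyFeatures_management("Calls911", "Calls911_Copy")

# Snap points that fall within 30 feet of one another to the same location.
arcpy.Integrate_management("Calls911_Copy", "30 Feet")

# Count the coincident points; the output includes an ICOUNT field with the count.
arcpy.CollectEvents_stats("Calls911_Copy", "Calls911_Counts")

In the session all of this is done through the tool dialogs; the script form is shown only to make the sequence of steps explicit.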
Alternatively, even if you're at 9.3 you could do this: just run Spatial Autocorrelation multiple times at increasing distances, which is all this tool is doing, or you could practice a little Python and write one yourself; it's not so bad. So we point to our Collect Events data and fill it in just like the hot spot analysis, using our count field. In 10.1 the defaults for the beginning distance and the distance increment are really good (we chose well), and we'll create the report file. What the tool is doing is testing the intensity of clustering at increasing distances: at the first distance, which we can see was about 3,500 feet, how intense is the clustering? Then at almost 4,000 feet, how intense is the clustering? And so on. What we're looking for are peaks, distances at which the clustering is especially intense. Rather than looking at a bunch of numbers, we can look at the output PDF. (In the 10.0 version it actually creates a table and then uses the out-of-the-box graphing to create the graph, but both versions produce a graph.) We can see that there are actually multiple peaks here, which isn't unexpected, and if we look at the tool help, even just the sidebar image gives us a pretty good understanding of what we're looking for: these different peaks represent different scales of analysis and really answer different questions. Each of them will give us a different answer depending on the kinds of clusters, the kinds of patterns, we're looking for. In this case we're really interested in the most local, most neighborhood-level clusters we can find, and we find that the first peak distance is often a good one, so we'll use that first distance, which is about 4,600 feet. We run the hot spot analysis with that distance; it looks at each feature within 4,600 feet and returns the points at each of those unique locations and whether or not they're part of a statistically significant hot spot or cold spot. For me as a GIS analyst, I'm okay with this output, because I understand that each of those points was analyzed and has an associated z-score and p-value that tell me whether I have statistically significant hot spots or cold spots.
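Sketched the same way, the distance selection and the hot spot run might look roughly like this; the dataset names, report path, and the 4,600-foot distance are placeholders, and the Incremental Spatial Autocorrelation keyword argument is written from memory, so check the parameter names against the tool help for your version:

import arcpy

arcpy.env.workspace = r"C:\Data\Portland.gdb"      # hypothetical workspace

# Test clustering intensity at ten increasing distances and write the peak report.
arcpy.IncrementalSpatialAutocorrelation_stats(
    "Calls911_Counts", "ICOUNT", 10,
    Output_Report_File=r"C:\Data\isa_report.pdf")

# Run Hot Spot Analysis (Getis-Ord Gi*) using the first peak distance,
# about 4,600 feet in this example (the data's units are feet).
arcpy.HotSpots_stats(
    "Calls911_Counts", "ICOUNT", "Calls911_HotSpots",
    "FIXED_DISTANCE_BAND", "EUCLIDEAN_DISTANCE", "NONE", 4600)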
But I know, and most of you are probably thinking the same thing, that a bunch of color-coded points is not what the decision-makers in my organization are expecting to see when I tell them I'm going to show them a hot spot map. They want to see a surface; that's what they see in the news and in the newspapers, so that's what they're expecting. Our analysis is only as useful as it is used: this analysis is great, but if nobody can understand and interpret the results, it's not very useful. So we want to help the decision-makers interpret the results, and since they're expecting a surface, we're going to give them one. The way we've found to create a nice surface is IDW, which is in the Interpolation toolset. It's a Spatial Analyst tool; the Spatial Statistics tools require no extension (they're core to the software), but to create the surface you do need Spatial Analyst. We'll use IDW; we've found that IDW looks the nicest, and since the goal here is pure visualization, that's what we're going to use. We point to the output of our hot spot analysis, the value we use for the interpolation is the z-score, and we accept the defaults because they seem to look okay, and that's our goal: a pretty picture, pretty much. Then we get the results, and of course it doesn't look that nice at first, because the defaults don't know we're interpolating a hot spot analysis, so we go in and update the symbology. A red-to-blue color ramp looks pretty good, maybe with a little bit of transparency, and now we have a surface that represents the results of our hot spot analysis. One thing we really want to stress: as much as you can, keep the points on, because they are the true results of the hot spot analysis. We have the true results, the statistical significance associated with those points, and then we have a visualization to help us interpret them. Keeping them both on kind of covers your bum: you have the valid statistical output, you have the surface the decision-makers are expecting, and everybody's happy. Then we can do what we really wanted to do, which is look at how this relates to our response stations and where they're located. We can immediately see that this one is in a great location, in the middle of the hot spot, and this one seems to be well located near this smaller hot spot. We certainly can't say that this one is in a horrible location; it may very well need to be there so that citizens are served within a certain amount of time. But if we were to recommend one to look into, to ask whether there's a better place we could locate it where we could still serve the population well but be closer to some of the hot spot areas, this might be the one we would recommend. So we start to see the pattern, we understand it, and we have an output that decision-makers can use to really understand the results of our analysis. So, I did a pretty good job? Yes, you did, nicely done, Lauren.
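The visualization step just described, interpolating the Gi* z-scores into a display surface, might be scripted along these lines; the output names and cell size are placeholders, and Spatial Analyst must be licensed:

import arcpy
from arcpy.sa import Idw

arcpy.CheckOutExtension("Spatial")                 # IDW is a Spatial Analyst tool
arcpy.env.workspace = r"C:\Data\Portland.gdb"      # hypothetical workspace

# Interpolate the hot spot z-scores (GiZScore) into a continuous surface.
# This surface is for display only; the point output remains the true result.
zscore_surface = Idw("Calls911_HotSpots", "GiZScore", 250)
zscore_surface.save("HotSpot_Surface")

The red-to-blue symbology and transparency are then applied in the map document, not in the script.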
Before we go back to the slides, though, let's think about the next analysis we're going to do. Whenever I look at a hot spot map, and I know it's the same for you, Lauren, I immediately start thinking: what is going on? Why do we see so many 911 calls in that area, and so many fewer in another? A hot spot map seems to make you ask why questions. What do you think, Lauren, why might we be getting so many calls in some areas and not so many in others? Population, right? If we don't have many people, we're probably not going to get very many 911 calls. That was our guess as well; there could be other factors too, but we're wondering whether we're just seeing a hot spot map of population. If we made a hot spot map of population, would it look just like this one? Notice that we're starting to ask why questions, and from the modeling spatial relationships workshop yesterday, Lauren knows that regression analysis is all about answering these kinds of why questions: why are we seeing so many 911 calls where we're seeing them? In fact, this is one of the questions our community is very interested in answering.

Regression analysis works by modeling a dependent variable, in this case 911 call volumes, as a function of other variables, explanatory variables, that we think are going to be important predictors of the number of 911 calls we're going to get. As you know, Lauren, the most difficult part of any regression analysis is finding that complete set of key explanatory variables that are important to what you're trying to model, and unfortunately, until you find all of those key explanatory variables, you can't fully trust the results of your model. So you're going to build a regression model to try to explain calls as a function of other variables like population, income, or education levels, and if you have trouble finding a properly specified model, one of the tools you can try is Exploratory Regression. We talked a little bit about Exploratory Regression yesterday, and we went through the checks for a good model, but we'll go through them again quickly. You know you have a good model when the explanatory variables you're using are statistically significant, when you know they are actually helping your model, and when the signs of the coefficients, the relationships being represented, are what you expect. You also want to make sure that a properly specified model has variables that get at different aspects of whatever you're trying to model, so you check that none of the variables are redundant; if you have two variables telling the same story, it leads to an overcounting type of bias, so you check the variance inflation factor (VIF), making sure all of those values are less than about 7.5. Another really important check is to make sure your model's under- and over-predictions aren't clustered; they should reflect random noise. When you have a properly specified model, you over-predict a little here and under-predict a little there, but those under- and over-predictions are really just random noise. If you see clustering, one area where all of the over-predictions cluster and another where all of the under-predictions cluster, it means you're missing a key explanatory variable and you can't completely trust your model, so you need to identify other variables that can help you out. You also want to check the Jarque-Bera test; this diagnostic makes sure the under- and over-predictions are normally distributed. The reason that's important is that if they're not, if the Jarque-Bera test is statistically significant, that's not a good thing: it means your model is biased. Either your model is predicting really well in some areas but not so well in others, or maybe it's predicting well for places that don't have many 911 calls but not doing a great job for places that have lots of them. So you want to make sure the Jarque-Bera test is not statistically significant. Last, you want to make sure you have a strong model, so you're looking for large adjusted R-squared values and a small Akaike information criterion, the AICc value (we know for sure it's AICc; the pronunciation is less certain).
If you do decide to use Exploratory Regression to find a properly specified model, you need to realize that there really is a trade-off. Exploratory Regression is going to try every possible combination of a potentially very long list of candidate explanatory variables, and you're going to learn so much about your data and about the relationships among those variables; even if you don't find a properly specified model, you're going to learn a lot about the correlations. But you do need to realize that you increase your risk of committing a Type 1 error, which means you might say you found something when you really didn't, and you also increase your risk of ending up with a model that's overfit. To avoid that, make sure you select variables that are supported by common sense and by theory, variables that you think really are related to what you're trying to model, and eventually, before you report back to the client, validate the results of your model. We have data that's actually a couple of years old, so we can validate this model using more recent data before we report to the client. Okay, so why don't you see if you can find a properly specified model for our community?

Well, this is going to be fun. Before we can do our regression analysis, because we know a variable like population is going to be important and we have that data in census polygons, we're going to move from the point-level data into polygons that each carry a count of calls. These are census tracts or block groups; they have population and all sorts of other census data, and what we're trying to understand is the number of 911 calls in each of these polygons. So I think maybe it's just population; I'm going to go with that, because it's one variable and it sounds easy. Let's just try it, what do we have to lose? It's easy enough to test that hypothesis: all we do is run OLS on our 911 call data using the number of calls as the dependent variable and population as the explanatory variable, and we just run it. It gives us a lot of output; we talked a lot about this output yesterday. First of all, we can see the R-squared that Lauren mentioned: it's 0.39, probably not as high as we need it to be if we want to understand the impact of, say, a future population increase like they're expecting. So even just looking at the adjusted R-squared, the performance isn't as good as we would want: population alone tells only about 39 percent of the call volume story, which isn't very much. We thought it would be higher, didn't we? Yes, I did too. Another one of those checks is spatial autocorrelation of the residuals. The map we get as output is a map of those over- and under-predictions, and like Lauren said, unfortunately, we don't want there to be clustering. I will run Spatial Autocorrelation, but I don't think I really need to on this one, do I? It looks like the red and the blue are clustering; I don't know if I've ever seen residuals this badly clustered before. But we'll test anyway, running a test for spatial autocorrelation on those residuals and generating the report. I'm pretty sure we're going to get a p-value of 0.0000000; there's a very small chance this pattern happened randomly.
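A sketch of that first model and the residual check in arcpy; the feature class and field names (CallTracts, UNIQ_ID, CALLS, POP) are placeholders, and the optional report arguments should be verified against the tool help:

import arcpy

arcpy.env.workspace = r"C:\Data\Portland.gdb"      # hypothetical workspace

# OLS with population as the only explanatory variable.
arcpy.OrdinaryLeastSquares_stats(
    "CallTracts", "UNIQ_ID", "CallTracts_OLS", "CALLS", "POP",
    "", "", r"C:\Data\ols_pop_report.pdf")         # skip the optional tables, write the report

# Global Moran's I on the standardized residuals (StdResid) of the OLS output.
arcpy.SpatialAutocorrelation_stats(
    "CallTracts_OLS", "StdResid", "GENERATE_REPORT",
    "INVERSE_DISTANCE", "EUCLIDEAN_DISTANCE", "ROW")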
We can look at the report and see that that is exactly what it tells us: there is a less than 1 percent chance that this clustered pattern could be the result of random chance, which means, unfortunately, that we have statistically significant spatial autocorrelation, clustering of our residuals, and we can't trust our model. That means we need to find other variables. We have lots of ideas about what kinds of variables we might want to include: income might be a good one, age might be a good one, type of crime or type of incident would be interesting, time of day could be interesting, zoning would be a very interesting one, unemployment, sure. It's a long list; I could come up with a hundred variables I want to test right now, but we're not going to try a hundred, we'll just try a handful and see what we can find. To do this, thank goodness, we have Exploratory Regression, which makes our lives pretty easy. We're going to run Exploratory Regression using calls as the dependent variable and everything else as candidate explanatory variables, except the unique ID and the ObjectID, and we probably don't want to predict the number of 911 calls using the number of 911 calls, so I'll leave that one out. We've got variables on population, education, unemployment, alcohol expenditure, income, all sorts of variables in here, and all I'm going to do is click go (oh, I didn't create a text file) and it's going to test all the different combinations of those variables. We're starting to see that we're getting a higher R-squared using those new variables; we've gotten up to about 0.79. But Lauren was very clear that it is not enough to look at R-squared, and because Exploratory Regression checks all of those things for us, I don't have to do it. I know right away, because there's nothing listed in the passing models group, that we don't have any passing models, which is a real bummer, and it may happen to you; chances are you will at one point or another run this tool and find no passing models. But it's not the end of the world; we still have a lot of techniques we can use to try to find a passing model. One of them is to look at the summary of why we don't have any passing models, which I think is one of the best things about this tool. We can see that 83 percent of the models passed the minimum adjusted R-squared criterion (the default is 0.5), 8 percent have all of their variables statistically significant, 80 percent have no multicollinearity, 10 percent have no model bias, and zero passed the test for spatial autocorrelation. So we know our problem is spatial autocorrelation, and when I know my problem is spatial autocorrelation, I want to look at that spatial autocorrelation. That's one of the great things about doing regression analysis in a GIS: if there's clustering of the residuals, looking at the map of the residuals is a really good clue. I'm really over-predicting in this area; what's going on there? Why am I predicting higher than the true values? What could I be missing, what variable could help me explain that? Looking at the residual map is really valuable. Since we just ran almost 5,000 different combinations, we did not create 5,000 residual maps.
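The Exploratory Regression run itself is a single call; the candidate field names below are placeholders standing in for the census variables mentioned above, and an output report file or results table can be added per the tool documentation:

import arcpy

arcpy.env.workspace = r"C:\Data\Portland.gdb"      # hypothetical workspace

# Try every combination of the candidate explanatory variables and report
# which models, if any, pass all of the OLS checks.
arcpy.ExploratoryRegression_stats(
    "CallTracts", "CALLS",
    "POP;JOBS;LOWEDUC;MEDINC;MEDAGE;UNEMP;ALCOHOL")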
So to look at one of those residual maps, what I'm going to do is pick the highest adjusted R-squared model and run it in OLS. This one has population, jobs, low education, median income, and median age, so we go to OLS and run it with those five variables and the same dependent variable: population, jobs, low education, median income, and median age, the five variables with the highest adjusted R-squared in Exploratory Regression, so that we can see those residuals. We go through and we see the same diagnostics we would have in Exploratory Regression, because it's running the exact same analysis, OLS, inside Exploratory Regression: the same Jarque-Bera, the same Koenker, the same R-squared, but now we also get the map of the residuals. The red areas are where the true values are higher than what we're predicting, so those are under-predictions. Why are we under-predicting in this area? What's going on, and what could we include that would help us deal with this spatial autocorrelation? One thing we can do is take a look at the context. If I were really an analyst from this area, I'd probably already know, but I don't, so I'm going to look at the underlying imagery and see if we can get any clues that way. I should probably turn everything else off first, so now we've got just our residuals and the underlying basemap, and what we're starting to see is a clear difference in the type of land use underneath those under-predictions compared to the rest of the area; it looks to be the more industrialized, more urban part of this study area. So what we did, and probably should have done from the beginning, is create some spatial variables, in this case a variable that is distance from the city center, distance from that urban area. We have found time and time again that those spatial variables are the key to finding a properly specified model with this kind of spatial data. We created that variable by just using the Near tool: we created a point and computed the distance from each of those census tracts or block groups to that point. Near is your best friend when you're doing this kind of analysis: near highways, near hospitals, near parks, near schools, near fast food, the list goes on and on, so we use the Near tool a lot. Believe it or not, now we're going to run Exploratory Regression again, except this time I've got a different dataset that has the distance variable in it. We point to that, choose the same dependent variable, get rid of the ObjectID and the unique ID, and we can see that distance to the urban center is now in the candidate list. The only other thing I want to do is change the minimum adjusted R-squared, because I happen to know that when I add that distance variable we're going to get a ton of passing models and we don't want to look through all of them, and I will create the text file this time. We give it a run, and it goes through testing all those combinations: still no passing models at first, but starting at the models with four variables we begin to see models come up that pass.
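The distance-to-urban-center variable and the second Exploratory Regression pass might be sketched like this; "UrbanCenter" is a hypothetical single-point feature class marking the city center, and the field names are again placeholders:

import arcpy

arcpy.env.workspace = r"C:\Data\Portland.gdb"      # hypothetical workspace

# Near adds NEAR_FID and NEAR_DIST fields to the input, giving each tract or
# block group its distance to the urban center point.
arcpy.Near_analysis("CallTracts", "UrbanCenter")

# Rerun Exploratory Regression with the new spatial variable in the candidate list.
arcpy.ExploratoryRegression_stats(
    "CallTracts", "CALLS",
    "POP;JOBS;LOWEDUC;MEDINC;MEDAGE;UNEMP;ALCOHOL;NEAR_DIST")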
These are models that met all of the assumptions of OLS. If I open up this results file, we can see that we have a bunch of passing models, and now we're faced with the horrible dilemma of having to pick between the many passing models we have, which I say snidely because it's a great problem to have; when I was working on my thesis it's a problem I would have paid a million dollars for. So we have a lot of models, and the way we think about this is, first, of course, the adjusted R-squared, the performance, and the lowest AICc value is a really good indicator that we have a good model. This model has the lowest AICc value, and it's meaningfully lower than the next best one because the difference is more than three, so that's a good way to compare between models. In addition, we also think about which of these variables might have better implications for remediation: is there one we could do something about more than the others? There's not a whole lot we can do to make people change their ethnicity, we are what we are, so there's really not much we can do there in terms of remediation. But education, or some sort of health outcome variable, those are things we can actually do something about, so those are good ones to have in a model if you're choosing between several. So now we have a passing model, and it has population, jobs, low education, distance to urban center, and businesses. Are we done?

That's great, you have found a model that passes all of the criteria for OLS, and we can use this model in powerful ways to make predictions. But actually, whenever we find a properly specified model, it's a good idea to take it the next step and run Geographically Weighted Regression, especially since, if you notice, the Koenker test has a statistically significant result. The Koenker test, which we didn't really talk much about here, tells you that the relationships you're modeling, and this is very common with spatial data, are not consistent across the study area; maybe the income variable is a really strong predictor in one area but not such a great predictor in another. Whenever we have this regional variation, it's not a bad thing: we have a properly specified model, and we automatically compute standard errors that are robust to that kind of regional variation. But it is an indication that we might be able to improve our results by moving to a model designed specifically to allow those relationships to vary. So I think the next thing we should do, and somebody needs to remember that AICc of 681 and somebody needs to remember that adjusted R-squared of 83 percent, is go to GWR and see if we can improve the results. If we can, we'll want to map the coefficients of some of those variables so we can see the regional variation in them; when we do see regional variation in those variables, that can be really helpful if we want to design some remediation strategies.

Okay, I'm sure some of you are thinking, I can't believe she's telling us we have to find a whole new model in GWR. But the good thing is that we don't; that's why we did all this hard work in Exploratory Regression, so that we have this passing model that we can feel really confident about.
It's passing all those assumptions, and now we can use the same model when we go to GWR: population, jobs, low education, distance to urban center, and number of businesses. I say that out loud so I will remember it when I get to GWR. So we go to GWR, choose our data, choose the dependent variable, which is still calls, and then choose those same variables: population, jobs, low education, distance to urban center, and businesses. A fixed kernel with the AICc bandwidth method is a good option; it will pick the best distance to use when creating those neighborhoods. We hit go, it runs through, and first of all we get our diagnostics. Our adjusted R-squared has gone up to 85 percent from 83 percent, so it's gone up a bit; we've improved the model. We can also look at the AICc value, which was 681 and is now 678, a difference of three, so that's a meaningful decrease. We've lowered the AICc, we've increased our R-squared, we've improved the performance of our model. But you know that's not our favorite thing about GWR; our favorite thing about GWR is that we can look at how those relationships vary. If we open up the attribute table, we can see that we have a coefficient for every single one of those variables for every single one of those features, and now we can go in, choose our jobs variable, and look at where that relationship is strongest. Or we can look at, say, education: if we wanted to focus our efforts on increasing education levels in the area, on policies that would help kids stay in school, for instance, these would be the areas where we would get the most bang for our buck. Of course we want to keep kids in school everywhere in our study area, but sometimes we really have to focus our limited resources, and understanding those relationships is a great way to do that. We also just learn a lot; it's really interesting to see how those relationships change and which factors are more important in which areas. To be clear, what we're showing is where that particular variable is the strongest predictor of 911 calls, so if we want to introduce policies or programs in the areas where they will have the biggest impact, we would at least start in those. Or, going back to the jobs coefficient, for example, this might be where we want to implement some job-safety programs; we might want to go to these places and help start programs on how to keep things safe and avoid accidents. This is not necessarily where we have the most jobs or the most 911 calls; it's where that variable is the strongest predictor.

Great. We've answered so many other questions, and we're going to summarize in just a minute, but there was one more thing our community was interested in. They know their population is going to double over the next ten years, and they're a little concerned about how that increase in population is going to put demand on 911 calls. So can we use GWR to predict what the demand is going to look like with an increased number of people? She's very demanding, isn't she?
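The GWR calibration described above, written as a script; again the dataset and field names are placeholders, and the explanatory variables are passed as a semicolon-delimited list:

import arcpy

arcpy.env.workspace = r"C:\Data\Portland.gdb"      # hypothetical workspace

# Calibrate GWR with the passing OLS model: fixed kernel, AICc bandwidth selection.
# The output feature class stores a coefficient for every variable at every feature.
arcpy.GeographicallyWeightedRegression_stats(
    "CallTracts", "CALLS",
    "POP;JOBS;LOWEDUC;NEAR_DIST;BUSINESSES",
    "CallTracts_GWR", "FIXED", "AICc")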
Yes. I guess it's easy enough, actually: I'll just reuse the result from the Results window. How many of you use the Results window? If you don't, you should check it out, it is so awesome. I always set it to keep my results forever; it does make your map documents pretty big, but then you can come back to an analysis you did six months ago and see exactly what you did, and I don't know about you, but I don't remember what I ate for lunch yesterday. It's really nice to have the results there and be able to rerun an analysis quickly, changing just one or two parameters; if you're not using it, I highly recommend getting to know the Results window. We'll rerun the analysis, still with population, jobs, low education, distance to urban center, and businesses, but this time, instead of using the data we used originally, we're going to use a dataset that has a prediction variable in it, a future population variable where our population is increasing, and we want to see the impact that population variable will have on the number of 911 calls. We leave all of the other variables the same, because we're still calibrating the model with those same variables, but now we point to that prediction data and use future population, then jobs, low education, distance to urban center, and businesses. You'll notice I listed those in exactly the same order as the model variables, and that's because they have to be in the same order: you want population and future population to match up, and then each of the corresponding variables to match up, so the tool knows which one relates to which in the model. Now we specify our output prediction feature class and we're ready to go. The diagnostics are going to be exactly the same, because we're still calibrating the model the same way, but the difference is that now we've got this prediction feature class. I'm going to symbolize it the same way we're symbolizing our 911 call data, except using the predicted value, and then we can see, hopefully, if I did this right, how the prediction compares. We can see this is what we have now, and as the population grows based on our projections, this is the impact it will have; we can see the areas where that impact is greatest and where that change is going to happen, and it can help us prepare and plan for the future. And just a little note, because I do get asked every time: that's the Swipe tool, which is pretty awesome; there's an Effects toolbar and it has Swipe on it, so if you're interested in Swipe, it's on the Effects toolbar. So I think we're done, right? We are.
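The prediction run reuses the calibration above and adds the prediction locations, the matching prediction fields (in the same order as the model variables), and an output prediction feature class. The positional order of the optional arguments below is written from memory, so verify it against the GWR tool help; FutureTracts and FUTUREPOP are placeholder names for the projected data:

import arcpy

arcpy.env.workspace = r"C:\Data\Portland.gdb"      # hypothetical workspace

arcpy.GeographicallyWeightedRegression_stats(
    "CallTracts", "CALLS",
    "POP;JOBS;LOWEDUC;NEAR_DIST;BUSINESSES",
    "CallTracts_GWR_Pred", "FIXED", "AICc",
    "", "", "", "", "",                            # accept defaults for the other options
    "FutureTracts",                                # prediction locations with projected data
    "FUTUREPOP;JOBS;LOWEDUC;NEAR_DIST;BUSINESSES", # prediction fields, same order as above
    "Calls_Predicted")                             # output prediction feature class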
Let's summarize what Lauren did here. She used hot spot analysis so that the decision-makers could evaluate how well the fire and police units are currently located in relation to 911 call demand; they can use that information to decide whether they need additional locations or want to move some of the existing ones. She used OLS to identify the key factors that contribute to 911 call volumes, and where those factors suggested remediation or policy changes, like the education programs or the on-the-job safety programs, her GWR analysis of the coefficient values suggested where those projects and policies might initially be rolled out, where they might have their biggest impacts. And finally, she used GWR to predict call volumes for the future. This not only helps the community anticipate what the 911 call demands are going to be, it also provides a yardstick for measuring how effective the remediation policies ultimately are. (It's funny, she came in and said, I don't think we do the actual prediction, and I said, oh yes we do.)

We want to thank you for attending this session, and we're going to have time to take lots of questions. If you're interested in learning more about these tools when you get home, we have a website, esriurl.com/spatialstats, with short videos, and we're going to have a new hot spot tutorial up there by the end of the week, I promise, so if you have never done the hot spot analysis tutorial, wait until the following Monday to download it; it's going to be better. If you are still at 10.0 and aren't going to move to 10.1 quickly but want to use Exploratory Regression, this is also the place to download our sample script version of Exploratory Regression, and there are tutorials as well. We put our email addresses here because we hope you will consider us a resource; if you have questions, please contact us. Our managers always kind of cringe when we do this, because you take us up on it, but it really is one of our favorite things, helping you be successful. We ask that you please go to esri.com/ucsessionsurveys and provide us with feedback on how we can improve and do this better.
Info
Channel: Esri Events
Views: 1,185
Rating: 5 out of 5
Keywords: Esri, ArcGIS, GIS, Esri Events, Lauren Bennett, Lauren Scott, Esri 2012 UC Tech Session, Spatial Statistics: Best Practices, analytical workflow
Id: ypCM8P-pxLQ
Length: 49min 52sec (2992 seconds)
Published: Fri Dec 29 2017