ArcGIS Pro: Analysis Overview

Captions
All right, thanks so much, and good afternoon everyone. My name is Nick Giner, and I'm happy to have you here from wherever you're attending. Thanks again for joining this Esri National Government technical workshop series presentation on analysis in ArcGIS Pro. I work on Esri's Educational Services team as a technical lead, and my job is to provide expertise on spatial analysis and remote sensing for all of our instructor-led and online training.

In the next hour we're going to cover one of the most important and powerful capabilities of GIS: analysis. We're going to do this using a real-world example, solving a spatial problem: where is the best location for a new fitness center? We chose a fitness center, but in truth it doesn't matter what the actual type of facility is; it's all about location analytics.

A brief agenda: we'll first define what analysis is, and then we'll look at analysis in terms of the dimensionality of data in ArcGIS Pro. We can do analysis in several dimensions: traditional 2D (the spatial dimensions), a third dimension such as elevation, depth, or height, and even a fourth dimension, time. We'll organize the presentation similarly to how the analysis tools are organized in the software: proximity and overlay, raster analysis, network analysis, and so on. We'll also spend some time discussing charting, and how using traditional non-spatial charts like bar charts and line graphs alongside spatial analysis techniques can really enhance your analysis experience. As we come down the home stretch, we'll talk about ways to automate analysis with ModelBuilder and Python, and then ways to extend analysis beyond the techniques that ship with the software using Python and R.

So, at the most basic level, what is analysis? I like to think of it as something that turns raw data into information or knowledge; add the spatial component, and it turns raw spatial or geographic data into information or knowledge. Geoprocessing is the rich suite of tools, and the framework, for processing geographic data in ArcGIS Pro. At the current version of ArcGIS Pro, version 2.5, there are somewhere on the order of 1,500 total geoprocessing tools, and they fall into three broad categories: data management, data conversion, and analysis. We're going to focus mainly on the analysis ones; about 80 geoprocessing tools are new at the most recent version of Pro. Analysis in ArcGIS Pro is achieved mainly with these tools; however, there are also several wizard-type workflows for specific types of analysis that we'll talk about, including network analysis, geostatistics, and business analysis.

Let's start with the most fundamental types of analysis: proximity and overlay tools. These tools help you answer questions like: What is nearby or close? What things are within other things? Where do things tend to overlap? They live in the Analysis toolbox and do things like feature extraction (clipping), proximity and overlay analysis (intersect, union, spatial joins, standard buffers), and summarization (summary statistics).

An example: a company has store locations, and they've created one-, three-, and five-mile rings around each store using Buffer. They can then overlay their customer data, calculate the average dollars spent within each of those rings, and generate summary statistics about where their customers are coming from. Basic stuff, combining proximity and overlay.
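A workflow like that can also be scripted with ArcPy. Here's a minimal sketch of the rings-and-spend idea, assuming a point feature class of stores, a point feature class of customers with a hypothetical SPEND field, and workspace/output names of my own choosing:

```python
import arcpy

arcpy.env.workspace = r"C:\Data\Retail.gdb"   # hypothetical geodatabase
arcpy.env.overwriteOutput = True

# 1-, 3-, and 5-mile rings around each store
arcpy.analysis.MultipleRingBuffer(
    "stores", "store_rings", [1, 3, 5], "Miles")

# Average customer spend within each ring
# ("SPEND" is an assumed field on the customers layer)
arcpy.analysis.SummarizeWithin(
    "store_rings", "customers", "rings_with_spend",
    sum_fields=[["SPEND", "Mean"]])
```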
A new tool at version 2.5 of Pro is Count Overlapping Features. Say you have school locations in these Massachusetts towns and you create a one- or two-mile buffer around each school: Count Overlapping Features lets you quantify the overlap and see where the densest areas of overlap are among all those buffer polygons. So those are some general proximity and overlay tools.

Raster analysis involves many of the same proximity and overlay concepts, except with raster rather than vector data. These tools live in the Spatial Analyst toolbox and require the Spatial Analyst extension. They're mainly for spatial modeling and analysis using raster data: density analysis, all sorts of distance surfaces, deterministic interpolators, traditional map algebra, and suitability modeling. An example of raster analysis is suitability modeling: where is the best location for a new vineyard? We know that in order to grow grapes, vineyards require certain elevation criteria and sun angles, that grapes grow better on certain slopes and where there's more solar radiation, and that we want vineyards close to roads so products can be shipped in and out. You combine all of these variables, rate how suitable each is for growing grapes, and combine them into one surface that identifies the best locations for a vineyard. With raster analysis you can also delineate watersheds from digital elevation models and make classic density maps.

So let's take a look at our first demo and set up the scenario and the questions we want to answer. Again, our goal is to find the best location for a new fitness center, or gym. We'll need to answer questions like: Where are our current gyms located? Where do our current gym members live? Which gym is each member closest to? And where are the highest concentrations of gym members? We'll do this using proximity and raster tools.

Let's look at our data. We're located on the eastern portion of Massachusetts and a little bit into New Hampshire, basically the Boston suburbs. We have the locations of 10 current gyms, and again our end goal is to find the best location for a new gym. We also have our members mapped, about 1,800 of them, colored by which gym they're associated with. We can tell immediately that some of the members closest to the Boston location are actually traveling farther to the Boston location than to their nearest gyms. That's immediate information we can pick up, but we want a better visualization of this idea of nearness.

We'll start by running the Create Thiessen Polygons tool. On the Analysis tab, I go to Tools, and under Analysis Tools > Proximity, I choose Create Thiessen Polygons. These polygons give us an indication of which area is closest to each gym rather than to any other gym. The input is the gyms layer, we'll name the output, and we'll use an environment setting so the processing extent matches the extent of our gym members. Click Run. This creates a set of polygons such that all the area within each polygon is closer to the gym inside it than to any other gym. Going back to the Boston example, we can see that the members in this particular polygon in the center are in fact closest to the Boston gym location.
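Scripted with ArcPy, that step might look like the following sketch, assuming layer names gyms and gym_members (mine, not from the demo project):

```python
import arcpy

arcpy.env.workspace = r"C:\Data\Gyms.gdb"   # hypothetical geodatabase
arcpy.env.overwriteOutput = True

# Match the processing extent to the gym members, as in the demo
arcpy.env.extent = arcpy.Describe("gym_members").extent

# One polygon per gym; every location inside a polygon is closer
# to that gym than to any other gym
arcpy.analysis.CreateThiessenPolygons("gyms", "gym_thiessen", "ALL")
```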
I'll turn off the Thiessen polygons and turn on our member locations. The first thing I'll do is go to the Appearance tab, and under Symbology I'll choose a heat map. This gives me a sense of how dense the distribution of members is. We can see that the sparser distribution of members is out in the New Hampshire area and the outer suburbs, and there's a much denser distribution of members around the Boston location.

One nice tip for visualizing this data as a heat map: right now it's smart mapping, so as I zoom in, the surface changes on the fly. If I actually want to create a raster, I can go to the options (hamburger) button under Symbology and choose Convert to static raster. Choosing this immediately opens the Kernel Density tool, and running Kernel Density gives us a visualization of the concentration of gym members. I've already run one, so let's turn on the kernel density surface. Again, we get a good visualization of the distribution of members, mainly around the Boston gym location, with higher concentrations around gym number eight and gym number five.
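The static-raster step behind that smart-mapping tip is just the Kernel Density geoprocessing tool, so a scripted version could look like this sketch (Spatial Analyst extension required; the cell size and search radius are illustrative values, not the demo's):

```python
import arcpy
from arcpy.sa import KernelDensity

arcpy.CheckOutExtension("Spatial")
arcpy.env.workspace = r"C:\Data\Gyms.gdb"   # hypothetical geodatabase

# Density surface of gym members: no population field,
# 100 m cells, 5 km search radius (illustrative values)
density = KernelDensity("gym_members", "NONE", 100, 5000)
density.save("member_density")

arcpy.CheckInExtension("Spatial")
```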
So what have we learned with just a few proximity and raster tools? We now know the extent of our members: they come from roughly the Worcester area, down to the Cape, and all the way up into New Hampshire, with the highest concentrations in Boston and the surrounding suburbs. We also learned that many of our members travel farther than their closest gym, and that the New Hampshire gym doesn't have many members at all. It's a great start, and we'll continue building the analysis from here.

Back to the slides: let's talk about network analysis. Network analysis requires the Network Analyst extension and provides tools for working with transportation networks. These include techniques such as calculating service areas, routing, location-allocation, and the vehicle routing problem. As an example of service areas, rings can represent, say, five-, ten-, and fifteen-minute drive times from specific facilities. You can do routing, calculating the best route from an origin to a destination. You can solve the location-allocation problem, which locates facilities so they most efficiently supply demand points. You can calculate an origin-destination cost matrix, which computes the least-cost paths from origins to destinations. You can solve the vehicle routing problem, routing a whole fleet of vehicles. You can even do 3D routing: imagine how useful it is for a fire company to know the most efficient route from the entrance of a building to an apartment on the fifth or sixth floor.

So let's go back to our study and look at network analysis along with more proximity and overlay analysis. We're back in the Boston study area, and we're now going to calculate some drive times to get a sense of each gym's service area. I click on the gyms layer, and from the Analysis tab I choose Network Analysis > Service Area, which creates a service area layer. The idea is to see what a 15- or 25-minute drive time looks like spatially from each gym. This opens the Service Area tab, and the first thing I do is import my facilities; that is, I tell the layer that the input locations are my gym locations. The next step is to specify parameters: I'm interested in drive time, specifically 15- and 25-minute drive times to each gym, because people don't want to drive very far to get to their closest gym. We specify those parameters and run it. The pink area in the middle represents a 15-minute drive, and the darker pink area around it represents a 25-minute drive. If we turn on our gym members, we immediately see that there are definitely gaps: we have members coming to gyms from outside the service areas we're interested in. We're going to try to do something about that.
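The same service-area workflow can be driven from ArcPy's network analyst module. This is only a sketch under some assumptions: a local network dataset at a made-up path, a travel mode named "Driving Time", and the 15/25-minute cutoffs from the demo:

```python
import arcpy

nd = r"C:\Data\Streets.gdb\Routing\Routing_ND"   # hypothetical network dataset

# Build a service area layer with 15- and 25-minute drive-time cutoffs
# toward each gym
result = arcpy.na.MakeServiceAreaAnalysisLayer(
    nd, "Gym service areas", "Driving Time", "TO_FACILITIES", [15, 25])
sa_layer = result.getOutput(0)

# Load the gyms as facilities and solve
arcpy.na.AddLocations(sa_layer, "Facilities", "gyms")
arcpy.na.Solve(sa_layer)
```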
Here's another version of the map we just made. What we really want to learn now is the demographic and lifestyle characteristics of the people who live within these 15-minute drive times of our gyms; these are our ideal customers. Do they like to go to the gym? What is their income like? What is their lifestyle like; do they work out at home or do they prefer to go to a gym? To identify our ideal customers, we're going to use a tool called Enrich. Enrich takes advantage of Esri's curated demographic data and lets you choose from something like 13,000 demographic, lifestyle, and consumer variables to append to your data. If I click on one of the drive-time polygons, you'll notice its attributes don't contain any socioeconomic or demographic information, which is why I want to enrich the data from the Analysis toolbox. I'll go into my geoprocessing history and show how I ran Enrich and what I added. Again, I'm looking for my ideal customer, so for each of these yellow polygons I want to know the median household income, the number of people who exercise at a club two or more times a week, and the total population in certain Tapestry segments; Tapestry is a market segmentation that characterizes geographic areas by lifestyle. So I'm using Enrich to add these variables to my drive-time areas to understand the people who live there.

In the interest of time I'm not going to rerun it, but you end up with the same output, and when I click on each drive time I now know the median household income, the number of people who go to the gym, and the number of people in the different Tapestry lifestyle groups. I have information about my customer base that I can use in additional analysis.

Now let's turn on the 15- and 25-minute drive times and our current gym members. What we want to do next is get a sense of how many members live outside these ideal service areas. We'll do a simple Select Layer By Location, selecting the gym members that intersect the 15- and 25-minute drive times, and I'm going to invert the spatial relationship because I actually want the ones outside those areas. Everything now shown in pink is a gym member in an underserved area. We'll keep that in mind as we proceed with the analysis and figure out where the best place is for a new gym.
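Those two steps translate fairly directly to ArcPy. In the sketch below the Enrich variable names are placeholders I made up for illustration; the real names come from the data browser, and the Enrich tool needs access to Esri's demographic data (credits or locally installed Business Analyst data):

```python
import arcpy

arcpy.env.workspace = r"C:\Data\Gyms.gdb"   # hypothetical geodatabase

# Append demographic/lifestyle variables to the drive-time polygons
# (variable names below are placeholders, not real dataset IDs)
arcpy.analysis.Enrich(
    "drive_times_15_25", "drive_times_enriched",
    ["MedianHouseholdIncome", "ExercisesAtClub2xWeek", "TapestrySegmentPop"])

# Select the gym members that fall OUTSIDE the drive-time areas
arcpy.management.MakeFeatureLayer("gym_members", "members_lyr")
arcpy.management.SelectLayerByLocation(
    "members_lyr", "INTERSECT", "drive_times_15_25",
    selection_type="NEW_SELECTION",
    invert_spatial_relationship="INVERT")
```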
All right, back to PowerPoint. We've talked about proximity and overlay, a little about raster, and a little about network analysis. Now let's talk about some of the statistical tools available in ArcGIS Pro, the first being geostatistics. These require the Geostatistical Analyst extension and perform both deterministic and geostatistical interpolation. They provide options for exploratory spatial data analysis, kriging methods, sampling and simulation, and interpolation in 3D, and they provide the Geostatistical Wizard, an interactive wizard that walks you through modeling the structure of spatial data, which is what you need to do kriging. A typical example of geostatistics is taking discrete measurements of some continuous phenomenon, like temperature. An important part of geostatistics is exploratory spatial data analysis: methods such as kriging carry many assumptions that need to be met to provide optimal predictions, one of which is that your data are normally distributed, so there are tools such as histograms and QQ plots that show whether your data meet those assumptions. One of the most important parts of the Geostatistical Wizard is the semivariogram, which lets you model the spatial structure in your data and informs the parameters used to make the kriging prediction. With geostatistics you can then take those discrete point measurements of a continuous variable such as temperature and create not only a prediction map but also a standard error map, which gives you a sense of how reliable your predictions are.

Staying with statistics, we have many spatial statistics tools, and the tools in the Spatial Statistics toolbox are available at all license levels of the software. These are tools for descriptive spatial statistics and for inferential spatial statistics, which determine whether spatial patterns are significantly different from random. There are also tools for prediction: traditional ordinary least squares regression, geographically weighted regression, and machine learning methods. As an example of a descriptive tool, standard deviational ellipses give you a sense of the directionality and concentration of a point pattern. Tools like Hot Spot Analysis and Cluster and Outlier Analysis find statistically significant hot spots and cold spots in your data. There's multivariate clustering, which finds clusters in data based on their attributes, and the ability to do this in a spatially constrained way: in one example, every polygon shown in yellow is more similar, in terms of its attributes, to the other yellow polygons than to polygons of any other color. There are different regression techniques, both traditional non-spatial regression and geographically weighted regression; in a geographically weighted regression result, the darker counties in the U.S. have a stronger relationship between the dependent and independent variables than the lighter counties. Some of our more recent tools focus on understanding how relationships between variables vary across geographic space.

We also have many tools that incorporate the fourth dimension, time. These build on the spatial statistical methods by adding the temporal dimension: are there patterns in both space and time? They're available in the Space Time Pattern Mining toolbox and perform spatio-temporal statistics, things like emerging hot spot analysis and time series clustering, with the ability to view results in both 2D and 3D. Here's an example with tornado locations in the state of Texas. Of course it's important to visualize temporal data in something simple like a line graph; here we can see that tornadoes perhaps increased from 1950, stabilized a bit, and then dropped again toward the beginning of the 21st century. But we can also incorporate that temporal element into the spatial analysis. Emerging hot spot analysis builds on the traditional spatial cluster analysis technique, hot spot analysis, and helps you determine whether the hot spots and cold spots you see are new, fluctuate over time, or are on the decline. You can also visualize the results of these temporal analyses with a space-time cube visualization, essentially a stack of locations over time together with the analysis results through that time. And time series clustering creates groups of locations in your data that have a similar time series.
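For a tornado-style point dataset, the space-time workflow is usually two tools: aggregate the points into a space-time cube, then run emerging hot spot analysis on the cube. A rough sketch, with made-up paths, a hypothetical date field, and illustrative (untuned) bin sizes:

```python
import arcpy

# Aggregate tornado points into a netCDF space-time cube
# (time and distance intervals are illustrative, not tuned values)
arcpy.stpm.CreateSpaceTimeCube(
    r"C:\Data\Tornadoes.gdb\tx_tornadoes",   # hypothetical input points
    r"C:\Data\tornado_cube.nc",
    "TOUCHDOWN_DATE",                        # hypothetical date field
    time_step_interval="1 Years",
    distance_interval="50 Miles")

# Where are tornado counts heating up or cooling down over time?
arcpy.stpm.EmergingHotSpotAnalysis(
    r"C:\Data\tornado_cube.nc", "COUNT",
    r"C:\Data\Tornadoes.gdb\tornado_emerging_hotspots")
```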
All right, let's get back to solving our problem and see how a few spatial statistics tools help us identify ideal locations for the new gym. The questions we're answering now are: Where are the gym members most densely clustered? Where is the center of each cluster of members, in particular the members who are underserved by the drive-time areas? And finally, which potential gym location is most similar to the best-performing existing gym? That last one may well be the location we choose for the new gym.

Back in ArcGIS Pro, I'll clean up a bit by turning off the drive times, and remember that I still have the gym members selected that fall outside those drive times, the underserved ones. We want to find clusters of points based solely on their location and separate those clusters from noise. I go back into Analysis > Tools and open the Spatial Statistics toolbox: simple descriptive statistics of spatial patterns, quantifying patterns, mapping clusters, and modeling spatial relationships. In this case we'll use one of the clustering tools, Density-based Clustering. The input point features are the gym members, and the tool operates only on the selected points; we'll call the output gym members density. Density-based Clustering has three different algorithms available; the one we'll choose is HDBSCAN, which is by far the most data-driven approach and has the fewest user-supplied parameters. All I need to tell it is that each cluster must contain a minimum of 25 gym members. Let's run it. Note that I didn't specify the number of clusters; it automatically found four. Anything that's colored is a cluster that met the criteria; anything small and gray is considered noise.

With these four member clusters, the next step is a selection. On the Map tab, I use Select By Attributes on the density clusters and select the features where the cluster ID is not equal to -1; this keeps only the points that belong to clusters, not the noise. With that selection, I go back to the Spatial Statistics tools and grab one of the tools for measuring geographic distributions, Mean Center, which finds the average center location of a group of features. The input is the gym members density layer (honoring the selection), and the only other thing I specify is a case field, the cluster ID. Instead of finding one average center for all the data, it finds four average centers, one per cluster. If I turn the other layers off, I've now whittled this area down to four potential gym locations, based on where the current gyms are, where their service areas are, where the current members are, and where the underserved members cluster. We'll stop here with these four potential new gym locations, and next we'll whittle those four down to one.
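Chained together in ArcPy, that sequence might look like the sketch below. The CLUSTER_ID field and the minimum of 25 members come from the demo; the layer and output names are mine:

```python
import arcpy

arcpy.env.workspace = r"C:\Data\Gyms.gdb"   # hypothetical geodatabase

# Cluster the (already selected) underserved members with HDBSCAN,
# requiring at least 25 members per cluster
arcpy.stats.DensityBasedClustering(
    "members_lyr", "gym_members_density", "HDBSCAN", 25)

# Keep only points that belong to a cluster (noise gets CLUSTER_ID = -1)
arcpy.management.MakeFeatureLayer("gym_members_density", "clusters_lyr")
arcpy.management.SelectLayerByAttribute(
    "clusters_lyr", "NEW_SELECTION", "CLUSTER_ID <> -1")

# One mean center per cluster = four candidate gym locations
arcpy.stats.MeanCenter("clusters_lyr", "candidate_gyms", None, "CLUSTER_ID")
```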
We're in the same area, and recall what we have: our current 10 gym locations with their drive-time areas in yellow, and for those yellow areas we know about the people who live there, their income, their exercise habits at gyms, and their Tapestry lifestyle characteristics. Behind the scenes I've created the same 15-minute drive times around the four potential locations, shown in blue, and enriched them with the same information. So I have a sense of the customer base around my existing gyms and around my potential gyms, and of the four potential locations I want to identify the best one: the one most similar to my best-performing existing gym.

Also behind the scenes I've run a tool called Summarize Within, which, for each existing gym's 15-minute drive time, gave me the average and total dollars spent by all the members attending that gym. I can use average dollars spent as the indicator of my best-performing gym, and here's an example of how the charting capabilities in Pro can help. I create a simple bar chart on the existing gyms, plotting facility ID against the average amount of money each facility generated. I can see immediately that facility number five, which is now selected on the map, is my best-performing current gym.

I'll keep that selected and do one more step to whittle the four potential locations down to the one most similar to the best performer. Back in the Spatial Statistics toolbox one more time, I'll use Similarity Search, which identifies which candidate features are most similar, or most dissimilar, to one or more input features based solely on their attributes. My feature to match is the best-performing gym (the selected facility), my candidate features are the four potential gyms, and I'll find the most similar based on attribute values: the same fields we enriched with, the median household income, the number of people who exercise at gyms, and the Tapestry attributes. The number of results is four, because I have four potential gym locations. Let's run it. In the output, the darker a potential location's polygon, the more similar it is to my existing best-performing gym. If I turn on the Massachusetts town boundaries, we see that the most similar, most ideal potential gym is in the town of Natick, Massachusetts, so we might hone in there and start looking for available facilities.

So we did a lot there. We started with current gym locations and members, identified gaps in service to those members, and used a few spatial statistical methods to whittle the potential locations down to a final one, and we'll continue from here.
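The decisive step, Similarity Search, is a single geoprocessing call. A sketch, again with placeholder enrichment field names standing in for the real ones:

```python
import arcpy

arcpy.env.workspace = r"C:\Data\Gyms.gdb"   # hypothetical geodatabase

# Rank the four candidate drive-time areas by similarity to the
# best-performing existing gym, using the enriched attributes
arcpy.stats.SimilaritySearch(
    "best_gym_drive_time",        # feature to match (facility 5's area)
    "candidate_drive_times",      # the four potential locations
    "similarity_results",
    "NO_COLLAPSE",                # keep the output geometry as polygons
    "MOST_SIMILAR",
    "ATTRIBUTE_VALUES",
    4,                            # return all four candidates, ranked
    ["MedianHouseholdIncome", "ExercisesAtClub2xWeek", "TapestrySegmentPop"])
```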
Back to the slides: where were we? Let's not forget 3D. In ArcGIS Pro you can do analysis in 3D through the 3D Analyst extension. These are tools for working with digital elevation models (surface models), 3D vector data, lidar, and multipatches. You can look for relationships between 3D features, do interpolation, data management, and surface generation, and perform visibility analysis: generating sight lines from elevation data or 3D vector data, or analyzing skylines, basically dropping a point and seeing what is visible along the top of the skyline. You can calculate volumetric measurements, for example how much water is in a channel leading up to a dam. There are 3D tools for generating procedural symbology, and for doing traditional proximity and overlay, but in 3D. For example, take a power line: I can create a 15-foot 3D buffer, which is actually a cylinder around the power line, and then see whether any trees intersect that buffer, which is a real-world problem. And of course, for 3D analysis you can work with lidar data in Pro as well.

Next, GeoAnalytics Desktop tools. These are available at the Advanced license level and let you perform traditional geoprocessing operations, analyzing patterns, summarizing data, proximity, like we've already seen, but on bigger datasets. They do this by leveraging parallel processing over multiple cores using Apache Spark, all on the desktop. As an example, I mapped 1.8 million wildfire incidents in the U.S. and wanted a count of wildfire incidents per state. Using the traditional Summarize Within tool, not GeoAnalytics Desktop, aggregating 1.8 million records into 50 states took almost five minutes; using the same tool from GeoAnalytics Desktop, it took only 29 seconds, roughly a 10x difference. So you're doing the same traditional kinds of GIS analysis, but with the processing distributed over multiple cores.
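If you want to reproduce that kind of comparison yourself, time the standard tool and its GeoAnalytics Desktop counterpart on the same inputs. A sketch with made-up data names; note that arcpy.gapro.SummarizeWithin takes a slightly different parameter list than the standard arcpy.analysis.SummarizeWithin, so check the tool reference before swapping it in:

```python
import time
import arcpy

arcpy.env.workspace = r"C:\Data\Wildfires.gdb"   # hypothetical geodatabase

# Standard Summarize Within: count incidents per state
start = time.perf_counter()
arcpy.analysis.SummarizeWithin("states", "wildfire_incidents",
                               "fires_per_state_standard")
print(f"Standard tool: {time.perf_counter() - start:.1f} s")

# The GeoAnalytics Desktop equivalent (arcpy.gapro.SummarizeWithin) runs the
# same aggregation with Spark-based parallel processing; its parameters
# differ slightly, so it isn't filled in here.
```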
You can also do imagery analysis. We've already discussed raster analysis with the Spatial Analyst extension, but there's also a suite of tools for working with imagery and remotely sensed data: deep learning, stereo mapping, image classification, image space analysis, working with video, and multidimensional analysis. Traditional remote sensing image classification, using statistical as well as machine learning techniques, can be pixel-based or object-based: you take an image and turn it into categories using different algorithms. There's also what's referred to as map space versus image space. An image taken at an oblique angle looks like its buildings are bent when you view it top-down; that's due to the viewing angle, but you can visualize and analyze in image space, which lets you view the imagery from the oblique angle at which it was taken. There's also a set of tools for multidimensional analysis: working with scientific data cubes, such as netCDF data, that contain many variables, geophysical, environmental, climatological, plus depth and time. All of these fall under imagery analysis.

And let's not forget charts, the non-spatial side. We've spent most of our time so far on spatial analysis tools, but we'd be remiss not to consider how important traditional non-spatial charts and graphs are in analysis, particularly when they're used side by side with spatial analysis tools. Traditional data analysts sometimes focus on charts and graphs and neglect the spatial; as GIS people, I tend to think we sometimes focus too much on the spatial and neglect the traditional non-spatial, and charts have a big impact on what we can do. They help us visualize qualitative, quantitative, and temporal data, and there are great charting capabilities in Pro. You can create traditional bar charts; we just saw how a bar chart of dollars generated per gym helped us find the best performer, and how interactively selecting a bar selects the feature on the map, so everything works together. There are box plots, which combine categorical and quantitative views; line charts, which are great for time series; other ways to visualize temporal data such as data clocks and calendar heat charts; histograms (we just saw how important understanding your data's distribution is for geostatistics and meeting its assumptions); QQ plots for checking normality; and scatter plots, including scatter plot matrices for a number of variables. The charting capabilities are useful not only for pre-analysis exploration of data: many geoprocessing tools, especially the spatial statistics tools, produce charts along with the mapped outputs, which can be really helpful for interpreting results, and we saw a bit of that with time series clustering.

We also have a set of ready-to-use tools and portal analysis tools. The ready-to-use service tools perform a lot of the same functionality as the geoprocessing tools we've already looked at, but they run as services, leveraging ArcGIS Online analysis capabilities. The portal analysis tools do much the same thing but rely on an ArcGIS Enterprise deployment. The ready-to-use tools mainly deal with network analysis, because ArcGIS Online provides a network dataset as a service, though there are also ready-to-use tools for some of the raster analytics. The portal tools include both feature analysis and raster analysis; the raster analysis tools in particular require ArcGIS Enterprise with Image Server and let you leverage distributed, server-based processing for rasters and imagery. There are also GeoAnalytics Tools that require GeoAnalytics Server, again server-based processing. The raster analysis and GeoAnalytics tools are primarily for working with "big data."
As we come down the home stretch, let's talk about automating analysis. So far everything we've done has been running individual tools, but what if we want to run tools over and over on the same data? Maybe we have a folder of 100 shapefiles and we want to reproject all of them into a different coordinate system. I could run the Project tool a hundred times, or I can automate it, perhaps through ModelBuilder or Python, and get it done in one click. Or maybe I want to connect several tools together into a workflow that can be repeated. There are a few ways to automate analysis: batch geoprocessing, ModelBuilder, exporting to a Python script, and, one of the latest additions, scheduled geoprocessing, where you can schedule a tool to run at a certain time each day, or at a specified time, perhaps when your computer isn't doing other computationally intensive work.

As an example of batch geoprocessing, I have eight land cover rasters for eight different towns in Massachusetts. For any geoprocessing tool, you can right-click it in the toolbox and run it in batch mode. In this case I'm running a tool called Tabulate Area, which calculates the area of different categories within polygons. The batch version takes the eight raster inputs and the town polygons and runs eight Tabulate Areas, one per town, with essentially one mouse click.

I can also use ModelBuilder, which lets me string tools together. Three geoprocessing tools here, XY Table To Point, Near, and XY To Line, combine to create a spider diagram. Once I've built a model, I can do a little work with parameters, all in the interface, to create a geoprocessing tool that looks like it ships with the software, complete with a graphical user interface. You can also export models directly as Python files if you need to do further work with them, or if you have non-GIS Python workflows to integrate. The script you see here was generated automatically just by clicking Export To Python File; I didn't write any of that code. Not only is this helpful if you have non-GIS Python workflows, it's really helpful if you're new to Python, because it shows you what the syntax looks like, gives you a baseline, and helps you troubleshoot and get on your way.

A couple more things. We said at the beginning that we'd discuss extending analysis with Python. At Pro 2.5 you can now leverage the interactive analysis capabilities of Jupyter notebooks directly in the ArcGIS Pro application, and this lets you tie a lot of things together: core Python functionality, ArcPy (so all the geoprocessing tools), the ArcGIS API for Python, and, perhaps most importantly, hundreds of third-party Python libraries for data cleaning, data engineering, machine learning, statistical modeling, and so on.
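For instance, the shapefile reprojection example from a moment ago is only a few lines of ArcPy in a notebook or standalone script. A sketch, assuming a made-up folder path and using NAD 1983 StatePlane Massachusetts Mainland (WKID 26986) purely as an illustrative target coordinate system:

```python
import os
import arcpy

src_folder = r"C:\Data\Shapefiles"          # hypothetical folder of ~100 shapefiles
out_folder = r"C:\Data\Reprojected"
target_sr = arcpy.SpatialReference(26986)   # example target: MA State Plane (meters)

arcpy.env.workspace = src_folder
for shp in arcpy.ListFeatureClasses():      # lists the shapefiles in the folder
    out_path = os.path.join(out_folder, shp)
    arcpy.management.Project(shp, out_path, target_sr)
    print(f"Reprojected {shp}")
```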
Let's go back to our study area and review what we did. We have our original 10 gym locations, and we decided that the best place for a new gym is where the green dot is, in Natick. Earlier we showed where all of the original members are coming from, and remember, a lot of the members attending the Boston location are coming from far away. Given that we're adding a new gym, we're also going to remove a gym and then reallocate our members. We've decided to remove the New Hampshire gym and replace it with the new one, so we still have the same number of gyms, just no longer the one in New Hampshire. Behind the scenes I ran a location-allocation analysis, which reallocates customer locations to facilities; there are a couple of different problem types, and I used the one called Maximize Attendance. It reallocated all of the gym members across the existing gyms plus the new one.

I then took this one step further and used Python, in particular something called a search cursor, to calculate the average drive time for the gym members to the gym each has been newly allocated to. I don't have time to walk through all of it, but I'll show you how the notebook looks. First I import a few modules and define a couple of variables: my origins are my new set of gym locations, and my destinations are the reallocated members. Then a search cursor loops through each gym location and its newly assigned members and uses the Closest Facility network analysis solver to calculate each member's drive time to the gym they've been assigned to. A series of steps then averages the drive times and aggregates them by gym location for display. If we go back to the map, each of these dots shows the average drive time for the members allocated to that gym. And the map makes sense: a lot of the New Hampshire members who were formerly going to the northernmost gym now have to come to one in Massachusetts, so they have a longer average drive time, while we've definitely cut down the distances members travel to the Boston location, and we now have a better sense of the average drive time for all of those members.
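The aggregation part of that notebook is standard cursor work. Here's a minimal sketch of the averaging step, assuming the Closest Facility solve has already written each member's assigned gym and drive time into a table with hypothetical GYM_ID and DRIVE_TIME fields:

```python
import arcpy
from collections import defaultdict

# Accumulate drive times per gym from the solved routes table
totals = defaultdict(float)
counts = defaultdict(int)
with arcpy.da.SearchCursor("member_routes", ["GYM_ID", "DRIVE_TIME"]) as cursor:
    for gym_id, minutes in cursor:
        totals[gym_id] += minutes
        counts[gym_id] += 1

# Average drive time per gym, ready to join back to the gyms for display
for gym_id in sorted(totals):
    print(gym_id, round(totals[gym_id] / counts[gym_id], 1), "min")
```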
All in all, for this analysis we started with a set of gym locations and a set of members assigned to those locations. We looked at the data and got a sense of its distribution, determined the best potential locations, used a couple of different methods to whittle them down to the best one based on our best-performing current location, reallocated our members, and used Python to do some calculations for us and get the average drive times to the new set of gyms.

The last thing I want to cover before wrapping up continues the conversation about extending analysis: a brief look at R. Another great way to extend the analysis capabilities of Pro is the R-ArcGIS Bridge, which lets you combine the statistical and data science capabilities of the R programming language with the spatial data management, visualization, and analytics of ArcGIS. There are three main patterns of use. The bridge can be used by ArcGIS users to access thousands of open source statistical packages in R and apply them to their spatial data; it can be used by R users to access the spatial data storage, visualization, and spatial analysis capabilities of ArcGIS; and it can be used by ArcGIS developers to create custom geoprocessing tools that leverage R functionality.

Let me show one quick example of using R with ArcGIS. The use case is species distribution modeling. We're looking at a map of occurrence locations of the three-toed sloth in the northern part of South America, designated by crosses, along with a series of continuous environmental variables representing things like temperature, precipitation, and different wetness measures, plus a map of biomes. The idea is that we have known occurrences of the three-toed sloth and a set of environmental variables, and we want to create a suitability map that helps us figure out where the species is likely to be distributed. Very briefly, I've written an R script, and there are a few main things to be aware of when linking ArcGIS and R. The first is the arcgisbinding library, which is required to make the connection between ArcGIS and R and to check that it works. Then there are a few lines of code that take a feature class from an ArcGIS geodatabase and turn it into a format that can be used in R, an R data frame, and I read in my rasters. Truly the most important parts are those few lines that read in the ArcGIS data; everything else is very R-specific. This is where I can use different species distribution modeling methods, one called Bioclim and another called MaxEnt, capabilities that are not available in ArcGIS. The point is that I was able to very easily read ArcGIS data into R, do a bunch of work there, and finally write the result back out as a TIFF that I can read in ArcGIS. A few lines to read data in, everything in the middle is R capability, and a few lines to write things back out. What that eventually created were two outputs, the Bioclim suitability surface and the MaxEnt suitability surface, where green is more suitable for the three-toed sloth. Again, we don't have time to go into the details, but the point is that you can take your ArcGIS data, export it into a format you can read into R, do your statistical work, your machine learning, your modeling in R using capabilities that aren't in ArcGIS, write it back out, and then use more ArcGIS tools, visualization, or whatever you need. So there's a couple of examples of extending analysis.
I'm coming up right on the final minute, so thank you for coming. I know it was a lot of information, but hopefully it gives you a good indication of pretty much all the different analysis capabilities in ArcGIS. Again, there are tons and tons of tools, but the most important thing is that you start off with a good spatial question and figure out the best ways to answer it, and there are certainly many ways to do so. Hopefully this was helpful, hopefully you enjoyed it, and stay well. Thanks.
Info
Channel: Esri Industries
Views: 7,514
Rating: 5 out of 5
Keywords: Esri, ArcGIS, GIS, Geographic Information System, ArcGIS Pro
Id: KI4VjXyJxf0
Length: 58min 52sec (3532 seconds)
Published: Thu Jun 04 2020