Summarizing within polygons in ArcGIS

Captions
Hello. So in the last tutorial we focused on points within polygons, right? We were comparing the campus trees to fishnet polygons that we had created. In today's tutorial we're going to expand a bunch on that. We're still going to be looking within polygons, but we're going to think about not only points (we'll revisit the points again) but also lines, polygons, and raster data within those. So that's four separate pieces to work on, which is a lot for a single video. Feel free to break it down into the four component parts: work on the points first, work on the lines some other time, then polygons, and then raster. So, lots of content today, which is fun.

All right. Again, I'm working in ArcGIS Pro, but you can do all of these same processes in ArcGIS Desktop. We are looking at a bunch of watersheds in Massachusetts, just sort of a random selection of watersheds. We can open these up and see that there's not too much going on with these watersheds: all we have is the name of the watershed.

So one nice thing to start out with: unlike the fishnets, which we created ourselves, the watersheds come with a unique identifier, or what looks like a unique identifier anyway. We can double check that it's a unique identifier by summarizing it and seeing if the count of each individual name equals one. Let's just do a count. Basically I'm just double checking that there aren't duplicates in this field called Name. There are 182 different rows here. If there are no duplicates, then this summary should also have 182 rows and everything should have a frequency of one. So that's good; that tells us that there aren't any duplicates. I'm just going to delete this because I don't need it anymore. So these are each unique identifiers, and we can go around and look or I
wanted for where each of these basins are located. These must be some small ones. So there we go, all these different ones, as you would expect with the spatial layers.

I have given you dams in Massachusetts, so here is a data set of locations of dams. Some of those are inside the watershed polygons and some of them are outside. At this point you should have a pretty clear process in your head for how to count up the number of dams in each of these watersheds, but let's go through that quickly. Let's open our dams. If we want to know where these dams are located relative to those different watersheds, then we need to pull that unique attribute information about the watersheds into each of these different points. That is our spatial join. Our target feature, the one where we want the information to end up, is our dams, and we want to add to it our basin information. None of these different options is all that important when it's points, because they're all going to give the same result: completely contains, within, intersect, etc. So off we go.

The main thing we should be expecting, once this finishes processing, is an additional field at the end of our new spatial join layer that has the name of the watershed. Now there's a bunch of nulls in here, and that is to be expected, because our watersheds, remember, are not the entirety of the state, only some portions of it. So there are a bunch of points that are outside of those watersheds, and we don't care too much about those, because all that we care about are the ones that are inside the watersheds. To count those up, we use our handy Summarize tool based on lots
of fields in this data set, based on the count of all of these names, Name being the name of the watershed. And this is just an error message telling us that there were some null values in here and GIS ignored them, which is fine: there were 1,063 points with null values that we don't really care about. In this case we just care about the ones that actually fall within the watersheds. So it turns out there were only 182... oh no, never mind, 182 is the number of watersheds. I think we started with 1,500 dams, minus 1,063, so we're left with 437 dams that are actually inside one of these watersheds.

We want that information back in the watersheds so that we can display it, and that's where we do our regular old join, Name to Name. I should have renamed this earlier, but I'm being lazy today. All right, so now when I open my basins attribute table I should find a frequency and a count, which are the same thing, because Summarize automatically does frequency. So count is redundant, but the tool won't let you go unless you do that. I'm going to add a field to this and call it counts. Good. Then I'm going to copy over with my field calculator either the frequency or the count of Name; either would be totally fine. Let's go with frequency, it's on top. We end up with zeros for places where it was null, which is what we want, and numbers for everything else. So now I can release that join, because I don't want all this extra junk that I'm not using in my table.

Let's turn this off. We should now be able to, not summarize, symbolize our basin study area using graduated colors, because we've got a number going low to high. It's automatically choosing the counts of dams because that's the
only number value in here. I'm not so sure about that particular graduated color; I feel like maybe this one is a little bit more intuitive in terms of light to dark, where the light ones are areas with zero or one dams, and as you get darker you see the watersheds that have a whole lot more dams.

You could also normalize this, which is just another form of symbology. Area is typically what you would normalize by, so we need an area field. Let's call this area in square kilometers. It's a long integer, so let's change this to floating point just so we've got more precision. Then we can calculate the geometry of that. I want that to be area, and, oh, it does let me choose a unit, that's fun: area in square kilometers, because I told myself it was going to be the area in square kilometers. I think that's okay; this is in Mass State Plane. It shouldn't matter what the coordinate system is as long as there's some sort of coordinate system defined, and GIS can handle doing square kilometers. I don't think you even need to check anything there.

This is just a little bit of a tangent, but this tells you the total number of dams, and you can see that this watershed is pretty big compared to this watershed, and both have a lot of dams in them. So maybe instead what you actually want to display is dams per square kilometer of watershed, in which case you can do the normalization, and then you get a slightly different picture. Because this one is so tiny, you end up with a large concentration of dams per square kilometer. So it depends what you're trying to visualize.

Okay, so that's our points. All of that should have looked pretty familiar in terms of stuff that you've seen and done before. Next let's think about roads. Roads are a line file, right? This is actually the
census roads layer, so that's why it's taking a little chunk of time to draw here. There's really fine detail: every single road in the state, at least as of 2010, should be in this data set. So lots of roads in here. Let us zoom in a little bit so that we can see the roads and kind of see the basins behind them.

Okay, so we've counted up the dams in each of these watersheds; now I want to know the total length of roads within each of these watersheds. Typically, if you're combining lines with polygons, length is the question that you want to ask. There may also be plenty of cases where you have individual line segments and you want to count them, but length is a pretty typical thing to ask when you've got a line file. Similarly, when you're comparing polygons to polygons, area is something that you typically might want to ask about. So let's start with the length of roads; we'll get to polygons later. This is the line-to-polygon segment of the tutorial.

Just like we did with the points, if we want to summarize the total length of roads in each of these watersheds, we need to get that basin name into our road file. So it's nearly the same process, but not exactly. What's different about this one is that if I go in and click on one of these roads... well, that's giving me some corner of it, I'm not quite clicking on the road. There are these segments of roads. Okay, there we go. So this road is in the watershed, whatever this watershed is, wt187, but it's also outside of the watershed
right? It's over here, and it's quite a ways over there. If we did a spatial join, then we could add the information about this watershed to this line segment, but it would also include this chunk over here. And what if this road went into another watershed? Then we start ending up with kind of a mess in terms of where this information gets appended; you can end up with more rows than you started out with. So in this case I prefer to use an intersect instead.

Find me my Intersect... oh look, it's right here. Let's see if the help menu over here gives us a picture. Yes. An intersect is one where you only keep the area that overlaps, the intersecting area between two features. This is showing it for two polygons, but you could imagine that if there was a line going through this circle, then you would just end up with the line portion that is inside the circle. And that's what we want, because we only care about the length of the line segment that actually intersects this watershed area.

The nice thing about Intersect's input features is that there's no order, so you don't have to guess about it. We want our basins to come in, and we want our roads in Massachusetts to come in, and let's give the output a name. We want to keep all of the attributes in the join so that we make sure we've got the name information, although these pieces are going to be confusing later on. Not too many options there, unlike spatial join. The nice thing about spatial join is that you get to choose which of your fields come with you in the joined result; with Intersect you just get everything. So we might end up with a partial mess later on, and we're going to have to delete some stuff if we want our intermediary data set to make sense to us later. All right, so that
intersect took a minute, so on your computer it will probably also take a minute. But now we have an output of our roads that are inside the watersheds, which I want to make a little bit brighter just so we can see it. All right, hot pink roads. You can see if we zoom in that these roads are just the parts that are inside the watersheds; we have removed everything that is outside the watersheds. If we scroll around we can also see that some watersheds have a few roads and some have a lot of roads in them. So if we were looking at something like the amount of impervious surface in these watersheds, the length of roads would be relevant.

Okay, so I'm going to open my attribute table and take a look at what we've got. The most important thing is the name of the watershed associated with each of these 88,000 road segments. That gives you a sense of why it takes GIS a minute or two to process that intersect: there are lots of different segments to work on. These bits that came from the basins are not meaningful here; really the only things that we care about are the name of the basin and the length of the road.

Now, because this is in a geodatabase, GIS has already calculated and updated the length of those roads. The one thing that we don't necessarily know is what the units are. The units of Shape_Length in a geodatabase are always going to be the units of your data's projection, so let's check what our projection is. In this case the spatial reference tells us that we are in Mass State Plane and that the linear units are meters. The vertical units are also meters; there's nothing vertical about
these roads, but meters are what Shape_Length is going to be in. So this is a teeny tiny little road segment of a tenth of a meter, these ones are slightly longer road segments of 600 meters, and so on. I previously calculated the area of the basins in square kilometers. I could have done that in square meters too; it really doesn't matter, it's up to you what units you use, but I prefer to stick with the same units throughout. So I am going to call this length in kilometers and add that field. I don't want to have anything selected (just remember it only operates on the selected features), and let's calculate the geometry of that: length in kilometers. Cool.

All right, that took a minute to calculate, and then I realized that I made a mistake when adding this field, which you can see over here: I allowed the default type to persist, which was probably a long integer. When you've got an integer, you're going to round to the closest integer, so 0.347 kilometers rounded to zero and 0.652 rounded to one. So let's fix that. I want to make this floating point instead, and I will have to delete the other field later on. Good enough for now. Since I remember that this takes a while, I will do that same calculation and we'll come back when it's done. Length in kilometers. All right, hopefully this one will work. Right, so now we have the actual length in kilometers in here, and I deleted the other field so that I wouldn't get confused later on.

Okay, so now we're off to summarizing, and here's where summarizing is going to be just a little bit different from what you did before, because rather than focusing on count we actually want the sum of all of these things. We want to add up how many total kilometers there are within each of these watersheds. So we're summarizing by the basin name, but our statistics that
we want, basically the numbers that we want to generate, are going to be based on length in kilometers, and GIS by default is selecting the sum. This is where things are slightly different from the way you've done things before. We do need a case field, and we want that case field to be Name. It's coming in by default because that's the one that I clicked on. Let's call this output length. We've got the sum of length within each of our named watersheds. (I think we actually had that case field already; I'm not sure what I was thinking just then.)

It looks like it's here, so let's open that up. Just like with the points, we always end up by default with this frequency, which in this case is the number of road segments. It's not particularly meaningful, because some of the road segments are just continuations of the same road as it crosses a town boundary or changes its name or what have you. So I don't think that frequency is particularly useful, but this piece is pretty useful: this watershed has 78 kilometers of road length. There are going to be some watersheds that have over 3,000 kilometers of road length, and some that don't have much in the way of roads at all.

So now we're going to bring this information into our basins, because ultimately to visualize these things we want it all to end up in the polygons. That means that we need to join from the polygons. Our join field is going to be that Name, and I want to join the road lengths. Good. So now I'm going to do the same process that I've been doing this whole time with my tables. (I don't know why I can't scroll over to the right; I guess I can go this way.) I've got this sum length kilometers field, but again, I'm not going to remember, if I come back a month from now or a year from now, that
sum length kilometers is actually total road length in kilometers. So let us add a field called road length in kilometers, and now I'm going to remember to make sure that this is floating point, because I want to keep more than just the integers here. Just like we've done a few times before, we're going to use our field calculator to carry over that sum of length in kilometers; it should be the same value here to here. And now I'm happy, so I'm going to release my join so I don't have to deal with these repetitive fields.

So, cool, now I have the area of my basins and I've got the road length. I'm going to turn off these roads, because road length was the main thing that I was looking for. Now, rather than visualizing dams per square kilometer, let's visualize road length in kilometers per square kilometer. (I'm having a real problem getting this table to be smaller without closing it, so I'll just close it.)

Okay, so now we can see, and this makes sense, that the eastern part of the state has a lot more urban and suburban areas than the western part, and so we end up with this kind of interesting gradient of higher road density, moderate road density, and low road density as you go across the state. If we didn't normalize it, then the area of those different basins would influence things a fair bit: this is a really huge basin, and so it ends up with a lot of road length. So that's where normalization can give you a different picture of how things look. Again, this is a cartographic choice; it depends what you're trying to emphasize.

Okay, so that's our next segment, combining lines with polygons to get some information about the total length of things in each of these polygons. Next up (not last, because we're going to do raster too) are our polygons to
polygons. Here I have this layer, which is interior forest, created by MassGIS, and we can turn that on. I never like symbology with a black outline because I always feel like it makes it really hard to see things, so let's make these dark green with no outline, and then I think you can see where the forest is a little bit more. We're probably also going to end up with one of those gradients going from the western part of the state to the eastern part of the state with this layer. And if you're interested in water quality, the area of interior forest might be relevant for trying to understand something about a watershed's water quality.

So we're going to do a very similar process to what we just did with the lines, except in this case we're going to do it with these polygons. Again, when we were dealing with the lines and they went across different watersheds, or inside and outside a watershed, we used Intersect to create a new data file that was only the lines inside each of those watersheds. We're going to do the exact same thing with these polygons, because in this case what we want to know is the total area of interior forest. Before we were focused on length; now we want area. Same process, though. So we'll start with our Intersect. Now I want it to be with my interior forest, and I want to call it forest in watersheds. Off we go.

Okay, so that didn't take too long on my computer, because there are actually fewer shapes of interior forest than there were segments of roads. Forest in watersheds: let us change this to a single symbol. But you can see the difference between these things. Because we used an Intersect, remember, all of those attributes get combined, so that's why we had not only the location of all of these forests but also the name, the
area, the road length, and other stuff in here that came from the watersheds rather than the forest. So GIS decided for us to symbolize those based on the density of roads within each watershed, but that is not useful in this case.

What we want now is to repeat the process that we did with roads. There is an area in square kilometers field, but remember that this area came from the watersheds, so we have to be a little careful with what we name the new field so that we summarize the correct thing. We still have our Name. Because we're in a geodatabase we also have a shape length, which usually isn't really meaningful for polygons, but shape area is a thing that we are interested in: this is the area of each individual polygon of interior forest within each of these watersheds. Again, what are the units? These are going to be square meters, because the projected coordinate system, Mass State Plane, is in units of meters, so a geodatabase is going to update it as square meters.

Because I'm working in square kilometers today, I don't want my area in square meters. I'm going to call this field forest area in square kilometers. The reason I'm doing that is to differentiate it from area in square kilometers, the information that was just appended to this data layer from the watersheds. And I want to make sure that I make it floating point so that I get more than just the integers. Just like we've done before, we use our geometry calculator to calculate forest area in square kilometers as area in square kilometers. Okay, good. There are a million square meters in a square kilometer (a thousand times a thousand), so if we move the shape area over six decimal places we should see this result. That's just going to make it easier for me later on if I want to
normalize area of forest relative to area of basin, so that we've got the same units. Okay, close that for now.

Now I am back to my basins. I want to join the name of my basins to the name of, in this case... oops, I'm a step ahead. First we need to summarize this, and we want the output to be called forest in watersheds area. The case field is already there, the name of the basin, and now we want forest area, and we want to add all those up, again just like we did with the road length. That should give me this, and now I'm good and I can do my join, Name to Name, but the table that I want is called forest in watersheds area.

Okay, I do want to check out my attribute table here. Again, a bunch of gobbledygook going on over here. Apparently we've got at least one watershed with no interior forest in it. I'm going to add my field, forest in square kilometers. I love the alias because you can write whatever you want in there. We want this floating point. And remember that the sum of our forest square kilometers is our interior forest area in square kilometers, so I'm just going to copy that over. It should be the same; looks the same. All well and good. We remove those joins, and now we should have a clean-looking data set with our area of interior forest, total road length in kilometers, and our area of the watershed in square kilometers, as well as our count of dams.

So now, rather than looking at road length, I can symbolize forest area in square kilometers normalized by total area in square kilometers. I suspect that we're going to see that same kind of gradient, in this case in reverse, where you have higher density of forest per square kilometer in the western part of the state and lower density of forest in the eastern part of the state, though a couple of watersheds there are looking not too bad. So basically the
reverse of what we saw for road density. All right, so there you have it in terms of polygons with other polygons. The Intersect is key: when you're trying to capture the length of lines or the area of polygons inside some other polygon, you first need to intersect so that you've just got the stuff that's inside.

All right, good, so we are a good chunk of the way along. Let us next look at this wacky layer. This is our topography; I believe it's elevation in meters above sea level, going from sea level (actually below sea level somewhere) to a thousand meters or thereabouts, somewhere up here in western Mass.

Raster data sets we treat a little bit differently from vector data sets. In this case it is not going to be the same geoprocessing tools: we can't intersect these things, and we can't do a spatial join or anything like that, because those only work between vector data sets. Now we need the tool that works from vector polygon to raster. Take a second and see if you can remember the name of that tool while I grab a coffee.

All right, hopefully you thought about that for a second. The tool that we're looking for is something that's going to give us statistics about a raster grid that overlaps with polygons, and that is our Zonal Statistics tool. We want Zonal Statistics as Table, because we do want the output of the zonal statistics as a table. I don't know what the difference between the Image Analyst and Spatial Analyst versions is, but Spatial Analyst is what we usually use for raster, so let's go with that.

Okay, so our input is the feature zone data. This says raster, but that's kind of confusing; you could use a raster grid here too, I guess, if you wanted, but generally your input is the polygons, so that's our basin study area. Our zone field, and this is important (GIS is helpfully suggesting the correct one): we want Name to be the
zone field, because we want this information based on the name of each of the different watersheds. Then our input raster is the topography from the National Elevation Dataset. Let's call the output topo watersheds. Statistics type: All. If you're not sure what you want, then All is good because you'll get everything, but you also have lots of options: the mean elevation within each of these watersheds, the majority (the value that occurs most often), the maximum, the median, the minimum, and on down the list. I'm cool with All, because then we can decide later which of these we actually want. So let's output that table.

Cool, that didn't take too long. Let's see what it looks like. We should still find our 182 watersheds with all of that information. Count in this case is the number of pixels inside each watershed, so it's going to be proportional to area. Area here is weird, I think because the projection of this raster is a geographic coordinate system. Yeah, it's a geographic coordinate system, so any units of area are not actually units of area, because this is square decimal degrees, and because decimal degrees are different lengths when you go east than when you go north, this is not a super meaningful unit. So we'll ignore those. I don't really care how many pixels there are; I already know the area of my basins. But minimum elevation, maximum elevation, range of elevation, mean elevation, standard deviation: all of those are potentially useful pieces of information for understanding more about these watersheds.

So we've got Name and all this lovely information that we want. Just like we've done a few times before, let's bring it back into our basins based on Name, and I want to join to the topo in watersheds table that I
just created. Close this and open this guy, and now once again we should have a bunch of additional information at the end. If you want all of this information, then rather than adding a dozen new fields and copying each one over, the faster way is just to export this data as, you know, basins version two. I'm just going to double check what the coordinate system is; I want it to be the same as my basins, which should be Mass State Plane. It would have done that automatically; I just wanted to double check.

This basins version two should automatically come in now with all of those additional fields attached, as long as you remember that min, max, range, and so on are topography, because max on its own doesn't necessarily mean anything. You can always change the alias on some of these, so you could make it min elevation, etc., on down here, so that you don't have to guess later on about what these things actually are (as long as we make the columns wide enough that we can even see that). And of course, if I think some of these are not useful, I'm just going to start deleting them. Zone code is not particularly useful, and I don't need Name twice. There is a Delete Field tool someplace; if you have a dozen things that you want to delete and you don't want to do it all by hand, then you could just check this, this, this, and get rid of all of them. So that's what I would do to clean this up.

I'm actually going to go back to my original. I'm going to remove this and pretend in this case that all that I wanted was the mean elevation, so I'm going to repeat the process that I did before. I always like to put units in my field names because then I never have to wonder. All right, so I'm just going to field calculate mean over to this guy, so that should be... oops, I see that I made that same mistake of
accepting the GIS default again and chose long integer rather than floating point. I'll fix it, not be lazy here, and actually make it correct. Mean. Okay, that's better. If this was a case where that was the only one that I wanted, then I'm just going to do what I did before: remove all of my joins, and now I have the average elevation of each of these basins.

So let's go back to our symbology, and now in this case I can make it average elevation. That's not something where normalization makes sense; you just want to know, is this high or is it low? You don't care about the area associated with it. Once again, just like our interior forest area, we've got a lot higher elevation as we go west into the Berkshires, getting towards lower coastal plain as we get out towards the Atlantic Ocean, which makes sense.

Cool, so that was it. Those are the four pieces: the reminder of how to get point counts into polygons, how to get line length and polygon area into polygons, and now how to get basic statistics of a raster layer into those polygons. We did all of these for a simple case, where we only had one type of line, just the roads; one type of polygon, just the interior forest; and the raster layer was a continuous one, low to high, as opposed to categorical. Dealing with categorical data is another step on top of this, and that's what you're going to do for the lab report: deal with land cover data as both polygon land cover data and raster land cover data. So we'll do another short tutorial for what to do when you've got not just a single interior forest layer but all the types of land cover, and you're trying to understand information about those within the polygons, and the same thing if you have a raster land cover data set. So more on that next time. Thanks guys, see you in class.
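
The table steps from the tutorial (summarize by watershed name, convert units, join back to the basins, normalize by area) can be sketched outside of ArcGIS in plain Python. This is only an illustration under stated assumptions, not the ArcGIS tools themselves: the `summarize` helper, the field names (`name`, `area_km2`, `shape_length_m`, `shape_area_m2`), and every value in the sample rows are hypothetical stand-ins for the outputs of the spatial join and the intersects.

```python
from collections import defaultdict


def summarize(rows, case_field, value_field=None):
    """Group rows by case_field and return {case: (frequency, sum)}.

    Rows whose case value is None are skipped, much as the null
    watershed names were ignored when summarizing the dams.
    """
    stats = defaultdict(lambda: [0, 0.0])
    for row in rows:
        key = row[case_field]
        if key is None:
            continue
        stats[key][0] += 1
        if value_field is not None:
            stats[key][1] += row[value_field]
    return {key: (freq, total) for key, (freq, total) in stats.items()}


# Hypothetical basins table with an area already calculated in km^2.
basins = [
    {"name": "A", "area_km2": 50.0},
    {"name": "B", "area_km2": 10.0},
    {"name": "C", "area_km2": 25.0},
]

# Hypothetical dams after a spatial join: each point carries a watershed
# name, or None if it falls outside every watershed.
dams = [{"name": "A"}, {"name": "A"}, {"name": "B"}, {"name": None}]

# Hypothetical road pieces after an intersect, with lengths in meters.
roads = [
    {"name": "A", "shape_length_m": 1500.0},
    {"name": "A", "shape_length_m": 500.0},
    {"name": "B", "shape_length_m": 250.0},
]

# Hypothetical forest pieces after an intersect, with areas in square meters.
forest = [
    {"name": "A", "shape_area_m2": 2_000_000.0},
    {"name": "C", "shape_area_m2": 500_000.0},
]

# Unit conversions: 1,000 m per km and 1,000,000 m^2 per km^2.
# Floats are used so that small values do not round to whole numbers.
for piece in roads:
    piece["length_km"] = piece["shape_length_m"] / 1000.0
for piece in forest:
    piece["forest_km2"] = piece["shape_area_m2"] / 1_000_000.0

dam_stats = summarize(dams, "name")                  # frequency = dam count
road_stats = summarize(roads, "name", "length_km")   # sum = total road km
forest_stats = summarize(forest, "name", "forest_km2")

# "Join" back to the basins by name; basins with no match get zero.
no_match = (0, 0.0)
for basin in basins:
    key = basin["name"]
    basin["dam_count"] = dam_stats.get(key, no_match)[0]
    basin["road_km"] = road_stats.get(key, no_match)[1]
    basin["forest_km2"] = forest_stats.get(key, no_match)[1]
    # Normalizing by basin area gives densities that are comparable
    # between large and small watersheds.
    basin["dams_per_km2"] = basin["dam_count"] / basin["area_km2"]
    basin["road_km_per_km2"] = basin["road_km"] / basin["area_km2"]
    basin["forest_fraction"] = basin["forest_km2"] / basin["area_km2"]

for basin in basins:
    print(basin)
```

Mean elevation is left out of the sketch on purpose: as noted in the tutorial, it is not something you would normalize by area, so it would simply be carried over as its own field.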
Info
Channel: Bethany Bradley
Views: 539
Rating: 5 out of 5
Id: PIRIgdDRcR8
Length: 51min 5sec (3065 seconds)
Published: Mon Feb 15 2021