Mapping with R -- lat/long plotting; Choropleths with sf or tidycensus

So from here, what I want you to do is right-click on this link, "code for hands-on workshop fall 2018," and open it in a new window. You should see a GitHub page like this. If you use GitHub, you'll probably know how to pull this repo down. If you don't use GitHub that much, just click on the green button: you'll get a drop-down menu, and you can click Download ZIP. That downloads all of the code we're going to use today. You can put it anywhere you want and open it up as a project in RStudio as you see fit. The main reason we're doing that (and I'll do it with you) is that I want you to run the packages file first. As I do that, let me note what it does: it ensures that you have all the libraries we're going to use today. This should not cause any problems on your computer, but if you have some concern about the versions of the packages you're running, maybe you don't want to run it. I would find that really strange, and I can't imagine why that would be the case, but I feel honor-bound to at least mention that this may update some packages.
So from this expanded view of what we just downloaded, you can double-click on map - fall 2018, which is the R project (.Rproj) file. Even if you're on Windows, that should work automatically and launch you straight into a project in RStudio. I don't know if you use projects in RStudio, but the nice thing about them is that they let you reference all of your data and your scripts from a relative position. In other words, you don't need really long working-directory paths, and you don't have to use the setwd() function, which, to be fair, is not particularly reproducible. That's one of the reasons why we promote projects.
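A minimal sketch of why project-relative paths travel well between machines. It is written in Python with pathlib rather than R; the workshop itself relies on the RStudio project for this. The data/ folder and the starbucks.csv file name are hypothetical stand-ins for the repo's layout.

```python
from pathlib import PurePosixPath

# Hypothetical layout: every data file lives in <project root>/data/
DATA_DIR = PurePosixPath("data")

def data_path(project_root: str, filename: str) -> PurePosixPath:
    """Path to a data file, built from the project root instead of a setwd() call."""
    return PurePosixPath(project_root) / DATA_DIR / filename

# Two people unzip the same repo in different places...
alice = data_path("/home/alice/map-fall2018", "starbucks.csv")
bob = data_path("/Users/bob/Downloads/map-fall2018", "starbucks.csv")

# ...but the path inside the project is identical, so the script runs unchanged.
print(alice.relative_to("/home/alice/map-fall2018"))  # data/starbucks.csv
print(bob.relative_to("/Users/bob/Downloads/map-fall2018"))  # data/starbucks.csv
```

The script only ever names the part of the path inside the project, so nothing machine-specific needs to be edited.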
So I'm going to launch it, and you're going to open the 00 packages .Rmd file. The easiest thing to do from this point is to click the green arrow on the first code chunk, the one that says the following packages enable code in the workshop. I'm going to run that while we're talking and let it do its thing. Then I'm going to run a different one on my machine, because my machine needs some other patches since we're in the lab.
Alright, the first student who walked in (I'm sorry, I didn't catch your name): I said that by the law of probabilities I was betting you're a grad student in the Nicholas School, and he agreed that he was. We're a little smaller group than I was expecting, but I'm still going to bet there's another Nicholas School grad student in this room. Is there? No? There we go, three of them. And you're either public policy, medicine, or engineering? Nope, wrong on all three. Oh well, you can't do so well with probabilities when n equals four. Grad student? Undergrad? Oh, great. And what's your project, just out of curiosity? Super. There's an interesting visualization in our lab; you've probably seen it? Yeah, it's really kind of cool. Just to bring everybody else up to speed, that visualization takes satellite images and shows before and after mountaintop removal, and how that's changed the landscape. It's really quite fascinating.
So I'm going to go back to this page and take a look at the slides. I will say up front that I am much more of an R enthusiast than anything else; I'm definitely not a geographer. I know how to make a lot of maps and do some geospatial analysis in R, but if your question gets moderately advanced, that's when Mark is going to show back up. Mark is our GIS specialist. There's plenty about GIS that I don't know, but that should not be a problem in any way: this class is mostly introductory,
but if we need to move farther along on a particular question of yours, that's when I would encourage you to either make an appointment or come see me in my open office hours. If I didn't say it, I'm usually in the lab on Wednesdays from 1:00 to 3:00. This is part of the Rfun series; we do workshops on other things in R as well as other topics, but today we're primarily going to cover mapping. It's really handy to have some sense of how the tidyverse works and how ggplot works in order to take this workshop. If you don't, it's no big deal; you can look at these background resources and catch yourself up later.
One of my colleagues, one of the other GIS specialists, Drew Keener, was really keen on calling this "mapping" as opposed to "GIS," because he had a strong sense that we weren't actually doing any particular analysis. I think that's debatable, though it's certainly light on the analysis. The real point of this slide and Drew's comment is that, to some degree, doing spatial analysis in R is still relatively new. The behemoth application in the mapping and GIS space is ArcGIS, which is a proprietary tool. It's free on campus, but it's not going to be free when you get out into the world. There's an open-source cousin to it called QGIS that a lot of people use, and you may find that more people use those tools for mapping than R. If you're really on the lighter side, there's Tableau, and if you're a programmer you could also use Python. But we're going to focus on R, specifically the three packages in bold: leaflet, tidycensus, and sf. There are a couple of other packages listed there that we're going to end up using. There are also some more advanced mapping packages that we're not going to use, but it might be useful to reference these slides and note that they exist. The one that jumps out at me is raster. I don't personally do any raster analysis, but
if you are one of those people, you're probably going to end up using that package.
So we're going to jump right in and do three things. First, we'll make an XY plot, where we're just plotting latitude and longitude on a grid. Second, we'll use a tool called tidycensus to create a choropleth (a choropleth is basically a thematic map). Third, we'll go beyond tidycensus to get more fine-grained control of our thematic mapping. I hope this is not too basic for you, but I never know who's showing up.
This particular map is what we're going to create next. In our case it really consists of just two layers. The layer we're adding on top is all the little blue markers, which are the XY coordinates, the latitude and longitude. Then there's a base layer, which is really a conglomerate layer. If you look at that base layer closely, you can see there's a road layer. There's a labels layer, for example, where you get "Durham," and there's clearly a water layer, where you get waterways. There also appear to be some land characteristics in the green; I don't know what the green is. There's clearly a whole conglomeration of layers packaged up as one single base layer. That's really nice, a real convenience, but you don't always get that base layer; it depends on how you end up visualizing your map. I wanted to point it out because one of the central ideas, not only of GIS visualizations but of visualizations in general, is that you build layers up on top of each other, and we're going to end up doing that. If anybody here has used ggplot, you're familiar with the idea of layering your images.
Okay, jumping right in: let's go to our downloaded files in RStudio. I need to run one more package, and while that's running we're going to open the 01 file, 01 georeferenced .Rmd. As soon as I can, I will increase the font size on this so
it's easier for you to see. It's always a bit of a luck of the draw with fonts when you're putting them through the projector, so some audience participation will be helpful: is that font large enough for you? Okay. I can also change the contrast, for example by switching to a dark screen, so just let me know if it's difficult to see and we can change it.
The particular coding style we're using here is called literate coding, and it works really well with R Markdown. I'm not leveraging all of it, but there's a workshop on R Markdown and literate coding if you want to learn more. What it allows you to do is have code chunks surrounded by prose, where you provide more literate commentary on what you're doing.
So the first thing we're going to do, if we just execute this first code chunk, is make sure that we have the two libraries we need: tidyverse and leaflet. Then we go down here and load some data from the data directory. The data we're loading is all the Starbucks locations in the US in 2012. Down here at line 26, we execute this code chunk, and we can take a quick look at this data frame. I'm going to pop it out to make it a little easier to see; you don't have to do this. Here's our data frame of Starbucks information. What we're really interested in is the latitude, the longitude, and the name. The name vector is right there; it's just the name of each store. If we scroll over, this data set is almost an embarrassment of riches when it comes to being able to geolocate. Under location we have not only a full street address but also latitude and longitude. If you keep going, it has the street and everything broken out into its own fields, and then coordinates with latitude and longitude, and
then it actually has latitude and longitude broken out again. This almost never happens in real life; usually your data is far harder to pull together. It just so happens that this works really well for a workshop, and if you need help wrangling your data, we're happy to help you. But the data we really want to grab is these latitude and longitude columns.
So that's starbucks_nc, this data frame right here. The first thing we're going to do is send it to leaflet. These pipes are magrittr style. I'm going to stop right here and run just these first three lines, which I'll do by highlighting them and hitting Ctrl+Enter. What you see is a very general base map, the leaflet base map. As I zoom in, it should be no surprise that I get more detail. I can also set the view and the zoom by adding the fourth line, so I'm going to Ctrl+Enter on that.
To see how setView works, look at the help (let me pop this back in for just a second). Highlight setView, and on my machine I hit F1. I'm sure there's a Mac equivalent, but I don't know what it is. F1 opens the documentation for that function, and the setView page tells you there are a number of ways you can set the view. So that's how you can learn more.
Back to our code (oops): let's run these four lines again with Ctrl+Enter. We're setting the latitude and longitude of the center point and the zoom level, and together those define the window. The very last thing we're going to do, taking from this data frame up here, is add markers. Again, in the F1 help under addMarkers, there are a ton of different ways to add markers and other features: you can add tiles, circle markers, circles, rectangles, and the list goes on. It's good to know about all of those. We're just going to add basic markers.
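A language-agnostic sketch of what the map so far amounts to: a view (a center point plus a zoom level) and a layer of point markers. Each marker has a latitude, a longitude, and optional popup text. The sketch is written in Python and builds plain GeoJSON, so it is not the leaflet R API. The three rows are made up, and using the mean of the points as the center is just one simple choice. Note that GeoJSON (RFC 7946) lists coordinates as longitude, latitude, the reverse of the usual spoken order. Mixing the two up is a common source of misplaced markers.

```python
import json
from statistics import mean

# Made-up rows standing in for starbucks_nc: name, latitude, longitude
rows = [
    {"name": "Store A", "latitude": 36.0, "longitude": -78.9},
    {"name": "Store B", "latitude": 35.9, "longitude": -79.0},
    {"name": "Store C", "latitude": 36.1, "longitude": -78.8},
]

def to_feature(row):
    """One marker: a GeoJSON Point whose popup text comes from the name column."""
    return {
        "type": "Feature",
        # GeoJSON coordinate order is [longitude, latitude]
        "geometry": {"type": "Point", "coordinates": [row["longitude"], row["latitude"]]},
        "properties": {"popup": row["name"]},
    }

markers = {"type": "FeatureCollection", "features": [to_feature(r) for r in rows]}

# One simple way to pick the view center: the mean latitude/longitude of the markers
center = (mean(r["latitude"] for r in rows), mean(r["longitude"] for r in rows))
zoom = 10  # larger zoom numbers show a smaller area in more detail

print(json.dumps(markers["features"][0]))
print(center, zoom)
```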
For those markers, we identify the latitude with the latitude column in our data frame and the longitude with the longitude column. I'm switching my syntax between the two just so you can see that there are different ways to do it. I'm also taking my popup and identifying it in my data frame as the name vector. Just to bring that home a little: here's the name vector in starbucks_nc, and the popup argument does probably what you imagine. I'm going to run this one more time. Now my latitude and longitude markers are set, and if I click on any one of them, that's where I get my popup. Of course, the popup gives you a lot of control over what you put in there, but it's drawn from what you have in your data frame to begin with.
Alright, so that's step one. What I'd like to do now is take five minutes or less (we can decide as a group): go back to your Files tab and open the file that says exercise 1 XY leaflet .Rmd.
About the tilde: I covered it so briefly that I forgot to say more about it. It's really just a shortcut; it's the same as writing starbucks_nc$. You could do either one: instead of the tilde I could put this here, and instead of this I could put the tilde, and it will still work. I just put it there so you are aware that there are different ways. Should it work without either? I think it won't, and I don't know if that's because it's leaflet or what, because all it's doing is referring back to what you're piping in. Let's blame it on leaflet: I have an undue amount of respect for the whole tidyverse world, so I'm going to say that if a tidyverse developer had built this, you wouldn't have to do that.
Oh, that's a great question. We're going to cover that a little bit more, but it's this addTiles feature, and later in the workshop we'll manipulate that base layer to choose a couple of different ones. And you can actually, it's
also worth noting that if we just comment out that one line, we're still plotting our XY, but the context is harder to understand. Going back to that concept of layering: what if we didn't want that whole conglomerate base layer, and we just wanted rivers? Then we'd have to go out and find a river layer, and the two might make sense together. Probably not Starbucks and rivers; maybe highways and burglaries, or highways and Starbucks, would make more sense. There are all kinds of bad scientific assumptions you can make about the locations and frequency of Starbucks. For example, if you projected from this map, you might assume that farmers don't like to drink coffee. That would be wrong. I think farmers probably drink a lot of coffee; they just don't drink it at Starbucks, because there are no Starbucks in rural areas, since a store needs a certain population size around it. Did we do population? We can do that: we can do a choropleth of population. Oddly, if you go back up here, I like the fact that the very first Starbucks listed in this data frame is in Elizabeth City. I don't know if anybody's ever been to Elizabeth City; it's a nice town, a small town.
Okay, I think we've probably spent enough time on this, but I'm happy to let you guys drive. First question: does anybody want more time? Alright, we're going to move on. Once again, the answer file answers everything we just did. Different classes want to do things differently: some ask me to go over the answers, and some say let's move on. Show of hands, how many people want to go over the answers? Okay, perfect. So we're going to go on to choropleths.
There are lots of kinds of maps one could make, and we're only going to cover XY plots and choropleths, but if you want to get into other kinds of maps, you can talk with Mark, Drew, or me later. In case you don't
know what a choropleth is, the idea is that you're shading a region or an area by a variable. There are some problems with choropleths. This particular choropleth shows average wages from the Bureau of Labor Statistics by region, and the regions in this case are states, so the darkest blue is the highest wage. One of the typical problems with choropleths is that the regions are not all the same size, so they don't necessarily represent the same population. You often have to dig in a little deeper to really understand what a choropleth is trying to tell you. As with all visualizations and all statistics, you can lie blatantly with these and create all kinds of misleading maps. On the positive side, choropleths can tell you a lot. This one shows wages for a particular occupation; come to think of it, I'm pretty certain it's wages for something like overdose-clinic nurses. I don't know exactly how to interpret New Jersey in this case, but it is clear that New Jersey and California have the highest wages, and the lowest-wage places are shown in yellow.
Just for another point of explanation: I had actually reversed the scale so that the colors go from yellow to dark blue. My visualization colleague Eric Monson tells me that typically in visualization you want the brightest color to represent the highest number. I reversed it because I thought it was easier to interpret the other way. So there's some responsibility on the part of the person reading the visualization to figure out what we're trying to say, but that's the standard convention. So that's a choropleth.
What we're going to do now is use tidycensus, an R package for working with US Census data, to learn a little bit more about choropleths and about the US Census.
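A small illustration of the unequal-regions problem, using made-up numbers for three hypothetical counties. Ranked by raw count, the most populous region comes out on top; ranked by a rate per 100,000 residents, a different region does. This is a Python sketch. Mapping a rate instead of a count is a common remedy for choropleths, not necessarily a step the workshop code takes.

```python
# Hypothetical regions: (name, count of some event, population)
regions = [
    ("Big County", 5_000, 1_000_000),
    ("Mid County", 1_200, 150_000),
    ("Small County", 300, 20_000),
]

def per_100k(count: int, population: int) -> float:
    """Rate per 100,000 residents, so regions of different sizes are comparable."""
    return count / population * 100_000

# The region with the largest raw count vs. the region with the highest rate
by_count = max(regions, key=lambda r: r[1])[0]
by_rate = max(regions, key=lambda r: per_100k(r[1], r[2]))[0]

print(by_count, by_rate)  # Big County Small County
```

Shading by `count` would make the biggest county the darkest; shading by the rate tells the opposite story for the same data.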
There are a couple of things you need to know. First, the census really is at least two things. There's the decennial census, which is taken every ten years, starting in 1790, and which is in the US Constitution. Then there's the ACS, which is relatively new, going on fifteen years at best, maybe not even that long, and it's sampled data. The other thing you need to know about when working with census data and mapping is the concept of census geography. So let's cover both of those.
This is the decennial census. Like I said, it's done every 10 years, and it's 100 percent of everyone in the country. I believe it's everyone; I don't know how they handle homeless people, but there's an attempt to count everybody. It's used for creating congressional districts and for other kinds of policy decisions, but it's just counting things; it's not particularly informative about social characteristics. That's where the ACS comes in. Just a smattering of people in any given year are asked to answer the American Community Survey. The same questions are asked over a five-year run, and the Census Bureau does some statistical magic to prevent you from identifying any particular individual. It then releases trend information about people's social characteristics. I'm guessing nobody other than Mark and I remembers, but back in 2000, before the American Community Survey, there was something called the long form. If anybody ever filled out a census long form, you'd have a better sense of how this works. Is that right? I think we might have gotten one too. So there are those two different types of census. It can be confusing, because sometimes people want a lot of social-characteristic information that simply does not exist in the decennial census, so you have to know where you're headed.
The other thing is that you
need to know the concept of census geography. Some of these levels are pretty straightforward. You can get the total number of people who live in the nation, and there's the idea of region and division. States are pretty easy: how many people live in each state. Then counties: we are not only in the city of Durham, we're also in Durham County, and our neighbors to the west, Chapel Hill and Hillsborough, are in Orange County. So you can get all that kind of information.
This is where I think it gets a little more confusing but also interesting. Let's start from the bottom. A census block is a little bit like a city block. I say "a little bit like" because that really only holds true in urban areas. There are census blocks in rural areas as well, but there are no city blocks there, so the analogy doesn't quite work. Think of it this way: the block is the smallest unit by which the census counts. Block groups are a little bit like neighborhoods; a block group is a collection of blocks. Census tracts are a little bit like regions of a city; a tract is a collection of block groups.
So you have to identify a variable, which is either going to be in the ACS or in the decennial census, and then you have to identify the geography of the reporting unit. You'll notice all these other colors. We can mostly ignore them, but they're different shapes and sizes of geography that you can report out on, and the level of detail available varies. Today we're going to stick to a very high level, like total number of people in states, which is pretty easy.
Just by way of information, this is an example of regions: the regions are in color up there. This is a nominal notion that doesn't have any real importance for us, but you can see both the regions, in color, and the state boundaries, in black and white.
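The block, block group, tract, county, and state nesting described above shows up directly in census GEOIDs. In the Census Bureau's scheme, a 15-digit block GEOID concatenates four codes: the 2-digit state FIPS code, the 3-digit county code, the 6-digit tract code, and the 4-digit block number. A block group is identified by the first digit of the block number. The Python parser below sketches that scheme. The codes 37 (North Carolina) and 063 (Durham County) are real FIPS codes, but the tract and block digits in the example are made up.

```python
def parse_block_geoid(geoid: str) -> dict:
    """Split a 15-digit census block GEOID into the GEOIDs of its enclosing geographies."""
    if len(geoid) != 15 or not geoid.isdigit():
        raise ValueError("expected 15 digits: SS CCC TTTTTT BBBB")
    state, county, tract, block = geoid[:2], geoid[2:5], geoid[5:11], geoid[11:]
    return {
        "state": state,                                     # e.g. 37 = North Carolina
        "county": state + county,                           # 5-digit county GEOID
        "tract": state + county + tract,                    # 11-digit tract GEOID
        "block_group": state + county + tract + block[0],   # 12-digit block group GEOID
        "block": geoid,                                     # the full 15-digit block GEOID
    }

# Made-up tract and block digits inside Durham County, NC (37063)
print(parse_block_geoid("370630001011000"))
```

Because each level is a prefix of the one below it, you can roll blocks up into block groups, tracts, or counties just by truncating the GEOID.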
I mentioned that there's this other issue: after you identify the geography (or before), you also need to identify a census variable. How many people live in region X? You can actually get all kinds of information: how many housing units, how many householders have an education level beyond high school, all that kind of stuff. To be fair, there are literally thousands of census variables. I'm going to open this bottom link in a new tab to give you an idea. We're not going to go into this in great detail, but there are three different places where you can read more about how to identify your census variables, or you can send us an email. Mark in particular has an encyclopedic knowledge of the census. He doesn't like it when I say that, but he knows more about the census than anybody else in the library, and more than a lot of other people. It also goes without saying that just because you think there's a census variable for something doesn't mean there is, so sometimes you have to find proxies. Just to bring that idea home, when I said there are literally thousands of census variables: this is a list of ACS (American Community Survey) variables for 2015. You can see this is an exhaustive list, and so far I've probably scrolled through less than 10% of it. It can be a challenge to figure out the census variable you want, but I picked out a few for you.
We're going to do that now in... oops, I went a little bit too far; I wanted to do this ggmap georeferencing one. Let's go ahead and go to the choropleth first, and then we'll come back. We're going to open this file, 02 choropleth .Rmd, and first run the code chunk at line 9. Oh no, I don't have tidycensus in here. I thought I had it; I know I have this problem with my setup, and I'm not sure what I did wrong. Yeah, that's right there. Maybe I just needed to reinstall it. What I meant to say was, maybe I
needed to restart it. So I'm going to run this again real quickly. Again, this is a problem that only exists in this lab; you probably should not have it on your own workstation. I'm going to go over here to Session and restart R one more time, then go over here, and there we go: the same "was installed by a different version" message. Alright, one more time. What in the world is happening? Namespace load failed. I think it's saying the package rappdirs was installed by a different version and needs to be reinstalled with this version. I just did that. Alright, let's try reinstalling tidycensus. Oh my gosh, I don't know what to do now; I've never had this problem before. I'll tell you what I'm going to do: I'm going to shut this down and open it back up. Okay, I know what to do here. Let's see. Murphy's Law of live coding examples is that something's going to break, but I am not going to let Murphy defeat this workshop.
(A participant: "What's really strange is I'm having a similar problem right here, but with a different library.") Sorry, what about it? Yeah, I don't know what that means; you could reinstall. It's something to do with the version right here, like units, one of its dependencies. Did you try install.packages? It seemed to install it? And then you tried library()? So it says it installed successfully, but there's some version issue. Sadly, we're not going to be able to solve all the installation problems at this moment, but I'm happy to work with you today in open office hours to make sure you get through it.
Let me go ahead and make this larger. Okay, so run all the libraries, and then the next thing we're doing, oh right, I need that. This is where you need to run your census key, if you brought it with you. There are a number of ways to identify your census key. I'm going to turn
this back into a code chunk by putting in the curly braces. Do you guys see what I'm saying? Sorry, you can't see it yet; there we go. A second ago this block looked like this (let me see if I can return it to what it was). Put braces around it: just highlight it and add the, I don't want to say squiggly brackets, curly braces. Technically they're called curly braces. Put them right there and remove the space, and that makes it an executable code chunk. Then, in the place where it says "your API key goes here," put the key you would have brought with you. It's a whole series of crazy letters and numbers. Go ahead and put that in. The API key, by the way, is free, and without it this won't work. The other thing you might want to do is put a comma after the key and add install = TRUE (again, this is all in the documentation). If you do that, the key is installed into your environment, so you never have to do it again. Yes, this is specifically for the census. I've already done this on this machine, and I don't want my key exposed in the recording, so I'm just going to move on. If you're doing it, we're going to use the key in just a second; in fact, we're going to use it right now.
We're going to use this get_acs function, and this is where some of the stuff we just talked about comes into play. County is the census geography level, so we're asking for counties. We don't actually want county-level data for the entire US, just North Carolina, so I limited the state to NC. A lot of times people are interested in smaller chunks of data, and this way you can very easily gather the chunk that's usable. The variable is one I pulled out for you.
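tidycensus builds and sends the request for you, but it can help to see roughly what a county-level ACS query looks like. This Python sketch assumes the Census Data API's URL pattern. That pattern has three parts: a dataset path such as acs/acs5, a get list of variables with E and M suffixes for estimate and margin of error, and for/in geography clauses. The sketch only builds the URL and does not send it. It reads the key from a CENSUS_API_KEY environment variable rather than pasting the key into a shared script. I believe that is also the variable name tidycensus writes to your .Renviron when you use install = TRUE, but treat that as an assumption. The year 2016 is a placeholder, and B01003_001 is the ACS total-population variable.

```python
import os
from urllib.parse import urlencode

def acs_county_url(variable: str, state_fips: str, year: int, key: str) -> str:
    """Build (but do not send) an ACS 5-year request for one variable in every county of a state."""
    params = {
        "get": f"NAME,{variable}E,{variable}M",  # E = estimate, M = margin of error
        "for": "county:*",                        # every county...
        "in": f"state:{state_fips}",              # ...within one state
        "key": key,
    }
    base = f"https://api.census.gov/data/{year}/acs/acs5"
    # Leave ',', ':' and '*' unescaped so the query stays readable
    return base + "?" + urlencode(params, safe=",:*")

# Read the key from the environment instead of hard-coding it (placeholder if unset)
key = os.environ.get("CENSUS_API_KEY", "YOUR_KEY_HERE")
print(acs_county_url("B01003_001", "37", 2016, key))  # 37 = North Carolina
```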
At one point I had noted what that variable is; I think it's population. The other argument is geometry = TRUE. With that, it also pulls down the polygon geometry, essentially what you'd get from a shapefile. If you're used to ESRI products in particular, shapefiles hold those polygon shapes that you can make choropleths out of, for example the states. If you don't say geometry = TRUE, all you get is the statistical variables, which is just fine, but if you want to make a map out of it, you have to say geometry = TRUE.
So I run that. The last thing I'm doing is a little trick: I view the result with a tidyverse function called as_tibble. You can view nc_pop without as_tibble, but this way it's a little easier to read; it throws it into this page view. What we're looking at is every county name, plus a few other columns. There's the GEOID, which we're not going to use. There's the variable name, which is redundant: it's just the variable we requested right here. Then there's the estimate, which is the value that came back, so it should be population. If you spend a little more time with tidycensus, you can rename that variable as you bring it down, so it gets a more meaningful name. The moe column is the margin of error. Statistically speaking, we should always be working with the margin of error, but we're not today. And the very last column over here is the geometry, our whole collection of shapes. It shows a very weird value, but basically it's telling us that this is an S3 R data object. Quite honestly, we don't really care about that. It's nice that it handles all of this for us, and that's all we need to know: it's just a collection of little lines that make a polygon, in this case a county shape.
If you've ever worked with the older way of doing mapping in R, which was called sp, one really good reason to use sf instead is that sf was developed by the same person.
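The moe column is easy to skip, so here is a small sketch of what it means, in Python with made-up numbers. ACS margins of error are published at the 90 percent confidence level. That means estimate ± moe is a 90 percent interval, and moe / 1.645 is the standard error. When you add estimates together, the Census Bureau's approximation for the combined margin is the square root of the sum of the squared margins.

```python
from math import sqrt

Z90 = 1.645  # ACS margins of error are published at 90% confidence

def interval(estimate: float, moe: float) -> tuple:
    """90% confidence interval implied by an ACS estimate and its margin of error."""
    return (estimate - moe, estimate + moe)

def standard_error(moe: float) -> float:
    """Standard error recovered from a 90% margin of error."""
    return moe / Z90

def moe_of_sum(moes: list) -> float:
    """Census Bureau approximation for the margin of error of a sum of estimates."""
    return sqrt(sum(m * m for m in moes))

# Made-up county estimate and margin of error
est, moe = 50_000, 1_200
print(interval(est, moe))            # (48800, 51200)
print(round(standard_error(moe), 1))
print(moe_of_sum([300, 400]))        # 500.0
```

If two counties' intervals overlap heavily, a choropleth that puts them in different color classes may be showing noise rather than a real difference.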
The sp package had gotten old, and in his opinion it wasn't as useful anymore, so he rewrote it and called it sf. There may be some reasons why you run across sp objects and still need them. But you shouldn't worry about sf being the new kid on the block: it was designed to work with the tidyverse approach, it was designed to be more modern, and it probably does everything you need.
Alright, moving on. Here's what we're going to do: make a choropleth. What we're feeding into the colorQuantile function is this estimate vector; just to roll back up to our data frame, it's this vector right here. What we're trying to get is this collection of colors. Before I do any mapping at all, I need the colorQuantile function to know how to create the cuts. All that's going on in colorQuantile is this: for the domain, I give it estimate from the data frame nc_pop. It cuts that into ten bins and uses another library called viridis, which gives me those nice yellows and greens. That's all it does. You can manually set the colors to anything you want. A lot of people are familiar with ColorBrewer, and a lot of people may also be familiar with viridis; they do essentially the same thing, and I like viridis a little better. The value of both is that their palettes generally tend to be friendlier for colorblind people. Viridis in particular is also really good for printing: even though the colors are very brilliant, if you print to a black-and-white printer you can still distinguish the shades very well. So that's all that's going on there: we're setting up a palette function to produce those colors, and we apply it down here. Going back to our data frame, nc_pop, the next thing I'm going to do is transform it to a coordinate reference system.
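Here is roughly what colorQuantile() with ten bins does to the estimate vector. It finds the cut points that split the values into deciles, then places each value in the class it falls into, so each color covers about the same number of counties. This is a Python sketch using the standard library, with made-up values. It is not leaflet's implementation, and leaflet's exact cut-point rule may differ. A palette such as viridis would then map each bin index to a color; the labels below are placeholders, not viridis hex codes.

```python
from bisect import bisect_right
from statistics import quantiles

def quantile_bins(values, n_bins=10):
    """Assign each value to one of n_bins quantile classes (0 = lowest)."""
    cuts = quantiles(values, n=n_bins)  # n_bins - 1 cut points
    return [bisect_right(cuts, v) for v in values]

# Made-up county populations: 100 evenly spaced values from 1,000 to 100,000
pops = [1_000 * i for i in range(1, 101)]
bins = quantile_bins(pops)

# Placeholder palette: one label per bin, standing in for real colors
palette = [f"color_{i}" for i in range(10)]
colors = [palette[b] for b in bins]

print(bins[:12])  # → [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
```

Quantile classes guarantee evenly filled colors, but the value ranges behind each color can be very uneven. That is another reason to check the legend before reading a choropleth.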
reference system by using... well, we're going to talk about various examples of setting the CRS projection; in this case, this is the way you set it to this projection. So I'm going to give you my very rudimentary, quick explanation of projections. You don't always have to set them, but you set projections because we're looking at two-dimensional, flat representations of a globe, which is round. So, geometrically speaking, the reason why Canada looks so huge is not only that it is huge. As that round surface gets up toward the pole and you flatten it out, it stretches Canada, so it looks disproportionately bigger than it really is. That's my explanation for why you need projections. It's really quite fascinating, but at the same time I don't think that's really the point of this workshop. So just know that there are different kinds of projections you may want to choose depending on where you're mapping. The projection you would set for North Carolina would be entirely different from the projection you might use to represent something in Australia. You can look up which projection to use and where.

All right, so if we just start with these three lines, we're going to get almost nothing: we have a projection and we have no base map. So the next thing that's going on here answers your question from earlier about setting the base map. Earlier we were setting a base map with addTiles, so I could just run this function, just add those three lines, and I'll end up with the base map that we're pretty familiar with because we've looked at it already. It has a lot of useful layers, but we might not want that base map. So in this case I'm using a base map called Stamen Toner Lite. Now, an important question is: how do you know which base map to use? If you get the help on addProviderTiles (I'm going to hit F1 here) and go back down here under provider, there are two very nice links that you can essentially follow to
find what base maps are easily available to you. I'm going to click on one of them, and you can just sort of scroll your way through each one of these options until you find one that you like. It's a little more complicated than that, because most of the base maps provided through here are free, but some of them are only free if you register for them. It has to do with how APIs keep track of who's using their services: since they're providing you a free service, they want to know a little bit about who you are, and they want to make sure that you don't overwhelm their system. But there are several base maps on here that are completely free and don't require registration, and the largest collection of those that seems to be used broadly is the Stamen versions. So this is Stamen Toner, and there's a nice one called Stamen Toner Background. There's Stamen Watercolor (not sure what you would use that for, but it's pretty), and there's Stamen Terrain; you can figure those out. Somewhere down this list I found one called Stamen Toner Lite, and I kind of like it, so I'm using it.

So then the last thing I'm going to do is fill my polygons. I'm going to use the addPolygons function to fill my polygons with that quantile setup over the estimate range, which I did (it's not on the screen, but I did it right here in line 55). So I'm calling map palette down here, identifying it right here, and again using estimate, just being redundant. The other thing that's going on here, just so you know: when we used popups to begin with, with the NC Starbucks data, we put just the name vector in the popup, and that name vector was perfectly easy to read. In this particular data set, NC pop, the name vector has too much information in it. You can very much ignore this, but this is just a regular expression that defines which part of that vector I'm pulling out. You can process your vectors any way you want.
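The workshop script's exact regular expression isn't shown, so the following is only a minimal base-R sketch of that popup-label cleanup. It assumes the NAME column follows tidycensus's usual "<County> County, North Carolina" form, and the pattern used here, which drops the first comma and everything after it, is an illustrative choice. A common stringr equivalent is `str_extract(NAME, "^([^,]*)")`. The county strings below are placeholders, not downloaded data.

```r
# Placeholder county names in the "<County> County, <State>" form
# that tidycensus returns in its NAME column.
county_names <- c("Alamance County, North Carolina",
                  "Durham County, North Carolina",
                  "Wake County, North Carolina")

# Keep only the part before the first comma, e.g. "Durham County".
popup_label <- function(name) {
  sub(",.*$", "", name)
}

labels <- popup_label(county_names)
print(labels)
```

In a leaflet pipeline, the same expression can be passed as a formula so it is evaluated against the map's data, for example `addPolygons(popup = ~ sub(",.*$", "", NAME))`.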
So if that's confusing to you, you can ignore it, but essentially what I'm saying is: I don't want my popup to say "Alamance County, North Carolina"; I just want it to say "Alamance County." That's all that's happening there. Depending on how you've processed your vector, you can do that any way you want. All right, so I'm just going to run these first bits here, leaving off the legend for a second. So this is my Stamen Toner Lite background as a base map, and this is my choropleth, and again the colors are set up based on the estimate using the viridis library. Now, we also talked about layers, so you can see that you can add a legend. Highly recommended, by the way; it's hard to know what you're looking at without a legend, and the nice thing about the leaflet map is that it will define this aspect of the legend for you as well.

Okay, now let's see if there's anything else. Oh yeah, we can also... let's just keep going. We read in the Starbucks data; I just want you to know that you can do similar stuff as before. So if I run this, what's going on here is that I put my Starbucks locations into the counties map as a layer, the XY plot over top of the population choropleth. That goes back to that thing you referenced earlier: it sort of looks like people in rural areas don't drink Starbucks, or it looks like corporate businesses work hard to be in population centers. It depends on how you look at it, how you want to tell the story, and what's more accurate. So that's probably enough for now. Let's go ahead and hit the exercise. I'll show you one thing, and then we're going to move on to the third exercise. Well, there's this package called ggmap, which is definitely on the mapping side rather than specifically the spatial analysis side, at least in my opinion. Now, you can extend ggmap a long way, and if you know ggplot, it will help you use ggmap without too much effort, in that a lot of the layering and the connector, the plus symbol, is the same, and
you can definitely use ggmap to make some interesting layers. But what I think it's exceptionally good at is making quick, simple static maps. That is actually a use case that sometimes shows up, because people need to either print maps in a publication or make a poster, so we're just going to cover it real quickly. I'm going to read in the Starbucks data like I did before and limit it to North Carolina. And what I want you to note (actually, I'm going to go back over to this website, under "plot coordinates, static map," to show you) is what people were doing. Basically, you could say something like get_map("Durham, North Carolina"), set the zoom level, and then, under map type, the easiest thing to do was to use one of the known values, right? So terrain, satellite (I wrote that before), hybrid, and then something like this. And then you could overlay heat maps or XY plots or whatever on top of that. It was a very good way to get a quick static map for use in some kind of publication or printed thing. A lot of what we've been doing uses leaflet, which has a very nice interactive layer to it, but in order to share that with others you need to send them an HTML file, which is not hard, or mount it on the web, which is moderately more complicated; this is sort of a different use case. Now, sadly, Google completely rewrote their API about eight months ago. You can still do this, but you have to go through some of Google's documentation on APIs, and while that is not hard, it's more tedious than it is trivial. You just have to read a bunch of documentation, and you have to put in a credit card number that they never end up using. They basically give you this huge amount of free credits; it's all just to make sure that you don't run amok. So you can still do this if you want a Google map, but the easiest thing to do now is to switch over to something like Stamen maps. So since
we're talking about publications, we're going to use this. I'm going to show you the Stamen map example. The nice thing about Stamen maps, in my mind, is that they have exceptionally good contrast for black and white. What's different with Stamen is that you have to identify a bounding box by latitude and longitude; you can check the documentation under location for how to specify that. Before, if you had the Google API set up, you could just put in something like "Burlington, North Carolina," and it would give you back a natural bounding box for Burlington, and you could set the zoom. In this case you just have to be a little bit more prescriptive and identify your bounding box. The other thing that's happening is that we're setting the source equal to stamen (there are several other sources) and the map type equal to toner. Now, once again, I just want to highlight this: if you look in the documentation for get_map and go down here to source, it will tell you which sources you can use, and if you go down to map type, it will tell you which map types work for which sources. You can't really make those values up, but they're easy to gather. And in the last example we even saw how you can extend that out with additional base maps. So here, first I'm going to identify the bounding box, and I'm just going to put that vector in here so it's a little easier to read. I'm going to put everything from get_map into an object, and then the main command is just ggmap(), and I'm passing it this object, just to make it easy to see. What I end up with there is that Stamen map you saw a minute ago, in toner's high-contrast black and white. And as an example of how you can extend it with the ggplot-like syntax of ggmap, I'm adding this geom_point layer on top of that, so I can look at my Starbucks example again, and I have control over what the points
look like and how big they are; in this case, they're red. And then, just like you can with all ggplot stuff, I can affect the theme a bit. If I run that whole program again, I'm taking out the latitude/longitude, and then I'm taking out these x and y labels (and if you want to learn more about ggplot, you can). Right there is something that you could imagine would more easily fit into a publication that you're submitting.

All right, but moving along, let's pick up thematic mapping, and before I get too far along on this, back here to my slide. Choropleths are a form of thematic mapping. I'll just slide through this, since some of it we've already talked about. sf is a successor to sp, and it's very easy to coerce the data objects back and forth; this link will actually tell you how to do that, and there are some nice vignettes. Another thing that's actually pretty handy is that you can read and write shapefiles with these two functions. You might be getting hold of shapefiles in a way different from what we just did (we used tidycensus to pull them down and manage all that for us), and people get their shapefiles from a lot of sources. One is to ask Mark, or send a note to askdata. We're going to use a package here in a second called tigris, which is a cousin to tidycensus but doesn't require an API key. It doesn't do as much as tidycensus, but what it does do is pull down shapefiles. Suppose you had to share those shapefiles with somebody who didn't have the same computational style as you: maybe they're working in Python, maybe they're working in ArcGIS, and you just need to share the shapefiles. Then you might do something like this, where you write out your shapefiles, share them, and they can do their own analysis. But basically, in terms of workflow, we're going to get some data from somewhere else, in this case
Bureau of Labor Statistics data, which has nothing to do with the census. It's using the function append_data, which is part of the tmap library, but you can actually use the dplyr library, where left_join does the same thing, and that's what we're going to do in our example. And then, lastly, we're going to visualize. We're going to focus on the ggplot visualization with the function geom_sf, which works naturally with the sf library. There are some others here that are actually far simpler to use, but you have a little less control over the final product. So let's just see if I can show you what I mean by far simpler, and then we'll quickly move on.

All right, so over here at thematic mapping, tmap: if you wrangle all your data and get what you need, you can actually just send it to qtm, and you can see that it will automatically develop your color ramp and your labels. In this case it's doing all of the US, and because the US territories actually span a huge amount of the globe, it's not a particularly attractive map. So of course you have to do a little bit more wrangling. In this case we're filtering out region 9 (remember those census regions from earlier), and here's a good example of a perfectly fine map that's unprojected. tm_shape, which is a tmap function, did all this with minimal effort: it came up with the breaks, it came up with the color scheme, and it's very readable. The only thing I don't like about this is the projection. This flat line across the top is a dead giveaway that it's not projected very well, because that's not what the United States really looks like, and projection is a big thing in GIS. But I just want you to note that there's a way to set your transform using a different projection. That's the syntax for it, and there you get a more natural shape, which people seem to prefer. All right, so we're going to move on. Close this, and close this,
and I want zero three two, thematic mapping. Run all these libraries. The next function we're going to run is from tigris (that's this right here), and it's using the states function to pull down all the US state polygon shapes, the shapefiles. By default now, we're calling it with class = "sf"; if you don't have sf working, you could put class = "sp" in there and it might work. So now we have this shapefile, US geo. If we look at that real quickly, these kinds of shapefiles, which in this case come from the Census, are full of a whole lot of metadata that we don't particularly care about, like the region, the division, the state code and all that, even the postal service state code. What we do care about is this thing right here, the shape, which is just referred to as the geometry. That's the reason why we did this: so we can get the shape. And then over here I pulled down some BLS data (I described above how I grabbed that) and basically ended up with a two-variable data frame. It has area name, which we're going to have to clean a little bit because we don't want these codes, and it has annual mean wages. So if we keep wrangling that, we end up renaming some stuff, fixing the names, putting it in a data frame called BLS join, and having a look. This is where we're doing our left join, right? What we're joining is US geo, which has our shapefiles, and we're joining it to BLS wage, by NAME equals state. So in US geo, if we look at it, here's NAME right here. This, by the way, is something I always hate: the worst way to do a join is alphanumerically. If you could join on a code, the numeric code, you'd be way better off, but in this case we didn't have it. We join that to BLS wage, so this is where state equals NAME in US geo. What we end up with is a slightly bigger table, and the only reason we did this at all is so that we could have this geometry in the right spot with this wage number. Filter out those regions that we don't want to show up in our visualization, and then here's the code that actually allows us to visualize it with ggplot. Everything we did up until that point was just data wrangling.

Now we're going to ggplot, where we're setting fill and color: fill is the internal part of the polygon, and color is the border around it. We're sending both to the same variable, using geom_sf, setting the projection once again, and in this case creating our color ramp with viridis. This will generate, hopefully, a pretty nice map. Waiting on it... there it is. Since it's ggplot, there's a lot more you could do, like getting rid of this scale; the legend and all those things can be managed, and if you look deeper into the documentation I'm sharing, you're welcome to manage that. The nice thing about doing all this in R is that you have a lot of fine-grained control; the downside of doing it in R is that you have to attend to the fine-grained control, otherwise you're not going to get the map you want. And that's the case where, if you don't really want to attend to it, you can use something like tmap and the qtm function that I just briefly went over. There are lots of ways to make choropleths; these are just two examples. So given that, I'm going to suggest now that you... so here's the really freeform part, right? The last exercise,
exercise 3, doesn't actually have any answers, so I'm going to invite you to do it while I'm around. If you've had enough, give me some feedback on the Post-its and put them up on the door on your way out. If you have specific questions, I can try to answer them now. What I usually say to people is that these workshops are good for giving a broad overview but really horrible for a particular project, and if you're really going to learn this stuff, you need to engage with a particular project. So please come see me Wednesdays 1 to 3, set up an appointment, or go see Mark. We are more than happy to make this relevant to your project, which is often difficult to do in a workshop setting. I appreciate your attention, I appreciate you coming, and that is the end of the workshop. So thanks so much.
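The colorQuantile step earlier in the workshop cuts the estimate vector into ten bins and colors them with viridis. To see roughly what that does, here is a base-R approximation. It is a sketch, not leaflet's own code. It assumes quantile breaks at seq(0, 1, length.out = n + 1), which is the documented default `probs` of leaflet's `colorQuantile()`. It also substitutes base R's `grDevices::hcl.colors(n, "viridis")` (R >= 3.6) for the viridis package. The estimates are made-up values. In the workshop itself, the call is along the lines of `colorQuantile(palette = "viridis", domain = <data>$estimate, n = 10)`.

```r
# Made-up population estimates standing in for the `estimate` column.
estimate <- seq(1000, 100000, by = 1000)   # 100 values

n_bins <- 10

# Decile breaks: the 0%, 10%, ..., 100% quantiles of the data.
breaks <- quantile(estimate, probs = seq(0, 1, length.out = n_bins + 1))

# Integer bin index for each value; intervals are closed on the left [a, b),
# and include.lowest = TRUE also closes the last interval so the maximum fits.
bin <- cut(estimate, breaks = breaks, right = FALSE,
           include.lowest = TRUE, labels = FALSE)

# One viridis color per bin, then look up each area's fill color.
viridis_cols <- grDevices::hcl.colors(n_bins, "viridis")
fill_colour <- viridis_cols[bin]

# Quantile binning puts roughly the same number of areas in every bin.
print(table(bin))
```

This is why a quantile palette shows each color about equally often, regardless of how skewed the population values are.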
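The projection discussion earlier ("why Canada looks so huge") can be put in numbers. leaflet's tiles use Web Mercator. On a spherical Mercator map the linear scale factor at latitude φ is 1 / cos φ, and because both directions are stretched, areas are inflated by 1 / cos² φ. The function names below are made up for this sketch. The latitudes are rough illustrative values.

```r
# Linear scale factor of the (spherical) Mercator projection at a latitude in degrees.
mercator_scale <- function(lat_deg) {
  1 / cos(lat_deg * pi / 180)
}

# Both x and y are stretched by the same factor, so area inflation is its square.
mercator_area_inflation <- function(lat_deg) {
  mercator_scale(lat_deg)^2
}

lats <- c(equator = 0, north_carolina = 35.5, central_canada = 60)
print(round(mercator_area_inflation(lats), 2))
```

At 60 degrees north, the area factor works out to 4, so a region there is drawn four times larger than an equally sized region at the equator. That stretching is what the speaker describes for Canada.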
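The Stamen example requires an explicit bounding box instead of a place name. ggmap takes that box as a named vector c(left, bottom, right, top) in longitude/latitude. The sketch below derives such a box from point coordinates by taking their range and padding it a little. ggmap's own `make_bbox()` follows the same idea. The store coordinates are made-up stand-ins for the North Carolina Starbucks points. The helper `pad_bbox()` and the 5% padding are assumptions for illustration. The commented-out call mirrors the `get_map(source = "stamen", maptype = "toner")` setup described in the transcript. Stamen's tiles have since moved to Stadia Maps, so check the current ggmap documentation before relying on that source.

```r
# Made-up store coordinates (longitude, latitude) standing in for the
# North Carolina Starbucks points.
stores <- data.frame(
  lon = c(-78.90, -78.64, -79.79, -80.84),
  lat = c( 35.99,  35.78,  36.07,  35.23)
)

# Hypothetical helper: bounding box c(left, bottom, right, top), padded on
# each side by a fraction f of the longitude and latitude ranges.
pad_bbox <- function(lon, lat, f = 0.05) {
  lon_pad <- f * diff(range(lon))
  lat_pad <- f * diff(range(lat))
  c(left   = min(lon) - lon_pad,
    bottom = min(lat) - lat_pad,
    right  = max(lon) + lon_pad,
    top    = max(lat) + lat_pad)
}

nc_box <- pad_bbox(stores$lon, stores$lat)
print(nc_box)

# With ggmap installed and tile access available, the box would be passed on, e.g.:
# base_map <- ggmap::get_map(location = nc_box, source = "stamen", maptype = "toner")
# ggmap::ggmap(base_map)
```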
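The thematic-mapping exercise joins BLS wages onto the tigris state shapes by NAME = state. That step is a left join on a text key. The workshop uses dplyr's `left_join(us_geo, bls_wage, by = c("NAME" = "state"))`. The sketch below uses base R's `merge(all.x = TRUE)` on plain data frames so it runs without downloading shapes. Everything in it is a stand-in: the data frames, the "(code)" suffix on the BLS area names, and the wage figures are invented for illustration. Adapt the cleaning regex to whatever your BLS download actually contains. The sketch also shows why joining on names is fragile: any spelling mismatch silently leaves an NA.

```r
# Stand-in for the tigris state attributes (geometry column omitted).
us_geo <- data.frame(
  NAME   = c("North Carolina", "Virginia", "Puerto Rico"),
  REGION = c("3", "3", "9"),
  stringsAsFactors = FALSE
)

# Stand-in for the BLS table; the "(code)" suffix format is hypothetical.
bls_wage <- data.frame(
  area_name        = c("North Carolina(3700000)", "Virginia(5100000)"),
  annual_mean_wage = c(50000, 60000),   # made-up numbers
  stringsAsFactors = FALSE
)

# Clean the join key: drop a trailing parenthesised code and extra spaces.
bls_wage$state <- trimws(sub("\\(.*\\)$", "", bls_wage$area_name))

# Left join: keep every state row, attach wages where the names match.
joined <- merge(us_geo, bls_wage[, c("state", "annual_mean_wage")],
                by.x = "NAME", by.y = "state", all.x = TRUE)

# Drop census region 9 (Puerto Rico and the island areas) before mapping.
joined <- joined[joined$REGION != "9", ]
print(joined)
```

When the left-hand table is an sf object, dplyr's `left_join()` keeps the geometry column, which is why the workshop joins the wages onto the shapes rather than the other way around.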
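The speaker notes that the moe column returned by tidycensus should really be used, even though the workshop skips it. tidycensus reports ACS margins of error at the 90% confidence level by default. So a rough interval is estimate ± moe, and the standard error is moe / 1.645. Here is a minimal sketch with made-up numbers; the estimate and moe column names mirror tidycensus's output.

```r
# Made-up ACS-style rows; column names follow tidycensus's get_acs() output.
acs <- data.frame(
  NAME     = c("County A", "County B"),
  estimate = c(50000, 2000),
  moe      = c(500, 400)
)

z90 <- 1.645                               # z-score for a 90% interval

acs$lower <- acs$estimate - acs$moe        # approximate 90% bounds
acs$upper <- acs$estimate + acs$moe
acs$se    <- acs$moe / z90                 # standard error
acs$cv    <- acs$se / acs$estimate         # coefficient of variation

print(acs)
```

A small county with a large relative margin of error, like County B here, deserves more caution on a choropleth than its color alone suggests.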
Info
Channel: John Little
Views: 3,717
Rating: 4.9354839 out of 5
Keywords: mapping, gis, [R], choropleth, tidycensus, sf, geom_sf, leaflet
Id: cMNJdj8UGpY
Length: 67min 11sec (4031 seconds)
Published: Wed Oct 24 2018