Introduction to Spatial Statistics with Python

Captions
Okay, welcome everybody to the Introduction to Spatial Statistics with Python. It's great to see you all. Unfortunately we're doing this virtually, though maybe that's fortunate for many of you, since you don't have to come to campus and find whatever room I'm teaching in. I personally miss the in-person ambiance, but over the past year this format has actually been working really well, so welcome to the workshop. I'm going to start sharing my screen, and the moment I do that everything moves around.

Once again, welcome. My name is Yoh. I work for the Office of Advanced Research Computing at UCLA, I'm also on faculty in the Urban Planning department and in Digital Humanities, and I teach spatial data science and GIS. Today's workshop is a very introductory look at spatial statistics using Python. I'm not a statistician; I'm more of a GIS and data visualization person. My intention is to position you with the right tools and knowledge to advance your own research using methods that live entirely in the open-source community. We're going to use Python, Jupyter notebooks, and libraries you can install, download, and run on your own (although that's not always easy). For today we've set up a JupyterHub environment, which is free and available to all UCLA students, and I've also set up a Binder notebook, a web-based Jupyter environment that is likewise free for you to use.

I welcome questions, even during the workshop, and I welcome interruptions; I like to hear voices. Feel free to unmute yourselves at any time if I'm going too fast or too slow, if I forgot something (I do that a lot), if I forget to share my screen, or if something just isn't right. I'll do my best to keep an eye on the chat, but I may miss something there, so really, just unmute yourselves and let me know.

Let's get started. All the material is in the repo linked on the website, so you can come back to it at any time; it will be up, hopefully, forever. The recording will also be available, and I'll send a link within a few days directly to the email you used to register. So if you fall behind during the workshop, be comforted by the fact that you can always go back to the video and rerun it. Based on the poll, I'm going to aim for a medium pace: I'll assume you know a little bit about Jupyter notebooks and a little bit about Python, but that you're not an expert in either. Hopefully that works for the majority of you, and I'll stick around afterwards for those of you who have questions.

To get kick-started, at the bottom of the page I sent you there's a link to the UCLA JupyterHub. Go ahead and open that link if you haven't already; I'm going to try to mimic your environment here. You should be welcomed by a big orange button, which you can click, and it will take you to a login screen.
If you are affiliated with UCLA, type UCLA in the search box (it will be the first option that shows up), and that takes you to the two-factor authentication page, where you enter your information and log in. So do the same thing you're doing; I'll get my cell phone out and get that running. While that's going: I know there are about ten of you who are not part of UCLA, so use the Binder link below instead. Fair warning, the Binder link takes a long time; when you click it you're going to wait literally about five minutes while it loads all the necessary components and libraries before you're ready to go.

Hopefully we're all on the same page and you see the first notebook open in your browser. I'm going to pause here to make sure we're all there, so let me know, or raise your hand, if not. I should also mention Ben Winjum, who you can see in the participants list. Ben, a quick introduction if you don't mind?

"Sure, hi everyone. I also work with Yoh in the Office of Advanced Research Computing. I do a variety of things, splitting my time between high-performance computing on the Hoffman2 cluster and education-related work, specifically with Jupyter. I'm the administrator for the JupyterHub you're using, and I teach workshops related to Python along the way, so I'm oriented more toward straight Python than spatial science. If you end up having any questions about the JupyterHub environment, or you want to set up Jupyter and Python things for yourself, you're welcome to send me an email; I'll put my address in the chat."

Thanks, Ben. As UCLA researchers, we are all available to you for consultation and follow-up, and it's not just us: we have a stats team, a big team of (supposedly) experts in different domains, here to help students, staff, and faculty advance their research.

I haven't heard about any problems, which is great, so I'll assume we can move forward. The environment you see is a classic Jupyter notebook. It scrolls up and down, and it's made up of cells that contain either markdown (descriptive text) or code. Further down there are cells that look a little grayish; those are code cells, and we'll run them eventually. I've mixed my entire lecture into the notebook, so you'll see a lot of description interleaved with the code cells. That's great, because when you come back to this material you can essentially read the lecture while re-running the code, which makes this environment really powerful.

What I'm going to do now is go into slideshow mode. You don't have to; in fact I recommend you don't. Stay in the normal mode, scroll up and down, and run things along the way. Slideshow mode is just one of the features of Jupyter notebooks, and the content on my screen is identical to yours; I'm simply in presentation mode. This is the table of contents.
Just a quick note that this session is being recorded. So, today's lab is about spatial statistics, and I'm going to cover one particular method: spatial autocorrelation. The outputs you see on screen are things we're actually going to produce, and we're working with real data: rather than me providing you with a dataset, we'll bring in arrest data from the LA data portal in real time and create these visualizations from it.

One way of thinking about spatial autocorrelation is as the lack of spatial randomness; in other words, are things statistically significant in terms of their spatial allocation? If you're a statistician, you may be familiar with asking whether values of a variable are related to one another in time: if something happens in this time period, is it similar to what happens in another time period? Spatial autocorrelation is similar, except that we use space. Is what happens here, for a particular phenomenon or variable, similar or dissimilar to what happens in a completely different place? That's what we're testing, and we want to attach statistical significance to whether things are trending spatially in particular areas.

One way to think about this is the so-called first law of geography: everything is related to everything else, but near things are more related than distant things. It seems pretty obvious. The graphic I have here is historical; it's one of the first recorded instances of spatial analysis. This is John Snow's study of the cholera epidemic in London in the 1800s. Snow mapped the locations, essentially the clusters, of deaths from the cholera outbreak, and the legend shows not only deaths but also the locations of water pumps. The outbreak was thought to be airborne, a bit like the pandemic we're going through, when in reality it was coming through the water. The fact that the deaths clustered around a single water pump became the basis of the study, and the legend of the story goes that they shut down the pump and the outbreak stopped. That's an example of what we're going to try to do today, digitally and statistically, with Python.

So what are we doing today? For methodology, we're going to bring in LAPD data directly from the data portal, and we'll use census block groups as the boundaries to summarize the data by. We'll spatially join the two datasets; we'll talk about what it means to spatially join, because maybe you've done attribute joins, but here we join by location and count the number of arrests that fall within each census block group. Then we'll normalize the data: rather than raw arrest counts, we'll normalize by population, so how many arrests per 1,000 people.
Then we'll conduct global spatial autocorrelation using Moran's I (I'll explain that later), and after that local spatial autocorrelation, which is the fun part, because that's where we identify exactly where those clusters are located.

Libraries: we're in a Python environment, and Python lets you import libraries, many of which have huge communities behind them, are very well documented, and are used by millions of people around the world. These are a handful of the ones we'll use today. Pandas and matplotlib: I've sprinkled links to help and documentation throughout the notebook, so you can click any of them to learn more about, say, how visualization in pandas works. It works in partnership with matplotlib, another Python library, and lets you produce very fancy, colorful line plots and bar plots. What's great about the documentation for all these libraries is that it includes a lot of sample code you can copy and paste into your Jupyter notebook to reproduce the charts, and then it just becomes a matter of asking: if I can do this with the sample data, can I do it with my own data?

GeoPandas, as the word indicates, is geographic. It leverages the pandas library and adds a lot of spatial functions to Python. For a GIS person like me it's really awesome to be able to create map-based plots with single lines of code. If you have tabular data that comes with a geometry, it lets you build plots that are geographic, spatial, and visual in nature, and color the elements of the map based on values in specific columns of the table. It works with polygons, it works with points, and there are a whole bunch of really nifty things you can do with it that we'll cover a bit of as well.

On the statistical side we'll use two libraries. One is esda, exploratory spatial data analysis, a subpackage of a bigger statistical package called PySAL, the Python Spatial Analysis Library; it lets us do some of the spatial statistics we'll cover today. The other is splot, another library we'll use for the local spatial autocorrelation, which produces those kinds of plots and visual charts of our data.

So here we are; we're about to get started with the coding part of the lab. I'm going to pause here to make sure everyone is on pace. Any questions? Feel free to unmute yourselves. Thank you, Ben, for answering that question. All good.
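For reference, here is a minimal sketch of the kinds of imports this notebook leans on. The exact aliases, and the particular splot functions pulled in, are my own choices and may differ from the workshop notebook; treat this as an outline rather than the actual first cell.

```python
import pandas as pd                  # tabular data
import geopandas as gpd              # spatial dataframes (tables + geometry)
import matplotlib.pyplot as plt      # static plotting
import contextily as ctx             # basemap tiles behind matplotlib maps
from sodapy import Socrata           # client for Socrata open-data portals
import esda                          # exploratory spatial data analysis (PySAL)
from libpysal import weights         # spatial weights (PySAL)
from splot.esda import plot_moran, lisa_cluster  # spatial autocorrelation plots
```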
If you're new to Jupyter notebooks, here's a very quick tutorial (I'll get out of presentation mode so it mimics your screens). When I highlight a cell, any cell, you see a blue bar on the left, which means that cell is currently highlighted, or active. You can click inside a code cell, select things, and actually modify it (please don't, for the moment); that's how you edit and add your own code. To run a code cell, hit Shift+Enter. The cell can either have the cursor inside it, which makes it active (green), or simply be selected (blue); either way, Shift+Enter runs it, and a number appears at the top left of the cell. That number says, in effect, "this is the nth cell run in this notebook"; mine says 1. You don't have to run cells sequentially; you can run them out of order, which is both good and bad, and sometimes confusing, because it's up to you, as the person running the notebook, to decide how you want to run things. Typically you run from top to bottom, but there are cases where you won't. Shift+Enter also moves you to the next cell; if you want to run a single cell and stay on it, use Ctrl+Enter. Notice my number changed from 1 to 2, saying that's the second thing I've run in this notebook. I know this is review for most of you, but I also know it's the first time for a good number of people.

Back in presentation mode: you should see a number at the top left of the imports cell. If you don't see a number, you haven't run the cell, or there's a problem; hopefully you see a 1 or a 2. That means we've run the cell that imports all the libraries for this lab: pandas, sodapy, and the rest, and I'll explain each of them as we go.

So what are we going to do? We're going to prepare our data; these are all things we'll produce shortly. On the left here is LA. Hopefully most of you are in LA, though many of you may be remote. The boundary you see in dark is roughly the City of Los Angeles. The crime data we're bringing in from the LAPD is the red dots. The LAPD's jurisdiction only covers the city of LA, so you can see little gaps for Santa Monica and other areas outside that jurisdiction. We're going to overlay these datasets, one on top of the other (that's the whole spatial component we'll be working with), join them spatially, quantify the arrests by census block group, and produce the choropleth maps in the final output. That's basically part one of the workshop, and part two is the spatial autocorrelation.

Block groups: the United States Census Bureau creates boundaries for the entire country based on pockets of population. Roughly speaking, a census tract has about 5,000 people in it, and these are boundaries defined by the Census Bureau, a governmental agency. A block group further subdivides a census tract into smaller components. Depending on your research, you might want census tracts, which at about 5,000 people are roughly the size of a neighborhood, or you might want block groups, which, as the name suggests, are groups of blocks.
The granularity of your research will push you toward one or the other: if you're working at a tight neighborhood level you might go down to block groups, and if you're working at the county level you might stay at the census tract level.

The data we're using I grabbed from a website called Census Reporter. If you haven't heard of Census Reporter, it's a really wonderful resource, and it's completely free; well, census data is free, we pay for it with our tax dollars. You can download any census dataset for anywhere in the country from the Census Bureau's own website, but it's sometimes hard to navigate, so there are sister sites, built by academic or non-profit groups, that are frankly a lot easier to use. Census Reporter lets you download basically any census table by filling in a few text boxes: type in "race", for example, and you get the race tables to download for your project. It also comes with a lot of background knowledge about how the census works, including the actual survey that arrives in your mail. I won't go into it too much, since this isn't a census workshop. The landing page for our data is very simple: total population by census block group in the City of LA. Once you land on a single table page in Census Reporter, there's a button I think should be far more prominent, the magic Download Data button. You can download the table as a CSV file or as a GeoJSON file. GeoJSON is a format that lets you work in Python and geopandas (it's my favorite format, actually), and it comes with a geometry column, so you're immediately able to map the data. I've already included the file in the repo we're using today, so you don't have to download it, but for your own projects it's good to know it's available to you.

The next code cell, the gray one, brings in that data. The gpd prefix is the alias I gave the geopandas library when I imported it; the dot chains a command saying "use one of geopandas' functions," in this case read_file, which is very straightforward. Inside the brackets, which accept arguments, all you need is a string, in single quotes, telling it where the file resides. Since I downloaded it into a data folder, it's a relative path from the notebook: data/ plus the long GeoJSON filename, exactly as it came from Census Reporter (I haven't changed it). I'm calling the result gdf, which stands for GeoDataFrame. Go ahead and run that cell, and then let's explore the data we've just imported. gdf.head() shows the first five rows; for those of you new to Python, I'm just giving you a taste of a few things you can do once data is in your notebook. It's pretty simple: there's only one variable, total population, and, as I mentioned, it comes with the geometry column, plus a geoid, which is basically a FIPS code, a federal identification code, that we'll use as the unique identifier for the block groups.
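A hedged sketch of that data-loading cell; the filename below is a placeholder, since the repo keeps the long Census Reporter export name.

```python
import geopandas as gpd

# 'data/census_block_groups.geojson' is a placeholder for the long
# Census Reporter export name that actually sits in the repo's data folder.
gdf = gpd.read_file('data/census_block_groups.geojson')

# First five rows: a geoid, the total-population column, and the geometry.
gdf.head()
```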
Next, .info(). This is one of my favorite commands because it tells you a lot about the data you just imported. It gives you the names of the columns; it tells you how many entries there are, which is nice, right? What's the number of records in our table? We brought in 2,516 records, which means there are roughly 2,500 block groups in the City of LA. It also tells you how many values in each column are null: if you bring in some other, messier dataset, there might be columns with a lot of null values, and it's nice to see that at a glance, rather than discovering later that you're missing 500 records. And finally, most importantly, it gives you the data type of each column. We have an object, which is essentially text, so I know my identifier is a text data type. My total population value, which has to be numeric if we want to quantify it, is a float (a float can have decimals, which is great). And since we're using geopandas, the geometry column is recognized as a geometry data type, which means it's instantly ready to be mapped.

Let's do a couple of other things. First, we'll trim the data to the bare minimum columns. To do this in pandas, I redefine gdf to include only the three columns I want; what you see inside the square brackets is a Python list of just those columns. Then I rename the columns, with gdf.columns, to something more user-friendly: total_pop instead of the Census Bureau's table number, which means nothing to anyone who doesn't already know it (and I'll leave the geometry column name as it is). Let's have a look: .tail(), the counterpart of .head(), shows the last five rows, and this is what the data looks like now, with three columns: the FIPS code, total_pop, and geometry.

Now, Census Reporter does something we all need to be aware of: it includes, as the final row of the table, a row that summarizes everything you requested. That's both good and bad. It's good because, hey, I now know the total population of the City of Los Angeles is about four million people. It's bad because if you're doing statistical analysis and you don't realize one of the rows is a summary of the whole table, it will skew all your results; that's exactly what happened to me the first time I downloaded this data. So we have to get rid of that row, which is great, because we get to learn another pandas method: how to drop a row. There are probably different ways to do it, but since the .tail() output shows me its index number, I can drop it with the .drop() command. Notice that I'm redefining gdf to be gdf without the summary row.
I'm not renaming it; I'm keeping the same name. And once I run that, we have one less record in our table.

One other thing I want to do is clean up the FIPS code. As you can see, it includes a "15000US" prefix, and I want to get rid of that part, because a census block group identifier actually starts with 06: 06 is the FIPS code for California (two digits, starting with zero), then 037 is the county code for LA County (we're the 37th county in California), then comes the census tract number in six digits, and the final digit is the block group number. Probably too much information. So we're going to drop the "15000US": we take the FIPS code column and replace that part of it with nothing, a find-and-replace, and once we do that you'll notice the FIPS code only has the numbers in it.

One last thing: I'm going to get rid of some values, and I'll use a sorting command to show you why. We can sort the data with the sort_values pandas command (I have the documentation linked if you want to check it out). You have to tell it which column to sort by, so I'm sorting by total population and outputting the top 20 rows of the sort with .head(20). And look: total population zero. We have block groups with no people in them. Do you know why that might be? Why would there be a census block group with nobody in it? Any guesses? Put them in the chat. Someone asks whether it's parts of Griffith Park. That's a good point, although the zero-population block groups I've seen usually aren't parks, since parks tend to be surrounded by housing communities. Mountains, maybe, yes. Airports, and the Long Beach areas with a lot of boats and the port. There are huge areas that are industrial or airport facilities; they cover a lot of land and have very few people actually living in them. Great.

So, moving forward, let's subset the data. We're going to select a subset of the data frame (there's a link here to the documentation on how to do that), and it's a pretty nifty command: I redefine our GeoDataFrame, gdf, to be a subset of what we already have, by saying I want gdf to be all the rows where the total population is more than 100.
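Putting those cleaning steps together, roughly: the original column names ('geoid', 'B01003001') are my guesses at the Census Reporter export, so check gdf.columns in your own notebook before running something like this.

```python
# Keep only the columns we need; 'geoid' and 'B01003001' are guesses at the
# Census Reporter export's names. Check gdf.columns and adjust.
gdf = gdf[['geoid', 'B01003001', 'geometry']]
gdf.columns = ['FIPS', 'total_pop', 'geometry']     # friendlier names

# Census Reporter appends a summary row for the whole city; drop it by index.
gdf = gdf.drop(gdf.tail(1).index)

# Strip the '15000US' prefix so only the state/county/tract/block-group digits remain.
gdf['FIPS'] = gdf['FIPS'].str.replace('15000US', '')

# Keep block groups with more than 100 residents.
gdf = gdf[gdf['total_pop'] > 100]
```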
That keeps everything with more than 100 people in our GeoDataFrame and effectively deletes, or crops, the rest.

There's a question in the chat: when the data is being cleaned, is the original file being modified? No. The original GeoJSON file is copied into your Jupyter notebook and is not modified. However, you created a variable called gdf that holds the data, and as you modify it, that variable has been trimmed and no longer has those records. If you wanted to backtrack, you'd have to start over; if you need to keep an earlier version, you might assign the result to something like gdf2 instead. I don't have a need to keep the other records, so I'm keeping it as is. Another question: can I have a million records in a data frame? You can; I have worked with a million records. It's a matter of computational power. The computation happens in a kernel, which is not part of your own computer, but you can still run into memory issues; just like any other software, the bigger your data, the more it bogs down the system. Within this environment I've worked with hundreds of thousands of records without any issues; beyond that you may run into memory limits or things just start to slow down. It also depends on which libraries you're using: JavaScript-based interactive libraries take a lot more resources, whereas matplotlib, which returns static output, is more doable.

Moving on. Now that I've subset the data to records with at least 100 people, let's do some fun stuff and start mapping. Before I do that, note that our data is in a geographic projection: it's governed by decimal degrees, latitudes and longitudes, which are angular units. To conduct spatial analysis that requires measuring distances or locational statistics, it's recommended that you move away from angular units and use a projected coordinate system, which converts your data into measurements of meters or feet depending on the projection. We're going to convert our data into the Web Mercator projection, and geopandas provides a really nice, quick, easy way to do this: .to_crs(), coordinate reference system. You can look at documentation about the different EPSG codes for different projections; we're using the global Web Mercator system, EPSG 3857. This is also what contextily uses, a Python library I'm bringing in that puts basemaps behind our maps.

For the plots I'll use the command plt.subplots(). plt is the matplotlib alias for pyplot, and its documentation shows a whole bunch of examples of creating plots with this method. It uses a figure-and-axis approach to create either a single plot or multiple plots in one output, and we're going to do both.
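Roughly what the reprojection and that first block-group map look like; the figure size, colors, and alpha here are arbitrary choices for illustration, not necessarily the ones in the workshop notebook.

```python
import matplotlib.pyplot as plt
import contextily as ctx

# Reproject from geographic coordinates (degrees) to Web Mercator (meters),
# the system contextily's basemap tiles are served in.
gdf = gdf.to_crs(epsg=3857)

# One figure, one axis; draw the block groups semi-transparently, then add a basemap.
fig, ax = plt.subplots(figsize=(12, 12))
gdf.plot(ax=ax, color='gray', edgecolor='white', alpha=0.5)
ax.axis('off')
ctx.add_basemap(ax)   # default tile provider; pass source=... to choose another
plt.show()
```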
Here's the code to create a single plot. The first command generates a figure and an axis, and lines 4 through 8 of the cell are the actual plot. Notice that we preface .plot() with our GeoDataFrame, the one that includes the geometry column, and pass it a few arguments. ax=ax means "put this in the plot I just created," and the rest are attributes for how the geometries should be rendered: alpha is opacity, so there's a little transparency; we turn off the axis; and we add one more line of code that provides us with a basemap. When you run this, it produces a nice graphical output of our data: we saw it in tabular form, and now we see it in visual form. Each of these polygons is one of the roughly 2,500 block groups in LA. The basemap you see in the background is produced by that last line (line 14 of the cell); it's a nifty little helper, optional of course, but it makes the visual much nicer.

Now that we have a map of the block groups, let's get some other data: the LA data portal. I have a link here if you've never used it before. A lot of government agencies now provide portals for data that we essentially pay for with our tax money; there's COVID data, Building and Safety, Water and Power, and the LAPD has data here as well. If you search for arrest data in the LA data portal, you'll eventually land on a page that looks like this. The publishing department is the LAPD, and it says it's updated weekly. No it's not; they're lying. The last update is February 24th. What's up, LAPD? I check this regularly and it's more like monthly, and they're already quite behind. The location of each arrest is specified, which is great, and the metadata lists all the columns associated with the dataset: the arrest date, the time of arrest, who was arrested (basically), the age of the person arrested, the gender, and the descent code, which is essentially race. I want to preface this dataset by saying I've worked a lot with it. I don't know if any of you have been arrested; I've been pulled over before. A police officer will have a notepad in front of him or her and start writing this data down, and a lot of it is subjective: they don't necessarily ask "are you so-and-so," so a lot of this data is based on the police officer's observations.

That's the data we'll be working with, and we're going to use a Python library called sodapy to talk to Socrata. This is the endpoint of the data, endpoint meaning we'll use an API to grab the data directly from the open data portal, which uses Socrata on the backend to host it. The endpoint is specific to the arrest data, and the API page shows all the different things you can do: all the fields, how to query them before you bring the data in, and so on. There are code snippets for this type of data, and one of them is Python/pandas, which is very cool; there are code samples to bring this exact dataset into our notebook, and that's essentially what we're going to do.
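For reference, the adapted Socrata snippet looks something like this. The dataset identifier ('xxxx-xxxx') and the date field name ('arst_date') are placeholders; copy the real ones from the dataset's API page on data.lacity.org.

```python
import pandas as pd
from sodapy import Socrata

# 'data.lacity.org' hosts the portal; None means no app token (requests get throttled).
client = Socrata('data.lacity.org', None)

# Dataset ID 'xxxx-xxxx' and the date field 'arst_date' are placeholders here.
results = client.get(
    'xxxx-xxxx',
    where="arst_date between '2020-07-01' and '2021-01-31'",
    limit=100000,   # raise the default row cap
)

# The API returns a list of dicts; turn it into a DataFrame.
arrests = pd.DataFrame.from_records(results)
arrests.info()
```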
We've copied and pasted that sample and configured it a little for our lab today. I added a where statement to hone in on a specific date range of data to bring in from the LA Police Department: where the arrest date is between July 1st of last year, 2020, and January 31st. So this is real, live data, and we're going to bring it in right now; this is one of those nerve-wracking, live-dependent moments in a workshop. Okay, that worked. Let's get information about the data we just brought in with .info(). These are all the columns we brought in, 25 of them, and the data types: they're all objects, which is text. Latitude and longitude are also objects, and we're going to have to change those to numeric. Let's look at the data; this is just a preview, and here are the latitude and longitude columns.

Let's massage this data a little before we can actually use it. First of all, it's a plain pandas DataFrame; it isn't geopandas yet. So we're going to convert it into a GeoDataFrame, and this is the code for how to do that (documentation linked): I convert my arrest data into a GeoDataFrame by defining the geometry as the combination of the longitude and latitude columns from our pandas DataFrame. Then I convert it from angular units into the same Web Mercator projection we used before. Oh, and one more step: remember that latitude and longitude came in as objects, so we need to convert them to floats. This is the Python command to convert a column from one data type to another: .astype(float) for latitude and longitude. And with that, I think we're finally ready to plot.
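A sketch of that DataFrame-to-GeoDataFrame conversion. The coordinate column names ('lat', 'lon') are assumptions about how this dataset exposes them; check arrests.columns first.

```python
import geopandas as gpd

# The API delivers coordinates as text, so cast them to floats first.
arrests['lat'] = arrests['lat'].astype(float)
arrests['lon'] = arrests['lon'].astype(float)

# Build point geometries from the lon/lat columns (plain WGS84, EPSG:4326),
# then reproject to Web Mercator so the points line up with the block groups.
arrests = gpd.GeoDataFrame(
    arrests,
    geometry=gpd.points_from_xy(arrests['lon'], arrests['lat']),
    crs='EPSG:4326',
)
arrests = arrests.to_crs(epsg=3857)
```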
So, the plot: it's the same matplotlib subplots code, but instead of gdf, the block groups, I'm plotting the arrests. This is a point dataset, as opposed to the block groups, which are polygons, so when you plot it you get a whole bunch of points rather than polygons. Let's see that in action... and, oh my gosh, what's happening here? We don't have LA; we have almost the entire planet, and a single dot near the bottom. Anybody want to guess what's going on? Why would there be a dot off the coast of Africa? Was somebody arrested off the coast of Africa? A couple of guesses come in: we need to remove it, or the latitude and longitude were mistakenly flipped. Those are great answers. A flipped lat/long does happen a lot, and it puts points on the wrong side of the world, but that's not what's happening here. This is zero, zero: where zero latitude and zero longitude meet, the equator and the prime meridian. A lot of geographic data that lacks locational attributes ends up right there, off the coast of Africa. Call it the zero-zero conundrum, and we have to fix it.

We can subset our data to get rid of it, but first let's find out what it is: arrests where longitude equals zero. There's a single booking where, for whatever reason, the police officer did not include a latitude, longitude, or location. So let's get rid of it. We subset: arrests equals arrests where (notice the syntax here) longitude is not equal to zero. Nifty, right? That eliminates the record with this one command. Now let's run the exact same plot and see if it's gone. There we go; that looks much better. Africa is not part of our map anymore, and it's very LA-city based.

Next, let's create a two-layer map: the block groups plus the arrests. I want to zoom in to the arrests, not to the block groups, so I'll grab the bounding box of our arrests; there's a nice command called total_bounds for a GeoDataFrame, and once we have the bounds we can use them to define the area we plot. The multi-layer plot is a little longer; the code cell starts with the same thing, but now there are two chunks of code producing two layers. The block groups: gdf.plot with ax=ax. The arrests: also ax=ax, meaning we're putting them in the same figure, on the same axis, on top of one another. The bounding-box code takes those numbers we just got and sets the minimum and maximum x and y values, so that we zoom in on the arrest data, and we produce the plot. There we go; let me just get rid of this so I can see the whole thing. That's beautiful, right? We've done a lot of work so far: we've brought in census data from Census Reporter as block group polygons, and arrest data imported directly from the LA city data portal, just the last six months of arrests, all plotted on the map. Any questions before I move on? All good, pace is good.

Now that we have the two datasets on top of one another, we need to quantify the relationship between the two layers, and we'll do that using something called a spatial join. You can read about how spatial joins are conducted in geopandas at the link here, and it's actually a single line of code: the sjoin function. What are we joining? The arrests and the block groups, going "left," meaning we assign a block group to every single arrest in our dataset. Does that make sense? For every single arrest, we tack on the block group that the arrest happened in. Looking at the head of the result, this is still our arrest dataset (the booking date, where it happened, the latitude and longitude), but if I scroll over to the right, we've tacked on a FIPS code, so now we know which block group each arrest happened in. That's really powerful, because now we can quantify the data. Next we do something called value_counts: essentially we group all our arrests by their block group, and I'm chaining a few commands so that we end up with a single data frame that is super simple: the FIPS code, and how many arrests fall in that census block group.
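The spatial join and the value_counts chaining, sketched out; 'FIPS' and 'arrest_count' follow the naming used earlier in these notes, not necessarily the workshop notebook.

```python
import geopandas as gpd

# Tag every arrest with the block group it falls inside; how='left' keeps all arrests.
arrests_joined = gpd.sjoin(arrests, gdf, how='left')

# Count arrests per block group and flatten the result into a small DataFrame.
arrests_by_bg = (
    arrests_joined['FIPS']
    .value_counts()
    .rename_axis('FIPS')
    .reset_index(name='arrest_count')
)
arrests_by_bg.head()
```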
Let me pause a little here; I have a few questions. How did I decide on 1,000 for the margin (this was about the previous plot)? Pretty randomly, actually. Web Mercator measures things in meters, so padding the x and y by 1,000 units gives the map a padding of roughly a kilometer on each side. If you wanted more padding, say five kilometers, you'd change that to 5,000. And Elizabeth, you're getting an error; I'm going to have to come back to you on that one. Maybe stay afterwards and I'll see if I can help. Everybody else good? Okay.

Now that we have summarized our data, let's create a simple plot, a bar plot. What I'm doing here is again subsetting the data to show only the top 20 geographies (that's the [:20]; there's documentation about that as well), and plotting a bar chart with the x and y defined. On the x-axis we have the FIPS codes, so these are census block groups, and on the y-axis we have the arrest count. Very simple but very powerful: it tells me there is a single block group somewhere in LA with more than 400 arrests in the last six months, which is quite a lot.

Now that we have this summary table, what I want to do next is join it back to our block group GeoDataFrame so we can map those numbers. We do that with the merge command: gdf.merge, merging the arrests-by-block-group table we just created, on the FIPS code. If we look at our gdf, the block group table, again, we've now tacked the arrest count onto it; not only do we have total population, we also have the arrest count. That's really cool, because now we can normalize the data. How many arrests per 1,000 population? We do a little bit of math, and there's a really nifty pandas method here: if you want to create a new column, all you have to do is write the data frame name, square brackets, and the name of a brand-new column. It doesn't exist yet; you just create it. What do I want the column to be? The arrest count divided by the total population, times 1,000. That normalizes our data to arrests per 1,000 people. Once I do that, let me sort it and show you the top five block groups: the highest is 163 arrests per 1,000 population. Here's a little caveat: that does not mean that 163 out of every 1,000 people living in that census block group were arrested. You don't have to live somewhere to be arrested there. It could be that the block group includes a park that is known to be a high-crime area where a lot of people congregate and a lot of arrests happen, and there might be a pristine neighborhood right next to that park where nobody ever gets arrested. So, again, always mind the human element of the data you're working with.
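And the merge-and-normalize step, under the same column-name assumptions.

```python
# Attach the counts back onto the block-group polygons by FIPS code...
gdf = gdf.merge(arrests_by_bg, on='FIPS')

# ...and normalize to arrests per 1,000 residents.
gdf['arrest_per_1000'] = gdf['arrest_count'] / gdf['total_pop'] * 1000

# Sanity check: the highest-rate block groups.
gdf.sort_values(by='arrest_per_1000', ascending=False).head()
```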
All right, let's look at this data and actually map it. What I'm doing here is taking the top 20 census block groups and mapping just those, so what this map indicates is the location of the top 20 arrest areas in the City of LA. That's kind of a powerful map on its own. Then, choropleth maps: choropleths are colored maps, so rather than points, we color each census block group by a value. It's the same gdf.plot() as before (we made a pretty much black-gray, transparent one earlier), but by adding a column argument with a numeric value in it, our normalized arrests per 1,000, and giving it a color map, we produce a choropleth map. Just by visualizing this map you can already see some clustering happening. I'm going to skip ahead in the interest of time, because we now go into the second part of the workshop, spatial autocorrelation. Any questions so far? Everything good, makes sense? Okay, great.

So now we're moving to the statistical analysis part, the spatial analysis part. We have our dataset, arrests per 1,000, and what we're going to do next is run global spatial autocorrelation. That is essentially a single number: it tells us, globally, for our entire dataset, whether there are statistically significant clusters, or clustering tendencies, in the data; spatial clustering, in other words. A positive number indicates that yes, there is spatial clustering happening, meaning the pattern is not like taking a deck of cards and randomly throwing it on the floor. Are our arrests essentially happening randomly across the city or not? Let's see if we can put some statistical numbers on that. The global Moran's I statistic is a way to quantify the degree of this phenomenon, so let's go ahead and see how that's done.

To run global Moran's I, we first have to create something called spatial weights. What we're doing here is defining how neighbors are determined spatially: if a block group is surrounded by other block groups, those surrounding block groups are its neighbors, and we need to take into consideration not just that single block group but what surrounds it. How we define neighbors is the spatial weight, and there are different ways of defining it. The method we'll use today, one of many, is the KNN weight, the k nearest neighbors, counted by distance from the centroid of the block group. If we say k equals eight, it takes into consideration the eight closest block groups, and later we'll average the values of those eight neighbors into a single numeric value for the neighborhood. You can learn a little more about it in the link provided here; the commands come from the PySAL family of libraries we imported at the start. We create the spatial weight and put it in a variable called wq. So that's the first part.
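A minimal sketch of the KNN spatial weights, assuming the libpysal weights module; the workshop notebook may construct wq slightly differently.

```python
from libpysal import weights

# "Neighbors" = the 8 nearest block groups, measured centroid to centroid.
wq = weights.KNN.from_dataframe(gdf, k=8)

# Row-standardize so each block group's neighbor weights sum to 1;
# the spatial lag then becomes a simple average of the neighbors.
wq.transform = 'r'
```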
Next, we calculate the spatial lag. Now that we've defined what neighbors are, the spatial lag quantifies them: it averages the neighbors' values into a single value and puts that in its own column that we have access to. So we're creating another brand-new column, arrests per 1,000 lag, and calculating the spatial lag based on how we defined the spatial weights. All of that is done in this code cell, and when we look at our data now, we have total population, the absolute number of arrests in the block group, arrests per 1,000, and the spatial lag. Notice I did a .sample() here instead of a head or a tail; it's another of my favorite commands, because it gives me ten random rows rather than the top ten or the bottom ten. If I run the cell again, it gives me a different random set.

What is this telling us? I'm looking at this row, for example: a lot of arrests, 320 arrests out of roughly 3,000 people, so arrests per 1,000 is about 100. The spatial lag column tells me 20. What that means is that this block group has a lot of arrests happening, but its eight closest neighbors average out to only about 20. That is what we're going to call a diamond: a super high value surrounded by much lower values among its neighbors.

That whole phenomenon, the donut and the diamond, is what I want to look at next. I'm creating a new column that simply subtracts one value from the other, to see where the donuts and the diamonds are: I just want to know, visually, where the arrest rate is highest relative to the spatial lag (the neighbors, essentially), and where the opposite holds, with low arrests surrounded by super high neighbors. That second case is a donut: a lot of arrests happening around it, but low numbers right in the middle. The opposite is a diamond. The FIPS code with the largest negative difference, our donut, is a block group with a total population of 906 and a very low arrest rate of 8 per 1,000, while its eight closest block groups average 76. On the flip side, the diamond in our data is a block group with a total population of 960, very similar to our donut, but with an arrest count of 417 people, over 400 arrests per 1,000, while its lag is about 90. Ninety across its eight neighbors is still super high, but nothing like the block group itself.
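The spatial lag and the donut/diamond difference column, sketched under the same assumptions; 'lag_diff' is a name I've made up here for the difference column.

```python
from libpysal import weights  # wq is the row-standardized KNN weights object from above

# Average each block group's eight neighbors into a single "lag" value.
gdf['arrest_per_1000_lag'] = weights.lag_spatial(wq, gdf['arrest_per_1000'])

# Difference column: large negative values are donuts (quiet block group,
# busy neighbors); large positive values are diamonds (busy block group,
# quiet neighbors).
gdf['lag_diff'] = gdf['arrest_per_1000'] - gdf['arrest_per_1000_lag']

donut = gdf.loc[gdf['lag_diff'].idxmin()]     # most donut-like block group
diamond = gdf.loc[gdf['lag_diff'].idxmax()]   # most diamond-like block group
```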
Just for reference, what I'm going to do next requires a Mapbox account, so you don't have to follow this part; I'm just going to go ahead and get my token.

A participant: "Hey, sorry to interrupt. I think your pace got a little faster and I lost you on the donut and the diamond. Could you please explain it one more time?"

No problem, and thank you for that. Let me backtrack a little and talk about this output. The donut and the diamond essentially capture instances where a single entity (block groups, in our case) is surrounded by dissimilar neighbors. There are two extremes. You can be a block group with very low arrest numbers surrounded by block groups with really high arrest numbers: that's a donut. Think of high values ringing a hole of very low values right in the middle, so it looks like a donut. The diamond is the other extreme: you're a single census block group with a super high number (and what we're measuring here is arrests) surrounded by neighbors with significantly lower arrest numbers. Does that make sense?

"Yes, that's clear. But when you were comparing the numbers, the 8 versus the 76 and the -67, how do those columns compare? What is 8, what is 76, and what is -67?"

Gotcha. Let's look at the top row. The 8 is the arrests per 1,000: essentially the arrest count divided by the total population of 906, times 1,000, so the data normalized to arrests per 1,000 people. The 76 is the lag: based on the spatial weights we defined previously, it takes the eight closest neighbors to this block group and averages their arrest numbers. And the -67 is roughly the difference between the two, the column we're using to hunt for donuts and diamonds.

"Sorry to interrupt again. You said the eight closest block groups; how does the code determine the eight closest? Where is that eight coming from?"

Good question. It's defined when you create the spatial weights: KNN, the k nearest neighbors, with k equal to 8. It's a static number that we provide, and you can change it to take more or fewer neighbors into account. There are also other ways to measure weights; there's the queen method, for example, which is like chess and takes the diagonals, and other contiguity methods, and the documentation linked here describes them. But here we're simply taking the eight closest neighbors.

"It's clear now, thank you."

Great, and thank you for asking the question. What I'm going to do next is optional, so you don't have to follow. Mapbox, by the way, is a web-based GIS and spatial visualization platform. It is not open source, but if you create an account it provides satellite imagery that you can use in Python. I'm going to use my token here. It's essentially free until you publish something that goes viral and gets millions of hits, at which point they start charging you, but if you create an account and use it for research it's mostly going to be free. So I'm just going to run a few things, again, you don't have to follow, because I want to show you what this data looks like on top of satellite imagery.
Hold on one second... here's what the data looks like on top of satellite streets; we're using a Plotly command here to plot it. What is this one, the donut? Yes, this is the donut: I took that single block group we defined as the donut and plotted it on a satellite image so we can actually see where it is. Do any of you recognize this neighborhood? Remember, this is the donut, a single block group with low arrest rates surrounded by block groups with super high arrest rates. It's in Venice. These are the Venice Canals: what looks like a road here isn't a road, it's actually a canal, and most of the people who live here own kayaks and canoes. It's a really fascinating area, and these are very high-valued homes because it's such a unique neighborhood. But it sits in Venice Beach, famous for Muscle Beach, and there are large homeless encampments in the surrounding area, so it happens to be next to areas with a lot of arrests (I shouldn't say high in crime; crime and arrests are two different things).

Also for reference, I'm going to map the location of the diamond, and this was really surprising to me, or maybe not: the diamond is actually almost in the same neighborhood as the donut, which in some sense makes sense. This is the Venice beachfront area. During the pandemic I actually walked here a couple of months ago; what used to be a huge tourist walkway is now a homeless encampment, rows of tents, and that probably accounts for the high arrest numbers.

So we digress. A participant asks to go back to the donut map: "My understanding is that for the donut, the center region has the low arrests and is surrounded by higher. Is the blue section that's highlighted the centerpiece of the donut?"

Yes. This is a single census block group, one row out of the 2,500-record table we had, and in the last six months, which is what our data covers, it had an arrest rate of 8 per 1,000 within this area, while the arrest lag across its eight closest neighbors is 76.
Okay, all right, so let's create a spatial lag map and see what that looks like. Very similar to the other plots that we've created, we're going to use Matplotlib, and for the spatial lag choropleth we're going to use that lag column. Let's see what that looks like. Here is our output. You can see a higher concentration now, because these colors no longer represent just a normalized count; each value incorporates the values of its neighbors, so the color statistically folds in the neighbors.
To make this clearer we can create side-by-side maps, and I think the code for that is pretty nifty. What we were doing before were single plots, so we had (1, 1) in the previous plot. If we change this to (1, 2) when we create the subplots, it means we're creating one row and two columns; in other words, the plots will sit side by side on one row. If this were (2, 1) instead of (1, 2), it would be two rows and one column, meaning the plots would be stacked on top of each other. Does that make sense? So (1, 2) essentially says side by side, and (2, 1) would be top and bottom. And what happens here is that when we have a 2, the axes we're positioning our maps into become ax[0] and ax[1], so we're saying that on the map on the left, ax[0], I'm going to put the normalized arrest numbers, and on the map on the right I'm going to put the lag map, because I just want to compare the differences between these two outputs.
As you can see, there's a very significant difference. On the right we now see much clearer clusters, because each block group's value incorporates the tendencies of its neighbors: the reds are high numbers surrounded by high numbers, and the greens are low numbers surrounded by low numbers. Does that make sense?
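A minimal sketch of that side-by-side layout with GeoPandas and Matplotlib; the column names are the placeholder ones from the sketches above, and the quantiles classification assumes mapclassify is installed.

```python
import matplotlib.pyplot as plt

# One row, two columns: ax[0] gets the normalized counts, ax[1] gets the spatial lag.
fig, ax = plt.subplots(1, 2, figsize=(16, 8))

# Left map: arrests per 1,000 residents (placeholder column name).
gdf.plot(column="arrests_per_1000", scheme="quantiles", cmap="RdYlGn_r",
         legend=True, ax=ax[0])
ax[0].set_title("Arrests per 1,000 residents")
ax[0].axis("off")

# Right map: the spatial lag (mean of the 8 nearest neighbors).
gdf.plot(column="arrests_lag", scheme="quantiles", cmap="RdYlGn_r",
         legend=True, ax=ax[1])
ax[1].set_title("Spatial lag of arrests per 1,000")
ax[1].axis("off")

plt.show()
```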
Let's see how we're doing on time. It's 10:30, so we still have 30 minutes. Do we want to see an interactive map of this? Again, this is more for reference. And speaking of lag, my notebook is lagging a little bit, so let me exit this for a second. I'm going to bypass this slideshow mode, which wasn't working, and go to scroll mode, just like you.
What I wanted to show you is an interactive version of this map on satellite imagery, which again requires that Mapbox account, which you can create later if you want interactive satellite imagery. Let me quickly show you what the map we just created with the spatial lag would look like as an interactive plot. You can run the next four or five cells, up until this choropleth_mapbox command, which will create the interactive plot. Let's see if this works; I'm starting to run out of memory, so I'm a little worried here. Okay, that worked, great. It's exactly the same plot that we produced with Matplotlib, but it's JavaScript, so it's interactive: as I hover over it, it tells me the numbers we're interested in, and it's zoomable too, so if you scroll your wheel up and down you can zoom in. This is a visual enhancement, and it's nice; you might want to take a screen grab and add it to a report, but for publication purposes I think you want to stick with Matplotlib. I like it because it's really powerful: you can see the built environment behind what's going on in certain neighborhoods, the Mapbox satellite maps are beautiful, and they include labels, so you can identify the areas where these things are happening. That's why I wanted to include it.
Okay, so let's move on to our Moran's plot. All of this work is essentially to get to a single number called the Moran's I value, and once we have the weights defined we can output that single number. So all that hard work gets us to 0.2865 and so on, and that's our global Moran's I value. That is the number associated with the amount of spatial autocorrelation happening in our entire dataset. But what does that mean? What does roughly 0.2 mean? It's a positive number, and the documentation tells us that a positive number means there is positive spatial autocorrelation. What does that mean? It means we do have spatial clustering tendencies in our data: high values are close to high values and low values are close to low values, and that phenomenon is evident in our dataset.
If this number were negative, it would mean there is negative spatial autocorrelation, which is less common; it means that similar values are far away from similar values. An example of negative spatial autocorrelation would be the locations of hospitals, because you don't want hospitals clustered next to each other; you want hospitals to serve the entire community, so you don't want them all in one location that people have difficulty reaching from far away. You want to spatially disperse the locations of hospitals so that they are equitably accessible to entire communities, and that would be an instance of negative spatial autocorrelation: similar things far away from each other.
We can also produce a scatter plot of our data. This is essentially an x-y plot with the spatial lag on the y axis and the normalized arrest numbers on the x axis, and there is a strong positive correlation here. This chart will look a lot different as we move forward. But what is the significance of that value, roughly 0.3? Well, there's a nice command that comes with our spatial library that simulates randomness. Essentially what we're doing is taking every single census block group in our data and assigning the arrest numbers randomly across all of those census block groups, and we do that a thousand times. So we say: take the numbers out of their census block groups, randomly plug them back into census block groups, and see what that looks like.
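For reference, here is a sketch of how the global Moran's I and its permutation test are typically run with esda and splot, reusing the placeholder gdf, y, and w from the sketches above; the column name and the 999-permutation choice are this sketch's assumptions, not necessarily the workshop's exact settings.

```python
import matplotlib.pyplot as plt
from esda.moran import Moran
from splot.esda import plot_moran

# y is the normalized arrest column; w is the row-standardized KNN weights from before.
y = gdf["arrests_per_1000"]

# 999 permutations: the arrest values are reshuffled across block groups 999 times
# to build the reference distribution the observed statistic is compared against.
mi = Moran(y, w, permutations=999)

print(mi.I)      # global Moran's I (roughly 0.28 on the workshop data)
print(mi.p_sim)  # pseudo p-value; 0.001 means no permutation was as clustered as the data

# Reference distribution on the left, Moran scatter plot on the right.
plot_moran(mi, zstandard=True, figsize=(10, 4))
plt.show()
```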
So this curve that you see here is the result of a thousand random simulations of our arrest data across the 2,500 block groups that we have, and this value here is where our actual data lies. That means we are far away from the random simulations of our data, which indicates that the pattern is statistically significant in its non-randomness, if that makes sense. Many of you statisticians have seen this before; the p-value is 0.001, so it's a very low number. Okay, any questions so far on global spatial autocorrelation? Okay, great.
All right, so now we're going to do local spatial autocorrelation. I briefly mentioned high-high, low-low, high-low, and low-high. What I mean by that is: now that we know the phenomenon we're looking at, arrests in LA City, is not a matter of spatial randomness, where is it happening and how is it happening? The where we started to see on those maps we produced. The how is: where are high values next to high values, where are your high-highs, where are the low-lows, like low arrest numbers surrounded by low arrest numbers, and where are the donuts and the diamonds, the lows surrounded by highs and the highs surrounded by lows? We're going to answer that using local spatial autocorrelation, and the method we're going to use is called LISA, the local indicators of spatial association.
LISA essentially produces the same plot that we produced earlier, with the arrests on the x axis and the spatial lag on the y axis, but it color codes the points based on whether they are high-high values, low-low values, and so on. In the top-right quadrant are the high-high values, and the p-value threshold is 0.05, so it only colors block groups whose relationship with their neighbors is significant at the 0.05 level. Same with the blue on the bottom left, the low-lows, the high-lows, which are the diamonds, and the low-highs, which are the donuts. So we've color coded these in our scatter plot.
We can also color code these on a map. If you think about this scatter plot, each one of these dots represents a block group, and when we create the map, the LISA map as it's called, it takes the values from that scatter plot and maps them with the same colors. So now we don't just see a visualization of the spatial lag, where we would have to visually decide that there seem to be clusters; now we can statistically say, at a p-value of 0.05, that there are high values next to high values in downtown LA, and the blue areas represent the low-lows, low arrest rates in the Santa Monica Mountains surrounded by low-arrest neighbors. And we see some instances, to a smaller degree, of donuts and diamonds.
Just to show you one final plot, here is a side-by-side comparison of different p-values: this one is at 0.05 and this one is at 0.01. You see fewer instances the stricter, that is, the lower, your p-value threshold is, because you're honing in on a higher degree of confidence in your spatial autocorrelation values, but there are still areas in LA City where that is true even at 0.01.
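And a sketch of the local version with esda's Moran_Local and splot's LISA helpers, again using the placeholder gdf, y, and w from the sketches above.

```python
import matplotlib.pyplot as plt
from esda.moran import Moran_Local
from splot.esda import moran_scatterplot, lisa_cluster

# Local Moran's I for every block group, using the same y and KNN weights w.
lisa = Moran_Local(y, w, permutations=999)

# Scatter plot with the four quadrants colored for points significant at p < 0.05
# (high-high, low-low, high-low "diamonds", low-high "donuts").
moran_scatterplot(lisa, p=0.05)

# The LISA cluster map: only block groups significant at p < 0.05 get a color.
lisa_cluster(lisa, gdf, p=0.05)

# Side by side at two thresholds: the stricter 0.01 map keeps fewer colored areas.
fig, axes = plt.subplots(1, 2, figsize=(16, 8))
lisa_cluster(lisa, gdf, p=0.05, ax=axes[0])
axes[0].set_title("p < 0.05")
lisa_cluster(lisa, gdf, p=0.01, ax=axes[1])
axes[1].set_title("p < 0.01")
plt.show()
```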
All right, so that kind of concludes my material for the day. I hope that was useful; I think I maybe hurried a little at the end, but let's take some time to digest what just happened and open it up for any questions, comments, or thoughts.
Hey, thanks for teaching this. If you could quickly summarize the Moran's part, what we're actually doing, that would be really helpful.
Yeah, I can summarize a little bit, starting with the global Moran's I value. The global Moran's I is a single value that can be calculated because you have already assigned the weights and created a weights object. It takes your entire dataset and calculates the degree of spatial autocorrelation happening globally across it. So that's just a single number: you can feed it any dataset, any variable, and it will tell you whether or not it's spatially autocorrelated. It doesn't tell you where things are happening; it just tells you whether or not the pattern is spatially random. The next set of processes determines to what degree that is true. That number, 0.28, you can further validate by simulating randomness: you take those arrest numbers, pluck them out of their associated block groups, randomly plug them back into block groups, and simulate the calculation of the Moran's I value a thousand times. That creates this bell curve, and this is where our data sits, so based on the data we imported, even after a thousand simulations, there's no way our value could have come from the random distribution. Then the Moran p-sim outputs the p-value for your data globally, so we know that 99.9 percent of random maps don't come close; there's less than a 0.1 percent chance that this pattern happened randomly. The local spatial autocorrelation then lets us see exactly where this is happening. As opposed to just knowing globally that things are spatially autocorrelated, where is it happening? It's the same plot, but now we have color codes for the high-highs, high-lows, low-lows, and low-highs, and then, making this into a map, since every single dot represents a block group, we can map it and assign the colors based on those values, which produces this map.
So how different is this from the diamond and donut, this particular graph? This is a high arrest rate surrounded by a high likelihood of arrests in the surroundings, right? How is it different from the donut and diamond?
I think they're all related. The donut and diamond exercise earlier was a preliminary investigation, I would say. I like to do a lot of exploration of the data and of my assumptions to understand it before running a lot of processes, so that I know I'm not imagining things. That was really all that was, to be honest. Seeing it on top of satellite imagery also helps validate what I think I'm seeing, because when I see it in that context I think, that's right, that makes sense, the data makes sense, I understand that community, I've walked through that community, I can feel it, so I think I'm on the right track. Because I always fear, when I do this kind of number crunching, that I might be doing something wrong.
I might produce something that doesn't actually make sense when you think about it through lived experiences, especially when you're working with really sensitive data. Arrest data is really sensitive, crime data is really sensitive, and if you designate communities and produce reports that say, hey, this community is ridden with this kind of phenomenon, you're responsible for that statement. So to be honest, that's where I was going with that, and I think the whole process that follows takes those initial explorations and validates them through sound statistical methodologies that are practiced in the field.
Got it, makes sense, thank you.
All right, well, I'll conclude the workshop here, but I'm happy to stick around and talk with any of you who have questions about your own data or your own research. For now, I'm going to stop recording. Okay.
Info
Channel: UCLA Office of Advanced Research Computing
Views: 569
Keywords: gis, spatial data science
Id: B_LHPRVEOvs
Length: 100min 21sec (6021 seconds)
Published: Fri Apr 23 2021