Introduction to Visualizing Geospatial Data with Python GeoPandas

Video Statistics and Information

Captions
Hi everyone, welcome to another tutorial. In this tutorial you will learn how to use the GeoPandas Python library to plot geographical data. Before you get started, let's open a new Python script. For that, you can go to your file explorer over here, right-click, and go to New Module. I will name this plotting_geographical_data.py, because it's a Python file.

All right, so let's go ahead and first import the GeoPandas library. We import the library by typing import geopandas as gpd. Now I'm going to read in one shapefile. You can see over here that I have one shapefile by the name of world, so this is basically a shapefile of the world. I'm just going to read this file in using the GeoPandas library. For that, I can first create one variable into which I will load this shapefile. I'm going to call it world_data. Let me put a comment over here and describe what I'm going to be doing. All right, so world_data is going to be equal to gpd, because I'm calling the GeoPandas library, dot read_file, and what you have to do is just provide the path to the corresponding shapefile. I can just go ahead and copy this and paste it over here, followed by world.shp, because the .shp file is the shapefile that we are going to read in.

Now, to run, you can either press here or simply hit F5. You can see that we did not get any error or anything like that. You can have a quick look at this world_data variable, and if I make a slight adjustment over here, you can see that we've got a couple of columns and then some information. Now let's use the Variable Explorer, which gives us a clearer picture of the details embedded in this variable. I can see that we have some information over here: we have the name, we also have a column for area, and some other information like the region, subregion, longitude and latitude. Just something to keep in mind is that this
geometry column is actually not originally included in the shapefile. When you read in data using the GeoPandas library, the way the program identifies the geographical location of each element, in this case polygons, is by using this geometry column. So this is basically a special column which gets created when you read in geographical data. Now, if I check the type of world_data, it will tell you that it's not a pandas DataFrame but a GeoPandas GeoDataFrame. The way this GeoDataFrame differs from a regular pandas DataFrame is through the availability of this special geometry column, which gives it the corresponding geometrical properties.

Now you can do a quick visualization. You can just say world_data.plot(), and that'll give you a quick plot of the data that you saved into this world_data variable. So this is how the map looks.

Now I'm just going to make a few changes. When we go to the Variable Explorer right now, you see that there are multiple columns over here which I might not really require. So instead of getting rid of all the columns that I do not need, I'm going to select only the columns that I need. For example, in this case I'm just going to select this column called name, which gives the name of the country, as well as this important geometry column. We cannot miss this one, because if you don't read it in, the data will lose its geometrical properties; basically it'll get converted into a pandas DataFrame instead of being a GeoPandas GeoDataFrame. So I'm going to read in these two columns. You can say world_data equals world_data, followed by the selection. Now, if you want to select only one column, you can just have a single pair of square brackets, but if you're going to select
more than one column, whether it's two columns, three columns, or any number more than one, you have to have two pairs of square brackets, and inside those you specify the names of the columns. As you can remember, the column which contains the names of the countries was called name, and the other column, which has the geometrical properties, was called geometry. All right, so what we do over here is pick those two columns and then save that information into the same variable name. So basically whatever data we had before in world_data will get replaced by this new pair of columns.

All right, now we can run this, and if I type world_data, you see that we just have two columns. You can make sure that it's just two columns by saying world_data.columns, which will give you whatever columns we have in here. Since we retained the geometry column, even if you now say world_data.plot(), it will plot the figure for you without an issue. And if I go back to the Variable Explorer and open this over here, you can see that we just have two columns now, but under the same world_data variable name. So that's how you do that.

Now let's say we need to calculate the area of each country. You don't have to do much: you can create one column over here called area, and you can use GeoPandas to calculate the areas directly. Let me put a comment saying calculating the area of each country. All right, in order to add a new column, you just specify the name of the variable that you have, and inside single square brackets include the heading of the column which you would like to add. In this case I would like to have the name as area, because it's quite straightforward. Then you just specify the GeoDataFrame's name again and say dot area.
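The steps described so far (reading the data, checking its type, keeping two columns, and adding an area column) can be sketched roughly as follows. This is only an illustration under stated assumptions: the shapefile path is a placeholder, the column name NAME is a guess at how the name column is spelled, and a tiny made-up GeoDataFrame stands in for the world shapefile so that the snippet runs on its own.

```python
import warnings

import geopandas as gpd
from shapely.geometry import box

# The tutorial reads a shapefile, along the lines of:
#   world_data = gpd.read_file("path/to/world.shp")  # placeholder path
# A tiny stand-in GeoDataFrame (invented countries) keeps this sketch runnable.
world_data = gpd.GeoDataFrame(
    {
        "NAME": ["Squareland", "Rectangula"],  # hypothetical country names
        "REGION": [1, 2],                      # an extra column we will drop
        "geometry": [box(0, 0, 1, 1), box(10, 10, 12, 11)],
    },
    crs="EPSG:4326",  # WGS 84, units in decimal degrees
)

print(type(world_data).__name__)  # a GeoDataFrame, not a plain pandas DataFrame

# Double square brackets select several columns; keeping "geometry"
# preserves the GeoDataFrame type. copy() avoids chained-assignment warnings.
world_data = world_data[["NAME", "geometry"]].copy()

# Areas are computed in the units of the CRS (square degrees here), so
# GeoPandas warns about it; the warning is silenced for this demo only.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    world_data["area"] = world_data.area

print(list(world_data.columns))
```

With real data, the stand-in block would be replaced by the gpd.read_file call, and the column names would be whatever the shapefile actually provides.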
All right, now you can run this, and if I say world_data.columns, you will see that we have a new column called area. If you would like to display the data, say just the first 15 entries, you can say world_data.head(15). Now, this .head() is actually coming from pandas. Even though this is GeoPandas, you can still use the majority of the functionality of pandas as well, so that's why you can use this without any issue. It will give you the first 15 entries, and you can see that over here we have another column called area. You can even view this using the Variable Explorer: if you double-click over here, you can see that we have an area column.

But now you might wonder why the areas are a little bit low. One thing to keep in mind is that originally this shapefile is in the WGS 84 geographic coordinate system, and the unit is decimal degrees. So the area that you see over here is not in square kilometers or square meters, because it was calculated using the original base unit, which is decimal degrees. Later on I will show you how to convert this into a different coordinate system, but for the time being I just wanted to show you that we can do this kind of operation to calculate the area in a very straightforward manner. Calculating the area in square kilometers or square meters we'll do in one of the upcoming steps.

All right, now if I again quickly do a world_data.plot(), you can see that it's basically covering the entire globe. But let's say you want to get rid of one of the countries. In this case I want to get rid of this whole Antarctica continent, because it's not really relevant for any of the operations that I'm going to do. So let's see how we can do that. Again, I'm going to use the same variable
name, world_data, only this time I'm going to filter the data. All the names of the countries, and in this case even the name of the continent Antarctica, are under this column called name, so over here I would like to say name not equal to Antarctica. The exclamation mark followed by the equals sign is the notation in Python for not equal, so that's why we write it that way. Now let's run this one and see. We can just say world_data.plot() as well. Let me get rid of this one so it won't confuse us with two plots, or maybe let me retain it so that we can see how the second plot differs from the first one. So now when I run this, you can see that this plot is from before we did anything to Antarctica, and in the second plot, which refers to this figure, you can see that Antarctica is gone.

All right, now let's talk a bit about projections. First, I would like to know what sort of projection is assigned to my current shapefile. Let me put a title saying changing projections. All right, so let's say I would like to know what my current coordinate system is. I will create a variable called current_crs, and the way to check the coordinate system of a shapefile using GeoPandas is by saying world_data, that's your GeoPandas GeoDataFrame, dot crs, which stands for coordinate reference system. Now if I run this and ask what my current_crs is, you will get the EPSG code. If you want to know what this EPSG code refers to, you can just Google it. Let me open my web browser. I just Googled EPSG 4326, and let me go to the first link, and it'll provide you with all the information that you need to know about EPSG 4326. So this is the World Geodetic System 1984, the WGS 1984 coordinate system, which is a geographic coordinate system, and the
units are in decimal degrees. Even if you go to the Variable Explorer and open the geometry column and look at it a bit carefully, you can see that these numbers are also expressed as decimal degree values. For example, over here you can see that all of these values are in decimal degrees. Just keep that in mind for a second.

Now let's go ahead and convert the current projection of my shapefile into a projected coordinate system, because if you are calculating areas it's better to have a projected coordinate system where the units are meters, so that we get a much more familiar representation of the area values. So let's see how we can change the projection. One way of changing the projection is by saying world_data.to_crs. You can see that it also shows us which arguments are necessary. If we know the EPSG code of the projected coordinate system into which you would like to project your data, you can just provide the argument epsg equals that code.

Now let me search for the EPSG code 3857. This EPSG 3857 refers to the WGS 84 Pseudo-Mercator projection, which is used by many of the well-known map services such as Google Maps, OpenStreetMap and Bing Maps, and over here you can see that the unit is meters. So I'm going to specify this EPSG code, and I have to provide one more argument called inplace. The meaning of this argument is that with inplace equals True you're changing, on the spot, the original properties of this world_data GeoDataFrame from a geographic coordinate system into the projected coordinate system corresponding to this EPSG code. So that's the importance of this inplace equals True argument. And now let me
go ahead and say world_data.plot(). All right, let me get rid of this, because we don't need too many plots, just to avoid getting confused. Now let's go ahead and run this. You can see the differences in how the map itself looks when the data is in the geographic coordinate system and when we transform it into this projected coordinate system. You can see that the map has been stretched a little bit, and if you've been working with Google Maps or OpenStreetMap, I think this map format looks a bit more familiar to you, doesn't it? That's because they also use the same projected coordinate system to represent their maps. If you now go back to the Variable Explorer and go to this geometry data, you can recall that previously all these values were in decimal degrees; now you can see that the values have been changed from decimal degrees into meters. That's what happens when you do the conversion of the coordinate system.

Now let's say that you want to color this map based on each country. You can do that simply by passing an argument over here called column. Let's say we need to color it by the name of the country. What you need to do is pass the heading of the column by which you want the map colored, and you can also specify a color map by passing the argument called cmap. There are different color maps, such as jet and rainbow; if you have a quick look at the matplotlib documentation you will be able to see the full list of color maps. All right, now let's go ahead and run this command. You can see that it more or less got colored based on the different countries, as you can see over here. In case you're interested in knowing more about these color maps, you can see that all of these are different
color maps that you can use. Let me try this one, hsv. If I say hsv instead, it should look quite similar to jet, but let's see. Yeah, you can see that it's a different color scheme. So just like that, you have quite a wide array of color maps to choose from. I'll put the link to this in the description as well, so if you would like to play around and see different color schemes, you can do that. All right, so that's how you change the colors of the plot based on a specific column.

Now, if I go back to the Variable Explorer, you can see that previously we calculated the area in the coordinate system of the GeoDataFrame, which was the WGS 84 geographic coordinate system. Since we have already done the conversion from the geographic coordinate system into a projected coordinate system, let's go ahead and recalculate this area so that the areas will be represented in square meters, or even square kilometers. Let's try to put it in square kilometers, because that figure will be a bit more familiar to us. I'll put a comment saying recalculating the areas in square kilometers. All right, similar to what you did over here, you can do the same thing, only this time I will divide the whole thing by 10 to the power 6, because if I don't do that the areas will be in square meters, which are going to be huge numbers. When I divide by 10 to the power 6, they get converted into square kilometers. Now if I go back to the Variable Explorer, you can see that the areas are calculated in square kilometers, which are quite easily readable. All right, so that's how you do that.

Now let's see how we can add a legend to this. For adding a legend, what you have to do is just pass a couple more arguments to what you did over here. So I can still say world_data.plot(), and
this time I'm going to specify the column, and we can also specify the cmap, and apart from that we can also specify legend to be equal to True. This is a boolean, so we set it equal to True. Now, you remember that for this plot I assigned the colors based on the name of the country. Similarly, we can also assign the colors based on a different parameter. We do not have any other realistic parameter other than this area. It would have been good if we had another column, like maybe the population or the GDP or something like that, but just for the sake of showing you how to do this, let's go ahead with the area column. So I can specify the name of the column over here, which is area, and for the color map still use the same hsv, and now I can just go ahead and run. Yeah, now you can see that we also managed to get a legend over here, and accordingly the colors in the plot also changed. That's because now each color that you see over here refers to a value, which is the area in square kilometers.

But if someone were to look at this map for the first time, they wouldn't have a clue what these numbers are, would they? So it would be good to have some sort of indication for the legend as well. You can do that simply by passing another argument called legend_kwds, and over here you can specify a label for the legend. So let's say my label over here is going to be area of the country in square kilometers. All right, now you can run this one and see how it looks. Now you can see that we managed to get an indication of what this legend refers to: the area of the country in square kilometers.

You can even change the size of the plot by passing another argument called figsize. Let's say you want to increase the size of this figure; you can simply do that by
passing another argument called figsize, let's say 7 by 7. Now if you run this, you see that the previous figure got stretched out a little bit. But as you can see over here, the legend is not fitting very well when compared with the map itself, so I would like to see how I can resize this legend. You can do that quite easily using the mpl_toolkits of the matplotlib library. So what I'm going to do is first import matplotlib.pyplot as plt, and from mpl_toolkits.axes_grid1 I'm going to import make_axes_locatable. Let me go ahead and put a comment saying that what we are going to do over here is resizing the legend.

You see that we imported the pyplot object of the matplotlib library as plt. I'm going to use the subplots method of pyplot in order to make this plot. The conventional way of doing it is by specifying the figure and also the axis, and then we call the plt.subplots method. The subplots method creates a figure with a number of rows and columns. By default, if you don't say anything, it'll look something like this, which means the number of rows and columns is one by one. So I'm not going to pass any of those arguments over here; instead I'm just going to pass one argument called figsize, and I'm going to keep the figure size the same, which is seven by seven. Then I'm going to specify another variable called divider, and this divider is going to be equal to make_axes_locatable, which you imported over here. All right, now I'm going to specify another variable called cax, using append_axes. You can specify the position over here, either left or right or bottom; maybe we'll put it on the left side. You can also specify the size, let's say 7%.

All right, so finally we are going to plot this with world_data.plot(). Instead of typing everything again, I'm just going to go ahead
and copy over these arguments like this, and specify that ax is equal to ax, and specify cax, which is the axes into which the colorbar will be drawn; that one I'm going to specify as cax over here. Now let's go ahead and run this command. All right, now you can see that the legend got resized, but it seems like putting the legend on the left side is not a good idea, is it? So let me just go ahead and change this to be on the right side, and let me increase the figure size to maybe ten by ten.

All right, so before I wrap up the tutorial, I would also like to pass another argument over here called pad, which refers to the fraction of the original axes between the colorbar and the image axes. I'm just going to pass a value of about 0.1. What it's going to do is create a small gap between this map and the colorbar, so it won't look as if the colorbar has been attached to the map; it just creates a small gap. So we'll run this one and see how it looks. All right, now the map looks quite all right.

So I guess that concludes this tutorial. Of course there are so many other things that you can discover by yourself, especially by reading through the matplotlib documentation, to see how you can incorporate those functionalities into making your map more meaningful and more presentable. But the main intention of this tutorial was just to provide you with an outline, so that you can see what sort of things we can do using the GeoPandas library and the plotting capabilities of the matplotlib library. So thanks a lot for watching, guys. I'll see you in the next one.
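The later steps of the walkthrough (dropping Antarctica, reprojecting to EPSG 3857, recomputing the area in square kilometers, and drawing the map with a labelled, resized legend) might look roughly like the sketch below. It is not the presenter's exact script: the data is a small made-up GeoDataFrame, the column name NAME is an assumption, and the non-interactive Agg backend is chosen only so the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import geopandas as gpd
from mpl_toolkits.axes_grid1 import make_axes_locatable
from shapely.geometry import box

# Stand-in data with invented countries; the tutorial uses a world shapefile.
world_data = gpd.GeoDataFrame(
    {
        "NAME": ["Squareland", "Rectangula", "Antarctica"],
        "geometry": [
            box(0, 0, 1, 1),
            box(10, 10, 12, 11),
            box(-180, -85, 180, -60),
        ],
    },
    crs="EPSG:4326",  # WGS 84, decimal degrees
)

# Drop Antarctica with a "not equal" boolean mask.
world_data = world_data[world_data["NAME"] != "Antarctica"].copy()

# Reproject from WGS 84 (degrees) to Web Mercator (meters). The tutorial
# uses inplace=True; reassigning the result is an equivalent alternative.
world_data = world_data.to_crs(epsg=3857)

# Areas are now in square meters; divide by 10**6 for square kilometers.
world_data["area"] = world_data.area / 10**6

# Draw the map and put the colorbar on its own, narrower axes.
fig, ax = plt.subplots(figsize=(10, 10))
divider = make_axes_locatable(ax)
cax = divider.append_axes("right", size="7%", pad=0.1)  # pad leaves a small gap
world_data.plot(
    column="area",   # color each polygon by its area value
    cmap="hsv",
    legend=True,
    legend_kwds={"label": "Area of the country (sq. km)"},
    ax=ax,
    cax=cax,
)
```

In an interactive session the figure would show up on its own or after plt.show(). One thing worth keeping in mind when reading the numbers: Web Mercator stretches shapes more and more away from the equator, so areas computed in EPSG 3857 are inflated at high latitudes; an equal-area projection would be a better fit where accurate areas matter.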
Info
Channel: GeoDelta Labs
Views: 18,883
Keywords: geopandas, pandas, gis, geospatial, arcGIS, QGIS, processing, python, overlay, intersect, shapefile, Json, matplotlib, pyplot, geographical data
Id: IdxL5NZ7h_c
Length: 28min 28sec (1708 seconds)
Published: Tue Mar 31 2020