Calculating Zonal Statistics of a Raster using Python

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hi guys welcome to another tutorial in this tutorial i'm going to talk about pretty much everything that you might need to know when you try to calculate spatial statistics or zonal statistics of uh of a particular raster based on some given polygon boundaries and we are going to use python to do that but before proceeding into the tutorial i would like to give you some insight to the data that we are going to use now i'm going to use qgis for a moment just to just so that i can quickly visualize the data i'm going to use rainfall data actually to to perform this special statistics calculation now this rainfall data actually chirps gridded satellite rainfall data and i have downloaded a cluster which falls on portugal so i'm going to use portugal as my example today for this tutorial and you can see that i have downloaded this particular data which corresponds to april 15th of april 2020 and i have a shape file which corresponds to the districts of portugal so what i'm going to do today is to calculate let's say the the average rainfall which has fallen on this particular day over each of these districts so what we do is we're going to sort of take the average of all the pixels which falls within each of this district and we are going to use python in order to do that and this is just for one rust now if i show you some additional data which i have over here you can see that i do have the rasters actually starting from the 1st of april all the way up to the 30th of april now we might be able to actually perform some other statistical calculation based on all of these data as well but uh for the first part of this this tutorial let's focus only about calculating the special statistics just for one raster and later on maybe in the second part of the tutorial i might show you how to deal with this kind of a higher number of rasters so that actually you maybe sort of extract all the data from all the rasters and perform some sort of advanced statistical calculation which takes into account time series data as well all right so to get started first i'm going to minimize this qjs window because we won't be using qgis actually for this uh we will be using python now if i go to my file explorer you can see over here i have the rainfall data rasters over here and i do have the the shape file which corresponds to the districts of portugal so for this tutorial you're going to need a couple of libraries you're going to need the jio pandas library and you're going to also need the raster io library if you do not know how to install this library i have created tutorials for each of them and in addition to those two you also will need another library called raster stats now if you haven't installed the raster stats it's quite easy as you can see over here you can just do a simple pip install raster stats installation what you can do is you can go to your anaconda prompt right click and run it as the administrator and from there you can simply just copy this and right click over here and you can paste it and when you hit enter it will get installed now you can see that uh the requirement has already been satisfied because i have already installed this particular library but if you in your case if you haven't done so just go ahead and install this on first just it's going to take a couple of minutes that's it so let's go ahead and import these three libraries first i'm going to first import jio pandas and then i'm going to import raster io and after that i'm going to import raster stats these three libraries we can just run this one and make sure that we do have all the libraries imported without any issue and next what i can do is i can read in the read the district's shape file so you can simply read a shape file using the geopandas library by saying let's say i'm going to call this as districts that's going to be equal to g gppd dot read file and over here you can specify the path basically to this folder and the name of the shape file is districts dot shp all right now you can run this and if you want to have a quick do a quick visualization you can say districts dot plot and that will plot out the districts uh for you as you can see over here all right now next let's go ahead and import the rainfall raster which we have which we selected now as you can see if you can recall you can see that i have actually about 30 rasters over here i'm just going to select one date maybe let's say 15th of april and i'm going to do the special statistics calculation for this particular raster first so we can use the raster io library to read in this raster all right i can call this as maybe let's say rf and that's going to be equal to rasterio dot open and similar to what we did before we specify just the path to that particular file and i'm going to specify the file name over here which is going to be 202415.t and over here we also can specify the mod which is reading and now we can run this all right now if you would like to do a quick visualization of this rainfall raster what you can do is you can simply go ahead and say from rasterio dot plot import show and from here what i'm going to do is uh let's go ahead and run this first yeah and after that you can just say now since we have imported this show over here you can just say show and specify the raster which you imported which is this rainfall raster which is rf and now you can see that we actually managed to do a quick plot of the raster itself as well now before going into the into the zonal statistics part let me just take a couple of minutes to actually show you how to maybe in case if you would like to plot this shape file on top of this raster you can even actually do that quite easily let me just go ahead and show you that one as well let's say plotting the rust and the shape file together all right so for this what you can do is first you have to actually import one another library which is the matplotlib library i'm going to import it as plt and after you have done that let's just go ahead and run this first and now what you can do is you can simply say figure and then you have to define an axis and that's going to be equal to plt dot plt is what you imported from here pld dot subplots and over here i can just put one by one because we are just going to have one subplot in case if you happen to have maybe two or three subplot you can actually specify that over here but in this case in my case i'm just going to have one subplot and just think of this as sort of a creating the the platform for you to plot your graphs on something like that you can you can just imagine that in in that kind of a way since you imported this show over here you can just say show this rf raster but over here i'm going to say the axis on which which i would like to show this rf raster is basically this ax that i defined over here and maybe i can specify a title maybe i can say title equals rainfall all right this one is corresponding to the raster and next what we can do is we can also go ahead and plot the shape file which we imported over here and normally the way that we would plot a shape file is the name of the shapefile dot plot now within these brackets you have to specify the axis on which you would like to plot and as you might have guessed by now we are going to plot this on the same axis as this so that's why i'm going to specify this axis to be the same as the ax value over here all right now let's we'll add one more argument plt dot show and after that you can go ahead and run this and as you can see over here even though it's not that clear you can see that the shape file now has been plotted on top of the raster now we can do a couple of amendments actually to make this more visually pleasing now one thing what you can do is you can actually get rid of this fill color and you can just only have the boundaries of the of the shape file of the district shapefile now to do that you can say you can pass one argument to this you can say the face color equals none and at the same time you better define an edge color so each color would basically be the boundaries the colors of the lines which specifies the boundaries of the districts and by looking at the color of this rasta i think it's better to maybe assign a color like yellow so that it'll be a bit more obvious and apart from that i think uh i'm good to go so let's go go ahead and run this yeah now you can see that we managed to plot the the district shape file on top of this rainfall raster now if you need to increase the figure size you can actually do that as well you can specify pass one argument over here called fig size now just keep in mind that you're passing this figure size argument to this main platform which you created on which you're actually adding more plots now just think that you have another shape file which you need to add what you can do is you can actually specify that third shape file of that second shapefile over here as well and then and then when you specify the a axis to be equal to this axis it's going to get plotted on top of this one so that's how basically that works so as i said you can specify maybe the the figure size to be 10 maybe i can add the figure 10 by 10 figure size and let's try to plot this one and see how it looks yeah it's a bit larger than i expected but yeah i think now you get the idea you can see clearly the boundaries the state the district boundaries and you sort of have an idea how the special variation of the rainfall actually happens around this area all right i'm going to put this back to the default mod in which the figure size is will be not specified by me all right now if you would like to also have some idea about the the raster values of this particular raster you can even create a histogram now let's say if i want to create a histogram right beside this graph right beside this map actually that's possible because now over here we are using subplots now instead of using one by one subplot i'm going to use a one by two subplot which actually opens up a whole new area for me to sort of uh add some more things add some more information or even add a completely different map in addition to this show i'm also going to import one more thing over here from raster io dot plot i'm going to import show hist basically what this choice does is it's actually showing the histogram of a particular raster that we are going to assign for it if you would like to add in another figure over here what you can do is you can actually specify two axes let's say i'm going to call this one as access one and another one as access to and this is how you would specify that and i'm going to put this axis one to be the axis of the main rainfall raster and the district shape file now if i plot this one you will see that it actually opens up some area for another for another map which in which i'm going to sort of embed the histogram if you want to plot the histogram what you can do is since we already imported this show hist over here you can say show histogram and i'm going to call for the histogram of the rainfall raster and i can even specify a title let's say i would write histogram over here and now i'm going to specify the axis and in this case i'm going to specify the axis as ax2 which is actually corresponding to this empty space and in this case maybe let me also go ahead and specify a figure size something like 10 by 4. all right now let's run this one and see how it looks yeah now you can see that we have the rainfall rust over here on top of that we have the district boundaries and over here we have the histogram so this histogram basically gives you some ideas about the the pixel values the density of the pixel values and also you can see that when you when it comes to higher rainfall values such as let's say somewhere over here around 70 something you can see that the histogram shows that the frequency of that kind of value is actually quite less so that's how you basically interpret the histogram before you move into any further analysis it's actually a good way to sort of get a get a hang of the of the overall data sort of a summary but uh something to give you a bit of perspective of the distribution of the data all right now let's go in to the calculation of the space of this uh spatial statistics part all right now what you have to do is first you have to create a numpy array based on this imported rainfall dem now if you go to the variables explorer sorry if you go to the ipython console over here and if you try to check the type of this rainfall rust which you imported you will see that it's actually not a not a numpy array by itself so we can actually assign the values into a numpy array so what i can do is i can say assign right i'm going to name the variable as rainfall array and that array is going to be equal to rf.read and i put one over here and if i run this and i can go to the variables explorer you can see that we have one variable called rainfall array and you can see a bunch of numbers like minus nine nine nine nine and if i scroll down over here you will see that you eventually actually start seeing real numbers you see bunch of zeros over here and you along with that you can see some values which actually looks like legitimate rainfall valleys so just believe it or not these are actually rainfall values which corresponds to each raster pixel so we sort of extracted the pixel values from the using the raster io library and we throw it into a num into an nd array over here so that's what this is and if you would like to quickly check the type you can even check the type of the this rainfall array variable and it will tell you that it's a it's a numpy and the array all right next thing that you need to know is to acquire the values corresponding to the affine now i can create a variable called affine and that's going to be equal to rainfall dot transform now where does this come from all right now if i go over here and if i sorry this one is rf rf.transform now if i would if i would like to get some metadata of this uh rainfall raster which i imported over here i can just say rf dot meta and you see a couple of things over here which gets appeared like the data type which is flood 32 the no data value which gets assigned for every pixel which actually doesn't have any data uh is -999.0 and the width and the height is given over here the coordinate reference system epsg 4326 that's the wgs 1984 geographical coordinate reference system and over here you see one key called transform and this transform has these affine values which basically if you're not familiar with this each of these six values actually corresponds to a certain certain things for example this first number corresponds to the width of each pixel and this second number corresponds to the row rotation typically it's it's a zero value and this third number which corresponds to the x coordinate of the of the upper left corner of the upper left pixel and this value basically corresponds to the column rotation which is also typically a zero and this value again corresponds to the height of the pixel and this value corresponds to the y-coordinate of the upper left corner of the upper left pixel so if you would like to acquire a certain type of a data let's say you would like to have some idea about the the no data values what you can simply say is you can say rf dot no data and that will give you the s the value which has been been assigned for the nor data similarly if you would like to know the crs you can say rf.crs and that will show you actually which coordinate reference system uh this particular raster is in and similarly what you can do is you can say rf dot transform and that will give you this a fine set of values now this is the same thing that i'm that i'm actually going to save into a variable called a fine and as you can see over here just going to paste it over here because when we use the raster stats library we will need we will need these two variables in addition to a couple of more variables so that's why i'm going to make these two ready first all right now i can first go ahead and run and then i can say calculating the zonal statistics all right the way to calculate the zone and statistics would be to use this uh raster stats library which we imported and we can use that let's say i'm going to specify a variable let's say i'm going to calculate the mean the average rainfall over each district for this particular day so i'm going to create a variable called average rf which is average rainfall and that's going to be equal to stats dot zonal stats as you can see over here now here we are going to specify a couple of arguments so the first argument that i'm going to specify is the shape file which i imported which is the district shape file as you can see over here and then i'm going to specify the the numpy nd array which is this rainfall array and if you specify that numpy nd array it's important to specify the affine as well and this affine is going to be equal to this fine variable that we recorded over there and over here you can specify which statistics you would like to have in my case i would like to have the mean but let's say if you would like to have other things like such as the minimum value and the maximum value you actually can specify all of those here and it will produce all of these statistics for you guys in case if you need but for my case i'm just going to stick to just the average values and over here i'm going to specify one more argument which is called the jio json out which is equal to true for the time being let's go ahead and run this all right now i can open the variables explorer and you can see that i have some values in this average rainfall variable all right now if i open this average rainfall list you can see that we have couple of entries actually we have multiple entries corresponding to 18 districts and if i go ahead and open one of these entries over here you can see that under properties it shows you the name of the district along with the average rainfall value for that particular district on that given day similarly i can go ahead and open another district let's say this and under properties you can see that it shows you the average rainfall the average of all the pixels sum together and divided by the number of pixels the value accounts to 0.5819 as you can see over here so this is basically how you calculate the zonal statistics now if you put max over here it'll actually instead of taking the mean so in that case this has to be max rainfall and if i open this max rainfall variable over here you can see that now for each district the max rainfall is like this and if i scroll and go to another district you can see now over here the maximum rainfall is 56.27 so i get you get i guess you get the main idea i'm going to flip this back to mean and over here it should be the average rainfall all right now you can see that this list itself can be a bit messy with all sorts of information that we might not necessarily need so let's see a way to actually pull out this data pull out the data that we need from all of these uh information that's going on over here now there could be multiple ways of doing that so i'm going to actually use a while loop in order to extract the data so let's say i'm going to put a comment saying that extracting the average rainfall data from the list all right so i'm going to create an empty list so let me go ahead and name this empty list as average rainfall and you can specify an empty list by just passing two square brackets and since i'm using a while loop i need to have a counter so i'm going to define another variable called i and i'm going to assign a value of 0 for that i value let's go ahead and write the while loop now i'm going to let this loop run while the value of i is less than the total size of this average rainfall list now if you want to know the size of this average rainfall list what i can say is i can say length avg rf and that's going to tell you that it actually has a length of 18. so let's go ahead and put assign this to run as long as the value of i is less than 18 and i'm going to say that take the average rainfall empty list and append what do you need to append you have to append the average rainfall over here and i'm going to put the i value which i specified over here and since this is a python dictionary i'm going to explain to you guys this statement this piece of code that i'm going to write over here but for a second just bear with me i'm going to specify argument called properties all right now let's talk about this part so if i say average rainfall over here it's actually going to be a huge list with a bunch of numbers you can actually check the type of the average rainfall and it'll tell you that it's a list all right so the way to specify each argument of the list is by passing the index if i say average rainfall zero it's actually going to return to me the all the information which corresponds to the zeroth index and if i open this average rainfall variable from here and you can see that all all the information that's embedded in this list are actually dictionaries the way to extract data from the from a python dictionary is by specifying the keyword that you see over here this is basically something like a column heading but it's just another data structure in python which is called the dictionary so you have to specify the keys of the dictionary in this case which is which happens to be the properties because the information that i need is under this key keyword called properties so that's why i'm going to specify over here average rainfall 0 and over here properties and that's basically going to return me the name of the district you can see the the name is a bit distorted because of the characters because of the portuguese characters but i'm not going to care about that much maybe let me go ahead and select another one yeah you can see that it's actually retrieving exactly what i need the name of the district and the mean temperature the mean rainfall value so during each iteration this will start from the value i being 0 and over here during the first iteration the value of i will be zero and during the next iteration i would like the value of i to be one so i'm going to tell that after you record this particular line which corresponds to the zeroth index increase the value of i by y one unit you can say that by saying i equals i plus one now in a pure mathematical sense actually this doesn't make any sense because you cancel out the i's and it basically equals nothing equals to 1 but if you are new to programming this is actually the way to specify the variables which which you would like to increase during each iteration all right so what we can do is we can just go ahead and run this and now if i check this average rainfall empty list which happens to be the which was an empty list at this stage it's actually not an empty list anymore if i open this you can see that we actually extracted perfectly just the information that we need it's not messy anymore but you can see that each element of the list is actually a dictionary i think this is good enough for us you can see now each district and its mean temperature value but if you would like to make this a bit more presentable you can even convert this directly into a pandas data frame now if you want to convert that into a pandas data frame you can go ahead and import pandas library as let's say pd and i i can say that transferring the information from the list to a pandas data frame right this one is quite easy you can you can just create another variable called let's say average rainfall of portugal and that's going to be equal to pd dot data frame and what you have to do is simply pass this average rainfall list over here you don't have to do anything else and once you run this and if you say print average rainfall portugal over here and when you run this one you will see that now this one gets printed out and in this column you see the name of the district and over here you see the you get to see the mean the average rainfall of each district and if you would like to do a quick visualization you can even do that you can just say average rainfall of portugal dot plot and you can specify the column names in this case i would like to have the name of the district as the x column name so in that case the name of the column is name one for some reason and the y axis is going to have what this mean and the type of the chart i would like to have a bar chart instead of a line line graph and the kind you can specify as bar and from here you can see basically the average rainfall values corresponding to the 15th of april well it would be nice to add a title as well let's say average rainfall on 2020 april 15th something like that yeah that's about it so you can see that we sort of managed to grab the data from the raster and take the averages for each district and we managed to plot it out like this now now you can obviously make the tweaks to make this graph look much more presentable but i guess this was something additional that you that you get that you got to learn through this tutorial uh because our main objective is sort of uh to teach you how to extract how to sort of get the the special averages the the the zone and statistics based on a given polygon boundary in this case it happens to be the boundaries of each district of portugal alright so i guess you learned something new if you did and if you did like this tutorial uh show your support by hitting thumbs up button and also you can share this tutorial with your friends and if you haven't subscribed to this channel yet kindly consider subscribing because you'll get to see this kind of interesting and helpful tutorials i believe to many of you guys and in the second part of this tutorial i'll teach you how to deal with a number of rasters as i showed you guys before because right now we did this analysis only for this raster which corresponds to the 15th of april but in the next tutorial i'll show you how to sort of uh do a further a statistical analysis for time series data because we have data starting from 1st of april all the way up to the 30th of april so i guess we can go ahead and create some daily statistics as well for the month of april and for any month basically because this rainfall data you can download them quite easily using one of the websites where you can download the chips rainfall data so i guess i'll see you guys in the next one so until then take care guys
Info
Channel: GeoDelta Labs
Views: 5,432
Rating: undefined out of 5
Keywords: CHIRPS, Gridded, Satellite, Rainfall, Data, GIS, Python, how to, convert, rasterio, ArcGIS, QGIS, numpy, pandas, Os, DEM, Raster, vector, zonal, statistics, rasterstats, histogram, python, programming, data, analysis, science, hydrology, rainfall, affine, average, mean, max
Id: VIr-pejky6E
Channel Id: undefined
Length: 31min 59sec (1919 seconds)
Published: Fri May 29 2020
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.