Extracting time series data from MULTIPLE netCDF files into a CSV (Part 6)

Captions
Hello guys, welcome to part 6 of this tutorial series on handling netCDF files using Python. In this part I'm going to teach you how to extract data from multiple netCDF files into one time series. As you can see, I'm still using the same set of netCDF files from my previous examples: files for 1961 up to 1966, which contain the average temperature over the Monsoon Asia region. In part 3 of this series I already showed you how to extract the data from one of these files into a CSV. We're going to follow more or less the same code structure, but there are some major changes, so you may want to follow each step carefully. So without further ado, let's get started.

I already have the files copied into my working folder, so I'm going to create a new Python script by right-clicking here, and I'll name it something like "extracting data from multiple netCDF files".

First, I'm going to use the glob module to detect all the netCDF files inside this folder. Say you also have other files that are not netCDF files, and you need to pick out only the netCDF ones. An easy way to do that is the glob module, which is part of Python's standard library, so you don't have to install it separately. You just write import glob and then run a for loop: for file in glob.glob(...). Since my Python script is in the same folder, I don't have to provide the full path. Instead I detect the files by their extension, .nc, so I write "*.nc": the star in front of the extension matches every file name that ends in .nc. If you needed files with a different extension, say .py, you would write "*.py" and it would detect all the files with the .py extension. Now if I print file inside the loop and run it, it prints all the netCDF files in the folder. I'm going to use this same loop to read in the files.

My next objective is to create a Python list containing the years of these netCDF files. In this case we're fortunate, because the file names carry the years, 1961 up to 1966, so it's obvious the files range from 1961 to 1966. But this might not always be the case. I'm demonstrating with a set of files downloaded from one particular website, but your files might come from different sources with different naming schemes. So it's always safer to go inside the file and capture the information from the data embedded in it, rather than trying to pick it out of the file name.

To do that, I also need to read in each file. As you may recall, the way to read netCDF files is the netCDF4 library; if you've been following the other tutorials in this series, you already know how to install it. So I write from netCDF4 import Dataset, and then create a variable data = Dataset(...). I don't have to type the file name manually: during each iteration the loop variable file holds the current file name, and it's already a string, so I can pass file directly and open it in read mode, "r". If I run this now, it reads each netCDF file in turn but doesn't do anything with it; it simply jumps from one file to the next. So what ends up in the data variable is the last file it read, 1966.

If you remember how to explore a netCDF file, you can look at its variables with data.variables.keys(). Keep in mind that data now refers to the 1966 file, because that's what the loop read on its last iteration. As you can see, we have a time variable, so instead of calling keys() I can select data.variables["time"] directly and look at its information: the long name is time, the units are minutes since the first of January of the file's year, and the shape is 365 (in a leap year it would be 366). So instead of taking the year from the file name, I'm going to take it from inside the file. Even if the file had an entirely different name, we would still be accurate, because we read the year from the file's own metadata.

One way to select it is data.variables["time"].units, which gives us the string we just saw, "minutes since ...". Let me put the time variable into its own variable, time = data.variables["time"], and run it. Now time gives the full description, and time.units gives just the units string. From this string I only want the part that corresponds to the year. Since it's a string, we can use the usual string operations: selecting everything gives the whole string, but we can also pick out part of it by index, for example from index 4 up to index 10. All of these files share the same units structure, so it's safe to slice a fixed range. I first tried 14 up to 20, but that includes some extra characters; 14 up to 18 is enough. So time.units[14:18] selects the year, 1966 in this case. I save that into a variable called year and run it.

Now, if you have 10, 15 or even 50 years, it's good to collect the value recorded in this year variable into a Python list. So before the for loop I create an empty list, all_years = [], and during each iteration I append the year that was just read: all_years.append(year). On the first iteration the list is still empty, so 1961 gets appended to the empty list, and then each iteration appends one year after another. If we run this and look at all_years, you can see it has collected all the years into one Python list. We can also check its type. If you're new to Python scripting, it's always good to check the types of your variables, because each data structure has its own way of being handled: a Python list is dealt with differently from a Python dictionary, for example. That's why I keep checking the types of my variables. So this block of code records all the years into a Python list; let me add a comment to mark that.

My next objective is to create an empty pandas DataFrame that starts at the lowest year and ends at the highest year. Here it will detect 1961 as the lowest and 1966 as the highest, and once you know those, it's easy to create an empty DataFrame that will be ready to receive the data once we extract it from each netCDF file. I use the list to get the starting and ending years: the starting year is simply min(all_years), which gives 1961, and the ending year is max(all_years), which gives 1966.

Next we create a date range with pandas, so I import pandas as pd. I create a variable, date_range, for the full range of the data, equal to pd.date_range(...). In it we specify start, the starting date, end, the ending date, and freq, the frequency. Here the frequency is daily, because we're extracting daily data; if you set the frequency to monthly, the date range would go month by month instead. I'll show you an example.

First, let me define the start. year_start = min(all_years) gives 1961, which has to be combined with the rest of the date, the first of January, so the start is str(year_start) + "0101". If I check the type of year_start, it's already a string, so adding a string to a string is not an issue here; but if the year were recorded in another format, you could always convert it with str() first to make sure the concatenation works. Similarly, the end is str(year_end) + "1231", the 31st of December, and the frequency is "D", which refers to daily.
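The first half of the workflow (detecting the .nc files with glob, reading each file's year from its time units instead of its name, sorting the years, and working out how many daily rows the full table needs) can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the code from the video: no real netCDF files are available here, so it creates empty placeholder files in a temporary folder, and the hypothetical FAKE_UNITS dictionary stands in for data.variables["time"].units. The units layout "minutes since 1961-01-01 00:00:00" is assumed. It finds the year with a regular expression after "since", a swapped-in alternative to the fixed slice [14:18], which only works while every file uses exactly the same units layout, and it returns integers rather than strings.

```python
import calendar
import glob
import os
import re
import tempfile

# Hypothetical stand-in for the metadata inside each file. In the real workflow
# this string would come from Dataset(path, "r").variables["time"].units
# (netCDF4); the file names and the units layout are assumptions.
FAKE_UNITS = {
    "sample_a.nc": "minutes since 1963-01-01 00:00:00",
    "sample_b.nc": "minutes since 1961-01-01 00:00:00",
    "sample_c.nc": "minutes since 1964-01-01 00:00:00",
    "sample_d.nc": "minutes since 1962-01-01 00:00:00",
}


def year_from_units(units):
    """Return the four-digit year that follows 'since' in a units string."""
    match = re.search(r"since\s+(\d{4})", units)
    if match is None:
        raise ValueError(f"no year found in units string: {units!r}")
    return int(match.group(1))


def collect_years(folder):
    """Detect every *.nc file in folder and return the sorted list of years."""
    all_years = []
    for path in glob.glob(os.path.join(folder, "*.nc")):
        units = FAKE_UNITS[os.path.basename(path)]
        all_years.append(year_from_units(units))
    all_years.sort()  # ascending, whatever order glob returned the files in
    return all_years


with tempfile.TemporaryDirectory() as folder:
    # Empty placeholder files, plus one non-netCDF file that "*.nc" should skip.
    for name in list(FAKE_UNITS) + ["script.py"]:
        open(os.path.join(folder, name), "w").close()
    all_years = collect_years(folder)

year_start, year_end = min(all_years), max(all_years)
# Rows a daily table from 1 Jan year_start to 31 Dec year_end needs (leap years: 366).
n_days = sum(366 if calendar.isleap(y) else 365 for y in range(year_start, year_end + 1))
print(all_years, year_start, year_end, n_days)  # [1961, 1962, 1963, 1964] 1961 1964 1461
```

For the exact layout assumed here, "minutes since 1961-01-01 00:00:00"[14:18] also returns "1961", so the slice and the regular expression agree; the regular expression just keeps working if the text before the year changes length.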
Now if I run this, we get an error: the name year_end is not defined. Right, I hadn't assigned it yet. Once it's assigned, we can run it. If you look at the date range, you can see it runs from 1961 to 1966 at daily intervals. If you instead specify monthly intervals, the dates get recorded month by month: 31 January 1961, 28 February, 31 March, and so on. We don't want that; we need daily intervals.

With the date range in place, it's easy to create an empty pandas DataFrame: df = pd.DataFrame(...). I fill it with zeros for the time being and give it a column name. Since we're extracting average temperature, a column name like Temperature is fine, though that's up to you. The important part is that the index should be this date range. Run it again and look at df: it's a fully functional pandas DataFrame. In the variable explorer you can see it runs from 1 January 1961 to 31 December 1966, filled with zeros at the moment, and ready to receive values from the netCDF files once we reach that stage. I'll add a comment describing this block as covering the whole range of the data.

Now is a good time to specify the coordinates for which you want to extract your data. Keep in mind that in this part 6 I'm showing you how to extract the data for one point using multiple files: when we finish, we'll have data for a single location, but as one continuous time series from 1961 to 1966. In the next tutorial I'll show you how to use the same code to extract data from multiple files for multiple points as well. To keep things simple, here we'll focus only on getting data from multiple files, using one manually entered pair of lat/long coordinates. Again I'm going to use Kathmandu as my location, since I used it in the previous section as well. I add a comment, "defining the lat/long for the location of your interest", and set the latitude of Kathmandu to 27.697817 and the longitude of Kathmandu, which is about 85.3 degrees east.

To extract the data, I'm going to read all the netCDF files again. You can see the files are already arranged in ascending order, 1961 to 1966, but this might not always be the case: when you get the folder, the files might be jumbled, with 1964 at the beginning and 1961 at the end. So it's not safe to let the program decide which file it reads first. Instead I'll program it so that it first reads the file with the lowest year and then moves up one year at a time.

To do that, I take the all_years list and sort it. Let me demonstrate: if I call all_years.sort(), nothing changes, because it's already sorted. But if I call all_years.sort(reverse=True) and look at all_years, it's now sorted in descending order, 1966 down to 1961. To sort it back into ascending order, you just call all_years.sort() again. This makes sure the list starts at the lowest year and ends at the highest, so we can run a loop that iterates through the years in ascending order, which is what we need.

Now we're going to extract the data from the netCDF files, and before that I also import numpy as np. As you might guess, we again need a for loop over the files, this time based on the sorted years: for year in all_years. I read in each file again with Dataset, building the file name from the year. I wrap the year in str() as a precaution, and since the year alone doesn't include the .nc extension, I add that manually. If I run this and check data.variables["time"].units, you can see that on its final iteration it again read 1966. So that's the part that reads in the data.

Next I store the latitude and longitude data from the file in two variables, lat and lon. This repeats what I explained in part 3, so I'll go through it a bit more swiftly, since I explained each line of code there. Just as with time, I write lat = data.variables["lat"][:], selecting all the data, and lon = data.variables["lon"][:].

The next step is to take the squared difference between these arrays and the latitude and longitude of our location of interest. I add a comment and create sq_diff_lat and sq_diff_lon: as I explained before, sq_diff_lat is (lat - latitude of Kathmandu) squared, and similarly sq_diff_lon is (lon - longitude of Kathmandu) squared.

Then I identify the index of the minimum value for lat and lon based on those squared differences: min_index_lat = sq_diff_lat.argmin(), and similarly min_index_lon = sq_diff_lon.argmin(). If we run this, you can see we now have the minimum index for latitude and the minimum index for longitude.

Let me also read the temperature data directly from its variable: temp = data.variables[...], and as you may recall, the temperature variable is tave, the daily average temperature. If I run this and look at temp, it holds the daily mean temperature in degrees Celsius, and you can see its current shape. This accesses the average temperature data without selecting any particular dates; we'll make use of it later.

One tricky part: since we're reading the years from the beginning again, it's good to create a date range each time we read a file. For example, when reading 1962, I want a date range for 1962 from 1 January to 31 December, and the same for 1963, 1964 and so on. I'll explain why in a minute; just keep in mind that here we're creating a date range for the data read in during each iteration. I define start as str(year) + "0101", converting the year to a string first, just as we did when we created the date range for the full range of data. Here year is 1966, since that was the last iteration. Likewise, end is str(year) + "1231".

Then I create another date range, d_range, but this time it corresponds to one year per iteration: on the first iteration it goes from 1 January 1961 to 31 December 1961, on the next from 1 January 1962 to 31 December 1962, and so on. This is unlike the full date range we made earlier, which served as the table that will receive the data later. So d_range = pd.date_range(start=start, end=end, freq="D"), where start and end refer to the variables we just defined, and the frequency is again daily. If I run this and check d_range, you can see it now covers only the year the loop read on its last iteration, unlike the main df, which covers the whole range of data. I'll add a comment: creating the date range for each year during each iteration.

Now we've come to the final part of the tutorial. I'm going to use another for loop, iterating through each element of d_range, which means a nested loop. The main loop first reads in the 1961 data and creates d_range for 1961, from 1 January to 31 December 1961; then the inner loop goes through each element of that d_range separately. For the time being I'll use a temporary loop variable, x.

Inside it, I want the index that corresponds to each time slot. Imagine this is 1961, the first iteration: 1 January 1961 corresponds to index 0, 2 January to index 1, and so on up to index 364, because we start from 0 and there are 365 elements. I can get these indexes by first checking len(d_range), the total number of entries, which is 365. Then np.arange(len(d_range)) creates a NumPy array from 0 up to the highest index, 364. In a leap year there will be 366 entries and the highest index will be 365. Since we're automating things, we don't want to type these numbers ourselves; structuring it this way means the code handles leap years automatically. So I iterate through this array of time indexes, and to be clearer, instead of x I'll call the loop variable t_index, because it's the time index.

Now back to the temp variable. Given its current shape, to access a value you write temp[...] with the time index first, say 0, followed by the latitude and longitude indexes. For Kathmandu those are min_index_lat, which here is 170, and min_index_lon, which is 101; these indexes point to the grid cell for the location of Kathmandu. (These go inside the square brackets; I made a mistake there at first.) The 0 refers to the 0th time index, and as you can see the result is a masked array with a data point of 9.8. If we do the same for index 12, we get a different data point.

Right now the data comes from 1966, which has 365 entries, so the highest index is 364. Just for fun, if you try 365, you get an error, because there is no element at index 365. With 364 it returns the temperature of the last entry, 31 December 1966.

So I take this line of code, and instead of typing the time index manually I use t_index from each iteration. Now I want to record each value into the main, large DataFrame we created earlier, at the corresponding date. If I open it, you can see how this will work: during the first iteration of the outer loop it reads in the data for 1961, from 1 January to 31 December 1961, and then it moves on to the second iteration. So if we assign the data during each iteration, then when it reads 1961 it finds the entry for 1 January 1961 and puts the corresponding temperature value there, and once it has filled 1961 it moves on to 1962, finds, say, the entry for the 8th day of the 7th month, and puts the corresponding temperature value there.

We can make the assignment with df.loc. If I look at df, we get the full DataFrame, and with df.loc we can select rows by their label. Say I want the row for 3 January 1961: I just enter the date as a string, and it shows a temperature value of 0. Similarly I can pick any date, say 20 April 1965, and again I get that row: one entry, Temperature, with the value 0. To access the value of the Temperature column directly, I add ["Temperature"], and it returns 0.

Back to 1 January 1961: its temperature value is 0. Not only can we read it, we can also assign to it. Say I set this temperature value to 100; if I go back to df, you can see the first value has changed to 100. So during each iteration I want to go through each date and change the corresponding value, not to 100 but to the extracted temperature. Instead of a fixed value, I put df.loc[d_range[t_index]]["Temperature"] on the left-hand side and the temperature value on the right.

Is this clear? If not, let me comment that line out for a second and print t_index, which is just a set of numbers from 0 to 364. d_range is the date range from the final iteration, and d_range[t_index] selects the final timestamp, 31 December. That's what happens here: during each iteration it selects 1 January 1961, 2 January 1961 and so on, all the way to 31 December 1966.

Instead of printing t_index, if you want to see what's happening inside the loop you can print something like "Recording the value for" followed by d_range[t_index]. This is just for our own check and is completely optional, but it shows you what the loop is doing. Watch this when I run the script: it reports recording the values for 1961, 1962, 1963, 1964, 1965 and 1966, and it's done in a matter of seconds; I don't think it even took five seconds. Now if you go to the variable explorer and check df, you'll see it has values recorded for every day from 1961 to 1966.

For a quick visual check, you can call df.plot(), which plots the temperature data. You can see the seasonal fluctuation: at the beginning of 1961 the temperatures are around 7.5 to 10 degrees, by the middle of the year, around June and July, they rise to about 25 degrees, and then they fall back down during December, January and February before rising again. So this is what we wanted to do: merge the data from multiple files into one pandas DataFrame.

Now the easiest step: saving this DataFrame to a CSV with df.to_csv(). You can provide a full path, but I want to save it in the same folder as the data, in which case you only provide the file name, something like "temperature Kathmandu 1961 to 1966" with a .csv extension; a bit of a long file name. Run it, and by the end you'll see the file generated here. We can open it and look at the data ourselves; you can even do a quick plot from the Insert menu. So this is the temperature data, showing the seasonal fluctuation of temperature in Kathmandu.

That concludes this tutorial. In the next tutorial I'll show you how to do this kind of extraction for multiple points: say you have geographic coordinates, or a shapefile, with coordinates for multiple locations, for example cities in different countries, and you'd like to create this kind of CSV with the temperature data for each of them in separate CSV files. I'll teach you how to do that in the next tutorial, so I'll see you guys in the next one.
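The extraction loop can be sketched with NumPy and pandas. This is a sketch under assumptions rather than the code from the video: synthetic random arrays stand in for the lat, lon and temperature variables of each yearly file (temperature shaped time, lat, lon), and the 0.5-degree grid, the point (27.7, 85.3) near Kathmandu and the output file name are all hypothetical. It follows the steps described above: sort the years, build the full daily table of zeros, find the nearest grid cell with argmin of the squared differences, build a per-year date range, and fill the table in a nested loop. The assignment uses a single df.loc[row, "Temperature"] call, because chained indexing such as df.loc[row]["Temperature"] = value may write to a temporary copy instead of df, and pandas warns against it.

```python
import calendar
import os
import tempfile

import numpy as np
import pandas as pd

# Synthetic stand-ins for the yearly netCDF files (assumption): 1-D lat/lon
# axes plus a temperature array shaped (time, lat, lon) for each year.
lat = np.arange(20.0, 35.0, 0.5)   # hypothetical 0.5-degree grid
lon = np.arange(80.0, 90.0, 0.5)
rng = np.random.default_rng(0)
files = {}
for yr in (1962, 1961, 1964, 1963):  # deliberately unsorted, like a jumbled folder
    n_days = 366 if calendar.isleap(yr) else 365
    files[yr] = rng.normal(15.0, 5.0, size=(n_days, lat.size, lon.size))

# Hypothetical point of interest, roughly Kathmandu.
lat_point, lon_point = 27.7, 85.3

# Sort the years so the table is filled from the earliest year upwards.
all_years = sorted(files)

# Empty daily table covering the whole period, filled with zeros for now.
full_range = pd.date_range(start=f"{all_years[0]}-01-01", end=f"{all_years[-1]}-12-31", freq="D")
df = pd.DataFrame(0.0, index=full_range, columns=["Temperature"])

for year in all_years:
    temp = files[year]  # stands in for the temperature variable of that year's file
    # Nearest grid cell: index of the smallest squared difference on each axis.
    lat_idx = ((lat - lat_point) ** 2).argmin()
    lon_idx = ((lon - lon_point) ** 2).argmin()
    # Daily dates for this year only: 365 or 366 entries, matching temp.shape[0].
    d_range = pd.date_range(start=f"{year}-01-01", end=f"{year}-12-31", freq="D")
    for t_index in np.arange(len(d_range)):
        # One .loc call with both row and column labels (no chained indexing).
        df.loc[d_range[t_index], "Temperature"] = temp[t_index, lat_idx, lon_idx]

# Write the CSV to a temporary folder and read it back as a check.
with tempfile.TemporaryDirectory() as folder:
    out_path = os.path.join(folder, "temperature_point.csv")
    df.to_csv(out_path)
    round_trip = pd.read_csv(out_path, index_col=0, parse_dates=True)

print(df.head())
```

Because each year's d_range has exactly as many entries as that year's temperature array has time steps, the inner loop could also be replaced by one vectorized assignment, df.loc[d_range, "Temperature"] = temp[:, lat_idx, lon_idx], which avoids a separate .loc call for every day.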
Info
Channel: GeoDelta Labs
Views: 9,927
Keywords: CHIRPS, Gridded, Satellite, Rainfall, Data, GIS, Python, how to, convert, rasterio, ArcGIS, QGIS, numpy, pandas, Os, DEM, Raster, vector, APHRODITE, APHRODITES, APHRODITE'S, Japan, monsoon, asia, netCDF, multidimensional, data, xarray
Id: M1rXwFZyzC4
Length: 44min 19sec (2659 seconds)
Published: Fri May 08 2020