Extracting time series data from a netCDF file into a CSV (Part 3)

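The workflow in this part has three steps: find the grid cell nearest a point by taking the argmin of the squared difference, read that cell's daily values along the time axis, and attach a date range. The sketch below condenses those steps. It is not the script built in the video, and it rests on several assumptions:

- A synthetic array stands in for the APHRODITE file, so the sketch runs on its own.
- It assumes the (time, lat, lon) layout and the 0.25-degree cell centres reported for the 1961 file.
- It assumes the units string looks like "minutes since 1961-01-01 00:00".
- The helper names and the output filename are hypothetical.

With the real file, `lats`, `lons` and `tave` would presumably come from `data.variables['lat'][:]`, `data.variables['lon'][:]` and `data.variables['tave']` after `data = Dataset("1961.nc", "r")` from netCDF4.

```python
import numpy as np
import pandas as pd


def nearest_index(coords, target):
    """Index of the grid value closest to target (argmin of the squared difference)."""
    squared_diff = (np.asarray(coords, dtype=float) - target) ** 2
    return int(np.argmin(squared_diff))


def start_date_from_units(units):
    """Return the date after 'since' in a units string such as
    'minutes since 1961-01-01 00:00' (assumed format)."""
    return units.split("since", 1)[1].strip()[:10]


def extract_point_series(tave, lats, lons, lat0, lon0, units):
    """Daily values of a (time, lat, lon) array at the cell nearest (lat0, lon0)."""
    i_lat = nearest_index(lats, lat0)
    i_lon = nearest_index(lons, lon0)
    dates = pd.date_range(start=start_date_from_units(units),
                          periods=tave.shape[0], freq="D")
    # Fix the two spatial indices and let the time index vary;
    # masked (missing) values, if any, become NaN.
    values = np.ma.filled(tave[:, i_lat, i_lon].astype(float), np.nan)
    return pd.DataFrame({"Date": dates, "Temperature": values}), i_lat, i_lon


# Synthetic stand-in for the 1961 file: 0.25-degree cell centres.
lats = -14.875 + 0.25 * np.arange(280)   # -14.875 ... 54.875
lons = 60.125 + 0.25 * np.arange(360)    # 60.125 ... 149.875
# broadcast_to gives a read-only (365, 280, 360) view without allocating it;
# each day's "temperature" here is simply its day index.
tave = np.broadcast_to(np.arange(365, dtype=float)[:, None, None], (365, 280, 360))

series, i_lat, i_lon = extract_point_series(
    tave, lats, lons, lat0=27.697817, lon0=85.32,
    units="minutes since 1961-01-01 00:00")
print(i_lat, i_lon)   # grid indices of the nearest cell
print(series.head(3))
series.to_csv("kathmandu_1961.csv", index=False)  # hypothetical filename
```

Building the dates with `periods=tave.shape[0]` ties the range to the length of the time axis, as long as the file holds one entry per day. For 1961 it gives the same range as slicing the year out of the units string and appending 1231, which is what the video does.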
Hi guys, welcome back to part three of this tutorial series on handling netCDF files using Python. In this part I'm going to show you how to open the downloaded netCDF files using Python and how to access the data in them. I'll also show you how to extract that data as a time series and how to write it into a separate CSV file.

To get started, you first have to install the netCDF4 Python library, which makes it quite easy to work with netCDF files. I assume you haven't installed the netCDF4 package yet. Since we are using the Anaconda Prompt, go to the Start menu, search for Anaconda Prompt, right-click on it and run it as administrator. Once you have done that, type this command to install the package: pip install netCDF4, and press Enter. After a while it will tell you that the package was installed successfully.

Once that's done, go ahead and open the Spyder IDE. For this tutorial I'm using Spyder, which comes pre-installed with the Anaconda distribution. If you have not installed Anaconda yet, I suggest you check out the tutorial linked in the description below. It explains how to install the Anaconda distribution, which also installs the Spyder IDE.

All right, let's get started by importing the netCDF4 library. After installing it, you can do a quick check that the installation worked by typing import netCDF4. If you press Enter and don't get any error, that's a good indication that you have configured the
library successfully. Now, if you recall, in part 2 of this series we downloaded these netCDF files from the APHRODITE web portal. They contain daily average temperature information for the Monsoon Asia region. We have files for the years 1961 all the way up to 1966, but to get started let's focus on only one file, 1961, and open it using the netCDF4 package we just installed.

Before we do that, I'd like to navigate to the folder I'm going to work in, which is the same folder I downloaded the netCDF files to. I paste that path here and press Enter. In the file explorer you can see that this folder is now the current working directory. I right-click here, create a new Python file and name it extract_netcdf.

The first thing I'm going to do is import the Dataset class from the netCDF4 library by typing from netCDF4 import Dataset. In some parts of this tutorial I'll also use the NumPy library, so I import NumPy here as well. You can run these commands by hitting F5 or clicking this button. We don't get any error, which is a good indication.

Next, I'm going to open the file that corresponds to the year 1961. I create a variable called data, equal to Dataset (the class we imported from netCDF4), and the first argument is the file path. My file path
is this one, so I copy it here, add one more backslash, and append the file name, 1961.nc. Then we specify that we are only going to read this file, in read mode ('r'). Let's run this command and see what happens. The command executed without any errors.

Now I can type print(data), and when I hit F5 you can see the data variable printed here. Before going into the details, I want to quickly check the type of this variable, so I type type(data). As you can see, it is a netCDF4 Dataset.

Now let's look at what we get when we print the data variable directly. This line shows some information about the file format, which is HDF5. We can also see something quite important: the title of this dataset is APHRODITE Monsoon Asia daily temperature with 0.25 degree grids. It also shows the dimension sizes and, over here, the variables.

Before going any further, let's find out exactly what is stored inside this netCDF file, so I'll get rid of this print statement. You can ask precisely for the variables stored in the file with data.variables.keys(). If you print that, you see the different variables: lon, lat, time, tave and rstn. Now let's go a bit more in depth and see what exactly these mean. So now what
I'm going to do is look at each of these variables one by one. I write data.variables, but instead of calling for the keys I give the name of the variable directly, in this case lon, and save it to a variable called lon. Then let's print lon and see what it gives us. For the time being I'll remove the other print, otherwise it gets a bit confusing, so everything printed now comes from this one print command.

Here we get some information about the longitude variable: the units are degrees_east, the long name is longitude, and the current shape is 360. The long name tells you what the shortened variable name stands for, and you can also access it directly by printing lon.long_name. When you print it, it tells you this variable refers to longitude.

Now let's do the same with latitude. I run print(data.variables.keys()) again, because I want to look at the keys once more. There's a variable called lat, so I create another variable called lat, equal to data.variables with lat specified this time, and then print lat.long_name. When I run this, it shows that we are talking about latitude.

Similarly, if I want more information about time, I replace this with time and print it directly. The long name is time, which was quite clear, and the units are in minutes
since 1961-01-01, the first of January 1961, so time in this particular netCDF file is specified in minutes. The first day is at minute 0, the second day comes one day's worth of minutes later, the third day twice that, and so on.

Now let's go to the fourth variable, tave. I'm not sure what it means right now, so let's print tave and run the command. You can see that tave refers to the daily mean temperature analysis interpolated onto 0.25 degree grids. That's a much better description than "tave", right? It means the daily mean temperature, and the units are degrees Celsius.

Now you can see how powerful netCDF files can be: they store all these different types of information in just one file. That's why they can be a bit tricky to access, but once you know how to access them properly, things get quite easy.

Finally, I'm still curious what rstn means, so let's look at rstn as well. Running this command, you can see that the long name of rstn is "ratio of 0.05 degree grids with station" and its unit is percent. This probably won't be one of the variables we're most interested in for our analysis. That will mainly be tave, the daily mean temperature.

So I think you now have a basic idea of the structure of this file and the different variables stored in it. Before going to the next steps, I'll take a couple of minutes to comment this code, because I'm going to share this script
with you guys as well, so later, if you want to try it yourself, you can use it as a guide. I'll make it a bit more organized by adding some comments: this section displays the variable names, and this one accesses the variables. Here I'll include a few examples: we check longitude, latitude and time, and finally one more variable. There are five variables, but we're only interested in the first four, and the fourth was tave, the average temperature.

If you run this now, you just have to look carefully to tell one section from another. The first section refers to longitude: the long name is longitude, the units are degrees_east, and the current shape (the size of this variable) is 360. For latitude the units are degrees_north and the size is 280. For time the shape is 365, which tells you that this netCDF file contains information for 365 days, even though the units are minutes, not days. Finally we have the daily mean temperature, whose units are degrees Celsius and whose shape is (365, 280, 360).

Before moving on, let's talk about what that shape means. To look at the dimensions, you can use the variable name directly. For example, for longitude we can type lon.dimensions, which gives the dimensions of the longitude variable. Similarly, if you check latitude, it's still going
to be the same kind of thing: it depends on a single dimension. Time is similar to latitude and longitude as well. But now let's look at tave, the average temperature. Its dimensions are defined using time, latitude and longitude. So how should we interpret this?

If I run this command again, you can see the current shape is (365, 280, 360). If I type tave.dimensions, you can see that time refers to the 365, latitude to the 280 and longitude to the 360.

If you're still new to the structure of netCDF files, this might be a bit confusing. So I'm going to use a very simple diagram to explain how the average temperature data is stored as a multi-dimensional array. If I open Microsoft PowerPoint, I can draw a simple diagram that looks something like this.

Here we have latitude. According to our downloaded netCDF file, the size of the latitude variable is 280, which means we have 280 different lines, marked here in red. I've only drawn six lines because I can't draw 280, but you get the idea. Similarly, we have 360 longitude lines, which I'll color blue. These are the vertical lines, and the size of that variable is 360, so we have 360 different
vertical lines and 280 different latitude lines. Now, what about the time dimension? I make a copy of this grid, move it this way and again this way, and add labels like t = 1 and t = 4. You can see that along the time dimension we have 365 of these grids, so I'll put an arrow here and note that there are 365 layers, one for each day. As we saw, time starts on the first of January 1961, so these 365 days run all the way up to the 31st of December 1961.

That's how to visualize this multi-dimensional array, and that's how the data is stored in this netCDF file. It's quite an efficient way to store fairly complex data. I believe you now have a clear idea of how the data is structured in these files, so let's go to the next step and see how to access the actual data stored inside.

I create another heading here called "accessing the data from the variables". First, let's see how to access the data for time. I create another variable called time_data, equal to data.variables, and specify the variable I'm requesting, which is time. By adding [:] you tell the program to return all the available data for that variable. Then I write print(time_data). For the time being I deactivate the other print commands so that whatever gets printed won't be that confusing to us.
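Because the time values requested here are minute offsets from 1961-01-01 (per the units printed earlier), they can be turned into calendar dates directly. Below is a minimal NumPy sketch. It assumes the values are spaced 1440 minutes apart, one per day. The array is synthetic rather than read from the file, and `minutes_to_dates` is a hypothetical helper. The netCDF4 package also provides a `num2date` function for this kind of conversion.

```python
import numpy as np

MINUTES_PER_DAY = 24 * 60  # 1440


def minutes_to_dates(minutes, reference="1961-01-01"):
    """Convert 'minutes since <reference>' offsets to datetime64 calendar days."""
    offsets = np.asarray(minutes, dtype="int64").astype("timedelta64[m]")
    return (np.datetime64(reference, "m") + offsets).astype("datetime64[D]")


# Synthetic stand-in for data.variables['time'][:]: one value per day.
time_data = np.arange(365) * MINUTES_PER_DAY
dates = minutes_to_dates(time_data)
print(time_data[:3])      # minute offsets of the first three days
print(dates[:3], dates[-1])
```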
However, later I will reactivate those. All right, I'll run it again, and now you can see what the data itself looks like: we have 0, 1440, 2880 and so on. If you recall, the information for the time variable clearly stated that the data is stored in minutes. So this value is minute 0, and this one is minute 1440. If you calculate how many minutes there are in one day, it happens to be exactly 1440, and the next value is twice that. So 0 is the first day, 1440 the second day, 2880 the third day, and so on, and you might guess that the last value is the 365th day. That's how the time variable works.

Similarly, let's look at how the longitude or latitude variable works, starting with longitude. Here the variable should be lon, and the name here should match. Let's run this. After the time data it prints the longitude data, which starts at 60.125 and increases all the way up to 149.875. These are in decimal degrees, so I think you get the basic idea of how the different data are stored.

If I run this now, the first set refers to the time information and the next set to the longitude information. Finally, the latitude information starts at -14.875 decimal degrees and goes all the way up to 54.875. So I think we've pretty much covered all the
basics of how to read in netCDF files and access the data embedded inside them. In the next part of this tutorial, I'm going to show you how to extract the data as a time series and save it into a separate CSV file, which I think is what most of us are interested in.

To do that, I've selected one particular point in Asia. As you can see, I have a KMZ file, which Google Earth can read, placed in Kathmandu. Using the data in the netCDF file we downloaded, I'm going to generate a time series of average temperature over Kathmandu for the year 1961 and save it into a CSV file. If you want the exact coordinates of this location, go here and open the properties, where you can see the latitude and the longitude.

It's not obvious when we look at it like this, but these coordinates fall somewhere within the arrays we just printed. For example, my longitude is 85.32, so it should be somewhere here, between the 85.6 and 85.3 values. The latitude is 27.69, which should be somewhere here, between 27.8 and 27.6. We have very precise latitude and longitude values for our point, but they don't exactly match any of the values in the file. In a case like this, what we can do is find out which one is the
closest latitude and longitude in the data we have. Because this is gridded data, we can find the grid cell that our latitude and longitude fall into.

Since that was the introductory part of the tutorial, I'm going to leave that script as it is and create a new one. I think that will be much clearer for you while watching, and also if you're trying to replicate these exercises. So let's create a new module and name it extract_netcdf_into_a_time_series. It's a bit of a long name, but that should be fine.

Now let's go to this file and import the two libraries again. If I open the Variable Explorer, we still have some variables from the previous exercise. You can clear all of them with this "remove all variables" button, and you can also clear the console by typing clear. That gives you a clean workspace.

As in the previous step, I'm going to read the data from the same file, so I create the same variable as before. You might recall how we read the data with Dataset: first we specify the path, as we did here, and then we state that we are going to read the file. Let's run this.

Now I create a variable called lat: I go to data.variables, specify the name lat, and select all of its data with a colon. This step is the same as what we did before, but just to have a fresh start, I'm saving it again into a new
variable called lat, and I save the longitude data into a variable as well. Now let's run this. You can see two new variables, both of type masked array. If I open the latitude, its indices go from 0 all the way up to 279, which means 280 entries, as you'll remember. The longitude indices go from 0 all the way up to 359, which means 360 entries, and these values are the longitudes.

By the way, if you run into any problems while following along, please don't hesitate to ask. I always welcome constructive discussion, which might be helpful for other viewers as well. So if you have a question, comment it down below and I'll try my best to answer it as soon as possible.

Now I open the Google Earth file again, because I need the time series for this particular latitude and longitude, and I transfer those coordinates into my Python script. I create a new variable for the latitude of Kathmandu, which is this value, and one for the longitude, which is this. When I run this, you can see two new variables of type float, because they contain decimals.

Now you can imagine what we need to do: find the closest grid point. Taking latitude as an example, which value is closest to 27.697817? If I do it roughly by hand, it should be something around 27.69, so either this one or this one. But I don't want to do this manually, so I'm going to create another variable called distance of
latitude. Normally we calculate a distance by subtracting one value from another. So I subtract the latitude of Kathmandu from the lat variable, which subtracts 27.697817 from every entry and creates another array with the same dimensions. But if we only do that, we run into a problem: some differences are positive and some are negative. For example, these values here are negative and these are positive.

So instead of a plain subtraction, I take the squared difference. I rename this variable to something like "squared difference of latitude" and get it by simply squaring the difference. I do the same for longitude, so this should be lon and this should be lon as well.

Let's run this and see what we get. If I open the squared difference for latitude and look at the first entry, it has taken -14.875 minus 27.697817 and squared it, and that's the value we get here: about 1812.

I'll add a few comments here. This part reads in the netCDF file, these parts store the data into variables, and this part takes the squared difference. Now let's find the closest point. Since we already have NumPy handy, we can find the index of the minimum in the masked array of squared differences for latitude and
also for longitude quite easily, rather than scanning through it manually. You can see that the smallest squared difference should be somewhere around here. NumPy has a function called argmin that does this easily: as the documentation says, it returns the indices of the minimum values along an axis. So we can use it to retrieve the index of the minimum value.

I create another variable for the minimum index of latitude, which is np.argmin of this variable, and do the same for longitude. Let's run this and see what happens. The minimum index for latitude is 170 and the minimum index for longitude is 101. If I open the squared difference for latitude and scroll down the index column to 170, the value there is 0.0053. You can also do a quick check: take this variable and compute its minimum, and it gives exactly this number. So argmin gave us the index corresponding to that minimum value.

Now I create another variable called temp and, as before, access the variable called tave, the average temperature. If I run this and type print(temp), we get the information about this variable. It again has the current shape (365, 280, 360), corresponding to time, latitude and longitude. So if someone wants the data for the first day at a certain latitude entry and a certain longitude entry, they can
simply type temp here with three indices. The first index is the day, so for the first day the time index is 0. The second is the latitude index: we found that the latitude index for Kathmandu is 170, so we put 170 here. The third is the longitude index for Kathmandu, which is 101.

If I print this, it gives me the average temperature value, because temperature is stored in this variable and we access it by time, latitude and longitude. These latitude and longitude numbers are the indexes of the grid cell that the real coordinates fall into. Once we've found these two indexes, they don't change, because our point of interest is a single fixed point that doesn't move around. The only thing that changes is the day, which we again specify by its index. For example, index 4 is the fifth day, because indexing starts at zero, and its value is 9.92.

If you'd also like to print the units, you can do that at the same time by typing temp.units, and now it shows 9.92 degrees Celsius. Now let's try index 364: we still get a value, 7.81 degrees. But if you try index 365, you get an error, because we have only 365 entries and it's
starting from zero. So the highest index is 364, which corresponds to the 365th day. I think that must be clear to you by now.

Instead of printing these one by one, let's find a way to extract all this information into a time series. For that I'm going to use the pandas library. If you have been following my tutorials, you might know that I'm very fond of pandas, especially for this kind of task. I hope you have installed pandas already; if not, follow the link in the description below to learn how to install it. Assuming you have, I import pandas as pd.

First, I'm going to create an empty structure to store this data. We're dealing with a single file, which corresponds to 1961, so I know it will have 365 entries. Think of a table with two columns: the first shows the date and the second shows the temperature value. In pandas, that structure is called a DataFrame. So I'm going to create an empty pandas DataFrame and later fill it with the temperature values, which I'll extract one by one, fully automatically, using a for loop. Let's see how to do that.

First I'm going to create the date range. If I go back to data.variables and look at the time variable for a second, you can see that the units are minutes since the first of January 1961. I can access this line directly by adding .units, and it gives me this
particular string. What I want to do is avoid typing this manually. Right now we are doing this exercise for one file, but later, probably in the next part of the tutorial, I want to use all of these files, extract the data from all of them, and combine them into one CSV, or one continuous time series. If I were to do that, I don't want to keep typing the year one by one, so I'm looking for a way to extract it directly from the netCDF file itself. You can see that the year is already mentioned in this string, so I'm going to ignore everything up to this point, cut out just the date part from the netCDF file's information, and use that as my starting date. How can we do that? If I go back to the previous command and specify indexes: since this is a string, the indexes start at 0, so counting 0, 1, 2, and so on up to 14, at the 14th index I have the character 1. I can type 14 here, and if I keep counting, 15, 16, up to 23, and put 23 as the end, let's see what happens. You see that we managed to extract this part of the string, but without this last character. I have explained this before as well, but one more time: when slicing, the end index is exclusive, so you have to go one unit beyond the character at which you would like to stop. So instead of 23 I'm putting 24, and now you can see that we have successfully extracted the date directly from the netCDF file's information. I'm going to store this in another variable called starting date. So I have the starting date, and I also want the ending date. Now I know that my starting date here
would be the 1st of January 1961, and for another file it would be the 1st of January of 1960, 1970, or whatever that year is. But I know for a fact that for such a file, my ending date would be the 31st of December of that particular year. So instead of extracting this whole thing, I'm going to extract only the first part, the year. I can still use the same kind of syntax, but instead of going all the way up to 24, I'll stop earlier. Let's see what happens when you put 19: we get this additional - character, which we don't need, so I go one unit below, to 18, which gives 1961. Then I take this and manually add 1231 to it. Keep in mind that you can add strings to strings, which is why this works without any issue, but if I were to add a number here, it wouldn't work, because you cannot add integers or floats to a string. You can see that my ending date is now 19611231, the 31st of December 1961, which is exactly what I want, so this will be the ending date. Now I'm going to use a pandas function, pd.date_range, to create the date range. All I have to do is specify the start, which is the starting date, and the end, which is the ending date. If I run this command and type date range here, you can see that I have a set of dates which starts from the 1st of January 1961 and goes all the way up to the 31st of December 1961, with a length of 365. You can even open it directly from here, and you can see that the date range is from the 1st of January 1961 all the way up to the 31st
of December. Next, I'm going to create an empty data frame, a pandas DataFrame. The way to create it is pd.DataFrame. I'm going to put 0 as the value, because right now I'm not worried about the values; later we are going to replace them. There will be one column named Temperature, into which we will extract the values, and the index is going to be this date range. Now I can run this, and if I open the df DataFrame, this is how it looks. You might already be able to imagine that for each date I'm going to extract the value and fill it in here, where it is currently filled with zeros. I'll label this step as creating an empty data frame; it's not technically empty, but we will consider it empty even though it's filled with zeros. The next thing I want to do is create a NumPy array with the same length as the time variable. If I go to the time variable in data.variables again, you can see that it has a size of 365. But if we are doing this for multiple netCDF files, you cannot assume it will be 365 all the time, because there are leap years, in which the number of time steps is 366. I don't want to take the risk of manually typing 365, which is true for three successive years but not for the fourth one. So again, instead of typing this by hand, I'm using the information in the netCDF file itself and extracting the size of the time variable in each case. I'm going to create another variable called dt using np.arange, which creates an array with increments of one, starting from 0, and if I take this one and check the
size of this, you can see that it gives me 365, so I can use this command directly, and whenever it's 366 it will automatically recognize that. Would you like to see how it looks? It's a NumPy array; if I run this and open dt, it's basically a set of integers starting from 0 all the way up to 364, so it has 365 elements. We are almost at the end of the tutorial. Now we can write a simple for loop: for time_index in dt. time_index is just a variable name I'm choosing; if you would like to use a different name, you're very welcome to do so. Without going much further, if I print time_index inside the loop, what do you think it would do? It just prints out all the time indexes like this, which is not what I need, so I'm going to remove that. Before writing the actual command, let's go back to df, the empty data frame that we created. This is how df looks: this is the index column and this is the data column. If I select something by index location using iloc, this would be my 0th index, this is the first index, this is the second index, and so on. For example, if I say df.iloc[0], you will see that its temperature value is 0. It's checking this row and returning the columns available for the 0th index, and since we only have one data column, it returns that value. If I say df.iloc[1], it's still going to be the same value. So for each iteration, since time_index increases from 0, 1, 2, 3, 4 and so on, I'm going to write df.iloc and put this time
index here. Similar to what we did before, in the first iteration time_index is zero, so it becomes df.iloc[0]; in the second iteration time_index is one, so it becomes df.iloc[1]; and so on, all the way up to 364, because the dt NumPy array has 365 entries with a maximum of 364. With this statement we are referring to each and every entry of df, and I'm going to make each entry equal to what we did a few minutes back: temp[0, 170, 101], which printed the temperature for the first day at latitude index 170 and longitude index 101. In other words, we replace the existing zero value with this one. But instead of these fixed values, I'm going to put time_index as the first index and the latitude index we identified earlier as the second. Again, don't forget we are talking about indexes here, not the exact latitude values. Oh, sorry, and the last one should be the minimum index for the longitude. Now let's run this and see what happens. You can see it's still running, and now it's done. Let's have a look at the data frame df: you can see that it has been filled with the corresponding temperature data all the way from the 1st of January 1961 up to the 31st of December 1961. You can even open it directly from here. This is a successful extraction of data for a specific latitude and longitude of our choice. Now I can save it to a CSV file quite easily: you just call df.to_csv with a file name, which I'll name temperature Kathmandu with a .csv extension, and I'll label this step as saving the time series into a CSV. Now we can run this
one, and if we navigate back to the folder where we kept the data, you can see that the file is already there. I can open it directly, since it's a CSV file, and you can see that the data is there. You can also do a quick plot, and this is how it looks: around January the temperature fluctuates from around 5 to about 11 degrees, during June and July it goes up, and during November and December it comes down again. That's basically the annual cycle of temperature in Kathmandu, which we extracted directly from the netCDF files. This concludes the tutorial for today. In the next part of the tutorial I will explain how to combine the data from the other netCDF files, either into the same CSV or into the same time series (it's basically the same concept), and extend this time series over the whole period for which data is available. I hope you enjoyed this tutorial. There were many requests for this particular part, extracting data from netCDF files into a time series and saving it into a CSV file, so I hope the tutorial was clear for you. If you have any questions, please comment them below, because, as I said, I would like to have a constructive discussion in the comment section, and I would be more than happy to clarify any queries for you. I'll see you in the next one.
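Putting the steps from the video together, the workflow can be sketched as follows. This is a minimal sketch under stated assumptions, not the exact code from the video: a random NumPy array stands in for the netCDF temperature variable so it runs without the data files, the function name extract_point_series is hypothetical, the grid indexes 7 and 3 stand in for the 170 and 101 found for Kathmandu, and the units string is assumed to begin with "minutes since 1961-01-01", which matches the character positions counted in the video.

```python
import numpy as np
import pandas as pd


def extract_point_series(temp, units, lat_idx, lon_idx):
    """Copy one grid cell's daily values into a DataFrame indexed by date.

    Hypothetical helper. Arguments:
    temp    -- 3-D array ordered (time, lat, lon); with netCDF4 this could be
               the temperature variable from data.variables, which supports
               .shape and integer indexing
    units   -- the time variable's units string, ASSUMED to start with
               "minutes since YYYY-MM-DD" so the date begins at character 14
    lat_idx, lon_idx -- grid indexes of the point, not coordinates
    """
    starting_date = units[14:24]          # "YYYY-MM-DD"; the end index is exclusive
    ending_date = units[14:18] + "1231"   # year + "1231", e.g. "19611231" (31 December)
    date_range = pd.date_range(start=starting_date, end=ending_date)

    # "Empty" frame filled with 0.0 (a float, so later float assignments keep the dtype)
    df = pd.DataFrame(0.0, columns=["Temperature"], index=date_range)

    # 0 .. n_time - 1, sized from the data so leap years (366 steps) are handled
    dt = np.arange(0, temp.shape[0])
    for time_index in dt:
        df.iloc[time_index] = temp[time_index, lat_idx, lon_idx]
    return df


# Synthetic stand-in for the netCDF variable: 365 days on a small 10 x 10 grid
rng = np.random.default_rng(0)
temp = rng.normal(loc=15.0, scale=5.0, size=(365, 10, 10))
units = "minutes since 1961-01-01 00:00:00"  # assumed full format
lat_idx, lon_idx = 7, 3                       # hypothetical; the video used 170 and 101

# Indexes are zero-based: 364 is the last day, 365 is out of range
print(temp[364, lat_idx, lon_idx])
try:
    temp[365, lat_idx, lon_idx]
except IndexError as err:
    print("IndexError:", err)

df = extract_point_series(temp, units, lat_idx, lon_idx)
print(df.head())

# The loop is equivalent to slicing the whole time axis in one step
assert np.allclose(df["Temperature"].to_numpy(), temp[:, lat_idx, lon_idx])

df.to_csv("temperature_kathmandu.csv")  # hypothetical output file name
```

With a real file, you would open it with netCDF4.Dataset, read the units from the time variable, and pass the temperature variable in place of the synthetic array. The fixed positions 14, 18, and 24 only work while the units string keeps this exact layout and each file covers one calendar year; if the date range and the time dimension ever differ in length, the iloc assignment will fail, so parsing the units with a date-conversion helper such as num2date (provided by netCDF4/cftime) would be a more robust alternative.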
Info
Channel: GeoDelta Labs
Views: 25,421
Keywords: CHIRPS, Gridded, Satellite, Rainfall, Data, GIS, Python, how to, convert, rasterio, ArcGIS, QGIS, numpy, pandas, Os, DEM, Raster, vector, APHRODITE, APHRODITES, APHRODITE'S, Japan, monsoon, asia, netCDF, multidimensional, data, xarray
Id: hrm5RmsVXo0
Length: 69min 50sec (4190 seconds)
Published: Sat Feb 29 2020