Getting BIG DATA by unlocking the power of CDSAPI - You will go loopy!

Captions
[Music] Hello, and welcome back to Climate Unboxed. Now, if you search on YouTube you'll find a great Michael McIntyre sketch about hotel buffets. You might normally have only an egg for breakfast, but faced with a hotel buffet it's just too tempting to take the sausage, the fruit, the bowls of cereal and everything under the sun. We find that human nature operates the same way when it comes to getting ERA5 reanalysis and other data: we tend to get a little bit greedy, and I've been practicing my manic evil laugh [Music]. Greed for data!

So today I want to show you where the real strength of the API lies, and how we can write simple loops in Python to automate downloads. Anybody who's already an expert in Python will probably want to skip on to my next video; this one is really aimed at people with limited or no background in Python at all.

Behind me you'll see the browser for the ERA5 hourly data, in this case on single levels, which we saw in the first video on ERA5 downloads. A typical workflow might be the following. When we first log on, we decide to download a variable such as temperature, so we click on temperature. But then we say, well, while I'm here, why don't I get the dew point temperature, and maybe total precipitation as well, since we're here anyway. We scroll up to the next section and say, well, we need data for a couple of years, five years, ten years... well, while we're here, let's just take all the years, all the months in the year and all the days in the month. And if we want to calculate the diurnal cycle or daily averages, we need all the hours in the day as well. After making those selections we scroll down, select NetCDF as the format, and we're all ready to press the download button. So we scroll up and, oh, we have an error. The request, it says, is too large: we're trying to get over a
million fields, and the limit is currently set to 120,000. So what do we do? We need to reduce the selection, it says, so we'll scroll back up. We could do this in a number of ways. We're roughly 10 times over the limit, so it would be enough to select just one time, one day or one month, but I'm going to show an example where I select just one year. I'll click on Clear all, then click on 2020 to select only the data for 2020. Now when we scroll back to the bottom, all the buttons are green. Fantastic! If we wanted the 2020 data, we would just click on Submit Form and it would launch the retrieval.

Well, that's fine, but we don't just want 2020, do we? We want 2019, 2018 and 2017 too. So we'd have to scroll back up, change the selection to 2019, unselect 2020, scroll back down, press the green button and download the data. It gets tedious very, very quickly, and this is where the strength of the API lies.

I'm going to click on the API request button to show this example. This gives us the little piece of code that I introduced in my earlier ERA5 video. I simply highlight the code, press Ctrl+C and minimize the window. Then I go to my desktop and edit a text file, using Emacs in this case (remember, you can use any text editor you like). I'm going to call it get_data... no, let's call it get_era5.py. That launches an editing window, and if I switch to it, here we have a blank editing window. I paste the code into it, and this is the code we had from the CDS website.

Now, I could just save this and run it, just like I showed you in my earlier video. But at the moment it only downloads the data for 2019, because that was my last selection. So we need to make three modifications to set up a loop over all of the years. So the first thing we're going to do is
set up a loop over years. So it's now time for the Climate Unboxed short guide to loops in Python. [Music]

Oh, hi there! Most programming languages have a loop structure rather like the following: you have a statement marking the start of the loop, saying how many times you want to repeat the code; then you have a block of code that you want to loop over; and then you need some kind of statement that says "here is the end of the loop". For example, in Fortran you could write a simple loop like this, where i is a variable that loops over the numbers one to five, followed by a statement print "hello world", which is repeated five times, and then an end do statement marking the end of the loop.

Now, if you look at both of these examples, you can see one thing that's perhaps not that nice: it's quite difficult to see where the loop starts and where it finishes. So for a long time, programming etiquette has been to move the code inside the loop to the right, indenting it with a tab or a couple of spaces, so that you can clearly see the start and the end of the loop. The problem is that human nature is to be a little bit lazy, so you usually find that people forget to indent, or only indent part of the loop, so the code moves raggedly from right to left and it's a bit of a mess.

So Python does something quite clever: it gets rid of that last statement, the end of the loop, and forces you to indent. The indentation itself specifies which code is inside the loop. You can indent by any number of spaces you choose; here we have two, but in fact the standard is four spaces, and most editors build that in. And here we have an example with nested loops: if you simply add another colon and indent more code further to the right, then that
last line would be in a nested loop, repeated within both of those loop structures.

So, back in our script, the loop is written in the following way. We type for year in, followed by a list of years, which we can generate very simply using the range function. We give range two arguments, the start and the end year, so I type 1979 and 2021. Note an idiosyncrasy of Python's range function: it gives you a sequence of integers with a stride of 1 (1979, 1980, ...), but it doesn't include the last number. So this gives me the years from 1979 up to 2020; if you want 2021, you'll need to end it with 2022. Now we need a colon, and I press Enter.

If you notice, something strange happened when I pressed Enter: the cursor didn't return to the beginning of the line, it went to a position indented by four spaces, because the indentation specifies what is inside the loop. So the retrieve command needs to be indented. I press Backspace, and you can see the c.retrieve command is now indented four spaces to the right. It has to be moved this way in order to be considered part of the loop. If I left it at the left margin, it would not be part of the loop at all; in fact I would get an error, because the loop needs at least one indented statement, something like print(year). Then the year would be printed out each time round the loop, but the retrieve command would only run once. So let's indent it again, four spaces to the right, so that it becomes part of the loop. Great.

There are two more steps that we need to make. The first is quite obvious: where we specify the year, we need to replace it with the variable, which we've called year. You'll notice that the arguments are passed to the retrieve function as a dictionary, and each member of that dictionary is passed as a string or a list of strings. So if we were to just type in the variable year here, we'd
actually get an error, because year from the range function is an integer: it's a number, not a string. We therefore need to convert it to a string using the str function, with the year in round brackets: str(year). This passes each year in turn as a string, each time the loop runs. So that's great: we would now get the data for each of the years in the loop. But there's still a problem. I wonder if any of you can spot what it might be? You can pause the video for a second to consider it.

OK, for those of you who spotted it, well done, you're well on your way to being Python masters; and if you didn't, it doesn't matter. If we scroll down to the bottom, we have download.nc as the output file. This means that in the first iteration we would get the 1979 data and put it into download.nc; in the second iteration, for 1980, it would get the data but store it under the same file name, and rather than appending, it would simply overwrite it. So at the end of the program we'd end up with just the data for 2020. What we need to do is modify this so that each time the loop runs we have a different file name. It's easy to forget this step; it's happened to me many times.

Download is a bit of a boring name anyway, isn't it, so we will change it to era5_. Note that this is inside the quotes, so if I typed year here it would simply put the letters y-e-a-r in the file name. Instead I want to concatenate strings: to this string I add another string containing the value of the year. Remember again that year is a number, so we still need the str function to turn it into a string, otherwise we'd get an error. Then I add on the last piece of the string, which is .nc. So now each iteration has a different file name with the year number inside. I'm going to save this, and just to emphasize
the three things we needed to change: first, we set up the loop over years and indented the retrieve command so that it sits inside the loop; second, we converted the loop variable from a number to a string; and third, we changed the file name.

OK, so I'm going to save this, go back to my desktop over here, and try to run the script. Off it goes... oops, I've tried to use Python 2, which doesn't work. Let's try again with Python 3, and we can see the request is now submitted and queuing in the system.

While that's retrieving the data, I want to go back to the Emacs window and show you one or two more things. We've looped over one of these variables, but there's no reason why you can't loop over the others. For example, you might want to split the data by both year and month, so we could set up another loop, giving nested loops, by doing the same thing: for mon in range, with 1 to 12... or 1 to 13, I should say, since range excludes the end value. But we also have another possibility: we could very easily cut and paste this list of strings, which is already predefined, and iterate over that list directly. So we write for mon in and paste that list. We need the colon; don't forget the colon! Now, of course, we need to indent once more, so that the year loop is inside the month loop and the retrieve command is inside both of these nested loops. mon will then take each of the string values in this list in turn, and for each one we loop over the years. We then need to put mon into the month option of the request; notice that I don't need to convert it to a string, because the list already contains strings. And we mustn't forget to add it to the file name as well, like this: we add an underscore and then the month.

You could do this for any of these variables. Obviously you don't want to make each retrieval too small, though, because then you're constantly going back to the server and queuing there.
So use the web interface to get a rough idea of the retrieval size, aiming for something close to the maximum but not too small. You might, however, also want to consider looping over the variables, so that each variable goes into a separate file. Again, you would just take this list of strings for the variable names and write for var in this list.

So that way I've shown you two different methodologies for looping over the data. We saw how easy it is to use a loop within a Python script with the CDS API to set up a retrieval for very large amounts of data, much larger than you can get using the web interface. To really take advantage of the strengths of the API, set it running and go away for a couple of hours, or maybe a day, depending on the size, and you will come back to a whole list of files in your directory; in the case we showed today, each file holds the data for one particular year. I hope you found that useful and interesting, and I look forward to seeing you again on Climate Unboxed.
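The "request too large" error in the walkthrough comes down to simple multiplication. The sketch below is only a rough estimate, assuming one field per variable per hourly time step and counting only real calendar days; the CDS's own cost calculation may count selections differently. The three variables and the 1979 to 2020 period are illustrative, and 120,000 is the limit quoted in the video at the time of recording.

```python
from datetime import date

# Per-request limit quoted in the video (mid-2021); check the CDS for the current value.
FIELD_LIMIT = 120_000


def estimate_fields(n_variables: int, first_year: int, last_year: int) -> int:
    """Rough field count for hourly single-level data: one field per variable per hour.

    Assumption: only real calendar days are counted (leap years included).
    """
    n_days = (date(last_year + 1, 1, 1) - date(first_year, 1, 1)).days
    return n_variables * n_days * 24


for first, last in [(1979, 2020), (2020, 2020)]:
    n = estimate_fields(3, first, last)
    verdict = "too large" if n > FIELD_LIMIT else "within the limit"
    print(f"{first}-{last}: {n:,} fields ({verdict})")
# prints:
# 1979-2020: 1,104,552 fields (too large)
# 2020-2020: 26,352 fields (within the limit)
```

All years together come out at roughly nine times the limit, in line with the "roughly 10 times" mentioned in the video, while a single year fits comfortably.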
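For readers new to Python, the loop ideas from the short guide can be tried without touching the CDS at all. This small sketch mirrors the Fortran "hello world" loop described in the video, shows that range() stops one short of its end value, and adds a nested loop; the variable names are purely illustrative.

```python
# Fortran version described in the video:
#   do i = 1, 5
#      print *, 'hello world'
#   end do
# In Python the colon opens the loop and the indentation (4 spaces) marks its body;
# there is no "end do": the loop ends where the indentation does.
for i in range(1, 6):            # gives 1, 2, 3, 4, 5 (the end value 6 is excluded)
    print("hello world")

years = list(range(1979, 2021))  # 1979 ... 2020; use range(1979, 2022) to include 2021
print(years[0], years[-1], len(years))  # prints: 1979 2020 42

# Nested loops: the inner body is indented twice, so it runs for every (year, month) pair.
pairs = []
for year in range(2019, 2021):
    for month in range(1, 13):   # 1 ... 12, again because 13 is excluded
        pairs.append((year, month))
print(len(pairs))                # prints: 24
```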
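The three changes made to the pasted API request (a loop over years with the retrieve call indented inside it, str(year) in the request, and a per-year file name) can be sketched as follows. This is a hedged sketch rather than the script from the video: the request keys and the variable names are assumptions based on what the CDS form generated for ERA5 single levels around 2021, so paste your own API request and keep its keys. Building the list of requests first and calling the API in a separate function is a design choice of this sketch; it lets you check every file name before anything is queued.

```python
DATASET = "reanalysis-era5-single-levels"
# Assumed CDS names for 2 m temperature, 2 m dew point temperature and total precipitation.
VARIABLES = ["2m_temperature", "2m_dewpoint_temperature", "total_precipitation"]


def base_request() -> dict:
    """Every month, day and hour; only the year is filled in by the loop."""
    return {
        "product_type": "reanalysis",
        "format": "netcdf",
        "variable": VARIABLES,
        "month": [f"{m:02d}" for m in range(1, 13)],
        "day": [f"{d:02d}" for d in range(1, 32)],
        "time": [f"{h:02d}:00" for h in range(24)],
    }


def plan_yearly(first_year: int, last_year: int) -> list:
    """Return (request, target_file) pairs, one per year, last_year included."""
    plan = []
    for year in range(first_year, last_year + 1):  # +1 because range() excludes its end
        request = base_request()
        request["year"] = str(year)                 # request values are strings, not ints
        target = "era5_" + str(year) + ".nc"        # a new file per year, so nothing is overwritten
        plan.append((request, target))
    return plan


def run(plan: list) -> None:
    """Submit each request in turn (needs the cdsapi package and a configured ~/.cdsapirc)."""
    import cdsapi                                   # imported here so the planning runs without it
    c = cdsapi.Client()
    for request, target in plan:
        c.retrieve(DATASET, request, target)


for request, target in plan_yearly(1979, 1981):
    print(target, request["year"])
```

Once the printed file names look right, the actual downloads would be started with run(plan_yearly(1979, 2020)), one queued request per year.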
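The nested version described near the end of the video loops over a pasted list of month strings on the outside and over years on the inside, adding the month to both the request and the file name. This is a minimal sketch that only builds the requests; the request keys, the variable names and the era5_<year>_<month>.nc naming are assumptions, not taken from the video's script.

```python
# Month list pasted as strings, as in the CDS request, so no str() conversion is needed.
MONTHS = ["01", "02", "03", "04", "05", "06",
          "07", "08", "09", "10", "11", "12"]
# Assumed variable names; copy the ones from your own API request.
VARIABLES = ["2m_temperature", "2m_dewpoint_temperature", "total_precipitation"]


def plan_monthly(years) -> list:
    """Return (request, target_file) pairs, one per month per year."""
    plan = []
    for mon in MONTHS:                  # outer loop: months
        for year in years:              # inner loop: years, indented once more
            request = {
                "product_type": "reanalysis",
                "format": "netcdf",
                "variable": VARIABLES,
                "year": str(year),      # year comes from range(), so convert it
                "month": mon,           # already a string
                "day": [f"{d:02d}" for d in range(1, 32)],
                "time": [f"{h:02d}:00" for h in range(24)],
            }
            target = "era5_" + str(year) + "_" + mon + ".nc"
            plan.append((request, target))
    return plan


plan = plan_monthly(range(2019, 2021))
print(len(plan), plan[0][1], plan[1][1])  # prints: 24 era5_2019_01.nc era5_2020_01.nc
```

Each pair could then be passed to cdsapi.Client().retrieve("reanalysis-era5-single-levels", request, target) inside a loop. Because months are the outer loop, all the Januaries are requested first.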
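Looping over variables, so that each variable lands in its own file, follows the same pattern: iterate over the list of variable names and put the name into both the request and the file name. In this sketch the variable loop is combined with a year loop; the variable names, the request keys and the era5_<variable>_<year>.nc naming are illustrative assumptions, and the API call is left out so the plan can be inspected first.

```python
# Assumed CDS variable names; paste the list from your own API request.
VARIABLES = ["2m_temperature", "2m_dewpoint_temperature", "total_precipitation"]


def plan_by_variable(years) -> list:
    """Return (request, target_file) pairs, one file per variable per year."""
    plan = []
    for var in VARIABLES:
        for year in years:
            request = {
                "product_type": "reanalysis",
                "format": "netcdf",
                "variable": var,        # a single variable per request
                "year": str(year),
                "month": [f"{m:02d}" for m in range(1, 13)],
                "day": [f"{d:02d}" for d in range(1, 32)],
                "time": [f"{h:02d}:00" for h in range(24)],
            }
            plan.append((request, "era5_" + var + "_" + str(year) + ".nc"))
    return plan


for request, target in plan_by_variable([2020]):
    print(target)
```

Splitting by variable as well as by year makes each request smaller, which means more requests waiting in the queue; that is the trade-off the video warns about when it suggests staying close to, but under, the size limit.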
Info
Channel: Climate Unboxed
Views: 417
Keywords: cdsapi, era5, climate, weather, python, cmip5, cmip6, Copernicus, climate data store
Id: H47VBPeUGQo
Length: 16min 33sec (993 seconds)
Published: Thu Jul 15 2021