Regridding and interpolation - an essential life skill!

Video Statistics and Information

Captions
[Music] Hello and welcome to Climate Unboxed. What I want to talk about today is regridding data. When you're working with climate data, this is a task that sooner or later you will have to face. Let's say we have a data set that might be at very low resolution, and we want to compare it to another one which is at higher resolution, with finer pixels and smaller grid box sizes, maybe a data set like this. Now let's imagine that we have these two data sets and we want to make a comparison between them. Maybe one is a model data set and one is an observation, but they're on different grid box sizes. The problem is that it is very difficult to work out differences or combine these files if they are not on the same grid. It's basically like trying to compare apples and oranges, and as we know, that's a very difficult task. So today I'm going to show you how to do this using Climate Data Operators (CDO). So let's get started, shall we?

There are a number of options available to us when it comes to interpolating or remapping grids, and I'm going to consider these in turn. We have to consider two cases: one is where we're going from a large grid size to a small one, and the second is where we're going from a fine grid to a coarser grid. To make things simple, I'm going to consider one dimension first, which I've labeled here x. In the x dimension we imagine a set of grid boxes, one by one, on which we're going to define a function called f. f could be humidity, temperature or precipitation; it doesn't matter, but we have some variable that is defined in each of these grid boxes, marked by the yellow dots. The horizontal lines are just to remind us that the value is representative across those grid boxes, shown at the bottom of this diagram. In the first case we are interpolating from a coarse grid to a fine grid, where the fine grid is represented by each of these red dots. So the first interpolation method I'm going
to discuss is nearest neighbor. If we take one of these points as an example, we map it to the nearest grid box value from the original data set. We have another example here, and so we can remap all of these points onto the value of the nearest neighbor. What we can see straight away is that when we use nearest neighbor interpolation, even though we're remapping to a finer grid, the data will retain its original coarse-looking structure.

A second interpolation method is known as linear interpolation. With linear interpolation, rather than just taking the nearest value, we make a linear fit between neighboring points, shown by these red lines. So we simply draw a straight line between each of the points in the original grid, and our new fine grid is interpolated to the value that is represented by these straight lines. We get a smoother transition from one point to the next. However, if you look closely, you can see that because the new grid is slightly offset compared to the old grid, we will actually be reducing the spatial variability. There will be a little bit of smoothing because we miss the extremes of the original data set.

The third method is called cubic interpolation. Rather than just fitting a straight line between neighboring points, cubic interpolation takes four points from the original data set and fits a cubic (third-order) polynomial through those points, so that we have a smoother interpolation when we do the remapping of the points.

Now that we've understood these three basic interpolation methodologies, we can extend them to the two-dimensional example. The blue lines here represent the original grid cells. Let's imagine again that we're remapping to a modified grid which is much finer, shown here by the yellow grid boxes. The nearest neighbor interpolation methodology is simply going to remap the value to all of the grid boxes which are found in its vicinity. So we can see that these two examples of the new grid will take the values from the nearest
point of the coarse original grid. As I've shown here, this is fine, but if the new grid is offset with respect to the original grid, notice that this offset will also be apparent in the new interpolated data set.

If we take the second example, bilinear interpolation, the new grid value is taken from the four surrounding points, and remember that a point that is close by will have a much greater weighting than a point that is far away. So it's just the same as we saw before, but in two dimensions. This goes right through the grid, where every point in the new grid takes its value from a weighted average of the four surrounding points in the original grid.

If we look at those three different examples, this is how the nearest neighbor interpolation will look. This is just for a random field. It doesn't matter how fine the new interpolated grid is, it will still look exactly like this. Then as we go to the linear interpolation, we can see that bilinear interpolation makes the interpolated field much smoother, and then we go to bicubic interpolation, where the field is smoother still. What you actually want to use in this case is really up to you. If you want to make sure that you keep the structure apparent from the original grid, then you want to use nearest neighbor. On the other hand, sometimes you will prefer to have a smoother interpolated field, in which case you might want to use bilinear or bicubic interpolation.

Now it's important to emphasize that when you go from a coarse resolution grid to a fine resolution grid, you are not adding any information. You are simply making an interpolation. This means that if you are comparing a coarse resolution and a fine resolution data set, it might make more sense to go in the other direction, where we compare the two data sets at the coarser resolution. In other words, what we want to discuss now is the case where we upscale data. Here we have the example again of the original grid in blue boxes, but now we want to
actually interpolate this to a coarser resolution grid. This has the advantage that the final data sets you're comparing will be much smaller in size, but there are problems that you have to look out for, which I want to highlight now.

We could just use one of these three interpolation methods. The first was nearest neighbor, in which case each of the points in the new grid takes the value from the nearest point in the original grid. One thing we see immediately is that there's a bit of a problem: the boxes I've shaded blue here will not have their values used in the interpolation. That is to say, the information in those grid boxes in the original file will be lost. This will be especially a problem if we're interpolating a field which is very heterogeneous, that is to say, one that changes very rapidly in space, for example precipitation.

If we use bilinear interpolation for this example, this wouldn't be a problem, because with bilinear interpolation, as well as the nearest point, we also use the values from the other three surrounding points, so all of the boxes within the domain will actually be used. So that seems great: we're no longer subsampling the original data set. Unfortunately, however, we still have limitations. Now I'm choosing to interpolate to a grid that's even coarser. If we look closely, there are boxes being missed out. So you can see that bilinear interpolation, by using the two surrounding points in each of the x and y (longitude and latitude) directions, improves the situation, but if the new grid size is much larger, we have missing data. Of course, if we use bicubic, that uses the four surrounding points in both directions, and that improves things even further.

What this effectively means is that we have a limitation on how much we can coarse-grain data reliably using these methods. As a rule of thumb, I usually suggest a one-two-three limitation: with nearest neighbor, you're limited to the new grid being effectively
the same size as, or finer than, the original grid; with bilinear interpolation you can go to two times the original grid size; and with bicubic, three or maybe even four times the original grid size.

That means we have a limitation. How can we get around this? Well, there's another methodology available to us, and that's called conservative interpolation. Conservative remapping effectively looks at all of the grid boxes from the original data set that lie inside the new grid box, and it takes their values weighted according to the proportion of each grid box that lies inside the new grid box. All of the points within the original grid are used, and the spatial average of the new interpolated field will be identical to that of the original field.

Great, so now we understand four major methods used to remap data: nearest neighbor, bilinear, bicubic, and conservative remapping, either first or second order. But how can we actually do the remapping? I'm going to show you that now using Climate Data Operators. The command is actually very simple. We use cdo, then we specify the remapping method, then we specify the grid that we're going to remap to, then we have the input file, which is the file we're going to remap, and the output file is the name of the new remapped file.

First let's discuss the method. We've seen the methodologies available to us: they are nearest neighbor (nn), bilinear interpolation (bil), bicubic (bic), and conservative remapping (con), with con2 representing second-order conservative remapping. We just need to substitute this string for "method" in the command we see at the top of the screen. So, for example, if we were going to use bilinear interpolation, we would use cdo remapbil, all one word with no spaces: remapbil, to specify that we want bilinear interpolation. It's important to note that there are no spaces between the command and the grid specification.

So how do we actually specify the grid resolution? CDO has three methodologies, which
I'm going to go through in turn. The first methodology is to specify a global latitude-longitude grid using the following string format. First of all we have an r, which is for a regular longitude-latitude grid. There are other options available to you, but the regular lat-lon grid is the easiest to work with and is usually our go-to grid. The 360 that comes next is the number of longitude points. Remember that with CDO, as we saw before, longitude always comes first: "I long to be first". Then we have an x, and the second number, of course, is the number of latitude points. Here we are specifying a grid resolution of one by one degree: we have 360 longitude points and 180 latitude points.

But what if we don't actually want to remap to a global grid? Maybe we only want to remap to a portion of the globe, such as this area over Europe, for example. We can do that very easily as well, by specifying the remapped area and resolution in a text file, a grid descriptor file. To do that, we open a text file using a text editor, which I'm going to call here, for example, gridspec.txt. You can give it any name you want, and in that file we specify the details of the new grid. Again, if we want a regular longitude-latitude grid, we specify gridtype = lonlat, and now we simply need to specify the area of the grid. The first thing we need to do is specify the longitude point at the left-hand side of the area we're remapping to, with xfirst. In this case I've put -10.
We then need to specify the resolution, also in degrees, using xinc for the increment in the x direction. In this case I've got a half-degree grid specified. Then we have to specify the number of points with the xsize keyword. Be careful here, because it would be easy to assume that xsize is also in degrees longitude, but it's not: it's the number of points. So in this case, with 120 points and a half-degree resolution, my box size will be 60 degrees. Once we've done that for the longitude, we also need to do it for the latitude, using the equivalent keywords yfirst, yinc and ysize. It's worth mentioning that the order of these keywords is not important.

The third methodology is perhaps the easiest of all to use, and that's when we are comparing two files, A and B, of two different resolutions: we can simply ask CDO to map one of the files to the other. The easiest way to show this is to give an example directly using CDO. In this case we're using bilinear interpolation. The first file name, target.nc, is the NetCDF file which has the grid that we want to remap to. The only thing we're using from target.nc is the grid specification in the header, the longitude and latitude points; none of the data from that file will be used. input.nc is the NetCDF file that we actually want to interpolate, and last of all, output.nc is the final output with the interpolated data. So the data from input.nc is projected onto the grid of target.nc using bilinear interpolation.

So let's sum up by giving three examples of CDO remapping, with three different grid specification methodologies and three different techniques for actually carrying out the interpolation. The first is the global regular grid, the second is the specific area remapping using a grid file, and the last is using a target NetCDF file. I have a directory here; if I type ls, you can see I have one file, which is t2m.nc. Remember, if we want to look at the
header, we can do ncdump -h. You can see that this file contains just one time slice, to make the example very quick. It has 1,440 longitude points and 721 latitude points, and the data is t2m, the temperature from the ERA5 database. Now I want to give an example where we remap the quarter-degree grid to a one-degree global grid, and if we recall, that's a four times coarser grid, so we don't want to use nearest neighbor or bilinear interpolation. So I'm going to use remapcon for conservative remapping, and we're going to map to the example with one degree, so we want r360x180, t2m [Music], and then we need an output name, so I'm just going to call it t2m one degree. It's calculating the weightings to do the remapping, and it has processed the file. Now if I do ls -l, the new file is quite a lot smaller, and we can look at it using ncview. We still have a global file, but if we look at the header of this file, we can see that it has 360 longitude points and 180 latitude points, and note that in the history here we have the remapcon command.

The next remapping technique, if you remember, is using a text file to describe the region and resolution. I'm going to open a file now, which I'm going to call grid.txt, and I'm using Emacs, but you can use any text editor on your laptop. The first keyword is gridtype equals [Music]. Now I'm going to save this. To do the remapping, you need to decide on an interpolation methodology. For this I'm using a half-degree grid, so I'm going to use bilinear interpolation this time, and then we have the grid description file grid.txt, and then the input file and the output file, which I'm going to just call america, like this. We see it has calculated the weights and interpolated to this grid of 100 by 150 points. ncview t2m america, and you can see I've not been very careful with how I've described the grid, and I've managed to chop off a little bit of Brazil. So if anyone's watching from Brazil, I
apologize for that. So let's move quickly on to the third methodology. The third way that we can remap is to remap directly to the grid specified in a second file. To do that we do cdo remap, and I'm going to use bilinear interpolation, and then we specify the name of the file with the grid, the input file, and the output file, which I'm just going to call underscore test. We can see that the interpolation has been completed, and now when I view the file, once again we have the output for the same region in South America. If you want to double-check that the resolution is correct, remember we can use ncdump and then -v to output a variable, and I'm going to write lon to look at the longitude points, t2m underscore test, and it prints out the longitude values, which we can see start at -90 as we asked for and go to -40.5, and the difference between adjacent longitude points is half a degree, just as specified. So we've successfully remapped to the new grid.

So there we have it. I've introduced the four main methodologies for remapping data, and I've shown you how you can very quickly and easily do the remapping with CDO, using either a global grid specification, a limited-area grid specification using a text file, or simply remapping one file to another. So now you're in a position to compare your apples and your oranges. I hope it's been very useful, and please don't forget to subscribe if that's the case. I'll see you soon on another episode of Climate Unboxed.
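The one-dimensional picture from the start of the video (yellow dots on a coarse grid, red dots on an offset fine grid) can be reproduced with a few lines of Python. This is only a minimal sketch, not what CDO does internally: the function names, grid positions and values are all made up for illustration. It shows nearest neighbor keeping the coarse, blocky values, and linear interpolation smoothing the field so that the original maximum is missed.

```python
def nearest_neighbor(x_src, f_src, x_new):
    """Give each new point the value of the closest source point."""
    out = []
    for x in x_new:
        i = min(range(len(x_src)), key=lambda k: abs(x_src[k] - x))
        out.append(f_src[i])
    return out


def linear(x_src, f_src, x_new):
    """Interpolate along the straight line joining the two bracketing source points."""
    out = []
    for x in x_new:
        # Find the source interval containing x. The index is clamped, so
        # points outside the source range would be extrapolated linearly.
        i = 0
        while i < len(x_src) - 2 and x > x_src[i + 1]:
            i += 1
        w = (x - x_src[i]) / (x_src[i + 1] - x_src[i])
        out.append((1 - w) * f_src[i] + w * f_src[i + 1])
    return out


# Hypothetical coarse grid: box centres one unit apart, with a peak at x = 1.
x_coarse = [0.0, 1.0, 2.0, 3.0]
f_coarse = [10.0, 20.0, 10.0, 0.0]

# Hypothetical fine grid, offset so that no fine point sits exactly on the peak.
x_fine = [0.25, 0.75, 1.25, 1.75, 2.25, 2.75]

f_nn = nearest_neighbor(x_coarse, f_coarse, x_fine)
f_lin = linear(x_coarse, f_coarse, x_fine)

print("nearest neighbor:", f_nn)
print("linear:          ", f_lin)
print("original max:", max(f_coarse), "linear max:", max(f_lin))
```

With these made-up numbers the nearest neighbor result repeats the coarse values in blocks, while the linear result varies smoothly but never reaches the original peak of 20, which is the small loss of spatial variability mentioned in the video.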
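The difference between nearest neighbor and conservative remapping when going to a coarser grid can also be sketched in one dimension. The code below is an illustrative assumption, not CDO's algorithm: it works on a flat 1D axis with hypothetical box edges and values, and it ignores the fact that on the sphere grid box areas change with latitude. It shows that nearest neighbor subsamples the field and can miss a sharp feature, whereas the overlap-weighted average uses every source box and preserves the mean.

```python
def conservative_1d(src_edges, f_src, dst_edges):
    """First-order conservative remap: overlap-weighted mean of the source boxes.

    Assumes every destination box overlaps at least one source box.
    """
    out = []
    for j in range(len(dst_edges) - 1):
        lo, hi = dst_edges[j], dst_edges[j + 1]
        total = 0.0
        covered = 0.0
        for i in range(len(src_edges) - 1):
            # Length of the source box that lies inside the destination box.
            overlap = min(hi, src_edges[i + 1]) - max(lo, src_edges[i])
            if overlap > 0:
                total += overlap * f_src[i]
                covered += overlap
        out.append(total / covered)
    return out


def nearest_1d(src_edges, f_src, dst_edges):
    """Each destination box takes the value of the source box with the closest centre."""
    src_centres = [(a + b) / 2 for a, b in zip(src_edges[:-1], src_edges[1:])]
    out = []
    for a, b in zip(dst_edges[:-1], dst_edges[1:]):
        c = (a + b) / 2
        i = min(range(len(src_centres)), key=lambda k: abs(src_centres[k] - c))
        out.append(f_src[i])
    return out


def mean(values):
    return sum(values) / len(values)


# Hypothetical fine grid: six unit-width boxes with a sharp, rain-like spike.
src_edges = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
f_src = [0.0, 0.0, 12.0, 0.0, 3.0, 0.0]

# Hypothetical coarse grid: two boxes, each three times wider.
dst_edges = [0.0, 3.0, 6.0]

f_con = conservative_1d(src_edges, f_src, dst_edges)
f_nn = nearest_1d(src_edges, f_src, dst_edges)

print("source mean:      ", mean(f_src))
print("conservative:     ", f_con, "mean", mean(f_con))
print("nearest neighbor: ", f_nn, "mean", mean(f_nn))
```

In this example the spike of 12 never reaches the nearest neighbor result, so its mean drops, while the conservative result keeps the same mean as the source field.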
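The three worked examples can be written down as grid description text and CDO command lines. The Python sketch below only builds and prints the strings; it does not run CDO. The helper functions are hypothetical, the output file names are guesses at how the spoken names were typed, and the yfirst value is an assumption because the video does not state it. The xfirst, xinc, xsize and ysize values follow the South America example (longitudes from -90 in half-degree steps, 100 by 150 points).

```python
def grid_description(xfirst, xinc, xsize, yfirst, yinc, ysize):
    """Return the text of a regular lon-lat CDO grid description file."""
    lines = [
        "gridtype = lonlat",
        f"xfirst = {xfirst}",
        f"xinc = {xinc}",
        f"xsize = {xsize}",   # number of points, not degrees
        f"yfirst = {yfirst}",
        f"yinc = {yinc}",
        f"ysize = {ysize}",   # number of points, not degrees
    ]
    return "\n".join(lines) + "\n"


def extent(first, inc, size):
    """Return the first and last grid point coordinates along one axis."""
    return first, first + (size - 1) * inc


def cdo_remap_command(method, grid, infile, outfile):
    """Build 'cdo remap<method>,<grid> infile outfile' (no space around the comma)."""
    return f"cdo remap{method},{grid} {infile} {outfile}"


# South America example; yfirst = -60.0 is an assumed value for illustration.
grid_text = grid_description(xfirst=-90.0, xinc=0.5, xsize=100,
                             yfirst=-60.0, yinc=0.5, ysize=150)
print(grid_text)
print("longitude range:", extent(-90.0, 0.5, 100))

commands = [
    # 1. Global one-degree grid, conservative remapping.
    cdo_remap_command("con", "r360x180", "t2m.nc", "t2m_1deg.nc"),
    # 2. Regional grid read from a grid description file, bilinear.
    cdo_remap_command("bil", "grid.txt", "t2m.nc", "t2m_america.nc"),
    # 3. Grid taken from another NetCDF file, bilinear.
    cdo_remap_command("bil", "t2m_america.nc", "t2m.nc", "t2m_test.nc"),
]
for cmd in commands:
    print(cmd)
```

The extent helper reproduces the check done with ncdump at the end of the video: 100 points at half a degree starting from -90 end at -40.5.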
Info
Channel: Climate Unboxed
Views: 1,519
Keywords: regridding, remapping, interpolation, netcdf, grib, era5, climate data, observations, gridded
Id: 79o6DXr_3zM
Length: 22min 30sec (1350 seconds)
Published: Wed Apr 14 2021