The Beauty of NetCDF

Video Statistics and Information

Captions
[Music]

Once upon a time, back in the days of yore, there was a problem: a data problem. The problem starts with a scientist who wants to share his data. His data is contained in a simple file. It could be a binary file, a file that contains raw numbers. It could be a text file, less compact but with numbers stored in a way that is human-readable. Either way it contains numbers, a lot of numbers. A second scientist wants to read that data. The scientist manages to open the file, but all they see are numbers. What do they mean? Which variable is it? What are the units? Is it gridded data or just a function of time? What are the scientists to do?

Well, the first scientist could simply tell the second scientist what is in the file. Not very efficient and not very reliable, not to mention the fact that if the first scientist retires or is otherwise indisposed, well, we can see it's not a long-term solution. Another solution, which was adopted by the model we used in the 1990s, was to provide an example code that reads the data. But what if scientist B can't understand that programming language? And the code has to be updated in tandem with changes to the data versions. Again, not an ideal solution.

[Music]

What about describing the data in a documentation file, otherwise known as a descriptor file? That is actually the solution used by the program GrADS. The descriptor file gives the file name of the binary file to which it refers, and then other information such as the name of the field stored in the file, the grid of longitude and latitude points, and also time. It works, but it has problems. The descriptor file is very limited in the information it can portray about the data. Scientist B can plot with GrADS, but most programming languages are unable to handle generic descriptor files. This means that scientist B needs to write bespoke code in order to read the data. Not to mention the fact that the system is not very reliable. If we imagine the data file as a car and the descriptor file as the key,
well, lose the key and the car can't be opened, or started for that matter. How can that happen? Well, easily. What if you decide to rename your data file and forget to update the descriptor file? Come back in six months' time and you will not remember which descriptor file belongs to which data set.

Well, is there a better way? Well, yes, in fact. Welcome to the world of self-describing file formats. NetCDF, GRIB and HDF are all examples of self-describing files. But what do we mean by self-describing? Well, we simply take the data file and combine it with the descriptor file. What's more, we design this in a flexible and standard way, so that all coding languages can build apps and modules that can easily read the self-describing format. So how exactly does it work in detail?

[Music]

Well, let's imagine a textbook. It has a contents page to outline where to find information in specific chapters, and an index to find terms. Information is then arranged in chapters, and finally there is a section of general information that doesn't necessarily pertain to any specific chapter but to the whole book itself, such as the publisher details, the date of publication and maybe the copyright information.

A NetCDF file is very similar. Instead of the contents page we have a section of dimensions, which describe how data may be arranged. Then we have a section of data, which contains the specific variables themselves. But in addition to the data values we also have metadata. This is all of the sundry information that is crucial to understanding what the variable is: its units, a detailed description and so on. Lastly we have general information that again pertains to the whole data set. Just like the textbook, this may include details of the publisher, the date of publication and usage permissions, for example.

Let's look at an example. A typical NetCDF file for the atmosphere may have dimensions that include time, longitude, latitude and height, but it doesn't have to include all of these, or there may be other dimensions. If many models
were run, a dimension could describe the model number, for example.

Next comes the variable section. Here we have an example of t2m, which is the temperature at two meters. It is a function of longitude, latitude and time, so we can sketch the data as a three-dimensional cube. Each cell represents the temperature at that particular location and time. Just as important is the metadata. How do you know that t2m is the two-meter temperature? Well, an attribute, long_name, is there to tell you. Another gives the units, and another may flag the value used to indicate that data is missing in a particular cell.

There are other variables, so here for example we have rain, which may also be a function of three dimensions: longitude, latitude and time. But variables don't always have to have the same dimensions; we can have more or fewer. For example, topography is time-invariant and is only a function of longitude and latitude. Or you might have time series data that gives the mean value of, for example, carbon dioxide changing as a function of time.

Lastly we have the global attributes that give the all-important general information, such as the model version used to generate the data or, for example, the creation center, a contact number and the creation date. Clarity and traceability: that's the aim of the global attributes. You can never put too many attributes in a NetCDF file.

So virtually all software has apps and modules that can read NetCDF. What is more, there are community-adopted Climate and Forecast (CF) standards for variable metadata which, when followed, allow NetCDF files to be easily manipulated and plotted, which we will show in the next video. And that is the beauty of NetCDF.
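
The layout described in the captions (dimensions, variables with their own attributes, and global attributes) can be sketched in a few lines of plain Python. This is only an in-memory model of the idea, not the NetCDF library or its on-disk format. The class names `Dataset` and `Variable`, the method names, the dimension order, and every value below (grid sizes, temperatures, the institution and model version) are made up for illustration.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class Variable:
    dims: tuple   # names of the dimensions this variable spans
    attrs: dict   # per-variable metadata (long_name, units, ...)
    data: list    # flat list of values, one per grid cell


@dataclass
class Dataset:
    dimensions: dict = field(default_factory=dict)  # name -> length
    variables: dict = field(default_factory=dict)   # name -> Variable
    attrs: dict = field(default_factory=dict)       # global attributes

    def add_variable(self, name, dims, attrs, data):
        # The data must fill exactly the grid implied by its dimensions.
        expected = prod(self.dimensions[d] for d in dims)
        if len(data) != expected:
            raise ValueError(f"{name}: expected {expected} values, got {len(data)}")
        self.variables[name] = Variable(tuple(dims), dict(attrs), list(data))

    def header(self):
        # Text summary of everything except the data values themselves.
        lines = ["dimensions:"]
        lines += [f"    {n} = {size}" for n, size in self.dimensions.items()]
        lines.append("variables:")
        for name, var in self.variables.items():
            lines.append(f"    {name}({', '.join(var.dims)})")
            lines += [f"        {name}:{k} = {v!r}" for k, v in var.attrs.items()]
        lines.append("global attributes:")
        lines += [f"    :{k} = {v!r}" for k, v in self.attrs.items()]
        return "\n".join(lines)


# Hypothetical example data set: a tiny 2 x 3 x 4 grid.
ds = Dataset(dimensions={"time": 2, "latitude": 3, "longitude": 4})
ds.attrs.update({"institution": "Example Centre", "model_version": "hypothetical-1.0"})

# t2m depends on time, latitude and longitude: a three-dimensional cube.
ds.add_variable(
    "t2m",
    ("time", "latitude", "longitude"),
    {"long_name": "2 metre temperature", "units": "K", "missing_value": -999.0},
    [280.0] * 24,
)

# Topography is time-invariant, so it only spans latitude and longitude.
ds.add_variable(
    "topography",
    ("latitude", "longitude"),
    {"long_name": "surface height", "units": "m"},
    [0.0] * 12,
)

print(ds.header())
```

The point of the sketch is that a reader holding only `ds` can answer the questions from the start of the video (which variable, what units, gridded or not) without asking the author, because the description travels with the data.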
Info
Channel: Climate Unboxed
Views: 1,462
Id: UvNBnjiTXa0
Length: 7min 41sec (461 seconds)
Published: Fri Apr 02 2021