RUS webinar: Pollution Monitoring with Sentinel-5P - ATMO02

Captions
Hello and welcome back to another RUS Copernicus webinar. My name is Miguel Castro and today I will be guiding you through this topic: we are going to look at air pollution over the north of Italy using Sentinel-5P data.

A few words before we start this session. I would like to highlight that this webinar will be run using Python, so unlike most of our webinars we will not be using SNAP this time. Before starting, let me tell you the objectives of the session: today you will learn two main things, first how to monitor pollution using Sentinel-5P data, and second what the RUS service is and how it can help your projects with Copernicus data. Also be aware that this webinar is being recorded and that you will be able to repeat the exercise by yourself later on using the cloud resources of RUS Copernicus.

Very briefly, the outline of this session: we will first introduce the RUS Copernicus service, so that you know the advantages this project can bring to your Earth observation work. We will then describe the Sentinel-5P satellite in detail, its characteristics, and go into the details of the data files we are going to use today. We will then run our exercise, as I said using Python this time, and at the very end we will have some time for a Q&A session. In total the webinar should last around one hour and a half.

So let's get started, and let's begin by describing the RUS Copernicus service. As you may know already, RUS stands for Research and User Support for Sentinel core products. It is an initiative funded by the European Commission and managed by the European Space Agency, with the objective of promoting the uptake of Copernicus Sentinel data and supporting your R&D activities. The service provides a free and open scalable platform, a powerful computing environment hosting a suite of open-source toolboxes pre-installed on virtual machines. Using these virtual machines you can handle and process the data derived from the Sentinel satellites. In other words, with the large amount of data produced by the Sentinel satellites, the challenge in Earth observation is no longer data availability but rather storage and processing capacity; to solve that, RUS Copernicus offers virtual machines so that you have the appropriate computing environment to handle the data. In addition, RUS also provides specialized user help desks to support your remote sensing activities with Sentinel data, and a dedicated training programme, of which these webinars are part. You can find all the details and full information about the project on our two main websites, so I recommend you check them whenever you have time after this webinar, to get familiar with the project, its characteristics and the different options you have to exploit Copernicus data.

By the way, before moving on I would also like to highlight that we have a YouTube channel: there you will find all the previously recorded webinars, and it is also there that you will find the recorded version of this one in a couple of days. As you can see we cover many topics already, and I am sure you can find something that might be interesting for you. So, as I said, let's go and have a look at the Sentinel-5P mission.
Sentinel-5 Precursor, or Sentinel-5P, is the first Copernicus mission dedicated to monitoring our atmosphere: the satellite maps a multitude of air pollutants around the globe. Launched in October 2017, Sentinel-5P reached its routine operations phase in early 2019. It aims to fill the data gap and provide data continuity between the retirement of the Envisat satellite and NASA's Aura mission and the launch of the upcoming Sentinel-5 satellite. The spacecraft carries the state-of-the-art TROPOMI instrument to map a multitude of trace gases, which we will look at later.

Some details about the technical characteristics of the mission: in this case we have a single-satellite mission. If you compare this to Sentinel-1, -2 and -3, which are missions formed by two identical satellites orbiting 180 degrees apart, here we have only one satellite. It has a swath of 2,600 km, a daily revisit time and a spatial resolution of 7 by 3.5 km, and, as you can see in the animation, it follows a polar sun-synchronous orbit in loose formation with the Suomi NPP mission from NOAA. This is actually a very key aspect of Sentinel-5P: loose formation means that Sentinel-5P orbits 3.5 minutes behind the Suomi satellite, and the reason is that Suomi NPP provides a co-located, high-resolution cloud mask, which is very important for calculating some of the level 2 products that Sentinel-5P delivers.

Let's now talk about the instrument carried by Sentinel-5P. TROPOMI, as you can see on the slide, stands for the TROPOspheric Monitoring Instrument. What sets TROPOMI apart is that it measures in the ultraviolet, visible, near-infrared and shortwave-infrared spectral bands. This means that a wide range of pollutants, such as nitrogen dioxide, ozone, formaldehyde, sulfur dioxide, methane and carbon monoxide, can be imaged more accurately than ever before.

Let's move on to the different data products produced by Sentinel-5P. The data products from the Sentinel-5P instrument are distributed to users at two different levels, and it is very important for you to understand the difference so that you know which one to pick when you download data. On one side we have the level 1B product, which provides geolocated and radiometrically corrected top-of-atmosphere Earth radiances in all spectral bands, as well as solar irradiances, and it is the main input for the level 2 processors. So in the level 1B product you will find the actual measurement made by the instrument carried by Sentinel-5P. On the other hand we have the level 2 products, and these are the ones you will be more interested in: level 2 provides atmospheric geophysical parameters. This is the product most of you, as regular users, would be interested in, and it is of course the product level we are using in this webinar. In the level 2 products you will find the data that use the level 1B information to derive the concentrations of the different gases measured by Sentinel-5P. In this list you can see the full set of level 2 products: on the left side of the table the acronym used to describe each product type, and on the right side the short name of the parameter that is retrieved.
As you can see, there are different products for ozone, there is the level 2 product for nitrogen dioxide, carbon monoxide, and so on; a very extensive list.

The last technical aspect I want to share with you about the mission is the timeliness of the data. This is another key aspect you need to understand when working with this mission, as you will have to define this parameter when downloading the data. Sentinel-5P data are delivered to users in three different timeliness modes. First, you can find products in what is called near real-time. This is mainly to be in line with the needs of numerical forecasting systems, and it only applies to level 2 products, which are supplied within three hours after sensing: three hours after the satellite acquires the data, the product is made available, mainly for these numerical forecasting systems. Remember, you will only find near real-time products for level 2. The second option in terms of timeliness is the non-time-critical option formed by the offline products. The offline products take advantage of the increased accuracy achievable with specific calibration steps, and trace gas retrievals are performed with relaxed timing. They are made available, as you can see, for level 1B and level 2, and depending on the level the products are available either 12 hours after sensing or 5 days after sensing; this time delay is just to make sure that higher accuracy can be achieved. Another main difference between the level 2 near real-time products and the level 2 offline products is that an offline product will always contain the full orbit of the satellite, while a near real-time product will not; this is something we will see later in the exercise. The third option in terms of timeliness, the other non-time-critical option, is reprocessing. This option is used whenever a major upgrade has to be applied to the data because of a change in the algorithms used to derive the level 2 products, and this happens whenever ESA or the scientific committee decides to make the update.

With that being said, let's move on. The first thing I want to clarify about Sentinel-5P data, since I guess this may be the first time you see these products, is their name. As with Sentinel-1, Sentinel-2 and Sentinel-3, Sentinel-5P follows a standard naming convention, and as with the other Sentinels it is actually very important for you to understand what this very long product name means, because it gives you a lot of information about the product even before you go to the download process, and that matters whenever you have to work with a lot of data and want to optimise your downloads as much as possible. Here we have an example of a level 2 product from Sentinel-5P. At the very beginning you will always see the name of the mission, in this case Sentinel-5P, and it is always going to be there. It is followed by the processing stream: Sentinel-5P products can be near real-time, offline or reprocessing, so this keyword, in orange on the slide, tells you which kind of processing stream the product you are working with has. Then we have the processing level, which you know already: either level 1B, top-of-atmosphere measurements, or level 2, geophysical parameters.
Here I am only showing the product identifiers for level 2, and as you can see it is a very long list; remember we are measuring different trace gases throughout the atmosphere, and here you can see the code for each of them. This is followed, as in any other Sentinel mission, by the start and end of the granule, in the format year-month-day followed by "T", which is a fixed character, and then hours, minutes, seconds. Then we find the orbit number, which is an absolute number, the collection number, the processing version and the processing time of the granule. You will most probably not be interested in those last parts of the name; I would say the useful part goes from the mission name to the start and end of the granule, but it is always good to know the meaning of everything.

With that clear, let's talk a little bit more about the data products you are going to use in this webinar. As I said, we are going to use level 2 products, and in particular the nitrogen dioxide (NO2) product, so let's go a little deeper into these products. The thing you have to know about Sentinel-5P is that all products are delivered to users as single netCDF files. If you are new to Sentinel-5P, or if you come from the standard optical domain in Earth observation, I am referring to Sentinel-2 or Landsat, etc., you might be used to data formats such as GeoTIFF or JPEG 2000, and maybe the netCDF format is not very familiar to you. But it is what Sentinel-5P uses, and it is very important to understand the characteristics of this format in order to be able to process the data later on. So let me briefly describe, as you can see on the slide, the characteristics of the netCDF format, and then we will continue.

NetCDF, which stands for Network Common Data Form, is a file format for storing multi-dimensional scientific data, that is, variables such as temperature, humidity, wind speed or, in the case of Sentinel-5P, the concentration of a specific geophysical parameter such as NO2. Each of these variables can be displayed along a dimension, and a dimension can be, for example, time. This format, by the way, is widely used in the atmospheric and also the oceanographic communities. Some of the main characteristics of this format are that the files are self-describing, portable and scalable; you can see the description of those characteristics on the slide.

Knowing that, you have to know that every netCDF file has a specific structure, specific elements, and it is within the netCDF file that you will find the actual measurement, in this case from the satellite. The four things you have to remember are that a netCDF file has dimensions, variables, attributes and coordinates. A netCDF dimension has both a name and a size and can be used to represent a real physical dimension, for example time, latitude, longitude or height. Don't worry if those concepts feel a little abstract now; later on during the exercise you will see them in action and they will make much more sense, but try to get the idea. Then we have variables. The variables of a netCDF file are where you will find the measurement itself, in this case the NO2 concentration, for example. As it is written on the slide, it is in the variables of a netCDF file that you will find the actual measurement.
A variable represents an array of values of the same type; variables are used to store the bulk of the data in these files. It is also important to remember that a variable has a name, a data type and a shape, described by a list of dimensions specified when the variable is created. The two other things you need to know about netCDF files are attributes and coordinates. Attributes are easy: they are, let's say, the metadata of the data. NetCDF attributes are used to store ancillary data or metadata, and most attributes provide information about a specific variable; some others are called global attributes and describe the entire netCDF file. Finally we have coordinates: a one-dimensional variable with the same name as a dimension, and this is very important, is what we call a coordinate. It is associated with a dimension of one or more data variables and typically defines the physical coordinate corresponding to that dimension. Again, if you are new to netCDF this might sound abstract, and you can find a lot of references to it on the internet, but stay with me; later on in the exercise we will see these concepts in action.

The next thing I want to talk about, regarding the level 2 products we are going to use, is how they are structured. When you work with the Sentinel-5P netCDF files you will find that the product is organised in different groups; working with this data you will see that the groups are used to organise the data and make it easier to find. The outermost layer is the file itself, and then two groups can be seen, the PRODUCT group and the METADATA group, and both of them in turn contain subgroups. Why am I telling you this? Because once we start playing with the data we will have to reach the variable that contains the measurement, and for that we will have to navigate through the different groups and subgroups of the product, so you need to know that this is the way the level 2 netCDF files are organised. For example, within the METADATA group we have a subgroup, say ALGORITHM_SETTINGS, and there you can find some attributes, and we know already that attributes mean, let's say, metadata. In the PRODUCT group, on the other hand, we can find variables directly, and dimensions, but also subgroups that contain more variables in turn.

Very briefly then, what is in the PRODUCT group and what is in the METADATA group? The variables in the PRODUCT group answer the questions what, when, where and how well: this group stores the main data fields of the product and includes the main parameters and their position, the latitude, longitude, etc. One thing that is very important to know about this PRODUCT group is that it is here that you will find the qa_value parameter. We will come back to this later, but I want to introduce it now because it is very important: it summarises the processing flags into a continuous value giving a quality percentage for every pixel, where 100% is the optimal value, 0% is a processing failure, and the values range from 0 to 100%. Then we have the METADATA group, which is just a group that collects the metadata items required by metadata standards.
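To make this group structure concrete, here is a minimal sketch that simply lists the groups and subgroups of one Sentinel-5P level 2 file using the netCDF4 package. This is my own illustration rather than code from the webinar, and the file name is a hypothetical example:

```python
from netCDF4 import Dataset

# Hypothetical file name; use any Sentinel-5P level 2 NO2 product you have downloaded.
path = "S5P_OFFL_L2__NO2____20190301T004105_20190301T022235_07123_01_010202_20190307T034726.nc"

with Dataset(path) as nc:
    for name, group in nc.groups.items():   # first-level groups: PRODUCT, METADATA
        print(name)
        for subname in group.groups:         # second-level groups
            print("  ", subname)
    # the variables holding the measurements live in the PRODUCT group
    print(list(nc["PRODUCT"].variables))
```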
The main thing I want you to take away from this slide is the structure of the file, but please, if you are going to work with this data, I really recommend, and it is a must if you want to understand the data properly, that you go through the technical documentation made available by ESA on the Sentinel-5P mission website. You will find there, I think, five technical documents describing the data and everything related to this product, including the algorithms that are used to retrieve the geophysical parameters. I really recommend that you at least go through the Product User Manual; I am leaving you here a screenshot of the beginning of this document and a link, and if you just google the name of the document you will also find it. It is there that you will find the description of this file structure and the meaning of everything, so please do check this document, it will make your life easier when working with this data.

OK, that is all for the introduction of the Sentinel-5P mission; now let's focus on our exercise and start to do something. What are we going to do? The objective of this exercise, of this demo, is to assess the decrease in air pollution caused by the lockdown in the north of Italy due to the COVID-19 outbreak, by analysing the average NO2 concentrations in March 2020 compared to March 2019. And how are we going to do that? We will use as framework a Jupyter notebook, more precisely JupyterLab, and within this notebook we will use Python to run our analysis with two main libraries. The first one is the xarray library, which will allow us to process the netCDF files in Python. The other one is HARP. HARP is not actually a Python library but a software package that I am going to introduce to you now; it will allow us to pre-process our Sentinel-5P data and bring it from level 2 to level 3. Don't worry, I will go into this in detail later; this is just an overview. Now that we are talking about the tools you can use to process Sentinel-5P data, I want to give you very briefly an overview of the options you have in the RUS Copernicus virtual machines.
If you are more into a graphical interface rather than a programming one like the one we are using today, you can use the ESA Atmospheric Toolbox. If you are coming from Sentinel-1, Sentinel-2 or Sentinel-3 you might be familiar with SNAP; however, if you want to work with Sentinel-5P data, SNAP is not the tool to go to. That does not mean you have no options: there is the ESA Atmospheric Toolbox, and you can see a very brief description of it here. The important thing is that the project consists of several components, the main ones being CODA, HARP and VISAN, and today we will be using HARP, which is part of this Atmospheric Toolbox. The thing to remember about HARP is that it will allow us to ingest, process and, by doing that, inter-compare satellite data. The toolset provides a set of command-line tools, so you can run it from the command line, but also a library of functions that can be used directly from an interface in Python, IDL or MATLAB, so it is up to you to choose the interface you prefer the most. I am giving you here the links to this software, in case you are interested. By the way, if you are interested in learning how to process Sentinel-5P data in a more graphical way, instead of with Python as we will do today, I recommend you check our previous RUS Copernicus webinar on this topic, which we did a couple of months ago. It is called "Quality Monitoring with Sentinel-5P" and it is available on our RUS Copernicus YouTube channel; there you can see how we used the VISAN software combined with HARP, which is part of the Atmospheric Toolbox, to process Sentinel-5P data. If you are a RUS Copernicus user you can request the training kit and you will get a step-by-step guide as well, as always with RUS. That was just a side note.

Other options as graphical interfaces include, for example, Panoply, which is also a very well-known and very well-documented piece of software for processing netCDF files. If you go more into programming languages, such as what we are going to do today, I am giving you here a very short list of packages or libraries available in Python and in R, if you are more into R. In Python, as I said, today we will be using xarray, but there is of course the very well-known netCDF4 library or the Iris module as well. In R we have the very well-known raster package, but also dedicated netCDF packages such as RNetCDF or ncdf4. Again I am leaving you all the links here so that you can have a look by yourself if you are interested.

Now, with all the introductions being made, let's move on to the outline of the exercise. What are we going to do? First of all I am going to show you how to download Sentinel-5P data, in case you have never done it. We will then have a very brief introduction to Python and Jupyter; remember this is not a Python webinar, the idea today is not to teach you how to write your code in Python, but I will go through it slowly, and for those of you who are beginners or new to Python it will be very easy to follow. I have divided the exercise into two parts. The first main part is point number three: there I will show you how to process one single level 2 NO2 product using xarray in Python.
We will see how easy it is to import a single product in Python, access the netCDF variables of interest, and then do some basic operations such as filtering the product for quality and making a geographical subset. The second main part of the exercise is point four: there what we are going to do is run our pollution monitoring over the north of Italy. I will show you how you can process multiple products in Python using xarray but also using HARP, because we will need to move our data from level 2 processing to level 3; later on I will tell you the difference between the two.

So, are we ready to go? Let me go to my virtual machine. Here we have the virtual machine; I have already opened it and logged in, as you can see. If you are new to RUS and to the VMs, the environment is just like a regular desktop, except it sits on cloud resources. For your information this is a Linux-based environment, and as you can see I have a predefined list of software already installed: you have SNAP, definitely, to process Copernicus data, but also development environments such as RStudio or Spyder, for example, or Jupyter, whatever you prefer. You can treat the VM as a regular computer: you have your own dedicated internet browser, your file manager, and so on. What I mean by that is that the interaction with the VM is the same as with your regular computer.

So let's start, and let me show you how you can download Sentinel-5P products. What you have to do, as always with Copernicus data, is go to the Copernicus Open Access Hub, and there go to the Sentinel-5P Pre-Operations tab. It is true that the mission is already in an operational phase, and that is completely true, but the data still sit on a dedicated hub, let's say, which is the Sentinel-5P pre-ops hub; do not confuse that with the mission status, which is already operational. Clicking there, we access the standard and regular interface of the Copernicus Open Access Hub. We just need to do as always: let's zoom in to our study area, which in this case, as I said, is going to be the north of Italy, draw our study area, and then define the parameters for our search. The first thing you need to set is the sensing period. In this exercise we are going to compare data from March 2019 versus March 2020, so we set the period; I will show you how to do it for March 2019 and later on you can do the same for 2020. For example, I set the start to 2019 March 1st and the end to 2019 March 31st. Then of course we have to tick the Sentinel-5P mission to activate it, and the main parameter we have to set here is the product type: you know already from the slides I was showing at the beginning that we have level 1B and level 2; in this case we are interested in level 2, and in particular in the NO2 product. With that set, the other parameters you could define, they are not completely mandatory but just so you know, are the processing level, level 2, and the timeliness; since it is a level 2 NO2 product we select the Offline option. With that we can log in using the guest account and click on search.

The main difference you will see, if you are coming from Sentinel-1, -2 or -3, when exploring this data, is that you are downloading full orbits of the satellite.
So in this case the data are not, let's say, cropped into tiles like Sentinel-2; we have the full orbit, and as you can see from the footprint of these layers it is quite massive. You can see here we have 48 products, and the question may come: if the Sentinel-5P mission has a daily revisit time, why do I have more products than days during the month of March? At the beginning I said it has a daily revisit time, so what is happening here? It is a very valid question, and the answer is that, if you remember my animation, and maybe I can show it again, Sentinel-5P has some overlap between the orbits. If I go back to my video, there you go: you see that while orbiting there is some overlap between the orbits, and depending on where you are, the further you are from the equator towards the north or the south, the greater the overlap will be, so in some areas the coverage by the satellite is twice per day. That is the reason. Let's go back to my VM.

How would you proceed with the download? Basically, as for the other missions, you would just click on this arrow to download the product; that triggers the action. Of course, those are 48 products, and then we would have to do the same for March 2020, so you can see it is a massive number of products. You can download them manually, but I would recommend you try to automate this data download process with some kind of script, because otherwise you are going to have to click a hundred times to download the data for two months over one study area. I am not going to show the automatic download via scripts from the Copernicus hub, because the webinar is already extensive enough with the Python code; we can maybe leave that for another occasion or for an upcoming webinar.
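For completeness, here is one possible way to script the query and download. This is my own sketch using the sentinelsat package, not something shown in the webinar, and the footprint, dates and search keyword names are placeholders to be checked against the hub's documentation:

```python
from sentinelsat import SentinelAPI

# Sentinel-5P products are served from the dedicated pre-ops hub;
# it accepts the public guest account.
api = SentinelAPI("s5pguest", "s5pguest", "https://s5phub.copernicus.eu/dhus")

# Hypothetical bounding box over northern Italy as WKT (lon/lat order).
footprint = "POLYGON((7.6 43.6, 13.7 43.6, 13.7 47.2, 7.6 47.2, 7.6 43.6))"

products = api.query(
    footprint,
    date=("20190301", "20190401"),       # March 2019; repeat for March 2020
    platformname="Sentinel-5 Precursor",
    producttype="L2__NO2___",            # level 2 nitrogen dioxide
    processingmode="Offline",            # assumption: keyword name for the timeliness filter
)
print(len(products), "products found")
api.download_all(products, directory_path="./Original")
```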
So let's imagine I have all my data downloaded already, which I have done in advance, and let me show you my folder structure for this exercise. I have everything related to this exercise in my training kit, which is in the path /shared/Training/ATMO02_AirPollution_Italy, and basically in the Original folder you can find all the data. As you can see we have 93 items, quite a heavy data set, almost 4 gigabytes: let me show you, we have the products from the 1st of March 2019 until the 31st of March 2019, so 46 products for that period, and then the data set for March 2020, in this case again 46 products. This is the data we are going to use for the last part of our exercise, where I will show you how to process a lot of files in Python using HARP and xarray. However, for the first part of the exercise, where I want to show you how to just play with one single product in Python, I will use a near real-time product, also to show you the difference between near real-time and offline, so you can see both at the same time. So this is my original data; in my Processing folder I will be saving all my outputs, and in my AuxData folder I have, as you can imagine, auxiliary information for this exercise.

We can start the exercise, and for that let's launch the software we are going to use. As I said, today we are using Python, and in the RUS virtual machines you already have Python installed via Anaconda. What is Anaconda, for those of you who are new to Python or to programming? Anaconda is just a free and open-source distribution of the Python and R programming languages, aimed mainly at scientific computing, and the main advantage compared to, let's say, a regular Python download is that it helps you simplify all the package management and deployment. In the RUS Copernicus virtual machines Anaconda is already installed, and you can access it either via the command line or via its graphical interface. Just for you to know, if you type anaconda-navigator you can launch the graphical interface of Anaconda, and let me show you very briefly what you can find there. What you will find when you open Anaconda Navigator for the first time is that it comes with the distributions of Python and R, but on top of that it gives you the option to install or launch specific applications that will help you develop your processing scripts: for example RStudio if you are into R, but also Spyder and other development environments. Today we are going to be using Jupyter, a Jupyter notebook, more precisely JupyterLab. JupyterLab is just the new evolution of the Jupyter notebook, and Jupyter notebooks are very well known within the scientific community because they are a very nice way to share code with others. The Jupyter Notebook is an open-source web application, so it runs in the browser, that allows you to create and share documents that contain code, equations, visualisations and narrative text. You will see it once we open the notebook, but basically you can combine text, images and videos together with active code cells that you can run to process any data. Just one more word about notebooks, because I think it is very important for you to understand what they are: notebooks are documents that contain both computer code, for example Python or R, and rich elements such as paragraphs, equations, figures, links, etc.; they are documents that can be read by humans but at the same time contain code cells that can be executed by the computer.

Another key aspect of Anaconda is, of course, its ability to manage packages. Maybe you do not know it, but when you install Python for the first time, the basic installation does not come with everything you can do in Python: you have to install what are called the libraries, the modules, the packages that contain the specific algorithms for specific tasks. Depending on whether you are working in Earth observation or in another sector, say the banking industry, you will have to install and load different modules, and usually when you run a project you want to have all the related packages in an environment, so that you have all the versions and everything under control. In this case I have created a new environment called "rus" where I have installed all the packages necessary for this exercise, so what we are going to do is launch JupyterLab from that environment, where I know all my packages are installed. I am going to show you how to do it via the command line: to launch JupyterLab in the right environment, the first thing you have to do is activate that environment, and for that we write "conda activate" followed by the name of the environment, in this case "rus". Once this is done we can just write "jupyter lab", and the JupyterLab web application will launch, so let's wait for it. OK, there it goes.
go full screen so that we have a barrows okay great so here is the notebook we are going to be using here's the notebook we are gonna use today so actually I had already opened but when you open Jupiter laughter it should be the lab floor first and this is what you will see what you have to do is go to your browser here within jupiter lab and navigate to the folder where the note has been stored in this case my notebook is stored in the path of my exercise so share training at Monsieur to revolution Italy Alex data so within the outside a folder you can find in this exercise the Jupiter notebook we are going to be using today so I'm going to close this so again remember we are so the objective of the exercise is going to be and the analysis of the pollution of the north of Italy in March twenty twenty persons March 2019 so let's get start if you are repeating this webinar if you re watching this webinar and you are not in life session it might be interesting for you to go to this text this year but this is just basically our repetition of what I have said in my slides so it's just for you to so let's go now and let's see the structure of our exercise so what we are going to do and this is just again for reputation we will first of all of course load the Python modules we need we will then explore the one single sentence of five P product using its array we will do some basic operations and then and for this we will be using an near real-time product and then we will do the multi temporal analysis using harp so as you can see the the jupiter notebook contains our text links for example here i'm giving you some links for some python tutorials and jupiter not good tutorials but at the same time I can have my my cells with code that I can run and how do you wanna sell our code cell in Jupiter it's very easy either you select the cell and you press ctrl enter either you go and press the play button or either you go and press run from selected cells as you can see there are more options tomorrow but anyway let's move on so the first thing we do is - of course and you're all assuming that the libraries have been already installed so what we do is to load and import those libraries into our the node so the the thing I want here to highlight for those of you are new to Python is that point you load or when you import a library you can do it for example here and importing the X or a library but at the same time and saying import accelerate as X R and this is just I'm just defining an acronym that I will use later on in my code so that instead of writing accelerate dot and the function I will only have to write the acronym which is a little bit shorter so some libraries follow have some kind of standard let's say for example pandas it's usually abbreviated as PB don't is always a very less and P etc so let's go to so we have everything loaded so let's go to the next cell in if you are repeating the exercise again you will find here again the same explanation I've made in my slides so we can skip this because we know already would this figure means and actually what we are going to do is have a load to this folder structure sorry this fine structure so how do we do that in in this exercise so the first thing we are going to do is to define the path the folder where we have all our products stored remember I show you a couple minutes ago all our data is in the shirt training a little - blah blah original folder so first of all we declare the folder where our products are then I'm going to create a list 
Then I am going to create a list that contains the path to every single one of those files: here I am creating a list for the offline products, and then I am creating another list for the near real-time products; remember, I have only one near real-time product and many, many offline products. Once those lists are created I just print them to see the output, or actually I print the length of the lists. Then I create a Python variable called s5p_file, which points to the first element of the list containing the near real-time products, and I print that too. Let's run this cell; sorry, I pressed the wrong key. What is the output of this cell telling me? That I have 92 offline products, remember 46 for the month of March 2019 and 46 for the month of March 2020; that I have one near real-time product, which is the one I am going to use in this part of the exercise; and then it prints a confirmation telling me that the product I am going to analyse right now in this part of the exercise is in this specific location, with this path and this name.
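The cell described here boils down to something like the following. This is my reconstruction, so take the folder path and the glob patterns as illustrative rather than the exact code in the training kit:

```python
import glob
import os

# Illustrative path; point this at wherever your level 2 products are stored.
input_dir = "/shared/Training/ATMO02_AirPollution_Italy/Original"

# One list for the offline products, one for the single near real-time product.
offl_files = sorted(glob.glob(os.path.join(input_dir, "S5P_OFFL_L2__NO2____*.nc")))
nrti_files = sorted(glob.glob(os.path.join(input_dir, "S5P_NRTI_L2__NO2____*.nc")))

print(len(offl_files), "offline products")
print(len(nrti_files), "near real-time products")

s5p_file = nrti_files[0]     # the product analysed in the first part of the exercise
print("Analysing:", s5p_file)
```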
This is done; now let's go to the following code cell, where we actually open this data set in Python and have a look. It is done very easily. The first thing we do, and I want to highlight what I explained before, is call the xarray library, because we need one of the functions of this library, and we do it by writing its alias, xr; remember, we said "import xarray as xr". So we call the xarray library and then one of its functions, which is called open_dataset: we write xr.open_dataset and point it to the data set, and here I am just passing the variable I created before, pointing to the file I want to analyse now. Then I save all of this into a Python variable called s5p_img_ga, where "ga" stands for global attributes: by running this line I will access the global attributes of the file, which I am going to print in the following line. Then we will have a look at the METADATA group, or rather not at the METADATA group itself but at a subgroup that lies within it; remember, within the METADATA first-level group I have second-level groups that I can access in Python. How do we do that? We basically do the same: we call the xarray open_dataset function, we specify the file we want to open, and then with the keyword parameter "group" we specify which group we want to go to, in this case METADATA and, for example, the granule description. All of this we save into a new Python variable called s5p_img_mt, "mt" for metadata, and then we just print this variable. The last thing we are going to do here is the same for the PRODUCT group; remember the PRODUCT group is the most important one for us, because it is where the actual measurement is. So again we call open_dataset, save the result into the Python variable s5p_img_prd, "prd" for product, and print it. Let's run this and have a look.

So what do we see here? The first output is the global attributes we opened. Since this is a netCDF file opened as an xarray dataset, we now have an xarray Dataset object, and as this is a netCDF file it can have dimensions, variables and attributes. However, here there are no dimensions, no variables, no coordinates. Why? Because at this level of the product we are at the global level, where the only thing we have is attributes; remember, attributes equal metadata, to make it simple. What kind of information do we have at this level? You can see, for example, the ID of the product, the orbit number, keywords, the creation date, and so on. Now the question is: OK, nice, but some of this I understand, for example "source" is pretty obvious, but what is "naming authority"? What I mean here is that by looking into the Product User Manual I was referring to before, let me show you my slides again, by reading this document you will see the full description of all those attributes and of everything in the file. So again, I recommend you go to this document, I will insist on it several times, and read it if you want to get the full picture of the data. OK, so those are our global attributes, nothing very exciting so far. Then we accessed the granule description group within the METADATA group, and again here we have only attributes, so metadata: processing mode, processing level, etc.; information that is good to know but not the measurement we want. Again, if you want to know the full description of this, go to the Product User Manual. And then, finally, we accessed the PRODUCT group, which, as you can see, does have dimensions; remember the slide in my presentation, dimensions, variables, attributes, coordinates. Now this makes more sense: you can see that in this group the file has dimensions, coordinates and variables, and once we access a variable we will also see its attributes.
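In code, the three calls look roughly like this. The variable names follow the webinar; the exact group path for the granule description is my assumption (the usual METADATA/GRANULE_DESCRIPTION spelling), so check it against your product:

```python
import xarray as xr

# Root of the file: only global attributes, no variables or dimensions.
s5p_img_ga = xr.open_dataset(s5p_file)

# A second-level group inside METADATA (assumed group name).
s5p_img_mt = xr.open_dataset(s5p_file, group="METADATA/GRANULE_DESCRIPTION")

# The PRODUCT group, where the measurements, dimensions and coordinates live.
s5p_img_prd = xr.open_dataset(s5p_file, group="PRODUCT")

print(s5p_img_ga)
print(s5p_img_mt)
print(s5p_img_prd)
```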
So this is the first overview of the product, and now you can tell me: this looks great, but I don't see anything, I don't see any measurement, I don't see any image, let's say, and this is Earth observation. You are right; what we have to do is actually access the information. So far we have seen a lot of groups, metadata, attributes and so on, but where can you find the measurement, the geophysical parameter that has been derived from the measurements of the satellite? Within the PRODUCT group. Within the PRODUCT group you can see that we have variables, and remember, in netCDF the variables are what contain the data; they are used to store the bulk of the data. So in the variables of the PRODUCT group you will find, for example, the tropospheric column, which is the one we are using in this analysis. And how do you access that? Let's go to the following code cell. We are going to create a new Python variable that I am going to call no2, and it is going to point to the PRODUCT group of my netCDF file, the variable we were using before: we are saying here that no2 points to the variable nitrogendioxide_tropospheric_column within the PRODUCT group. This is the syntax you have to use; it is like that that you access a variable, it is like that that you access the variable that contains the measurement, in this case the geophysical parameter. And since this is an xarray object we can inspect its parts: we are going to print the dimensions, the coordinates, the attributes and the values. Let's run it and have a look.

Finally we have reached our measurement. Again, we have specific dimensions for this variable, in this case time, scanline and ground pixel; scanline and ground pixel relate to the geometry of the image. And of course this is a geolocated product, so the latitude and longitude appear in the coordinates. Then we have the attributes of this variable, which are, as I said, metadata about this variable: for example we see the units of the measurement, which are provided in mol per square metre, and we see, for example, the standard name of this variable, or the long name, and so on. One key attribute I want to point out here is the multiplication factor: this attribute, this piece of metadata, is going to allow us to convert the data from its original units, mol per square metre, into other units, in this case molecules per square centimetre. And then, if we access the values of my no2 variable, we can see the actual measurement: it is actually a NumPy array that is holding the values, and if you are a little bit familiar with Python you know that once you have a NumPy array you are ready to go, you can do, let's say, regular Python manipulation of the data. It is very good that this package gives you the data in this format. So here is the measurement, here is my tropospheric column of NO2, and of course you can tell me again: Miguel, it looks very nice but it does not help at all; what I need, or what I imagine you need, is to plot this data somehow. So let's go now and see how you can very easily plot this data in Python.
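Access to the variable and its pieces looks roughly like this. The variable and attribute names below are the ones used in the level 2 NO2 product as I recall them, so double-check them against your own file:

```python
# The tropospheric NO2 column lives inside the PRODUCT group.
no2 = s5p_img_prd["nitrogendioxide_tropospheric_column"]

print(no2.dims)      # ('time', 'scanline', 'ground_pixel')
print(no2.coords)    # time, latitude, longitude
print(no2.attrs["units"])  # 'mol m-2'
print(no2.attrs["multiplication_factor_to_convert_to_molecules_percm2"])
print(no2.values)    # the underlying NumPy array with the measurements
```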
What we are going to do next is not mandatory, it is just for you to know: we are going to convert the measurements from their original units, which are again mol per square metre, into molecules per square centimetre. Why? Just to give you a little bit of context: the units used originally in Sentinel-5P belong to the International System of Units, so for an integrated column value this means mol per square metre; however, traditionally integrated columns are measured in molecules per square centimetre. You can easily move from one unit to the other by using the multiplication factor. And how do we do that? We just overwrite our no2 variable, and this is important, pay attention here, we are overwriting, so if you want to go back to the original values you will have to re-import the product. It is not that this is how you have to do it, it is just the way I proceed here. So I overwrite my no2 variable and say that no2 is now equal to no2 times the multiplication factor; note that we do not type the number itself, we point to the attribute. And then we just plot it.

Throughout this exercise we are going to create different plots, and I am not going to go through the code that plots the data, because I do not think it is that relevant; if you have some experience with Python you can make your plots in your own way and follow your own preferences, so I will not go deep into that. The only thing I want to mention is that we use the pcolormesh function from matplotlib, which allows us to plot irregular data. So let's have a look at the data: remember, we imported the product, we multiplied by the multiplication factor, and we plot.

This is the first time we are actually looking at the data, and the plot is fine, there is information: for example we can see the latitude and longitude on the y and x axes, and you can see some patterns in the concentrations, the different values, and so on. But I would say, from my point of view, it is not very helpful, because it does not help us much to locate the measurements in a geographical area. Of course we have the lat/long coordinates here, so if you know, for example, where your city is, you can locate it roughly; for example Rome is at this latitude and longitude, so I know Rome is over here. But it is not the best way to do it, it is not very scientific. So of course we can produce a second plot, another type of plot, using a dedicated Python library that is used to produce better visualisations of the data. I know I said I was not going to go into the code that makes the plot, but here I will, and I will explain why. Let's run this code; remember, here we are not touching our data, we are just showing it, we are just creating a visualisation and a display, and I am saying that because you can do it the way you want. Let's first analyse my plot here and then I will tell you how you can make some modifications to it very easily. OK, so here we have the data: we have added a dedicated title, we have added some background information, for example the coastlines, so we see a little bit where we are, the colour scale bar and some grid lines of the projection. This is cartopy, this is a cartopy visualisation, and one of the good things about cartopy is that it allows you to create projected visualisations. I am leaving you the link to this library here if you are repeating the exercise, so you can have a look. And one of the nice things about cartopy is that you can change your plots very easily.
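As a rough sketch of the conversion and the quick first plot described here (my reconstruction; the figure size and labels are arbitrary):

```python
import matplotlib.pyplot as plt

# Convert mol/m2 -> molecules/cm2 using the factor stored in the attributes.
# Note: this overwrites no2; re-open the product to get the original values back.
factor = no2.attrs["multiplication_factor_to_convert_to_molecules_percm2"]
no2 = no2 * factor

# Quick look: pcolormesh copes with the irregular lat/lon grid of the swath.
plt.figure(figsize=(12, 6))
plt.pcolormesh(no2.longitude[0], no2.latitude[0], no2[0], shading="auto")
plt.colorbar(label="NO2 tropospheric column [molec/cm2]")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.show()
```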
For example, here we have added, with cartopy's stock_img, a background image just to give some context; if I comment out this line and run my code again, you will see that the plot is now displayed without any background image, so if you prefer an image without background information you can leave it like that. We can also remove the grid lines: we do the same, we comment out the line so that it is not executed by the kernel, and we get the image without them. You can see I have added the main rivers, which I could remove by commenting out this line, or, for example, I could select the set_global option, which will plot the whole globe without taking into account the extent of my product; in this case we are using an orthographic projection, and as you can see this creates the kind of plot where you see, let's say, the globe, rather than something like a UTM projection. This is up to you, you can play with this code as you want; in my case I prefer not to have the global option selected, to add a background image, some grid lines and the rivers, so let's run it again. I have also added to the plot some points of interest, just to locate ourselves: I have added Rome, I have also added Milan, which is the area we are going to focus on, but I have also added, for example, Madrid, which we can see is a kind of hot spot in this image over Spain.

Now that we have a second, a bit more advanced, visualisation of the data, we can move on and do some basic operations. One of the main things you have to think about when working with Earth observation data, I would say in general, is the quality of your measurement; you cannot take it for granted. Of course there is a huge amount of work done upstream to assure the quality, but there are conditions in your target that you cannot control, I am thinking of the condition of the atmosphere, for example if it is cloudy, and this will degrade the quality of your data. The quality of individual Sentinel-5P observations depends on many factors, including cloud cover, surface albedo, presence of snow or ice, saturation, geometry, etc. These aspects are taken into account in the definition of the quality assurance value, which is available for each individual observation, meaning each pixel, and which provides the user with an easy filter to remove less accurate observations. The qa_value is a continuous variable ranging from 0, which is error, to 1, which means perfect, and you should use the filter, as you can see here, in one of two ways: either you keep pixels with qa_value greater than 0.75, or greater than 0.5. It is recommended to apply the 0.75 threshold, because it removes cloud-covered scenes, partially snow- or ice-covered scenes, errors and problematic retrievals. The 0.5 threshold is also OK, it also removes bad-quality data, but it would imply a different processing in terms of averaging; so I would recommend you use 0.75, so that you are sure you are using the pixels with the better quality. Let's actually visualise this quality layer in our product and see what happens when we filter the image. How do we filter? It is very easy; we are doing it here in this line: we create a new Python variable called no2_filtered, and we say that we want to keep my no2 product wherever the quality assurance value is greater than 0.75, and then we just plot.
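The filtering itself is a one-line call to xarray's where; a minimal sketch, assuming the qa_value variable sits in the same PRODUCT group dataset opened earlier:

```python
# Mask out every pixel whose quality assurance value is 0.75 or below;
# masked pixels become NaN, so they drop out of plots and statistics.
qa = s5p_img_prd["qa_value"]
no2_filtered = no2.where(qa > 0.75)
```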
The code here is just for plotting, so let's run it and look at the result. On the left side we have the quality flag layer: the pixels are classified from 0 to 1, and you can see that most of them have good quality, while some areas over here have less, around 0.4. When we filter our image with the code I showed you before, this is what we get. As you can see, when you work with Sentinel-5P and apply the quality filter, it can happen that the observation you have that day for your area is not valid at all, and you have to wait for the next day. That is why a multi-temporal analysis makes more sense: you can then make sure that at least some observations over time are valid. As you can see, we were interested in the north of Italy and we got no data, so nothing to do there. Of course you can lower your threshold, but then you are assuming some risks. By the way, in the code we are automatically saving the plots using this line here, which saves a PNG of the visualisation so that you can share it with your colleagues or include it in your reports.

The last thing I want to show you when processing a single file is how to do geographical subsets. This is something very easy in xarray: you just need to define the lat/long coordinates of your study area. In this case I am defining the upper-right lat/long coordinates of my area of interest and the lower-left coordinates, and then again I use the where function: I create a Python variable called no2_subset and I say that, from my filtered product, which I saved in the variable no2_filtered, I want to keep the pixels where the longitude is lower than the upper-right longitude and at the same time higher than the lower-left longitude coordinate, and so on for latitude; just a basic operation, the same way we filtered before. Then we plot; the code here just makes the plot, and at the very end I again save this as a PNG in the Processing folder. Let's run it and have a look. OK, that is how we make the subset: as you can see, I did my subset over Madrid, and that is the result. Very easily you can produce these subsets and, let's say, zoom into your study area; and in this case we are still processing only one single product.
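In code, the subset is again a where call; the corner coordinates below are placeholders for whatever bounding box you are interested in:

```python
# Hypothetical bounding box (lower-left and upper-right corners, in degrees).
lon_ll, lat_ll = -4.8, 39.9
lon_ur, lat_ur = -3.0, 41.0

no2_subset = no2_filtered.where(
    (no2_filtered.longitude > lon_ll) & (no2_filtered.longitude < lon_ur) &
    (no2_filtered.latitude  > lat_ll) & (no2_filtered.latitude  < lat_ur)
)
```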
OK, so let's move on to the last part of the exercise, which is the multi-temporal processing of Sentinel-5P data; it is here that we are going to run the analysis over the north of Italy. How are we going to do that? Let's first talk a little bit about HARP, the software we are going to use; it is described here in the Jupyter notebook. HARP is part of the ESA Atmospheric Toolbox, which aims to provide scientists with tools for ingesting, processing and analysing remote sensing data. HARP is the toolkit that will allow us to do this processing: it has a set of command-line tools, but it can also be accessed via its Python interface, which is what we are going to do today. By appropriately chaining calls to the HARP command-line tools, or via Python, we can pre-process satellite data so that different products can be compared on a common temporal and spatial grid, and this is the key part: to have a common temporal and spatial grid. We are using HARP because we need to generate Level-3 products. A Level-3 product means that our Level-2 NO2 product is resampled into a common grid with an equal pixel size, because by default the pixels of Sentinel-5P are not equal throughout the swath, due to the characteristics of the optical sensor and the very large swath of about 2,600 kilometres. In order to have pixels with the same size we need to, let's say, create a new, empty raster: we define the size of this new raster, we define its pixel size, and we transfer the information of the original pixels into this new raster, which is of course projected. This is what is going to happen, and this is what we do using HARP. If you want to know more about this, in the previous webinar I mentioned here we went through the same processing using a graphical interface, so if you are interested you can watch another explanation of it there.

So how do we do this Level-3 conversion? The first thing is that I define an output folder where I am going to save my outputs, just like this. Then I create a for loop over all the files identified in my variable containing the list of files; this variable is just pointing to all the files in the original folder we created at the beginning. For every one of those files I create a Python variable which consists of a call to the harp library and, within it, to the import_product function, and for each element of my list we run a set of operations in HARP. The first thing we do is filter our data for quality, keeping everything above 75; then we convert our units to molecules per square centimetre; then we create a new variable called datetime_stop, which gives us the range of the measurements from the beginning of the sensing to the end (this is only to help us in the data analysis, you could skip this step if you want); then we do a subset to our study area, so here I am defining the latitude and longitude of my study area, which is going to be between 43.6 degrees north and 47.2 degrees north, and so on. Then comes the key operation in this set of operations, which is the bin_spatial tool, so pay attention here because this is the key part. What you want to do is define the spatial grid, this kind of new raster into which you are going to resample the data, and you do so by defining the properties of this new grid using the lower left corner as reference; I repeat, the lower left corner, pay attention, because otherwise you will spend a lot of time trying to figure this out. Here we have six numbers: three for latitude and then three for longitude. What do they mean? First, I am defining the latitude coordinate of the lower left corner of my grid; our study area starts at 43.6 degrees north. Then we have to define a resolution for that grid, given in degrees, in this case 0.01. And then you have to define the extent of the grid in latitude with this last number; it does not mean 360 degrees of latitude, which would not make sense. What we are doing is 360 multiplied by the resolution of 0.01, which equals 3.6, and 43.6 plus 3.6 gives you the latitude extent, which is 47.2. That is the logic behind it, and the same applies for the longitude: 7.6 is the lower left longitude coordinate, then we define the extent in the longitude direction by giving the resolution of 0.01 and the number of steps, 610; 610 multiplied by 0.01 is 6.1, and 7.6 plus 6.1 gives you 13.7. Then we just derive the centre coordinates of the pixels and define the variables that we want to keep as output in our product. Once this is done, we save the product into our processing folder with a specific name: it is the same name as the input, but we replace the Level-2 marker in the original name by Level-3, and we call the HARP export_product function to save it as a netCDF file.
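A minimal sketch of that conversion loop follows; the folder names are illustrative, the operation and variable names follow HARP's conventions for the S5P NO2 product as described above, and you should check the HARP documentation for your version in particular on whether bin_spatial expects the number of cell edges (cells + 1, as written here) or the number of cells:

```python
import glob
import os
import harp

input_files = sorted(glob.glob(os.path.join("original", "S5P_OFFL_L2__NO2____*.nc")))
export_path = "processing"
os.makedirs(export_path, exist_ok=True)

# Operations applied to every Level-2 granule: QA filter, unit conversion,
# datetime_stop, spatial subset, regridding to a regular 0.01-degree grid
# anchored at the lower-left corner (43.6 N, 7.6 E), and variable selection.
operations = ";".join([
    "tropospheric_NO2_column_number_density_validity>75",
    "derive(tropospheric_NO2_column_number_density [molec/cm2])",
    "derive(datetime_stop {time})",
    "latitude>43.6;latitude<47.2;longitude>7.6;longitude<13.7",
    "bin_spatial(361, 43.6, 0.01, 611, 7.6, 0.01)",  # 360 x 610 cells of 0.01 deg
    "derive(latitude {latitude});derive(longitude {longitude})",
    "keep(latitude_bounds,longitude_bounds,latitude,longitude,"
    "tropospheric_NO2_column_number_density)",
])

for f in input_files:
    product = harp.import_product(f, operations)
    out_name = os.path.basename(f).replace("L2", "L3")
    harp.export_product(product, os.path.join(export_path, out_name), file_format="netcdf")
```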
I am not going to run this here because it takes about twenty-three minutes, so I will skip it, but I want to show you the result. If I go to my processing folder you can see that I have done this already and I have a lot of files, 96 in total; they have the same names as the inputs, but with the Level-2 marker changed to Level-3 throughout. OK, let's move on. Now that all our products are regridded into a common spatial grid, we just need a few more steps to get to our final analysis. The first thing we are going to do, once we have processed our data into Level-3, is to extract the time coverage start and time coverage end for each product, and we do this here. Why do we want that? I am showing you an example: for every single product we go into its attributes, and in particular into the specific attribute we are interested in, which is the time coverage start and end, and we store that information to allow indexing of the products based on time. This is done here, and now we are ready for the multi-temporal analysis.

So let me run this and explain what is happening. What I am telling Python is: I have my Level-3 products processed by HARP, they are in the export path I defined before, and their names contain the Level-3 NO2 marker, so I create a list with those names. Then I create a new Python variable that is going to hold these data: I call the xarray library again and its dedicated function to open a list of files, which is called open_mfdataset. I provide the list of files as input and I set combine to nested; this creates a stack of files, and the files are stacked along the time dimension to allow time indexing. How is this time dimension created? We do it using the preprocess function defined here at the top: for every single file created by HARP, a time dimension is created by accessing the time_coverage_start value that we extracted before in the previous cell. This gives us the time dimension that allows xarray to stack the products over time.
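A compact sketch of that stacking step; the glob pattern is illustrative, and for simplicity it assumes the timestamp can be read from a time_coverage_start attribute of each Level-3 file (in the notebook it is extracted beforehand, as described above; parsing it from the file name works equally well):

```python
import glob
import os
import pandas as pd
import xarray as xr

def preprocess(ds):
    # Attach a time coordinate so the granules can be concatenated and indexed by date
    t0 = pd.to_datetime(ds.attrs["time_coverage_start"])
    return ds.expand_dims(time=[t0])

file_names_L3 = sorted(glob.glob(os.path.join("processing", "*L3__NO2*.nc")))

# Stack all regridded granules along a new "time" dimension
L3_march_19_20 = xr.open_mfdataset(
    file_names_L3,
    combine="nested",
    concat_dim="time",
    preprocess=preprocess,
)
```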
As you can see, I now have a combined dataset with 92 products; if you look at the time dimension, they go from 1 March 2019 until 31 March 2020, so we have the full data sets for March in both years, and you can see the different data variables that we have kept. With that being said, you may tell me: that's good, but what I want is to produce the analysis for 2019 and 2020 independently. So we are going to split the dataset into the two years. I am creating a new variable called L3_march_19, which is going to be equal to the previous one, the 2019/2020 stack, on which we do a time slice to select only the files between 1 March 2019 and 31 March 2019; then we do the same for 2020, and you see the name of that variable is very similar but ends in 20. Then the question is what this next step is for. Well, if you remember, March has 31 days, but I do not have 31 products, I have more. Why? Because of the overlap of the orbits. So what I want is to keep one single file per day, and I do that by averaging the files that belong to the same day; that is what I do here in this line, a resample over the time dimension, one product per day, taking the mean over time. Let's have a look: here you can see the dataset for March 2019 and the dataset for March 2020; it is the same data, but now we have one product per day and I have split the dataset into the two months, March 2019 and March 2020. Along the time dimension I now have 31 products, so 31 days for both, and you can see the confirmation here: the 2019 dataset goes from the first to the end of the month in 2019, and the same for 2020.

Now that I have the data split into the two months, I am almost at the end of my analysis; the only thing left is to average the measurements throughout each month to produce, in this case, the mean statistic. I create a new Python variable called, in this case, L3_march_19_mean, and I tell Python: take my previous variable L3_march_19 and derive the mean value of the files throughout the time dimension. As you can see, it is very useful to have this time dimension, because it allows us to express very logical operations: if you think about what you want to do with the data, you say "I want the average over time", and the syntax in xarray is very easy to follow because it pretty much follows the logic in your head. So I do this mean calculation for 2019 and create this variable, and then I do the same for 2020 and create this other variable. Once that is done, I access the tropospheric NO2 column specifically and save it into the final variables of our analysis, which are going to be no2_march_19_mean and no2_march_20_mean, and we can have a look at these variables; that is how they look at first glance. But of course we want to plot them, so as the last step of the webinar we are going to plot those variables, and here I am giving you the option to select which month you want to display.
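A compact sketch of that splitting, daily averaging and monthly averaging, reusing the stacked dataset from the previous sketch; the variable names are illustrative and follow the ones mentioned above:

```python
# Split the stack into the two months of interest
L3_march_19 = L3_march_19_20.sel(time=slice("2019-03-01", "2019-03-31"))
L3_march_20 = L3_march_19_20.sel(time=slice("2020-03-01", "2020-03-31"))

# Average overlapping orbits so that each day is represented by one product
L3_march_19 = L3_march_19.resample(time="1D").mean()
L3_march_20 = L3_march_20.resample(time="1D").mean()

# Monthly mean over the time dimension, then select the tropospheric NO2 column
no2_march_19_mean = L3_march_19.mean(dim="time")["tropospheric_NO2_column_number_density"]
no2_march_20_mean = L3_march_20.mean(dim="time")["tropospheric_NO2_column_number_density"]
```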
This is interactive in the sense that you can just change the script: if you want to plot 2019, you change this value and run the cell. This very long cell is just doing the two plots, so if you are new to Python or not familiar with this it may look like a lot, but it is only making a plot; I will not go into the details because it does not make sense here, and of course you can make the plots in many ways. At the very end I am saving the plot as a PNG file. So let's have a look; we get a warning there, but here is our analysis and we see the final result for the average NO2 concentrations at the tropospheric level for March 2019 over the north of Italy. I have also added a little map that shows where we are in Italy, in case you are not familiar with its geography, and here is our study area. You could change the extent of this plot by going back to the bin_spatial step, that is, into the HARP processing we have done; if you want to change the characteristics of your plot in terms of extent or spatial resolution, you can just come to those two lines and change them accordingly. OK, so those are the results for 2019, and we can now have a look at 2020: I just change this to 20, and this again to 20, run the cell again, and look at the results. As you can see, the concentration is much lower, and if you think about the situation due to the COVID-19 outbreak you can imagine why this is happening: there was a massive lockdown in Italy, which started a little earlier in the north, but it happened all over Italy and, I have to say, all over the world. This is the consequence we can appreciate from space using Sentinel-5P data. As I told you, I have saved the plots as PNG files, so just as a side note, you can for example create a GIF in Python to flick quickly between the plots, and I am going to show you the result of that. In my processing/plots folder I have created all the plots, so you can include them in your reports or send them to your colleagues, and in this case I have also made a GIF, which we can display very easily even in Jupyter: I just add a Markdown cell and drop it in there, and you can see the large decrease in the concentration of NO2 at the tropospheric level over the north.
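The webinar does not show the code used for the GIF itself; one simple way to assemble the saved PNGs into an animation, assuming the imageio package is available in the environment and the illustrative paths below, is:

```python
import glob
import imageio

# Collect the PNGs saved during the analysis (paths and pattern are illustrative)
png_files = sorted(glob.glob("processing/plots/no2_march_*_mean.png"))

# Write an animated GIF, showing each frame for 1.5 seconds
frames = [imageio.imread(f) for f in png_files]
imageio.mimsave("processing/plots/no2_march_2019_vs_2020.gif", frames, duration=1.5)
```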
So that would be the end of the analysis for today. It is a long webinar, but I think we went through the data, through a very detailed description of the files and their netCDF characteristics, and through the way you can process them in Python. With that being said, I just want to give you some highlights to end the session, going back to my slides. What I want to tell you, as an overview of the webinar and of the situation with the Copernicus programme, is that with the new Sentinel satellites the challenge in remote sensing is no longer data availability but rather how to store and process all the information; in addition to that, it is necessary to explain how the data can be used and to support users in their applications. The RUS service is here to solve those problems by providing virtual machines to store and process the data and by offering a dedicated help desk, supported by a team of remote sensing experts that can help you in your projects. Before moving into the Q&A session, let me remind you that you can repeat this exercise on your own if you want to practise or develop your own applications; for that, go to the RUS Copernicus website, where you can get your RUS virtual machine and repeat this exercise. I will end the session here and then we can go to the Q&A session. Thank you very much, everybody, for attending this webinar; I hope you are all doing great and that you have learned something new today about Sentinel-5P and maybe also about Python. You will receive a poll after this webinar, so please fill in the questionnaire; we really appreciate your feedback and it allows us to improve in the future. Also, if you are interested in more Python-related webinars on using and processing Sentinel data, please let us know and we will try to do it. So again, thank you very much for joining the session; I hope you have enjoyed the webinar and learned something new about Sentinel-5P, about Python, or about any other software used. I wish you a nice afternoon, stay safe, stay safe in this context, and I will see you in the next webinar. Thank you very much.
Info
Channel: RUS Copernicus Training
Views: 7,084
Rating: 4.9694657 out of 5
Id: CE6BeLPORIE
Length: 93min 22sec (5602 seconds)
Published: Wed Apr 22 2020