ODSE Conference - Day 5

Captions
Okay, we start now already, because not everybody is back — but online, that's what they do, non-stop. So we are happy to be back with a colleague, an old colleague, Raymond Sluiter from the Netherlands Space Office. We're super happy to have him as a keynote, to give a talk about the activities of the Netherlands Space Office and especially about the Copernicus programme. So please, Raymond, keep it to, let's say, 30 minutes, and then we can have a discussion.

Okay, thank you Tom. I'm so happy to see real persons — and also many online persons — because this is really the first presentation I have given in one and a half years. I'm really happy to talk about the Copernicus programme, which we manage on the national level from the Netherlands Space Office. I would like to start with this image showing the constellation of ESA satellites that have been built over the last 20 to 30 years, most of them in the last 10 years, and especially the Copernicus satellites will be the red line through my talk today. I will give an overview of the operational Copernicus programme, the Sentinels; I will talk about the next-generation Sentinels and the high priority candidate missions; I will talk about the Earth Explorers, the scientific missions of ESA; and then we will talk about the ways to access all these data, both the classical way and the novel way, and what is coming.

I'm Raymond Sluiter, a physical geographer. I did a PhD in land remote sensing and GIS — actually hyperspectral remote sensing — and now I'm advisor data and applications at the Netherlands Space Office, the Netherlands space agency. I'm a delegate for ESA's Data Operations Scientific and Technical Advisory Group, where the entire Earth observation programme is reported every three months, and I have been doing that for 10 years already, so I have quite some historical information in my brain. Moreover, I recently became the coordinator of the Horizon Europe space programme in the Netherlands, and I'm a general Copernicus expert and a data expert, especially on data infrastructures.

The Copernicus programme is the operational satellite programme of Europe. It was first called GMES, Global Monitoring for Environment and Security, and it started in the late nineties. Since 2011-2012 the first satellites became operational, and now we have many satellites operational: we have six Sentinels. Sentinel-1 is a radar mission and is operational; Sentinel-2 is an optical mission; Sentinel-3 is also optical; Sentinel-4 and Sentinel-5 are still being built and will be launched later, and I will discuss them in a little more detail in the next slides. The A and B units are already launched, but C and D units will follow, so for every Sentinel there will be four satellites, to have operational data up to 2030-plus — and that's free and open data, guaranteed.

Sentinel-1 is a SAR mission; here is an example of flood detection in the Amazon area. Sentinel-2 is a multispectral imager like Landsat, with 13 bands from the visible to the shortwave infrared, with a highest resolution of 10 metres; with the two satellites we now have a revisit time of two-three to five days around the globe, and it is mainly used for land cover and land use. On the right you see an image of Sentinel-2B in the test facilities at ESTEC in the Netherlands, just before it was launched.
As I said, Sentinel-2 is a Landsat-like satellite, and as you can see on this image we can merge both data sets very well to have very long data records of land cover.

Sentinel-3 is especially for the ocean and the land: the Ocean and Land Colour Instrument is the most used, at around 250 metre resolution but with daily coverage and on a large scale, for example for the whole of Europe. You see an example of the droughts in 2018; I could show the same for 2019, and luckily this year was a little bit wetter.

Not land related, but Sentinel-5P, the Sentinel-5 Precursor, is an instrument that measures atmospheric gases — for example NO2 and methane — with daily global coverage at a resolution of 3.5 by 7 kilometres. I showed some tulips on that slide because the instrument was built by the Netherlands and the entire satellite was built by ESA, and this is a very successful mission. Here you can see an example: a yearly global nitrogen dioxide image, really showing the sources of air pollution; you can even see the ship tracks between Asia and Europe, and of course China, Europe, South Africa. We can make these images via the daily global coverage; because it is an optical satellite we have a lot of clouds, so we really need the global, yearly image to see the entire picture.

There is also a sixth Sentinel, Sentinel-6 Michael Freilich. That is not a land mission but a mission with an altimeter that measures sea level and sea-level change. It was launched last November, it is now in excellent condition and operations have started.

As I said, we now have two units of each Sentinel operational. There will also be C and D units to have continuity up to 2030, because most satellites have a lifetime of around seven years. These units are now being built — actually they are in the pre-phase studies — and the requirement is that they give continuity of the measurements of the A and B units, so we will not have big changes there; but there is also a goal to enhance: continuity, but also new products and improved performance. The Sentinel satellites took 15 years to build, and you can imagine that during those 15 years the technology improved, so the technology improvements will be in the updated satellites. But, for example, if you look at Sentinel-2, don't expect that it will now have a 5 metre resolution — we will continue with the 10 metre resolution. The Phase 0 studies are ongoing and the launches will be in the mid-to-late 2020s.

But there is more than only the Sentinels, because the Sentinels are the basic observations and we need more observations, also with other techniques. For that the Sentinel High Priority Candidate Missions (HPCMs) were developed, and they have now been renamed the Sentinel Expansion Missions. So after Sentinel-6 there will be Sentinel-7, 8, 9, 10, 11 — I don't know if they will continue the numbering — but what we know is that the first satellite will be the CO2 mission, to measure CO2. That was decided after, I think, COP21, before the Trump era started, when it became the highest priority that we could measure CO2 in the right way. This mission is now really being built, and we hope we can launch it around 2025, to have a good observatory for CO2 emissions and CO2 development — and I hope that we can then monitor a decrease in CO2.
But it is not only the CO2 mission: there will also be a mission on polar ice and snow topography, CRISTAL, and a passive microwave radiometer, CIMR. On the right you can see the three missions interesting for land: the land surface temperature mission LSTM, CHIME, the hyperspectral imaging mission, and ROSE-L, an L-band SAR mission. I will discuss them in the next slides.

CHIME — we have waited a long time for it, especially myself, because I have been active in hyperspectral land remote sensing for years. Now we will really have a hyperspectral mission from space, from the visible to the shortwave infrared, 400 to 2,500 nanometres, with very narrow bands of about 10 nanometres, an expected resolution of 20 to 30 metres and a revisit time of 10 to 12.5 days. There are many applications for hyperspectral remote sensing, like agriculture, food security and biodiversity, especially as we already have a lot of knowledge on hyperspectral remote sensing based on some pilot missions and on airborne campaigns. The expected launch is between 2026 and 2030.

There will also be a mission on land surface temperature monitoring. We had this capability, for example, on ASTER, and we also have it on Meteosat at a very low resolution, and it is on Landsat; but in Europe we didn't have these capabilities. So now the LSTM mission will be built: thermal infrared, with a spatial resolution of around 50 metres and a revisit time of one to three days, because two satellites will be built. It is also important, for example, for the cloud correction in Sentinel-2, because with thermal infrared you can do that very well.

And there will be the third mission, ROSE-L, which is an L-band synthetic aperture radar. I'm not really a specialist in radar — in Wageningen there are many specialists — but L-band radar can penetrate many materials such as vegetation, snow and ice, so it really gives enhanced information with respect to the C-band of Sentinel-1. The expected spatial resolution is 5 to 10 metres with a swath of 260 kilometres, and it is useful, for example, for the detection of ground motion but also for the detection of biomass — there are many, many applications for this satellite.

So the Sentinels are the operational missions, but ESA is also working on scientific missions, and sometimes these scientific missions are upgraded to a Sentinel. Over the last 10 to 15 years we have had different scientific missions, like ADM-Aeolus, which measures wind with a laser; CryoSat-2, the radar altimeter, which I will show in the next slide; EarthCARE, still to be launched, which will measure clouds; GOCE, a mission that is now finished but measured the gravity field; and Swarm, which is operational now and measures the magnetic field of the Earth.

I mentioned CryoSat: it is a radar altimeter and it can detect ice very well, so if you see the changes of ice in the polar areas, it is most of the time CryoSat that provides that data. SMOS is the Soil Moisture and Ocean Salinity mission; I show here only a soil moisture example. It is a passive microwave imager,
so it measures the microwave signal emitted by the Earth, at a low resolution of 30 to 50 kilometres, but with it you can have global information on soil moisture.

And more is coming. We are now building Biomass, a radar mission really dedicated to measuring biomass. FLEX is being built and will measure fluorescence and vegetation parameters; they will be launched in the next two years. FORUM is a mission dedicated to measuring the far infrared, and it is really needed to measure the heat balance in the atmosphere. And it was decided this year that there will also be a Harmony mission, measuring ocean patterns like circulation with radar — Harmony is actually an idea from the Netherlands.

If you combine all these missions, you have a lot of information about the climate since the 1990s, and to have very long time series there is a programme that defined the essential climate variables, the ECVs. That is not only Earth observation: within the Climate Change Initiative, long time series are being built for many, many parameters, for example on sea level, sea ice, ocean colour, etc. On the left you see the land-related CCIs, like biomass, high-resolution land cover, land surface temperature, soil moisture and land cover.

And this brings me to the next point: okay, there is a lot of data around, and more and more data is coming — how can you access this data? The classic data access is, for example, the Copernicus Open Access Hub. It is the classical way to download your data, and there is one open access hub for everybody; there are other hubs for the Copernicus services, which I will talk about later. EUMETSAT is also providing data, through EUMETCast and through the Copernicus Online Data Access; it is mentioned here because they have activities related to Sentinel-3. And there are also collaborative data hubs, where countries can have a special link to the data for their national platforms and data infrastructures. (A small scripted example of this kind of hub access is sketched below.)

For Copernicus you can get the raw data through the hub, but there are also several services being developed, and they are especially used for policy making by the European and the national governments. There is, for example, the Land Monitoring Service, the Marine Service, the Atmosphere Service, the Emergency Management Service, the Security Service and the Climate Change Service. I didn't attend the entire conference, but I think last week the Climate Change Service and at least CAMS were presented by ECMWF. These services provide remote sensing based data, but also data coming from models and data coming from observations and third-party operated satellites.

For researchers there is the Earthnet programme; it is really the backbone, already for 30 years. There, as a researcher, you can obtain data from third parties, for example high-resolution data from IKONOS, information from RapidEye, or SPOT data. To obtain this data you have to write a proposal to ESA, and then there is a good chance that you are granted access. For more information you can see the links here.
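As a rough illustration of the scripted hub access mentioned above — the search endpoint and query fields are assumptions based on the hub's public OpenSearch interface, and the credentials are placeholders — such a query could look roughly like this in R:

```r
# Illustrative sketch only: querying the Copernicus Open Access Hub's
# OpenSearch API from R. Endpoint, query fields and credentials are
# placeholders/assumptions; adapt them to your own account before running.
library(httr)

res <- GET(
  "https://scihub.copernicus.eu/dhus/search",      # assumed search endpoint
  authenticate("my_user", "my_password"),           # placeholder credentials
  query = list(
    q    = "platformname:Sentinel-2 AND cloudcoverpercentage:[0 TO 20]",
    rows = 10                                       # return at most 10 products
  )
)
stop_for_status(res)

# The hub answers with an Atom/XML feed listing matching products,
# including a download link for each product entry.
feed <- content(res, as = "text", encoding = "UTF-8")
cat(substr(feed, 1, 500))
```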
Access to data is changing, because it is not only data — you also need information. I think a good example of this is the website of the ESA Climate Office, also related to the CCIs. Here you see the links to the Climate Office, but also to their web viewer; they have a really very nice web viewer of all the information from the CCIs. Here I show you the example of land cover. You can also download this data, but especially for policy makers, for example, this is really data they can understand — or at least I hope they understand it.

The hubs only provided raw data, so portals have also been built to access the Sentinel data. An interesting one is the so-called Sentinel Hub, operated by Sinergise; there you can download and browse all the data, but also view it.

It is a lot of data, and it is really increasing every year. There is a Sentinel data dashboard where you can see how much data there is, and it is growing every year. At this moment, for the entire Sentinel missions, we have 380 petabytes of data, with more than 40 million different products and half a million registered users, and this number is increasing every year. The whole system hosting this data is now also being updated to cloud environments.

What you see is that in the last five to ten years it is not enough anymore: there is so much data that you cannot download everything anymore — you can do it, but then you need a lot of storage of your own and good network connections. What we see now is the development of big data, where you want to combine different data sets, of data science, and AI has really started to become useful in the last five years — more than useful, because with AI we can really do the processing more efficiently. So big data, AI and data science are really changing the classical ways to access the data, and we are going to cloud platforms where you can still access and download the data, but can also do your analysis online.

Google Earth Engine was one of the first platforms that really provided these possibilities. I will not click on "watch video", but they still have a nice video explaining which data sets are there. For the open access hub of the ESA Sentinels they were the main user, because they downloaded everything from the ESA hub, and they provided it in a very nice and accessible platform. But you cannot do everything in it — it is still limited — and there are other colleagues here who can tell you more about what you can and cannot do with Google Earth Engine. But you see that this really started the development of other platforms as well: at Amazon, for example, they also have the open data available, and you can use Sentinel data from their own systems; Microsoft also has it in their platforms.

And more things were started. ESA started already five or six years ago with the so-called Thematic Exploitation Platforms, where for certain communities a cloud platform is made to provide information products and analysis facilities. Several TEPs have been made, and now they are all finished; they are not hosted
or financed by ESA anymore, but they are now commercial or semi-commercial platforms, working together with the community.

There was another initiative started by the European Commission, where they said: okay, we want to have better platform access to the Copernicus data. It also started, I think, four or five years ago, and commercial providers got start-up money to build up their platforms, providing Sentinel data but also providing processing capabilities in the cloud. The four started by the EC are Mundi, Sobloo, CREODIAS and ONDA. One of the things we don't know is which of them will survive the next five years, but they are still really working on their user base; it is not only for scientific users but also for commercial users, who can do their value-adding services within these platforms.

There is a fifth one, WEkEO, and that is actually semi-commercial, because it is a cooperation between ECMWF, EUMETSAT and Mercator Ocean — Mercator Ocean is the operator of the Marine Service. What we see is that WEkEO is the most attractive for researchers, and WEkEO is also the most active in offering virtual environments and in including AI capabilities in the platform; on the other ones you can do processing, but not really the entire AI workflow. It looks like they have made quite some progress with that in the last years, because yesterday I got a very interesting announcement of a MOOC, a massive open online course, on AI for Earth monitoring. You can see the link here where you can subscribe; I think it is six weeks and it is open, implemented by the people who are working with WEkEO, so WEkEO will be the platform they use for the MOOC. The platform is openly accessible on a sort of trial basis, or if you are not that intense a user; if you want more processing capabilities, you need to pay.

There is even more: the collaborative ground segments. The idea with the Sentinels was that ESA only provides the raw data, and that the different countries, at national level, make their national customized products. Some countries did that; in the Netherlands it didn't happen — we didn't have enough mass to have such a collaborative ground segment — but you see here the list of all countries that have one. Many of these just mirror Copernicus data, mirroring the data sets for their country. For example, if you are living in the Netherlands: Belgium is also mirroring the Dutch territory for Sentinel data within their collaborative ground segment, Terrascope. What is interesting about Terrascope is that it is not only a data repository; they also have a good viewer, and they have virtual machines available to do your processing directly on the data. The same goes for the EODC from Austria: they have many data sets, for example on soil moisture, and analysis-ready data of Sentinel-1. Germany has an initiative, CODE-DE, also providing quite some functional data and processing capabilities, and
they also organize workshops on that; they actually also have data outside Germany, and it is really interesting. In the UK there are CEDA and JASMIN; they also have the Sentinel data, and it is very well linked to high-performance computing in the UK. For the other countries I don't have so many details, but I think Sweden really wanted a big data lab approach. On the right I show you the EODC and Terrascope.

Then there is another development: data cubes. We have the Euro Data Cube, initiated by the ESA member states — I think it is based on the Open Data Cube — and there is a CEOS Open Data Cube and an Australian data cube. To have it more Google Earth Engine-like, an ESA "open Earth engine" is being developed now. There are also data cubes in R, and there is an interesting data cube implementation called Rasdaman. But what actually is a data cube? It is a multi-dimensional cube of spatial data, of different data sets; you can query and analyse the different data sets directly, but also in time, so you really have a multi-dimensional analysis engine to do your analysis. It is difficult to explain in words, but there is a very nice one-minute movie on the ESA website on the Euro Data Cube — look it up after this meeting. It gives a really good overview of what you can do with a data cube, because you can really do multi-temporal analysis, for example on climate parameters, combine them, and see in time where the anomalies are.
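To make the data cube idea a bit more concrete, here is a minimal sketch in R using the gdalcubes package, one of the R data cube implementations alluded to above. The file paths, the collection format name and the spatial extent are placeholders, not part of the talk:

```r
# Minimal data-cube sketch with gdalcubes; paths, extent and the collection
# format name ("Sentinel2_L2A") are assumptions for illustration only.
library(gdalcubes)

# Build an image collection from locally downloaded Sentinel-2 L2A scenes
files <- list.files("S2_L2A_scenes", pattern = "\\.zip$", full.names = TRUE)
col   <- create_image_collection(files, format = "Sentinel2_L2A")

# Define the cube geometry: spatial extent, resolution and a monthly time step
v <- cube_view(
  srs    = "EPSG:4326",
  extent = list(left = 5.0, right = 7.0, bottom = 51.0, top = 53.0,
                t0 = "2018-01-01", t1 = "2018-12-31"),
  dx = 0.01, dy = 0.01, dt = "P1M"
)

# Query the cube: monthly NDVI, then reduce over time to a single median layer
cube        <- raster_cube(col, v)
cube        <- select_bands(cube, c("B04", "B08"))
ndvi        <- apply_pixel(cube, "(B08 - B04) / (B08 + B04)", names = "NDVI")
ndvi_median <- reduce_time(ndvi, "median(NDVI)")
plot(ndvi_median)
```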
There is so much — it is really a forest of data platforms — so ESA started an initiative called the Network of Resources, to have at least a search engine and an overview website of the capabilities of all these platforms. The idea is that this is a sort of one-stop shop where you can see which data are available, but also which services are available — processing services, hosting services — and what they cost. It is called the Network of Resources.

The way forward: we are now at the point where it is clear that analysis in the cloud will be the way forward, but we also need good data for that, and analysis-ready data is one of the ingredients: higher-level products, not only the raw data, but really well-processed higher-level products customized for your use — an important development. And data federations, so that we don't use one cloud system, but the different clouds are connected with each other. For example, WEkEO is already federated with the EODC and the Terrascope platform, so you can use resources that are not available on your own platform from another platform. For that you need really interoperable systems, and openEO is a good example of an interoperable language to connect all of these. You also see that the European Open Science Cloud — (hello, can you give me a sign? okay, I can continue now) — the European Open Science Cloud has the same goals for the entire scientific community, but remote sensing is only one of the ingredients. An interesting example is C-SCALE — can I continue, or do we have a problem, Houston? Okay. Currently, within Horizon Europe, there is the C-SCALE project, where they want to connect the Earth observation data sets to the European Open Science Cloud. They will actually have a conference, I think in October; look at the Netherlands Space Office website if you are interested, because it is announced there.

Then, what is really happening now is that we go to the digital twins. What is a digital twin? You have the real space, the real data — that is the physical twin — and the idea is that we mirror this, with all the real-time information and the historical information we have, in the virtual space; that is the so-called digital twin. For that you need data fusion, you need to combine different models, you have to use high-performance computing, you have to do it in the cloud, and you need many data sources — for example the normal Earth observations, but also information from Internet of Things sensors. This idea of the digital twin is now really running; it is really being pushed, both at the national level in the Netherlands and at the European level. The idea is that with such a platform you can really reach the users and the policy makers, and that you can do analyses where you ask: what if I change this parameter? Really a decision support system based on big data, and for that you really need heavy high-performance computing.

The EC is now starting to build Destination Earth, and one of the driving forces is that for the European Green Deal the policy makers need a lot of information to make their decisions; the European Green Deal is really the driving force of this digital twin. This is an overview of the infrastructure they foresee: there will be a big data warehouse, or data lake, as a sort of core, and on top of this they will make so-called digital twins, thematically. For the first three to four years they will start with a digital twin on weather extremes and a digital twin on climate adaptation. Of course, on top of this very technical platform you need the service layers to provide the information in a useful way for the users, and you have to connect the central data lake with the other available information. Actually, ECMWF and ESA are now the main builders of this system.

So, to conclude my talk: I think it is very visible that Earth observation is really essential for land observation and for climate. Many aspects of climate change — for example sea level, ice, changing land cover — we would not have any global picture of all these processes if we did not have Earth observation. What you see is that we now really have many operational satellite missions, more are coming, and it is all open. And don't forget, I am only talking about European data; of course NASA is also providing many data sets. So more and more data — the data volume is really exploding — and many new applications are now driven by artificial intelligence. Especially these digital twin things, Destination Earth: it would not be possible if you don't use artificial intelligence to
speed up the processing and to explore the data in a smart way. But artificial intelligence is of course not everything. Many people forget that for good AI you need training data; everybody forgets it, but we need so much training data, and of course the observation data of what is happening in real life. I think that has also been shown at the conference this week. But we now really see the fast development of data science and the big data cloud platforms, the analysis in the cloud — and it really should be federated — and then, at the end, the digital twins. I don't have many views beyond that, but I think the digital twin will really be the main driving force for the next years. Thank you.

Question from the audience (partly inaudible): digital twin is a new term, but if you have a time series of images, what actually makes it a digital twin? — I think the minimum requirement is that you have these ingredients. The important thing is that you combine it with the other data sets, so that not only the remote sensing data sets are available in that system but also the socio-economic data sets and data sets from other sources — that is what makes a digital twin — and that you reach the end users, so that there is really an interface to the data through which the end user can get the information. You can put it all together in one big data lake and do all your analysis, but if the end users — mostly policy makers — don't get this information, then it is useless. I think that is one of the important things now, also when you see the developments with Destination Earth: it was driven from the Earth science community, because technically it is possible now — we have the computing power and we see that we can do it — but really, the next important step is that the users are engaged. Actually, in the Netherlands there is a professor, Wilco Hazeleger, who was the director of the eScience Center until two years ago; he is now the dean of the Faculty of Geosciences, and he brought these ideas of the climate extremes into Destination Earth. Two years ago they said "we are going to make a digital twin of the Earth", and then it became silent; then at one point they came with this idea of the climate extremes, and that appeared to come from ECMWF, and he was the one who brought the message. The university is now also making a wiki page, etc., for the EC, to start the user interaction.

Second question: you mentioned space agencies like NASA (rest inaudible). — I showed you the example of the Landsat image in the Himalaya, where the Sentinel images were merged with the Landsat images. There are many initiatives now to merge these data sets, and that is done at CEOS level; CEOS is the Committee on Earth Observation Satellites, within which all the international space agencies work together.

Third question: thank you for the presentation. You mentioned the hyperspectral mission that will be launched, and for me it is great to think that we will have hyperspectral
data, because there are so many possibilities to use this data — can you explain a bit more what the specifications of the mission will be, how many bands, and what the possible products are? — This is actually the information I have now. I don't have the number of bands, but for a real hyperspectral mission I expect around 200 bands, or between 100 and 200 bands. There are some missions now, for example the PRISMA mission by Italy, which is operational but has a really narrow swath; it is actually the only usable hyperspectral mission now. And there is a very small satellite called HyperScout — HyperScout-1 and HyperScout-2 — which also provides hyperspectral data, but still not at the scale that CHIME will provide. This is the information I have now; I just checked the last quarterly status reports of ESA, and it is now really in the definition phase. This is what we can expect: 20 to 30 metres, and the number of bands will be high, because otherwise you cannot do any spectroscopy. Sometimes people already call something with 30 bands hyperspectral, but in my opinion that is not hyperspectral — you really need more bands than that.

Can you hear us? Yes, okay, we are good to go. The next presenter is Patrick Schratz. He works for a private company, a consultancy in Zurich, Switzerland, but he contributes to open source and especially to the R community; he has developed quite some functionality already. We are super happy to have Patrick here with us to talk about mlr3, the upgraded, heavily redesigned machine learning framework in R, which has already won quite some awards, and it is a core group of, I think, about 10 people developing it. So we are very happy to have Patrick with us to show us the most recent and hottest functionality of mlr3. With this I pass it on to you, Patrick.

Yeah, thanks Tom, thanks for inviting me and giving me the opportunity to talk about mlr3 and what we have to offer for spatiotemporal data, on behalf of the team. I want to say hello to the people in the room and to the people following remotely. A short bit of information about me, even though Tom already introduced a lot: previously I was a researcher at the University of Jena and at LMU Munich; now I am working in Switzerland at an R consulting company called cynkra, and I am still also doing a PhD in environmental modelling. This PhD was the starting point for me to get started with mlr, because I had to do a lot of modelling and I was in the spatial domain, so that was the initial connection. Besides that, I do a lot of other contributions to open source — gt, for example — but also way more projects. As Tom mentioned, we have the old mlr, and we now have the new mlr3, which was released in 2019 at useR! in Toulouse, so we are almost two years old already — but more on that later. cynkra in Switzerland is about five to ten people; we have a strong focus on open source, and that is also why I am able to be here today, because we try to support open source in our daily work. We also maintain a lot of R packages
from different people; we are an RStudio certified partner and try to help with everything related to R: setting up infrastructure, but also getting your code more efficient.

So let's talk about mlr3. I will first give you a kind of general introduction to mlr3 before we dive into the spatiotemporal part, so you know what mlr3 is about, how we designed it, and so forth. Why you would want to use mlr3 and what the key principles are will be the content of the following slides, and if you want to see all the code that is spread among the slides, there is a gist on GitHub which you can access here; you can also see the slides online using the URL at the bottom.

Usually, when you want to do machine learning, or modelling in general, we want to do training, we want to predict, we want to benchmark different methods; usually we use multiple data sets; sometimes you want to evaluate different tuning methods and eventually also use feature selection. The best outcome would be if we could do all of this using the same syntax: we do not want to have to rewrite code for every specific algorithm that we want to use — it would be so great to have a common way of writing the code. These were really the design principles of the old mlr, but also of the new mlr3, and we tried to make it even better in the new version.

Here is an overview of the machine learning building blocks. On the left you see all the learners, the algorithms that we want to use. We can apply these algorithms to certain tasks — we call it a task — so classification, regression, and then there are more specialized tasks like cluster tasks and survival tasks. "Task" is the word we use for the data set including the metadata of what we want to do with it: what is the type of the response variable, do we have some groupings that we want to account for, and so forth. So really on the inside you have the data set, which in mlr3 is your data backend, and it is wrapped into a task; the task is the central piece of data set information that you can pass to any operation in mlr3. Then, on the right, we have the big block of optimization: usually you want to tune the hyperparameters of your model, and eventually you also want to do feature selection. This is all kind of optional, but usually you always do it, you want to do it in an easy way, and you want to explore multiple methods. The two blocks at the bottom, preprocessing and resampling, are kind of standalone as well. Preprocessing is often maybe even the biggest part of everything, depending on how your data set is composed — if you have a lot of missing values, if you want to do other preprocessing operations before you actually do the training and prediction — and for all of the preprocessing we use the mlr3pipelines package, which I will introduce later in more detail. Then there is the resampling. The resampling is usually used for evaluating the performance of your model: at some point you need to score and estimate how good your model is, using different measures, and you usually want to do this in an unbiased way, so you do it quite often, in a repeated way, with different subsets of the data, to really have an accurate estimate of what your model can do if you then do a prediction on unknown data, possibly for the whole world.
This is where the resampling comes into place. Usually it is used within a cross-validation, where you subdivide your data set into training and test, and there are multiple ways how you can do the resampling, so this is a big block as well.

In mlr3 we actually want to unify all of these blocks: we want to have unified interfaces to the train and predict methods and to hyperparameter optimization, to do the preprocessing independently from the data before you put it into training and prediction, and we also want to give easy access to parallelize all of this — and the same goes for the error handling at the end. So this was our motivation: to make things easier, especially with all the algorithms that are available in R.

Then another question that comes up quite often: is it worth learning this framework? You can either go with the guys on the right and say "no, we're too busy", whatever that means, but usually our experience is that learning a framework once means you profit from it in the long run, and I have never seen a person fit only one model in their life. Usually, whenever it comes to scaling up, to trying different algorithms and different methods, you would need to dive into all the single implementations in R that are out there — and every package does it a bit differently — and with mlr3 you can really have a unified take on this and also rely on tested functionality: for a lot of the implementations and packages we support, we have extra tests running in our mlr3 ecosystem that ensure these things really work, down to the detail. We have predefined performance measures, listed in a large collection in the mlr3 book, which is the main documentation you can go to and see how everything can be done. You can also make use of the simplified, integrated parallelization using the future framework, which makes parallelization so easy: you just add one line saying you want to go parallel and how many workers you want, and then you just go.

All of this, in the end, is wrapped in the so-called mlr3verse. That sounds similar to the tidyverse, right? And it kind of is, because this package is just a wrapper package that loads all of the different mlr3 packages that are out there. If you are wondering why we have so many packages: it just simplifies everything on the development side. In the old mlr everything was in one package, and it was almost unmaintainable in the end; now we have split it up and can really delegate resources much better. We know that it might be a bit more of a hassle for people to load so many packages, but you can use mlr3verse, which just does this job for you.
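As a minimal sketch of these building blocks — a task wrapping a data backend, a learner with train and predict methods, and a resampling to estimate performance — using only data shipped with mlr3:

```r
# Minimal sketch of the building blocks described above, using built-in data.
library(mlr3)

task    <- tsk("iris")              # classification task wrapping the iris data
learner <- lrn("classif.rpart")     # a decision-tree learner

learner$train(task)                 # fit the model on the whole task
pred <- learner$predict(task)       # predict back on the same data
pred$score(msr("classif.ce"))       # in-sample classification error

# A fairer performance estimate: 5-fold cross-validation via a resampling
rr <- resample(task, learner, rsmp("cv", folds = 5))
rr$aggregate(msr("classif.ce"))
```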
All right, so let's have a look at some code. This is a really simple, broken-down example of how you can use mlr3, starting with example tasks that are built into the package and ending up with a plot of a benchmark comparison — these are not even 20 lines, I would say. Let's go through it a bit to showcase what is happening. First we load the mlr3verse package; then we set a seed for reproducibility; and in this line we load two example tasks — two data sets that are already wrapped in a task in mlr3 and are just available to you for such easy examples. You probably know the iris data set, and the German credit data set is also quite well known. This is then the syntax to load some learners: here we for example load an rpart learner and ranger, which is the fastest package for doing random forests in R, and you see we always prefix the learner with either "classif" or "regr", depending on what kind of task we want to apply the learner to — this is always the same syntax. We have shortcut functions like lrn() and lrns() for learners, and tsk() and tsks(), which stand for tasks; then you just have a list of these learners, and then you just go ahead and say: I want to build a benchmark grid — I want to benchmark all these learners across these tasks, and I want to do a cross-validation. Then you have your benchmark grid, and you call the benchmark function to get your benchmark result object. We have also integrated a lot of ggplot2-based autoplot methods, which are the analogue of the generic plot methods in base R; you can just apply them to the benchmark result object and say "please compare them on this measure", and then you get the plot shown on the side here. So this is really mlr3 in a nutshell. Usually your code might be a bit longer, because you want to do more tuning — we don't do any tuning here, we just benchmark the raw learners with their default hyperparameters — and you want to do feature selection, but that is a good way to start.
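A sketch following that walkthrough (the original code is in the gist linked from the slides; this is a reconstruction from the description above):

```r
# Benchmark sketch following the walkthrough: two learners, two built-in
# tasks, 3-fold cross-validation, compared with an autoplot.
library(mlr3verse)

set.seed(42)                                     # reproducibility

tasks    <- tsks(c("iris", "german_credit"))     # two example tasks
learners <- lrns(c("classif.rpart", "classif.ranger"))

# Cross-validate every learner on every task
design <- benchmark_grid(tasks, learners, rsmps("cv", folds = 3))
bmr    <- benchmark(design)

# Compare the learners on a measure with the ggplot2-based autoplot method
autoplot(bmr, measure = msr("classif.ce"))
```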
In general, in mlr3 we are using the R6 object-oriented framework, which overcomes a few limitations of the S3 system we were using before; we can really make nice use of class inheritance, like you see on the right: we have a main learner class, then we have subclasses for the regression and classification learners, and then we have the individual learners, for example rpart — and whatever is defined in the top class gets inherited down to the child classes. By using data.table we are quite fast on the backend side, because data.table is just a really nice way of handling data frames in a fast and efficient way. Combined with the future framework for parallelization and the lgr framework for log output and error handling, we have, in our view, a good selection of very good implementations in R that play very nicely with the machine learning concept we want to provide to users.

Here is a visual overview of how many packages are currently available in the mlr3verse. Some of them are coloured in green, which means they are on CRAN, they are stable and we are quite happy with them. Some of them are in orange, which means they exist — they might even exist on GitHub and you can download them — but there is no real guarantee that everything will work in these packages, or they are quite new; the difference between packages in the orange state can be quite large. Currently there is no package in red, which would mean we have plans for it but it doesn't exist at all yet. Just as a quick overview: on the learners side you see all the packages related to that category; we have some packages specifically for data handling, to get the data in, and also DB connectors for common databases; we have a lot of packages for different tuning methods and for hyperparameter handling, and you can see there are quite fancy things among them, like flexible mixed-integer evolutionary strategies. Given that the mlr team is composed of statisticians, mainly from the universities of Munich and Dortmund, we really have a strong focus on this tuning side. And then, at the bottom right, you see what is labelled as "task", meaning packages for specific, specialized fields — and I would count the spatiotemporal area as such a specialized field. What do we have there? We have mlr3spatiotempcv, which is for special spatiotemporal resampling methods, and we have mlr3spatial down here, which is quite new and was just finished before the workshop, and which I am very happy to show you — hopefully it makes spatial data handling a bit easier.

All right, so let's dive into spatiotemporal data: what is in the packages, what do I need to be aware of, what is actually still missing and up for development, and can you contribute — of course you can, but how? mlr3spatiotempcv is for the resampling methods that you would use in cross-validation, for example, and with mlr3spatial we have support for spatial data backends and integrated, parallelized prediction support for any kind of raster image that you can imagine.

So let's showcase mlr3spatial. The common spatial classes and packages in R are terra, raster and stars on the raster side, and sf on the vector side. We now have dedicated data backends for all of these classes, so you can just take these objects as you have them in your session, put them into a data backend in a task, and use them in the mlr3 framework. We have parallelized, future-based predictions, which means that if you do a prediction on a large raster you can parallelize it, because it usually takes quite long to do large predictions on large raster objects if you have a high resolution and many values, or if you want to go for a worldwide prediction. The last point, which is actually quite unique, is that we have memory-aware predictions: you can do these predictions in chunks, because at some point you need to load the raster data into your memory, even if you just do it on a subset, and this takes a lot of memory — sometimes it exceeds your memory, and not everybody has a huge server. We have the option built in that you can select a chunked way of doing the prediction; it takes a bit longer, of course, than doing everything at once, but at least it offers the possibility to conduct these large predictions at all, and this is not available in any of the native packages directly.

So let's show an example of how mlr3spatial can be applied. Here we load some packages, then we load an example Landsat 7 data set from the stars package — we just read that in, and then we have a stars object — and then we just call as_data_backend() on this object to transform it into an mlr3 data backend. Maybe to explain a bit more about this data set: it is just a standard Landsat 7 ETM scene with six layers. We can see later what is inside, but that is not really the most important point right now, because I really want to show you how you can do
this on the code side and how the syntax goes. We define a regression learner here and say: okay, we want to create a regression task with the backend we just created, and our response variable should be the first layer. Then, just for the sake of the example, we subdivide the data we have just read in into a training and a prediction set — usually you would have a dedicated prediction object that you want to predict on, but this is just for showcasing reasons. So we subdivide into a training and a prediction set and say: let's now take the learner and train it on that task with the subset of the training rows. Then we have trained the learner, and we can call predict_spatial() on the task: we use the trained model to predict on the full task that we have just seen, and we can also specify the output format here — you can select between stars, terra and raster — and we get the predicted object back as a stars object. In layer one of the prediction object you see what you have just predicted. If we plot this using the integrated plot method of the stars package (because we have a stars object now), you can see what we got out: this is actually the cadmium concentration here, and by the colouring you can see that in the bottom right we have higher values and in the top left we have lower values. This is what the prediction showed us, and we could quickly apply all these nice functions from the spatial packages directly, because the result comes back in a standard spatial format. So that is really mlr3spatial in a nutshell: getting your data in, training a learner, and predicting to a spatial object.
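A sketch of this mlr3spatial workflow, reconstructed from the description above; the target column name ("band1") and the exact predict_spatial() arguments are assumptions and may differ between package versions:

```r
# mlr3spatial sketch: raster-backed task, training on a pixel subset,
# parallelized prediction over the full scene. The column name "band1" and
# the predict_spatial() arguments are assumptions.
library(mlr3)
library(mlr3spatial)
library(stars)

# Landsat 7 ETM demo scene shipped with stars (6 bands)
l7 <- read_stars(system.file("tif/L7_ETMs.tif", package = "stars"))

backend <- as_data_backend(l7)                                 # raster as data backend
task    <- TaskRegr$new("landsat", backend, target = "band1")  # first band as response

learner <- lrn("regr.rpart")

# Train on a random subset of pixels, then predict the full scene
learner$train(task, row_ids = sample(task$nrow, 500))

future::plan("multisession", workers = 2)                      # parallelize the prediction
pred <- predict_spatial(task, learner, format = "stars")

plot(pred)                                                     # stars plot method
```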
Usually these objects that we predict on are quite big, right? They can have millions of values, and it takes some time to run even on a large machine, so what is usually required is that you go parallel. We have seen a lot of parallel applications in the first days of our training sessions here, and many people like to use all of the resources on their computers, especially given that in the last 10 years machines have become quite good and have a lot of cores, also the local machines — so let's use them. We can easily do this because we make use of the internal mlr3 predict function, which is able to parallelize everything it gets, using the future framework.

I can show you a benchmark of how this compares on a file that is 500 megabytes on disk — which is actually not that big — but we created one that has around 25 million values; this is the function you can use in the mlr3spatial package to recreate it. On the right you see the benchmark; it might be a bit small, so I will explain what you can see. We have benchmarked mlr3 with terra on 4 cores, meaning we have used a terra object, loaded it into a data backend, and then used mlr3spatial to make a parallelized prediction. On the second line from the bottom we have used the terra package natively, without mlr3 — something is going on with the parallel predictions of terra: it starts, but it takes very long to initialize things, so it might be that we have done something wrong here, but currently it just looks like this; I am quite suspicious that it is so slow. At the top we have done the same things for the raster package: we loaded the spatial object with the raster package into an mlr3 data backend and parallelized it, and we also did it the native way. You see that if you use mlr3 combined with terra, we are quite a bit faster than all the other methods: we are at about 20 seconds — it depends, we have three runs here — and the others are around 29 or 30 seconds; terra on 4 cores is quite slow, but I would maybe even exclude that, since I am not sure whether there is an issue right now. So you can see we can scale up here quite easily, even faster than the native predictions. That is what mlr3spatial can offer: direct data backend handling for all the raster packages and also for the vector classes, which I did not show here, parallelized predictions, and — I also did not show how to do the chunked predictions — but you can check that out in the documentation and the mlr3 book.

Next topic: let's go to mlr3spatiotempcv for the resampling. mlr3spatiotempcv contains a lot of spatiotemporal resampling methods — I think there are almost 10 in total supported now; I would have to look up the exact number. Besides providing these resampling methods in the same easy syntax as any other resampling method in mlr3, it also comes with autoplot methods for visualizing the spatial resamplings that were actually done on the data set, so you can actually see how things are distributed. We have an upcoming paper that we are going to submit to JSS soon, and we are currently wrapping — well, I have it on the slide, so no need to guess — eight resampling methods from four different packages in R, and potentially there will be even more packages, because people apparently like to publish their resampling methods in standalone packages rather than directly contributing them to a framework. So we keep on wrapping single island solutions from R, so that you as a user have a simple way of addressing them.

So what is the problem, actually, when doing cross-validation and resampling in R with spatiotemporal data? Usually, if you do a non-spatial resampling and don't account for your spatial data, you get into trouble, because you will have an overestimate of your performance, mainly due to spatial or spatiotemporal autocorrelation — they exist in both ways — meaning that your training and test sets are quite similar, just for the sake of being close together; usually you want independence of your data, which is not guaranteed if you don't account for spatial autocorrelation. There is no single best method that I could point to and say "use this method or that method" — I always get this question — because it really comes down to the characteristics of your data set and what you want to predict on. There is the idea of target-oriented prediction: what do I actually want to predict with the model that I want to fit? And in the same way that you answer this question, you have to set up your resampling, because your resampling should quite closely reflect what you actually want to do with the model in the end; then it is a fair estimation. So there is no single best method; every method has its advantages and disadvantages, and it really depends on your data set.
Quite recently a debate has also come up about whether all of these spatiotemporal resampling methods, when you apply them, might lead to a too pessimistic estimate of your performance. There is ongoing research on this, and there will probably always be a debate; in my opinion the truth lies somewhere in between. Non-spatial resampling is clearly too optimistic, and it might be that certain spatial resamplings lead to somewhat pessimistic estimates, but it is really hard to say where exactly the truth is. We should do more science to evaluate this and find out what the best middle way is. Let's look at some examples of the spatial resampling methods. We do a spatial cross-validation using a random forest to predict landslide events, whether they occurred or not, in Ecuador. There is a built-in data set for this in mlr3spatiotempcv, but I also show you how to recreate it from scratch: we take the ecuador data, set the coordinates and create an sf object, and if you look at where we are using the mapview package, you can see that we are indeed in southern Ecuador. That is the data we are using here. We then wrap it into a spatiotemporal classification task using that backend, set our response variable to "slides", and say that the positive class is "TRUE". That is how the printout of the task looks: we have all the metadata, the information about which columns are the features and which are the coordinates. Then we define our learner again, and here comes the actually important part: we define the spatial resampling method. We say we want a repeated spatial cross-validation, and "coords" is the method, which refers to a k-means-based coordinate clustering after Brenning; we want four folds and we want to repeat it two times (in reality you would usually use more repetitions to get an accurate estimate and reduce the variance). Then we just call resample() with the task, the learner and the resampling object we created, and we aggregate all the results using the built-in aggregate method with the classification error (classif.ce) as the measure. Again, this is a very condensed, nutshell example of how this is executed in the end, but I think it nicely illustrates the syntax, and it shows that the only thing you need to change to go spatial is to write "repeated_spcv_coords" instead of "cv". Then I want to show you the autoplot output: at the top we pass in the spatial resampling object we just created together with the task and say, please visualize this on the task, I want to see the first two folds. Here you can see how the clustering happened: the k-means clustering groups the observations into training and test sets by their coordinates, so the observations cluster around themselves, and it tries to break them down into the four folds we instructed it to create. In the first fold the upper-right part would be the test set, and in the other fold shown the test set is somewhere in the middle. This is a really nice illustration of what you are actually training and predicting on, and it works for all the resampling methods we support.
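Here is a compact sketch of this landslide example as I understand it from the talk. The object ids and arguments follow the mlr3spatiotempcv documentation of the time and may differ between versions, and classif.ranger (from mlr3learners) is assumed here as the random forest learner.

  library(mlr3)
  library(mlr3learners)        # provides the classif.ranger random forest
  library(mlr3spatiotempcv)
  library(ggplot2)             # for autoplot()

  # built-in landslide task for Ecuador (response: slides, positive = "TRUE")
  task    = tsk("ecuador")
  learner = lrn("classif.ranger", predict_type = "prob")

  # repeated spatial CV: coordinate-based k-means clustering after Brenning,
  # 4 folds, 2 repetitions (use more repetitions in practice)
  resampling = rsmp("repeated_spcv_coords", folds = 4, repeats = 2)
  resampling$instantiate(task)

  rr = resample(task, learner, resampling)
  rr$aggregate(msr("classif.ce"))      # aggregated classification error

  # visualize the first two folds of the spatial partitioning
  autoplot(resampling, task, fold_id = c(1, 2))

To run a plain, non-spatial cross-validation instead, you would only swap rsmp("repeated_spcv_coords", ...) for rsmp("repeated_cv", ...); the rest of the code stays the same.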
Some more resources: you can check the spatiotemporal analysis chapter in the mlr3 book, which should be your first go-to place if you need help, and you can check the function references of the packages, for example mlr3spatiotempcv. When it comes to spatial partitioning and hyperparameter tuning, I recommend two papers; one of them is my own and deals with nested cross-validation for spatial hyperparameter tuning. What about spatiotemporal methods? We have two spatiotemporal-capable methods in the package: one is cstf, the method from Hanna Meyer, and the other is a spatiotemporal clustering method based on the, unfortunately non-open-source, algorithm CLUTO. They support clustering in both space and time. In general, spatiotemporal partitioning is not an easy thing: purely spatial is, let's say, quite easy because you are only dealing with one domain, but as soon as both dimensions come in it gets quite challenging, and there are not many existing implementations that we can simply wrap and build into the package. There is also the mlr3temporal package, which we want to use for purely temporal applications, but I am more on the spatial side, and we would really love to see help and contributions from the community here, man power or woman power, let's say. Just engage with us if you want to participate and if you like the concept. I do this mainly on the side; as I said in the introduction, I also work in a company, so I have limited time and try to do my best, as we all do. Talk to us if you like this, engage with us, and hopefully we can make things better. Special thanks to Marc Becker, who is one of the main contributors and helps me here with his knowledge and his coding, thanks also to the sponsors of mlr, especially OpenGeoHub, which kindly helps us host workshops and donates to the whole project, and thanks to you for being interested in mlr3, and hopefully now even more interested in getting started and simplifying your modeling. If you try this and it saves you some time, even if it is just the parallelization or anything else, then we did our job and we are happy that our efforts work out. Thanks a lot for listening; I am open for questions now. [Question from the audience: are there dedicated spatial learners, or do you use the standard ones?] Yes, good question. Spatial learners would be supported in the mlr3extralearners package, or maybe in the spatial package, that is not yet fully decided; there is also a note on the package about this. I do not know off the top of my head which spatial learners we currently support, I would have to look that up in the list, but there definitely are spatial learners in general, and they could be used in the same way once they are wrapped. In mlr3 we first have to support them, so we have to wrap them in our package; that is work we still have to do. [Question: have you done benchmarking against scikit-learn, same data and same algorithm?] No, unfortunately not; that would be an interesting thing. We did talk with the scikit-learn people at some point to see what they are doing and to exchange ideas, but I think we have not yet done a one-to-one benchmark. That is an interesting idea.
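On the learner question above: once mlr3extralearners is loaded, its learners are added to mlr3's learner dictionary, so one way to look up what is available (a sketch for browsing the dictionary, not an authoritative list of spatially oriented learners) is:

  library(mlr3)
  library(mlr3extralearners)   # adds the extra learners to the dictionary
  library(data.table)

  # learner ids together with the upstream packages they wrap
  tab = as.data.table(mlr_learners)
  tab[, .(key, packages)]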
Okay, more questions? [Audience question, partly inaudible: one of the examples showed overfitting and extrapolation; is there anything built in to guard against that?] Yes, that is a good point you mention about knowing what you are doing and potentially overfitting. The problem is of course that you always get a value back; it is just a number, and you have to put meaning to it yourself, otherwise it stays a number. We do not have anything built in that would prevent you from doing bad things; in the end you are, at some point, on your own. When it comes to model interpretation and diagnostics we have connectors to the iml and DALEX packages, where you can do further diagnostics, especially things like variable importance, but when it really comes down to the core of what your code does, you need to dive in and understand what you are doing. This is always a problem: the more complicated your setup, the easier it becomes to make mistakes, and at some point we have to find the middle way. Maybe the best remedy in the end is well-educated supervisors, or educating yourself well, because machine learning is not something you do in one day and then go to your boss and say it is done. You really need to understand the details of what it is doing, and I would highly recommend to everybody: after you have built your model, done the cross-validation, done the prediction, go through it again and rethink every step, what it is really doing and whether it is doing what you would expect. Okay, thank you. [Applause; change of speakers and a short microphone check.] Hello again from my side. We met before on the first day, when we talked about open source, and now, within the frame of our project GeoHarmonizer, we are organizing a panel discussion related to the whole open data idea. We are almost at the end of our event, the end of our conference, and you have heard about a lot of wonderful projects, initiatives, solutions, packages and so on, and you have also heard a lot about the open data paradigm; this concept popped up many times. Although it might seem like the normal way of doing things at this point, it still isn't, and I am going to start with a few slides, a short introduction into the wider concept of open data, and especially geospatial open data, the data we work with. Even if we are, I don't know, hardcore developers, we still cannot take our solutions away from the data, so it has an influence on everyone. So I will start with a few slides, ten to fifteen minutes of presentation on the topic, raise some questions that I hope will spark some discussion, and I would also like to invite you to join the Slido: you can enter from your phone or whatever device and introduce the code shown here. We will post a few questions related to our discussion there, and there is also the possibility for you to ask questions yourself.
We will monitor all the channels through which you can contribute to the discussion. Okay, so, as I mentioned when I invited you on the first day, this is a discussion taking place within the context of a European project that we are developing; you have heard a lot about it during these days. It is the GeoHarmonizer project, which is built on open source solutions for geospatial data and on open data: it is based on open data and it produces open data. On the lower part of the slide you can see some financial information related to the project. The consortium happily won the project and we started doing the job: we started building a data portal, adding various functionalities, and computing and producing new value-added data sets based on existing data sets, and you heard more about the projects within this line of financing from the European Commission on the first day. The point is that everything is fine within the framework of the project, but what happens afterwards, after the financing is over? As I mentioned on the first day, in this particular case OpenGeoHub took on the responsibility of supporting the portal as well as the further development of the data products (it is a time series, time passes, so you have to keep enhancing your data set), and they committed to keeping the initiative up for five more years. That is a wonderful initiative, but is it a sustainable way? There is no clear indication that this is something that will be repeated by other consortia, other companies, in other projects, and so on. And I am sure that everyone in this room has at least once tried to identify data sets from other projects, European or not, has not succeeded, and has mostly got a 404 error. As you probably know, this question of data is just one part of a larger context: open data belongs to the wider open paradigm. What is much more popular, let's say, in our community is open source, but open data is also an important resource; it is just one of the resources there, and it is the one small part that we will focus on in this discussion. So when we started and wanted to brainstorm some ideas on how to make an open data project sustainable in the long term, we looked for geospatial data that is of importance not only for the scientific community or for a niche, and the first thing that came to mind was data sets related to location and navigation; we all have to get from point A to point B, no matter whether you are a scientist, a software developer, or anything else. So we looked, globally speaking, at who the main providers of geospatial navigation data were. At the beginning we had, of course, Tele Atlas and Navteq; then Tele Atlas was bought by TomTom and Navteq by Nokia (later becoming HERE), and you can also notice OSM, still small at the beginning. But what happened in 2008? In 2008 Google Maps made an announcement.
All these global data sets were of course proprietary. In 2008, with the development and the extreme progress of smartphones, Google announced in September turn-by-turn navigation accessible through these new smartphones. That announcement had real effects on the market for global proprietary navigation data sets. What you see here is a screen capture of the stock price of TomTom, which took a serious fall at that point in September 2008, and the same happened to Garmin; if you have any curiosity in this direction you can look at how that announcement, together with everything it implied, had consequences for the owners and sellers of these data sets and, of course, of the navigation tools built on them. Moving forward in time, closer to the present day: most of us know the OpenStreetMap initiative. OpenStreetMap also offers a global data set, a global map, but it has a completely different sustainability model. It is a volunteer project, initiated in 2004, which built on the success of Wikipedia at that point and on the development of GPS technology and, just as importantly, its falling prices: it was no longer only cadastral companies that could afford to buy GPS receivers, the technology became more and more pervasive. So the initiative was started in 2004, and as I mentioned it is fully based on volunteer contributions, with the infrastructure largely supported by academia; that is their model. On the other side of the slide we have the other navigation data sets, which follow the proprietary model. And even though OpenStreetMap had a different model, it turned out to be sustainable, and maybe one of the best examples and one of the best proofs of that is that it now sustains important and powerful companies, and this ecosystem keeps growing. Most of you have probably heard of Mapbox; some of you have probably played Pokemon Go, which started with Google Maps as its base map and then changed to OpenStreetMap; there is the Humanitarian OpenStreetMap Team, a project from the OSM community, Foursquare, other services that have moved between OpenStreetMap and Google Maps over the years, Facebook, which bases its geographical information on OSM data, and so on and so forth. So that is one example of how a data set that is of interest to the bigger, wider community has been sustained over time. Another example, very close to home in our case, is satellite data, satellite imagery. We had a very good and very informative presentation earlier, and the open data notion appeared many times, but things were not always this way. You know that Landsat is the longest-running program for observing the Earth from space, but it was not always open, and the data was not freely available; that changed in 2008. In the figure on the right you can see who, within the United States, was responsible for collecting the data, storing it, and so on, and at the beginning USGS was just one of the receivers of the imagery.
The other receivers were, if I can put it that way, international ones as well; in any case, there were private companies that would pay a fee to USGS in order to receive the imagery, and they could then redistribute it. In 2008 there was this change of policy, an important change of policy: they had understood that the fragmentation of the Landsat imagery would hinder real progress in running spatial analyses over the entire archive. So in 2010 USGS launched the Landsat Global Archive Consolidation initiative, which basically brought all the Landsat imagery into one storage; it is still a work in progress, and it is basically the reason why we now have the possibility of going to a one-stop shop and downloading or processing all the Landsat data for a specific region. This is another example of geospatial data going from proprietary to open data, and keep in mind that this is a publicly funded program. Here is one figure related to the impact that the change of policy had on the community as well as on the sustainability of the mission. It is important to mention that since the 2008 change of policy, the change to open data, this decision has been repeatedly questioned, and that is the reason we also have this paper, from which we extracted the figure, examining whether keeping the Landsat data open is truly beneficial for the program, for the community, and so on; I suggest you take a look at it. The change of the Landsat policy also had important implications for the European community, because it made an important impact on decision makers regarding how Copernicus data would be shared with the wider community, the scientific community, the business sector, and so on. In 2013 it was decided that all Copernicus data should be made freely available and freely accessible: again, a publicly funded program that offers open data as a basis for the development of the scientific community, the business community and the public community, and here you have a short list of proofs of that decision. So we have seen what happens in the public sector, but we know that satellite imagery comes from the commercial sector as well, so it would be interesting to see whether the same paradigm could also function in the private sector, and we do have an example in that direction. I don't know how many of you remember this, but when Planet announced their initiative, maybe a bit naively at that point, they wanted to make all the data they collect, all the imagery, open data, so it could spur innovation, more research, more development. Unfortunately, the economic realities of our world did not really allow them to go in that direction. So that is one example from the private sector. Since we are on this topic, I would like to suggest, if time allows, that you listen to a podcast referenced here on this slide, which talks about open data and how it is perceived.
It covers the perception of the research community as well as the public and private sectors regarding the accessibility of satellite imagery, and I believe it is something we should be aware of. Okay, so we have seen an example of what open data means when the public sector makes data available for the wider community, and we had a small example of what it means for the private sector. The idea of our discussion today is to try to understand whether open data is indeed sustainable for entities and consortia like the one we formed for GeoHarmonizer, or, going even further, whether and how it could be sustainable for anything that is not the public sector. So what we decided was to look at other sustainability models within this open paradigm, and as I was mentioning, open source is one of them. I think it is obvious that open source is no longer a hobby: it is an established way of developing software, it is mature, and it is highly used on an international scale, and I am not talking only about the geospatial sector. More than that, open source is also a business model, and here are just a few elements of how that can be, and has been, sustainable over time. During the first day of the conference you listened to Angelos Tzotsos, the president of OSGeo. OSGeo is a not-for-profit foundation founded in the United States whose main scope is to foster open source geospatial software development. A flagship event for OSGeo is FOSS4G, the global conference, which also represents the main source of income for OSGeo; OSGeo gathers the money and then distributes it, through a clear and transparent process, to the projects within the foundation, and all of this information is available on the wiki. Another example is the R Consortium, which you probably know of, and which is exactly what its name says: a consortium meant to support the development of R, I want to call it a solution, but it is also simply the R software. You have probably also heard about NumFOCUS, which is likewise a not-for-profit organization based in the US whose main job, let's say, is to act as a fiscal sponsor for open source projects. Open Collective has the same kind of activity, only that Open Collective is a company. Just as an example, projects keep joining NumFOCUS for exactly this purpose: to be able to fund their development in a more seamless way. Another way of supporting open source solutions is through philanthropic organizations and foundations, and a very good example in that direction is the Ford Foundation: they have a funding initiative called Critical Digital Infrastructure Research, and the call is aimed exactly at supporting open source infrastructure that is considered critical in a specific domain, not specifically geospatial but for any kind of purpose. These are examples of the ways open source is made sustainable. Another model that we looked at is the open standards model.
In this particular case we looked at three international standards organizations. You are most probably familiar with all three of them, and with more; they work together, they collaborate, but they have different ways of sustaining their activities. For example, ISO is the international standardization organization, and it offers membership only at the national level. On the other hand, OGC, the Open Geospatial Consortium, has membership open even to individual members; OGC also runs development projects related to geospatial standards together with institutions such as ESA and NASA. The business models, the ways they sustain their activities, also differ, and this is an interesting example: ISO, by its internal regulations, sells its standards, whereas OGC releases all its standards under an open license. So far, at least for open data in the geospatial world, we have noticed three main sources for making these open data initiatives sustainable. The first is public funding; we have seen the examples of the Landsat missions and of Copernicus, and you can also think of all the open data initiatives from the public sector, you have probably used data from data.gov, for example, or other open data portals. A second potential financial supporter is the big companies that have an interest in an open data model or in a particular data set, and a good example in this direction is the support that Mapbox offers to OpenStreetMap. And the third way of sustaining open data initiatives is, of course, philanthropy; we have seen the example of the Ford Foundation, and there are many other cases where philanthropic organizations support the development of open data sets that can afterwards be used, let's say, for the greater good in different projects. This was meant as a short introduction to the problem we have at hand, and at this point I would like to remind you one more time about the Slido. Let's look at Slido. I cannot see the question on Slido, I'm sorry, where is it? No audience questions answered yet; maybe they were deleted. You said there are some examples where open data... can I just take this? We managed to find them either online or here, or both at the same time.
Info
Channel: Tomislav Hengl (OpenGeoHub Foundation)
Views: 169
Id: olH32TORd9o
Length: 123min 44sec (7424 seconds)
Published: Fri Sep 10 2021