Hengl, T. "Introduction to spatial and spatiotemporal data in R"

Good morning, everyone. My name is Tom Hengl; I'm the founder and technical director of OpenGeoHub, and welcome to the Open Data Science Europe workshop. It's a one-week event, and the first two days are, as you know, training sessions. The training is split into two parallel sessions, and I hope you managed to find the right one: this one is on R, and the one above us is on Python. We will do two days of training, and then Wednesday to Friday is a conference, with a conference program and live talks. You will see that the topics are quite similar: the things we do in the training you will see again in the presentations, just at a higher level.

We will follow the program you find online at opendatascience.eu. Let me share my screen first. As you see from the program, we are using VirtualBox, so I'll be using this VirtualBox. Don't get confused: from my Windows 10 system I can switch to Ubuntu. I just press the button and now I'm on Ubuntu, and if I press Ctrl+F it looks like I'm on an Ubuntu computer, but it's running on my local machine. It's really magical. Why do we use VirtualBox? Because we want all of you to have access to exactly the same setup, and we don't want to lose any time on installing, downloading and setting up.

Also keep in mind that you can make these slides available offline. I shared all the slides; you can find them in the Mattermost channel. You can access the slides, and if you want to save bandwidth, just make them available offline: I would go to File, then Make available offline. This way you save bandwidth, because if you don't make
it, then switching between the slides can cause bandwidth congestion. That's why you should try to save things offline.

So I can start with this first block. The blocks are one and a half hours. In every block we will leave about 15 minutes in which we stop the YouTube broadcast and focus more on your questions, but in the meantime we will also answer questions from people who are online. I think every day we will have maybe 50 to 100 people following online, depending on the block, and some people are in different time zones, so they will probably connect later on.

Today and tomorrow I will be the teacher. I'll have some help from Carmelo. He's not with us now, but Carmelo will help with the spatiotemporal ensemble machine learning, especially because he is working with the real data. That's the session in the afternoon. Then we will also do one more technical session, on working with cloud optimization. It's maybe something new to many of you, so we will take quite gentle steps and guide you step by step through how you use cloud optimization, again with the intent to show you how to do it in R. The guys upstairs will do practically the same sessions, but with more focus on Python.

Python and R are of course not one-to-one. Much of the functionality is really different between Python and R: there are many packages that exist only in R, there is functionality that exists only in Python, and some things are computationally better suited to Python, while some things, like plotting and statistics, are better suited to R. So when people ask the usual question, "Which one should I learn?", my answer is: both. It's very simple. Maybe we can show that: in RStudio today you also have functionality to connect to Python. So if I go to RStudio
there is functionality to connect to Python using reticulate. Let me see, it's under Tools, I think. So you can connect through reticulate to Python and then do Python programming as well. So that's about Python and R.

Going back: this morning's first block is a gentle introduction to spatiotemporal data and the specifics of that data. Some things you may be aware of, and some things you may hear for the first time; I'm hoping you will hear most of the things in this block for the first time. After the break we'll do some modeling exercises, and then I will slowly start jumping from the slides to R, so you can see me actually programming and working with real data. This morning is basically a warm-up for the real session, which is the spatiotemporal ensemble machine learning. That is possibly new for you, and it is going to be really hands-on, but we will explain how it's done. Some people tell me, "You think it's simple, but actually, if you do ensemble machine learning for the first time, there are a lot of components and it's really highly complex." So it requires a really gentle introduction, and that you get a feel for the data and the code.

Everything I do is in this notebook. You see this notebook is under R training, and I already shared it with you last week; I don't know if you managed to look at it. It has all the examples and it even has text, so you can really follow it. It's a computational notebook: at the beginning there is a bit of introduction, but then very quickly you have functions, you run the functions, and then you can test your own thing.
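
The reticulate connection mentioned above can be sketched roughly as follows. This is a minimal illustration rather than code from the workshop notebook: it assumes the reticulate package is installed and can find a Python installation, and the names `total` and `py_math` are made up for the example.

```r
# Minimal sketch: calling Python from an R session via reticulate.
# Assumption: reticulate is installed and a Python installation is available.
library(reticulate)

# Run one line of Python, then read the resulting variable back into R
py_run_string("total = sum(range(5))")
py$total

# Import a module from the Python standard library and call it from R
py_math <- import("math")
py_math$sqrt(16)
```

Both results come back as ordinary R numbers, which is what makes it practical to mix the two languages in a single notebook.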
We will be moving through this tutorial here in RStudio, looking at different things. You see, I can move to some section and run a function: I source the function, I get some warning message, but I can compute something and get a number. So that's this computational notebook, and it's what the guys upstairs use with Python, this Jupyter; it basically looks the same. We can scroll through the different sections and test things, and you can modify the code in RStudio and play with it. It's no problem if you do something wrong. There is a shortcut: you can do git stash, and this will remove all the changes, so it's very nice to know the git commands. I have to get to the folder first, though. Let me see, where am I? I need to remind myself. It's under the home directory, then the work directory, then code, and then the workshop folder. So first I change directory there, and then I can do git stash. This git stash will basically remove all the changes. It will leave all the new files you made, but if you mess something up, you just do git stash and that will revert everything to the original.

You see I have this cheat sheet. I can share it with you, but it's more or less just basic survival: if something happens, we have to restart Docker. That's also a survival tip.

So the plan for these days is that we use this tutorial and go step by step through the different parts, and you can ask questions, and we can slow down and zoom in on things. Also, as I said, I shared the
slides with you, so you don't have to write anything down and you don't have to make screenshots. Remember, this is also video recorded: everything I say and do, and the code, you can follow after the course. So you can really relax, focus on the course, ask questions and see if you can reproduce something on your laptop.

Why do we like people to use laptops? Well, it's less maintenance for us, and obviously you can use different operating systems. But we also want you to learn how to use the software, so that as soon as you go back to your office and your work you can continue using it. Another reason is that you can interact with us and ask questions. Mattermost is very important here, because if you have a problem you can take a screenshot, say "I have a bug, something's happening", send the screenshot, and then we can try to help you. As you can see, many people were asking questions about VirtualBox, and there were screenshots; people said, "I have this problem, I have that problem." So we were already debugging and helping people, and I hope you all now have VirtualBox running.

By the way, I'm also new to VirtualBox. I didn't use it before; I would just connect to my workstation at the office using NoMachine or TeamViewer. But Leandro convinced me it is the best way. It's very elegant. That's why it's called a box: you have everything out of the box. You have all the software, everything is customized, we all see the same thing, and you can easily turn it off or put it on standby. I don't know if you tested it: you put it on standby, and when you start the machine again you get exactly where you left off. It's really magical.

Okay, so that's Mattermost. Is anybody not connected to the R development group? Please connect. If you don't see the R development group, you have to go to this plus sign and you say
Browse Channels, and then pick the channels you want to see. If you are a physical participant and you have questions about logistics here at the venue this week, then you should ask in this group. Don't mix them: if it's a general question about methods or problems with code, then you ask in Town Square or in R dev, but if you have a question specifically about, say, the Wi-Fi, then you ask it here.

It's very important to use Mattermost, because it keeps track of everything and you can open lots of discussion groups. Everything you post basically becomes a little discussion group, and when you reply to something, as you see here, you can see the whole thread. That's also nice about Mattermost: you can go back to the same issue that was raised and respond there. Usually, if we have already responded to something you ask, we will not respond again; we will just send you the link. You can link to any thread or any post in Mattermost, a bit like in GitLab. This way we can debug, but if we debug a problem once, we don't want to debug it a second time. I hope you understand, because that's a waste of time.

Okay, everybody looks very serious. There's no exam in this course, by the way, so you can relax. These are also special times: this is the corona pandemic, and we are really doing our best and improvising. We had to teach ourselves how to do parallel broadcasts and how to use all the Zoom functionality, which we hadn't used before. There were also many cancellations of physical attendance: 25 people canceled, so a really high cancellation rate. Last year we had a summer school exactly here, at this venue, and there were almost twice as many people, last year, when there were no vaccinations and more chaos
and now you would think, okay, there are vaccinations, but unfortunately the Delta variant arrived, and that made it a bit complicated. Nevertheless, we will keep everybody safe, of course, and do our best to follow the corona rules, and we think it's going to be a successful event.

If you're new to R, I cannot promise that you're going to leave this place and say, "I went to this training course for two days and I learned ensemble machine learning in R, and I had never used R before." That would be very difficult; this is quite an advanced topic. There are ways to teach somebody things in one day, with the right motivation. Imagine this person here, trying to satisfy Kim Jong-un: if he fails at something, of course, it's not a good situation for him. But it's very difficult to teach something very advanced in one day. So what I recommend, if you are new to many of these things, is that you open your mind and make sure you take copies of the slides. Remember there's also going to be a video published, so you can go back to some of the discussion points, see me coding, and teach yourself after the course.

The reason why we're here, of course, is that we have a European-funded project. The actual project is called GeoHarmonizer, but that's too technical a name, so we renamed it Open Data Science Europe. That's why we have this event: we had some budget to organize this type of workshop and to do code development sessions. This falls under the code development sessions, because we also expect to get feedback from you. So this is our project, and you can read at the bottom that it has been funded by what used to be called INEA, the Innovation and Networks Executive Agency; now it's called HaDEA. You can read about this
project here under the link; it's mentioned at the bottom of the page. I can see there are some people connected now from the US, which is really crazy, because I think it's something like 4 a.m. there. I see lots of people following from around Europe, and it looks like some people are following from India.

So you can read more about the project. When we have news we post it here, and we post some articles on how we did the processing, but they're very short; we are now writing articles to explain how we did the processing. As you see, in that project we made this data portal, which is at maps.opendatascience.eu. There we made what we call a European environmental data cube, and this is the data cube viewer. As you see, we are here in the Netherlands, and we can zoom in; let's say we are now here, in Wageningen. What we did is create these land cover predictions using ensemble machine learning. So the things we're teaching you are what we did, but we did it with seven million points and about 400 covariates; the regression matrix, or rather the classification matrix, is about three gigabytes. We computed all these things, and we can also show you the points here. So we trained with seven million points and made the predictions, and then you can scroll through time and see how the land cover changes. What you can see here, for example in this area around Veenendaal, is that there is growth, so all these places are new.

I think we did the land cover mapping properly, because we mapped every class as a probability. So if you look at urban, you can see the probability of that class mapped as well. If I click somewhere here, I will
get the values: it will do a query on the GeoTIFFs and return the values of that probability changing through time, so we can estimate at which time they started building. When I say we do things properly, I also mean that we provide the uncertainty. If I turn this off, you can see the uncertainty per pixel: a brighter yellow means higher uncertainty. In some places a high uncertainty will pop up, and that means you should be careful using the data there. So I think it's a proper framework for land cover mapping, and we are proud that we did it using ensemble machine learning. In this case the predictions and the modeling were done in Python, actually, but we also have the vegetation maps that Carmelo and I are going to show, and those were all done in R. Once you choose R or Python it's not easy to mix them, but for some projects we chose to use R and for some projects we chose Python.

So that's the project, and that's what we're going to talk about. We're very excited that Carmelo, who is a PhD student at OpenGeoHub and Wageningen University, is going to show you in the afternoon, with a really large data set, how we compute the space-time distribution of forest tree species, and we will show you that it really matches the ground data. There we also have about one million training points, I think. We are very excited to show those results; this is also the first time we are showing them to the public.

Okay, space-time data. I personally work a lot with spatial data. I did a lot of predictive mapping projects and even produced some global data sets, and data sets for Europe, Africa and the United States, and even won some awards for publications on predictive mapping. But around the year 2012 I also became interested in space-time; I knew it would be very interesting. I published my first
paper on space-time prediction in 2012, I think, and we will use that paper: one of the case studies is mapping daily temperatures. Later on, somewhere in 2018 or 2019, I wanted to switch from doing spatial to doing space-time, and this project is a proof of concept that you can switch any mapping project to space-time as long as you have enough data. So when you look at the land cover mapping, we don't map the land cover of the year 2020; we map the land cover of the last 20 years, or maybe we can map 30 years. We switched from a spatial model to a space-time model.

But with space-time data it's not that you just add a third dimension and go from 2D to 3D. You have 2D plus time, and time is a special dimension. So what's special about it? Number one: in the spatial dimensions, x and y, if you look left and right, the left can impact the right and the right can impact the left. They can be connected; there can be, say, diffusion or movement in space. But when you look at time, the future cannot impact the past. You have causality; it goes in only one direction. Also, for time data there is a whole field of statistics called time series analysis, which looks at the behavior of variables through time, and it's way more complex than just changes of variables through space. So we are going to talk about the time dimension. It's easy to plot, remember that, but it's difficult to model. And what is especially difficult is to predict the future. Some people in time series analysis use the models blindly and say, "Okay, now I predict the future", but predictions of the future can be very poor. So it's especially difficult to predict the future. You can predict
inside the observed space and time. So if we predict land cover for the last 20 years, we can do lots of testing, but if I have to predict land cover for 2022, you understand it's going to be very difficult; you need something on the level of magic to do it. For some variables, though, you can get very close. If the trend component, the systematic component, has proven to be dominant, then you can also predict the future: some models you can extrapolate into the future. So we're going to talk a bit about time series analysis and visualization of data, and I will point you to many books and references. Maybe some things are new, maybe some things you know.

Any data that has a so-called spatiotemporal reference is spatiotemporal data. Obviously you need the coordinates, and if you talk about geographical data the coordinates are latitude and longitude. What is also very useful to know is the location accuracy, which tells you whether the x, y coordinates relate to something that is plus or minus 100 meters, 500 meters, and so on. There is also the size of the block you sample. You could do block sampling; for soils, for example, people do block sampling, so they say this value relates to a volume of soil; it doesn't relate to a point. Then you have the height above the surface. Elevation is the elevation of the terrain, but you can also have, for example, daily temperature: is it the surface temperature, is it at 2 meters, or is it at 100 meters? So you can have a different vertical dimension, and that's the third dimension. And then we have time, and time you actually represent with two columns: the beginning of the measurement and the end of the measurement. I will show you that this is very important, because time also has
so-called temporal support. Some things we measure as daily values, monthly values, or values at a second, so that is also very important. Then you have the spatiotemporal data, and once you know the coordinates of the data, once it has a spatiotemporal reference, you can import it and do spatiotemporal modeling.

So this is what I said: time is not just another spatial dimension, and there are specific methods developed for space-time data. When I started using R, I think in 2012 or 2013, Edzer and colleagues had already made this spacetime package. It was a really great solution, because at that time there was only spatial data, sp. He looked at some problems in the data and then he made this spacetime package. I don't know if you ever used it, but when you read the tutorial it becomes a bit complex, because there are these full space-time data structures and irregular data structures; he made these classes of data. So it's quite complex to understand, and a bit abstract, but in principle the most important thing is that you know the correct space-time reference for your data. If you have that, then you can import it into R and organize it in a spacetime class, and, as we will show you with some data examples, you can plot the data, you can do a space-time overlay, and you can do space-time modeling. This is the paper from Edzer, called spacetime, published in the Journal of Statistical Software.

Have you used spacetime? Does anybody use spacetime? Completely new? It's also magical when you think about it. Imagine you have a space-time time series of satellite images, and you load them into R and organize them as, say, the class of a space-time regular, or rather space-time full, data frame. Then you prepare the point data, and this can be irregular, and
then you say over: the points versus the space-time time series, and it will do a space-time overlay. It will match the points with the cells in space-time. That's really magical, because you do it in one line. The problem is that when you work with large data, you cannot load it in R; your RAM is not going to handle it, and R is not so efficient with putting things in RAM. So then this spacetime package may not work. I don't use it, by the way, because we now work exclusively with large data sets. I will show you that we use terra extract, which then does the space-time overlay. My code is unfortunately maybe not so easy. I will show you very quickly what the code looks like. It's a bit longer; don't get scared. And it runs in parallel.

So this is the space-time overlay for meteorological data: we have a time series of MODIS land surface temperatures and we do a space-time overlay. I made this code; it's not in a package yet, but I should put it in one. I made a little function called extract st, for "extract space-time", and it's a universal function. If you use this code with your data, I promise it's the fastest way you can overlay a large number of points over large stacks of GeoTIFFs, and they don't even have to be stacked to exactly the same grid; they can be loose structures. So that's the space-time overlay. After the overlay you get this so-called space-time regression matrix, and then you can do modeling. I'm going to run this code today and show you how it works. We're going to go step by step, and then I will show you the data structure that comes out. I will also show you, when I do the computing, how it goes into
parallel, so we will observe it here when it goes parallel, so that you get the feeling that you understand what's behind it. But again, the things I showed you I built on top of the package made by Robert Hijmans, called terra. So that's this extract st; you see, this is the function that allows a very quick overlay, and it's actually magically fast. We overlaid millions of points over, I don't know, one terabyte of data, and it took two minutes. This is mainly because terra is programmed mainly in C++. That's another language you should consider, but C++ is a different story.

Okay, going back to the slides. There is also a new book called Spatial Data Science. Have you heard of this book? This is the URL; the URL is very cryptic, keen-swartz-3146c4, don't ask me. Edzer is the main author, and they are still writing it. You can subscribe and you will get a notification every time they write something new, and you can follow the things they discuss. So it's basically a live book: they write the book and you can follow them, and you can see all the mistakes they make and the fixes. They write the book in front of everyone. It's a new-age way of writing science, and I think it's the right way, by the way. It is bookdown based, so you can see all the code; everything you see is in code. You see there's a chapter 13, on multivariate and spatiotemporal data. I don't know how much they have, but I think they have only spatiotemporal geostatistics. So if you go to this book, which will be cutting edge, you will find basically nothing, or very little, of what we are going to teach you. Everything we teach you is not in the book, because we do machine learning for space-time. So I think we have a lot of original things for you. But maybe they will add
it, I don't know, because as I said it's an open book. Maybe at one point they will say, "Okay, we add one chapter on space-time machine learning for mapping."

Okay, this is just to show the space-time reference. If you use Google Earth or any type of data in KML, the Keyhole Markup Language, you can see what a space-time reference looks like. You see here I have a name; this is a station that we use in the first exercise. We see that there's a measurement at some point, and the point has coordinates and a begin and end time. So all these theoretical concepts that I tell you about are implemented in many software packages.

For me, as I said, space-time modeling and space-time analysis are now mainstream, and any GIS where I cannot set up space-time is cumbersome for me; we cannot do our work. So we have to use only GIS in which we can do space-time. One of the most popular GIS is QGIS. Does it support space-time? Do you know, Pablo? You don't think so? There is a plugin to work around it, but if you just start QGIS and say, "Here I have a time series", you're not going to get a slider, right? That's one of the reasons why we developed an original interface. You see, all of our data has a slider. There are only a few data sets with no slider; 95 percent of the data has a slider, so as soon as you open the data there is a slider. When you go to QGIS and open some project and say, "This is temperature data, I want to slide through time", you won't get it. But there is a plugin for time series data. It's not going to animate changes; you have to play with the animation. But if you move around with the mouse you will get the curves of the values, and that's also amazing. Somebody made this plugin. What's the name of the plugin, do you
remember, Carmelo? No? Another one, for the time series. I also have to remind myself, but there is a plugin for time series, Time Series Explorer. There are a couple of plugins, and I have to remember which one Leandro uses upstairs; maybe please check with Leandro. But there is a plugin, and then you can see how the values change: if you have a time series of values, how it changes through time. At the moment, if I want to see changes through time, I have to do this: I have to click layers on and off. That's the only way. People who are experts in QGIS, if they watch this video, will say, "No, no, no, you're doing it the wrong way, you should do it like this." But officially it's not that easy to just say, "Here's the space-time data, I want to see the changes." It's not easy in QGIS, but it's possible to set up.

So, when you have a moving object, and maybe follow me now as I sketch it on the screen there: something like a bird. A bird is very complex; maybe a car is better. The car moves along some streets and goes from location A to B. When you plot it in, let's say, some web map app, it's just a point moving. But you could also plot it as a trajectory, so you plot that line of movement, and you can plot the trajectory in space-time. You can make a space-time cube, something like this: this would be x and y, and this is time, and time goes this way. If you plot it, then you have something like a trajectory moving up in time. Have you ever seen this? I was doing that for two years when I was in Amsterdam: I was plotting these trajectories of bird movement. It's very interesting, because in this space-time cube you can see how this overlaps
between species through time, which you wouldn't be able to see if you just have a 2D view. You can rotate that cube and you can see the density where they overlap in the movement. So this is another way to look at that. So you have objects, and objects have trajectories, and then you have also fields. What are the fields, the regions? I wrote on the next slide what the fields or regions could be. Basically it can be one of three things. It can be a quantity or density of material, so it's not a physical entity. That's like molecules, a quantity of molecules, let's say some element in soil. Or if you think about pH of soil, this is a quantity of elements. So these are all densities. Then it can be an energy flux, because energy also moves. If you look at heat, it moves through space, right? It's not a physical entity, it's a movement of energy, so you can also model that. And then, this is where it gets a bit confusing, you can also model probability of occurrence. This bird movement I could also represent with a probability of occurrence and then represent it as a region. In the original sense it's physical entities, but I could also model them as regions. And when you look at this map I showed you on the Open Data Science viewer, this one is basically urban fabric. It's physical entities, and most of land cover is physical entities, but we model that here as a probability of occurrence. So theoretically speaking, how could we model this? Well, we could model every house as a unique entity, you understand? If you want the total land cover GIS, then I will model every tree, every house, everything which is a physical entity I will model as a physical entity. And maybe in about 10 years this is how the land cover maps will look, because maybe in 10 years, in the Netherlands
already they inventory all the trees, I think. Did you know that? If you're not from the Netherlands: they have a data set you can access, they map the whole country at one to one thousand now, I think, and they mapped basically every tree. The next thing, they will chip every tree, so they know how the tree is doing. There's already a concept called TreeTalker, have you heard of it? They put a chip on a tree and they follow the health of the tree every, you know, 10 minutes, so you get an automated sensor network. And so that will be land cover in the future: you wouldn't map the probabilities, but you will take any object that you consider as land cover or land use and then you will track it through time. So when you come to a GIS like this, maybe, if I'm still working, in 15 years, we go: here's the Netherlands, here are all the homes that disappeared and the new homes built, right? That's how the land cover maps will look. But this is just talking about concepts. First, distinguish that there are space-time models of objects, for physical entities, and there are models of quantities, and quantities can be densities, you know, masses or volumes, or energy fluxes, or a probability of occurrence. This is the space-time cube. This data set you see here is the data set we're going to use, the meteorological stations for Croatia. At the end we will make predictions and the predictions will look something like this. Now I have to show you that here, because I don't have it computed. So we will make these predictions and they will look something like this. We will do space-time interpolation of daily temperatures, and we will predict for every day how the daily temperature changes. You see, these are the stations and these are the predictions. So this is the same data set: here it's shown in a
space-time cube and here it's shown on a 2D map. It's not easy to match, right? You'd say, well, no, it's not the same. But these are meteorological stations and you have measurements every day. You see that for some stations there are no measurements. So what happened here, how come there are no measurements? They had a day off at the meteorological service? No. When I first started working with this data I said: hey, let's do space-time, and meteorological data is space-time, it's high quality, there's lots of data, so let's do meteorological data. When we started working with it: they have something like 160 stations and they have measurements daily, so you have 365 days by 160 stations. That's a lot when you load it in R. When I started, I went: whoa, this is not going to work. It looks like it's only 160 points, but when you load all the space-time data, and if you take, let's say, three years, you are suddenly loading millions of values. That's the difference between space and space-time: in space-time you get overloaded with data very quickly. So just to make this plot, I had to subset to, I think, maybe two percent. What you see here is two percent or something like this, because I couldn't plot it in R and I needed it for the book and for the paper. So I said, oh, this is going to work, let me subset, and I just randomly took out some points. You still get an idea, but for the sake of plotting I plot here maybe 50,000 points. By the way, when I say point, now imagine this: if you're in space-time modeling, then you say, well, are you talking about space-time points? Because in many GIS, when you say point, the point is just this thing with the location, the station location, that's the point. But in space-time modeling the point is every
measurement in time, so one station can have millions of points. So be careful with that when you say: I have a problem with that point. Usually when you do space-time projects and you say there's a problem with a point, you say: I have a problem with this coordinate at this time, this day, that's where I have the problem. There's no problem just with the station coordinates. So this is this thing. Back in 2012, when I started going, okay, let's work with space-time, this thing we call 2D plus time: you have two dimensions and you have the time. We don't model changes above the surface, below, whatever, we didn't model it. So we did this 2D plus time, and at that time I worked a lot with geostatistics, and one of the techniques I worked a lot with is called regression kriging or universal kriging. Then I said, let's apply universal kriging to space-time data, and we managed to publish this paper. You see there are quite smooth predictions. So we published that data set and the predictions, but we did it using universal kriging, which means we basically only used linear statistics. What we're going to do here is basically the same thing, we do spatiotemporal interpolation, but using ensemble machine learning. So we don't use kriging etc., we use ensemble machine learning to do space-time interpolation. And you will see it's a bit more computational. When I say a bit more computational, it's about 10,000 times more computational. But the effect you can gain is that you can possibly increase the mapping accuracy, and that's the magic. If you get the machine learning to better represent some non-linear phenomena, and to better subset and optimize, fine-tune parameters, then you are motivated to switch to machine learning. That's usually why people switch to machine learning, and we're going to talk about that also in
the second block today and in the third block. So this is what we did in 2012. Fast forward to last year, when we started doing this: we prepared a 10-terabyte cloud-optimized GeoTIFF Landsat time series. We gap-filled it, we removed all the artifacts, we prepared it, it's a 10-terabyte data set. Then we do a space-time overlay. The space-time overlay takes about four hours. Carmelo, do you remember, for the land cover, or longer, how long did it take? No, no, just the overlay. One day, so like 20 hours or something. So it takes 20 hours to overlay 7 million points over 10 terabytes of data. You would think, wow, it's one day. Try to do it with any other method; if you get it faster, we would very much like to hear about it. This is the fastest we could get it. So basically for one day you leave it, also overnight, and we run it in parallel. You can maybe parallelize further, but eventually it takes a long time. So we did a space-time overlay, and we had also some other covariates, and then we fit the model, we fit a space-time machine learning model, and we made these predictions. This is what you see here, these are the predictions. And, you know, people criticized that some homes appear and disappear, so we have some places where you have a home one year and the next year there's no home. They say it's impossible, right? Absolutely, there's noise. But we did some time series analysis on that, and when you smooth out the noise, the accuracy on the second level and third level comes to 80 percent. So it is comparable, if you look at CORINE land cover, which is now the state-of-the-art European data set. So this is CORINE and this is what we did. They also have space-time, so you can scroll, but it's, I think, every five years.
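A minimal sketch of the space-time overlay idea mentioned above, in base R: each point carries a location and a year, and it is matched to the grid cell of the layer for that same year. The grid, the coordinates and the function name `st_overlay` are all made up for illustration. The workflow described in the talk runs over cloud-optimized GeoTIFFs in parallel, and that part is not shown here.

```r
# Toy space-time overlay in base R (all names and numbers are hypothetical).
# For each point (x, y, year), read the value of the grid layer for that year.

# Three yearly 4 x 5 grids stored as a 3D array: rows = y, cols = x, layers = years
years <- 2000:2002
xmin <- 0; ymax <- 4; res <- 1   # top-left corner of the grid and cell size
cube <- array(seq_len(4 * 5 * 3), dim = c(4, 5, 3))

# Space-time points: coordinates plus the year of observation
pts <- data.frame(
  x    = c(0.5, 2.2, 4.9),
  y    = c(3.5, 1.1, 0.2),
  year = c(2000, 2001, 2002)
)

st_overlay <- function(pts, cube, years, xmin, ymax, res) {
  col   <- floor((pts$x - xmin) / res) + 1   # column index from x
  row   <- floor((ymax - pts$y) / res) + 1   # row index from y (row 1 is the top)
  layer <- match(pts$year, years)            # time slice index from year
  # Matrix indexing: one (row, col, layer) triple per point
  cube[cbind(row, col, layer)]
}

pts$value <- st_overlay(pts, cube, years, xmin, ymax, res)
print(pts)   # value column: 1, 31, 60
```

The point of the sketch is only that time is treated as a third index next to row and column. With millions of points and terabytes of imagery the same lookup becomes an I/O problem, which is why the run time quoted above is measured in hours.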
And also the polygons are quite large. When you look at our data, we have it every year and we have 30-meter resolution, so much, much higher resolution. And we can update. Every time they do a new CORINE, they start, not from scratch, but they have to get all the data, and we only fit a single model. Imagine a single model to model land cover, it's mind-blowing. We have a single model, and now for 2022 we don't need any point data, we only need the satellite images and the night light images, that's it, and we make the prediction of land cover. So that's also one of the reasons to use space-time. You can play with all these things. It's super cool and super important to have this functionality to do this swiping, so you can see the differences between different data sets. We will keep on adding data sets, and if you do a European project and produce a pan-European data set that matches the specifications we use, we are most happy to host it. The hosting is on us. Let me go further. These are results of predictions, and this is just one tile. When we do predictions, we split the whole of Europe into tiles, I think it's 8,000 tiles, we use 30 by 30 kilometer tiles, and then we do predictions. This is one tile in Sweden and you can see the changes. At the start of the project I was thinking: oh my god, land cover changes, 20 years, Europe, maybe you won't see so many changes, maybe it would be difficult to see changes. Then when we got the results we were surprised: actually lots of Europe is very dynamic. First we have urbanization happening. In Sweden, I think forest management is more dynamic than in southern Europe, so we saw a lot of changes. And now we're actually very excited, because we would like to go back, we will go not back to
the future, we would like to go back to the past, and we would like to map it back to the 1990s, so that you can even see what was the history of Europe, land cover-wise and environmental data-wise, going from 1990, even 1985 if possible, up to today. All these things I'm telling you, by the way, there is a paper and you can read it. It's still in review, so it's not accepted yet, so take it with a grain of salt, but more or less it's all described here, and if you want to cite this you can always cite it. One thing worth noting is that when you have 2D data, if you do geostatistics, you have a variogram of three parameters and you have ns locations, where s is for, like, station or space. When you have 3D data, you have depth, then you have four. When you do 2D plus time, then you have many more measurements, they go exponentially up. And if you do 3D plus time then it's mind-blowing: you have changes not only through time but also through the vertical dimension, and then you can literally have billions of observations. In our tutorial there is one such data set, I'm also super happy we published that paper, it's called the Cook Farm data set, and this is a four-dimensional GIS. You have measurements from an automated sensor network where the sensors are at different depths in the soil, and you can do predictions also in 4D. Here we use this altitude, so we use depth also as a covariate, and then we model changes with machine learning using the depth. Here I show you only one prediction, I think, because it's for a depth of 30 centimeters, but I could make predictions through space-time for different times and for different depths. Now, how do you visualize that statically? How do you visualize 4D when you can only do it in a paper? Well, it's not too bad, let me show
you how we did it. I downloaded all these papers locally so I don't have to look online. So here are the papers, and this is the 2015 paper. How do you visualize the predictions in 4D? You have to make something like this one: this is the time, the time is in this direction, and then this is the depth. And believe it or not, you can visualize 4D data in a GIS. In which GIS can you visualize 4D data, which is the GIS that supports 4D data, do you know? It's Google Earth. In Google Earth you can visualize 4D data, and I will talk about this in the second block, and also tomorrow I have a session on Google Earth, so please come. I will show you that you can actually put in these slices and visualize them through time, so you can see how the variable changes through depth and through time. You have to move to Google Earth, to KML basically, but it's possible to visualize 4D data. I don't know about QGIS. Time series: this is a time series example from Wikipedia, I think this is just some noise with a trend. They use it to show that time series data can look something like this, it can be very noisy. When you add the time dimension, you have to understand time series analysis. Time series analysis is a lot of things and I don't have time to talk about all of them, so I'll just give you a really super short, like a one-minute run-through. Number one about time series is that lots of analysis is focused on so-called decomposition. If you look, this is the original signal, and it's kind of obvious when you look at it: you see there's some trend component and there's some seasonality component, and they can be split. You split them, then you say this is the trend and this is not, and then you have a random component, and many time series analyses do
this decomposition. And then also how you fit this trend, which smoothing you use, how you separate what is the noise, what is seasonality and what is trend, this is the whole science of time series analysis. One example of time series data is the coronavirus in the Netherlands. These are the infections. There is a corona dashboard for the Netherlands, so you can go here to this dashboard. It's in Dutch, of course, but it's very well done. I don't know if your country has one, but it's kind of like in sports, you can follow the scores and tables. It's dark humor, of course, it's a serious thing. You can follow the people getting into hospitals, the number of vaccinations, and these are the positive tests. It is also GIS, GIS is very important for all this. So I can go and find, we are here in Wageningen, and we have 10 positives per 100,000, so that would be about four. So yesterday, I don't know, four people were coronavirus positive. And then you can follow it through time: they also had a time slider, so you can slide through time and see where in space there was more. Now it's very chaotic, it's really like leopard skin. It used to be the big cities that were the hot spots, which you would expect, but now even Amsterdam has fewer infections than some little area in Friesland, I don't know. But this one is a time series, and this plot is of course wrong, because it looks like last year in March there were fewer infections. So this plot is wrong, by the way, it's a biased plot. Why do they give a biased plot? That's another question. I wouldn't have done it, I would have normalized it so it gives the correct picture. Imagine, the correct picture would be: here is a peak, here's a peak that goes down, because then there was no testing, the testing just started, I don't know,
in April or something. Then when they started testing, it went down because there was a lockdown. And then we had the summer school here. When we had the summer school, it started going up, but still less than now. And this was the vaccination starting somewhere here, from April, then the numbers dropping, dropping, dropping. And this was Delta. I don't want to put geographical locations, but Delta came to the Netherlands most likely via England. Delta was a big problem, but not so scary: when you go back to the hospital numbers, the people really having problems, who have to go to hospital, then you see that Delta was not as big a problem as in October last year. It's about seven, eight times less, so it means the virus is less deadly for us, basically. But you see, this is the correct picture, because this is in April last year, so there were probably many more infections, but there was no testing. They never normalized the data, so that plot is actually a biased plot, it doesn't show the real picture of what you're interested in. And you see, this was the peak, so the Netherlands never went back to that peak from last year. We went to some little peaks here and here, so you can say there were three distinct peaks, and then this was like a fourth, milder one, and this is Delta here, but less deadly. So that's, let's say, some good news, in a way. The reason why we're doing this workshop is that we knew it's way less, five times less deadly. Otherwise we would have done everything virtual. But we said, look, people get vaccinated. Congratulations to the Netherlands: the Netherlands is, I think, among the five most vaccinated countries in percent fully vaccinated, top five in the world among the bigger countries, more than one million, I think
that's Canada, the Netherlands, Israel, so a couple of countries that have really high vaccination rates. As far as I know, 90 percent of people do want to get vaccinated in the Netherlands, only ten percent don't want to. Okay, so you see, this is the time series and you have these components. You have the trend component, so what is the trend component? We will take a look at that, and what is seasonality. Let's look at the temperature, doesn't matter if it's at the surface or at two meters above, but the world temperature. The world temperature has these components of time series data, and one interesting component is the long-term oscillations of the temperature, which are connected with ice ages. So this here was the ice age. The ice age finished about 9,000 years ago and then you have the global mean temperature going up, but it's a gentle change, going from minus 0.3 to plus 0.3, so it's a change of 0.6 degrees. This is how the temperature of the planet normally goes through these interglacial periods, it goes up and down, and there are some oscillations also here. If you want to read about glaciation and why the global temperature changes, it's very complex, it's a whole field of physics, but what you have to know is just that there is a distinct long-term trend and the temperatures change. And this is the industrial revolution, about 150 years ago. In this plot you see why the industrial revolution is scary and why global warming is here: you see this red line, this is what happened now with temperature. We are going to raise the temperature one and a half degrees, it goes beyond this plot. This is a bit older plot, it's only up to the year 2000, but we completely disrupted this long-term trend in the global temperature. And then when you zoom into years and days, then
you can see the temperature, and also here water content. Temperature is much more regular. This is actual data and then we just fit a spline. This is the Cook Farm data set and we fit a spline for different depths. It's super interesting when you look at the soil: soil is like a buffer for temperature. You see at the surface you have higher and lower oscillations, and as you go deeper in the soil the temperature stabilizes. In soil moisture it's the other way around: you have smaller oscillations at the surface and more oscillation in the deeper soil, because the water goes through the soil. It usually doesn't stay at the surface, it goes through the soil and then accumulates somewhere, and when it accumulates it becomes more stable. And you see there's this seasonality effect. Seasonality can happen at different scales: there's a daytime seasonality, so you have day-night oscillations in temperature, and you have the seasonal oscillations, so winter, summer, et cetera. So the temperature actually has two cycles, the oscillations inside the day and the oscillations inside the year, and they're both very systematic: nights are cooler than days, summers are warmer than winters, etc. Okay, so that's the spatiotemporal data. How do you visualize spatiotemporal data? You can do static plots. Some of the plots I showed you, this would be a static plot, and you can see you can visualize four-dimensional data, it's possible. And is this objects or is this regions? Am I plotting here objects or regions? Somebody? So obviously I don't use any vector structure, and we're talking about water content, right? And water content, what did you say, it's kind of a density, so it's definitely a field. You represent it as a field, so I present it with gridded data. I just show the percent of water in soil, going from 0.15, so
15 percent, to 60 percent, so there are some pixels that potentially get to 60 percent water, depending on the period. So that's a static visualization. Then I can do time slices. Time slices is doing something like here in QGIS. These are the time slices: you see these are two days, this is 15 August 2006 and this is 18 August 2006. So when I produce time slices, I visualize space-time. That's visualization using slices, another way to visualize. The other thing we can do is animation or interactive plots. That is this thing here, where you create an animation and you can choose your speed, you can slow it down or make it faster, and you can visualize space-time data. So these are the main ways, as far as I know, that you can visualize space-time data. The simplest way, actually, is to have just one static image and then put a point to that time series data and show something like this: you just show the cross-section through time for that spatial entity. Then you simplify the data from 2D plus time into time only, and you just have the time series data. This is what you get when you go to the data portal here: if I, for example, click somewhere here, I don't know, then, as you see, I get this thing. This is the easiest way to visualize space-time data, just having one static image and then a time series of the values for that specific query point. So we still visualize space-time data, but only at the point. Now here's a nice example. This is a package I made years ago, but I'm very sorry, I don't maintain it so well, so I'm not sure if it still runs smoothly. What I can do, I can load the package and load this data, this is called the foot-and-mouth disease data, and as
you see I can load the data, it's in a package already, and I can pick up the points, pick up the dates, and to the dates I add a new column, which is the report day of the disease. So foot-and-mouth disease, that's somewhere in the UK, where they recorded the location and time when some cow got the disease. So that's a space-time data set, and you see I can put the data into the spatiotemporal irregular data frame, STIDF, this is from the spacetime package. So I start with having just the data set, it looks like this, let me see, or it's already converted. So I have it as a simple spatial object first, this is just a SpatialPointsDataFrame. I have the 648 observations and I have the report day, but it's just a spatial object. You see some of the coordinates. If it were regular, there would have been repetition in the coordinates, but because it's irregular, every space-time point has a new coordinate. Then I convert that into the space-time data frame, and this one looks like this. So now it's a space-time object and it's a bit more complex structure, because it has the data, the values, then it has the locations, and it has the time, you see, and it even has some time index etc. So in this case it's a higher complexity object, but it's the simplest extension, and it's called the space-time irregular data frame. It's the simplest extension because basically it's just a 2D plus time point data set where time is not kept as a coordinate in the sp part but in a separate slot, which is the time slot. If you do it like that, then when you come to this example, if you use the plotKML package, you can go and make that plot. You do plotKML and then we create a KML file that I can open in Google Earth, and we
can now look at it, if Google Earth starts. Maybe I'll have to restart it, but we can look at how the values change through space-time. Let's see, how do I kill Google Earth? No, I think I have to check with Leandro how the installation of Google Earth was made. I could try to pass this data I made, let me just see if I can do it. Here I made the data set and I think I can pass it to this Windows 10, and it should be visible from my Windows. Yes. Let's see if I can open it in Windows, and in the break I will check what's up with Google Earth under Ubuntu. So here now we're looking at this foot-and-mouth disease data set. I plotted it with plotKML and on purpose I went from blue to red: blue is the early cases, red is the most recent cases. You have to remove all these other things. So here's a space-time data set, and imagine how we can visualize it. The guys who make Google Earth, they understand statistics, so you see that I have this temporal support. I can play with the temporal support, and I can scroll through, going back or forward, then it will go very fast. So why do I use this visualization? Well, this foot-and-mouth disease is a complex thing, and what do you see from it? At the beginning there were a couple of, let's say, outbreaks, only like three outbreaks, and they started spreading all around, and then they spread also further down here. And then again, here they stopped more or less. At the beginning it was very fast, then it stopped, and then there was kind of a second outbreak here. And you see, interesting, this outbreak here happened but then nothing serious happened, but when I look here, then it got bad again. And that's the way you can visualize space-time
data. Are these regions or are these objects? Come on guys, wake up, this is obvious from the image. What do I use to represent it? I use a vector structure, so obviously these are the actual cows that were observed getting sick, so these are kind of objects. But what could I do to convert it into grids? Well, I could calculate a density. For every, let's say, week, because this goes on for a couple of weeks, I mean it's from March to August, September, I could take every week and calculate the density of cases, like a kernel density, and then I have a time series. Anyway, you see it's a nice thing to be able to go from R directly to Google Earth, and you can play with that. That's something I will talk about tomorrow, I will show you plotKML. You can easily change the way you use the color, put another color scale, and you can also add elevation, I don't know, so you can play with what we call space-time aesthetics and create your own visualizations. The other very nice thing is Google Timelapse. Have you looked at Google Timelapse? You can pick any location in the world, and this one is Wageningen, and you can see the last 40 years of Landsat images, it plays an animation. I cannot zoom in more because the resolution is limited, but you can see how Wageningen changed, going all the way back to the 1980s. It starts from 1985 and you can see the Landsats are getting better, and you can see the campus growing. The campus used to be very small, of course, in the 1980s it was like two buildings, I don't know, nothing, it was just the cows there, I think, just a farm, and now it's a big campus. So you can play that video for any location in the world. And there's also, of
course, the ones Google picked as most interesting, so that would be some really big construction sites, or somewhere in, I don't know, tropical forest disappearance and things. I don't have time for that now, so I will turn it off, but please play with Google Timelapse. It's a history of the planet, except it's just history through images. There is an excellent book on space-time data by Oscar Perpiñán. He was also at our summer school one year. He has all these worked-out examples and tutorials, so please take a look at that if you're interested. The book is just behind the link and lots of things are available in the code, so you can reproduce it with your own data, and you can also make animations. If you never heard of it, it's a fantastic book, slightly outdated now, there are a lot of new developments. There's the tmap package, for which you can watch the tutorial from our summer school last week, a tutorial by Martijn Tennekes. Also an excellent package, and there is also animation in the tutorial. If I go back, then you can see these animations. And there is also a book called Geocomputation with R, also a fantastic book, by Robin Lovelace, Jakub Nowosad and Jannes Muenchow. There you can also read about how to visualize space-time, for example in your book. With R you can make an online book, a tutorial, which will have animations, so you can visualize things with animation. You don't have a classical book with a static image, but you scroll through and you actually see animations of things. So that's animated maps, you see. I'm looking at the book, and it's really magical that a picture in a book is actually an animation. It's not an interactive animation. There will be books soon where you have an interactive
space-time animation so you can play with the things like the the the things i showed you here imagine if i had uh inside the book that you can go and and play directly with the google earth or something so that would be also awesome but you will soon have books that will have that through leaflet or to some other way that you will be able to play in a book with the space-time data and with this thing i think i'm done with everything just the conclusions um so what's important what i said today well you maybe know more than me what's important but if i can emphasize myself what's very important to understand that the extending from space to time is not trivial there's a lot of things you have to teach yourself especially the time series analysis then the second thing prepare for the large data jumping from space to space time it can be from 10 times to 10 000 times more data and then if you want to do space time don't forget you cannot force any data to space-time analysis if you miss if you miss for example time for some locations you with many data sets we have to drop the data because so unfortunately surveyors didn't record the date when they went on the field and many points we have to drop from the space-time data sometimes when you drop so many data from space-time analysis let's say if you drop 80 then your slippery slope uh because you you really are trying on force to do some analysis with the data which is not fit for that analysis so that's something you should also think about um and then for to to analyze the data you can use geostatistics and i showed you i did that with the space-time trigging you can do process-based modeling or you can do hybrid methods and you can do machine learning right and we are going to focus this course we're going to focus on number two the machine learning so we're not going to talk about process-based modeling and uh we're not going to talk about geostatistics um but my interest next thing is to do hybrid methods so to 
combine machine learning with process-based modeling but that's another extra layer of complaints there are some groups that are doing it by the way experimentally but that would be another let's say paradigm in a spatial spatial temporal data science and with this thing i close the first session i will stop the live broadcast
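Editor's sketch: two ideas from the session can be illustrated in a few lines of R, namely dropping point records that have no date, and turning dated point cases into one kernel density grid per week. This is only a minimal sketch under stated assumptions and not the code used in the workshop. The data are simulated, the column names (`x`, `y`, `date`), the 0 to 100 extent, the year, and the bandwidth are all hypothetical, and `MASS::kde2d` is just one of several ways to compute a kernel density.

```r
library(MASS)  # provides kde2d(); MASS ships with standard R installations

set.seed(42)

# Hypothetical point records (simulated): coordinates plus an observation date.
n <- 300
cases <- data.frame(
  x    = runif(n, 0, 100),
  y    = runif(n, 0, 100),
  date = as.Date("2001-03-01") + sample(0:180, n, replace = TRUE)
)
# Simulate records where the surveyor did not record the date.
cases$date[sample(n, 30)] <- NA

# Records without a date cannot enter a space-time analysis, so drop them
# and keep track of how much of the data was lost.
has_date     <- !is.na(cases$date)
frac_dropped <- mean(!has_date)
st_cases     <- cases[has_date, ]

# Assign each case to a week, counted from the first observed date.
days          <- as.numeric(st_cases$date)
st_cases$week <- (days - min(days)) %/% 7

# One kernel density grid per week, all on the same extent and resolution,
# so the grids line up as a time series. Weeks without any cases are absent.
grid_lims <- c(0, 100, 0, 100)   # assumed study extent: xmin, xmax, ymin, ymax
weekly_density <- lapply(split(st_cases, st_cases$week), function(d) {
  kde2d(d$x, d$y, h = c(20, 20), n = 25, lims = grid_lims)$z
})

# Stack the weekly grids into a 3D array: x cell, y cell, week.
density_cube <- array(unlist(weekly_density),
                      dim = c(25, 25, length(weekly_density)))

frac_dropped
dim(density_cube)
```

Checking `frac_dropped` before going further is the practical form of the "slippery slope" warning: if most records had to be dropped, the remaining points may not support a space-time analysis at all.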
Info
Channel: Tomislav Hengl (OpenGeoHub Foundation)
Views: 594
Id: hRsBjHhXE30
Length: 82min 45sec (4965 seconds)
Published: Mon Sep 06 2021