Geo for Good 2022: Intro to Machine Learning and Earth Engine

Captions
[MUSIC PLAYING] EMILY SCHECHTER: I'm Emily Schechter. I am a product manager on the Earth Engine team. NOEL GORELICK: I'm Noel Gorelick, I'm one of the founders of the Earth Engine team. EMILY SCHECHTER: And today we're going to be giving an introduction to machine learning and Earth Engine. So if you've never tried machine learning before this session is for you because I'm going to start with the very basics of what it is and why it's useful in Earth observation. And if all of that stuff is a review for you, this session is also for you because Noel is going to talk through applications of machine learning with classifiers in Earth Engine. So first we'll talk about the introduction to machine learning and then Noel will go through classification in practice, techniques, as well as some issues and limitations with the classifiers that we have in Earth Engine. So first, introduction to machine learning, why should machine learning matter to you? Well, people like you in organizations all over the world, whether a researcher, government, NGO, nonprofit, a business, you want to understand the landscapes around you to make changes that positively impact those landscapes. And so one way to do that is by mapping land cover. Land cover is the observed physical cover of the Earth's surface which describes the distribution of things like vegetation and water and human-built areas, and that distribution is a map. So we can take imagery and we can turn the pixels into a land cover map and say, we'd like to do this at scale say for maybe the whole surface of the world for a time series of imagery. When you can develop maps for a time series of imagery you can start to do things like understanding deforestation, change in water resources, changes in cropland, changes in carbon emission. So being able to take a lot of imagery over time and turning a lot of those pixels into scaled land cover maps is really useful. And the way we're going to talk about creating those maps at scale is with machine learning. So what is machine learning? It's simply an approach to making lots of small decisions. So things like is this email spam yes or no, or maybe how much should I bid on this auction item? The first of those examples is classification, where you're predicting different classes, and the second is regression, where you're predicting a quantity. And when you're trying to make a land cover map this is a classification problem, where you're trying to predict whether a pixel is maybe vegetation or water or built urban area, and we'll do this in our examples later on. So the actual approach for how we get to that point is very similar to traditional programming, you always have some information, you have some code, a recipe that produces an answer. So when we talk about traditional programming, have an engineer that's writing rules for how you get from the information to the answer. But when we talk about machine learning you start at the other end. So you start from your inputs and outputs and then you use these inputs and outputs, and that produces the recipe, which is the machine learning model. So machine learning is the science of programming computers so that they can learn from data. So a couple terms in machine learning, the examples that the system uses to learn are called training data. The individual predictors in that data are called features, and then the model is the recipe that's used to make that decision. 
And there are so many types of different ML systems that it's useful to think about them in terms of broad categories. So one type of categorization is based on the amount and type of human supervision they get during training. So in supervised learning, you know what your outputs these are called labels, and a typical task is classification, where for example, I know that I want my model to spit out classes of yes or no is this spam or is this urban or water or vegetation for land classification. Another typical task is predicting a target value, which is regression. And then in unsupervised learning, you're asking the model to actually tell you what those output groups are, and I might not know how many of those groups there are. This is a great way to do exploratory analysis. And an example of unsupervised learning is clustering, which detects groups of similar inputs. And for completeness, there's also semi-supervised learning with partially labeled data, and there's also reinforcement learning where an agent chooses what to do in a certain situation but we won't go into those in too much depth yet. So for today's example, we're going to focus on supervised learning for that land cover classification example that I talked about earlier. So here is a simplified view of the machine learning workflow, and I'll go through each of these steps using that example of land cover classification. So first you decide on the inputs and the outputs, what's going to be my starting information, and what will my answer look like. Next, you gather that starting information for your inputs, and we call this gathering training data. Then you're going to select your model or the type of recipe that the machine will use to determine the relationship between the input data and the outputs. We then use the training data to train the model and then apply that data-- apply that model rather, to new data, which is called predicting, and then you see how you did. Many of you probably know that there are really more steps hidden in here to do things like split your training data and to tune your model, but let's start here just to all get on the same page first. So first, let's take that land cover problem and decide on inputs and outputs. So let's say we're working with satellite imagery, here we're working with a Landsat composite image. And let's decide that what we want to do is classify every input pixel into an output of three classes vegetation, water, or urban. And let's say here in Landsat we decide we're using six spectral bands. So for each pixel, we look at the reflectance value in those six bands, and then our model will tell us whether the pixel is vegetation or water, or urban. So the next thing to do is to get some training data to work with. So to get our training data, we need to identify the places, in this case, the pixels that we're grabbing data from. One way to do this is using the drawing tools in the Earth Engine code editor to handpick points where we'll grab the values for the training data. So you can see here, this is in the Earth Engine code editor I picked a bunch of points that I think are water, some that I think are vegetation, and some that I think are urban. And then at each of these points, we'll grab the six bands of reflectance, put that into a table and that becomes our training data. Now, this might not be the best way because I'm handpicking the training data points, and we'll talk about that a little bit more later. So now we get to select a model to work from. 
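What Emily describes here, handpicking points with the Code Editor drawing tools and grabbing six bands of reflectance at each one, might look roughly like the sketch below. This is not code from the talk: the geometry imports (water, vegetation, urban), the Landsat band names, the composite variable, and the numeric class codes are all illustrative assumptions.

```javascript
// A minimal sketch, assuming hand-drawn Code Editor geometries named `water`,
// `vegetation` and `urban`, and a Landsat composite image named `composite`
// with six (assumed) surface reflectance band names.
var bands = ['SR_B2', 'SR_B3', 'SR_B4', 'SR_B5', 'SR_B6', 'SR_B7'];

// Attach a numeric class label to each set of hand-drawn points.
var points = ee.FeatureCollection([
  ee.FeatureCollection(water).map(function (f) { return f.set('landcover', 0); }),
  ee.FeatureCollection(vegetation).map(function (f) { return f.set('landcover', 1); }),
  ee.FeatureCollection(urban).map(function (f) { return f.set('landcover', 2); })
]).flatten();

// Grab the six reflectance values under each point to build the training table.
var training = composite.select(bands).sampleRegions({
  collection: points,
  properties: ['landcover'],
  scale: 30
});
print(training.first());
```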
So let's say for the ease of drawing a graph here I only had two classes, let's say red's urban, green's vegetation, and I wanted to create a model. So in this case, you might imagine just drawing a simple line, maybe I draw it right here, that's actually an ML algorithm, it's called a Support Vector Machine, and deciding where to draw that line is the machine learning algorithm. And this can really highlight the importance of getting good training data because where the line is drawn depends on the training data. So there are a bunch of different types of algorithms, here are a few examples of supervised learning algorithms. But what all of them are doing is segmenting your data so that when new data comes in, it applies the model and tells you what it thinks the output is. So when it's time to train the model, the algorithm is tuning some parameters to fit the model to the training data. So now we have our trained model, we're ready to show it some new data and see what it thinks so that when a new data point comes in here the model is predicting that it's going to be in the green vegetation class. So when we go back to the land cover example, when we take our trained model and we use it to predict the land cover class for each pixel of that Landsat image, we might get something like this, where each pixel has been classified into vegetation, urban, or water, represented by the three colors. So we're now left at our final step, to assess how well we did. And depending on how well we did we can tune the model and rinse and repeat. Now, there are a bunch of reasons why things might go awry since what you're doing with machine learning is taking some data and selecting a model to train on that data. Two types of things that might go wrong are bad data or bad model. So what makes bad data? Well, the system won't perform well if your training data is too small or if the data is not representative, it's noisy, or maybe it's polluted with irrelevant features. Figuring out a good set of features to train on is a critical part of the process called feature engineering. And what makes a bad model? The model needs to be neither too simple nor too complex, which might result in the model overfitting or underfitting. So that was your basic class on what machine learning is. And to show you what this actually looks like in practice in Earth Engine I will hand it over to Noel. NOEL GORELICK: Thank you. Hi there. [APPLAUSE] A thing we didn't get to do at the start, I like to know my audience a little bit. So how many of you have already built a classifier in Earth Engine before? Everybody. OK, we'll see how this talk goes. All right, thanks. OK, we start off the talk with the thing you're very-- not ever supposed to do, which is show some code. If you're trying to build a classifier in Earth Engine, this is more or less all the steps that you need. And in fact, maybe even a little more than you need for the absolute basics. Instead of handpicking a bunch of points, often you'll want to sample from something like a land cover map. So you might choose 50,000 points from some existing land cover map to build your training with. And so there's a number of pieces in Earth Engine around sampling, and we'll talk about those more in a minute but sample is one of them. And really, you just give it a polygon or a square and it goes and gets every pixel in the square and puts them into a table so that you can train with it. 
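A hedged sketch of the sample() call Noel describes, where you hand it a polygon and it pulls pixels into a table. Here the label comes from an existing land cover image stacked onto the predictors; `composite`, `landcover`, `region`, the 'actual' band name, and the 50,000-point cap are assumptions for illustration.

```javascript
// Stack the class band onto the predictor bands, then sample pixels in a region.
var stack = composite.addBands(landcover.rename('actual'));
var samples = stack.sample({
  region: region,       // a polygon or rectangle to pull pixels from
  scale: 30,
  numPixels: 50000,     // illustrative cap on the size of the table
  seed: 42,
  geometries: true      // keep point geometries so the samples can be mapped
});
print('Sample size:', samples.size());
```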
In this particular example, we're going to then slice that table up into two pieces, one for training and another for doing holdout validation. And once they're in a table it's quite easy to do that, you just add a random column and then filter on that column so that in this case, anything less than 0.7 becomes training, and anything greater than 0.7 becomes holdout, the random numbers between 0 and 1. And it's a uniform distribution so this is a pretty good way to get an almost exact number of points. If you have 10,000 points, this will give you about 7,000 points in training but since it's statistical, it could be 6,900, could be 7,100, somewhere around there. You then need to pick a model and train it. So in this example, we're making a CART classifier, I'll talk more about that in just a second. And we're training it with that training data. And you have to tell it what property inside the table is your supervised class. So if we sampled a bunch of points from a land cover map, in this case, those are going to be stored in a property that I'm pretending is called actual. I'm pointing at this monitor you can't see here, I'm sorry, point up there. And then you classify with it. Once you've got a trained classifier you just apply it to an image or a table and you can classify the elements in those as long as they have the same schema as your training data. Assessment is an important part of doing a classification. And so the 30% of data that we held out we're going to classify, and we're going to get a confusion matrix out of that, where comparing the actual numbers that we got out of our land cover map, versus the predicted numbers that the classifier produced. And then we can get an accuracy or other types of error metrics on top of this. So these what, 10 lines, seven lines there, eight lines. Eight lines, not including loading up a bunch of images to classify on, that's pretty much it and everything else is just enhancements in how to make this classifier better and how to make it believable. You can easily make a classifier that lies to you. And so you want to know whether your classifier is lying or not. We have a bunch of different classifiers, this has changed over time as we've changed libraries that we end up using. For the most part, people use Random Forest just because it's easy, you don't have to think about it very much. But if you really want to squeak the most out of your classifier, SVM might be the way to do that, Support Vector Machine, but that requires a lot of knob tuning. The good news is there's a lot of knobs to tune. The bad news is there's a lot of knobs to tune and typically straight out of the box it doesn't work very well. In our library, CART and Random Forest are basically the same thing, CART is one tree, and Random Forest is a bunch of trees. They're a little bit different but not very. And then we have another classifier that's commonly used for things like spectral angle mapping, which is a thing you do with hyperspectral data frequently, and that's built into the Minimum Distance Classifier. We'll talk about that in a little bit more. There are a couple more here that you use under specialized circumstances. If you're doing species distribution modeling, we added a Maxent classifier about a year ago, maybe a little more. And Gradient Tree Boosting is a version of Tree Classifiers that has a little kick start at the beginning to make it run a little faster. 
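Read together, the "eight lines" Noel walks through might look roughly like the following sketch. It assumes a sampled table `samples` with the predictor `bands` plus the class property he pretends is called 'actual', and an image `composite` to classify; none of it is the code shown on his slide.

```javascript
// Split the table: ~70% training, ~30% holdout validation.
var withRandom = samples.randomColumn('random');
var training = withRandom.filter(ee.Filter.lt('random', 0.7));
var holdout  = withRandom.filter(ee.Filter.gte('random', 0.7));

// Pick a model and train it, naming the supervised class property.
var classifier = ee.Classifier.smileCart().train({
  features: training,
  classProperty: 'actual',
  inputProperties: bands
});

// Apply the trained classifier to an image (or to another table).
var classified = composite.select(bands).classify(classifier);

// Assess: classify the 30% holdout and compare actual vs. predicted.
var validated = holdout.classify(classifier);
var errorMatrix = validated.errorMatrix('actual', 'classification');
print('Error matrix:', errorMatrix);
print('Accuracy:', errorMatrix.accuracy());
```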
Finally, at the bottom there, there's another classifier that we have, which is used for loading and saving-- loading saved classifiers. So it's a classifier called Decision Tree, and there's another one called Decision Tree Ensemble, which will let you load an existing CART that you've saved or the Ensemble will load. Hello. Well, I'm done. Time's up. OK, thanks for coming. We'll give those guys a second to figure out what's going on. It's dark too. You didn't need to see me anyway. There we go. OK, so a real simple example, building off of the example Emily had a minute ago, when you build a CART, and you give it some training data, it just makes decisions about one variable at a time, and makes what we'd call a horizontal split. So in this example, there were 78 input features, and the most important decision was whether or not the near-infrared band was greater than or equal to 0.08. And if it was, then this classifier said straight away that whatever it was looking at was class II. And the rest of these nodes in this little diagram here are nodes in a tree, and each one of them is a test for-- an if test, and then the highlighted numbers on the right-hand side are the output once it's made those decisions. So 78 points went into this classifier and it made-- well, it made 13 nodes, but it pruned out some of them that it thought weren't important. So that classifier boils down to six if statements that you could implement outside of the classifier if you really wanted to or the classifier will apply it for you. The text representation you see there in the tree-- I keep pointing at this monitor-- the text representation in that tree is actually an output you can get out of Earth Engine once you built the classifier. So each classifier has a function on it called explain, which will tell you something about the trained classifier. And so for CART, it will show you the tree, for Random Forest, it will show you all the trees. For different classifiers, there are different elements in the explain vector that will tell you about that classifier. The classifiers in Earth Engine work both in classification mode and regression mode most of the time. So you can actually specify which way you want a classifier to work. So Random Forest does classification, it will also do a random forest of regression trees. So regression classification are two of our output modes but you can also ask for a probability mode, which will go and compute how many-- in the case of Random Forest, how many trees voted for the final output. And so if it was six trees for and four against, then it's a 60% probability that the answer that came out was true. This worked pretty well for a while but then we had some of our advanced users that really wanted details about what the classifiers were doing. So we added a couple more modes called Multiprobability, Raw, and Raw Regression. Multiprobability just shows you the probability for every class at each point. Raw tells you whatever the classifier was doing with its internal representation. So if you have a bunch of trees, it actually gives you the vote from each tree. If you have a CART, there's only-- there is no raw interpretation but again different classifiers will give you different answers on the raw part. And Raw Regression is the same thing but just for regression mode. So in addition to picking a classifier, you kind of need to pick what kind of output you want from the classifier. This will help guide you a little bit. 
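A minimal sketch of the two pieces just described: calling explain() on a trained classifier, and choosing an output mode. The training inputs (`training`, `bands`, 'actual') carry over from the earlier sketches and remain assumptions; the number of trees is arbitrary.

```javascript
// explain() describes the trained model: the tree text for CART,
// all the trees for a random forest, other elements for other classifiers.
print('Explain:', classifier.explain());

// Output mode is set on the classifier before training.
// CLASSIFICATION, REGRESSION, PROBABILITY, MULTIPROBABILITY, RAW and
// RAW_REGRESSION are the modes discussed above.
var probClassifier = ee.Classifier.smileRandomForest(10)
    .setOutputMode('PROBABILITY')
    .train({
      features: training,
      classProperty: 'actual',
      inputProperties: bands
    });
var probability = composite.select(bands).classify(probClassifier);
```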
Not all the classifiers support all the output modes. In fact, the only one that supports Raw Regression is the Random Forest because everything else is just an output, there is no internal representation. This will again help guide you a little bit through that, and-- yeah, we'll keep going. Two of the classifiers I already mentioned, one of them-- these are classifiers we built in-house. Minimum Distance is-- just computes the mean to all of your classes, and then as you bring in new data, it computes the distance to those classes. And how that distance is computed can happen a couple of different ways, including Spectral Angle Mapping, which is a popular model. The thing about Minimum Distance is that it has a mode in regression but what it actually outputs in that regression mode is the distance, how far away were you actually from the class that it picked, and you can use this to chop off things that are just too far away. One more example, I mentioned, Decision Tree, this lets you load in existing trees that you've saved. So if you build a CART classifier and you call Explain, it will tell you the tree. You can save that string and put it back in at a later date into this classifier and it will continue to run. So if you don't want to retrain your classifier or you want to be absolutely certain that the thing that you are using now is the thing that you built before, you can use this rather janky method to be able to save that classifier. Same thing for Random Forest, you can save the list-- the string representation of all the trees and load it into the decision tree ensemble classifier. OK, so a little bit more about the pieces around the classifiers. In this case, talking about sampling and training. As Emily pointed out, this is essentially the schematic diagram of what we're trying to do in building our training data. You've got some points either hand-built or randomly picked, you are assigning a class to each one of those either manually or from something like an existing land cover map. And then a set of covariates that you're sampling at each of those points, turning it into a table, and that table becomes our training data. How you do all that depends on what kind of information you're starting with. So if you have one big polygon and you just want all the pixels out of it or some fraction of the pixels out of it, you can just use sample and be done. It's pretty simple. You can say I want 10% of the pixels, and in the drawing here you'll see a bunch of yellow pixels, which are all of them, and then a couple of green ones interspersed in there which are the just 10% sampling. Sometimes, however, you're more likely to have a bunch of regions, maybe a region inside of an urban area and a region inside of a forest. In these cases, you would do Sample Regions. Sample and Sample Regions are identical to Reduce Region and Reduce Regions in the way they work if you're already familiar with Earth Engine. And in this case, I've got three polygons each with a class, and then as it samples all the pixels in that polygon it assigns that class to each pixel and all my pixels come out with a class. Both of these work, like I said, they're similar to Reduce Region and Reduce Regions and you can do the exact same thing they do with those two tools, but there's some problems with the process of doing that and we got smarter over time. One of the problems is that, as you assign random points, many of those points might not fall on valid pixels. 
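Before the sampling pitfalls, here is a hedged sketch of the Minimum Distance idea above: as I understand it, the 'cosine' metric gives the spectral-angle-style distance, and REGRESSION mode outputs the distance itself, which you can use to chop off pixels that are too far from every class. The 0.15 cutoff and the inputs are illustrative assumptions.

```javascript
var base = ee.Classifier.minimumDistance('cosine');  // spectral-angle-style metric
var trainArgs = {features: training, classProperty: 'actual', inputProperties: bands};

// REGRESSION mode: the output is the distance to the nearest class mean.
var distance = composite.select(bands)
    .classify(base.setOutputMode('REGRESSION').train(trainArgs));

// CLASSIFICATION mode gives the class; mask anything implausibly far away.
var classified = composite.select(bands)
    .classify(base.setOutputMode('CLASSIFICATION').train(trainArgs))
    .updateMask(distance.lt(0.15));
```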
So in this pixel in this picture, I've picked 1,000 points over Cuba, turned vertically for you geography nuts, and a bunch of those pixels fall out in the ocean. And so I'm not getting 1,000 pixels of Cuba, I've just got a box and I've said give me 1,000 pixels in the box. So Sample works exactly the same as this second little example here, it creates 1,000 random points and then samples each of those points if you're using the random sampling part. That doesn't always work so well. And so another thing that you can do here to mitigate a little of that is to just sample all the points and then filter out some of them at the end so that you're getting close to the number of points you actually wanted. This gets really tough as you want exactly 10,000 points to go in your paper, you don't want 9,750, you want exactly 10,000. And so we built a better tool for all this. So you can use Sample, you can use Sample Regions, you understand a little bit how they work. But in reality, I almost always use Stratified Sampling, even if I'm only sampling one class I use Stratified Sample because it knows what to do with masked pixels. So you say give me 10,000 pixels, it goes through, it looks at every pixel, it keeps the best 10,000 at any one point and keeps adding new points into that to make a random pool that it chooses from. If you've got multiple classes it does actual stratified sampling properly. So all the things that we learned over five or six or seven years of using Earth Engine and trying to build classifiers we packaged up into Stratified Sample. So it probably does what you want better than these other tools. OK, onward. When you run Stratified Sample you get an output that looks something like this. In this case, I asked for-- this was not 3,000 points, this was a few hundred points in each of the classes. And you can see there's huge swaths of the map where there aren't points because I don't need them out there, I'm asking for 300 points in each class, once I've got my 300 points or if my class is really big like this green class, the points are quite scattered out but in the urban class, that red class there, there aren't really thousands of points to choose from. There's maybe a little bit more than the 300 asked for. And so they're all quite close together. So that's a thing to keep in mind, that if you ask for 10,000 points, and your map just doesn't have 10,000 points in it for the class that you've asked for, it's going to give you the 700 that are there and that's it because there aren't more points to have. There's an example in this deck that will take you to a nice-- to this map and show you nice ways to do stratified sampling but that leads us to an additional point, which is if all your points are right next to each other, if you sample 1,000 points from one field, that's probably not a good representative set of data. Those points are probably spatially autocorrelated. So a few months ago, I wrote a blog post about how to avoid spatial autocorrelation, it's a much more complex topic than we have time to get into right here but essentially, you pick your points on a really coarse grid and then you sample those locations on a really fine grid. There's a link to the blog post and it makes some pretty cool images. 
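A minimal sketch of stratifiedSample(), which handles masked pixels and multiple classes as described above. It assumes `stack` is an image with the predictor bands plus a class band named 'actual', and reuses the 300-points-per-class figure from the example map; `region` and the other parameters are assumptions.

```javascript
var samples = stack.stratifiedSample({
  numPoints: 300,        // target number of points per class
  classBand: 'actual',
  region: region,
  scale: 30,
  seed: 1,
  geometries: true
});
// Check how many points each class actually received (small classes may
// come back with fewer than requested).
print(samples.aggregate_histogram('actual'));
```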
OK, a little tiny bit to say about accuracy assessment, I actually had k-fold cross-validation in here but it was really ugly code so I didn't show it, but the thing to talk about here for our accuracy assessment is that you end up making a confusion matrix either by calling confusion matrix on the classifier or by calling error matrix on a validated table. They both produce a thing called a confusion matrix but technically they're not, one's a confusion matrix, one's an error matrix. Once you've got one of these matrices in hand, you can query it for different types of accuracy assessment. Accuracy, consumers, producers, Capa, and I just added F-score last month. So yeah, get your-- do your accuracy assessment. When I teach this class, I didn't mention that this is usually an eight-hour-long class, a whole day-long class. If you've taken EE102 from me, that's a condensed version that we just kind of cram into three hours. And so you're getting an overview of the whole eight-hour class in 40 minutes. So I don't know if that's a good thing or a bad thing but when we teach the class, 90% of what we do is getting to this point, and then just doing things to the classifier to increase our accuracy score. And hopefully, these are-- the rest of this talk is some techniques that you'll be able to use in the process of making a better classifier. OK, you with me? I know I talk fast. Lots of caffeine today. OK, so some techniques to help you make a better classifier. These talk about temporal context, spatial context, and then some object-based image analysis. Most of the time you're going to be doing something like a land cover classification for forest, non-forest, or maybe you do water, non-water, or different crops, or it doesn't really matter, what matters is you're going to take something like a year's worth of data or maybe one image but most of you are probably working on a year's worth of data because you have to worry about clouds, and you're going to try to figure out how to compress that data down to something you can classify because we can't classify data that looks like this. This is one pixel in one location and there are 40 or so points in this location but one pixel over there might be 42 points, and the next pixel after that, there might be 37 points because of clouds and the way images align. And so a lot of the energy in building a good classifier goes into engineering your feature vector. And one of the simplest things to do, and people do this all the time, you'll see lots and lots of Earth Engine based-publications where people just take a median of a year's worth of data, and it's kind of good enough, right? Especially if you're doing something big and obvious like forest, non-forest, or water, non-water, right? Where the difference between the two is a giant jump in whatever index you're looking on. But we can do better, right? So people do this annual median all the time it kind of works. But one of the guys we learned how to do classification from, Matt Hanson, he uses a method that looks a bit more like this. He does normalizing statistics. So instead of taking one median across the whole thing, he actually takes percentiles, 10, 25, 50, 75, and 90. And one of the reasons this is good is that if you happen to be in a location where there's only three pixels, these statistics are still valid. They're not great numbers, but you can still get a tenth percentile of three numbers. And a 50th percentile and a 90th percentile. 
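A hedged sketch of those normalizing statistics: reducing a year of imagery to per-band percentiles so every pixel gets a feature vector of the same length. `collection` is assumed to be a cloud-masked ImageCollection covering the year of interest.

```javascript
// 10th/25th/50th/75th/90th percentiles of every band across the year.
var percentiles = collection.reduce(
    ee.Reducer.percentile([10, 25, 50, 75, 90]));

// Band names come out like 'SR_B5_p10', 'SR_B5_p25', ... ready to be
// sampled and classified as a fixed-length feature vector.
print(percentiles.bandNames());
```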
So what you end up with-- what you need to do classification is feature vectors all of the same length. And this lets you compress some of the information about, in this case, this NDVI time series into five numbers. And this particular spot that I picked was a forest and so the 90th and 75th percentile are way up there at the top, and that's a good signal for the classifier to use for forest-iness. But I think we can actually do a little better. And so when I teach this class what I teach well, there's how you do the simple percentile composite. What I teach is if you're actually looking for something that involves phenology, something that changes over the course of the year, then let's put something in the feature vector that also changes over the course of the year. And so I teach seasonal composites. And so we take all the images from January, or January, February, and March, and lump them together into one composite. And then April, May, and June into another, and which months and how big those seasons should be, maybe you only want three seasons, maybe you just want two pieces of the year. That's a thing to play with but that lets you model the phenology pretty well. It's just four numbers but that curve kind of looks like you'd expect for a forest green up and senescence curve and it crunches all these 40-some-odd points down into four numbers per band that you can then put into the classifier and be classifying on something that looks and feels a bit like phenology. If you're doing this for water, maybe not the right thing, it needs to be something that changes over time, but also it wouldn't be the worst thing in the world. It would just be spreading the same number out over four pieces of your feature vector. Make sense? The code to do that is real simple, you just run a calendar filter on your collection for the months that you care about, and then shove those into an image collection called toBands on the result. People actually reading code. I love people reading code, that's awesome. OK, onward. A much more sophisticated example of trying to incorporate temporal information into your feature vector is an algorithm that we added a few years ago called CCDC, continuous change detection and classification, and essentially it fits harmonics to a longer time series, and when the time series no longer matches the harmonics well enough, it switches to building another harmonic. And you can classify on these coefficients directly and it works pretty good. I won't say much more than that, because we're actually doing a session on CCDC tomorrow where you can come and actually do a hands-on play with a user interface you can play with. Wednesday at 10:15. And then there are Earth Engine users that have successfully managed to do a full double sigmoid phenological model. I'm simply going to cite the paper because I've never actually used this user's code but there is code available, they managed to work it out. And as you can see, it fits pretty good. All those are just a way to compress your feature vector down into a smaller number of points so that you can have a fixed number of points to classify on. Sometimes, however, what you're looking for is more spatially oriented than temporally oriented. There may still be some-- once you've got a composite, you may want to include some spatial information in it because generally any amount of spatial information that you can include with your classifier is going to make it better. 
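The calendar-filter-plus-toBands recipe Noel mentions might look like this sketch. The three-month season breaks and the median compositing are assumptions to play with, and `collection` is again an assumed cloud-masked ImageCollection for one year.

```javascript
// One composite per season.
var seasons = ee.ImageCollection([
  collection.filter(ee.Filter.calendarRange(1, 3, 'month')).median(),
  collection.filter(ee.Filter.calendarRange(4, 6, 'month')).median(),
  collection.filter(ee.Filter.calendarRange(7, 9, 'month')).median(),
  collection.filter(ee.Filter.calendarRange(10, 12, 'month')).median()
]);

// Flatten the four composites into a single multi-band image so the
// feature vector carries a rough phenology signal.
var seasonal = seasons.toBands();
print(seasonal.bandNames());
```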
This is a paper that I was reading and I found this quote to be very poignant, it's-- I think this was just someone's PhD dissertation but-- and a bit of a throwaway quote in it but I have found it true. Every time I'm teaching this class on improving your feature vectors and making your classifier better, throwing any texture in there at all, any kind of window texture metric generally helps the classifier. Not a ton, depending on what you're looking for, if it's just forest, non-forest, or water, non-water, doesn't help a ton but it does help, is a few percentage points. And if you are looking for something that is object-shaped, round farms or houses and fields, those sorts of things, then it can make a big difference. A couple of tools that we have built into Earth Engine for handling spatial context, one of them is just NeighborhoodToBands. So if you've got a three band image and you want a three-by-three neighborhood out of it, you can just call NeighborhoodToBands, and it will shove all of that data into your feature vector. But that can be a lot of data. So in this case, I've got three bands, nine points in each band, that's adding 27 elements to my feature vector. I can suddenly have problems with dimensionality. Random Forest is one of those things where if you've got hundreds of elements in your feature vector it may have trouble figuring out what you mean without a ton of training data. So much more likely to be useful is a reduced neighborhood operation, where instead of just shoving all the pixels into the feature vector, you shove some statistical computation on top of those pixels into the feature vector. So think of this as a windowed variance or a windowed mean or a windowed min-max, or even the percentiles, 10th percentile and 90th percentile of your window, that's a technique you've already seen how to do and some amount of windowed information that will help your classifier a little bit. These work. I think Hansen uses variants as part of his thing and a linear fit to get a slope. But what people typically do to include texture is actually call our texture algorithm. The gray level co-occurrence matrix texture algorithm, also called Haralick, Haralick textures. The way those work is they're actually directional, so if you're looking for things with directionality they're really good for that. It takes inside of a window, pairs of pixels in each of the possible directions. So if we've got a three-by-three window, there's only four possible directions but it applies those four directions to every pixel and takes the pixel in that direction. And computes some statistics with it, shoves all those into basically a two-dimensional histogram and then does statistics on the two-dimensional histogram. Some of the things that come out of the GLCMTexture metrics are a total energy measure that's the first one, I don't remember what IDM is, but there's a contrast entropy correlation and nine others, there's 14 measures in here. Some of them are better than others, often it's a case of just throw them all in, see what comes out as the most important one, and then throw the rest away. That isn't the best way to do it but it works. Fast and dirty methods to get your results. 
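A hedged sketch of adding spatial context with the tools just described: a reduced-neighborhood statistic plus a couple of GLCM texture bands. Band names, the rescaling factor, window sizes, and which texture outputs to keep are all assumptions; GLCM textures need an integer image, hence the scaling.

```javascript
// Windowed variance on every band (radius 1 pixel => a 3x3 window),
// appended as extra feature-vector bands.
var windowedVariance = composite.reduceNeighborhood({
  reducer: ee.Reducer.variance(),
  kernel: ee.Kernel.square(1)
});

// Haralick / GLCM textures on an (assumed) NIR band, rescaled to integers.
var glcm = composite.select('SR_B5')
    .multiply(512).toInt()
    .glcmTexture({size: 3});

var withTexture = composite
    .addBands(windowedVariance)
    .addBands(glcm.select(['SR_B5_contrast', 'SR_B5_ent']));
```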
You can actually do this-- by default GLCM will actually do this for all the possible directions and average those numbers together, but you can turn off the averaging and say I just want the 45-degree numbers, or if you are a window bigger than three by three you can do numbers smaller, like 22.5 I think would be the next number. And so you can get an angle-- a texture metric specifically for an angle that you're looking for in case you're looking for cows that are facing north, that's a paper that came out. Turns out cows tend to line up north/south for reasons no one quite understands. This is one of the ways of doing-- measuring if you have north-facing cows in a bunch of your images. OK, onward. This is another case where we could do a whole class, in fact, we have done a whole class on object-based image analysis. The idea behind OBIA is that if you've got a field, almost all the pixels in that field have roughly the same spectrum. So you don't need to classify all of them, you can take one representative pixel out of that field and move on. And so you take one representative pixel from each field or each homogeneous area in your image. And these areas are called superpixels. So we have an algorithm for super-pixel generation called SNIC. You drop a bunch of seeds, it does seed growing to find the most homogeneous regions, and you end up with an image that looks a bit like this. And then depending on what you're looking for in each of those regions, you can either classify directly on what you've got, thereby massively reducing your computational costs or you can compute other kinds of patch-based metrics that you can then include in your feature vector as well. I'm on a paper with folks at ETH in Zurich, where they're doing forest patch measurements through this method. Because this is a big topic I'm just going to actually point you at the presentation and video that we've already done where we launched them in 2018. OK, the important part of this, and I think maybe the reason many of you came, the classification system in Earth Engine has some significant limitations. They are generally around things like memory limits and saving classifiers but also the amount of control that you can have in Earth Engine on almost anything is limited. The motto of Earth Engine is let us handle it so you don't have to. In many cases, you might actually want to be able to handle a bit more of the pieces. So I'm going to talk a little bit about that. The first problem is a 100-megabyte limit, any training data that you're going to-- any sampling that you do in Earth Engine produces a computed table. And any classifier that you produce in Earth Engine produces a computed object. Those things have to fit inside of our caching system, and the limit on a cache element is 100 megabytes. If you want to sample a billion points, you're not going to be able to do it in one go in Earth Engine, you can do it, you're just not going to be happy about how you have to go about doing it. One of the things you can do is that you can train a classifier with more than one table. So if you have 300 megabytes of data, you can just split that up into three chunks or four chunks. In this case, is an example where we are making one table per land cover type, and maybe we want a million points or 10 million points in each of those. And that works out to be more than 100 megabytes but then you can just train the classifier four or five, six times to get all that data into the classifier. 
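A hedged sketch of that last workaround: training one classifier from several tables so that no single computed table has to fit under the 100 MB cap. The per-class tables and the 'actual' property are assumptions; as described above, each additional train() call feeds more data into the classifier built so far.

```javascript
var classifier = ee.Classifier.smileRandomForest(100);

// One table per land cover type, each trained in turn.
classifier = classifier.train(waterTable,  'actual', bands);
classifier = classifier.train(forestTable, 'actual', bands);
classifier = classifier.train(urbanTable,  'actual', bands);
classifier = classifier.train(cropTable,   'actual', bands);

var classified = composite.select(bands).classify(classifier);
```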
And that might work except often you will end up with a classifier that is bigger than will fit in the cache. If you have data that is noisy, I call it crunchy, you might end up with lots of nodes in your tree, and that object might end up being bigger than 100 megabytes. And so the next thing you can do is tune how big of a classifier you're willing to allow. One of the good things and bad things about Random Forest is that you can end up with nodes that contain-- that represent a single training instance. So you put in a million or 10 million training instances and it will keep sorting them out until it gets to a threshold where it doesn't make any sense to split any further. But sometimes that threshold is one training point fell into a leaf on your tree. Often those are not very good. And so maybe you want to limit the minimum leaf population on your tree to five nodes or you can just say stop at 10,000 nodes or 100,000 nodes. The correlation between these numbers and 100 megabytes is really tough to tell you because it matters on how many features you're using, how wide they are, whether they're ints or shorts or doubles, really tough to turn that into a number. So these are numbers that you're going to have to play with. If you get an out of memory error while you're building your classifier, the two places it can occur are building your table or building your classifier. If you can print the size of the table, it wasn't the table because you can't print the size of it if it's too big. So do that first. If your table is good, then the classifier is too big and this is how you can limit the size of the classifier. When you're saving classifiers right now the only way to save them is to make a text representation of them and save out the text. And you can store that text in cloud storage and then bring that back in as a blob, and use the blob in decision tree or decision tree ensemble. It's a little bit of a roundabout method to do it but at least it exists and you can do it. We are in the process-- one of the great things about me talking and maybe why they don't let me talk so much is I'll tell you about all kinds of stuff that we're working on. One of the things we're working on is being able to save any classifier. It's almost done, I don't know when it's going to be finished but we're working on it because this is horrible. I don't like doing it, you shouldn't like doing it but you can. All right, so the last issue to talk about here is control. If you are just starting out with Earth Engine, if you're just making your first land cover maps, the path that we've carved probably fits very well. You should be able to sample, train, and classify in a few seconds or maybe a few minutes if you've got a lot of training data. But often you'll work with that and find edge cases and want more control. And if the other pieces I haven't already talked about give you the control that you need to do that with, you may need to find other options. This happens a lot when you have too much training data. If you really have a billion points, doing it through the roundabout method that I demonstrated is probably not your best choice. You might want to use one of the cloud products for doing a larger-scale classification or download that data and build your own classifier. The models that you can build in Earth Engine, you've kind of seen them all. There are not CNNs for instance, they're not deep learning methods built into Earth Engine. 
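A hedged sketch of the two workarounds above: limiting how big the trained classifier can grow, and the text-based save/load path. The specific numbers are placeholders to experiment with, and this assumes the explain() output for a smile random forest exposes the tree strings under a 'trees' key, per the description above.

```javascript
// Cap the size of the trained model so it fits in the cache.
var bounded = ee.Classifier.smileRandomForest({
  numberOfTrees: 100,
  minLeafPopulation: 5,   // don't grow leaves for single training instances
  maxNodes: 10000         // hard cap on the size of each tree
}).train(training, 'actual', bands);

// "Saving": pull the text representation of the trees out of explain();
// that text can be exported (e.g. to Cloud Storage) and kept.
var trees = ee.List(bounded.explain().get('trees'));

// "Loading": rebuild a classifier later from the saved tree strings.
var reloaded = ee.Classifier.decisionTreeEnsemble(trees);
var classified = composite.select(bands).classify(reloaded);
```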
If you want a model that's more complex than Random Forest or Support Vector Machine, or Gradient Tree Boosts, again, one of our cloud-based AI platforms might be a better choice. And finally, the cloud-based AI platforms are popular because they have lots and lots of features. If you want a training, what is the curve where you know how-- accuracy versus training curve, I don't remember what those are called, that's tough to make in Earth Engine because you have to train a new classifier for each point, et cetera, et cetera. It's a one-liner in tools like Cape and Vertex. So as you become more advanced with your classifications in Earth Engine, it's very easy to outgrow what we've built because we're not trying to do everything, we're trying to do a certain set of stuff very fast and easy and efficiently. And so the integrations with our Cloud platforms are not so great just yet, but we are definitely working on them trying to make them better. And there's a session on Tuesday at 2:30 specifically about what we've got now, and a little bit about what's coming, right? EMILY SCHECHTER: Right after this. NOEL GORELICK: Right after this. Tuesday, 2:30. All right, so and finally, if you are an Earth Engine machine learning user, our user experience folks would like to talk to you. Please check out the QR code. I think it's a survey, and they'd like to know what you're doing and how you're doing it. And if you're using tools we don't have and what are the pain points and the good points of those sorts of things. And I think that finished much faster than either of us were expecting. So if there's any questions, we'd be happy to try and answer them now. There are microphones up here on the edges. No questions? There's one. AUDIENCE: Hello, my name is Kelsi. I'm currently at the NASA Jet Propulsion Lab in the machine learning group, so we're trying to work on deriving scientific understanding from data science, and haven't really utilized Google Earth Engine a ton, the group itself. I have a little bit, particularly for machine learning. So thanks a lot for the session. I'm curious if you are going to explore, going to include feature importance or if this is already something that is a part of it. I'm not sure because again I'm not very familiar. Particularly, because when you're trying to extract scientific analysis out of a machine learning algorithm, Random Forest, these types of things, feature importance is very valuable. So I'm just curious if that's something. NOEL GORELICK: If you run explain for Random Forest it will give you a feature importance based on G-score. Which is not exactly what everybody thinks of when they want feature importance but it turns into the same thing. To the best of my knowledge, that's the only one of the classifiers we have that comes with an importance output but yeah, there is one. And it's in Random Forest. AUDIENCE: OK, thanks. AUDIENCE: Hello, my name is Callum, I'm a contractor at NASA Goddard. I had a question we're working right now on downscaling air quality data, so gridded air quality data. But one of the data sets that we're interested in incorporating as a potential feature is point source data. And so if I'm trying to classify over a grid but some of my data is points, that's not going to get sampled in a random sampling of points. So I guess, what's the best way to incorporate data like that into one of these models? NOEL GORELICK: I don't know, because that is not the kind of science I do. 
I can tell you some things you can do and whether or not they're valid decisions is up to you. One of the things you can do is you can just use Kriging to try to make a surface out of all those points. For air quality, I don't know if that's a good choice. For water, OK, sure it works great maybe. Another option is to blow out, buffer those points to be a little bigger and accept the fact that some points will fall on it and some won't. So which one am I nearest? You might even include a distance to one of those points as part of your feature vector, that's just a real simple computation that Earth Engine can do to help the classifier understand it's not near one of these points and so maybe it's less valid. But the remote sensing answer would be Kriging. It's just whether or not that's a thing that you think is viable. AUDIENCE: Cool, thanks. I actually had one more if that's OK. NOEL GORELICK: Sure. AUDIENCE: It's gone from my brain now. Now I got to think about it. Oh, OK, so something I'm trying to-- like having a tough time wrapping my head around with some of these things is we're classifying over space but you're talking a lot about looking over time periods and stuff like when you were doing the NDVI and taking the percentiles or doing seasonal and stuff like that. How do you extend one of these models over time as well as doing a classification in space? That's something I'm still not totally understanding. Like I guess the code process for that? NOEL GORELICK: The reason to include temporal information in your classification is twofold. One is you think that there is additional signal in the temporality, so phenology basically. If you're classifying on an urban area additional points over time don't really help except, depending on where you're working, some of those points might be cloudy or hazy or smoggy or something like that. And so the additional temporality lets you try to pick a best representative point out of some of that data or include some of the time variance in that data. So that's why you do the temporality. Now, I think what you're asking for the second half of that is once you've built a classifier you can apply it to anything you want. And so if I build a classifier that works on air quality in spring of 2020, I can go back in time and apply that to 2019, 2018, 2017. And so the built classifier is built with data that you have and then you apply it back on places where you don't have that classification. So is that kind of what you were asking for? AUDIENCE: No, that's perfect. Thanks so much. SPEAKER 4: We're just going to make sure that the questions from virtual-- we're going to make sure the questions from virtual get-- EMILY SCHECHTER: Great. SPEAKER 4: --asked here. Is there any chance of having a neural network predict function for TF Nets in Earth Engine rather than having to access Vertex? EMILY SCHECHTER: Great question. This is a feature request that we get. And we love to hear it because I think it's really useful for us when we think about the future of what our team builds and prioritization and all of that. Obviously, we have the TensorFlow session after this, which we'll talk about our connection with cloud AI Platform. And I believe they'll probably also talk about the difference between cloud AI Platform and Vertex, and some of the directions we're heading there. But yes, thanks for the request. It's one that we get, so it's useful to hear more voices piping up in favor. Anything to add? SPEAKER 4: Excellent. Thanks. 
And one more question from online. Is ISO data clustering available in unsupervised classifiers in Earth Engine? NOEL GORELICK: No. ISO data is a global model. It requires all the training data at once. It doesn't work in Earth Engine. It requires all the data at once. It doesn't work in the model that Earth Engine has. So we don't have a version of it. I believe there is now a distributed version of ISO data. We haven't looked at it so no not currently. ZANDER VENTER: Zander Venter from Norwegian Institute for Nature Research. As perhaps relates to an earlier question but I for one am not very experienced with deep learning models. And I was wondering if, in the future, there might be something like EE-neural net in the Code Editor, like small Random Forest for like intermediate step between those who just like myself work with regression or classification tree simpler models. And it sort of feels like a big step up to with the TensorFlow integration, although it's flexible I understand but if you just want something in between to implement like a pre-baked simple neural net and the Code Editor. Yeah, I was wondering if that's on the horizon or whether I shouldn't waste time in-- EMILY SCHECHTER: Yes. Same answer to the one that I just gave before, which was roughly asking for the same thing. I will say that I would love to talk to you more to understand the dimensions in which the TensorFlow with Vertex stuff is either too difficult or too expensive or too many different types of controls or what it is. So that as we think about these things, we can build out what's best for the people who will be using it. But yes, thanks for the request. NOEL GORELICK: Any other questions? SPEAKER 4: There are two more online. One is how dependent is the proportion of samples in the results of a Random Forest classifier? NOEL GORELICK: There is a word missing. SPEAKER 4: Yeah, how dependent-- yeah, how dependent is the accuracy maybe? Is the proportion of samples? Maybe we can clarify that offline? NOEL GORELICK: More samples usually makes for a better classifier. That's I think the best I can do on that answer. SPEAKER 4: Yeah. That makes sense. And in the upcoming session will you provide some rules of thumb regarding when to use a deep learning approach versus Random Forests or other simple classifiers? NOEL GORELICK: You're doing that. EMILY SCHECHTER: Nick, says yes. NOEL GORELICK: Nick says yes. SPEAKER 4: Yes we will. EMILY SCHECHTER: So stick around for six minutes and you'll hear from them on that. NOEL GORELICK: All right, thanks for coming. EMILY SCHECHTER: Thanks so much. [APPLAUSE] [MUSIC PLAYING]
Info
Channel: Google Earth
Views: 5,638
Id: WvaBZbph_cU
Length: 54min 36sec (3276 seconds)
Published: Fri Oct 21 2022