[MUSIC PLAYING] EMILY SCHECHTER:
I'm Emily Schechter. I am a product manager
on the Earth Engine team. NOEL GORELICK:
I'm Noel Gorelick, I'm one of the founders
of the Earth Engine team. EMILY SCHECHTER: And
today we're going to be giving an introduction
to machine learning and Earth Engine. So if you've never tried machine
learning before this session is for you because
I'm going to start with the very
basics of what it is and why it's useful
in Earth observation. And if all of that stuff
is a review for you, this session is also
for you because Noel is going to talk
through applications of machine learning with
classifiers in Earth Engine. So first we'll talk
about the introduction to machine learning
and then Noel will go through classification
in practice, techniques, as well as some issues
and limitations with the classifiers that
we have in Earth Engine. So first, introduction
to machine learning, why should machine
learning matter to you? Well, people like
you in organizations all over the world, whether a researcher, a government, an NGO, a nonprofit, or a business, you want
to understand the landscapes around you to make
changes that positively impact those landscapes. And so one way to do that
is by mapping land cover. Land cover is the observed
physical cover of the Earth's surface which describes
the distribution of things like vegetation and water
and human-built areas, and that distribution is a map. So we can take imagery and we
can turn the pixels into a land cover map and say, we'd
like to do this at scale say for maybe the whole
surface of the world for a time series of imagery. When you can develop maps
for a time series of imagery you can start to do things like
understanding deforestation, change in water resources,
changes in cropland, changes in carbon emission. So being able to take a
lot of imagery over time and turning a lot
of those pixels into scaled land cover
maps is really useful. And the way we're going
to talk about creating those maps at scale is
with machine learning. So what is machine learning? It's simply an approach to
making lots of small decisions. So things like is this
email spam yes or no, or maybe how much should I
bid on this auction item? The first of those
examples is classification, where you're predicting
different classes, and the second is
regression, where you're predicting a quantity. And when you're trying
to make a land cover map this is a
classification problem, where you're trying to predict
whether a pixel is maybe vegetation or water
or built urban area, and we'll do this in
our examples later on. So the actual approach for
how we get to that point is very similar to
traditional programming, you always have
some information, you have some code, a recipe
that produces an answer. So when we talk about
traditional programming, you have an engineer
that's writing rules for how you get from the
information to the answer. But when we talk
about machine learning you start at the other end. So you start from your
inputs and outputs and then you use these
inputs and outputs, and that produces
the recipe, which is the machine learning model. So machine learning
is the science of programming computers so
that they can learn from data. So a couple terms
in machine learning, the examples that the
system uses to learn are called training data. The individual predictors
in that data are called features, and then the
model is the recipe that's used to make that decision. And there are so many types
of different ML systems that it's useful to
think about them in terms of broad categories. So one type of
categorization is based on the amount and type
of human supervision they get during training. So in supervised
learning, you know what your outputs are; these
are called labels, and a typical task is
classification, where for example, I know that I want
my model to spit out classes of yes or no is this
spam or is this urban or water or vegetation
for land classification. Another typical task is
predicting a target value, which is regression. And then in
unsupervised learning, you're asking the model
to actually tell you what those output groups
are, and I might not know how many of those
groups there are. This is a great way to
do exploratory analysis. And an example of
unsupervised learning is clustering, which detects
groups of similar inputs. And for completeness,
there's also semi-supervised learning with
partially labeled data, and there's also reinforcement
learning where an agent chooses what to do in a
certain situation but we won't go into those
in too much depth yet. So for today's
example, we're going to focus on supervised
learning for that land cover classification example that
I talked about earlier. So here is a simplified view of
the machine learning workflow, and I'll go through each of
these steps using that example of land cover classification. So first you decide on the
inputs and the outputs, what's going to be my
starting information, and what will my
answer look like. Next, you gather that starting
information for your inputs, and we call this
gathering training data. Then you're going to select
your model or the type of recipe that the machine
will use to determine the relationship between the
input data and the outputs. We then use the training
data to train the model and then apply that data-- apply that model
rather, to new data, which is called predicting,
and then you see how you did. Many of you probably know
that there are really more steps hidden in here
to do things like split your training data and
to tune your model, but let's start here just to
all get on the same page first. So first, let's take that
land cover problem and decide on inputs and outputs. So let's say we're working
with satellite imagery, here we're working with a
Landsat composite image. And let's decide that
what we want to do is classify every input pixel
into an output of three classes vegetation, water, or urban. And let's say here
in Landsat we decide we're using six spectral bands. So for each pixel, we look
at the reflectance value in those six bands,
and then our model will tell us
whether the pixel is vegetation or water, or urban. So the next thing to do
is to get some training data to work with. So to get our training data,
we need to identify the places, in this case, the pixels that
we're grabbing data from. One way to do this is
using the drawing tools in the Earth Engine code
editor to handpick points where we'll grab the values
for the training data. So you can see here, this
is in the Earth Engine code editor I picked a bunch of
points that I think are water, some that I think
are vegetation, and some that I think are urban. And then at each
of these points, we'll grab the six
bands of reflectance, put that into a table, and that becomes our training data.
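As a rough sketch of what that table-building step looks like in the Earth Engine Python API: the point locations, the class codes, the 'landcover' property name, and the choice of Landsat 8 surface reflectance bands below are illustrative stand-ins, not the data used in the talk.

    import ee
    ee.Initialize()

    # Hand-picked points, as if drawn with the Code Editor drawing tools.
    # Class codes are arbitrary: 0 = water, 1 = vegetation, 2 = urban.
    points = ee.FeatureCollection([
        ee.Feature(ee.Geometry.Point([-122.50, 37.70]), {'landcover': 0}),
        ee.Feature(ee.Geometry.Point([-122.20, 37.90]), {'landcover': 1}),
        ee.Feature(ee.Geometry.Point([-122.40, 37.78]), {'landcover': 2}),
    ])

    # A Landsat composite with six reflectance bands.
    bands = ['SR_B2', 'SR_B3', 'SR_B4', 'SR_B5', 'SR_B6', 'SR_B7']
    composite = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
                 .filterDate('2021-01-01', '2022-01-01')
                 .median()
                 .select(bands))

    # Grab the band values at each point; this table is the training data.
    training = composite.sampleRegions(
        collection=points, properties=['landcover'], scale=30)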
Now, this might not be the best way because I'm handpicking the training data points, and we'll talk about that a little bit more later. So now we get to select
a model to work from. So let's say for the ease
of drawing a graph here I only had two
classes, let's say red's urban, green's vegetation,
and I wanted to create a model. So in this case, you
might imagine just drawing a simple line,
maybe I draw it right here, that's actually an ML algorithm,
it's called a Support Vector Machine, and deciding
where to draw that line is the machine learning algorithm. And this can really
highlight the importance of getting good training data
because where the line is drawn depends on the training data. So there are a bunch of
different types of algorithms, here are a few examples of
supervised learning algorithms. But what all of them are
doing is segmenting your data so that when new data comes
in, it applies the model and tells you what it
thinks the output is. So when it's time
to train the model, the algorithm is
tuning some parameters to fit the model to
the training data. So now we have
our trained model, we're ready to show
it some new data and see what it thinks so that
when a new data point comes in here the model is predicting
that it's going to be in the green vegetation class. So when we go back to
the land cover example, when we take our
trained model and we use it to predict
the land cover class for each pixel of
that Landsat image, we might get something like
this, where each pixel has been classified into vegetation,
urban, or water, represented by the three colors. So we're now left
at our final step, to assess how well we did. And depending on how well
we did we can tune the model and rinse and repeat. Now, there are a
bunch of reasons why things might go
awry since what you're doing with machine learning
is taking some data and selecting a model
to train on that data. Two types of things
that might go wrong are bad data or bad model. So what makes bad data? Well, the system
won't perform well if your training
data is too small or if the data is not
representative, it's noisy, or maybe it's polluted
with irrelevant features. Figuring out a good set
of features to train on is a critical part
of the process called feature engineering. And what makes a bad model? The model needs to be neither
too simple nor too complex, which might result in the model
overfitting or underfitting. So that was your basic class
on what machine learning is. And to show you
what this actually looks like in practice
in Earth Engine I will hand it over to Noel. NOEL GORELICK: Thank you. Hi there. [APPLAUSE] A thing we didn't
get to do at the start, I like to know my
audience a little bit. So how many of you have
already built a classifier in Earth Engine before? Everybody. OK, we'll see how
this talk goes. All right, thanks. OK, we start off the
talk with the thing you're very-- not ever supposed
to do, which is show some code. If you're trying to build a
classifier in Earth Engine, this is more or less all
the steps that you need. And in fact, maybe even
a little more than you need for the absolute basics. Instead of handpicking
a bunch of points, often you'll want to sample
from something like a land cover map. So you might choose 50,000
points from some existing land cover map to build
your training with. And so there's a
number of pieces in Earth Engine around
sampling, and we'll talk about those
more in a minute but sample is one of them. And really, you just give
it a polygon or a square and it goes and gets
every pixel in the square and puts them into a table so
that you can train with it. In this particular
example, we're going to then
slice that table up into two pieces, one
for training and another for doing holdout validation. And once they're in a
table it's quite easy to do that, you just
add a random column and then filter on that column
so that in this case, anything less than 0.7 becomes training,
and anything greater than 0.7 becomes holdout; the random numbers are between 0 and 1. And it's a uniform distribution
so this is a pretty good way to get an almost exact
number of points. If you have 10,000
points, this will give you about 7,000 points in training
but since it's statistical, it could be 6,900, could be
7,100, somewhere around there. You then need to pick
a model and train it. So in this example, we're
making a CART classifier, I'll talk more about
that in just a second. And we're training it
with that training data. And you have to tell it what
property inside the table is your supervised class. So if we sampled a bunch of
points from a land cover map, in this case, those are going
to be stored in a property that I'm pretending
is called actual. I'm pointing at this
monitor you can't see here, I'm sorry, point up there. And then you classify with it. Once you've got a
trained classifier you just apply it to
an image or a table and you can classify
the elements in those as long as they have the same
schema as your training data. Assessment is an important
part of doing a classification. And so the 30% of
data that we held out we're going to
classify, and we're going to get a
confusion matrix out of that, where comparing the
actual numbers that we got out of our land cover map,
versus the predicted numbers that the classifier produced. And then we can get an accuracy
or other types of error metrics on top of this. So these what, 10 lines, seven
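Spelled out in the Earth Engine Python API, those few lines look roughly like the sketch below. The land cover source (ESA WorldCover renamed to 'actual'), the band list, and the region are assumptions for illustration, not the code on the slide.

    import ee
    ee.Initialize()

    # Something to classify, and an existing land cover map to sample labels from.
    bands = ['SR_B2', 'SR_B3', 'SR_B4', 'SR_B5', 'SR_B6', 'SR_B7']
    composite = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
                 .filterDate('2021-01-01', '2022-01-01')
                 .median()
                 .select(bands))
    labels = ee.Image('ESA/WorldCover/v100/2020').rename('actual')
    region = ee.Geometry.Rectangle([-122.7, 37.3, -121.8, 38.0])

    # Sample pixels into a table, then split it into training and holdout.
    samples = (composite.addBands(labels)
               .sample(region=region, scale=30, numPixels=50000, seed=1)
               .randomColumn())
    training = samples.filter(ee.Filter.lt('random', 0.7))
    holdout = samples.filter(ee.Filter.gte('random', 0.7))

    # Pick a model, train it, classify the holdout data, and assess.
    classifier = ee.Classifier.smileCart().train(training, 'actual', bands)
    validated = holdout.classify(classifier)
    matrix = validated.errorMatrix('actual', 'classification')
    print('Accuracy:', matrix.accuracy().getInfo())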
lines there, eight lines. Eight lines, not including
loading up a bunch of images to classify on, that's pretty
much it and everything else is just enhancements in how
to make this classifier better and how to make it believable. You can easily make a
classifier that lies to you. And so you want to know whether
your classifier is lying or not. We have a bunch of
different classifiers, this has changed over time
as we've changed libraries that we end up using. For the most part,
people use Random Forest just because it's
easy, you don't have to think about it very much. But if you really want
to squeeze the most out of your classifier, SVM might be the way to do that, Support
Vector Machine, but that requires a lot of knob tuning. The good news is there's
a lot of knobs to tune. The bad news is there's a lot
of knobs to tune and typically straight out of the box
it doesn't work very well. In our library, CART and
Random Forest are basically the same thing,
CART is one tree, and Random Forest
is a bunch of trees. They're a little bit
different but not very. And then we have another
classifier that's commonly used for things like
spectral angle mapping, which is a thing you do with
hyperspectral data frequently, and that's built into the
Minimum Distance Classifier. We'll talk about that
in a little bit more. There are a couple more here
that you use under specialized circumstances. If you're doing species
distribution modeling, we added a Maxent classifier
about a year ago, maybe a little more. And Gradient Tree
Boosting is a version of Tree Classifiers
that has a little kick start at the beginning to
make it run a little faster. Finally, at the
bottom there, there's another classifier
that we have, which is used for loading and saving-- loading saved classifiers. So it's a classifier
called Decision Tree, and there's another one called
Decision Tree Ensemble, which will let you load an existing
CART that you've saved, or, in the Ensemble's case, a saved Random Forest. Hello. Well, I'm done. Time's up. OK, thanks for coming. We'll give those guys a second
to figure out what's going on. It's dark too. You didn't need
to see me anyway. There we go. OK, so a real simple example,
building off of the example Emily had a minute ago,
when you build a CART, and you give it
some training data, it just makes decisions
about one variable at a time, and makes what we'd
call a horizontal split. So in this example, there
were 78 input features, and the most important
decision was whether or not the near-infrared band was
greater than or equal to 0.08. And if it was, then
this classifier said straight away that whatever
it was looking at was class 2. And the rest of these nodes
in this little diagram here are nodes in a tree,
and each one of them is a test for-- an if test, and then
the highlighted numbers on the right-hand
side are the output once it's made those decisions. So 78 points went into this
classifier and it made-- well, it made 13 nodes, but
it pruned out some of them that it thought
weren't important. So that classifier
boils down to six if statements that you
could implement outside of the classifier if you really
wanted to or the classifier will apply it for you. The text representation
you see there in the tree-- I keep pointing
at this monitor-- the text representation
in that tree is actually an output you
can get out of Earth Engine once you've built the classifier. So each classifier has a
function on it called explain, which will tell you something
about the trained classifier. And so for CART,
it will show you the tree, for Random Forest,
it will show you all the trees. For different classifiers,
there are different elements in the explain vector that will
tell you about that classifier. The classifiers in
Earth Engine work both in classification
mode and regression mode most of the time. So you can actually
specify which way you want a classifier to work. So Random Forest
does classification, it will also do a random
forest of regression trees. So regression and classification are two of our output modes, but you can also ask for
a probability mode, which will go and compute how many--
in the case of Random Forest, how many trees voted
for the final output. And so if it was six trees
for and four against, then it's a 60% probability
that the answer that came out was true. This worked pretty
well for a while but then we had some
of our advanced users that really wanted
details about what the classifiers were doing. So we added a couple more modes
called Multiprobability, Raw, and Raw Regression. Multiprobability just
shows you the probability for every class at each point. Raw tells you whatever
the classifier was doing with its
internal representation. So if you have a bunch
of trees, it actually gives you the vote
from each tree. If you have a CART,
there's only-- there is no raw
interpretation but again different classifiers will
give you different answers on the raw part. And Raw Regression
is the same thing but just for regression mode. So in addition to
picking a classifier, you kind of need to
pick what kind of output you want from the classifier. This will help guide
you a little bit. Not all the classifiers
support all the output modes. In fact, the only one that
supports Raw Regression is the Random Forest because
everything else is just an output, there is no
internal representation. This will again help guide you
a little bit through that, and-- yeah, we'll keep going. Two of the classifiers I already
mentioned, one of them-- these are classifiers
we built in-house. Minimum Distance is-- it just computes the mean of each of your classes, and
then as you bring in new data, it computes the distance
to those classes. And how that
distance is computed can happen a couple
of different ways, including Spectral
Angle Mapping, which is a popular model. The thing about
Minimum Distance is that it has a regression mode, but what it actually outputs in that mode is the distance: how far away you actually were from the
class that it picked, and you can use this
to chop off things that are just too far away. One more example, I
mentioned, Decision Tree, this lets you load in existing
trees that you've saved. So if you build
a CART classifier and you call Explain, it
will tell you the tree. You can save that
string and put it back in at a later date
into this classifier and it will continue to run. So if you don't want to
retrain your classifier or you want to be absolutely
certain that the thing that you are using now is the thing
that you built before, you can use this
rather janky method to be able to save
that classifier. Same thing for
Random Forest, you can save the list-- the string
representation of all the trees and load it into the decision
tree ensemble classifier.
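A sketch of that save-and-reload round trip, assuming, as the talk implies, that explain() on a trained Smile Random Forest exposes the tree strings under a 'trees' key; the training table asset is hypothetical.

    import ee
    ee.Initialize()

    bands = ['SR_B2', 'SR_B3', 'SR_B4', 'SR_B5', 'SR_B6', 'SR_B7']
    training = ee.FeatureCollection('users/example/training_table')  # hypothetical
    trained = ee.Classifier.smileRandomForest(10).train(training, 'actual', bands)

    # Pull the text representation of the trees out of explain().
    trees = ee.List(ee.Dictionary(trained.explain()).get('trees'))

    # Save that list somewhere (for example, export it as a table), then later
    # rebuild the same classifier from the saved strings.
    restored = ee.Classifier.decisionTreeEnsemble(trees)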
OK, so a little bit more about the pieces around the classifiers. In this case, talking about
sampling and training. As Emily pointed out,
this is essentially the schematic
diagram of what we're trying to do in building
our training data. You've got some points either
hand-built or randomly picked, you are assigning
a class to each one of those either manually or from
something like an existing land cover map. And then a set of
covariates that you're sampling at each
of those points, turning it into a table, and
that table becomes our training data. How you do all that depends
on what kind of information you're starting with. So if you have one big
polygon and you just want all the pixels out of it or
some fraction of the pixels out of it, you can just
use sample and be done. It's pretty simple. You can say I want 10% of the
pixels, and in the drawing here you'll see a bunch of yellow
pixels, which are all of them, and then a couple of green ones
interspersed in there which are the just 10% sampling. Sometimes, however,
you're more likely to have a bunch of regions, maybe a
region inside of an urban area and a region inside of a forest. In these cases, you
would do Sample Regions. Sample and Sample Regions are
identical to Reduce Region and Reduce Regions in the way
they work if you're already familiar with Earth Engine. And in this case, I've got three
polygons each with a class, and then as it samples all
the pixels in that polygon it assigns that class to each
pixel and all my pixels come out with a class.
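A small sketch of both calls; the polygons, the 'landcover' property, and the 10% sampling fraction are hypothetical.

    import ee
    ee.Initialize()

    composite = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
                 .filterDate('2021-01-01', '2022-01-01').median())

    # sample: grab a fraction of the pixels inside one big region.
    aoi = ee.Geometry.Rectangle([-122.7, 37.3, -121.8, 38.0])
    some_pixels = composite.sample(region=aoi, scale=30, factor=0.1)

    # sampleRegions: several labeled polygons; every sampled pixel
    # inherits its polygon's 'landcover' property.
    polygons = ee.FeatureCollection([
        ee.Feature(ee.Geometry.Rectangle([-122.45, 37.70, -122.40, 37.75]),
                   {'landcover': 2}),
        ee.Feature(ee.Geometry.Rectangle([-122.20, 37.90, -122.15, 37.95]),
                   {'landcover': 1}),
    ])
    labeled = composite.sampleRegions(
        collection=polygons, properties=['landcover'], scale=30)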
Both of these work, like I said, they're similar to Reduce Region
and Reduce Regions and you can do the
exact same thing they do with those
two tools, but there's some problems with the
process of doing that and we got smarter over time. One of the problems is that,
as you assign random points, many of those points might
not fall on valid pixels. So in this picture, I've picked 1,000 points over
Cuba, turned vertically for you geography nuts, and a
bunch of those pixels fall out in the ocean. And so I'm not getting
1,000 pixels of Cuba, I've just got a box
and I've said give me 1,000 pixels in the box. So Sample works exactly the same
as this second little example here, it creates
1,000 random points and then samples each of
those points if you're using the random sampling part. That doesn't always
work so well. And so another thing
that you can do here to mitigate a little of that is
to just sample all the points and then filter out
some of them at the end so that you're getting close
to the number of points you actually wanted. This gets really tough as you
want exactly 10,000 points to go in your paper,
you don't want 9,750, you want exactly 10,000. And so we built a better
tool for all this. So you can use Sample, you
can use Sample Regions, you understand a little
bit how they work. But in reality, I almost
always use Stratified Sampling, even if I'm only
sampling one class I use Stratified
Sample because it knows what to do with masked pixels. So you say give me 10,000
pixels, it goes through, it looks at every pixel,
it keeps the best 10,000 at any one point
and keeps adding new points into that to make
a random pool that it chooses from. If you've got
multiple classes it does actual stratified
sampling properly. So all the things that we
learned over five or six or seven years of using Earth
Engine and trying to build classifiers we packaged
up into Stratified Sample. So it probably does what you want better than these other tools.
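A minimal stratifiedSample sketch; the label image, the class band name, the 300 points per class, and the region are illustrative assumptions.

    import ee
    ee.Initialize()

    composite = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
                 .filterDate('2021-01-01', '2022-01-01').median())
    labels = ee.Image('ESA/WorldCover/v100/2020').rename('landcover')
    region = ee.Geometry.Rectangle([-122.7, 37.3, -121.8, 38.0])

    # 300 points per class; masked pixels are handled for you.
    training = composite.addBands(labels).stratifiedSample(
        numPoints=300,
        classBand='landcover',
        region=region,
        scale=30,
        geometries=True)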
OK, onward. When you run Stratified Sample you get an output that looks
something like this. In this case, I asked for--
this was not 3,000 points, this was a few hundred points
in each of the classes. And you can see there's
huge swaths of the map where there aren't points
because I don't need them out there, I'm asking for 300
points in each class, once I've got my 300 points or if
my class is really big like this green class, the
points are quite scattered out but in the urban
class, that red class there, there aren't
really thousands of points to choose from. There's maybe a little bit
more than the 300 asked for. And so they're all
quite close together. So that's a thing
to keep in mind, that if you ask
for 10,000 points, and your map just
doesn't have 10,000 points in it for the class
that you've asked for, it's going to give you
the 700 that are there and that's it because there
aren't more points to have. There's an example in this deck
that will take you to a nice-- to this map and
show you nice ways to do stratified sampling
but that leads us to an additional point, which
is if all your points are right next to each other, if you
sample 1,000 points from one field, that's probably
not a good representative set of data. Those points are probably
spatially autocorrelated. So a few months ago,
I wrote a blog post about how to avoid
spatial autocorrelation, it's a much more
complex topic than we have time to get into right
here but essentially, you pick your points on
a really coarse grid and then you sample
those locations on a really fine grid. There's a link to
the blog post and it makes some pretty cool images. OK, a little tiny bit to say
about accuracy assessment, I actually had k-fold
cross-validation in here but it was really ugly
code so I didn't show it, but the thing to talk about
here for our accuracy assessment is that you end up
making a confusion matrix either by
calling confusion matrix on the classifier
or by calling error matrix on a validated table. They both produce a thing
called a confusion matrix but technically they're not,
one's a confusion matrix, one's an error matrix. Once you've got one of
these matrices in hand, you can query it for different
types of accuracy assessment. Accuracy, consumer's, producer's, Kappa, and I just added F-score last month. So yeah, get your-- do your accuracy assessment.
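As a sketch, these are the kinds of queries you can make on that matrix; the 'actual' and 'classification' property names and the holdout table are carried over from the earlier hypothetical example.

    import ee
    ee.Initialize()

    # Assume `validated` is a classified holdout table, as in the earlier sketch.
    validated = ee.FeatureCollection('users/example/validated_table')  # hypothetical
    matrix = validated.errorMatrix('actual', 'classification')

    print('Overall accuracy:', matrix.accuracy().getInfo())
    print("Consumer's accuracy:", matrix.consumersAccuracy().getInfo())
    print("Producer's accuracy:", matrix.producersAccuracy().getInfo())
    print('Kappa:', matrix.kappa().getInfo())
    print('F-score:', matrix.fscore().getInfo())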
When I teach this class, I didn't mention that this is usually
an eight-hour-long class, a whole day-long class. If you've taken EE102 from
me, that's a condensed version that we just kind of
cram into three hours. And so you're
getting an overview of the whole eight-hour
class in 40 minutes. So I don't know if that's
a good thing or a bad thing but when we teach the
class, 90% of what we do is getting to this
point, and then just doing things
to the classifier to increase our accuracy score. And hopefully, these are-- the rest of this talk
is some techniques that you'll be able to use in
the process of making a better classifier. OK, you with me? I know I talk fast. Lots of caffeine today. OK, so some techniques to help
you make a better classifier. These talk about temporal
context, spatial context, and then some object-based
image analysis. Most of the time you're going to
be doing something like a land cover classification
for forest, non-forest, or maybe you do water,
non-water, or different crops, or it doesn't really matter,
what matters is you're going to take something like a
year's worth of data or maybe one image but most
of you are probably working on a year's
worth of data because you have to
worry about clouds, and you're going to
try to figure out how to compress that
data down to something you can classify because
we can't classify data that looks like this. This is one pixel
in one location and there are 40 or so
points in this location but one pixel over there
might be 42 points, and the next pixel
after that, there might be 37 points because
of clouds and the way images align. And so a lot of the energy
in building a good classifier goes into engineering
your feature vector. And one of the
simplest things to do, and people do this
all the time, you'll see lots and lots of Earth
Engine based-publications where people just take a median
of a year's worth of data, and it's kind of
good enough, right? Especially if you're doing
something big and obvious like forest, non-forest,
or water, non-water, right? Where the difference
between the two is a giant jump in whatever
index you're looking at. But we can do better, right? So people do this annual median all the time and it kind of works. But one of the guys we learned
how to do classification from, Matt Hansen, he
uses a method that looks a bit more like this. He does normalizing statistics. So instead of taking one
median across the whole thing, he actually takes percentiles,
10, 25, 50, 75, and 90. And one of the
reasons this is good is that if you happen
to be in a location where there's only three pixels,
these statistics are still valid. They're not great
numbers, but you can still get a tenth percentile
of three numbers. And a 50th percentile
and a 90th percentile. So what you end up with-- what
you need to do classification is feature vectors all
of the same length. And this lets you compress
some of the information about, in this case, this NDVI time series into five numbers.
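A sketch of both composites; the Landsat collection, the NDVI computation, and the one-year window are illustrative assumptions.

    import ee
    ee.Initialize()

    collection = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
                  .filterDate('2021-01-01', '2022-01-01')
                  .map(lambda img: img.normalizedDifference(['SR_B5', 'SR_B4'])
                                      .rename('NDVI')))

    # The simple version: one annual median.
    annual_median = collection.median()

    # Percentile statistics: five numbers per pixel instead of one.
    percentiles = collection.reduce(ee.Reducer.percentile([10, 25, 50, 75, 90]))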
And this particular spot that I picked was a forest, and so the
90th and 75th percentile are way up there at
the top, and that's a good signal for the classifier
to use for forest-iness. But I think we can actually
do a little better. And so when I teach this
class what I teach well, there's how you do the
simple percentile composite. What I teach is if you're
actually looking for something that involves
phenology, something that changes over the course
of the year, then let's put something in the feature
vector that also changes over the course of the year. And so I teach
seasonal composites. And so we take all the images
from January, or January, February, and March,
and lump them together into one composite. And then April, May,
and June into another, and which months and how
big those seasons should be, maybe you only
want three seasons, maybe you just want
two pieces of the year. That's a thing to play with
but that lets you model the phenology pretty well. It's just four numbers but
that curve kind of looks like you'd expect for a forest
green up and senescence curve and it crunches all these
40-some-odd points down into four numbers per
band that you can then put into the classifier
and be classifying on something that looks and
feels a bit like phenology. If you're doing this for water,
maybe not the right thing, it needs to be something
that changes over time, but also it wouldn't be the
worst thing in the world. It would just be
spreading the same number out over four pieces
of your feature vector. Make sense? The code to do that is real simple, you just run a calendar filter on your collection for the months that you care about, shove those into an image collection, and then call toBands on the result. People actually reading code.
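A sketch of that recipe; the Landsat collection, the quarterly season boundaries, and the median per season are assumptions you would tune.

    import ee
    ee.Initialize()

    collection = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
                  .filterDate('2021-01-01', '2022-01-01'))

    def season(start_month, end_month):
        # One composite per season, via a calendar filter on the months.
        return (collection
                .filter(ee.Filter.calendarRange(start_month, end_month, 'month'))
                .median())

    seasons = ee.ImageCollection([
        season(1, 3), season(4, 6), season(7, 9), season(10, 12)])
    feature_image = seasons.toBands()  # four values per band, per pixel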
People actually reading code. I love people reading
code, that's awesome. OK, onward. A much more
sophisticated example of trying to incorporate
temporal information into your feature
vector is an algorithm that we added a few years ago
called CCDC, continuous change detection and classification,
and essentially it fits harmonics to a
longer time series, and when the time series no
longer matches the harmonics well enough, it switches to
building another harmonic. And you can classify on
these coefficients directly and it works pretty good. I won't say much more than
that, because we're actually doing a session on CCDC tomorrow
where you can come and actually do a hands-on play with a user
interface you can play with. Wednesday at 10:15. And then there are
Earth Engine users that have successfully managed
to do a full double sigmoid phenological model. I'm simply going to cite
the paper because I've never actually used this user's code
but there is code available, they managed to work it out. And as you can see,
it fits pretty good. All those are just a way to
compress your feature vector down into a smaller
number of points so that you can have a fixed
number of points to classify on. Sometimes, however,
what you're looking for is more spatially oriented
than temporally oriented. There may still be some-- once you've got a
composite, you may want to include some
spatial information in it because generally any amount
of spatial information that you can include with your classifier
is going to make it better. This is a paper that I was
reading and I found this quote to be very poignant, it's-- I think this was just someone's
PhD dissertation but-- and a bit of a throwaway quote
in it but I have found it true. Every time I'm teaching this
class on improving your feature vectors and making your
classifier better, throwing any texture in there at all, any
kind of window texture metric generally helps the classifier. Not a ton, depending on
what you're looking for, if it's just forest, non-forest,
or water, non-water, doesn't help a ton but it does help,
maybe a few percentage points. And if you are
looking for something that is object-shaped, round
farms or houses and fields, those sorts of things, then
it can make a big difference. A couple of tools
that we have built into Earth Engine for handling
spatial context, one of them is just NeighborhoodToBands. So if you've got
a three band image and you want a three-by-three
neighborhood out of it, you can just call
NeighborhoodToBands, and it will shove all of that
data into your feature vector. But that can be a lot of data. So in this case,
I've got three bands, nine points in each
band, that's adding 27 elements to my feature vector. I can suddenly have problems
with dimensionality. Random Forest is one
of those things where if you've got hundreds of
elements in your feature vector it may have trouble
figuring out what you mean without a ton of training data. So much more likely to be
useful is a reduced neighborhood operation, where instead of
just shoving all the pixels into the feature
vector, you shove some statistical computation
on top of those pixels into the feature vector. So think of this as
a windowed variance or a windowed mean or
a windowed min-max, or even the percentiles,
10th percentile and 90th percentile
of your window, that's a technique you've
already seen how to do, and some amount of windowed information will help your classifier a little bit. These work.
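A sketch of both neighborhood tools; the bands, kernel sizes, and the variance reducer are illustrative assumptions.

    import ee
    ee.Initialize()

    composite = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
                 .filterDate('2021-01-01', '2022-01-01').median())

    # neighborhoodToBands: every pixel in the 3x3 window becomes its own band.
    window = ee.Kernel.square(1)  # radius of 1 pixel -> 3x3 window
    stacked = composite.select(['SR_B4', 'SR_B5']).neighborhoodToBands(window)

    # reduceNeighborhood: a windowed statistic instead of the raw pixels.
    texture = composite.select('SR_B5').reduceNeighborhood(
        reducer=ee.Reducer.variance(), kernel=ee.Kernel.square(3))
    feature_image = composite.addBands(texture)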
I think Hansen uses variance as part of his thing and a linear
fit to get a slope. But what people typically
do to include texture is actually call our
texture algorithm. The gray level co-occurrence
matrix texture algorithm, also called Haralick,
Haralick textures. The way those work is
they're actually directional, so if you're looking for
things with directionality they're really good for that. It takes inside
of a window, pairs of pixels in each of
the possible directions. So if we've got a
three-by-three window, there's only four
possible directions but it applies those four
directions to every pixel and takes the pixel
in that direction. And computes some
statistics with it, shoves all those into basically
a two-dimensional histogram and then does statistics on
the two-dimensional histogram. Some of the things that come
out of the GLCMTexture metrics are a total energy measure, that's the first one; I don't remember what IDM is; but there's contrast, entropy, correlation, and nine others,
there's 14 measures in here. Some of them are
better than others, often it's a case of
just throw them all in, see what comes out as the most important one, and then throw the rest away.
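A minimal glcmTexture sketch; the band, the integer rescaling, and the window size are assumptions, since the algorithm expects an integer input image.

    import ee
    ee.Initialize()

    composite = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
                 .filterDate('2021-01-01', '2022-01-01').median())

    # glcmTexture wants integer input, so rescale the band first.
    gray = composite.select('SR_B5').multiply(0.01).toInt()
    glcm = gray.glcmTexture(size=3)  # one output band per texture measure

    # Throw them all in and let the classifier sort out which ones matter.
    feature_image = composite.addBands(glcm)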
That isn't the best way to do it but it works. Fast and dirty methods
to get your results. You can actually do this-- by default GLCM will
actually do this for all the possible directions
and average those numbers together, but you can
turn off the averaging and say I just want
the 45-degree numbers, or if you have a window bigger than three by three you can do smaller angles, like 22.5 I think would
be the next number. And so you can get an angle--
a texture metric specifically for an angle that you're
looking for in case you're looking for cows
that are facing north, that's a paper that came out. Turns out cows tend
to line up north/south for reasons no one
quite understands. This is one of the
ways of doing-- measuring if you have
north-facing cows in a bunch of your images. OK, onward. This is another case where
we could do a whole class, in fact, we have
done a whole class on object-based image analysis. The idea behind OBIA is
that if you've got a field, almost all the
pixels in that field have roughly the same spectrum. So you don't need to
classify all of them, you can take one representative
pixel out of that field and move on. And so you take one
representative pixel from each field or each
homogeneous area in your image. And these areas are
called superpixels. So we have an algorithm for
super-pixel generation called SNIC. You drop a bunch
of seeds, it does seed growing to find the
most homogeneous regions, and you end up with an image
that looks a bit like this. And then depending
on what you're looking for in each
of those regions, you can either classify
directly on what you've got, thereby massively reducing
your computational costs or you can compute other
kinds of patch-based metrics that you can then include in your feature vector as well.
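A rough SNIC sketch; the seed spacing, superpixel size, and compactness values are illustrative and usually need tuning.

    import ee
    ee.Initialize()

    composite = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
                 .filterDate('2021-01-01', '2022-01-01').median())

    # Drop a grid of seeds, then grow homogeneous superpixels around them.
    seeds = ee.Algorithms.Image.Segmentation.seedGrid(36)
    snic = ee.Algorithms.Image.Segmentation.SNIC(
        image=composite.select(['SR_B2', 'SR_B3', 'SR_B4', 'SR_B5']),
        size=32,
        compactness=1,
        connectivity=8,
        seeds=seeds)

    # One band identifies the cluster; the others are per-cluster band means.
    clusters = snic.select('clusters')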
I'm on a paper with folks at ETH in Zurich, where they're doing
forest patch measurements through this method. Because this is a
big topic I'm just going to actually point you
at the presentation and video that we've already done where
we launched them in 2018. OK, the important
part of this, and I think maybe the reason
many of you came, the classification
system in Earth Engine has some significant
limitations. They are generally around things
like memory limits and saving classifiers but also
the amount of control that you can have in Earth
Engine on almost anything is limited. The motto of Earth Engine
is let us handle it so you don't have to. In many cases,
you might actually want to be able to handle
a bit more of the pieces. So I'm going to talk a
little bit about that. The first problem is
a 100-megabyte limit, any training data that you're
going to-- any sampling that you do in Earth Engine
produces a computed table. And any classifier that
you produce in Earth Engine produces a computed object. Those things have to fit
inside of our caching system, and the limit on a cache
element is 100 megabytes. If you want to sample
a billion points, you're not going
to be able to do it in one go in Earth
Engine, you can do it, you're just not going
to be happy about how you have to go about doing it. One of the things you can do is
that you can train a classifier with more than one table. So if you have 300
megabytes of data, you can just split that up into
three chunks or four chunks. In this case, there's an example where we are making one table
per land cover type, and maybe we want a million
points or 10 million points in each of those. And that works out to be
more than 100 megabytes, but then you can just train the classifier four or five, six times to get all that data into the classifier.
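A sketch of that chained-training idea as described in the talk, assuming hypothetical per-class tables that are each under the cache limit; that train() accumulates data this way is taken from the talk, not from the API reference.

    import ee
    ee.Initialize()

    bands = ['SR_B2', 'SR_B3', 'SR_B4', 'SR_B5', 'SR_B6', 'SR_B7']
    # Hypothetical per-class training tables, each small enough on its own.
    tables = [
        ee.FeatureCollection('users/example/training_water'),
        ee.FeatureCollection('users/example/training_vegetation'),
        ee.FeatureCollection('users/example/training_urban'),
    ]

    # Train once, then keep calling train() to feed in the remaining tables.
    classifier = ee.Classifier.smileRandomForest(100)
    for table in tables:
        classifier = classifier.train(table, 'actual', bands)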
And that might work except often you will
classifier that is bigger than will fit in the cache. If you have data that is
noisy, I call it crunchy, you might end up with lots
of nodes in your tree, and that object might
end up being bigger than 100 megabytes. And so the next thing you can do
is tune how big of a classifier you're willing to allow. One of the good things and
bad things about Random Forest is that you can
end up with nodes that contain-- that represent
a single training instance. So you put in a million or
10 million training instances and it will keep
sorting them out until it gets to a threshold
where it doesn't make any sense to split any further. But sometimes that threshold
is one training point fell into a leaf on your tree. Often those are not very good. And so maybe you want to limit
the minimum leaf population on your tree to five, or you can just say stop at 10,000 nodes or 100,000 nodes.
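A sketch of where those knobs live on the Random Forest; the values are the ones mentioned in the talk and would need to be tuned against the 100-megabyte limit.

    import ee
    ee.Initialize()

    bands = ['SR_B2', 'SR_B3', 'SR_B4', 'SR_B5', 'SR_B6', 'SR_B7']
    training = ee.FeatureCollection('users/example/training_table')  # hypothetical

    classifier = ee.Classifier.smileRandomForest(
        numberOfTrees=100,
        minLeafPopulation=5,   # don't split leaves below 5 training points
        maxNodes=10000         # cap how big each tree is allowed to grow
    ).train(training, 'actual', bands)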
The correlation between these numbers and 100 megabytes is really tough to tell you because it depends
using, how wide they are, whether they're ints
or shorts or doubles, really tough to turn
that into a number. So these are numbers that you're
going to have to play with. If you get an out
of memory error while you're building
your classifier, the two places it can occur
are building your table or building your classifier. If you can print the
size of the table, it wasn't the table because
you can't print the size of it if it's too big. So do that first. If your table is good, then
the classifier is too big and this is how you can limit
the size of the classifier. When you're saving
classifiers right now the only way to save them is
to make a text representation of them and save out the text. And you can store that
text in cloud storage and then bring that
back in as a blob, and use the blob in decision
tree or decision tree ensemble. It's a little bit of a
roundabout method to do it but at least it exists
and you can do it. We are in the process-- one of the great things
about me talking and maybe why they don't let
me talk so much is I'll tell you about all kinds
of stuff that we're working on. One of the things
we're working on is being able to
save any classifier. It's almost done,
I don't know when it's going to be finished
but we're working on it because this is horrible. I don't like doing it, you
shouldn't like doing it but you can. All right, so the last issue
to talk about here is control. If you are just starting
out with Earth Engine, if you're just making your
first land cover maps, the path that we've carved
probably fits very well. You should be able
to sample, train, and classify in a few seconds
or maybe a few minutes if you've got a lot
of training data. But often you'll work with
that and find edge cases and want more control. And if the pieces I've already talked about don't give you the control that you need, you may need to
find other options. This happens a lot when you
have too much training data. If you really have
a billion points, doing it through the roundabout
method that I demonstrated is probably not
your best choice. You might want to use
one of the cloud products for doing a larger-scale
classification or download that data and
build your own classifier. The models that you can
build in Earth Engine, you've kind of seen them all. There are no CNNs, for instance; there are no deep learning methods
built into Earth Engine. If you want a model that's
more complex than Random Forest or Support
Vector Machine, or Gradient Tree Boosts, again,
one of our cloud-based AI platforms might be
a better choice. And finally, the
cloud-based AI platforms are popular because they have
lots and lots of features. If you want a training, what is
the curve where you know how-- accuracy versus
training curve, I don't remember what
those are called, that's tough to
make in Earth Engine because you have to train a
new classifier for each point, et cetera, et cetera. It's a one-liner in tools
like Cape and Vertex. So as you become more advanced
with your classifications in Earth Engine, it's
very easy to outgrow what we've built because we're
not trying to do everything, we're trying to do a
certain set of stuff very fast and easy and efficiently. And so the integrations
with our Cloud platforms are not so great just
yet, but we are definitely working on them trying
to make them better. And there's a session
on Tuesday at 2:30 specifically about what we've
got now, and a little bit about what's coming, right? EMILY SCHECHTER:
Right after this. NOEL GORELICK: Right after this. Tuesday, 2:30. All right, so and finally, if
you are an Earth Engine machine learning user, our
user experience folks would like to talk to you. Please check out the QR code. I think it's a
survey, and they'd like to know what you're
doing and how you're doing it. And if you're using
tools we don't have and what are the pain
points and the good points of those sorts of things. And I think that finished
much faster than either of us were expecting. So if there's any
questions, we'd be happy to try and
answer them now. There are microphones
up here on the edges. No questions? There's one. AUDIENCE: Hello,
my name is Kelsi. I'm currently at the
NASA Jet Propulsion Lab in the machine learning
group, so we're trying to work on deriving
scientific understanding from data science,
and haven't really utilized Google Earth Engine
a ton, the group itself. I have a little
bit, particularly for machine learning. So thanks a lot for the session. I'm curious if you
are going to explore, going to include
feature importance or if this is already
something that is a part of it. I'm not sure because again
I'm not very familiar. Particularly,
because when you're trying to extract
scientific analysis out of a machine learning
algorithm, Random Forest, these types of things, feature
importance is very valuable. So I'm just curious
if that's something. NOEL GORELICK: If you run
explain for Random Forest it will give you a feature
importance based on G-score. Which is not exactly
what everybody thinks of when they
want feature importance but it turns into
the same thing. To the best of my
knowledge, that's the only one of the
classifiers we have that comes with an importance
output but yeah, there is one. And it's in Random Forest. AUDIENCE: OK, thanks. AUDIENCE: Hello,
my name is Callum, I'm a contractor
at NASA Goddard. I had a question we're
working right now on downscaling air quality data,
so gridded air quality data. But one of the data
sets that we're interested in incorporating
as a potential feature is point source data. And so if I'm trying to classify
over a grid but some of my data is points, that's not
going to get sampled in a random sampling of points. So I guess, what's the best way
to incorporate data like that into one of these models? NOEL GORELICK: I don't
know, because that is not the kind of science I do. I can tell you some things
you can do and whether or not they're valid
decisions is up to you. One of the things you
can do is you can just use Kriging to try
to make a surface out of all those points. For air quality, I don't
know if that's a good choice. For water, OK, sure
it works great maybe. Another option is to blow
out, buffer those points to be a little bigger and accept the
fact that some points will fall on it and some won't. So which one am I nearest? You might even include a
distance to one of those points as part of your
feature vector, that's just a real simple computation
that Earth Engine can do to help the
classifier understand it's not near one
of these points and so maybe it's less valid.
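As a sketch of that distance-to-points predictor, a FeatureCollection can be rasterized into an extra band; the asset name, search radius, and band name are hypothetical.

    import ee
    ee.Initialize()

    composite = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
                 .filterDate('2021-01-01', '2022-01-01').median())
    sources = ee.FeatureCollection('users/example/point_sources')  # hypothetical

    # Per-pixel distance in meters to the nearest point, out to the search radius.
    dist = sources.distance(searchRadius=100000).rename('dist_to_source')
    predictors = composite.addBands(dist)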
But the remote sensing answer would be Kriging. It's just whether or
not that's a thing that you think is viable. AUDIENCE: Cool, thanks. I actually had one
more if that's OK. NOEL GORELICK: Sure. AUDIENCE: It's gone
from my brain now. Now I got to think about it. Oh, OK, so something
I'm trying to-- like having a tough time
wrapping my head around with some of these things is
we're classifying over space but you're talking a
lot about looking over time periods and stuff like
when you were doing the NDVI and taking the
percentiles or doing seasonal and stuff like that. How do you extend one of
these models over time as well as doing a
classification in space? That's something I'm still
not totally understanding. Like I guess the code
process for that? NOEL GORELICK: The reason to
include temporal information in your classification
is twofold. One is you think that
there is additional signal in the temporality, so
phenology basically. If you're classifying
on an urban area additional points
over time don't really help except, depending
on where you're working, some of those points
might be cloudy or hazy or smoggy or
something like that. And so the additional
temporality lets you try to pick a best
representative point out of some of that data or include
some of the time variance in that data. So that's why you
do the temporality. Now, I think what you're asking
for the second half of that is once you've built a
classifier you can apply it to anything you want. And so if I build
a classifier that works on air quality in spring
of 2020, I can go back in time and apply that to
2019, 2018, 2017. And so the built classifier is
built with data that you have and then you apply it
back on places where you don't have that classification. So is that kind of what
you were asking for? AUDIENCE: No, that's perfect. Thanks so much. SPEAKER 4: We're just going to
make sure that the questions from virtual-- we're going to make sure the
questions from virtual get-- EMILY SCHECHTER: Great. SPEAKER 4: --asked here. Is there any chance of having a
neural network predict function for TF Nets in Earth
Engine rather than having to access Vertex? EMILY SCHECHTER: Great question. This is a feature
request that we get. And we love to hear it because I
think it's really useful for us when we think about the
future of what our team builds and prioritization
and all of that. Obviously, we have
the TensorFlow session after this, which we'll
talk about our connection with cloud AI Platform. And I believe
they'll probably also talk about the difference
between cloud AI Platform and Vertex, and some of the
directions we're heading there. But yes, thanks for the request. It's one that we get,
so it's useful to hear more voices piping up in favor. Anything to add? SPEAKER 4: Excellent. Thanks. And one more
question from online. Is ISODATA clustering available in unsupervised classifiers in Earth Engine? NOEL GORELICK: No. ISODATA is a global model. It requires all the
training data at once. It doesn't work in Earth Engine. It requires all
the data at once. It doesn't work in the
model that Earth Engine has. So we don't have
a version of it. I believe there is
now a distributed version of ISODATA. We haven't looked at
it so no not currently. ZANDER VENTER: Zander Venter
from Norwegian Institute for Nature Research. As perhaps relates to an
earlier question but I for one am not very experienced
with deep learning models. And I was wondering
if, in the future, there might be something like
EE-neural net in the Code Editor, like small Random Forest
for like intermediate step between those who
just like myself work with regression
or classification tree simpler models. And it sort of feels
like a big step up to the TensorFlow integration, although it's flexible I understand, but
if you just want something in between to implement like
a pre-baked simple neural net and the Code Editor. Yeah, I was wondering
if that's on the horizon or whether I shouldn't
waste time in-- EMILY SCHECHTER: Yes. Same answer to the
one that I just gave before, which was roughly
asking for the same thing. I will say that I
would love to talk to you more to understand
the dimensions in which the TensorFlow with Vertex
stuff is either too difficult or too expensive or too many
different types of controls or what it is. So that as we think
about these things, we can build out what's
best for the people who will be using it. But yes, thanks for the request. NOEL GORELICK: Any
other questions? SPEAKER 4: There
are two more online. One is how dependent
is the proportion of samples in the results of
a Random Forest classifier? NOEL GORELICK: There
is a word missing. SPEAKER 4: Yeah, how dependent-- yeah, how dependent
is the accuracy maybe? Is the proportion of samples? Maybe we can clarify
that offline? NOEL GORELICK: More
samples usually makes for a better classifier. That's I think the best
I can do on that answer. SPEAKER 4: Yeah. That makes sense. And in the upcoming
session will you provide some rules
of thumb regarding when to use a deep learning
approach versus Random Forests or other simple classifiers? NOEL GORELICK:
You're doing that. EMILY SCHECHTER: Nick, says yes. NOEL GORELICK: Nick says yes. SPEAKER 4: Yes we will. EMILY SCHECHTER: So stick
around for six minutes and you'll hear
from them on that. NOEL GORELICK: All
right, thanks for coming. EMILY SCHECHTER: Thanks so much. [APPLAUSE] [MUSIC PLAYING]