>> Today on this special
Build edition of the AI Show, we get to hear from Marco Ribeiro, Senior Researcher from the
Microsoft Research Team. When you build machine
learning models, it's important to know
what they're doing. Marco, the creator of LIME or Local Interpretable
Model-Agnostic Explanations, will talk about the research, how it works and why it's
important. Make sure you tune in. [MUSIC] >> My name is Marco, I'm a Senior Researcher at
Microsoft Research. The focus of my research is helping humans interact with machine learning models meaningfully. That includes interpretability; for example, I wrote the LIME paper and a bunch of others. I'll talk a little bit
about LIME in this video, but before that I thought a little
history would be interesting. So LIME was the first paper I
wrote in my PhD, and at the time interpretability wasn't as hot as it is now; there were almost no papers
coming out on this subject, and this is how I got into it. I was doing an internship at a large tech company trying to
use machine learning for a task, and I trained the model and it worked really well in cross-validation. But whenever we sent
it to user testing, it sucked, and it took me forever to figure out what was going on. Through a long time of debugging
and trying to understand, I figured out that my model had learned to distinguish
between how we had collected positive and negative data
rather than picking up on what it should have been picking up on in the real test. So it learned a shortcut: it detected how we collected the data, and that didn't translate into the real-world use case. Now, with this experience, there were two things that really bothered me. So the first one was that cross-validation accuracy didn't translate into
real-world performance. Maybe this is obvious
for people who are doing machine learning in the real world, but as a researcher, if you take a machine
learning course, you learn that cross-validation
is a good way to evaluate models. The model has not looked at that data and it works well,
so therefore it's good. But models are really good at picking up quirks or
spurious correlations in datasets, and that can really mess up performance when your real data
does not have those quirks. But the thing that
bothered me most was that I could not figure out
what my model was doing. So here I was, a PhD student, supposed to be an expert, and it took me the longest time to understand how my model was making predictions. So I came back from my internship, and at the time I was working on
a completely different topic, distributed systems
and machine learning, but I decided that this bothered me enough that
I wanted to work on it. So I looked at the literature to
see what people had done in terms of understanding how models
were making predictions. Then I saw a bit of a gap, so I decided to change
topics and work on that, and that's how I got into
interpretability in general. So why should we bother with
interpretability at all? Interpretability has many uses; sometimes people are even forced to do it by regulation. But the use case that's dearest to my heart is the one I
was just talking about. You're training your model and you want to figure out, does it work? Is it doing something sensible? How can I improve it? I think that understanding
what the model is doing really helps all of that. It helps you avoid putting a really bad model into
production because you catch things that don't
make sense ahead of time. Machine-learning models
are really good at a particular kind of over-fitting
that I was just mentioning: picking up on spurious correlations, picking up on how we collected the data, picking up on things that
will not generalize. If you understand it, if you
can see those things happening, usually it's obvious when you see it, but even when you don't
have situations like that, if you understand why mistakes
are being made by a model, it's easier to figure
out how to fix them. So it's easier to figure out
if you need to get more data, if you need to change
model architecture, how to change it, etc. We had some experiments in the LIME
paper where we had people with no machine-learning expertise trying to decide between two models, which one is going to generalize better, and even people with no ML expertise could do really good model selection. They could also do some
feature engineering, so they could look at what
the model was doing and say, the model should not be doing this, and click to remove words. In this case it was text, so they removed words that the model should not have been using. So even people who have no machine-learning expertise at all can already do something. How much more can people who are actually training these models and applying them do? So the actual developers of models, I think, can gain a lot. Understanding what's going wrong already gets you almost half
of the way into fixing them, and I think interpretability
is really useful for that, among many other uses. So I've been talking about LIME, but I didn't even
tell you what it is. I'll just give you
the key ideas behind the technique that we use in the paper, and they are as follows. The first idea is, we're not going to try to explain everything at once. If you have a model that's
really complicated, it's a very hard task to
explain everything it's doing. Let's say I'm trying to predict
risk of defaulting on a loan; it's a complicated problem. So maybe you think, oh, if credit score is super-high or super-low it's easy, just look at credit score, but maybe in the middle it's complicated, and then you realize, oh, it interacts with time. So if your credit score is low because you're an immigrant and haven't had time to open up accounts, but you work at Microsoft (this is actually a problem that I faced as an immigrant getting a credit card approved), then maybe that should be taken into account. So you don't just
look at credit score. So if you try to explain, this is just a small example, if you try to explain all of
that at once for a good model, you're going to get into trouble. You're going to hide too much or definitions can be super complicated. So what LIME is going to
do is try to explain a single prediction. So let's say that the model thinks that I, Marco, am low-risk; we want to understand why that is, and not try to explain every single
person in my dataset at once. Then the other idea is that
we're going to treat the model as a black box, and
there's a trade-off. If you treat the
model as a black box, you cannot exploit its internals
and you need to do other things. But there's also a gain: you can explain any model. And we made the decision of explaining the model as a black box. But if the model is a black box, how can you understand it? How can you explain it? I think the only way you can do it is by perturbing the example and seeing
what the black box model does. Now, there's a lot of ways you
could perturb the example. There's a lot of things you could do, but we did a very simple thing with
LIME, and this is what we did. Let's say we want to
understand why I'm low-risk; what you do is change different features. So you change, for example, you say Marco doesn't work at Microsoft, he's unemployed, and then you see, does the risk prediction change? If it changes to high risk, that's a good indication that the model is using the fact that I work at Microsoft. But you don't want to change only one feature at a time; remember the case where maybe credit score was important, but you have to take into account whether a person is an immigrant or whether their accounts
are new and so on. So maybe I change my job and my credit history a bit, and maybe I change a lot of other things at once, and you perturb a bunch of different times and see what the model does. After many perturbations, you learn a simple explanation, a linear model that says something like: working at Microsoft lowers the risk by 0.2, being an immigrant raises it by 0.05, and so on.
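To make that concrete, here's a minimal sketch of the idea in Python. This is not the actual LIME implementation: the predict_risk function, the feature names, and the proximity kernel are made-up stand-ins, but the steps are the ones just described: perturb the instance, ask the black box for predictions, weight the perturbations by how close they stay to the original, and fit a weighted linear model whose coefficients act as the explanation.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Everything here is a stand-in: `predict_risk` plays the role of an arbitrary
# black-box model, and the binary features are a toy version of the loan example.
feature_names = ["works_at_microsoft", "is_immigrant", "new_accounts", "high_credit_score"]
x = np.array([1.0, 1.0, 1.0, 1.0])   # the instance we want to explain ("Marco")

def predict_risk(X):
    # Pretend black box: returns a risk score for each row of X.
    return 0.9 - 0.5 * X[:, 0] + 0.1 * X[:, 1] - 0.2 * X[:, 3]

rng = np.random.default_rng(0)
n_samples = 5000

# 1. Perturb the instance by randomly switching features off.
mask = rng.integers(0, 2, size=(n_samples, x.size))   # 1 = keep feature, 0 = remove it
Z = x * mask

# 2. See what the black box says about each perturbed instance.
y = predict_risk(Z)

# 3. Weight perturbations by how close they stay to the original instance.
n_removed = np.sum(mask == 0, axis=1)
weights = np.exp(-(n_removed ** 2) / 2.0)

# 4. Fit a weighted linear model; its coefficients are the local explanation.
local_model = Ridge(alpha=1.0)
local_model.fit(mask, y, sample_weight=weights)

for name, coef in zip(feature_names, local_model.coef_):
    print(f"{name}: {coef:+.3f}")
```

The actual LIME method adds details on top of this, like keeping only a small number of features so the explanation stays short, but the weighted local fit is the core idea.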
So this explanation would be describing the model for people who are close to me, people who are similar to me. The explanation has to be really faithful, really good, for people who look like me, and maybe for people who look like me, looking at zip code doesn't matter. So the explanation is not saying that zip code is irrelevant for the model; it's just saying that for people who are in the neighborhood of Marco, what matters is where you work, being an immigrant, and so on. Anyway, combining these two, the main idea is this: treat the model as a black box, perturb the example you want to explain, and then learn an explanation that really describes the model in that region, in that neighborhood, instead of trying to explain everything at once. That's a very simplified version of what LIME is and does.
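If you'd rather not write that yourself, the open-source lime package that came out of the paper does the perturbation and the local fitting for you. Here's a rough usage sketch for a tabular classifier; the toy data, the random-forest model, and the feature names below are just placeholders for your own setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

# Toy stand-in data: 4 binary features and synthetic "high risk" labels.
rng = np.random.default_rng(0)
feature_names = ["works_at_microsoft", "is_immigrant", "new_accounts", "high_credit_score"]
X_train = rng.integers(0, 2, size=(500, 4)).astype(float)
y_train = ((X_train[:, 0] == 0) & (X_train[:, 3] == 0)).astype(int)   # 1 = high risk

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

explainer = LimeTabularExplainer(
    X_train,
    feature_names=feature_names,
    class_names=["low risk", "high risk"],
    categorical_features=[0, 1, 2, 3],   # treat the binary features as categorical
    mode="classification",
)

x = np.array([1.0, 1.0, 1.0, 1.0])       # the single instance we want explained
explanation = explainer.explain_instance(x, model.predict_proba, num_features=4)
print(explanation.as_list())             # (feature description, local weight) pairs
```

For text models, the package has an analogous LimeTextExplainer that perturbs by removing words, which matches the word-removal setting mentioned earlier in this talk.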
I hope I've given you a little bit of the background and convinced you to at least consider interpretability, consider: can it help me understand why my model is making predictions? I think it can, and LIME can be a useful tool in your interpretability tool belt. There are many others as well, but I'm signing off. Thanks. [MUSIC]