Hello, and welcome to this AI Coffee Break! Today, we will talk about a way to better
see how machine learning models are making predictions. This is important because we have more and
more AI models around us. They are increasingly complex and are used
in applications such as healthcare, finance, or ad targeting. But do we know which parts of my browsing history an AI based its decision on when it showed me this ad? Or why a model predicted that I have a 40%
chance of having diabetes? This is where interpretability comes in. It helps us understand how these models work
and why they make certain predictions. In this video, we will give a short, high-level introduction to Shapley Values, a method that works for ANY model. We will first show some code, and if you keep
watching, you will see what model interpretation has to do with games. Yes, games. But first, let’s thank our sponsor of today,
AssemblyAI! Just last month, they released the Universal-1
automatic speech recognition (ASR) model! It offers more than 92.5% accuracy with only
30.4 seconds of latency thanks to its effective parallelization during inference. Accuracy is important when it comes to understanding
my Eastern European accent pronouncing technical words like "GAN", "GPT model", or "RLHF". Beyond accuracy, to me the most useful
part is that it was pre-trained on 12.5M hours of multilingual audio data (that’s ~3 petabytes!!). I personally am most excited about Universal-1’s
code-switching capabilities, namely that it can transcribe different languages in the
same sentence. Just look: this is so useful for multilingual speakers like
me! Check out Universal-1 yourself in Assembly
AI’s playground. It’s very simple to use. Also, know that AssemblyAI offers two tiers
to use Universal-1. "Best" for the most accurate tier and "Nano"
for the fast, lightweight offering which is less expensive. Nano is perfect for batch processing of audio
that does not need the highest quality of speech-to-text. Check out Assembly AI and their new model
with the link in the description below! Now, back to the video. Imagine you have a model that takes some inputs
such as age, sex, and body mass index, and predicts the probability of
diabetes. You want to know how much each of these inputs
contributes to the model's prediction. Shapley values can tell you exactly how much
each input contributes to the model's prediction of, let's say, a 40% probability of diabetes. But unlike other ML interpretability methods,
Shapley Values have numerous advantages. They are model-agnostic, meaning they can
be applied to any model and also to any modality, like text, images,
and so on. And these values are meaningful (unlike those
outputted by methods based on gradients or attention, where the numbers are hard to interpret):
positive values are for features that contribute towards the outcome, while negative values
are for features that try to decrease the outcome. Even better, these values are a fair distribution
of the model's prediction among the input features. Specifically, if we take the value for age, add the value for sex (so we effectively subtract, because we are adding a negative value), and add the values for BP and BMI, we get the model's prediction ... up to the so-called base rate. The base rate is what the model outputs when all inputs are ablated, so it gets no feature information, but more about this later. So, the overall idea is that we start from this base value, add the Shapley values for each input, and we get the model's prediction.
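To see this additivity with concrete numbers, here is a tiny sketch in Python. The feature names match the diabetes example, but the base rate and the Shapley values are purely made up for illustration.

```python
# Purely illustrative numbers for the diabetes example: the point is only that
# the base rate plus the per-feature Shapley values reconstructs the prediction.
base_rate = 0.25                                              # hypothetical output with no feature information
shapley_values = {"age": 0.10, "sex": -0.03, "BP": 0.05, "BMI": 0.03}

prediction = base_rate + sum(shapley_values.values())
print(f"{prediction:.2f}")                                    # 0.40 -> the 40% diabetes probability
```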
Now, how does this look for a more complicated model, such as a LLaMA 2 language model? Let's see how we can use the SHAP library to interpret its predictions. First, we need to install the SHAP library; then we load the model and the tokenizer, and we define a function that takes a sentence and returns the model's prediction. We then use the SHAP library to explain the model's prediction for a given sentence.
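If you want to follow along at home, the cells roughly look like the sketch below. This is not the exact notebook from the video (that one, together with our slightly modified shap package for LLaMA 2, is linked in the description); the model name and the example sentence are placeholders, and a small model like GPT-2 is a convenient stand-in if you don't have LLaMA 2 access.

```python
# Rough sketch of explaining a causal language model with shap, following the
# library's text-generation examples. Model name and prompt are placeholders;
# the video used a lightly modified shap package to handle LLaMA 2.
import shap            # pip install shap
import transformers    # pip install transformers

model_name = "gpt2"    # stand-in; swap in e.g. "meta-llama/Llama-2-7b-hf" if you have access
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
model = transformers.AutoModelForCausalLM.from_pretrained(model_name)
model.config.is_decoder = True   # treat the model as a text-generation (decoder) model

# shap wraps the Hugging Face model + tokenizer directly
explainer = shap.Explainer(model, tokenizer)

# One set of Shapley values over the input tokens for each generated token
shap_values = explainer(["This weekend I will be"])

# Interactive text plot: red pushes a token up, blue pushes it down
shap.plots.text(shap_values)
```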
We can see the input here and the output here, and the SHAP library returns the Shapley values for each token in the sentence; we can use these values to understand how the model makes its predictions. Because this is a language model that predicts token after token, we get one set of Shapley values over the input tokens for each predicted token. This is a force plot, where we start from
the base value. This is the probability the model assigns
to the outputted word “studying” when there is absolutely no input to the model. Then, we add the red contributions, subtract
the blue contributions (because they are negative), and we get the logit of the model predicting
this output token. Neat! We link to this code in the description below
if you want to play with it. There, we minimally modified the shap library
to make it work for modern language models, such as LLaMA 2, so we provide that package
as well. Maybe soon, the SHAP library will support
them by default. Now, let's dig a little into the theory behind
the code we've just seen and explain how Shapley Values are computed. Shapley values stem from far before deep learning
was cool, namely from 1953, when Lloyd Shapley was thinking about how to fairly distribute
the winnings of a game among the players. So, let's start with an example game: a one-sided
soccer game, where we have a team of robot players that cooperate and try to shoot as
many goals as they can. Based on how well players do in the game,
we want to reward them appropriately. But how do we determine how much each player
contributed towards the outcome? Well, first we need to determine the base
value, that is, the outcome of the game when nobody is playing. Then we can determine the contribution of each player by looking at all possible coalitions of players and seeing how much the outcome changes when we add that player to the coalition. Then we can reward the players accordingly.
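To make this concrete with a tiny made-up example: suppose robot A scores 2 goals alone, robot B scores 1 goal alone, and together they score 4. A's marginal contribution is 2 when it joins an empty team and 4 - 1 = 3 when it joins B, so its fair reward is the average, 2.5 goals; for B it is (1 + 2) / 2 = 1.5. Notice that 2.5 + 1.5 adds up exactly to the 4 goals the full team scored.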
To get to the formula behind all this, let's first switch to machine learning. In ML, players are inputs or features, for
example word tokens. The outcome of the game is the model's prediction,
for example the probability that this sentence expresses positive sentiment. The importance of an input is based on how
much it contributed towards this prediction and this is what we want to calculate now. To compute the Shapley value for a player,
let's say this one, we do the following: We look at what the prediction of the model
is when this player is active, versus when it is inactive. Then the so-called marginal contribution of
the player is the difference between these two predictions, which is zero in this case,
because the presence or absence of the token "my" did not change anything. But you know, maybe a player only seems unimportant because Messi and Ronaldo are already on the team, so to really determine the effect of the token of interest on the outcome, we need to look at all possible coalitions of players and
see how much the outcome changes when we add and remove the other players from the coalition
as well. Now, for this coalition, the token "my" does show a marginal contribution. And we do this exhaustively: for all possible teams, we sum up the marginal contributions, normalize by a factor that takes care of the combinatorial effect, and we get the Shapley value phi for the token of interest. Done.
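For the curious, here is what that recipe looks like as a small, self-contained piece of Python (this is the plain Shapley formula, not the shap library). The value function and its scores are hypothetical stand-ins for "the model's prediction when only these tokens are present".

```python
# Toy exact Shapley computation by enumerating every coalition.
# Formula: phi_i = sum over S (subsets of players without i) of
#          |S|! * (n - |S| - 1)! / n! * (v(S + {i}) - v(S))
from itertools import combinations
from math import factorial

players = ["I", "love", "my", "dog"]

def v(coalition):
    # Hypothetical value function: stands in for the model's prediction
    # when only the tokens in `coalition` are present.
    scores = {"I": 0.0, "love": 0.5, "my": 0.05, "dog": 0.3}
    return sum(scores[token] for token in coalition)

def shapley(player):
    n = len(players)
    others = [p for p in players if p != player]
    phi = 0.0
    for size in range(n):                       # coalition sizes 0 .. n-1
        for S in combinations(others, size):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            phi += weight * (v(set(S) | {player}) - v(set(S)))
    return phi

for token in players:
    print(token, round(shapley(token), 3))
# The values sum to v(all players): the "fair distribution" property.
```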
Now we have the Shapley value for the token “my”, and what we did for that token, we can do for all other tokens in the sentence
to get the Shapley values for all of them. Together, they tell us how much each token
contributed to the model's prediction. They are positive if they contributed towards
increasing this probability, negative when they decreased it, and zero if they did not
change anything. Now, you should know that while Shapley
Values are awesome because they can work for any model and have these wonderful properties,
in practice they do have some problems. Namely, we said before that we need to evaluate all possible teams that the token "my" can join, and even in this case, with just 3 possible teammates for "my", the number of possible coalitions is 2 to the
power of 3, so 8. In other words, the number of coalitions grows
exponentially with sequence length! So, in practice, we approximate the Shapley values with Monte Carlo sampling: we evaluate only as many coalitions as needed to get the first digits of the Shapley values right (in the same way in which we do not need to compute all digits of pi).
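As a rough sketch of what such an approximation could look like (again with a hypothetical toy value function rather than a real model), one common recipe samples random orderings of the players and averages the marginal contribution of the token of interest:

```python
# Monte Carlo (permutation sampling) approximation of a Shapley value.
# The value function is a hypothetical stand-in for the model's prediction.
import random

players = ["I", "love", "my", "dog"]

def v(coalition):
    scores = {"I": 0.0, "love": 0.5, "my": 0.05, "dog": 0.3}
    return sum(scores[token] for token in coalition)

def shapley_mc(player, n_samples=1000):
    total = 0.0
    for _ in range(n_samples):
        order = random.sample(players, len(players))    # a random ordering of the team
        arrived_before = set(order[:order.index(player)])
        # Marginal contribution of `player` in this particular ordering
        total += v(arrived_before | {player}) - v(arrived_before)
    return total / n_samples                            # average over sampled orderings

print(shapley_mc("my"))   # approaches the exact Shapley value as n_samples grows
```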
The other problem is that Shapley values assume that the input features are independent, and that they can be safely put together or
ablated. But in reality, this is not the case, as some
input features are correlated: for example, the name "New York" is composed of two tokens, and if we form a coalition with just "York" but delete "New", we basically have a degenerate team; we have split up teammates that should never be split. In practice, it is hard to determine these
correlations and keep tokens together. The shap library, for example, handles this by first clustering the inputs; for a given cluster, it then either ablates the entire cluster or keeps it intact. Of course, there are lots of extensions that
try to do this even better. Now, this was our short introduction to the huge topic of ML interpretability, and if you want more details, go out and explore. I have a thesis to submit now, so I need to go,
but I’ll let Ms. Coffee Bean put a link in the description to a great starting reference
on this topic. See you with our next video. Okay, bye!