The goal is for you to come away from this video understanding one of the most important formulas in all of probability: Bayes' theorem. This formula is central to scientific discovery, it's a core tool in machine learning and AI, and it's even been used for treasure hunting: in the 1980s, a small team led by Tommy Thompson used Bayesian search tactics to help uncover a ship that had sunk a century and a half earlier carrying what, in today's terms, amounts to $700,000,000 worth of gold.

So it's a formula worth understanding. But of course there are multiple levels of possible understanding. At the simplest, there's just knowing what each part means, so
you can plug in numbers. Then there's understanding why it's true; and later I'm going to show you a certain diagram that's helpful for rediscovering the formula on the fly as needed. Then there's being able to recognize when you need to use it. With the goal of gaining a deeper understanding, you and I will tackle these in reverse order.

So before dissecting the formula, or explaining the visual that makes it obvious, I'd like to tell you about a man named Steve. Listen carefully. Steve is very shy and withdrawn, invariably helpful but with very little interest in people or in the world of reality. A meek and tidy soul, he has a need for order and structure, and a passion for detail.

Which of the following do you find more likely: "Steve is a librarian", or "Steve is a farmer"? Some of you may recognize this as an example from a study conducted by the psychologists Daniel Kahneman and Amos Tversky, whose Nobel-prize-winning work was popularized in books like "Thinking, Fast and Slow" and "The Undoing Project". They researched human judgments, with a frequent focus on when those judgments irrationally contradict what the laws of probability suggest they should be. The example with Steve, the maybe-librarian-maybe-farmer,
illustrates one specific type of irrationality. Or maybe I should say "alleged" irrationality; some people debate the conclusion, but more on all that in a moment. According to Kahneman and Tversky, after people are given this description of Steve as a "meek and tidy soul", most say he is more likely to be a librarian than a farmer. After all, these traits line up better with the stereotypical view of a librarian than that of a farmer. And according to Kahneman and Tversky, this is irrational. The point is not whether people hold correct or biased views about the personalities of librarians and farmers; it's that almost no one thinks to incorporate information about the ratio of farmers to librarians into their judgments. In their paper, Kahneman and Tversky said that in the US that ratio is about 20 to 1. The numbers I can find for today put it much higher than that, but let's just run with the 20 to 1 ratio, since it's a bit easier to illustrate and proves the point just as well.

To be clear, anyone who is asked this question is not expected to have perfect information on the actual statistics of farmers, librarians, and their personality traits. But the question is whether people even think to consider this ratio, enough to make a rough estimate. Rationality is not about knowing facts; it's about recognizing which facts are relevant. If you do think to make this estimate, there's a pretty simple way to reason about the question, which, spoiler alert, involves all the
essential reasoning behind Bayes' theorem. You might start by picturing a representative sample of farmers and librarians, say, 200 farmers and 10 librarians. Then when you hear the meek-and-tidy-soul description, let's say your gut instinct is that 40% of librarians would fit that description and that 10% of farmers would. That would mean that from your sample, you'd expect about 4 librarians to fit it, and about 20 farmers. So the probability that a random person who fits this description is a librarian is 4/24, or 16.7%. Even if you think a librarian is 4 times as likely as a farmer to fit this description, that's not enough to overcome the fact that there are way more farmers. The upshot, and this is the key mantra underlying Bayes' theorem, is that new evidence should not completely determine your beliefs in a vacuum; it should update prior beliefs.

If this line of reasoning makes sense to you, the way seeing evidence restricts the space of possibilities, and the ratio you need to consider after that, then congratulations! You understand the heart of Bayes' theorem. Maybe the numbers you'd estimate would be a little different, but what matters is how you fit the numbers together to update a belief based on evidence. Here, see if you can take a minute to generalize what we just did and write it down as a formula.

The general situation where Bayes' theorem is relevant is when you have some hypothesis, say that Steve is a librarian, and you see some evidence, say this verbal description of Steve as a "meek and tidy soul", and you want to know the probability that the hypothesis holds given that the evidence is true: P(H|E). In the standard notation, this vertical bar means "given that"; we're restricting our view only to the possibilities where the evidence holds.

The first relevant number is the probability that the hypothesis holds before considering the new evidence. In our example, that was the 1/21, which came from considering the ratio of farmers to librarians in the general population. This is known as the prior, P(H). After that, we need to consider the proportion of librarians that fit this description: the probability that we would see the evidence given that the hypothesis is true, P(E|H). Again, when you see this vertical bar, it means we're talking about a proportion of a limited part of the total space of possibilities, in this case limited to the left side, where the hypothesis holds. In the context of Bayes' theorem, this value also has a special name: it's the "likelihood". Similarly, we need to know how much of the other side of our space includes the evidence: the probability of seeing the evidence given that our hypothesis isn't true, P(E|¬H). This little elbow symbol, ¬, is commonly used to mean "not" in probability.

Now remember what our final answer was. The probability that our librarian hypothesis is true given the evidence is the total number of librarians fitting the evidence, 4, divided by the total number of people fitting the
evidence, 24. Where does that 4 come from? Well, it's the total number of people, times the prior probability of being a librarian, giving us the 10 total librarians, times the probability that one of those fits the evidence. That same number shows up again in the denominator, but there we need to add in the total number of people, times the proportion who are not librarians, times the proportion of those who fit the evidence, which in our example gave 20. The total number of people in our example, 210, gets canceled out (which of course it should; that was just an arbitrary choice we made for illustration), leaving us finally with the more abstract representation purely in terms of probabilities:

P(H|E) = P(H) P(E|H) / (P(H) P(E|H) + P(¬H) P(E|¬H))

This, my friends, is Bayes' theorem. You often see this big denominator written more simply as P(E), the total probability of seeing the evidence. In practice, to calculate it, you almost always have to break it down into the case where the hypothesis is true and the one where it isn't. Piling on one final bit of jargon, this final answer is called the "posterior"; it's your belief about the hypothesis after seeing the evidence.

Writing it all out abstractly might seem more complicated than just thinking through the example directly with a representative sample, and yeah, it is! Keep in mind, though, the value of a formula like this is that it lets you quantify and systematize the idea of changing beliefs. Scientists use this formula when analyzing the extent to which new data validates or invalidates their models; programmers use it in building artificial intelligence, where you sometimes want to explicitly and numerically model a machine's belief. And honestly, just for how you view yourself, your own opinions, and what it takes for your mind to change, Bayes' theorem can reframe how you think about thought itself. Putting a formula to it is also all the more important as the examples get more intricate. However you end up writing it, I'd actually
encourage you not to memorize the formula, but to draw out this diagram as needed. This is sort of the distilled version of thinking with a representative sample, where we think with areas instead of counts, which is more flexible and easier to sketch on the fly. Rather than bringing to mind some specific number of examples, think of the space of all possibilities as a 1x1 square. Any event occupies some subset of this space, and the probability of that event can be thought of as the area of that subset. For example, I like to think of the hypothesis as filling the left part of this square, with a width of P(H). I recognize I'm being a bit repetitive, but when you see evidence, the space of possibilities gets restricted. Crucially, that restriction may not happen evenly between the left and the right. So the new probability for the hypothesis is the proportion it occupies in this restricted subspace. If you happen to think a farmer is just as likely to fit the evidence as a librarian, then the proportion doesn't change, which should make sense: irrelevant evidence doesn't change your belief. But when these likelihoods are very different, that's when your belief changes a lot. This is actually a good time to step back
and consider a few broader takeaways about how to make probability more intuitive, beyond Bayes' theorem. First off, there's the trick of thinking about a representative sample with a specific number of examples, like our 210 librarians and farmers. There's actually another Kahneman and Tversky result to this effect, which is interesting enough to interject here. They did an experiment similar to the one with Steve, but where people were given the following description of a fictitious woman named Linda:

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.

They were then asked what is more likely: that Linda is a bank teller, or that Linda is a bank teller and is active in the feminist movement. 85% of participants said the latter is more likely, even though the set of bank tellers active in the feminist movement is a subset of the set of bank tellers! But what's fascinating is that there's a simple way to rephrase the question that dropped this error from 85% to 0. If participants are instead told there are 100 people who fit this description, and are asked to estimate how many of those 100 are bank tellers, and how many are bank tellers who are active in the feminist movement, no one makes the error. Everyone correctly assigns a higher number to the first option than to the second. Somehow a phrase like "40 out of 100" kicks our intuition into gear more effectively than "40%", much less "0.4", or abstractly
referencing the idea of something being more or less likely. That said, representative samples don't easily capture the continuous nature of probability, so turning to area is a nice alternative, not just because of the continuity, but also because it's way easier to sketch out while you're puzzling over some problem.

You see, people often think of probability as being the study of uncertainty. While that is, of course, how it's applied in science, the actual math of probability is really just the math of proportions, where turning to geometry is exceedingly helpful. I mean, if you look at Bayes' theorem as a statement about proportions (proportions of people, of areas, whatever), once you digest what it's saying, it's actually kind of obvious. Both sides tell you to look at all the cases where the evidence is true, and consider the proportion where the hypothesis is also true. That's it. That's all it's saying. What's noteworthy is that such a straightforward fact about proportions can become hugely significant for science, AI, and any situation where you want to quantify belief. You'll get a better glimpse of this as we get into more examples.

But before any more examples, we have some unfinished business with Steve. Some psychologists debate Kahneman and Tversky's conclusion, that the rational thing to do is to bring to mind the ratio of farmers to librarians. They complain that the context is ambiguous. Who is Steve, exactly? Should you expect he's a randomly sampled American? Or would it be better to assume he's a friend of these two psychologists interrogating you? Or perhaps someone you're personally likely to know? This assumption determines the prior. I, for one, run into many more librarians in a given month than farmers. And needless to say, the probability of a librarian or a farmer fitting this description is highly open to interpretation. But for our purposes, understanding the math, notice how any questions worth debating can be pictured in the context of the diagram.
Questions of context shift around the prior, and questions of personalities and stereotypes shift the relevant likelihoods. All that said, whether or not you buy this particular experiment, the ultimate point, that evidence should not determine beliefs but update them, is worth tattooing in your mind. I'm in no position to say whether this does or doesn't run against natural human intuition; we'll leave that to the psychologists. What's more interesting to me is how we can reprogram our intuitions to authentically reflect the implications of math, and bringing to mind the right image can often do just that.

This is just one way to visualize Bayes' theorem, and I'd like to share with you another way that can be generalized to cases where you have more possibilities than a simple yes or no for a hypothesis, maybe even a continuous range of hypotheses. For example, say you want to update your belief about the mass of the earth based on new measurements you take. We'll also take a glimpse at the kind of constructs programmers build on top of this formula as things get more sophisticated. All this with the goal of finding that deeper understanding, and all of it in the next video.
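The Steve calculation from the transcript can be sketched as a few lines of Python. This is only an illustrative sketch: the 1/21 prior and the 40%/10% likelihoods are the video's gut-instinct numbers, not measured statistics.

```python
# Bayes' theorem on the Steve example, using the video's illustrative numbers.

def posterior(prior_h, likelihood_h, likelihood_not_h):
    """P(H|E) = P(H) P(E|H) / (P(H) P(E|H) + P(not H) P(E|not H))."""
    numerator = prior_h * likelihood_h
    evidence = numerator + (1 - prior_h) * likelihood_not_h
    return numerator / evidence

# Prior: 10 librarians per 200 farmers, so P(librarian) = 10/210 = 1/21.
prior = 10 / 210
# Likelihoods: 40% of librarians and 10% of farmers fit the description.
p = posterior(prior, 0.40, 0.10)
print(round(p, 3))  # 0.167, i.e. the 4/24 from the representative sample
```

Note that if the two likelihoods are equal, the function returns the prior unchanged, matching the point in the transcript that irrelevant evidence doesn't change your belief.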
When YouTube notifications fail, Reddit has your back.
3blue1brown all the way!
I think Bayes' theorem is the single piece of math that changed how I view the world the most. It made me realize we're all intuitively Bayesian thinkers, constantly updating our model of the world. Not just at a high level, but also at the level of perception of sensory information.
I like the formulation in terms of the odds ratios, as it gets rid of the normalization factor, i.e. the posterior odds ratio is simply the prior odds ratio times the ratio of likelihoods. Lots of probabilities we think about are far enough from 1/2 that at least intuitively there's no need to normalize.
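The odds formulation this comment describes can be checked numerically. The sketch below reuses the video's illustrative Steve numbers (prior odds 1:20, likelihood ratio 4); the variable names are my own.

```python
# Odds form of Bayes' theorem: posterior odds = prior odds * likelihood ratio.
# Numbers reuse the Steve example: 10 librarians per 200 farmers,
# with 40% of librarians and 10% of farmers fitting the description.

prior_odds = 10 / 200            # librarian-to-farmer odds, i.e. 1:20
likelihood_ratio = 0.40 / 0.10   # the ratio of likelihoods, 4

posterior_odds = prior_odds * likelihood_ratio  # 4:20, i.e. 1:5

# Converting the odds back to a probability recovers the usual posterior, 1/6.
posterior_prob = posterior_odds / (1 + posterior_odds)
print(round(posterior_prob, 4))  # 0.1667
```

As the comment notes, this form skips the normalization by P(E) entirely; you only normalize at the end, if you want a probability rather than odds.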
I'll admit to being misled by the story about Steve. For some reason, I assumed farmers and librarians came in equal numbers. But yeah, there's usually only one library per municipality, which probably only has about 5 librarians, so that's 1 librarian per few thousand people. On the other hand, a farmer can only feed a few hundred people (all very, very roughly). It's more of an indictment on how bad our intuition is at understanding large groups of people. Or how much food one farmer can make.
I'm less impressed by the Linda story. It's more of a language issue than an issue of math or intuition. Language is inherently very ambiguous, which is totally fine because we add more or less context as necessary, according to certain principles subconsciously known to all conversationalists. That's why someone unironically greeting you with "Hello fellow human" is suspicious, as it's the kind of unnecessary information that we always omit. The Linda problem is phrased in a way which flouts these principles, making an interpretation where B is the right answer not unreasonable. Let me be clear that I'm absolutely in no way contesting that P(A) is always at least as large as P(A ∧ B). Just that in any conversation or real-life problem our intuition is built for, we usually consider the choice between things like A ∧ B and A ∧ ¬B.
Hey guys, does anyone have a book recommendation for an intro to probability? I'm in my first year as a math major and I have probability next semester (I heard it's one of the toughest classes at my university).
Thanks in advance!
PS: I do have "Probabilities: The Little Numbers That Rule Our Lives" by Peter Olofsson at home right now.
I don't really like the Steve example, because in the given "rational" argument it is implied that the probability of being shown the given evidence is what it would be if you'd take a random farmer or librarian and describe that person. In reality the evidence doesn't describe an actual person chosen by random sample, but a constructed description made to somewhat fit both librarians and farmers. People are clever, do realize this, and take it into account to some extent.
That said, I do understand that these studies are hard. I was particularly thinking about how people (unconsciously, being clever) use some optimal decision reasoning to pick their answer based on their posterior distribution so it's tough to measure the average posterior distribution from such experiments; abstractly, given p(A) < p(B), A and B disjoint and always A or B, optimal choice with utility +1 for the correct answer and 0 otherwise would say to always choose B. So suppose 80% of people would think Steve being a librarian is just slightly more likely (say P(librarian | evidence) = 60%), it's reasonable (rational even!) that the distribution of answers has 80% or more librarian guesses, not reflecting the typical posterior at all.
I made this a while back: interactive bayes demo. It's like the diagrams in the video except you can move the percentages around yourself. When you change one diagram it will update the diagrams further down the page as well. I hope some of you will find it useful.
Probably this is an unpopular opinion, but this video disappointed me a little bit. When 3b1b announced his series on probability I thought that it was going to cover abstract probability (I mean in measure theoretical terms) and bring it closer to intuition. I'm aware that I am not the target audience of his videos, but I really liked the series on calculus and linear algebra precisely because he was showing how the mathematical machinery in the background works.
he finally did it