The goal is for you to come away from this video understanding one of the most important formulas in all of probability: Bayes' theorem. This formula is central to scientific discovery, it's a core tool in machine learning and AI, and it's even been used for treasure hunting: in the 1980s, a small team led by Tommy Thompson used Bayesian search tactics to help uncover a ship that had sunk a century and a half earlier carrying what, in today's terms, amounts to $700,000,000 worth of gold.

So it's a formula worth understanding. But of course there are multiple levels of possible understanding. At the simplest, there's just knowing what each part means, so
you can plug in numbers. Then there's understanding why it's true; and later I'm going to show you a certain diagram that's helpful for rediscovering the formula on the fly as needed. Then there's being able to recognize when you need to use it. With the goal of gaining a deeper understanding, you and I will tackle these in reverse order.

So before dissecting the formula, or explaining the visual that makes it obvious, I'd like to tell you about a man named Steve. Listen carefully. Steve is very shy and withdrawn, invariably helpful but with very little interest in people or in the world of reality. A meek and tidy soul, he has a need for order and structure, and a passion for detail.

Which of the following do you find more likely: "Steve is a librarian", or "Steve is a farmer"? Some of you may recognize this as an example from a study conducted by the psychologists Daniel Kahneman and Amos Tversky, whose Nobel-prize-winning work was popularized in books like "Thinking, Fast and Slow" and "The Undoing Project". They researched human judgments, with a frequent focus on when those judgments irrationally contradict what the laws of probability suggest they should be. The example with Steve, the maybe-librarian-maybe-farmer,
illustrates one specific type of irrationality. Or maybe I should say "alleged" irrationality; some people debate the conclusion, but more on all that in a moment. According to Kahneman and Tversky, after people are given this description of Steve as a "meek and tidy soul", most say he is more likely to be a librarian than a farmer. After all, these traits line up better with the stereotypical view of a librarian than that of a farmer. And according to Kahneman and Tversky, this is irrational. The point is not whether people hold correct or biased views about the personalities of librarians and farmers; it's that almost no one thinks to incorporate information about the ratio of farmers to librarians into their judgments. In their paper, Kahneman and Tversky said that in the US that ratio is about 20 to 1. The numbers I can find for today put it much higher than that, but let's just run with the 20 to 1 ratio, since it's a bit easier to illustrate and proves the point just as well.

To be clear, anyone who is asked this question is not expected to have perfect information on the actual statistics of farmers, librarians, and their personality traits. But the question is whether people even think to consider this ratio, enough to make a rough estimate. Rationality is not about knowing facts; it's about recognizing which facts are relevant. If you do think to make this estimate, there's a pretty simple way to reason about the question, which, spoiler alert, involves all the
essential reasoning behind Bayes' theorem. You might start by picturing a representative sample of farmers and librarians, say, 200 farmers and 10 librarians. Then when you hear the meek-and-tidy-soul description, let's say your gut instinct is that 40% of librarians would fit that description and that 10% of farmers would. That would mean that from your sample, you'd expect about 4 librarians to fit it, and about 20 farmers. So the probability that a random person who fits this description is a librarian is 4/24, or 16.7%. Even if you think a librarian is 4 times as likely as a farmer to fit this description, that's not enough to overcome the fact that there are way more farmers. The upshot, and this is the key mantra underlying Bayes' theorem, is that new evidence should not completely determine your beliefs in a vacuum; it should update prior beliefs.

If this line of reasoning makes sense to you, the way seeing evidence restricts the space of possibilities, and the ratio you need to consider after that, then congratulations! You understand the heart of Bayes' theorem. Maybe the numbers you'd estimate would be a little different, but what matters is how you fit the numbers together to update a belief based on evidence. Here, see if you can take a minute to generalize what we just did and write it down as a formula.

The general situation where Bayes' theorem is relevant is when you have some hypothesis, say that Steve is a librarian, and you see some evidence, say this verbal description of Steve as a "meek and tidy soul", and you want to know the probability that the hypothesis holds given that the evidence is true: P(H|E). In the standard notation, this vertical bar means "given that"; we're restricting our view only to the possibilities where the evidence holds.

The first relevant number is the probability that the hypothesis holds before considering the new evidence. In our example, that was the 1/21, which came from considering the ratio of farmers to librarians in the general population. This is known as the prior, P(H). After that, we need to consider the proportion of librarians that fit this description: the probability that we would see the evidence given that the hypothesis is true, P(E|H). Again, when you see this vertical bar, it means we're talking about a proportion of a limited part of the total space of possibilities, in this case limited to the left side, where the hypothesis holds. In the context of Bayes' theorem, this value also has a special name: it's the "likelihood". Similarly, we need to know how much of the other side of our space includes the evidence: the probability of seeing the evidence given that our hypothesis isn't true, P(E|¬H). This little elbow symbol, ¬, is commonly used to mean "not" in probability.

Now remember what our final answer was. The probability that our librarian hypothesis is true given the evidence is the total number of librarians fitting the evidence, 4, divided by the total number of people fitting the
evidence, 24. Where does that 4 come from? Well, it's the total number of people, times the prior probability of being a librarian, giving us the 10 total librarians, times the probability that one of those fits the evidence. That same number shows up again in the denominator, but there we need to add in the total number of people, times the proportion who are not librarians, times the proportion of those who fit the evidence, which in our example gave 20. The total number of people in our example, 210, gets canceled out (which of course it should; that was just an arbitrary choice we made for illustration), leaving us finally with the more abstract representation purely in terms of probabilities:

P(H|E) = P(H) P(E|H) / (P(H) P(E|H) + P(¬H) P(E|¬H))

This, my friends, is Bayes' theorem. You often see this big denominator written more simply as P(E), the total probability of seeing the evidence. In practice, to calculate it, you almost always have to break it down into the case where the hypothesis is true and the one where it isn't. Piling on one final bit of jargon, this final answer is called the "posterior"; it's your belief about the hypothesis after seeing the evidence.

Writing it all out abstractly might seem more complicated than just thinking through the example directly with a representative sample, and yeah, it is! Keep in mind, though, the value of a formula like this is that it lets you quantify and systematize the idea of changing beliefs. Scientists use this formula when analyzing the extent to which new data validates or invalidates their models; programmers use it in building artificial intelligence, where you sometimes want to explicitly and numerically model a machine's belief. And honestly, just for how you view yourself, your own opinions, and what it takes for your mind to change, Bayes' theorem can reframe how you think about thought itself. Putting a formula to it is also all the more important as the examples get more intricate. However you end up writing it, I'd actually
encourage you not to memorize the formula, but to draw out this diagram as needed. This is sort of the distilled version of thinking with a representative sample, where we think with areas instead of counts, which is more flexible and easier to sketch on the fly. Rather than bringing to mind some specific number of examples, think of the space of all possibilities as a 1x1 square. Any event occupies some subset of this space, and the probability of that event can be thought of as the area of that subset. For example, I like to think of the hypothesis as filling the left part of this square, with a width of P(H). I recognize I'm being a bit repetitive, but when you see evidence, the space of possibilities gets restricted. Crucially, that restriction may not happen evenly between the left and the right. So the new probability for the hypothesis is the proportion it occupies in this restricted subspace. If you happen to think a farmer is just as likely to fit the evidence as a librarian, then the proportion doesn't change, which should make sense: irrelevant evidence doesn't change your belief. But when these likelihoods are very different, that's when your belief changes a lot. This is actually a good time to step back
and consider a few broader takeaways about how to make probability more intuitive, beyond Bayes' theorem. First off, there's the trick of thinking about a representative sample with a specific number of examples, like our 210 librarians and farmers. There's actually another Kahneman and Tversky result to this effect, which is interesting enough to interject here. They did an experiment similar to the one with Steve, but where people were given the following description of a fictitious woman named Linda:

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.

They were then asked what is more likely: that Linda is a bank teller, or that Linda is a bank teller and is active in the feminist movement. 85% of participants said the latter is more likely, even though the set of bank tellers active in the feminist movement is a subset of the set of bank tellers! But what's fascinating is that there's a simple way to rephrase the question that dropped this error from 85% to 0. If participants are instead told there are 100 people who fit this description, and are asked to estimate how many of those 100 are bank tellers, and how many are bank tellers who are active in the feminist movement, no one makes the error. Everyone correctly assigns a higher number to the first option than to the second. Somehow a phrase like "40 out of 100" kicks our intuition into gear more effectively than "40%", much less "0.4", or abstractly
referencing the idea of something being more or less likely. That said, representative samples don't easily capture the continuous nature of probability, so turning to area is a nice alternative, not just because of the continuity, but also because it's way easier to sketch out while you're puzzling over some problem.

You see, people often think of probability as being the study of uncertainty. While that is, of course, how it's applied in science, the actual math of probability is really just the math of proportions, where turning to geometry is exceedingly helpful. I mean, if you look at Bayes' theorem as a statement about proportions (proportions of people, of areas, whatever), once you digest what it's saying, it's actually kind of obvious. Both sides tell you to look at all the cases where the evidence is true, and consider the proportion where the hypothesis is also true. That's it. That's all it's saying. What's noteworthy is that such a straightforward fact about proportions can become hugely significant for science, AI, and any situation where you want to quantify belief. You'll get a better glimpse of this as we get into more examples.

But before any more examples, we have some unfinished business with Steve. Some psychologists debate Kahneman and Tversky's conclusion, that the rational thing to do is to bring to mind the ratio of farmers to librarians. They complain that the context is ambiguous. Who is Steve, exactly? Should you expect he's a randomly sampled American? Or would it be better to assume he's a friend of these two psychologists interrogating you? Or perhaps someone you're personally likely to know? This assumption determines the prior. I, for one, run into many more librarians in a given month than farmers. And needless to say, the probability of a librarian or a farmer fitting this description is highly open to interpretation. But for our purposes, understanding the math, notice how any questions worth debating can be pictured in the context of the diagram.
Questions of context shift around the prior, and questions of personalities and stereotypes shift the relevant likelihoods. All that said, whether or not you buy this particular experiment, the ultimate point, that evidence should not determine beliefs but update them, is worth tattooing in your mind. I'm in no position to say whether this does or doesn't run against natural human intuition; we'll leave that to the psychologists. What's more interesting to me is how we can reprogram our intuitions to authentically reflect the implications of math, and bringing to mind the right image can often do just that.

This is just one way to visualize Bayes' theorem, and I'd like to share with you another way that can be generalized to cases where you have more possibilities than a simple yes or no for a hypothesis, maybe even a continuous range of hypotheses. For example, say you want to update your belief about the mass of the earth based on new measurements you take. We'll also take a glimpse at the kind of constructs programmers build on top of this formula as things get more sophisticated. All this with the goal of finding that deeper understanding, and all of it in the next video.
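The Steve calculation from the transcript can be sketched as a few lines of Python. This is only an illustrative sketch: the 1/21 prior and the 40%/10% likelihoods are the video's gut-instinct numbers, not measured statistics.

```python
# Bayes' theorem on the Steve example, using the video's illustrative numbers.

def posterior(prior_h, likelihood_h, likelihood_not_h):
    """P(H|E) = P(H) P(E|H) / (P(H) P(E|H) + P(not H) P(E|not H))."""
    numerator = prior_h * likelihood_h
    evidence = numerator + (1 - prior_h) * likelihood_not_h
    return numerator / evidence

# Prior: 10 librarians per 200 farmers, so P(librarian) = 10/210 = 1/21.
prior = 10 / 210
# Likelihoods: 40% of librarians and 10% of farmers fit the description.
p = posterior(prior, 0.40, 0.10)
print(round(p, 3))  # 0.167, i.e. the 4/24 from the representative sample
```

Note that if the two likelihoods are equal, the function returns the prior unchanged, matching the point in the transcript that irrelevant evidence doesn't change your belief.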
When YouTube notifications fail, Reddit has your back.
3blue1brown all the way!
I think Bayes' theorem is the single piece of math that changed how I view the world the most. It made me realize we're all intuitively Bayesian thinkers, constantly updating our model of the world. Not just at a high level, but also at the level of perception of sensory information.
I like the formulation in terms of the odds ratios, as it gets rid of the normalization factor, i.e. the posterior odds ratio is simply the prior odds ratio times the ratio of likelihoods. Lots of probabilities we think about are far enough from 1/2 that at least intuitively there's no need to normalize.
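The odds formulation this comment describes can be checked numerically. The sketch below reuses the video's illustrative Steve numbers (prior odds 1:20, likelihood ratio 4); the variable names are my own.

```python
# Odds form of Bayes' theorem: posterior odds = prior odds * likelihood ratio.
# Numbers reuse the Steve example: 10 librarians per 200 farmers,
# with 40% of librarians and 10% of farmers fitting the description.

prior_odds = 10 / 200            # librarian-to-farmer odds, i.e. 1:20
likelihood_ratio = 0.40 / 0.10   # the ratio of likelihoods, 4

posterior_odds = prior_odds * likelihood_ratio  # 4:20, i.e. 1:5

# Converting the odds back to a probability recovers the usual posterior, 1/6.
posterior_prob = posterior_odds / (1 + posterior_odds)
print(round(posterior_prob, 4))  # 0.1667
```

As the comment notes, this form skips the normalization by P(E) entirely; you only normalize at the end, if you want a probability rather than odds.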
I'll admit to being misled by the story about Steve. For some reason, I assumed farmers and librarians came in equal numbers. But yeah, there's usually only one library per municipality, which probably only has about 5 librarians, so that's 1 librarian per few thousand people. On the other hand, a farmer can only feed a few hundred people (all very, very roughly). It's more of an indictment on how bad our intuition is at understanding large groups of people. Or how much food one farmer can make.
I'm less impressed by the Linda story. It's more of a language issue than an issue of math or intuition. Language is inherently very ambiguous, which is totally fine because we add more or less context as necessary, according to certain principles subconsciously known to all conversationalists. That's why someone unironically greeting you with "Hello fellow human" is suspicious, as it's the kind of unnecessary information that we always omit. The Linda problem is phrased in a way which flouts these principles, making an interpretation where B is the right answer not unreasonable. Let me be clear that I'm absolutely in no way contesting that P(A) is always at least as large as P(A ∧ B). Just that in any conversation or real-life problem our intuition is built for, we usually consider the choice between things like A ∧ B and A ∧ ¬B.
Hey guys, does anyone have a book recommendation for an intro to probability? I'm in my first year as a math major and I have probability next semester (I heard it's one of the toughest classes at my university).
Thanks in advance!
PS: I do have "Probabilities: The Little Numbers That Rule Our Lives" by Peter Olofsson at home right now.
I don't really like the Steve example, because in the given "rational" argument it is implied that the probability of being shown the given evidence is what it would be if you'd take a random farmer or librarian and describe that person. In reality the evidence doesn't describe an actual person chosen by random sample, but a constructed description made to somewhat fit both librarians and farmers. People are clever, do realize this, and take it into account to some extent.
That said, I do understand that these studies are hard. I was particularly thinking about how people (unconsciously, being clever) use some optimal decision reasoning to pick their answer based on their posterior distribution so it's tough to measure the average posterior distribution from such experiments; abstractly, given p(A) < p(B), A and B disjoint and always A or B, optimal choice with utility +1 for the correct answer and 0 otherwise would say to always choose B. So suppose 80% of people would think Steve being a librarian is just slightly more likely (say P(librarian | evidence) = 60%), it's reasonable (rational even!) that the distribution of answers has 80% or more librarian guesses, not reflecting the typical posterior at all.
I made this a while back: interactive bayes demo. It's like the diagrams in the video except you can move the percentages around yourself. When you change one diagram it will update the diagrams further down the page as well. I hope some of you will find it useful.
Probably this is an unpopular opinion, but this video disappointed me a little bit. When 3b1b announced his series on probability I thought that it was going to cover abstract probability (I mean in measure theoretical terms) and bring it closer to intuition. I'm aware that I am not the target audience of his videos, but I really liked the series on calculus and linear algebra precisely because he was showing how the mathematical machinery in the background works.
he finally did it