Binomial distributions | Probabilities of probabilities, part 1

Video Statistics and Information

Video

Captions Word Cloud

Reddit Comments

RIP differential equation series

👍︎︎ 199 👤︎︎ u/Connor1736 📅︎︎ Mar 15 2020 🗫︎ replies

Hey I just taught these topics to my prob/stats class recently! Now that we're going online I'll have to send this to them.

👍︎︎ 44 👤︎︎ u/PhoeniXaDc 📅︎︎ Mar 15 2020 🗫︎ replies

Finding a confidence interval would be a good start

👍︎︎ 108 👤︎︎ u/dogmarsh1 📅︎︎ Mar 15 2020 🗫︎ replies

Finally. Grant comes out with another video! I love all his videos

👍︎︎ 82 👤︎︎ u/Munch7 📅︎︎ Mar 15 2020 🗫︎ replies

Heh. (Around 0:03.)

👍︎︎ 23 👤︎︎ u/david 📅︎︎ Mar 15 2020 🗫︎ replies

What I find beautiful in math is that given the necessary amount of prior knowledge, you can re-discover facts like this one on your own (unlike other sciences, where you will at the very least need equipment to perform experiments with).

A few months ago I asked myself this very same question, but I phrased it to myself in terms of YouTube videos - given a certain percentage of likes and assuming the general audience has similar tastes as me, what is the best estimate (given this limited knowledge) for the "true chance" that I'll like the video as well.

I started working out the problem in pretty much the same way as shown and was quite surprised to find a simple formula. My surprise was partially because I first got an answer that was so complicated I nearly abandoned the idea, but when I re-checked my calculations, the result was so short, (L+1)/(L+D+2). When I split the denominator into (L+1+D+1) it looked... so intuitive. Like, I just have to assume there are one more like and one more dislike?

Sure, Laplace found this three centuries ago, but that somehow just makes this more awesome. You can lean both on his authority for this formula and on your valid train of logic at the same time.

👍︎︎ 30 👤︎︎ u/Shamajotsi 📅︎︎ Mar 15 2020 🗫︎ replies

Here's a link to the blog post he mentions at the beginning. Doesn't really mention much, but worth linking so others don't also google it just to be disappointed at its shortness.

👍︎︎ 8 👤︎︎ u/GLukacs_ClassWars 📅︎︎ Mar 15 2020 🗫︎ replies

My rating is supreme for I am all-knowing. Really isn’t that difficult to understand. Simply do as I say and you will ~~survive~~ be ok.

👍︎︎ 3 👤︎︎ u/[deleted] 📅︎︎ Mar 16 2020 🗫︎ replies

My senpai 3blue1brown <3

👍︎︎ 4 👤︎︎ u/Zenalyn 📅︎︎ Mar 15 2020 🗫︎ replies

Captions

You're buying a product online, and you see 3 different sellers. They're all offering that same product, at essentially the same price. One of them has a 100% positive rating, but with only 10 reviews. Another has a 96 positive rating with 50 total reviews. And yet another has a 93% positive rating, but with 200 total reviews. Which one should you buy from? [music] I think we all have this instinct that the more data we see, it gives us more confidence in a given rating. We're a little suspicious of seeing 100% ratings since, more often than not, they come from a tiny number of reviews. Which makes it feel more plausible that things could have gone another way, and given a lower rating. But how do you make that intuition quantitative? What's the rational way to reason about the trade-off here, between the confidence gained from more data, versus the lower absolute percentage? This particular example, is a slight modification from one than John Cook gave in his excellent blog post "A Bayesian view of Amazon Resellers". For you and me, it's a wonderful excuse to dig into a few core topics in probability and statistics. And to really cover these topics properly, it takes time, so what I'm gonna do is break this down into three videos. Where, in this first one, we'll set up a model for the situation and start by talking about the binomial distribution. The second is gonna bring in ideas of Bayesian updating and how to work with probabilities over continuous values. And in the third, we'll look at something known as a Beta distribution and pull up a little Python to analyze the data we have, and come to various different answers depending on what it is you want to optimize. Now, to throw you a bone before we dive into all the math, let me just show you what one of the answers turns out to be, since it's delightfully simple. When you see an online rating, something like this 10/10, you pretend that there were two more reviews. One that was positive and one that's negative. In this case, that means you pretend that it's 11/12, which would give 91.7%. This number is your probability of having a good experience with that seller. So in the case of 50 reviews, where you have 48 positive and 2 negative, you pretend that it's really 49 positive and 3 negative, which would give you 49/52, or 94.2%. That's your probability of having a good experience with the second seller. Playing the same game with our third seller who had 200 reviews, you get 187/202 or 92.6%. So, according this rule, it would mean your best bet is to go with seller number two. This is something known as Laplace's rule of succession - it dates back to the 18th century - and to understand what assumptions are underlying this, and how changing either those assumptions or your underlying goal can change the choice you make, we really do need to go through all the math. So without further ado, let's dive in! [Music] Step 1: How exactly are we modeling the situation and what exactly is it that you want to optimize? One option is to think of each seller as producing random experiences, that are either positive or negative, and that each seller has some kind of constant underlying probability of giving a good experience, what we're gonna call the "success rate," or "s" for short. The whole challenge is that we don't know "s." When we see that first rating of 100% with 10 reviews, that doesn't mean the underlying success rate is 100% It could very well be something like 95%. And just to illustrate, let me run a little simulation where I choose a random number between 0 and 1, and if it's less than 0.95, I'll record it as a positive review otherwise, I'll record it as a negative review. Now do this ten times in a row... And then make ten more... And ten more... And ten more, and so on to get a sense of, what sequences of 10 reviews, for a seller with this success rate, 0.95, would tend to look like. Quite a few of those - around 60% actually - give 10/10. So the data we see seems perfectly plausible if the seller's true success rate was 95%. Or... maybe it's really 90%, or 99%. The whole challenge is that we just don't know. As to the goal, let's say you simply want to maximize your probability of having a positive experience, despite not being sure of the success rate. So, think about this here. We've got many different possible success rates for each seller, any number from zero up to one, and we need to say something about how likely each one of these success rates is, a kind of probability of probabilities. Unlike the more gamified examples like coin flips and die tosses and the sort of things you see in an intro probability class, where you go in assuming a long-run frequency, like 1/2 or 1/6, what we have here is uncertainty about the long-run frequency itself, which is a much more potent kind of doubt. I should also emphasize that this kind of setup is relevant to many, many situations in the real world, where you need to make a judgement about a random process from limited data. For example, let's say that you're setting up a factory producing cars, and in an initial test of 100 cars, 2 of them come out with some kind of problem. If you plan to spin things up to produce a million cars, what are you willing to *confidently* say about how many total cars will have problems that need addressing? It's not like the test *guarantees* that the true error rate is 2%, but what exactly *does* it say? As your first challenge, let me ask you this: if you did magically know the true success rate for a given seller, say it was 95%, how would you compute the probability of seeing, say 10 positive reviews and 0 negative reviews? Or 48 and 2? Or 186 and 14? In other words, what's the probability of seeing the data given an assumed success rate? A moment ago, I showed you something like this with a simulation, generating 10 random reviews. And, with a little programming, you could just do that many times, building up a histogram to get some sense of what this distribution looks like. [music] Likewise, you could simulate sets of 50 reviews and get some sense for how probable it would be to see 48 positive and 2 negative. You see, this is the nice thing about probability; a little programming can almost always let you "cheat" a little and see what the answer is ahead of time by simulating it For example, after a few hundred thousand samples of 50 reviews, assuming the success rate is 95%, it looks like about 26.1% of them would give us this 48/50 review. Luckily, in this case, an exact formula is not bad at all. The probability of seeing exactly 48/50 looks like this. This first term is pronounced "50 choose 48", and it represents the total number of ways that you could take 50 slots and fill out 48 of them. For example, maybe you start with 48 good reviews and end with 2 bad reviews. Or maybe you start with 47 good reviews, and then it goes bad-good-bad and so on. In principle, if you were to enumerate every single possible way of filling 48/50 slots like this, the total number of these patterns is "50 choose 48", which, in this case, works out to be 1225. What do we multiply by this count? Well it's the probability of any one of these patterns. Which is the probability of a single positive review, raised to the 48th [power], times the probability of a single negative review squared. Crucial is that we assume each review is independant of the last, so we can multiply all the probabilities together like this, and with the numbers we have, when you evaluate it, it works out to be 0.261, which matches what we just saw empirically with the simulation. You could also replace this 48 with some other value and compute the probability of seeing any other number of positive reviews. Again, assuming a given success rate. What you're looking at right now, by the way, is known in the business as a binomial distribution, one of THE most fundamental distributions in probability. It comes up whenever you have something like a coin flip, a random event that can go one of two ways, and you repeat it some number of times. And what you want to know is the probability of getting various different totals. For our purposes, this formula gives us the probability of seeing the data, given an assumed success rate, Which, ultimately, we want to somehow use to make judgments about the opposite: the probability of a success rate given the fixed data that we see. These are related, but definitely distinct. To get more in that direction, let's play around with this value of "s" and see what happens as we change it to different numbers between 0 and 1. The binomial distribution that it produces kind of looks like this pile that's centered around whatever "s" times 50 is. The value we care about, the probability of seeing 48 out of 50 reviews, is represented by this highlighted 48th bar. So let's draw a second plot on the bottom representing how that value depends on "s". When "s" is equal to 0.96, that value is as high as it's ever gonna get. And this should kind of makes sense. Because when you look at that review of 96%, it should be most likely if the true underlying success rate was 96%. As "s" increases, it kind of peters out; going to 0 as "s" approaches 1. Since someone with a perfect success rate would never have those 2 negative reviews. Also, as you move to the left, it approaches 0 pretty quickly. By the time you get to "s" = 0.8, getting 48 out of 50 reviews by chance is exceedingly rare. It would happen 1 in a 1000 times. This plot we have on the bottom is a great start to getting a more quantitative description, for which *values* of "s" feel more or less plausible. Written down as a formula, what I want you to remember is that, as a function of the success rate s, the curve looks like some constant times "s" to the number of positive reviews times 1 minus "s" to the number of negative reviews. In principle, if we had more data, like 480 positive reviews and 20 negative reviews, the resulting plot would still be centered around .96, but it would be smaller and more concentrated. A good exercise right now would be to see if you could explain *why* that's the case. There is a lingering question, though, of what to actually *do* with these curves. I mean, our goal is to compute the probability that you have a good experience with this seller. So what do you do? Naively, you might think that probability is 96% since that's where the peak of the graph is, which, in a sense, is the most likely success rate. But think of the example with 10 out of 10 positives. In that case, the whole binomial formula simplifies to be "s" to the power 10. The probability of seeing 10 consecutive good reviews is the probability of seeing *one* of them raised to the 10th. The closer the true success rate is to 1, the higher the probability of seeing a 10 out of 10. Our plot on the bottom only ever increases as "s" approaches 1. But even if "s = 1" is the value that maximizes this probability, surely you wouldn't feel comfortable saying that, you personally, have a 100% probability of a good experience with this seller. Maybe you think that, instead, we should look for some kind of center of mass of this graph and that would absolutely be on the right track. First though, we need to explain how to take *this* expression for the probability of the data we're seeing, given a value of "s", and get the probability *for* a value of "s" (the thing we actually don't know) given the data (the thing we actually know). And that requires us to talk about Bayes rule and also probability density functions. For those, I'll see you in part two. [music]

Info

Channel: 3Blue1Brown

Views: 1,422,120

Rating: 4.9676876 out of 5

Keywords: Mathematics, three blue one brown, 3 blue 1 brown, 3b1b, 3brown1blue, 3 brown 1 blue, three brown one blue, bayes rule, probability

Id: 8idr1WZ1A7Q

Channel Id: undefined

Length: 12min 34sec (754 seconds)

Published: Sun Mar 15 2020