What the Heck is Bayesian Stats ?? : Data Science Basics

Video Statistics and Information

Captions
[Music] Hey everyone, welcome back. In this video we're going to talk about something really interesting, but also a topic that took me a really long time to wrap my head around. I was first introduced to it in undergrad, and I didn't fully understand why it was useful or what it meant. I learned it again in grad school and still didn't fully grasp it. But as I've started using it in some of my work and research, and started learning some of its merits, I've come to realize how closely it mirrors the way we as human beings think in general, whether or not you study stats. This topic is called Bayesian thinking, Bayesian reasoning, or more generally Bayesian statistics.

You might have heard that in stats there's a frequentist approach to a problem and a Bayesian approach to a problem, and maybe you've even heard the hype that the two camps are always fighting over which one is right. I don't think it helps to think like that. What does help is asking in which situations each one would be better, and what the pros and cons of each are. So in this video I just want to explain the pros of thinking in a Bayesian way, and also some of the cons. We're not going to go over any high-level stats at all. We'll use a very down-to-earth problem, and hopefully this example will show why it helps to think this way, and what costs we pay for thinking this way.

Here's the example. Let's say you live in an apartment that has just a bedroom and a study, so two rooms, and this is you. The problem you face is that you're always forgetting your cell phone in one of the two rooms before you leave for work in the morning: you've left it either in the bedroom or in the study. So you always have to go back into your apartment, and you use your landline (who even has those anymore?), your home phone, to call your cell phone. It starts ringing and you hear a noise from somewhere. That noise is coming either from the bedroom or from the study, but you can't pinpoint it precisely. The natural question is: where should I go check? Should I go to the bedroom or to the study?

Now let's bring in a little bit of data to help us make the decision. Here's the data we'll use: a pretty simple two-by-two table. The rows are where the phone actually is, B for bedroom and S for study. The columns are about N, the noise: whether you hear this specific noise, or you don't hear this specific noise and hear a different one instead. How do you interpret this table? For example, what does this 15 mean? It means that, of all the times you've lost your phone in the past year, 15 times you heard the specific noise you're hearing right now and the phone was in the bedroom. Five times you heard the specific noise you're hearing right now, but the phone was actually in the study. The reason for this uncertainty is that your apartment has some echo to it; you might not hear things one day and you might hear them another day, so there's some variation, some randomness, built into this problem. On 135 days you heard some other noise (~N, read "not N", means not this current noise) and the phone was in the bedroom, and on 15 days you heard some other noise and it was in the study. The row and column sums are on the table too: the bedroom row adds up to 150, the study row to 20, the N column to 20, and the ~N column to 150, so there's a total of 170 observations, or days' worth of your phone going missing, in the data. So how do we use these four numbers to make a well-founded decision about where to go check?

Let's do approach number one. This is the non-Bayesian way, and we'll use it as a starting point. Approach number one says: I'm going to calculate two conditional probabilities. The first is the probability that you hear this noise given that your phone is actually in the bedroom. Intuitively, this asks: if my phone were actually in the bedroom, what are the chances I'd be hearing this noise right now? We compute it by looking at the first row, which is all the cases where the phone was in the bedroom. I heard this noise in 15 of those 150 cases, which is 10%. So if my phone were in the bedroom, there's a 10% chance I'd be hearing this noise right now. Similarly, you can check for yourself that if the phone were actually in the study, the probability I'd be hearing this noise right now is 5 out of 20, or 25%.

So, looking at just these two numbers, where would you go check for the phone? I think most people would say the study. Let's elaborate on why we reach that conclusion. In each of these conditional probabilities, one thing is known and one thing is unknown. The part after the bar, where my phone is, is unknown; of course I don't know that for sure. The part in front of the bar, the noise, is known: it's what I'm literally observing. If my phone were in the bedroom, what I'm actually hearing would have a 10% chance, but if my phone were in the study, the chance of hearing it would be 25%. The natural conclusion is that even though I don't know where my phone is, one of these possibilities gives a higher chance of hearing what I'm actually hearing. That's a very natural way to think. If you've studied some stats before, you'll recognize this as maximum likelihood, which we'll come back to at the end of the video. But before we even give it a name, it says: some event happened in the world; what is the most likely state of the unknown quantity that would have produced that event? If that's a little fuzzy, take a second to think about it. There's an unknown quantity, where the phone is, and it generated some known data, the noise. What's the most likely scenario for the unknown quantity? Looking at this alone, we'd say the study, because that gives a 25% chance versus a 10% chance. So under approach number one, we go check the study. Seems pretty sound.

Now let's think about this in a different way. At first it seems like I'm not doing anything crazy, and maybe we should get the same conclusion, because all I'm doing is taking these two conditional probabilities and reversing their order. Before, I had the probability of the noise given bedroom; now I want the probability of bedroom given the noise. Before, I had the probability of the noise given study; now I want the reverse, the probability that the phone is in the study given that I hear the noise. Am I really doing anything different? It turns out I am doing something rather different, and we'll get to the crux of that difference in a moment. First, let's calculate the numbers and see what conclusion we get. This is asking a slightly different question: given that I heard this noise, given the data I've just observed, what's the probability that my phone is in the bedroom versus the probability that it's in the study? This quantity is again easy to compute from our table. We just look at the N column: 15 times the phone was in the bedroom and 5 times it was in the study, out of 20 total. That's why the numerators are 15 and 5, giving 75% and 25%. Notice one quick thing: these two need to add up to 100%, because I'm conditioning on the same thing and covering all the possibilities for the item in front of the bar. The two likelihoods from approach number one don't need to add up to 100%, because they condition on different things.

But looking at this data alone, where would you go check? It seems paradoxical, because now it seems like I should check the bedroom. This says: given that I've heard the noise, what is the most likely place for my phone to be? That also seems like a natural question, just like the other one was, but even though they're both natural questions, they lead to two different conclusions. This one says there's a 75% chance my phone is in the bedroom, so I should check the bedroom. How can I just flip the order of these conditional probabilities and get a completely different result? To understand that, and to see why this isn't a paradox at all, let's break these two quantities into their parts and see how approach number two is intimately related to approach number one.

So these are my two probabilities again. Let's look at how we can split up the probability that my phone is in the bedroom given that I hear the noise. Going back to your probability 101 course, this equals the probability that my phone is in the bedroom and I hear the noise, divided by the probability that I hear the noise. That follows directly from the definition of conditional probability: the thing after the bar goes in the denominator because I'm asking, out of all these outcomes, what's the probability that I also have this outcome. I'll leave the denominator as it is, but let's take the numerator one step further. The probability that my phone is in the bedroom and I hear the noise is this cell here, 15, divided by the total number of observations, 170. Another way to get that same probability is in two steps: first ask what's the probability that my phone is in the bedroom, and then, given that event, what's the probability that I hear the noise. That's exactly how I've split it up here: the numerator becomes P(B) times P(N | B). It's very important that you understand how I got to this form. Yes, this is just Bayes' theorem, P(B | N) = P(B) P(N | B) / P(N), but I wanted to work it out from first principles in case anyone is unfamiliar with it or doesn't know why it's true. We can work out the study version, P(S | N) = P(S) P(N | S) / P(N), in exactly the same way.

Now we're just trying to figure out which of the two is bigger. Their denominators are the same, so we don't even need to consider them; we only need to decide which numerator is bigger. Before working out the numbers, let's look at where we've seen these quantities before. The probability of noise given bedroom is exactly the 10% we computed in approach number one. So even though we're working in the context of approach number two, we've started getting quantities from approach number one. We see it again with the probability of noise given study: that's the 25% we computed earlier. So we start to see the link. But that's not the only thing in these formulas. Each one has one additional term, and this is the heart of the difference between approach number one, which was not Bayesian, and approach number two, which is Bayesian.

That leading term is the probability of bedroom. Let's think about what it means. It isn't conditional on anything; I haven't even thought about the noise. If I just walk in the door of my apartment, before I even call the phone, what are the chances it's in the bedroom? This is my prior belief. I'm using the word prior very much on purpose, and we'll get to that in a second. It's my belief about where my phone is based on past data, without observing anything today. That's P(B). Similarly, the probability that my phone is in the study as soon as I walk in the door, before observing any data, is P(S). And we can get numbers for these. The phone was in the bedroom 150 times out of 170, so that's about 88%. The probability it's in the study is just the complement, since the two add up to one, so about 12%. If I multiply those by the two likelihoods, which come straight from approach number one, I get about 0.088 on top and about 0.03 on the bottom. That explains numerically why I chose the bedroom in this case: 0.088 is higher than 0.03.

But let's frame this as a story, because we're almost there; all the parts are here, and we just need to link them into a narrative. What does approach number two add on top of approach number one? I walk in the door, I haven't even called my phone, and I know there's an 88% chance my phone is in the bedroom and a 12% chance it's in the study. If I stopped at just these prior beliefs, I'd definitely check the bedroom: 88% is far bigger than 12%. But I don't stop there. I observe some data about the world: I call my phone and hear the noise that comes back. The purpose that serves is to update these prior beliefs. If my phone were in the bedroom, the probability of hearing that noise would be only 10%, so even though there's a very high chance of the phone being in the bedroom before hearing the noise, hearing it makes us apply a factor of 0.1 to that 88% prior. What does it do to the other scenario? There's only a 12% chance the phone is in the study as soon as I walk in, but after I call the phone I update that belief by multiplying it by 0.25. Notice this tells the full story: 0.25 is bigger than 0.1. Even though we start with only a 12% chance of the phone being in the study, if the phone is in the study, the probability of hearing that noise is much higher than if it's in the bedroom. Another way to think about it: I'm shrinking the study prior by a smaller factor than the bedroom prior. The bedroom prior was 88%, but I carry only 10% of it forward, while the study prior was 12%, but I carry a whole 25% of it forward.

The conclusion is that we should still check the bedroom. But notice that 0.088 and 0.03 differ by a factor of only about three, whereas before observing the data, the bedroom and study priors differ by a factor of about seven and a half (150 days versus 20). So even though the conclusion didn't change, we moved a lot closer to thinking the phone was in the study. And now we can see the real difference between the two methods. Approach number one doesn't take the prior probability that the phone is in either room into account at all; it implicitly treats the two rooms as equally likely. That may or may not be a good assumption. In this case it's not: the phone is much more likely to be in the bedroom than in the study before we hear anything. But in some situations it is a good assumption, and then both approaches arrive at exactly the same answer. So approach number two, in a sense, builds on approach number one: it starts from the prior probability of each event and then updates that prior after learning new data.

Let me put labels on all the probabilities we've been talking about, and then we'll get to the pros and cons of Bayesian reasoning. P(B) and P(S), as I've been calling them this whole time, are called prior probabilities, or just priors: our assumption about the world before we see any data. P(N | B) and P(N | S), the first ones we computed, are called likelihoods: how likely we are to see this data in the real world if the unknown thing is true. Finally, the probabilities from approach number two, P(B | N) and P(S | N), are called posteriors: starting from my prior and incorporating information about the real world through the likelihoods, this is my posterior probability.

I want to hit this point home so it isn't unclear in any way. If I had to sum up the heart of Bayesian reasoning in one phrase, it's that prior beliefs about the world get updated as we learn new data. We start with some initial understanding of the world, we learn data, we learn more data, and we update our beliefs accordingly. Personally, I think this is the biggest pro of Bayesian thinking, because it's how human beings reason about the world anyway. For example, take looking for a job. Before you even go to the interview, you have some prior belief that you'll get the job; maybe it's 50%. Then you go to the interview and get some positive feedback from your interviewer, which changes that belief. Whether you got the job is still unknown, but the positive feedback is data you did observe, so the probability of getting the job given the positive feedback probably goes up; maybe now it's 75%. Then say you wait a week and don't hear anything back. You adjust your beliefs again: the probability of getting the job given positive feedback and a week of silence is maybe down to 60%. That's how we think about the world: we have an assumption about the probability of something happening, and as we observe events, as we observe data, we update it in our minds. So I think that's the biggest pro of Bayesian thinking.

Let's finish by talking about the biggest con of Bayesian reasoning, which I've been glossing over: exactly how you construct these prior beliefs. Think about which of these probabilities are the easiest to get. The likelihoods, the probability of hearing the noise given that I left the phone in a certain room, I can estimate with an experiment in a single evening: leave the phone in several places in the bedroom and several places in the study, call it each time, and observe how often I hear the noise. Pretty easy; I could do it in an afternoon or an evening. What's harder to get are the priors, the probability that I leave my phone in the bedroom or in the study, because I can't simulate forgetting something. I just have to wait through enough days where I forget the phone naturally and then fill in my table. So the biggest criticism of Bayesian thinking, and sometimes it's a valid one, is: where do your priors come from, and how did you get them? Sometimes prior probabilities are really expensive to get; you have to wait a really long time to collect them. Sometimes they're impossible to get, and you just don't have a good idea of what they should be. So statisticians and data scientists will assign some mathematically convenient prior, but it's not always well motivated. It may have nice theoretical properties, but is it true? If the answer is no, then the work you build on top of it can be badly off, because your prior assumption was wrong. In that case you might have been better off just assigning a uniform prior, equal probability to every possible value, which brings you right back to approach number one: maximum likelihood.

So, in a nutshell: Bayesian thinking is very interesting. I've come to appreciate it more as I've learned more, though I didn't appreciate it much in the beginning. The biggest pro is that it mirrors the way human beings think about the world anyway. The biggest con is that the way we define the priors is not always intuitive, not always easy, and not always correct. If you have any questions at all, please leave them in the comments. Like and subscribe for more videos just like this, and I'll see you next time.
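The arithmetic in the phone example can be checked with a short Python script. This is a minimal sketch, not code from the video: it assumes the four counts quoted in the transcript (15, 5, 135, 15), and the names `counts`, `likelihood`, `prior`, and `posterior` are hypothetical labels chosen here for illustration. It computes the likelihoods (approach one), the priors, and the posteriors via Bayes' theorem (approach two), and also shows that a uniform prior gives the same choice as maximum likelihood.

```python
from fractions import Fraction

# Counts from the 2x2 table described in the video (assumed as quoted).
# Rows: where the phone actually was. Columns: "N" = heard this specific noise,
# "~N" = heard some other noise.
counts = {
    "bedroom": {"N": 15, "~N": 135},
    "study": {"N": 5, "~N": 15},
}

row_totals = {room: sum(row.values()) for room, row in counts.items()}
total = sum(row_totals.values())  # 170 days in total

# Likelihoods P(N | room): fraction of each row where this noise was heard.
likelihood = {room: Fraction(counts[room]["N"], row_totals[room]) for room in counts}

# Priors P(room): how often the phone was in each room, ignoring the noise.
prior = {room: Fraction(row_totals[room], total) for room in counts}

# Bayes' theorem: P(room | N) = P(room) * P(N | room) / P(N),
# where P(N) is the sum of the numerators over both rooms.
numerator = {room: prior[room] * likelihood[room] for room in counts}
evidence = sum(numerator.values())
posterior = {room: numerator[room] / evidence for room in counts}

# Approach 1 (maximum likelihood): room that makes the observed noise most likely.
ml_choice = max(likelihood, key=likelihood.get)
# Approach 2 (Bayesian): room with the highest posterior probability.
bayes_choice = max(posterior, key=posterior.get)

# With a uniform prior, the prior is the same constant for every room,
# so the ranking matches approach 1.
uniform_numerator = {room: Fraction(1, len(counts)) * likelihood[room] for room in counts}
uniform_choice = max(uniform_numerator, key=uniform_numerator.get)

# How much the evidence narrowed the gap between the two rooms.
prior_ratio = prior["bedroom"] / prior["study"]
posterior_ratio = posterior["bedroom"] / posterior["study"]

for room in counts:
    print(f"{room:<8} P(N|room)={float(likelihood[room]):.3f}  "
          f"P(room)={float(prior[room]):.3f}  P(room|N)={float(posterior[room]):.3f}")
print("maximum likelihood picks:", ml_choice)
print("Bayesian posterior picks:", bayes_choice)
print("uniform-prior posterior picks:", uniform_choice)
print("bedroom:study odds, prior vs posterior:", float(prior_ratio), float(posterior_ratio))
```

Using `Fraction` keeps the arithmetic exact, so the bedroom-to-study odds come out as exactly 7.5 before hearing the noise and exactly 3 after it. Maximum likelihood and the uniform-prior posterior both pick the study, while the posterior with the data-based prior picks the bedroom, matching the walkthrough above.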
Info
Channel: ritvikmath
Views: 7,713
Rating: 5 out of 5
Keywords: data science, machine learning, bayes, statistics, math
Id: -1dYY43DRMA
Length: 20min 30sec (1230 seconds)
Published: Mon Feb 08 2021