The better way to do statistics

Video Statistics and Information

Captions
This video is sponsored by Brilliant; more on that later.

Believe it or not, there's drama in statistics. Not like real drama, but like nerd drama. This drama is between two approaches for doing statistics. One approach is frequentist statistics (oh, that's going to be really annoying to say). The other is Bayesian statistics. Frequentist statistics gets its name from the word "frequency", based on the fact that probability is interpreted to be the same as a frequency. On the other hand, Bayesian statistics gets its name from the use of Bayes' theorem. From the Bayesian perspective, probability represents a degree of belief about an event, and this degree can change based on a person's prior knowledge.

If you've learned statistics before or have watched my past videos, then most likely you've learned frequentist statistics. Whether we like it or not, it's the dominant form of statistics taught in schools, and anybody who wants to work with data has to know how to do it. On the other hand, Bayesian methods are often neglected, and many students will go on barely interacting with such an important part of the field. Using myself as an example, there's only one optional class for learning Bayesian statistics at my own university. On the other hand, you're expected to take two years of frequentist statistics in order to get your degree.

Even though it's not as widespread, Bayesian statistics is still used to improve people's lives. Many clinical trials have used Bayesian methods to get new treatments into the hands of people that really need them. Some of you might remember some pandemic happening a few years ago. Companies like Pfizer and Moderna were rushing to make a vaccine so that we could finally step away from tracking turnip prices in Animal Crossing. Moderna used the frequentist approach to show that their vaccine worked, but Pfizer used the Bayesian approach.

Bayesian statistics is really powerful, but at the same time it can be really hard, especially for someone who's just starting to learn it. Given that there aren't many classes for it
in university, it's often left to the student to pick it up for themselves, and take it from someone who studies full-time: self-studying isn't always easy. So in this video, I'm going to give you a crash course on Bayesian statistics. If you're new here, I'm Christian and this is Very Normal, a channel for making you better at statistics. I tried thinking of a statistics pun to put here, but none of them were sufficient. Let's get started.

Level one: Bayes' theorem.

To understand Bayesian statistics, you need to understand the theorem that gave it its name. Bayes' theorem originally comes from probability, not statistics. It's named after Reverend Thomas Bayes, who published about this theorem in 1763. Bayes' theorem goes like this. Given two events A and B, the probability that both of these events will happen can be written in terms of an intersection. This intersectional probability can also be written in terms of a product: a product of a so-called conditional probability and a marginal probability. In this case, I've written it as the conditional probability of A given that B happened, multiplied by the probability that B happened. But this product can also be expressed the same way with the roles of the events reversed. Since these expressions are equivalent, we can form a relationship that links these two conditional probabilities. Just by moving this marginal probability over, we get Bayes' theorem.

That was easy to describe. How should we interpret this relationship? The key parts are here and here. We call this part of Bayes' theorem the prior probability. You can think of this as your prior knowledge that an event will happen. It could be something as simple as you subscribing to this channel, which you should, by the way. And this is an updated probability that A will happen, given that we saw B happen. So we're going from our prior belief that the event will happen to this posterior belief, given that some other event B happened. Bayes' theorem tells us how to update our degree of belief that this A event
will happen. If the B event was watching this video, then this posterior probability would be the probability that you subscribe given that you watched this video. If I've done my job right, hopefully this probability is higher than before you watched it.

But what does that mean? While I can't control your prior probability of subscribing, I can control these other events involving you watching this video. This term is a conditional probability: that you watch this video given that you were a subscriber. According to Bayes' theorem, if I want to improve my chances of getting a new person to subscribe, I need to increase the probability that my current subscribers watch my videos. This term in the denominator is tricky. It represents the marginal probability that someone will watch this video. There are several different ways that new watchers will come across my video, and it's hard to account for everyone. I would need to account for all these different ways that someone might see my thumbnail and watch. It's easy to stick this probability in the denominator and call it a day, but it's a completely different beast to actually calculate. And spoiler alert: that's going to be a recurring issue throughout this video.

For many people, they hear about Bayes' theorem in their probability class, get a homework question on it, and forget all about it. Maybe some of you in analyst positions had to answer a Bayes' theorem question for an interview. This simple expression inspired an entirely new way of thinking in statistics, but in its current form, Bayes' theorem doesn't tell us anything about statistics. To see where that comes in, we need to go a level deeper.

Level two: Bayesian statistics.

Bayes' theorem tells us how to update our beliefs about events happening, but statistics is about making useful inferences and predictions from data through models of the world. So how do we make the jump from this to this? In statistics we deal with data, and the inherent problem with data is that it's random. Randomness makes it harder
to learn from the data. We statisticians got to get that bag, so we assume that this randomness has a predictable form or structure. We describe this structure using a probability distribution function, or PDF. PDFs are functions which describe which values we're most likely to see, which ones are rare, and which are impossible. We want to learn more about this PDF because it can tell us important qualities of the possible values. For example, we can calculate an average or typical value, or we could learn about the range and spread of what values are possible.

What gets in the way is that PDFs are hard to estimate. Functions are generally infinite-dimensional, and no one is paid enough to estimate infinitely many things. One strategy for getting the PDF is to approximate it using a parametric family. With a parametric family, we can get an entire function just by estimating a few values called parameters, and I'll use theta to denote a general parameter. These parameters often represent values we want to know or study about a population. The binomial family has a single parameter, which we denote as p. This p represents the probability that an event will happen; for example, whether a certain vaccine protects someone from a certain virus. Once we collect the data, we try to use it to estimate these parameters and learn more about the world from this estimate. This is called statistical inference.

Depending on who you ask, they'll say that this can be approached in different ways. The frequentist will tell you that population parameters are fixed and unknown, and you can estimate them with a method like maximum likelihood estimation. With infinitely much data, this estimate will be theoretically close to the true population value. Then they'll turn to a hypothesis test. With null hypothesis testing, you're just saying that the data is unlikely to have come from a particular hypothesis or parameter value. That's literally the p-value. But for a lot of people, that's not what they want. Instead, they'll
want to know a likely value or a range of likely values for the parameter, not what it probably isn't. What they want to know is the probability of the parameter value after collecting some data. Frequentists wish they knew this, and some trick themselves into thinking that's what they actually have. And this is where the Bayesians come in.

Let's have another look at Bayes' theorem. Instead of looking at two simple events, we'll change our notation to deal with two random variables: the data that we'll collect, which we'll denote as D, and the parameter value theta. We're working from the Bayesian perspective now, so the parameter is considered to be a random variable. This is very different from the frequentist's view, where the parameter is thought to be fixed. After replacing these simple events with these two random variables, we get the version of Bayes' theorem that's used in Bayesian statistics. Wow. It's the same theorem, but the level of difficulty has increased. Let's walk through it.

This term is still the prior, but it's no longer just a single probability but an entire probability distribution, which we call the prior distribution. The prior distribution represents our beliefs about which parameter values are likely and unlikely. Not the data we collect, but the parameter. As an example, let's use the response rate of a binomial distribution. Depending on the shape of the prior distribution, I can reflect different beliefs about this response rate. If I have no knowledge about what this value could be, then any value is as likely as any other, and you'd represent this using a completely flat distribution. This is called an uninformative prior. On the other extreme, I might be convinced that the success probability is 40% and that it can't possibly be anything else. In terms of a probability distribution, this belief may look like a sharp spike at 40% and almost zero everywhere else. This is an extreme version of what's called an informative prior. In more practical applications, you could talk to experts or refer to
previously published papers to form a good informative prior. This is called eliciting a prior. Between these two extremes of uninformative and informative, there's a spectrum. The subjectivity can make some people uncomfortable, but the prior is what makes Bayesian statistics so powerful. It forces you to consider the probabilities of different parameter values before you collect any data, and to make these probabilities explicit to others. Nothing will stop you from conducting an analysis with an extremely opinionated prior, but then someone can challenge your findings by challenging your prior, which opens up the possibility for discussion. On the other hand, frequentist analyses totally ignore the prior probability of any hypothesis.

What frequentists focus on exclusively is this term. This conditional probability is what's known as the likelihood. The likelihood is the joint probability distribution of the data, assuming a particular value for the parameter. Remember that we often model the data as coming from its own parametric family. From the Bayesian perspective, once you condition on a particular parameter, you can check how likely it is that the data came from this particular value. The importance of the likelihood is that it allows the data we observe to influence the shape of the posterior distribution.

This term in the denominator represents the marginal probability of observing the data. To get this probability, you have to calculate a difficult integral, integrating over all the possible parameter values. Despite how it looks, after the data is actually observed, this term is just a number. A difficult-to-calculate number, but still a number.

Once you perform all this theoretical computation, you finally get the posterior distribution, which represents your updated belief about the likely and unlikely values of the parameter after observing the data. The posterior distribution is the primary object that Bayesian statisticians interact with. Like the prior, the posterior distribution is also a PDF.
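The recipe just described (multiply the prior by the likelihood, then divide by the marginal probability of the data) can be sketched numerically. What follows is a minimal illustration and not something shown in the video: it assumes a binomial likelihood and a flat prior, replaces the difficult integral with a sum over a fine grid of candidate values for theta, and uses made-up data (7 successes in 20 trials). The names `grid_posterior` and `flat_prior` are hypothetical.

```python
from math import comb


def grid_posterior(successes, trials, prior, grid_size=1001):
    """Approximate the posterior of a binomial success probability on a grid."""
    width = 1 / (grid_size - 1)
    thetas = [i * width for i in range(grid_size)]
    # Likelihood: probability of the observed data for each candidate theta.
    likelihood = [
        comb(trials, successes) * t**successes * (1 - t) ** (trials - successes)
        for t in thetas
    ]
    # Numerator of Bayes' theorem: likelihood times prior.
    unnormalized = [lik * prior(t) for lik, t in zip(likelihood, thetas)]
    # Denominator: marginal probability of the data, approximated by a Riemann sum.
    evidence = sum(unnormalized) * width
    posterior = [u / evidence for u in unnormalized]
    return thetas, posterior, evidence


def flat_prior(theta):
    """Uninformative prior: every value of theta is equally plausible."""
    return 1.0


# Illustrative data only: 7 successes out of 20 trials.
thetas, posterior, evidence = grid_posterior(successes=7, trials=20, prior=flat_prior)
width = thetas[1] - thetas[0]
posterior_mean = sum(t * p for t, p in zip(thetas, posterior)) * width
print(round(posterior_mean, 3), round(evidence, 4))
```

Swapping `flat_prior` for a function that peaks near a particular value gives an informative prior. The grid trick only stays practical for one or two parameters, which is part of why the integral is treated as the hard step.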
You can figure out the mean of this posterior distribution, and you can calculate a range of values that contains a particular amount of probability, like 95%. In Bayesian parlance, this is the credible interval, and it's what confidence intervals wish they could be. Looking back at the Pfizer paper, you can see that they report a 95% credible interval.

Taking a step back, we can see that this new version of Bayes' theorem is more complex. Bayes' theorem tells us that you can get a new probability distribution by multiplying these two functions together and dividing by this complicated integral. This calculation is so complicated that we usually try to simplify it. The main way we do this is to choose a special prior. The likelihood is usually decided by the data, but we can choose our prior. If chosen carefully, it's possible to get an easy expression for the posterior distribution.

As an example, let's consider the case where the data comes from a binomial distribution. If we make our prior on the response rate a beta distribution, then it can be shown that the posterior distribution will also be a beta distribution with updated parameters. You can see that both the prior parameters and the data influence the parameters of the posterior distribution. In simpler terms, the more successes we see in the data, the more the posterior beta shifts towards one; the more failures we see, the more it shifts towards zero. The end result is a posterior distribution that is informed by data. If the prior produces a posterior that comes from the same family, we refer to the prior as a conjugate prior. Here, both the prior and posterior are betas, which can be said for a lot of people in my life. Since the likelihood was binomial, we refer to this as a beta-binomial model. In fact, this was the model used by Pfizer to demonstrate that their COVID vaccine worked.

Conjugate priors make things easy, but they also constrain how we can represent our beliefs through the prior. If we want to go past conjugacy, we'll have to dive yet
another level deeper into Bayesian statistics and face some scary math.

Level three: beyond conjugacy.

What happens if you try to use a non-conjugate prior? Well, the recurring villain in Bayesian statistics is this integral you need to calculate to get the posterior distribution. See that? That's the prior right there. For a general prior, it's not guaranteed that this integral can be written in what's called an analytical form, which is just a fancy way of saying we might not be able to write an equation for it. If we don't know the form of this integral, then we also don't know the form of this posterior.

Like I mentioned earlier, if you know what it looks like, you can calculate important aspects of a random variable purely from the PDF itself. What if I told you we could get all these values based on the probability distribution, even if we didn't know what it looked like? A probability distribution just describes the randomness in a random variable, so what if we used samples of this random variable instead? If we could generate samples from the posterior distribution, then the samples themselves can be used to estimate all these values. It's indirect, but if you have a lot of data, then you can get some pretty accurate estimates. It's not just the frequentists who can benefit from asymptotics.

This is the approach taken by Markov chain Monte Carlo algorithms, or in short, MCMC algorithms. MCMC algorithms construct what's called a Markov chain. A Markov chain is a sequence of numbers where the value of one number in the chain is dependent only on the value that came before it. The elements of this chain form a sample, much like a data set. What's special about MCMC algorithms is that they construct this Markov chain such that, almost by magic, the probability distribution of the samples is the posterior distribution. And MCMC will work even if we don't know the form of the posterior itself.

Another approach that I'm just starting to pick up is variational inference. The idea behind variational
inference is that we want to approximate the posterior distribution with another distribution that's close to it but easier to use, instead of directly dealing with the difficult integral. Once you have a good lookalike, you can estimate all your posterior quantities using this approximation instead. That's my best take on it; I still have a lot to learn about this technique. MCMC and variational inference are examples of advanced techniques that you'll need to pick up if you want to be a Bayesian statistician. I only gave very brief introductions to both of these topics, but I wanted to give you a taste of how far we've gone from this simple little theorem that the Reverend gave us in 1763.

Conclusion.

I hope that this video has taught you how to be a little bit more Bayesian in a frequentist world. In the past there was drama between the frequentists and the Bayesians, but it's all mostly in the past. For all its weaknesses, frequentist statistics has a long track record of achievements, and the same can be said for Bayesian statistics. You may even be surprised to hear that there are hybrid methods using both frequentist and Bayesian ideas. As a budding statistician, I'm always learning new things, and I hope that I've helped you update your own priors.

If you like my content, you can stay informed by subscribing to my channel and the channel newsletter. In the newsletter, I talk about topics I don't usually get a chance to cover on the main channel, like my ongoing research or my video-making process.

Everyone starts off with uninformative priors, but it doesn't have to stay that way. If you want to learn something yourself, you can take the initiative to update your knowledge and your beliefs. And thanks to the sponsor of this video, this process is only getting faster and easier. Brilliant is an online platform for learning math, computer science, and data science. They offer courses that are updated every month, and you can solidify your understanding through interactive exercises. Firsthand
experience in problem solving is the best way to stress-test your knowledge, and Brilliant makes this a top priority. Recently I've been working through the Math for Quantitative Finance course, since it's a topic in statistics that I don't know much about but want to know more of. To try everything Brilliant has to offer for free for a full 30 days, visit brilliant.org/VeryNormal or click on the link in the description. You'll also get 20% off an annual premium subscription. Thank you to Brilliant for sponsoring this video. Thanks for watching. I'll see you in the next one.
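As a footnote to the level-three discussion in the captions above, the sampling idea can be sketched with a random-walk Metropolis sampler, one simple member of the MCMC family. This is an illustrative sketch under stated assumptions, not the method used in the video or in any published trial. The data (7 successes in 20 trials), the step size, the seed, and the names `log_unnormalized_posterior`, `metropolis`, and `flat_log_prior` are all made up for the example. A flat prior is used on purpose, because it is the Beta(1, 1) case where the conjugate beta-binomial answer is known, so the sampler has something to be checked against.

```python
import math
import random


def log_unnormalized_posterior(theta, successes, trials, log_prior):
    """Log of likelihood times prior; the difficult integral is never computed."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    log_likelihood = (successes * math.log(theta)
                      + (trials - successes) * math.log(1.0 - theta))
    return log_likelihood + log_prior(theta)


def metropolis(successes, trials, log_prior, n_samples=20000,
               burn_in=1000, step=0.1, seed=0):
    """Random-walk Metropolis: each value depends only on the one before it."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_unnormalized_posterior(theta, successes, trials, log_prior)
    chain = []
    for i in range(burn_in + n_samples):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_unnormalized_posterior(proposal, successes, trials, log_prior)
        # Accept with probability min(1, posterior ratio); the normalizing
        # constant cancels in the ratio, so it is never needed.
        accept_probability = math.exp(min(0.0, candidate - current))
        if rng.random() < accept_probability:
            theta, current = proposal, candidate
        if i >= burn_in:
            chain.append(theta)
    return chain


def flat_log_prior(theta):
    """Log of a flat prior on (0, 1), i.e. Beta(1, 1)."""
    return 0.0


# Illustrative data only: 7 successes out of 20 trials.
successes, trials = 7, 20
chain = metropolis(successes, trials, flat_log_prior)
mcmc_mean = sum(chain) / len(chain)

# Conjugate check: a Beta(1, 1) prior gives a Beta(1 + 7, 1 + 13) posterior,
# whose mean is (1 + successes) / (2 + trials).
conjugate_mean = (1 + successes) / (2 + trials)
print(round(mcmc_mean, 3), round(conjugate_mean, 3))
```

To go beyond conjugacy, pass a different `log_prior`; the sampler itself does not change, which is the appeal of the approach. Production work would normally rely on an established MCMC library with convergence diagnostics instead of a hand-rolled loop like this one.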
Info
Channel: Very Normal
Views: 181,127
Keywords: biostatistics, statistics
Id: 3jP4H0kjtng
Length: 17min 24sec (1044 seconds)
Published: Wed Apr 03 2024