Introduction to Bayesian data analysis - Part 2: Why use Bayes?

Video Statistics and Information

Captions
Hi again, I'm Rasmus Bååth, and welcome to part 2 of this 3-part introduction to Bayesian data analysis, which will go into the why of Bayesian data analysis. If you haven't checked out part 1 yet, I really recommend you do that first.

So why use Bayesian data analysis? Why could it be a useful approach rather than using, say, classical statistics? Well, I'm going to give you a couple of reasons.

One reason to use Bayesian data analysis is that you have great flexibility when building models and can focus on that rather than on computational issues. Now, if you've done some Bayesian modeling before, this might sound a little bit strange to you, as there are often computational issues when you want to fit your model. What I mean here is that since there is a very clean separation between specifying and fitting a model in a Bayesian framework, you often don't have to focus too much on how your model is going to be computed when you construct it. That means that you can focus on what assumptions are reasonable and what information you should use, rather than on algorithms, when doing the actual modeling. And with many good tools that help you fit Bayesian models, like Stan, JAGS, and PyMC, there is a good chance that just specifying the model actually is enough, if it's not too complicated.

So let me give you an example of how easy it is to change a Bayesian model while the computation stays the same. This is the CEO of Swedish Fish Incorporated, and he is telling us, "I've come up with a new, brilliant way of marketing our salmon subscription service!" So I guess we no longer have only one method to advertise salmon subscriptions with, and that means it's time to bring in A/B testing. Remember that method A involves sending out a colorful brochure to advertise the salmon subscription service, and when marketing tried this on 16 randomly selected Danes, 6 out of 16 signed up. The new method our CEO proposes, let's call it method B, involves sending out the very same colorful brochure, but
this time accompanied by a sample of frozen salmon. Marketing has actually already tried this method on another 16 Danes, and this time 10 out of 16 signed up. So what we now want to know is which seems to be the better method. Sure, there is some evidence that method B is better, but how certain or uncertain should we be that this is the case? What we want to do is to specify and fit a Bayesian model that helps us answer these questions.

This is the model we had before, when we just had one advertising method: we drew a rate of signup from one prior and ran a generative model that gave us one simulated data set. But now we have two advertising methods, and the cool thing here is that all we need to do is to copy and paste the one-group model. So instead, we draw two rates of signup independently from two priors and separately run two generative models to simulate two data sets. This is the only change we need to make. To fit this new model, we can use the same procedure as we used in part 1 of this tutorial, which goes by the long name approximate Bayesian computation.

So here we again first draw fixed parameter values from the priors. This time we happen to draw a signup rate of 20% for method A and a rate of 72% for method B. Then we plug these parameter draws into the generative models and simulate some data. This time we got 4 signups for method A and 10 signups for method B. But we keep these parameter draws only if the simulated data match the actual data, and this time they didn't, so we're going to filter them away. Sure, for method B the simulated data match the actual data, since in reality we got 10 signups, but they don't match for method A, as in reality we got 6 signups there. We want all the simulated data to match the real data, and so these parameter draws have to go. So we do it again: this time we draw some other parameter values, and when we run the generative models this time, well, what do you know, this time we simulate data that match, so
we're keeping these parameter draws. Now, as last time, we do this whole draw-simulate-reject procedure many, many times, say a million times, and what we are left with are two distributions: the distributions of the parameter draws for method A and method B that made it past the rejection filtering step. Here is the distribution for the rate of signup for method A. Since it's the probability distribution over likely parameter values that we got after having used the data, it's what's usually called a posterior distribution. It should look familiar to you, as it is the same as before, when we only had the data for method A. So again, it seems likely that the rate of signup for method A is somewhere between 20 and 60 percent, most likely somewhere around 35 percent. And here is the posterior distribution for method B. Just looking at it, it seems there is some evidence that method B would result in more signups, as the bulk of the distribution is between 40 and 80 percent, with the signup rate most likely being around 65 percent.

But this is just eyeballing the posterior distributions, and we would really like to calculate some probabilities, say, the probability that method B does have a higher rate of signup than method A. Fortunately, this is very easy to do, as these posterior probability distributions are represented by a long list of parameter draws. So here are the numbers behind the two posterior distributions. I only show the first eight rows, but there are many, many more rows in this table. Each row is a pair of parameter draws that, when plugged into the generative models, simulated data matching the actual data. So the way these parameter draws are distributed represents the uncertainty around what the rate of signup could be. Now, if we calculate new measures, and we do it separately for each row, then we retain this uncertainty, and the resulting distributions of these new measures can also be interpreted as posterior probability
distributions, that is, what is known about these new measures given the model and the data. So what could such a measure be? Well, since we're interested in which of method A and method B gives the highest rate of signup, why not calculate the difference between rate A and rate B? Using some R-like pseudocode, it could look something like this, and when applied to each row, it would give us a new column for the distribution of the difference between method A and method B, where a positive number would be in favor of method B. So now we can take a look at this new derived distribution. Just eyeballing it, we see that it is quite likely that method B has a higher rate of signup: almost all of the probability is to the right of the zero mark, with rate B most likely being around 25 percentage points higher than rate A. Again, since we are working with a table of parameter draws, it is very easy to calculate the probability that rate B is higher than rate A. We simply count how many rows of the rate difference were above zero, that is, how many times rate B was higher than rate A, and then we divide by the total number of draws. This time we get that 92% of the rate difference distribution is above zero, that is, there is a 92% probability that rate B is better than rate A. To arrive at this probability, we didn't need to change the way we fitted the model; we could use the same method as when we just had data for method A. All we needed to do was change the model and add a prior and a generative model for method B, and then we just did some simple post-processing of the posterior draws using basic arithmetic.

Another reason to use Bayesian data analysis is that it allows you to include information sources in addition to the data, for example, expert opinion. Here is again the CEO of Swedish Fish Incorporated, and he's come to tell us that the signup rate has never been higher than 20%, not even in Norway, and it's usually between 5% and 15%. Now, I'm not really sure exactly how much we
should trust our CEO. I mean, I think he's smoking tobacco, but I don't know. But for now, let's roll with this new information and see how we can include this expert opinion in the model. Again, this is the model we have so far. I've forgotten about method B for the time being, so now we're back to just estimating the rate of signup for method A. So how can we include the CEO's information? Well, a natural place to include it is in the prior, what the model knows about the rate of signup before seeing the data. What we need to do is to change the prior from a uniform prior, which basically says that any rate between 0 and 100% is equally likely, to a more informative distribution that favors values between 5 and 15 percent. Now, there are many ways to define custom prior distributions. We could stitch together a couple of uniform distributions, where we put more probability on the one covering 5 to 15%, or we could even draw a probability distribution with pen and paper and scan it in. But often the easiest solution is to use a simpler probability distribution that is flexible enough to represent the information that we have, and that we tweak until it represents that information. For us, a good choice would be the beta distribution. The beta is a continuous distribution bounded between 0 and 1, which is good, because the rate of signup can't be less than 0% nor more than 100%. It has two parameters, alpha and beta, that allow it to take all the forms depicted here. For example, when alpha and beta are both 1, it becomes a uniform distribution. The larger the alpha and beta parameters are, the more bell-shaped and peaked it will become. So here is the uniform distribution we're using right now as the prior for the signup rate. A uniform prior is sometimes called a non-informative prior, as it really doesn't contain that much information with regard to what the signup rate could be. And here is a proposal for what a more informative prior could be: this is a beta distribution with the alpha
parameter set to 3 and the beta parameter set to 25. But the specific parameter values really don't matter here; what matters is what shape the distribution has. Here I wanted to capture the information from our CEO that the rate of signup usually is between 5 and 15 percent. So this informative prior puts most of the probability between 5 and 15 percent, but does not rule out the possibility that the signup rate could be up to 30 percent. You could certainly capture the CEO's information in many other ways, but this is what we're going to roll with.

So this is our new model. It's the same as before, but now with the informative prior on top, and the cool thing, again, is that we don't need to change the computational part of how we fit this model. We can use the same procedure as before; the only difference is that we draw the parameter values from our new informative prior distribution instead of from the uniform distribution. Here is a distribution you should recognize: it's the posterior probability distribution of the likely rate of signup using the uniform, non-informative prior. And here is what we got using the new informative prior. Looking at it, it seems that after having used the info from the CEO and the info from the data, it is most probable that the rate of signup is between 10 and 30%. The information in the data points to the rate of signup being somewhere around 40%, and the CEO stated that it's usually around 5 to 15%, so it shouldn't come as a surprise that the resulting posterior distribution looks like a mix between these two information sources. Now, if we had more data, the information in the prior would have less and less influence; with enough data, the prior wouldn't matter at all. Similarly, if we had less data, the posterior would look more like the prior, and if we had no data at all, the posterior would be the same as the prior.

Now, we are in a slightly confusing situation, however, in that we have run two different models and have two different results from the same
data set, and at some point we should decide whether we want to go with the non-informative prior or the prior from the CEO. But it's totally fine to try out different models and different priors, and it can be worthwhile to try out an informative prior, because if you're not using an informative prior, you're leaving money on the table, as Robert Weiss puts it. That is, if you're not using an informative prior, you're leaving information that you have out of the analysis, which seems like a waste.

All right, a third reason why Bayesian data analysis is useful is that the result of a Bayesian analysis retains the uncertainty of the estimated parameters, which is very useful in prediction and decision analysis. Here, decision analysis is when you take the results of an analysis and bring them closer to what you care about. Usually you don't ultimately care about a parameter value; what you care about is often things like money, and what decision to make to get more of it, or what you could do to avoid different types of loss. We never seem to get rid of our CEO, and here he is again. He asks us, "So what should we do? And by the way, marketing forgot to tell you that the cost of sending a brochure is 30 kronor, the cost of sending a salmon is 300 kronor, and if a person signs up, we make 1000 kronor on average." Okay, so what should we do? Here are the two methods that we are considering: A, sending a colorful brochure, or B, sending a brochure and a sample of frozen salmon. And this is the result we got after having fitted the model with the data from both method A and method B; we did that before, remember. While this showed that it is probable that method B has a higher rate of signup, it doesn't directly tell us what to do, because we're not really interested in the rate of signup; we're really interested in which method will give us the most money. And while method B seems to have a higher rate of signup, it also involves sending out costly sample salmon. But since we did a
Bayesian analysis, and we have access to the raw draws behind these two probability distributions, it's very easy to do a quick decision analysis to figure out which method will probably give us the most money. So to the left here, we have the first eight rows from the many, many draws that make up these two posterior probability distributions, and the distribution of these draws represents the uncertainty regarding what the underlying rates of signup are for these two methods. And remember that any calculation we perform row-wise here will give us a new posterior probability distribution that retains this uncertainty. So some reasonable things to calculate here would be the expected profit when using method A, which is the rate of signup times the 1000 kronor we make per signup, minus the cost of sending the brochure. For the first row, that would be 33.1% times 1000 kronor, which means we would make 331 kronor on average, minus the 30 kronor the brochure costs, so an expected profit of 301 kronor per brochure, and so on for all the rows. Similarly, we can calculate the expected profit for method B, which is almost the same calculation, but now also minus 300 kronor for the salmon. And finally, since we're interested in which of these two methods would give the higher profit, we calculate the difference in profit between the methods, where a positive difference means method B is better. Just looking at these first eight rows, we see that five out of eight rows actually favor method A, but of course we should look at the profit difference distribution for all the rows. Here we see that there is much uncertainty regarding which method would give the highest profit. It could be that method B is better, but if we count up how many draws are in favor of method A, we actually get that there is a 61% probability that method A would result in better profits. So if we had to decide, this small decision analysis tells us that we should go for
method A, even if method B has a higher rate of signup. But the main takeaway here should really be that, given the data that we have, there is much uncertainty, and we would really need some better data before making a decision.

All right, so we went from estimated rate parameters to a posterior probability distribution of the likely difference in profit between these two methods, and I hope you saw how easy that was, since we started from the result of a Bayesian analysis, that is, probability distributions represented as a long table of parameter draws. If we instead had used classical statistical methods like maximum likelihood estimation, we would just have gotten point estimates, which we wouldn't be able to post-process into something that informed us about the expected profit and that included some measure of uncertainty regarding the expected profit. But with Bayes, it was pretty simple.

And a last reason to use Bayesian data analysis (there are many more reasons, but a last one) is that you probably already are using it. What I mean here is that a lot of classical statistical procedures that you might already be familiar with, such as classical linear regression or the bootstrap, can be interpreted as Bayesian models with priors and generative models, and the same is true for many machine learning procedures. And while you don't have to interpret the statistical models you use from a Bayesian perspective, doing so helped me better understand what many statistical procedures actually do. Not least, a Bayesian perspective was super useful for me when understanding how mixed models and hierarchical models work, which are simple and straightforward from a Bayesian perspective but slightly mysterious from a classical perspective.

So those were some reasons for why to use Bayesian data analysis. Let's look at some reasons for why not to use Bayesian data analysis. Maybe everything is working fine as it is, and you're just happy with your tools and your workflow; then
you might not need Bayesian data analysis. Or maybe you're not that interested in uncertainty. There are many good machine learning tools that just give predictions, with no indication of these predictions' uncertainty, and if that's what you want, maybe you don't need Bayes. Or maybe Bayesian statistics is too computationally demanding: maybe you would want to fit a Bayesian model, but your data set is so large it just takes too long. Or maybe you just feel Bayesian statistics takes too much work to set up; even if you would want to try, the cost-benefit situation doesn't allow it. These are perfectly good reasons not to use Bayes, and what I wanted to say with this slide is that Bayesian data analysis is just one tool out of many in your data science tool belt, and while it can be a very useful tool, it's not the be-all and end-all of data analytical methods, even though it's sometimes presented as that.

So that concludes part 2 of this 3-part introduction to Bayesian data analysis. If you want to try out what we talked about here, you could go back to your solution to the exercise in part 1 and change it according to the CEO's requests. That is, try adding an informative prior to the model, change the model so that it can accommodate data from both method A and method B, and you can also try replicating the decision analysis where we looked at the expected profit of each method. However, if you try this, you could run into some trouble, because when you add the second data source to the model, you might find that it takes a really long time to run. That's because the method we used to fit Bayesian models in part 1, approximate Bayesian computation, was conceptually simple but also extremely slow. So in part 3 of this introduction, I will give you some hints on how you can do speedy Bayesian computation, and especially we will look at a useful tool called Stan. But for now, I'm Rasmus Bååth, and thanks for staying with me to the end.
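The two-group model described in the transcript (two independent priors, two generative models, and the draw-simulate-reject loop) can be sketched as a minimal Python example. It assumes NumPy, uniform priors on both rates, and binomial generative models (16 Danes per method, each signing up independently with the drawn rate); the function name `abc_two_groups`, the seed, and the number of draws are illustrative choices, not taken from the video.

```python
import numpy as np

def abc_two_groups(n_draws=1_000_000, seed=1):
    """Rejection-sampling ABC for the sign-up rates of methods A and B."""
    rng = np.random.default_rng(seed)
    n_a, observed_a = 16, 6    # method A: 6 of 16 Danes signed up
    n_b, observed_b = 16, 10   # method B: 10 of 16 Danes signed up

    # 1. Draw candidate rates independently from two uniform priors.
    rate_a = rng.uniform(0, 1, size=n_draws)
    rate_b = rng.uniform(0, 1, size=n_draws)

    # 2. Run two (assumed binomial) generative models to simulate two data sets.
    simulated_a = rng.binomial(n_a, rate_a)
    simulated_b = rng.binomial(n_b, rate_b)

    # 3. Keep a pair of draws only if BOTH simulated data sets match the real data.
    keep = (simulated_a == observed_a) & (simulated_b == observed_b)
    return rate_a[keep], rate_b[keep]

posterior_a, posterior_b = abc_two_groups()

# Row-wise post-processing: each accepted pair gives one draw of the difference.
rate_diff = posterior_b - posterior_a   # positive values favor method B
prob_b_better = np.mean(rate_diff > 0)
print(f"{len(rate_diff)} accepted draws, P(rate B > rate A) = {prob_b_better:.2f}")
```

Because `rate_diff` is computed row by row, it is itself a set of posterior draws, and the fraction of positive values estimates the probability that method B has the higher signup rate (the video reports about 92%). Note how few of the million candidate pairs survive the filter, which is why this approach gets slow as data sources are added.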
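Adding the CEO's information only changes where the candidate rates come from; the rejection step stays the same. The sketch below assumes NumPy and a binomial generative model for method A's 6 out of 16 signups, and runs the same filter once with a uniform prior and once with the Beta(3, 25) prior from the transcript. The helper name `abc_one_group`, the seed, and the 90% percentile summary are illustrative choices.

```python
import numpy as np

def abc_one_group(prior_draws, rng, n_trials=16, observed=6):
    """Keep the prior draws whose simulated sign-up count matches the data."""
    # Assumed binomial generative model: n_trials people, each signing up
    # independently with the drawn rate.
    simulated = rng.binomial(n_trials, prior_draws)
    return prior_draws[simulated == observed]

rng = np.random.default_rng(42)
n_draws = 1_000_000

# Same data and same fitting procedure; only the prior draws differ.
uniform_prior = rng.uniform(0, 1, size=n_draws)
informative_prior = rng.beta(3, 25, size=n_draws)  # mostly 5-15%, thin tail toward 30%

post_uniform = abc_one_group(uniform_prior, rng)
post_informative = abc_one_group(informative_prior, rng)

for name, post in [("uniform", post_uniform), ("beta(3, 25)", post_informative)]:
    low, high = np.percentile(post, [5, 95])
    print(f"{name}: median {np.median(post):.2f}, 90% interval {low:.2f} to {high:.2f}")
```

Since the beta prior is conjugate to the binomial likelihood, the exact posterior under the Beta(3, 25) prior is Beta(3 + 6, 25 + 10) = Beta(9, 35), which gives a quick sanity check on the ABC output.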
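The decision analysis is more row-wise arithmetic on posterior draws. To keep this sketch short and fast, it swaps the ABC step for direct sampling from the exact conjugate posteriors under uniform priors, Beta(7, 11) for method A and Beta(11, 7) for method B, which are the distributions the rejection sampler approximates. The costs and revenue are the CEO's figures; NumPy, the seed, and the number of draws are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(3)
n_draws = 100_000

# Posterior draws of the sign-up rates under uniform priors, sampled from the
# exact conjugate posteriors Beta(1 + signups, 1 + non-signups) instead of ABC.
rate_a = rng.beta(1 + 6, 1 + 10, size=n_draws)
rate_b = rng.beta(1 + 10, 1 + 6, size=n_draws)

# Revenue and costs in kronor, as stated by the CEO.
revenue_per_signup = 1000
brochure_cost = 30
salmon_cost = 300

# Row-wise calculations keep the posterior uncertainty.
profit_a = rate_a * revenue_per_signup - brochure_cost
profit_b = rate_b * revenue_per_signup - brochure_cost - salmon_cost
profit_diff = profit_b - profit_a   # positive means method B is better

prob_a_better = np.mean(profit_diff < 0)
print(f"P(method A more profitable) = {prob_a_better:.2f}")
```

With these exact posteriors the estimate may come out somewhat above the 61% quoted in the video, but the conclusion is the same: method A is favored, with a lot of uncertainty. The same formula reproduces the worked example for a single row: a rate of 33.1% gives 0.331 × 1000 − 30 = 301 kronor.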
Info
Channel: rasmusab
Views: 62,281
Rating: 4.9828768 out of 5
Keywords: Bayesian, Statistics, Tutorial, Probability
Id: mAUwjSo5TJE
Length: 22min 59sec (1379 seconds)
Published: Mon Feb 27 2017