Bayesian modeling without the math: An introduction to PyMC3

Captions
Hello everyone, I am Fadi, here with Miranda. Thank you to each and every one of you for being with us today. We are pleased to welcome those of you who have been with us for a year as well as those who are new to our audience. PyData has more than 140 chapters around the world, and PyData Jeddah is one of them; it is run by seven organizers from Saudi Arabia, who are listed here. So let's move to the exciting part, our talk. I'm happy to introduce our speaker today, Dr. Thomas. He is the CEO and founder of PyMC Labs, and he will talk about Bayesian modeling without the math, an introduction to PyMC3. Please join me in welcoming Dr. Thomas. I know you are excited to hear him, so I will keep it short. I would like to thank our sponsor for supporting our meetup. If you want to know more about PyData, please visit pydata.org, and to know more about our upcoming events, please follow us on Twitter. One last thing: if you have any questions, please write them in the chat and Renaud will read them to our speaker. So I will stop sharing now. Dr. Thomas, can you share your screen? Let me give you the permission. — Can you see my screen? — Yes, we can. — Perfect. Thank you all so much for coming, and thanks to the organizers for having me. I'm really excited to present; I didn't know there was a Jeddah PyData chapter to begin with, so I'm learning all kinds of new things, and hopefully by the end of this you will have learned something about Bayesian modeling too. This talk really comes out of my own experience of learning Bayesian modeling and being frustrated with the way it is often introduced, which is very math heavy. I always thought there needed to be a better way, so I've been trying to explain Bayesian statistics in more intuitive ways, and this is my latest iteration of how I think it can be introduced. This is for everyone who does not have a solid math background but still wants to know how to think about it. What I think is beautiful about Bayesian statistics is that we actually can do this: there is an intuitive level of understanding that does not require a lot of math. Okay, let's dive in. I'm Thomas; I studied computational psychiatry, and before that bioinformatics at the University of Tübingen. Now I run a company called PyMC Labs, which is a Bayesian consultancy, and I'm one of the authors of PyMC3, which I'll talk about. All of us at PyMC Labs are core contributors to the library, and I'm really proud of the team we have assembled: the most brilliant Bayesian modelers I've ever had the honor of working with. Just by the nature of the project we've gathered all these people, and it's really amazing. It's a team of mathematicians and neuroscientists, including a Russian mathematician, Maxim — mathematicians need to be Russian, that's the rule. Ravin built rockets at SpaceX; now he's building Bayesian rockets at PyMC Labs. We solve really challenging industry problems wherever Bayesian modeling is important. So how do I think about Bayesian statistics at a high level? Here's what I think it allows you to do: you can build models that are tailored to your data. The analogy I like to use is that when I was growing up there used to be
Playmobil kits and Lego kits. Playmobil — I don't know how well this translates to an international audience; it's a German toy manufacturer, I think they're international, but anyway you get the idea — you buy a kit and it is the ambulance, and the only thing you can play with is the ambulance. Lego, on the other hand, is just a whole bunch of building blocks: you can build the toy that was predestined to be built, but you can also build your own toy. You can reassemble these blocks your own way and build whatever you want, something targeted to your particular use case. This is how I think about Bayesian statistics: you assemble the model you want using building blocks, and the building blocks, as we will see, are probability distributions. Another key high-level point about Bayesian statistics, and why you might want to use it, is that it allows you to include any knowledge you already have about the problem. This is in stark contrast to machine learning. Machine learning is a very powerful tool, but it basically requires you to learn everything about the problem from the data alone: you have a random forest classifier, and you need a huge data set for it to learn the patterns that are predictive. If you don't know much about your data set, that's great, but a lot of the time you do have knowledge about the structure of your problem that you want to include in your model. Maybe you have expert knowledge, maybe you have nested data with a hierarchical structure, or it's not a prediction problem at all — maybe it's a decision-making problem, or an inference problem where you don't want to predict something but just want to learn about your problem. One example I really like is COVID-19: how fast is it spreading? That is not a prediction problem, it's an inference problem — what is the reproduction rate at a given point in time? — and there are all kinds of assumptions and knowledge about how COVID-19 spreads that we can bake into the model to get to the number we want. A huge class of problems are inference problems, and I think in data science we've been a little misled into thinking that machine learning is the only thing to care about. While it is powerful, it only applies to a fairly small set of problems, whereas Bayesian statistics applies to a much broader set: you can do machine learning, but you can do so much more. For example, A/B testing. This is not a prediction problem: you have two versions of a website — your control, the current version, and another version where maybe you changed the color of the sign-up button — and you want to know which one leads to more sign-ups or sales or whatever metric you care about. You can structure this as a hypothesis-testing problem, a decision-making problem, or just generally an inference problem. So let's get very concrete and look at data we have collected. We test both versions of the website: most people don't convert (a zero), and if it's a one they do convert. We do this for both versions with a hundred trials each: 100 users visit version A and 100 users visit version B. Now we want to know which one is better. How do we decide that?
One very simple thing is that we can just take the mean: numpy.mean of the A conversions gives 0.07, so seven percent of people convert, and for B it's 17 percent. So apparently version B is better, right? Well, who knows — one critical piece of information is missing from that number, and that is how sure I am about it. I could get 0.07 from just ten visits, or from a million data points. The more data I have, the more sure I am that it is actually seven percent; with very few data points I will be very uncertain, but with a lot of data I will be much more certain. That is what is missing here. This is called a point estimate, because it's just a single number, and what we really want is to encode that uncertainty when making the decision: how certain are we that one version is actually better than the other, given that there is noise in the data? So let's assume we have a magic machine, and that machine produces plausible values of conversion rates. Rather than saying seven percent is the only number that explains this data set, we say it could also be 0.075 — that also explains the data pretty well. Could it be 0.9? No, probably not; that is implausible — this number of conversions with a 90 percent rate is essentially impossible. But 0.05 might be possible, or 0.08. These are all plausible values that explain this data, and that is what the magic machine produces: we feed in the data and it gives us plausible conversion rates — eleven percent could explain this data well, or eight percent, or five percent. What I can do then, for the conversion rate of website A, is draw a histogram over all the plausible values. It tells me that 0.07, the point estimate from before, is the most likely value, but it could also be 15 percent — there aren't many plausible values in that range, but I don't want to rule it out — and it's definitely not going to be larger than 20 percent; that is just incompatible with the data I observed. If we had this machine, it's pretty obvious it would be useful. To drive the point home, there are a lot of interesting things we can do with it. For example, we might want to know how many plausible values are above 10 percent — maybe 10 percent is an important cutoff for us, where below it is bad and above it is good. How do we answer that? We just count: we take the number of samples smaller than 0.1 and the number larger than 0.1 and divide by the total. 82 percent of samples are below 0.1 and 18 percent are above, and already we are doing statistics — the statistical statement is that the probability that the conversion rate of A is larger than 10 percent is 18 percent. Just by having plausible values — and we don't yet know how we get them — we can answer all these interesting questions, which we couldn't do with a point estimate. Now I can compute these plausible values for the conversion rates of A and of B, and that gives me two histograms, and just by looking at them we can see that the plausible conversion rates for B are clearly much higher than those for A.
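The counting just described can be sketched with plain NumPy. As a stand-in for the magic machine (which the talk builds later with PyMC3), this sketch draws plausible conversion rates directly from a Beta(1+7, 1+93) distribution — the exact posterior for 7 conversions in 100 visits under an assumed flat prior; the seed and sample count are arbitrary choices, and the talk's 82%/18% split reflects its own prior, so the numbers here will differ slightly:

```python
import numpy as np

rng = np.random.default_rng(42)
# Stand-in for the "magic machine": for 7 conversions out of 100 visits
# under a flat prior, the plausible conversion rates follow a
# Beta(1 + 7, 1 + 93) distribution, so we can draw them directly.
samples_a = rng.beta(1 + 7, 1 + 93, size=100_000)

# How certain are we relative to the 10% cutoff? Just count samples.
below = (samples_a < 0.1).mean()
above = (samples_a > 0.1).mean()
print(f"P(rate_a < 10%) ~ {below:.2f}, P(rate_a > 10%) ~ {above:.2f}")
```

The key move is that a probability statement becomes a simple fraction of samples: no closed-form integral is needed once you hold the plausible values in an array.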
So that's cool. Now we might want to ask the question we actually care about: what is the probability that the conversion rate of website A is better than that of version B — that the old version is better than the new one? What's cool is that with our plausible values we can just do that: we ask how many samples from the distribution for A are larger than the values for B, count those up, and get 1.8 percent — so essentially nothing. Again we have a statistical statement embedded here: the probability that the conversion rate of A is larger than that of B is 1.8 percent. Just by having these plausible values I can slice and dice the histograms in all kinds of ways and answer all these probability questions. That obviously begs the question: how can we build this machine? That's where a very smart person came in, called Thomas Bayes — I'm very proud to share at least his first name — who said, well, we can just use Bayesian statistics. That's not what he said, but that's what everyone said afterwards. He figured out a very simple reformulation, Bayes' formula up here, but the formula itself is not even that important; the key is that it allows us to build this box. You're probably wondering what's in the box, so again, at a high level: we have a solution generator, which generates values for my parameters — here, conversion rates. It doesn't know ahead of time whether these conversion rates are plausible or not; it's basically a random number generator — a bit smarter than that, but at its core that's what it does.
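The pairwise comparison just described can also be sketched with plain NumPy, again using conjugate Beta draws as a stand-in for the machine (flat priors assumed, so the exact number will differ slightly from the talk's 1.8%):

```python
import numpy as np

rng = np.random.default_rng(0)
# Plausible conversion rates under flat priors:
# 7/100 conversions for version A, 17/100 for version B.
samples_a = rng.beta(1 + 7, 1 + 93, size=100_000)
samples_b = rng.beta(1 + 17, 1 + 83, size=100_000)

# P(A better than B): count how often an A sample beats a B sample.
p_a_better = (samples_a > samples_b).mean()
print(f"P(rate_a > rate_b) ~ {p_a_better:.3f}")
```

Comparing the two arrays element-wise propagates the uncertainty of both websites at once, which is exactly what the single-number point estimates could not do.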
As you can see, the values it generates are not even all plausible — negative numbers, for example. Obviously a conversion rate can't be negative; the minimum is zero percent, but the random number generator doesn't care about that. It just generates candidate solutions. Then we have our data, and we have a solution evaluator, which is the part that says: a value of 0.054 is plausible, so we store it in our bucket of plausible solutions; minus 0.013 is implausible, so we throw it away. That is basically how this works. Of course I haven't told you yet what the solution evaluator actually is — that's what we'll look at next — but at a high level this is how to think about it. Those of you familiar with Bayesian statistics might notice that I'm using different terms here. I'm doing that because I don't really like the original terms; they are archaic. The random number generator is really called a Markov chain Monte Carlo (MCMC) sampler — but what does that even mean? The plausible solutions to my problem are posterior samples, and the evaluator is my Bayesian model. So now you know how these terms map to what other people call them. Okay, let's dive in. Clearly the most important part is the solution evaluator: how can I say whether a suggested conversion rate is plausible or not — whether it explains the data well or not? For that we need two things. The first is a way to evaluate a solution. Think about it for a second: out of 100 visits we get seven conversions for version A, so which conversion probability plausibly explains this data? What we want is a function where I input a conversion probability — for example 90 percent — and it tells me how well that rate explains the data; that is the y-axis of this plot. Based on that, I can decide whether to keep the proposed conversion rate or not. The function that does this is called the likelihood function. As inputs it takes the number of trials, 100; the number of successes, seven; and the conversion probability I want to evaluate. All three go into the function, and it returns how well that conversion probability explains the data. The particular shape of this function is called the binomial distribution: if you have count data of successes and failures, it's the binomial distribution. There are all kinds of probability distributions for all kinds of data: for counts of successes it's binomial, for single coin flips it's Bernoulli, for continuous data perhaps a normal distribution, and for arrival times a Poisson distribution. There are all these different likelihood functions, and you do have to know something about statistics to decide which fits, but often the choice is rather clear. The next thing to think about is how we update the space of plausible solutions, because there's one thing I haven't told you yet: we don't just have a posterior space of plausible solutions, we also have a prior space of plausible solutions. Imagine the time before we have seen any data. This comes back to what I said at the very beginning about expert knowledge: part of that knowledge might be that no website in this world has a conversion rate of 100 percent — that just doesn't exist.
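The likelihood function just described can be written out by hand — a minimal sketch of the binomial probability mass function using only the standard library (the candidate rates below are arbitrary illustrations):

```python
from math import comb

def binomial_likelihood(p, k=7, n=100):
    """How well does conversion rate p explain k successes in n trials?"""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# The rate matching the data (0.07) scores far higher than implausible ones.
for p in (0.07, 0.10, 0.50, 0.90):
    print(f"p = {p:.2f} -> likelihood = {binomial_likelihood(p):.3g}")
```

Evaluating it at 0.07 gives the peak of the plot from the talk, while 0.50 and 0.90 score vanishingly low — which is exactly the information the evaluator uses to keep or discard proposals.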
Most likely the conversion rate will be lower than 50 percent; we just know this, without gathering any data. It's going to be low — how low, I don't know — and we can define that in the model: without having seen any data, what do I think are plausible solutions to this problem? That is what's plotted here, also as a histogram of values: all of these values I deem plausible without having seen the data. This is just me, not really knowing a lot about websites and A/B tests, writing down what I personally imagine. Some people might actually know much more about A/B tests, and they would say, actually, this distribution is all wrong, here's a better one — and then we can use that, which would be great. So we start with this prior belief, then we observe data — the binary events I observed — then we evaluate the solutions with the Bayesian model, and we get to the posterior space of plausible solutions. That is what we looked at before: all the values that plausibly explain the data. The new thing is that we also have the plausible values from before seeing any data. This is a very natural way for me to think about Bayesian statistics: we are just updating our beliefs. We have beliefs about plausible values in the world, which is how humans go about the world anyway: we have many hypotheses about how the world works, and then we observe something and realize, oh, this hypothesis clearly was not correct. You gather data, you get experience in life, you discard that idea, and over time you refine your hypotheses to those that stand the test of time — hopefully, anyway.
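This belief-updating step can be sketched numerically. Assuming a Beta prior that puts most weight on low conversion rates (the alpha/beta values here are illustrative, not the ones from the talk), binomial data updates it by simple addition of successes and failures:

```python
# Beta prior: beliefs about plausible conversion rates before any data.
prior_alpha, prior_beta = 2, 7        # illustrative choice: "probably low"
prior_mean = prior_alpha / (prior_alpha + prior_beta)

# Observe the data: 7 conversions out of 100 visits.
k, n = 7, 100

# Updating beliefs: for a Beta prior with a binomial likelihood, the
# posterior is again a Beta, shifted by the observed counts.
post_alpha = prior_alpha + k
post_beta = prior_beta + (n - k)
post_mean = post_alpha / (post_alpha + post_beta)

print(f"prior mean ~ {prior_mean:.2f}, posterior mean ~ {post_mean:.2f}")
```

The posterior mean lands near the observed 7%, pulled only slightly by the prior — a small numeric picture of "shifting beliefs around" as data arrives.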
That's not what everyone does, but if you are scientifically inclined you observe what works in your life and stick with the hypotheses that seem to explain what you're seeing. This is how Bayesian updating works, and how you can think of learning from data as shifting beliefs around. So far this has all been very high level and conceptual, but now let's actually see how we do it. This is a PyData meetup, so I'm going to show some Python code. I'm using PyMC3, which is a probabilistic programming library written in Python that lets you build these types of models — these magic boxes — in Python. What's special about it is that you don't need an external language; you can just use Python to build your models. The first thing we do is import pymc3 as pm — that is what we're going to call it. Then the first line, which is boilerplate that every model will have, sets up a context manager with the with keyword: we instantiate a model and tie it to a model variable. That doesn't matter all that much; what's interesting is what happens beneath it, because everything underneath is tied to this model. So this says: let me define a model that I call model a, and that model has parameters. The parameter for modeling the conversion rate of version A is just the conversion-rate parameter — this is what I want to infer. As I said before, I assign a prior, my belief about plausible conversion rates before seeing any data, and this distribution is a so-called Beta distribution. Beta distributions come up whenever you have percentages, values from zero to one hundred percent — that's why this is a Beta distribution — and it comes from PyMC3. We're saying: without having seen any data, I believe this is Beta-distributed. The first argument is a name; the next arguments are the parameters of the distribution. These distributions are parameterized by values that often don't make much intuitive sense, so here I just chose two values that give it this shape. That is often how you do it: you know the shape you want and find the parameters that produce it. It's not really important how I came up with them — I played around until it looked right. If I plug in different values I get different shapes; if I flip them to alpha 10 and beta 2, the shape is mirrored around 50 percent, putting high probability on high conversion rates. Okay, so that sets up the parameter I want to infer and assigns a prior to it. The next thing to worry about is the likelihood function. We decided this was a binomial distribution, the function that tells me, for every conversion probability, how likely the observed data is. Here I again give it a name, I specify n as 100 because we had 100 trials, and seven of those converted — that is what I observed. The observed keyword argument is what tells the library this is the likelihood: it says this is not a parameter to infer, this is something I have observed — seven out of 100 converted. And now I have to say what the probability of a conversion occurring is. Seven out of 100 — but is the conversion probability actually 7 percent, or could it be 8 percent? That's not clear, and that's exactly what I want to infer, so that's where I pass in the parameter I defined. This is what ties it together: I have this likelihood function that describes my
data, and a parameter that I want to infer, which itself has a prior distribution. Hopefully that makes sense. Now there's one more thing: I have to actually run the proposal generator over different solutions. Remember, we have the solution generator; so far we have only built the solution evaluator, and we still need to run the generator to collect plausible solutions. The good thing is that this is easy, because it is completely automatic: in PyMC3 all I have to do is call pm.sample. That runs the loop that asks our model: is minus 0.05 a plausible value? The model says no, that's impossible, so we throw it away. Is 0.07 a plausible value? Yes, that's a great value, because it's exactly the fraction of conversions we observed, so we keep it in our store of plausible values. That store is the posterior, the collection of values we care about. And what's cool is that this is all we had to do: this gets us the plausible values for the conversion rate, and now all we need to do is query that posterior distribution. For example, for the things I showed at the beginning, we can ask which values of conversion rate A are smaller than 0.1; that gives me a boolean array, and calling the mean on it tells me 0.82 — an 82 percent probability that the conversion rate is smaller than 10 percent. If you remember, that's exactly the number I showed you at the start: I ran the sampler, got the full distribution, and asked what percentage of the samples is smaller than 0.1. This shows how easily you can then work with the posterior and do statistics.
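What pm.sample automates can be caricatured in a few lines of plain Python. This is only a toy version of the generator/evaluator loop — a random-walk Metropolis sampler with a flat prior, whereas PyMC3's actual samplers are far more sophisticated — but it shows proposals being kept or discarded according to the binomial likelihood, exactly as described above:

```python
import random
from math import comb

def likelihood(p, k=7, n=100):
    # Impossible conversion rates (outside [0, 1]) get zero plausibility,
    # so proposals like -0.05 are always rejected.
    if not 0.0 < p < 1.0:
        return 0.0
    return comb(n, k) * p**k * (1 - p)**(n - k)

random.seed(1)
current = 0.5
samples = []
for _ in range(20_000):
    proposal = current + random.gauss(0, 0.05)   # solution generator
    # Solution evaluator: accept the proposal with probability equal to
    # how much better (or not much worse) it explains the data.
    if random.random() < likelihood(proposal) / likelihood(current):
        current = proposal
    samples.append(current)

posterior = samples[5_000:]                      # discard warm-up steps
print(f"posterior mean ~ {sum(posterior) / len(posterior):.3f}")
```

The retained samples cluster around 0.07–0.08, the same histogram of plausible conversion rates the magic machine produced — which is the sense in which the sampler "builds" the posterior one kept value at a time.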
This just makes way more sense, to me at least, than for example frequentist statistics, where this would be much more cumbersome. Or the question from before: is conversion rate A higher than conversion rate B? I run the model for A and run the model for B — I'm showing the one for B, but it's the same — compare them, and again that gives me a boolean array; I call the mean and get the probability. Pretty simple stuff, I hope. That concludes this section of the talk. Of course I've only been scratching the surface; there's so much more, but the main point I want you to take home is that you have a tool that lets you build all kinds of models tailored to your data, and you don't just get a single answer — you get all answers that are plausible. That lets you quantify uncertainty and answer questions in a statistical way. And there's so much more you can do. You can do linear regression: here is a model with a normal likelihood, because we assume the errors around these data points are normally distributed, with the true regression line in yellow. Again, we don't just get a single regression line as our answer; we get all the regression lines that explain the data well, and as you can see we can't decide between them — they all do a good job — and this is just a couple of lines in PyMC3. Or maybe you have outliers: now I have three points that skew the result upward. The normal likelihood is very susceptible to outliers because it has very thin tails, so large deviations from the mean affect it strongly and bias the fit. But because of the flexibility of Bayesian statistics, I can say: there are outliers here, the normal distribution is not a good fit, and instead use a different likelihood function — in this case a Student-t distribution, which allows values far from the mean — giving the orange line. If I run the regression that way, changing a single line of code in the model, all of a sudden I have a robust regression model that is not susceptible to these outliers. I'm aware I'm going fast; this is just a high-level overview to give you an idea, so you don't need to understand everything here. Another really powerful concept is hierarchical models. Often you have nested data: maybe you don't just have a single website for which you run an A/B test, but 100 websites — maybe you are the A/B-testing company, testing all these different websites all the time, and you want to know which version is better for each. You could run separate models for each website, but most likely the websites share some similarity, and you can exploit that in a hierarchical model. You say: I know all of these individual websites' conversion rates share similarity, and I explain that with a shared parameter that ties them together, which lets us model the similarities and differences simultaneously. These hierarchical models are very powerful. You can also do very classical hypothesis testing — say we have a drug and a placebo and want to know which is better; that's basically the A/B-testing setup. There is something called Bayesian Estimation Supersedes the t-test, or BEST, a standardized hypothesis test similar to the t-test you know from frequentist statistics, but a little better. There are mixture models, where the data is not a single distribution but is made up of, say, three individual distributions, and we can infer those component distributions underneath the data.
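The robust-regression idea — swapping the normal likelihood for a Student-t — comes down to how surprising an outlier looks under each distribution. A standard-library sketch (the outlier value and degrees of freedom below are arbitrary choices for illustration):

```python
from math import log, pi, lgamma

def normal_loglik(x, mu=0.0, sigma=1.0):
    """Log-density of a normal distribution at x."""
    z = (x - mu) / sigma
    return -0.5 * z**2 - log(sigma) - 0.5 * log(2 * pi)

def student_t_loglik(x, nu=3.0, mu=0.0, sigma=1.0):
    """Log-density of a Student-t distribution with nu degrees of freedom."""
    z = (x - mu) / sigma
    return (lgamma((nu + 1) / 2) - lgamma(nu / 2)
            - 0.5 * log(nu * pi) - log(sigma)
            - (nu + 1) / 2 * log(1 + z**2 / nu))

# An outlier 8 standard deviations out is crushingly improbable under the
# thin-tailed normal, but only mildly surprising under the fat-tailed t —
# which is why the t-based regression barely moves to accommodate it.
print(f"normal:    {normal_loglik(8):.1f}")
print(f"student-t: {student_t_loglik(8):.1f}")
```

Because the t distribution assigns the outlier a far higher log-density, the fit is not dragged toward those three points the way the normal-likelihood regression is.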
this data if you have data that is not just a single regression but maybe there's a time series off and this could change over time right so maybe this is uh there's a very powerful example of co2 concentration in the atmosphere which is changing over time and uh so there's this very flexible class of models called goes in processes that can that you can fit to all these time series and get something that explains it given the data so this is very powerful and you can also then predict into the future and note that these predictions also have a lot of have uncertainty around that so you don't just get the most likely future path which would be this red line but they also get all kinds of other ones that reasonably well could explain the future so that concept of uncertainty is very important you can do survival analysis um and of course you can also do deep learning so a lot of the field of deep learning now has embraced bayesian statistics of starting to embed these models or rather to allow you can express bayesian you can express deep nets in base in for in terms and then get a lot of interesting benefits from that um yeah so uh if that is interesting definitely check us out at pmclabs.org you can visit my blog we're right about this stuff so a lot of the examples that i just showed in the last section are from our blog here's the website for pimc and here's a great book on learning this in a more proper way so yeah with that i'd like to thank you for your attention and would be happy to take any questions that might have come up uh thank you so much dr thomas for this valuable information we really do appreciate it uh we have some questions we i would like to tell it to you please uh our first question what are some common bitfuls that product fracturings fall into when building bayesian models on py mc3 welcome pitfalls when building these models well there's a whole bunch of them so the biggest one is that you misspecify the model so you write down a model um 
but you didn't choose the right prior — for example, you choose a prior that is far away from the actual solution space, and then the sampler will have a hard time. So the most difficult thing is building the right model for your data, and that is an iterative process that takes time; you can't just say "okay, here's a bunch of data, predict it" — you have to really understand your data and learn about it. But that is also the superpower of this approach: by being forced to write out your assumptions in the model and then test them on data, you really learn something about your problem. So I would say one of the biggest stumbling blocks is when things go wrong. The sampler, for example, is very powerful and works most of the time, but if your model is not well specified it will fail, and that looks very ugly — yet it's really just trying to tell you that something is wrong with your model. That is the most common thing we see.

Thank you. Our second question: the random sampler is actually randomly generating values for parameters of possible posterior distributions, right?

Very close, yes. Basically it generates individual values and then tests whether those values make sense; if they do, it puts them into my collection of posterior samples. So it's not generating posterior distributions, but it is the thing that makes up the posterior distribution: the collection of all the MCMC samples is my approximation of the posterior distribution. That's the only correction.

It's clear. Our third question: is it possible to choose a prior distribution without an initial set of data to base our assumptions on?

Yes — usually then you would call that an uninformed prior.
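To make the "uninformed prior" idea concrete: for a conversion rate, a Uniform(0, 1) prior is the same as a Beta(1, 1), and with a Bernoulli likelihood the posterior is again a Beta distribution, so we can sketch it in plain numpy without running MCMC at all. The visitor and conversion counts below are made-up numbers for illustration, not data from the talk:

```python
import numpy as np

# Hypothetical A/B-test data: 35 conversions out of 500 visitors (~7%).
n_visitors, n_conversions = 500, 35

# A Uniform(0, 1) "uninformed" prior on the conversion rate is Beta(1, 1).
# With a Bernoulli likelihood, the posterior is conjugate:
#   Beta(1 + conversions, 1 + non-conversions)
# so we can draw posterior samples directly.
rng = np.random.default_rng(42)
posterior = rng.beta(
    1 + n_conversions,
    1 + (n_visitors - n_conversions),
    size=10_000,
)

print(posterior.mean())                        # close to 35 / 500 = 0.07
print(np.percentile(posterior, [2.5, 97.5]))   # 95% credible interval
```

This conjugate shortcut only works for simple models like this one; for anything more structured, PyMC3's samplers do the equivalent job numerically.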
For example, here I could have said, without having seen the data, that I do have some information: the conversion rate must be between zero and one hundred percent. That's something I know, so it's part of the prior. But within that range I might assume all values are equally possible — a uniform distribution from zero to one hundred percent — and then say, okay, I'm not biasing the inference at all. That is what's called an uninformed prior. You can do that, but it's often ill-advised, because very rarely do you really think that all values are equally likely; you always have some information. The more information you have, and the more you can constrain the prior, the better — but if you don't know, then you just set a very wide prior.

There's a sentiment often underlying that type of question: "I don't want to bias my inference, I want the data to speak for itself." That sounds so seductive it must be true, but actually it doesn't hold up. The data can't speak for itself — the data is super noisy, it's a mess — so expecting it to say anything reasonable on its own doesn't make much sense. We can only ever make inferences when we make assumptions, so the idea that anything could be uninformed or unbiased is impossible; it's always going to be biased in some way. Here, though, we are being very explicit about our biases, and these biases are good: you do want to bias the inference toward a range of values you know is plausible — you don't want answers that just don't make any sense.

Yes, thank you, it's clear. Okay, our next question: how many
parameters can PyMC3 estimate? Some of our attendees are very curious about the scalability PyMC can handle.

It used to be the case that you could build models with maybe 10 or 50 parameters, but now these samplers are so extremely powerful, and the tools so fast, that we've built models with over 50,000 or 60,000 parameters. Somewhere in the hundreds of thousands is probably the practical limit; I would say up to 100,000 is definitely possible, and that would take a couple of hours. But I would stress: we built a model with, for example, 40,000 parameters that fits super well and took about two hours — yet it was still a lot of work to get it sampling that quickly. The model needs to be well specified. These samplers are extremely powerful, but they're not magic, and if you have a really weirdly shaped, difficult posterior to sample from, that is why it's going to be slow. Usually it's slow not because of the number of parameters or the amount of data, but because your posterior is really difficult. In that case you need to go in and fix your model so that the posterior distribution looks better, and that is hard to do — it's often what we do in our consultancy: people come to us with models that run but are very slow, and we essentially just rewrite their model in a way that samples faster. So mostly the constraint is how the model is specified, but yes, you can fit very, very complicated models.

Interesting, good to know. Okay, our next question: in the first example, would bootstrapping work for estimating the uncertainty around the 0.07 conversion rate — like randomly sampling from the data 1,000 times?

Yes.
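The resampling scheme the questioner describes can be sketched in a few lines of numpy. The data here is simulated (500 hypothetical visitors converting at roughly 7%), since the talk's actual data isn't available:

```python
import numpy as np

# Simulated raw data: 500 visitors, 1 = converted, 0 = not (~7% rate).
rng = np.random.default_rng(0)
visitors = rng.binomial(1, 0.07, size=500)

# Bootstrap: resample the data with replacement 1,000 times and recompute
# the conversion rate each time. The spread of those rates is the
# uncertainty estimate around the observed rate.
boot_rates = np.array([
    rng.choice(visitors, size=visitors.size, replace=True).mean()
    for _ in range(1_000)
])

print(boot_rates.mean())                        # close to the observed rate
print(np.percentile(boot_rates, [2.5, 97.5]))   # bootstrap 95% interval
```

Note that this only resamples the observed data; as the answer below explains, there is no model of the world behind it, which is what limits it compared to a Bayesian treatment.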
So, bootstrap is definitely the right way to think about this: with a bootstrap I could also derive these samples, and that's fine — if the alternative to the bootstrap is a point estimate, then definitely do the bootstrap. The bootstrap, however, has a whole bunch of drawbacks that I'm not going to go into fully. One of them is that all you're doing is resampling data; there's no real model of how you think the world works behind it, so it's very constrained. In this particular example I could have done it and probably gotten something similar. But now assume the data set has a hierarchical structure, where I don't just have one website with an A/B test but 100 websites with A/B tests, and I want to share knowledge between them — the bootstrap does not allow you to do that. This is where the flexibility and the power of building more complicated models come in. The other point is theoretical: the uncertainty we're getting here stands on very solid theoretical foundations, while the bootstrap does not come with such strong guarantees. But yes — the uncertainty you get from a bootstrap is still much better than a point estimate.

Cool, okay. How do you tell whether outliers are the result of noise or come from a fat-tailed distribution?

That's a cool question. In the model I showed here, you can't really do that. What you can do is build another model — there's an example of this on the website — a mixture model, where you say: there are two different distributions, one for the samples coming from this linear regression, and another which is my outlier distribution, and for every point I assume it comes from one of those two — either from the outlier distribution or from the
linear regression distribution. That model is called a Hogg model (H-o-g-g), and it then allows you to say: these outliers are coming from the outlier distribution, which is one of my mixture components, and they're well explained by that distribution. And because those samples then don't go into this likelihood function, even if I use a normal distribution as the likelihood, it will still give me the right estimate — it won't be biased, because the points up here are explained by the other distribution. So this model can't do it, but another model — a mixture model — can.

Interesting, okay. Can you share some use cases of Bayesian analysis that you have come across in your consultancy? And is it always better to adopt a Bayesian approach than a frequentist approach?

We have clients in all kinds of different industries. One, for example, is a company called Everest, who are building risk models — and this might be relevant for Saudi Arabia: very wealthy individuals have a lot of their money in the stock market, but also in private equity, venture capital funds for example. Because those are private — not traded on the stock market — it's very difficult to estimate the returns or the price of a VC fund. The model we built allows you to estimate the returns of private-equity investments given capital calls and distributions: just from how much goes into a fund and how much comes out of it, we were able to infer the returns, which is a latent distribution. So that's one example in financial modeling. We're also working with an agricultural company: they're planting crops, and in different fields they're testing different crops, and they want to know which crops are best, but then
there's all this variation — sometimes one side of a field gets more sun than the other, or the water differs — so they want to estimate the underlying distribution and then know which crops perform best. It's used for that, for example. Another client is the pharma company Roche: they test how a certain drug affects people's working memory — how many items can someone store in working memory on the drug versus off it — and for that we also built a Bayesian model that infers how many items someone can keep in mind at the same time. So as you can see, the modeling problems are very varied.

And yes, I would go as far as saying Bayesian statistics is always better than frequentist, and the reason is that frequentist statistics really doesn't make sense. I'm not saying it's wrong — the mathematical theory is correct — but it's not computing the thing you usually care about. Unfortunately I can't really go into this here, because if I started to explain what frequentist statistics is actually trying to compute, I would lose everyone on this call — everyone would say "that just doesn't make any sense", and that's exactly the point. My summary is: frequentist statistics gives you correct answers to questions that don't make sense, while Bayesian statistics gives you correct answers to questions that do make sense. Both are correct, but in one case the questions don't make sense. One last point on this: if you ask scientists what a frequentist p-value is, they'll explain to you what a Bayesian p-value is. Nobody really understands what it actually is, and that's highly problematic, because at the end of the day it's not what you want.

Yes, clear. Our last, but not least, question: how explainable are Bayesian models? Do SHAP values — SHAP stands for Shapley additive explanations —
apply to Bayesian models as well?

That's a great one to have as a last question. These models are very explainable, which is cool: because we have a model of how the world works, with latent parameters that we infer, I can really query the model and say, okay, this is the driver of this pattern in the data. You can communicate these models to other people in the company, or you can ask people in the company "how is your problem structured?" and then build a model that maps that problem as accurately as possible — and then you can really query the model and learn about your data set. This is in stark contrast to machine learning, where you can't really ask the model what it has learned. So yes, very explainable. I don't know SHAP values well enough to answer that part — I've heard of them, but that's all. But yeah, I really enjoyed these questions; I'm glad people really seem to have gotten it.

Amazing. Okay, we have one extra question that came up: do you think Bayesian deep learning is the answer for explainability of deep learning models?

No, I don't think so. Usually these models are explainable if you build them in a way that makes them explainable, but you can certainly build models that are very hard to understand. If you build a neural network as a Bayesian model, it is not going to be inherently explainable, because the model is still just learning arbitrary things — you're doing black-box machine learning inside the Bayesian framework, and you still can't explain it. So Bayesian modeling by itself does not make a model explainable. But usually the types of models you build this way map much more closely to the problem you have, and if you build a model that is grounded in an understanding of your problem,
then the model is understandable. If you just build a model that infers some crazy hyperplane in some abstract space, like a deep net does, it's still not going to be explainable.

Interesting! Okay, there are no more questions. Thank you so much, Dr. Thomas, for your time — it was a great talk — and thank you all for joining. Follow us on Twitter to hear about our upcoming events. This is the end of our meetup. Thanks, everyone, bye!
Info
Channel: Jeddah PyData
Views: 746
Rating: 5 out of 5
Id: uxGhjXS3ILE
Length: 54min 42sec (3282 seconds)
Published: Wed Mar 03 2021