COS 302: Monte Carlo

Captions
Let's talk about Monte Carlo methods. Monte Carlo methods are all about one kind of thing, and that is estimating expectations. These expectations could be integrals: we have some target distribution with probability density function pi of x, and we integrate the product of pi of x and some function f of x over the space. We would call this the expectation, under pi of x, of the function f of x. It could also be that we're estimating sums: x takes values in a discrete space, we sum over the possible values of x, pi is now a probability mass function, and f is something that consumes discrete values. We would still call that an expectation. What I'm going to talk about applies to both of these cases, but I'm going to focus on the continuous one, because everything I say about it will, for the most part, translate in a fairly obvious way to the discrete situation.

The Monte Carlo principle is the idea that if I have an integral like this that is an expectation, and I can draw samples from pi, meaning I can draw independent samples x_n from the distribution pi, then I can approximate the expectation with a sample average. I draw N of them, indexed from little n equals 1 up to big N. This is an intuitive thing, because you're used to averaging things already; it generalizes that to the idea that we can approximate the expectation of an arbitrary function under this distribution pi. So this is what we call the Monte Carlo principle. Perhaps I can't do this annoying integral in closed form; perhaps, for whatever reason, it's not possible to use traditional quadrature methods such as Gaussian quadrature, maybe because the space is too high dimensional. So instead I settle for a noisy estimate, in which I draw samples from pi, plug those samples into f, and then average them.

I should say that it might look like this kind of thing is obscure and doesn't come up very often. It turns out that this kind of integration, this kind of expectation, comes up in many, many different places in machine learning and statistics, across the physical sciences, and in economics; we're computing expectations like this all the time. For example, imagine that you have a simulation of the stock market tomorrow, and f of x is how your portfolio does; you'd like to know, under all possible ways the market might go tomorrow, how much money you are likely to make. Or maybe pi of x represents, for your self-driving car, the possible situations that might arise at the next intersection, and you'd like to make a good decision that takes into account all the possible states that intersection might be in; you might want to find the expected risk, say, associated with some decision going into that intersection, and that again will be an expectation. So averages seem like simple objects, and in some ways they are, but we use them absolutely everywhere, and they can be very challenging to compute, precisely because they involve integrals over perhaps high-dimensional, complicated spaces, with distributions that might be difficult to manipulate. If we can draw samples from those distributions, then we can compute averages and use them as proxies for the exact computation of the integral.
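As a minimal sketch of this principle in Python (the particular target and function are illustrative assumptions, not from the lecture: pi is a standard Gaussian and f(x) = x**2, so the true expectation is exactly 1):

import numpy as np

# Illustrative assumptions (not from the lecture): the target pi is a standard
# Gaussian and f(x) = x**2, so the true expectation E_pi[f(X)] is exactly 1.
def f(x):
    return x ** 2

rng = np.random.default_rng(0)
N = 100_000

samples = rng.standard_normal(N)   # x_n drawn independently from pi
estimate = np.mean(f(samples))     # Monte Carlo estimate of E_pi[f(X)]
print(estimate)                    # close to 1, up to Monte Carlo noise

Any distribution you can sample from and any function you can evaluate would slot into the same three steps: sample, evaluate, average.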
An important thing about this is that Monte Carlo estimators are unbiased. What that means is that if you carry out this procedure, then in expectation you will get the right answer; that is to say, averaging the averages converges to the truth. Let's take a couple of minutes and convince ourselves that this is an unbiased estimator. That just means that when I take the sample average, if I imagine a sample average of sample averages, which is what would happen if I repeated this many times, I need that to average out to the truth.

Recall that the game we're playing is that we're trying to compute an expectation, now under a slightly funny thing: the set of samples we might get with our Monte Carlo estimator, and I'm looking at the sample average of those samples. It's a little bit weird, because it's as if I'm taking my average estimator and sticking it inside another expectation in order to reason about it. This is just a sample average over the samples x_n, and I'm asking what happens under all possible x_n's I might get; I want to make sure that it gives me the integral I'm trying to compute, that is, the right expectation.

The first thing you notice, of course, is that the expectation of a sum is the sum of the expectations, so I can immediately replace this with 1 over N times a sum, from n equals 1 to N, of the expectation of each term. Now let's write this out just to see what's going on. We write our 1 over N sum over little n (I like to use a lowercase letter as a dummy index summing up to the uppercase version of that letter, so I can keep track of which index corresponds to which quantity). Then I need to write down the integral for this expectation, so I introduce some dummy variables that I'll call x prime, just to keep them distinct from the others. In particular, note that the expectation is a multiple integral over all of the x_n's, so I have a product of pi terms, one for each of x prime 1 through x prime N, representing the independence of those samples, but there's only one f, applied to x prime n. That looks like a complicated and annoying expression, except that we know all of the pi's are probability density functions, so they each integrate to one, and every x prime except this one easily goes away, because integrating over it just gives one. Now I only have one integral to care about, which is pi of x prime n times f of x prime n, integrated over x prime n. Notice that this is exactly the quantity we're trying to compute: the expectation under pi of f of x. When I sum over n, I get N copies of that constant (it doesn't depend on n), and then I divide by big N and recover the thing I'm interested in. This isn't too surprising; it just says that averages of averages have the same expectation.
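One compact way to write out the argument just described, using x prime 1 through x prime N as the dummy integration variables and the fact that every pi factor except the n-th one integrates to one:

\mathbb{E}_{x_1,\dots,x_N \sim \pi}\!\left[\frac{1}{N}\sum_{n=1}^{N} f(x_n)\right]
  = \frac{1}{N}\sum_{n=1}^{N} \int\!\cdots\!\int \Big(\prod_{m=1}^{N} \pi(x'_m)\Big)\, f(x'_n)\, dx'_1 \cdots dx'_N
  = \frac{1}{N}\sum_{n=1}^{N} \int \pi(x'_n)\, f(x'_n)\, dx'_n
  = \mathbb{E}_{\pi}\left[f(x)\right]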
A little bit more interesting is the variance. So now let's think about the variance, again under that collection of samples, of this quantity, the sample average. Just as before, where I looked at the expectation under the possible samples we might get, now I look at the variance under them. The very first thing to notice is the constant 1 over N out front. Whenever we take the variance of a random quantity multiplied by a constant, that constant comes outside the variance, but it comes outside as its square. That means I can write this as 1 over N squared, multiplied by the variance, still over that collection, but now of the sum alone. I've taken the x_n to be independent variables, and when I have independent things and ask about the variance of their sum, it is the sum of their individual variances. That's good news: we can write this out as 1 over N squared times the sum of the individual variances. By the same kind of argument as before, there's only one x_n appearing inside each term, so I can focus on just the variance, with respect to that x_n, of f of x_n. And there's nothing special about one x_n relative to the others; this variance is the same for every individual sample. It's the one-sample variance, if you will. So I can write the whole thing as 1 over N squared (we do need the square there, that's important) times N copies of the variance, under pi, of f of x, since all of the x_n's come from that same distribution pi. Note that this variance really doesn't depend on n, so one factor of N cancels and we wind up with 1 over N multiplied by the variance, under pi, of f. That one-sample variance doesn't depend on N at all, but as N increases, the whole quantity goes down as 1 over N.

So the takeaway here is that the variance of a Monte Carlo estimator goes as 1 over N, where N is the number of samples: as I increase the number of samples, the variance goes down linearly. That sounds like good news, and in some ways it is, but remember that variance is, in a sense, the square of our error, not the error itself. This means the error of our Monte Carlo estimate goes roughly as 1 over root N, which is a little bit of bad news: as I invest more samples my error goes down, but with diminishing returns. It costs more and more to get a better answer.
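A quick empirical way to see that 1 over root N behavior, reusing the illustrative standard Gaussian target and f(x) = x**2 from the earlier sketch (these choices are assumptions for demonstration, not part of the lecture):

import numpy as np

# Measure the spread of the Monte Carlo estimator at several sample sizes.
# With pi a standard Gaussian and f(x) = x**2, the true answer is 1 and the
# one-sample variance of f is 2, so the standard error should be sqrt(2/N).
rng = np.random.default_rng(0)

for N in [100, 10_000, 1_000_000]:
    # Repeat the whole estimator many times to see how much it fluctuates.
    estimates = [np.mean(rng.standard_normal(N) ** 2) for _ in range(100)]
    print(N, np.std(estimates))   # drops by about 10x for every 100x more samples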
Now, the idea of Monte Carlo estimation relies a lot on your ability to generate data from a particular distribution; that is to say, the expectation we're interested in really requires being able to sample from pi. In the last video, about random number generation, we talked a little bit about how to sample from general probability distributions. It can happen that we want to generate Monte Carlo samples from pretty complicated distributions, and a lot of the work that goes into random number generation often goes toward computing Monte Carlo averages for complicated probability distributions. I mentioned ideas like rejection sampling and Markov chain Monte Carlo, but in the specific case where you're going to use those random samples to compute an expectation with Monte Carlo, it turns out that you can use a really interesting trick called importance sampling.

Remember, the idea here is that we have some expectation, under a distribution pi, of some function f of x: pi is a distribution on x, and we're looking at the expectation of this function. If everything is continuous, we write it as an integral over some space, with pi of x and f of x inside. Then, if we draw some samples, say N of them, from the distribution pi, we can approximate the expectation with an average. That's straightforward, but it does require that we be able to generate samples from pi.

The idea of importance sampling is to rewrite this integral in a different way. In particular, we choose another distribution to introduce into the equation, which we often call the proposal distribution. It has a lot in common with the proposal distribution, that upper-bounding distribution, that we used in rejection sampling, but here we don't need it to be an upper bound; we just need to know its probability density function, and it needs to be something easy to sample from, like a Gaussian, say. Typically we denote it with a q, so here q is the probability density function associated with the proposal. The idea is that we can rewrite the expectation by dividing and multiplying by q inside the integral. I haven't done anything fancy; all I've done is divide and multiply by q, and as long as q isn't zero anywhere the original integrand isn't zero, this is something we can get away with.

What's interesting is that we've now rewritten this as a different expectation: an expectation under q, rather than under pi, of a different function. That is, instead of the expectation under pi of f of x, we have the expectation under q of the quantity pi of x divided by q of x, times f of x; the same f, but with an extra ratio out in front. The idea is that if q is easier to sample from, then we can sample a bunch of values from q; we just have to fix them up by putting in the ratio of pi to q, to make sure they carry the right weights. We usually call these the importance weights associated with importance sampling.

To make this concrete, we now have some x_n's drawn from q, and the expectation is approximated by the sample average of this more complicated quantity. The object we're averaging is a little bit weirder, but q is easy to sample from and its density is easy to compute, so this is something we can actually do, and this sample average has the same expectation as the original one. So instead of sampling from pi, we can sample from q, and life will be easy. One way to think of this is as a kind of rejection sampling, but rather than rejecting samples outright and throwing them away, we down-weight them according to how good a sample they were from pi.

You do have to be a little bit careful. In general, we'd like q to be a lot like pi, so that we don't get any really large or really small weights. You want to make sure there aren't places where q has very small values while pi has large values, or the variance, like the one we looked at just a minute ago, can explode; it can even become infinite if you do a really bad job of picking q. But if you're thoughtful about q and pay attention to these kinds of issues, this can be a really nice shortcut to doing Monte Carlo estimation.
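A minimal importance-sampling sketch along these lines; the specific target, proposal, and function below are illustrative assumptions, not from the lecture (target pi is Normal(2, 1), proposal q is a wider Normal(0, 3), and f(x) = x, so the true expectation is 2):

import numpy as np

def gaussian_pdf(x, mean, std):
    # Density of a Gaussian with the given mean and standard deviation.
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(0)
N = 200_000

# Sample from the easy proposal q = Normal(0, 3) instead of the target pi.
x = rng.normal(loc=0.0, scale=3.0, size=N)

# Importance weights: ratio of the target density pi to the proposal density q.
w = gaussian_pdf(x, 2.0, 1.0) / gaussian_pdf(x, 0.0, 3.0)

# The weighted sample average approximates the expectation of f(x) = x under pi.
print(np.mean(w * x))   # close to 2, up to Monte Carlo noise

Note that the proposal here is wider than the target, so the ratio of pi to q never blows up; a narrow proposal that misses the mass of pi is exactly the bad case warned about above.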
Info
Channel: Intelligent Systems Lab
Views: 340
Rating: 4.6 out of 5
Id: JeMvBCxJrDg
Length: 14min 58sec (898 seconds)
Published: Mon Apr 05 2021