(ML 18.1) Markov chain Monte Carlo (MCMC) introduction-12eZWG0Z5gY.mp4

Captions
Markov chain Monte Carlo (MCMC) is an extraordinarily powerful tool for sampling from, and computing expectations with respect to, very complicated high-dimensional probability distributions. In fact, MCMC was named one of the top 10 most influential algorithms of the 20th century, at least by the editors of the journal Computing in Science and Engineering in the year 2000. In this video I'd like to tell you a little about the situations in which MCMC is applicable and the intuition for why it works, and then briefly describe a very intuitive example that actually motivated one of the original formulations of MCMC, the Metropolis algorithm.

The goal of MCMC, as I was just mentioning, is to sample from some distribution p, or to approximate an expected value E[f(X)] for some function f, where X is distributed according to some complicated distribution p. The picture is that we have some super complicated probability distribution p; I'm drawing it as one-dimensional, but notionally it lives in some very, very high-dimensional space, and we have some function f on that space as well. The problem is that p is so complicated that computing the expectation analytically is out of the question. We already talked about standard, basic Monte Carlo, which approximates this expected value by the sample mean (1/n) * sum_{i=1}^{n} f(x_i), where the x_i are drawn i.i.d. from p. The first problem that can arise is that p may be way too complicated to actually sample from, and we talked about importance sampling as one way to get around that.
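The plain Monte Carlo estimator just described can be sketched in a few lines. This is my own illustration, not code from the lecture; as a hypothetical example I take X ~ N(0, 1) and f(x) = x^2, whose true expectation is 1.

```python
import random

def monte_carlo(f, sampler, n):
    """Plain Monte Carlo: average f over n i.i.d. draws from `sampler`."""
    return sum(f(sampler()) for _ in range(n)) / n

random.seed(0)
# Hypothetical example: X ~ N(0, 1) and f(x) = x^2, so E[f(X)] = Var(X) = 1.
estimate = monte_carlo(lambda x: x * x, lambda: random.gauss(0.0, 1.0), 100_000)
```

With 100,000 draws the sample mean lands close to the true value of 1, and the error shrinks at the usual 1/sqrt(n) Monte Carlo rate.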
With importance sampling, you have a proposal distribution q and then you correct for the fact that you're sampling from the wrong distribution; rejection sampling would be another option. But a problem arises with both of those approaches in very high dimensions: even a proposal distribution that seems really good, one that isn't exactly p but appears very close to it, can fail. Our intuition for what makes a good proposal distribution only works in low dimensions, two or three, maybe four if you're lucky. In high dimensions our intuition for volume just breaks down, and a proposal distribution that seemed good can utterly fail because it ends up putting a lot of probability mass in regions where the actual probability under p is very, very low. This happens often: it's simply difficult or impossible to construct a good proposal distribution that you can actually sample from.

So here is the intuition for Markov chain Monte Carlo, at least my intuition for it. Picture our notional, very high-dimensional space again. The distributions p we tend to be interested in have some sort of structure: maybe not a very regular structure, but they tend to be connected, sitting on a sort of manifold in this high-dimensional space (I'm using the word "manifold" very loosely here, not in a rigorous sense). So I'm drawing the region of high probability under p as a sort of wispy, connected subset of this very high-dimensional space.
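The proposal-and-correction idea can be sketched as follows. This is my own minimal illustration, not from the lecture: the target p is a standard normal, the proposal q is a wider normal, and each draw from q is reweighted by p(x)/q(x).

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def importance_sampling(f, p_pdf, q_pdf, q_sampler, n):
    """Estimate E_p[f(X)] from draws of q, reweighting each by p(x)/q(x)."""
    total = 0.0
    for _ in range(n):
        x = q_sampler()
        total += f(x) * p_pdf(x) / q_pdf(x)
    return total / n

random.seed(0)
# Hypothetical example: target p = N(0, 1), proposal q = N(0, 2).
# q is wider than p, so the weights p(x)/q(x) stay bounded.
estimate = importance_sampling(
    f=lambda x: x * x,
    p_pdf=lambda x: normal_pdf(x, 0.0, 1.0),
    q_pdf=lambda x: normal_pdf(x, 0.0, 2.0),
    q_sampler=lambda: random.gauss(0.0, 2.0),
    n=100_000,
)
```

This works well in one dimension; the point of the discussion above is that in high dimensions a q that looks this reasonable can still concentrate its mass in the wrong place, making the weights wildly uneven.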
This picture is completely notional, but the idea of Markov chain Monte Carlo is that you want to be sampling from this very small subset of the high-dimensional region. Enumerating every possible value is totally out of the question, and constructing a good proposal distribution is too hard. So instead you just start somewhere, and then you start moving around: you try to move toward a region of high probability, and once you get there, you try to stay in the regions of high probability. You're doing a sort of random walk, trying to stay near this high-probability region, exploring this high-probability subset of the very high-dimensional space. You do this by forming a Markov chain; we'll talk more about how that's done in a future video. So that's the idea, my intuition at least for what MCMC is doing and why it works. It works because by staying in this region you can move around in it much more efficiently: you're not wasting effort on all the points outside the region, and of course the regions of high probability are the ones that really count when you're computing the expectation; it doesn't matter what's going on elsewhere. Okay, so that's the intuition. Now let me give you an example to make this a little more concrete, and here are some pictures of some of the people who were involved in developing MCMC.
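The walk-and-stay behavior just described can be sketched with a tiny random-walk sampler. This is my own illustration using the Metropolis acceptance rule, which the lecture only previews here: propose a small local move and accept it with probability min(1, p(x')/p(x)), so the chain drifts toward, and then lingers in, the high-probability region.

```python
import math
import random

def metropolis_chain(log_p, x0, step, n_steps):
    """Random-walk Metropolis: propose x' = x + Uniform(-step, step) and
    accept with probability min(1, p(x') / p(x))."""
    x, samples = x0, []
    for _ in range(n_steps):
        proposal = x + random.uniform(-step, step)
        # Compare in log space to avoid numerical underflow.
        if math.log(random.random()) < log_p(proposal) - log_p(x):
            x = proposal
        samples.append(x)
    return samples

random.seed(0)
# Hypothetical target: a standard normal, via its log-density up to a constant.
samples = metropolis_chain(lambda x: -0.5 * x * x, x0=10.0, step=1.0, n_steps=50_000)
# Discard burn-in: the early steps where the walk is still finding the
# high-probability region around 0.
burned = samples[5_000:]
mean = sum(burned) / len(burned)
```

Even though the chain starts far out at x = 10, where p is tiny, it walks into the bulk of the distribution and stays there, and the post-burn-in sample mean is close to the true mean of 0.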
John von Neumann built some of the first computers and developed Monte Carlo methods; Ulam and Metropolis wrote a paper in 1949 that first laid out the idea of using Markov chains for Monte Carlo approximation; and then Metropolis, Rosenbluth, and Teller, along with the wives of Rosenbluth and Teller, wrote a paper in 1953 in which they applied Markov chain Monte Carlo to the following problem.

So here's the problem. You may remember phase diagrams from chemistry. A phase diagram looks something like this: there is a solid region, a liquid region, and a gas region, with temperature on one axis and pressure on the other. A material like water has a particular phase diagram, and understanding these phase diagrams for different materials is of extreme interest to physicists, because this kind of behavior is prototypical: water, CO2, helium, just a huge range of different substances have phase diagrams that look very similar. As a theoretical model to try to understand this behavior, the "hard disks in a box" model, as it's called, was proposed. You have a box, and inside it you have disks, circles that are all the same size and are meant to represent individual molecules, bouncing around in the box, all moving around. They're called hard disks because they cannot overlap, so you
can't have any two disks intersecting. This makes it a high-dimensional space: if there are N different particles, each one contributes its x and y position in the box to the dimension, so the configuration space is something like the box raised to the power 2N, a 2N-dimensional space. So it's a high-dimensional space when you have lots of particles, when N is big, and it's a complicated space because of the constraint that the disks can't overlap. If they could overlap, everything would be kind of easy, but that hard constraint makes this a very, very complicated space. Physicists are very interested in understanding this theoretical model, and these authors, being physicists, were interested in understanding it, so they used Markov chain Monte Carlo to simulate it. In fact, they were interested in computing expectations of the form E[f(X)], where X is the state of all the particles: each particle has a position, and X is the vector containing the positions of all of them. The idea is very simple. Just as in the intuition above, you start out somewhere and then you start moving around, exploring the space. The space you're exploring here is the space of all valid configurations, where a configuration is valid when no two disks overlap. You have to define the moves you make in this space, and the moves they used were the following. At each step in time you
move each of these particles, or rather you consider a move for each one. For a given particle, you take its current position and consider a small box around it, whose width and height are set by some predetermined parameter. You uniformly draw a point in this box, and then, according to a randomized rule, the Metropolis rule that we'll talk about, you either accept or reject that proposal. If you accept it, you move the particle to that point; then you consider the next particle, which might move or might stay where it is, and so on. Updating each of the particles once gives you one iteration of the algorithm. You run that a whole bunch of times, and each iteration gives you a configuration x_i. Your approximation is then very much like the standard Monte Carlo approximation: you form the sample mean (1/n) * sum_{i=1}^{n} f(x_i), where the x_i are the configurations you got as you ran the simulation. (I should be careful not to use n for two different things: here n is the number of iterations, and the number of particles is capital N.) It's a very intuitive, very understandable idea, and it turns out that even though these x_i are not i.i.d. draws from the true distribution p, this MCMC estimator is actually a good approximation of the true expectation, provided you set things up correctly so that it converges.
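The per-particle move just described can be sketched as follows. This is my own toy illustration with made-up parameters (box size, disk radius, proposal width). For hard disks, every non-overlapping configuration is equally probable, so the Metropolis acceptance rule reduces to rejecting exactly those moves that would create an overlap or push a disk out of the box.

```python
import random

BOX = 10.0     # side length of the square box (made-up value)
RADIUS = 0.5   # disk radius (made-up value)
DELTA = 0.3    # half-width of the uniform proposal box around a particle

def valid(pos, i, positions):
    """True if disk i placed at `pos` stays inside the box and
    overlaps no other disk."""
    x, y = pos
    if not (RADIUS <= x <= BOX - RADIUS and RADIUS <= y <= BOX - RADIUS):
        return False
    return all((x - xj) ** 2 + (y - yj) ** 2 >= (2 * RADIUS) ** 2
               for j, (xj, yj) in enumerate(positions) if j != i)

def sweep(positions):
    """One MCMC iteration: propose a local move for every particle in turn.
    Since every valid configuration is equally probable, the Metropolis
    rule here is simply: accept iff the move keeps the configuration valid."""
    for i, (x, y) in enumerate(positions):
        proposal = (x + random.uniform(-DELTA, DELTA),
                    y + random.uniform(-DELTA, DELTA))
        if valid(proposal, i, positions):
            positions[i] = proposal
    return positions

random.seed(0)
# Start from a valid 4x4 grid of 16 disks, then run 100 iterations.
positions = [(1.5 + 2.0 * a, 1.5 + 2.0 * b) for a in range(4) for b in range(4)]
for _ in range(100):
    positions = sweep(positions)
```

Each configuration visited by the chain is a sample x_i; averaging any f over those configurations gives the MCMC estimate of E[f(X)].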
The reason it converges to the true mean is what's called the ergodic theorem for Markov chains, and we'll take a look at the ergodic theorem shortly, in another video. The reason MCMC is even possible on this problem is that we can make these local moves, we can move around easily in this space, and that's what allows us to do this. One thing I left out of this description is what probability distribution is defined over X: in order to have an expectation, X has to have some probability distribution, and the distribution they used is what's called the Boltzmann distribution. I won't go into the details of what that is, but I wanted to make it clear that there is a probability distribution they were using, the Boltzmann distribution for the system as they set it up. Okay, so I think that's a nice, intuitive example to start getting your head wrapped around how MCMC works, what it's doing, and what you do with it. Next we'll take a more detailed look at why MCMC works.
Info
Channel: Hao Du
Views: 12,664
Rating: 4.3587785 out of 5
Id: 3ZmW_7NXVvk
Length: 17min 4sec (1024 seconds)
Published: Fri Mar 31 2017