Inverse Transform Sampling : Data Science Concepts

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hey everyone in this video I'm really excited to talk to you about a data science concept called inverse transform sampling so this name sounds a little bit scary and in fact it won't be clear why this is the name of this method until the end of this video when we figure out how to actually do the thing that we're trying to do but in general this is a data science method for taking a certain distribution such as the uniform distribution and doing some kind of transformation to it in order to turn it into a different distribution now before we get into all the math of it and what it actually means let's think about why we might want to do this so we know that if we have any kind of computer program we can pretty much just generate any common distribution we want we can generate the exponential distribution or the normal distribution or the uniform distribution but behind the scenes the computer is doing something to generate numbers from that distribution and this video will basically go into what the computer is doing behind the scenes it's helpful for us to understand that even though we can just access it for free without thinking about it it's gonna be helpful to kind of think about it to get a better grasp of how transformations get made in data science and how we can take a certain set of numbers and map it onto a different distribution of a set of numbers okay so here is the Senate let's say we have uniform numbers in the interval 0 through 1 remember our uniform distribution means any number in 0 through 1 so any real number in that range has an equal probability of being chosen if the numbers are distributed in such a way and let's just say we have this distribution for free we'll make videos in the future about how to actually get at that distribution but let's say for now that we have these numbers for free we have the uniform distribution 0 to 1 so here is the probability density function of the numbers from 0 to 1 we see that there's not a lot of exciting stuff going on every number in this range 0 to 1 has an equal chance of being called the density is the same across the board now here comes the exciting part we want to take these uniform numbers and our goal is to generate an exponential lambda distribution okay a quick refresher an exponential lambda distribution has the following probability density function or PDF where the function is given by e to the negative lambda X if X is bigger than equal to zero and zero if X is smaller than zero what that basically means that the exponential distribution only cares about numbers that are zero or positive and the density the probability density of any given number in that range is given by e to the negative lambda X where lambda is a parameter you put into the exponential distribution okay and the corresponding CDF or cumulative density function quickly refresher the cumulative density function says that for any X you put in it tells you the probability of you picking a number that is X or less so in this case if we choose our X to be right here the CDF will tell you the area that's between zero and X that's under this curve also known as what's the probability you choose a number that's X or less from the distribution so the CDF recall for the exponential lambda distribution is given by 1 minus e to the negative lambda X for all X bigger than equal to zero so it looks very similar to the PDF it just has this one minus in front of it okay so our goal is to somehow we have these uniform zero one numbers we want to somehow apply some kind of transformation to each of these numbers so that the resulting set of numbers the resulting set of numbers we get after applying the transformation to each one follows this new exponential lambda distribution instead how are we gonna do that and in this case we did it for the exponential lambda but there's other cases we might want to do this we might want to generate a normal distribution the typical bell curve you see we might want to generate a gamma distribution it just that we don't know exactly what kind of distribution we want to generate so we want to be prepared in all cases so let's think about this let me first get rid of the PDF and just write the CDF because it's gonna focus a lot more on the CDA ok so here I've just retained the CDF of the exponential distribution again it's this form right here now what do we actually want remember we want to figure out a transformation T so big T here is a function we want to apply that transformation to our uniform numbers so big you here is our uniform distribution that we have right here we want to apply that transformation and we want that to be equal to X where X is numbers that are in this potential distribution so at a high level we want to figure out what's the transformation so this is the T in question that's what we want to solve for what is the transformation we can do to you in order to get the exponential numbers back well the only thing we really have right now is this CDF so let's work from there we know that F X of X is a CDF which is the probability that the exponential distribution is less than or equal to small X remember this is straight from the definition of what a CDF is cumulative distribution function basically just tells us what's the probability that this distribution lies behind any given value in this case it's small X okay so that's the definition of the CDF now we just said that we want to find the transformation so we have tu is equal to X so we can write this as probability T U is less than or equal to X right because T U is the same thing as big X now let's go ahead and take the inverse of T on both sides so we get probability that you remember we take the inverse and apply it to the function it just cancels out so we just get the argument itself which is big U is less than equal to the inverse of T applied to this X right so we applied the inverse of T to both sides here so what's the probability that the uniform distribution is less than or equal to some value T inverse of X so here's the T inverse of X value what's the probability uniform distribution is behind that range okay so it's in this range here well it's really easy for the uniform distribution because we know that the probability of being behind some range is just that value itself to make that more concrete let's say I asked you in uniform distribution what's the probability of being behind point seven five so if T is equal to zero point seven five the probability of being behind point seven five is just point seven five because that's how much area there is behind here and that all comes from the uniform really friendly nature of the uniform distribution that means that the probability of big u so uniform distribution being less than or equal to T inverse of X is simply just T inverse of X itself it seems like we didn't do that much but in fact we've derived a very very important result namely we found that F X of X the cumulative distribution function for the exponential distribution is equal to the inverse of T of X that means that F X and T are inverse functions because F is equal to the inverse of T therefore we know that T is equal to the inverse of X so we can write here let me write it in red to set it apart from everything else we've written we know now well let me write down the first result we have that F X of X is equal to T inverse of x that's what we just derived we can take the inverse on the other side so we get that F X inverse of X is equal to T of X and this is the big result that we've captured here we actually now have a closed form solution for this transformation and this transformation basically says that the closed form transformation is that it's the inverse of this cumulative density function thus cumulative density function being 1 minus e to the negative lambda X if we figure out the inverse of that function we're gonna have exactly the transformation we need in order to apply to all the uniform numbers in order to get the exponential lambda distribution back so the last part of this video is that we're gonna just go ahead and figure out what is that inverse for the exponential distribution but this works in general right because when we derived this we didn't assume anything about the distribution being exponential we just said there's some distribution numbers X that we're trying to get at how do we do that using the uniform distribution and we found that in order to do that you take the CDF of the target distribution you're trying to get to find the inverse of the CDF if possible and that's the function that you're going to run your uniform numbers through in order to get back this distribution okay so let's go ahead and figure out this function this transformation for the exponential lambda distribution ok so again here's a CDF of the exponential distribution we want to figure out what's the inverse of this function here to find the inverse it can be difficult but in this case it's pretty easy let's go ahead and set y equal to this function so y equals 1 minus e to the negative lambda X now to find the inverse we're just going to solve for X so we get 1 minus y is equal to e to the negative lambda X then we're going to take the natural log of both sides so Ln of one minus y is equal to negative lambda X now let's just divide by a negative lambda on both sides so we get X is equal to negative natural log of 1 minus y over lambda and that's it so we found that in order to get from the uniform distribution to the exponent of the solution this is the transformation we'll apply one thing we'll do is replace this Y by au so it's more clear that we're putting uniform numbers into here so we get X is equal to negative natural log of 1 minus u over lambda in fact there's one more slight kind of optimization we can make here so we get 1 minus u right but the uniform distribution since it goes from 0 to 1 using 1 minus U is the same thing as using u because if we do 1 minus u we're gonna get basically the symmetric opposite of U so it doesn't matter which one we use because of the uniform nature of the uniform distribution in fact you can keep it 1 minus U if you want but just to make this formula look a little bit prettier we can just go ahead and substitute 1 minus U for you and that's it so to recap if we have uniform numbers from 0 to 1 and we take the negative natural log of these numbers we pick let's say a million of them from the uniform distribution and for each of the million numbers we run it through this transformation here the resulting numbers we get back the resulting transform numbers are going to follow a exponential lambda distribution and that's how we get there you can do this for similar distributions as long as you can invert the cumulative density function you can do this and that's why this is called inverse transform sampling because we are using the inverse of the cumulative distribution function and we are using that to transform the uniform distribution into the distribution we want ok hopefully that made sense go ahead and leave some questions in the comments if you want any clarification and until next time
Info
Channel: ritvikmath
Views: 22,133
Rating: undefined out of 5
Keywords:
Id: 9ixzzPQWuAY
Channel Id: undefined
Length: 10min 53sec (653 seconds)
Published: Sun Oct 06 2019
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.