Markov Chain Stationary Distribution: Data Science Concepts

Captions
Hey everyone, welcome back. Today we're doing a follow-up to the original video on Markov chains. In that one, which is linked below, we touched a little bit on steady states, or stationary distributions, of a Markov chain. These end up being really important, both for Markov chains in their own right and for what's ahead: we want to talk about things like MCMC (Markov chain Monte Carlo), and stationary distributions will be important there too. Either way, we're going to dive a lot deeper into what it means for a Markov chain to have a steady state, also called a stationary distribution, what that means intuitively, and then how to find one mathematically.

We'll start with a real example: a Markov chain with five states, A, B, C, D, and E, where all the arrows you're seeing are transition probabilities from each state to the others. Before we look at the math, if your Markov chain is small enough, it helps to get a feel for how the chain behaves over time from different states. Let's just start somewhere, say at A. If I'm at A at the current time step, the only arrow in this diagram involving A goes from A to C with probability one. So if I'm at A, there's a 100% chance I'll be at C next, never to return to A again. That's already an interesting idea.

Another place with a similar flavor, but slightly different, is state E. E is more interesting: if we're at E at some time step, there's a 90% chance we come back to E at the next one, so if we're there already, we're very likely to remain at E for a long time. But there's a non-zero probability, a 10% chance, that we jump to B instead, and the crucial thing to understand is that there's then no way to get back to E. So even though A is certain to be left in one step while leaving E may take a while longer, both A and E have this property: if we're there, we will eventually exit, never to return. Keep those ideas in mind; they'll come up again in the calculation of the steady state.

Looking at B, C, and D, there are more interesting dynamics. Say we're at B: there's a 100% chance we go to D, from there a 100% chance we go to C, and from there a 50-50 shot of going to either B or D. So the chain can move freely among these three states, but it can't do that with states A and E.

I think that's enough descriptive analysis, so let's look at the transition matrix. We covered it in the original Markov chain video, but as a refresher: it collects all the probabilities you're seeing on these arrows into a more structured format. It's helpful to think of it as a bookkeeping tool that lets us see all the probabilities at once. For example, what does this one-half mean? It means that if I'm at C, there's a one-half probability that the next state I'll be at is B, corresponding exactly to this arrow here, and all the other transition probabilities are written out in just the same way.
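To make that bookkeeping concrete, here is a minimal NumPy sketch of the transition matrix described above (the state ordering A through E and the rows-are-current-state convention are my assumptions for illustration; the probabilities are the ones read off the diagram):

```python
import numpy as np

# Transition matrix: rows = current state, columns = next state,
# in the order A, B, C, D, E.
#   A -> C with prob 1;          B -> D with prob 1;
#   C -> B or D with prob 0.5;   D -> C with prob 1;
#   E -> E with prob 0.9,        E -> B with prob 0.1.
P = np.array([
    [0.0, 0.0, 1.0, 0.0, 0.0],  # A
    [0.0, 0.0, 0.0, 1.0, 0.0],  # B
    [0.0, 0.5, 0.0, 0.5, 0.0],  # C
    [0.0, 0.0, 1.0, 0.0, 0.0],  # D
    [0.0, 0.1, 0.0, 0.0, 0.9],  # E
])

# Each row is a probability distribution over next states, so it sums to 1.
assert np.allclose(P.sum(axis=1), 1.0)
```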
Now let's get into the question of the steady state. Written very explicitly, the question is: is there a vector of starting probabilities, traditionally called pi? It's important to realize that pi in this context is not 3.14...; it's not a number, it's a vector of probabilities, and specifically the number of probabilities is the number of states. To make that clear, I've given each one a state subscript: pi is the vector (pi_A, pi_B, pi_C, pi_D, pi_E).

Before continuing with that sentence, let's make sure we understand what that means, because when I first learned Markov chains it took me a while to wrap my head around the idea that a Markov chain could be "at a distribution" - you always think about the chain being at one certain state or another. Note first that this vector has to add up to one, because its entries are the probabilities of being in each state. Now pretend pi_A and pi_B are both one half, and pi_C, pi_D, and pi_E are all zero. Intuitively, if I tell you the Markov chain is at the distribution (1/2, 1/2, 0, 0, 0), it means there's half a chance the chain is at state A and half a chance it's at state B. Then I can ask you: using that information, can you use the transition matrix to tell me the probability of being at each of the five states at the next step, and the step after that, and so on? This took me a while to understand, which is maybe why I'm taking so much time to explain it, but I really think it's an important concept: although we think of a Markov chain as moving from state to state to state, we can also talk about the chain being at a distribution, meaning there's this much probability it's at A, this much it's at B, and so on through C, D, and E.

What the steady-state question asks is: can you come up with a vector of probabilities, a pi vector like that, such that moving the chain forward by one iteration - we say the chain is currently at that distribution and we move it forward one step using the transition matrix - keeps the probabilities at pi? Another way of saying it: if this was the distribution the chain was at at some time step, and I then allow the chain to proceed by one time step, I want the probability of being at each state to stay exactly the same at the next step of the Markov chain, and therefore at the step after that, and forever, basically.

Let's think about why this is useful, not mathematically but intuitively. It says there is some probability distribution such that, if I start the chain at it, the chain will stay at it no matter how far into the future it proceeds. In data science and statistics we care a lot about that kind of predictability, about things staying at certain distributions. So we're basically saying that if I start the Markov chain off at this distribution, if it exists, then it's going to stay at that distribution forever, and that's a very powerful concept for us.
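Continuing from the matrix P defined in the snippet above, here is what "being at a distribution" looks like in code: one step of the chain moves the whole probability vector forward via pi @ P. This is just a sketch using the same 50/50-on-A-and-B example:

```python
# Start "at a distribution": half a chance at A, half at B.
pi = np.array([0.5, 0.5, 0.0, 0.0, 0.0])

# Each step of the chain updates the whole distribution at once.
for t in range(5):
    pi = pi @ P
    print(f"step {t + 1}: {np.round(pi, 3)}")

# A stationary distribution is a pi that this update leaves unchanged,
# i.e. one for which np.allclose(pi @ P, pi) is True.
```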
That's especially true, as you'll see in future videos, if we're trying to sample from some kind of known distribution - but that's for a future video. So this is an intuitive definition of a stationary distribution, or steady state, and the steady state itself is this vector pi.

The last question in this video is, of course, how to solve for this thing. I'll show you an easy mathematical way to solve for it, but I don't think that alone helps the understanding, so first we're going to go the long way and solve it from first principles. Going a little more mathematically, what that sentence is saying is: for every single state s in the Markov chain, s being A, B, C, D, or E, the probability that the chain is at state s at time step t+1 has to equal the probability that the chain was at s at time step t. If this holds for every single state, then we've found a steady state of our Markov chain.

So let's look at this Markov chain for a second and think about what certain of these pi elements have to be. Remember the first thing we said about A: even if there's a little bit of probability of being at A at some step, there's a 100% chance you're not at A at the next step, because we know there's a 100% chance that from A you go to C. That means that in any pi vector of this kind, the first element, pi_A, which corresponds to state A, must be zero. The reason is that if I have even a small chance of being at A at the current step, there's a zero percent chance I'll be at A at the next step, so for the probabilities to stay the same there has to be a zero percent chance of A to begin with. That's already going to help us simplify some of these equations: we know pi_A = 0.
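In standard notation (my notation, not the video's board), the condition just described is one balance equation per state, and the argument for A is that the sum on the right-hand side has no terms, since no arrow leads into A:

```latex
\pi_s = \sum_{s'} \pi_{s'}\, P(s' \to s) \quad \text{for every state } s,
\qquad \text{hence} \qquad
\pi_A = \sum_{s'} \pi_{s'}\, P(s' \to A) = 0 .
```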
Now let's solve for pi_E, because it's the easiest one to solve for next. Say we have some vector of probabilities pi. What is the probability of being at state E at the next time step? Looking at this Markov chain, the only way to be at E at the next time step is to have been at E at this time step and to take the 90% chance of going back to E. We can write that sentence in math: the probability of being at E at the previous time step, times the 90% chance of staying at E if I'm there, equals the probability of being at E at the next time step, which by the definition of the steady state also has to equal pi_E. So we're solving pi_E = 0.9 * pi_E, and the only pi_E that solves this is pi_E = 0.

So we've found that both pi_A and pi_E are zero, and even before doing the math this makes intuitive sense: as we said at the beginning, these are states that eventually lead away to something else, never to return. It wouldn't make sense to say the probability of being at such a state never changes over time - we know it eventually has to go to zero - so this really reinforces that logic.

Now let's solve the other equations, which don't take too much math given these two simplifications. Let's ask about pi_B: what's the probability of being at B at the subsequent time step? Looking at B, there are only two arrows leading into it, which is what we need to answer that question. One comes from C with 50% probability - the probability of being at C at the current time step times 50% is one avenue to get to B at the next time step. The other avenue comes from E: the probability of being at E at the current time step times 0.1. So pi_B = 0.5 * pi_C + 0.1 * pi_E. But we know pi_E = 0, so the second term vanishes, and we get pi_C = 2 * pi_B. Let's hold on to that.

Next let's figure out the probability of being at C at the subsequent time step, same analysis. What arrows lead into it? The only arrows leading into C come from A and from D, both with 100% probability. So pi_D times 100% plus pi_A times 100% - those are the two avenues to get to C - has to equal the probability of being at C at the next time step, which by the definition of the steady state is pi_C. Since pi_A = 0, we get pi_D = pi_C.

Although the math we did isn't super high-level, I do think it helps to pause, look at the Markov chain, look back at the equations, and make sure everything connects for you - go ahead and take the time to do that. Now we actually have all the information we need to solve for these variables, because we have a hidden third equation: all of the pi's need to add up to one. If you take pi_C = 2 * pi_B, pi_D = pi_C, and pi_B + pi_C + pi_D = 1, you'll find that the only pi vector satisfying all of these conditions is (0, 1/5, 2/5, 2/5, 0). So now we've solved it mathematically, but let's make sure to tie this back to reality and to intuition.
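Using the P defined earlier, we can sanity-check this hand-derived answer numerically (a quick sketch):

```python
# The solution derived above: (pi_A, pi_B, pi_C, pi_D, pi_E).
pi_star = np.array([0.0, 1/5, 2/5, 2/5, 0.0])

# It is a probability distribution, and one step of the chain leaves it unchanged.
assert np.isclose(pi_star.sum(), 1.0)
assert np.allclose(pi_star @ P, pi_star)
```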
What does this mean? It means that if I told you there's a 0% chance of being at A at the current step, a 20% chance of being at B, a 40% chance of being at either C or D, and a 0% chance of being at E - if I told you that's the current state of the Markov chain and then asked you for the probabilities of being at any state at the next step, they'd be the same. And at the step after that, the same again; they will never change if I start the probabilities off here. That's a very important thing to note: this is all conditional on the Markov chain either starting at this distribution or eventually getting there. The steady state, or stationary distribution, says that if the Markov chain arrives at this distribution, we can guarantee it will stay at this distribution. It says nothing about whether the Markov chain will arrive at that distribution. This is something I got mixed up on too when I first learned Markov chains, because I thought of it as: if I let this Markov chain run forever, it's going to be at that distribution. That is not exactly correct, as we'll see in the last example here. What it's saying is that if it were at that distribution, it will stay at that distribution.

The last thing I'll show you in this video: of course, it took a lot of work to get all these pi's, and you can imagine that if this matrix and the number of states were a lot bigger, this would not be sustainable - not something you can really do by hand. So there has to be an easier way, right? The easier way is to solve the matrix-vector equation pi * P = pi, where P is the transition matrix. A quick note on why this works and why it's intuitive. If I write out pi * P, it's the pi vector multiplied against the columns of the transition matrix P for A, B, C, D, and E. What story is this telling? For example, the first element of this vector-matrix multiplication is the dot product of pi with the A column of P. That means: the probability of being at A at the current time step, times the probability of staying at A if I were at A, plus the probability of being at B at the current time step, times the probability of going to A next if I were at B, and so on. So this first term considers all the avenues to get to A - if I'm going to get to A, I could potentially come from A, B, C, D, or E - and the result is exactly the probability that I'll be at A at the subsequent time step. By the definition of the stationary distribution, that has to equal pi_A, and that's why we write pi on the right-hand side.

Now that we know this makes sense, if we stare at it for two seconds more we see it's saying exactly that pi is a left eigenvector of the transition matrix P with eigenvalue 1. We're typically used to thinking of eigenvectors as being on the right-hand side, but we can also have eigenvectors on the left-hand side. So pi is a left eigenvector of this matrix, and the eigenvalue is 1, because that's the coefficient on the right-hand side - invisible, but the coefficient is 1.
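That observation gives the promised shortcut. A left eigenvector of P is a right eigenvector of P transposed, so a generic eigensolver can find pi. A sketch, continuing with the same P (eigenvectors come back in arbitrary scale, so we normalize to sum to 1):

```python
# pi P = pi is equivalent to P^T pi^T = pi^T: use eigenvectors of P transposed.
eigvals, eigvecs = np.linalg.eig(P.T)

# Pick the eigenvector whose eigenvalue is (numerically) 1.
idx = np.argmin(np.abs(eigvals - 1.0))
v = np.real(eigvecs[:, idx])

# Eigenvectors are only defined up to scale; rescale so the entries sum to 1.
pi = v / v.sum()
print(np.round(pi, 3))  # [0.  0.2 0.4 0.4 0. ]
```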
So if you can solve for a left eigenvector of P with eigenvalue 1, you've got yourself a stationary distribution. Now for the very last thing I'll show you in this video - I kind of ran out of board space, but I hope you can see this down here: does a Markov chain have only one stationary distribution, or can it have more? Take a look at this very simple-looking Markov chain, even easier to understand than our first one. If you're at A, there's a 50% chance you go to B and a 50% chance you go to D. If you go to D, you're going to stay at D forever. If you go to B, you're just going to cycle between B and C forever.

For the same reason as before, we know pi_A must be zero: if you're at A, you can never return there after one time step - the same logic as the original A. But this chain is a little more interesting, because if I'm at A, there's a chance I go up and cycle between B and C for all time, and an equal chance I go down and stay at D for all time. This Markov chain does not have a single unique stationary distribution; it actually has many. Let me give you two, just to get the idea across. Say I told you there's a 50% chance of being at B and a 50% chance of being at C - so pi_B and pi_C are both 0.5 and everything else is 0. That is a stationary distribution, because if you're at either B or C, you just keep cycling between them, and so this one-half/one-half distribution remains constant for all time. Another stationary distribution, which is even easier to see: what if there's a 100% probability I'm at D? This is trivially a stationary distribution, because if I'm at D, I just stay at D, so there will always be a 100% chance of being at D if I started there for sure. See the sketch after this section for a quick numerical check of both.

So in this case we actually have these two stationary distributions, and you can come up with many, many more. The main point I'm trying to get across is that sometimes there's a unique stationary distribution for your Markov chain, but other times there can be many, and that reinforces the fact that a stationary distribution does not tell you what happens to the Markov chain as it progresses further and further into the future. It simply answers the question: if I start the Markov chain at this distribution, it will stay at that distribution for all time. A subtle but very important difference.

If you learned about Markov chain stationary distributions, or steady states, in this video, please leave a like and subscribe to my channel for more videos just like this. I hope you learned something, and see you later.
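As a numerical footnote to that last example, here is a sketch that encodes the second chain the same way as before and checks both stationary distributions (the matrix Q and its state ordering A, B, C, D are my encoding of the diagram described above):

```python
import numpy as np

# Second chain:
#   A -> B or D with prob 0.5 each;  B -> C;  C -> B;  D -> D (absorbing).
Q = np.array([
    [0.0, 0.5, 0.0, 0.5],  # A
    [0.0, 0.0, 1.0, 0.0],  # B
    [0.0, 1.0, 0.0, 0.0],  # C
    [0.0, 0.0, 0.0, 1.0],  # D
])

# The two stationary distributions described above...
pi_cycle = np.array([0.0, 0.5, 0.5, 0.0])  # cycling between B and C
pi_stuck = np.array([0.0, 0.0, 0.0, 1.0])  # absorbed at D
assert np.allclose(pi_cycle @ Q, pi_cycle)
assert np.allclose(pi_stuck @ Q, pi_stuck)

# ...and, by linearity, any mixture of the two is stationary as well.
mix = 0.3 * pi_cycle + 0.7 * pi_stuck
assert np.allclose(mix @ Q, mix)
```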
Info
Channel: ritvikmath
Views: 9,350
Keywords: data science, machine learning, markov chain, statistics, education, linear algebra, big data, ai
Id: 4sXiCxZDrTU
Length: 17min 32sec (1052 seconds)
Published: Mon Jan 18 2021