Hidden Markov Model: Data Science Concepts

Captions
Hey everyone, welcome back. In this video we're going to be talking about the hidden Markov model in machine learning. I really like this model because it has a pretty elegant framing, in my opinion, and you might be here just to learn it for a test or to pass a class; no worries, we're going to get everyone to understand this model.

As always, we're going to start with a real-world kind of situation, so we're not just looking at math and numbers throughout this entire video. Let's say that you're taking a course at a university, which you might actually be doing, and you have a professor, and this professor is in one of two moods every single day: she's either happy or she's sad on any given day. Furthermore, there are certain known transitions between these two states, and we know that the mood on any given day depends only on the mood on the previous day. This diagram is given here, but the same information is given in these two tables if that makes it easier for you to digest, so I'll explain it in both. This table is basically saying that if the professor is happy on day t minus one, which just means the previous day, then there's a 0.7 chance that she's going to be happy today, which means that there's a 0.3 chance that she'll be sad today, because she has to be either happy or sad today; the rows of these transition matrices have to add up to one, of course. And we see that if the professor was sad on the previous day, there's a 50/50 shot that she's going to be happy or sad today. The same exact information is captured here, as you can see, with the arrows going between the states.

Now, just a quick note: what is this topmost node here? What is the S? S just stands for "start," which covers the very first day of the course, where you don't have any prior information on whether she was happy or sad because there's no previous day. This basically just says that at the starting point we have a 40% chance of being happy and a 60% chance of being sad.

Now here's where the hidden Markov model starts forming, with the next part, which is called the emission probabilities. The first part was the transition probabilities; this next part is the emission probabilities. Here's the next part of the story: the professor, on every day of the course, is going to wear one of three colors of shirt: red, green, or blue. And it's not random; the color of shirt that she's wearing is directly linked to her mood on that given day. If she's happy, then there's a certain distribution over red, green, and blue, but if she's sad that day, there's a different distribution. That information is captured in the bottom part of this diagram, but also in this table. Looking at the table, because it's a little bit easier to digest: we see that if she's happy on any given day, then there's an 80% chance that she'll show up wearing a red shirt, a 10% chance that she'll wear a green shirt, and a 10% chance that she'll show up wearing a blue shirt. Now, if she's sad, these are very different: it's 0.2, 0.3, and 0.5. So we see that what color shirt she might be wearing that day really depends on the mood, and the same information is captured here, just in a slightly messier way.
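If it helps to see those two tables and the start node as data, here's one way you might write them down in code. This is just a minimal sketch with variable names of my own choosing; the numbers are the ones from the tables described above.

```python
# Start probabilities: P(mood on day 1), from the "start" node.
start_probs = {"happy": 0.4, "sad": 0.6}

# Transition probabilities: P(mood today | mood yesterday).
# Each row sums to 1, since the professor must be either happy or sad.
transition_probs = {
    "happy": {"happy": 0.7, "sad": 0.3},
    "sad":   {"happy": 0.5, "sad": 0.5},
}

# Emission probabilities: P(shirt color today | mood today).
emission_probs = {
    "happy": {"red": 0.8, "green": 0.1, "blue": 0.1},
    "sad":   {"red": 0.2, "green": 0.3, "blue": 0.5},
}
```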
So that's the framing of the hidden Markov model. Let me just go over a couple of key definitions before moving on to the inference part. We have this diagram, and we have two basic sets of probabilities. The first is called the transition probabilities, and those correspond directly to what we call the hidden states. This is the first time I'm going to use the word "hidden," which is the key component in the hidden Markov model. Looking at all this information, we as the students only observe the color of the shirt; we don't know whether the professor is happy or sad. No one asks, unfortunately, so we don't know what the mood of the professor is, which means that these states here, happy or sad, are hidden. That's where the "hidden" in hidden Markov model comes from. By contrast, these states down here, the shirt colors, are directly observable: we can show up to class, see what color shirt the professor is wearing, and observe that, so these are called observed states. The basic idea of the hidden Markov model is that there are some hidden states we don't know about, but those hidden states directly affect the observed states that we do know about.

Okay, so now let's move on to the next part of the story. Let's say that it's three days into the course, and you've noticed the professor showed up wearing a green shirt, then a blue shirt, and then a red shirt on the third day. So there's day one, day two, and day three. Just to put a little bit of notation on all this: the color of the shirt on the first day, c1, is equal to green; the color of the shirt on the second day, c2, is blue; and c3 is red, same thing you see above.

Now, the question we want to ask ourselves is: what is the most likely sequence of moods for this professor on these three days? Of course we don't know, because we didn't observe them, but we do know all these probabilities, both the transition and emission probabilities. So we should be able to formulate some kind of idea: based on this green, blue, red pattern that I saw over the last three days, I should be able to derive the most likely moods of the professor on those days. So let's go ahead and see how we might do that mathematically.

What we really want to do, and I've written this really big down here, is maximize this probability. This looks pretty complicated, and it's going to look even more complicated, but then it's going to get really simple because of the elegance of the model. Starting from the beginning, we want to maximize the probability of six variables all occurring together. The six variables are c1, the color on the first day; c2, the color on the second day; c3, the color on the third day (those are the three observed states); and the three hidden variables that we don't know anything about, which are the mood on the first day, m1; the mood on the second day, m2; and the mood on the third day, m3. We can choose between any combination of moods; m1, m2, and m3 are lowercase, so they're the realizations of these three mood random variables. In this case it's a little easy, because there are only eight possible combinations; the reason there are eight is that there are two moods and we're looking across three days, so two to the third is eight. So you can imagine just taking each of these eight combinations of moods, plugging them into this probability, and seeing which of those combinations gives you the highest probability. And why is that relevant? Well, if you found the combination of three moods that gives you the highest probability, that's basically saying that the probability of seeing this color sequence along with those moods is maximized, which means that's the most likely sequence of moods for this professor to have, given the data that we observed. So that's why we're trying to maximize this.
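Written out in symbols (this is just my transcription of the objective described above, with c_t for the shirt color and m_t for the mood on day t):

```latex
\max_{m_1, m_2, m_3 \,\in\, \{\text{happy},\,\text{sad}\}}
P\left(c_1 = \text{green},\ c_2 = \text{blue},\ c_3 = \text{red},\ m_1,\ m_2,\ m_3\right)
```

Since each m_t takes one of only two values, there are 2^3 = 8 candidate mood sequences to try.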
So now that we're okay with the "why," let's go ahead and look at the "how." From your basic probability course, this joint probability can be broken up into several conditional probabilities all written in sequence; this is the chain rule. You can choose to order the variables in any way, but I'm going to order them in this particular way just so that the next step becomes really, really easy for us. What I've done is take this probability, follow this arrow up here, and reduce it to this form; let me put a bracket around it. This is the form we're looking at: the probability of c3 given all the others, c2 given all the others, c1 given the rest, m3 given the rest, m2 given m1, and the probability of m1. Go ahead and convince yourself that you're allowed to write the joint probability in this conditional form right here.

Okay, it looks like I just made everything really complicated, right? Because I took something that already looked kind of convoluted and just broke it up into several convoluted statements. But here's where the power of the hidden Markov model comes in. The reason we can break this down into a very, very easy form is that we've made several assumptions. One assumption we made is that the color of shirt the professor is wearing on any given day depends only on her mood on that day; it doesn't depend on anything else. And we've made the Markov assumption, and this is where a little bit of background knowledge on Markov chains is useful. The basic idea of the Markov assumption, as it's called, is that the mood on any given day depends only on the mood yesterday; it does not directly depend on the mood two days ago or three days ago, only on the mood yesterday. And that's going to allow us to simplify this form quite a bit.

So let's go ahead and look at the first term. This says: the probability of the color of shirt on the third day, given all this other information. Of course, the color of shirt on the third day depends only on the mood on the third day; it doesn't depend on any of the other stuff in this conditional. So I can reduce this guy to just this simple form, which says: the probability of the color of shirt on the third day given the mood on the third day. Already a simplification. We can simplify the next two terms in a very similar way, because the color of shirt on day n depends only on the mood on day n. So that's how I simplified the first three terms, and to simplify the next three, I'm using the Markov assumption. Let me walk you through that one more time. This probability I'm pointing to says: the probability of the mood on the third day, given the mood on the second day and the mood on the first day. But we know, given the Markov assumption in the hidden Markov model, that the mood on the third day depends only on the mood on the second day, so I can actually just drop the m1 here, reducing it to this form, and I can do the same for the other terms. Notice I added an S here, just for "start": this is saying, what's the probability that the mood is m1, given that we just started?

Okay, so if you need to pause, or rewind, or write some of this stuff down on a piece of paper, then totally do that, because there is a lot here. It's not trivial, it's not easy in any way, but if you walk through the steps, I'm convinced that you'll understand it. So let's go ahead and again tie it back to what we're actually trying to do. Remember, our goal was to maximize this probability, because this would be maximizing the joint probability of the observed and hidden states, and we want to find the combination of hidden states such that this product is maximized.
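Putting those two steps together in symbols, here's a sketch of the derivation just described, in the same notation as before: the first equality is the chain rule, and the second applies the emission assumption to the first three factors and the Markov assumption to the last three.

```latex
\begin{aligned}
P(c_1, c_2, c_3, m_1, m_2, m_3)
  &= P(c_3 \mid c_2, c_1, m_3, m_2, m_1)\,
     P(c_2 \mid c_1, m_3, m_2, m_1)\,
     P(c_1 \mid m_3, m_2, m_1) \\
  &\qquad \times
     P(m_3 \mid m_2, m_1)\,
     P(m_2 \mid m_1)\,
     P(m_1) \\
  &= P(c_3 \mid m_3)\, P(c_2 \mid m_2)\, P(c_1 \mid m_1)\,
     P(m_3 \mid m_2)\, P(m_2 \mid m_1)\, P(m_1 \mid S)
\end{aligned}
```

Every factor on the last line is a number we can read directly off the emission table, the transition table, or the start node.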
Now, each of these components is something we have data for. We have data for the probability of a color given a mood; that comes directly from this table here. We also have data for the probability of a mood given the previous day's mood; that's coming from the transitions here. So as we walk through all eight combinations of moods that we might have, we can go ahead and plug all of that into these factors and see which combination gives us the highest product when we multiply all of these things together. And it turns out (I didn't actually show the calculation here) that the highest probability is achieved exactly when the sequence of moods is sad, sad, happy.

Okay, so again, the story from the beginning: we have a setup where there are some hidden states, the mood of the professor, which we do not observe but whose transitions we know. There are also observed states, the color of the professor's shirt, which we do observe and which we have emission probabilities for. We can use all of this information to take some observed sequence of events (here it's just three, but you can imagine many, many more) and maximize the joint probability of seeing the observed data along with some combination of hidden states, which we have yet to set. The way we set those hidden states is by plugging every combination of hidden states into this simplified product of probabilities; the reason we can make that simplification is exactly the Markov property of the hidden Markov model that we have assumed. And once we find the maximizing hidden states, we say: okay, this combination of hidden states gives me the most likely explanation for seeing this data. That is the crux of the hidden Markov model.

Now, just one quick note: there are efficient ways to compute these probabilities, so if you have a lot of them, there are methods that don't require a ton of time, but that won't be the focus of this video.

The last thing that I'll say is: what are the actual applications of the hidden Markov model? Because of course this was a pretty silly example, mostly just designed to help you understand. One that comes to mind right now is natural language processing. For example, take a simple sentence like "I eat pizza." We have a really simple English sentence like this, and of course it flows in this direction: we read from the left-hand side to the right-hand side. You can imagine that each of these words is an observed variable, just in the way that the color of the professor's shirt was an observed variable. So what would the hidden variables be? The hidden variables here would be the parts of speech. This is a very popular problem in natural language processing, called part-of-speech tagging: can you take a valid English sentence and assign a part of speech, whether it's a noun, a verb, an adverb, or an adjective, to each of its words? A model that people use a lot, at least to begin with, is the hidden Markov model, because you can imagine that the words are just observed states that come from the hidden states down here, which would be the parts of speech. For example, "eat" is definitely a verb, so we can say: okay, given the hidden state "verb," what's the probability that the word is "eat"? That would be the emission. And we also know that there's some probability of a verb coming after a noun, or an adverb coming after a verb, or whatever other combinations; those are the transitions. So you can see that hidden Markov models are very important for this kind of sequence modeling.
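To make the mood example fully concrete, here's a minimal brute-force sketch that reuses the start_probs, transition_probs, and emission_probs dictionaries defined earlier (again, names of my own choosing), enumerates all eight mood sequences, and scores each one with the simplified product. The efficient method alluded to above is the Viterbi algorithm, which replaces this enumeration with dynamic programming.

```python
from itertools import product

# Assumes start_probs, transition_probs, and emission_probs
# from the earlier sketch are in scope.

observed = ["green", "blue", "red"]  # c1, c2, c3

best_seq, best_prob = None, 0.0
for moods in product(["happy", "sad"], repeat=len(observed)):
    # First day: P(m1 | start) * P(c1 | m1)
    prob = start_probs[moods[0]] * emission_probs[moods[0]][observed[0]]
    # Remaining days: multiply in P(m_t | m_{t-1}) * P(c_t | m_t)
    for t in range(1, len(observed)):
        prob *= transition_probs[moods[t - 1]][moods[t]]
        prob *= emission_probs[moods[t]][observed[t]]
    if prob > best_prob:
        best_seq, best_prob = moods, prob

print(best_seq, best_prob)  # ('sad', 'sad', 'happy') at about 0.018
```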
All right, so hopefully that was helpful and you were able to understand it. If you had any confusions along the way, please leave them in the comments below. Like and subscribe for more videos just like this, and I'll see you next time.
Info
Channel: ritvikmath
Views: 22,075
Rating: 4.9666319 out of 5
Keywords: machine learning, data science, math, markov, chain, hidden, model
Id: fX5bYmnHqqE
Length: 13min 52sec (832 seconds)
Published: Mon Jun 15 2020