Monte Carlo Tree Search (MCTS) Tutorial

Captions
Hi everyone. Today we will be talking about Monte Carlo tree search (MCTS), which is a search algorithm for solving certain artificial intelligence problems. The gist of it is that it is a way for an AI to think, or plan ahead, to achieve certain goals. It's not the first decision-making search algorithm, but it is very special, and the reason is this: we all know how easy it is to write programs that do very specific tasks under very specific circumstances, given very specific instructions. It is a lot more difficult to write an AI that can think on its own and figure out for itself how to solve problems, and it's also very difficult to write an AI that can function in an open-ended environment where there are many possibilities and many different choices. That's why we have a lot of dumb machines that can do narrow tasks, but we haven't really made any walking, talking, wisecracking robots like Bender. Monte Carlo tree search can bring us a bit closer to making smart AIs.

To understand why it is more advanced, we should compare it to minimax, which is what Deep Blue used to defeat the world chess champion in 1997. First, some similarities: both minimax and Monte Carlo tree search use game trees, which are directed graphs that represent all possible moves and positions in a game. Think of a game tree as just like any normal search tree, except the nodes represent game states. Say this is tic-tac-toe: if I put my X in this corner, the game state changes to this node; if my opponent then places his circle here, the game state changes to this other node. Different nodes lead to different moves being available as child nodes, just like in real life, where different decisions affect what future possibilities exist. The leaf nodes represent a win, a loss, or a tie.

I don't have much time to review minimax, but long story short, it searches through the entire game tree, or many steps ahead, and uses an evaluation function to determine which future positions are favorable or unfavorable. Then, assuming the opponent always chooses the best possible move for themselves, it acts accordingly to put itself in the best possible position for victory. The key thing to take away is that it looks through every single branch and every node to determine the best choice. If the tree is too large, it would take way too long to do the calculations for the whole tree, so minimax limits its search to all the possible positions up to a certain depth, represented by one of these lines. The bigger and more complex the tree is, the fewer turns it is able to look ahead, and basically the weaker it becomes. Obviously this is a problem if the game is too complex and there are just too many possibilities: say a game is so complex that minimax can only look one step ahead, just the next move; in that case the AI becomes pretty much useless. Another issue with minimax is that if the game is so complex that you cannot write a good evaluation function to tell the AI what is a good or bad position, then the AI is similarly doomed.
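To make that concrete, here is a minimal sketch of depth-limited minimax. This is not the video's own code, and the game interface (legal_moves, apply, is_terminal, evaluate) is hypothetical, standing in for whatever game you plug in:

```python
# A minimal sketch of depth-limited minimax. The game interface
# (legal_moves, apply, is_terminal, evaluate) is hypothetical.
def minimax(state, depth, maximizing):
    # At the depth limit (or a finished game), fall back on the
    # hand-written evaluation function: minimax's weak point.
    if depth == 0 or state.is_terminal():
        return state.evaluate()
    scores = (minimax(state.apply(m), depth - 1, not maximizing)
              for m in state.legal_moves())
    # The maximizing player picks the best score for itself; the
    # minimizing player (the opponent) picks the worst one for us.
    return max(scores) if maximizing else min(scores)
```

Notice that everything hinges on `depth` and `evaluate`: too shallow a depth or too weak an evaluation function and the search is blind, which is exactly the failure mode just described.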
That is exactly why, even after developing Deep Blue, we could not figure out for so long how to make an AI that mastered the game of Go. Unlike chess, Go is so complex and open-ended that an AI simply cannot brute-force its way to victory. It's also very tough to write an evaluation function for Go game states, and honestly that is why a lot of experts thought for a long time that it was pretty much impossible to write an AI that could master Go. That was true for a very long time, until 2016, when Google made AlphaGo, which defeated the world champion using a combination of neural networks (which Elisa will talk about later on), reinforcement learning, and, surprise surprise, Monte Carlo tree search.

The question is: how could MCTS actually plan or think ahead in a situation where there are just too many potential moves and countermoves to possibly evaluate all the nodes, even with a depth-limited search? The way MCTS does it is by building an asymmetric stats tree, which tracks the value of each position and adapts to the topology of the game tree. The algorithm visits more interesting nodes more often, so it focuses its search time on the more relevant parts of the tree. And the way it sets and updates these node values is by running simulations, which start out random and later become less so. Don't worry, I'll go through what all this means step by step.

Say this is a hypothetical game tree for Go. All these nodes represent positions, or game states. The top node represents the board at the start of the game, so the board is empty here, and these three nodes represent the opening moves you can make. Obviously a Go board is 19 by 19, so there are going to be far more than three opening moves, but let's just pretend there are only three. So how does the AI know which move to make? It starts the Monte Carlo tree search by rooting its stats tree at the starting point, which is this node right here, and it will call one of four functions, selection, expansion, simulation, or update, depending on the situation.

Here there are child nodes that it has not explored yet, so it will add a new node to the stats tree representing the position the AI will investigate. This is the expansion phase. From that position it will then run a simulation: it makes random moves for itself and its opponent until it reaches the end of the game, which can be either a win or a loss. Then it calls update, which updates the value of the node it ran the simulation from; here that would be one win out of one simulation. It then does the same thing for the next two nodes. At this moment the numbers say the left node is the better move, but chances are we don't really know that's true; the numbers don't really mean anything yet because the sample sizes are too small. More simulations will make them more accurate.

Now, because all the child nodes have been visited, the AI will run the selection function, which determines which child node to investigate further. It bases the selection on how good a node's stats are, and also on how ignored a node has been: if a particular node has not been explored for a long time, the function starts to place more weight on investigating it, just to make sure nothing has been missed. In this case, though, it is going to choose this node right here, because it is the most promising of the three child nodes. I don't have time to explain the selection function in detail.
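The video doesn't name the formula, but the standard choice for this selection rule is UCB1, as used in the UCT algorithm. Here is a small sketch of it, with hypothetical argument names:

```python
import math

# UCB1 score for a child node, as commonly used in MCTS selection.
# wins/visits rewards promising nodes (exploitation); the square-root
# term grows for nodes the search has been ignoring (exploration).
def ucb1(wins, visits, parent_visits, c=math.sqrt(2)):
    if visits == 0:
        return float("inf")   # always try unvisited children first
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)
```

The constant `c` tunes the trade-off: higher values push the search toward ignored nodes, lower values keep it exploiting the current best.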
This function is what allows the AI to maintain a balance between exploration and exploitation: a balance between sticking to investigating promising nodes and exploring relatively unexplored nodes. Now, because all three nodes have been explored, it bases its selection purely on how good each node is, so in this case it picks the left node right here. It then calls expansion again, because this node has child nodes of its own that have not been explored, and it does the same thing: it runs the simulation from the child node of the child node, then runs update. Note that update here doesn't just update the node the simulation started from; it updates the parent node as well. The key thing to know is that a parent node inherits the stats of its child nodes. It does the same thing for the next few child nodes, and once these nodes have each been explored at least once, it runs selection from the top once again.

If you notice, the left node now has a win rate of 50% instead of 100%, but let's just assume it still chooses to select that node for investigation over the other two, because 50% is still pretty high and the other two have only been ignored three times. Because this node's children have already been explored, it does not run expansion again; it runs selection a second time. From here it selects the best of these child nodes, and then it does the same thing: expansion, simulation, and update. Notice that this time it updates three nodes. From this point on it will select again. We don't actually know whether this node has been ignored enough for the selection function to place weight on it, but let's assume for the sake of argument that it has. This one has been explored five times, so now the search wants to explore some of these relatively unexplored child nodes, and it runs the whole process again for those nodes and updates.

Basically, the longer this goes on, the more accurate these numbers become: the more simulations, the more powerful the Monte Carlo tree search becomes and the more accurately it can determine the optimal decisions. In fact, after enough simulations, the Monte Carlo tree search begins to converge, so that it looks almost exactly like the game tree and values in minimax. It's almost like running minimax, except it doesn't have to go through as many nodes.

If we were to tell the AI to stop right here, what move would it make? It's clear that it will choose to make this move, and not because it has four wins out of seven: it chooses it because it has run seven simulations there. The node with the highest number of simulations is the one that gets selected. So from here the AI will choose to make this move, and then it will build a brand-new stats tree from the resulting state and do the whole thing over again to determine the best moves from there.

One more notable point: in addition to being effective in situations with a high branching factor, MCTS is domain-independent, which means you don't really need to write an evaluation function for it to work; it will determine on its own how to reach a win.
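Putting the four phases together, here is a compact sketch of the whole loop. Again, this is not the video's code: the game interface (legal_moves, apply, is_terminal, winner, player_to_move) is hypothetical, and the win-counting convention shown is one common choice among several.

```python
import math
import random

class Node:
    """One position in the stats tree: win/visit counts plus tree links."""
    def __init__(self, state, parent=None, move=None):
        self.state, self.parent, self.move = state, parent, move
        self.children = []
        self.untried_moves = list(state.legal_moves())
        self.wins = 0     # wins observed in simulations through this node
        self.visits = 0   # simulations run through this node

    def ucb1(self, c=math.sqrt(2)):
        return (self.wins / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def mcts(root_state, iterations=1000):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the current node is fully expanded.
        while not node.untried_moves and node.children:
            node = max(node.children, key=lambda n: n.ucb1())
        # 2. Expansion: add one unexplored child to the stats tree.
        if node.untried_moves:
            move = node.untried_moves.pop()
            node.children.append(Node(node.state.apply(move), node, move))
            node = node.children[-1]
        # 3. Simulation: play random moves until the game ends.
        state = node.state
        while not state.is_terminal():
            state = state.apply(random.choice(list(state.legal_moves())))
        result = state.winner()
        # 4. Update: propagate the result up, so parents inherit child stats.
        while node is not None:
            node.visits += 1
            # Credit the win to the player who moved into this node
            # (i.e. the player to move at the parent).
            if node.parent and result == node.parent.state.player_to_move():
                node.wins += 1
            node = node.parent
    # Final decision: most-simulated child, not highest win rate.
    return max(root.children, key=lambda n: n.visits).move
```

Notice that nothing in `mcts` knows anything about Go, chess, or mazes; only the state object does, which is exactly the domain independence described next.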
The same Monte Carlo tree search function can run for chess, it can run for Chinese chess, it can navigate a maze, it can play poker, anything you want, and you don't really have to change it much, which is another reason why it's so awesome.

So, in summation: Monte Carlo tree search is a powerful algorithm that allows AI to solve problems in very open-ended environments, it allows AI to be more adaptable to a variety of situations, and it works by building and updating a stats tree using simulations. There's an obligatory xkcd comic, but we don't have time, so I'm just going to cut it short. Okay, thank you guys, hope you enjoyed it. [Applause]
Info
Channel: Fullstack Academy
Views: 59,784
Rating: 4.884892 out of 5
Keywords: Monte Carlo Tree Search, MCTS, search algorithm, Monte Carlo Tree Search Tutorial, MCTS tutorial, AI
Id: Fbs4lnGLS8M
Length: 12min 39sec (759 seconds)
Published: Sat May 06 2017