AI Learns to Play MORTAL KOMBAT

Video Statistics and Information

Captions
Previously I built fun AIs like the Instagram face generator, the Billie Eilish clone, and the deepfake dance video generator. The idea was to use YouTube to motivate myself to get better at machine learning. I've been doing this for about half a year now, and even though I'm much better at training models, I'm still very confused by the math, because I usually just copy the hard stuff from GitHub. So I decided it was time to pick a specific area within machine learning and learn it more thoroughly. Also, I really wanted to get back into games, because believe it or not, this used to be a gaming channel. So I started reading about how to train AIs to play games.

The current flavor of the month is deep reinforcement learning. That's what researchers used to beat the Dota and StarCraft pros, and reinforcement learning is also broadly useful for things like self-driving cars and autonomous robots, so that's what I'm learning about. I found an open-source framework called Gym Retro for training reinforcement learning agents to play retro games. It already supports over a thousand games, which is really helpful, because I can focus on training AIs instead of wasting weeks or months trying to hook my code up to whatever game I want to automate.

I wanted to start off with a multiplayer game so that I could fight against the AI myself. I was going to go with Street Fighter II, but then I found a tutorial series by Lucas Thompson on how he trained his Street Fighter II AI, and I didn't want to just copy him. So I watched the whole series, used it as my starting point to train my own model, and the game I picked was Mortal Kombat II. I actually haven't played a Mortal Kombat game in over 10 years, and I've never played Mortal Kombat II before, but most of these retro games came out before I was born, so I wasn't sure which one to pick. To simplify the problem, I decided to only train my AI on one character, because the characters have very different abilities. I picked Sub-Zero, because I like Sub-Zero.

In case you're like me and haven't played Mortal Kombat II before: there are 15 opponents you have to beat in single-player mode, and to beat an opponent you have to win two out of three rounds. If you lose two rounds to the same opponent, it's game over. My agent starts off at the first opponent, keeps playing until game over, and then goes back to the beginning. Rinse and repeat, thousands or millions of times, and hopefully it learns to beat the game.

The big idea behind reinforcement learning is that you have an autonomous agent operating in an environment, and you define a reward function that rewards or punishes the agent when it takes an action. Ideally, the agent tries a bunch of stuff and eventually learns the optimal strategy to get the most reward. But it usually isn't that easy to write the reward function: we need some info from the game, for example how many fights we've won and how much health we have. With modern games, it's pretty hard to access these variables unless the game devs made an API for them. But back in the good old days, on consoles like the Sega Genesis and the NES, there was very little RAM, so programmers had to keep careful track of their memory and store variables in fixed locations in the RAM. That means we can do something called RAM hacking, where you manually search through the RAM while you're playing the game and try to find the locations of the variables you care about. This is a ton of work and I've never done it before, but fortunately the people working on Gym Retro already extracted the locations of the most important variables in Mortal Kombat II, so I didn't have to.
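To make that concrete, here's a minimal sketch of a reward shaped from those RAM-backed variables. The game ID and the 'health'/'enemy_health' names are my guesses at the Gym Retro integration's naming, not the actual code from the video; check the integration's data.json for the real ones.

```python
import gym
import retro

class MK2RewardWrapper(gym.Wrapper):
    """Reward = damage dealt minus damage taken, read from game
    variables that Gym Retro extracts from emulator RAM each step."""

    def __init__(self, env):
        super().__init__(env)
        self.prev = None

    def reset(self, **kwargs):
        self.prev = None
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, _, done, info = self.env.step(action)
        reward = 0.0
        if self.prev is not None:
            # Reward hurting the opponent...
            reward += self.prev['enemy_health'] - info['enemy_health']
            # ...and punish taking damage.
            reward -= self.prev['health'] - info['health']
        self.prev = info
        return obs, reward, done, info

# 'MortalKombatII-Genesis' is my assumption for the integration's game ID.
env = MK2RewardWrapper(retro.make(game='MortalKombatII-Genesis'))
```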
My reward function was based on what Lucas Thompson did, but it turns out his code was wrong, so my code was wrong too, and I didn't realize this until a few days later. But let me just show you what happened when I trained my agent using the incorrect reward function. The first opponent in single-player mode is randomized, and it ended up being a Sub-Zero mirror match, so sorry if this is confusing to watch, but my AI is the light blue Sub-Zero that's getting pummeled by the in-game Sub-Zero. [Music] You can probably guess what happens in round two, so let's speed it up. [Music] [Applause]

Even though we're getting destroyed by the first opponent, the important thing is that the whole pipeline works, so now I can run experiments to make it smarter. First I did the most obvious thing and trained it for a bit longer. Turns out, not much improvement: our AI still keeps slide kicking into a block and getting countered. Next I changed the main reinforcement learning algorithm from the A2C algorithm that Lucas Thompson used to the PPO2 algorithm, which is more common; it's what OpenAI used to beat the world champions in Dota last year. I also watched a walkthrough of MK2's single-player mode, and I learned that Baraka has this cheap move called Slice and Dice, where he waves his knives up and down in front of him, and for some reason the in-game opponents like to walk right into it. I thought this might be an easy strategy for my AI to learn, so I changed my character from Sub-Zero to Baraka. At some point I also read the Gym Retro documentation for the reward function, and that's how I realized I was doing it wrong, so I fixed that.

Fight! Our Baraka seems preoccupied with dodging and running away, probably because he can't figure out how to win, so he's just trying to stay alive a bit longer. But he's not doing a very good job of it. [Music] Fight! Baraka wasn't learning his special moves, so I investigated the input space of my agent. The Sega Genesis has 12 buttons, and each button can either be pressed or not pressed. That means at every frame there are 2^12 = 4,096 possible button combos, which is a lot of button combos. So I restricted the input space to only the 42 button combos that Baraka needs in order to use his basic and special attacks (there's a sketch of this below). I didn't include the button combos for his fatalities or other finishing moves, because those don't help you win the fight.

I also changed the training data to use RAM instead of images. By default, Gym Retro provides your agent with all the pixels on the screen, but you also have full access to the RAM if you want it. This is kind of cheating, because a human player can't read and understand computer memory while they're playing, but I decided to start training on RAM because some of Baraka's moves require sequential button presses, which means he needs to remember what actions he took in the past. Right now the model doesn't do that: it just looks at the current frame of the game and decides what buttons to press based only on that, and I didn't know how to fix it. But I did have a big-brain idea. I figured the programmers probably made a variable that stores the last few moves the player made. I don't have any evidence that this exists; I'm just guessing it's there based on how I would program a fighting game. If I'm right, then training my agent on the RAM might allow it to learn the combos. There are a lot of things that could go wrong, but that was my reasoning for trying to train on RAM data.
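The input-space restriction can be done with a small gym ActionWrapper that maps a handful of discrete actions onto button arrays, a pattern that appears in the Gym Retro examples. The combo list here is illustrative only; it is not the actual set of 42 combos from the video.

```python
import gym
import numpy as np

class MK2Discretizer(gym.ActionWrapper):
    """Replace the 12-button MultiBinary action space with a small
    Discrete space containing only the combos we care about."""

    # Gym Retro's button order for the Sega Genesis.
    BUTTONS = ['B', 'A', 'MODE', 'START', 'UP', 'DOWN',
               'LEFT', 'RIGHT', 'C', 'Y', 'X', 'Z']
    # Illustrative combos only; the real list had 42 entries.
    COMBOS = [[], ['LEFT'], ['RIGHT'], ['UP'], ['DOWN'],
              ['A'], ['B'], ['C'], ['DOWN', 'B'], ['LEFT', 'A']]

    def __init__(self, env):
        super().__init__(env)
        self._actions = []
        for combo in self.COMBOS:
            arr = np.zeros(len(self.BUTTONS), dtype=bool)
            for button in combo:
                arr[self.BUTTONS.index(button)] = True
            self._actions.append(arr)
        self.action_space = gym.spaces.Discrete(len(self._actions))

    def action(self, act):
        # Translate a discrete action index into a button array.
        return self._actions[act].copy()

env = MK2Discretizer(env)
```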
Next I had to change the policy model, which is the machine learning model that the PPO2 algorithm uses to choose actions. The old one I used was only meant for image data, so I changed it to another one that seemed better suited to RAM data. Then I switched the game difficulty from Very Hard to Very Easy. Apparently in MK2, even if you play on Very Easy, the difficulty ramps up, so it ends up getting really hard. I thought switching to the easiest difficulty would help my agent ramp up slowly and get a broader range of experience, instead of repeatedly dying to the first opponent. I set everything up on AWS and trained this model for 100 million time steps, which took a few days.

Fight! Right off the bat he's doing much better. He's also doing some new moves, which is what I wanted, but it's hard to know which of my changes caused him to learn them. Obviously it's my fault for not doing this very scientifically, but hey, this is my channel and I have the right to be lazy. Also, I know it's hard to compare this with the past runs, because I just changed the game difficulty to Very Easy, which is totally cheating, but let's just see if we can finally beat Sub-Zero. Finish him! Our AI is not very good at delivering the final blow, but we've already won at this point, so it doesn't matter. In general, he relies very heavily on his knife throw. What are the pros and cons of this? Well, I would give you some insightful commentary if I were a good MK2 player, but I'm not, so instead I will give you fast forward. [Music] Fight! [Music]

I'm back now. This is an important fight to analyze, because it highlights the fact that this game is racist. Okay, I don't sound like that when I'm doing kung fu, and I don't tolerate racism of any kind, which is why Baraka will not lose this fight. There is no way. Baraka still hasn't learned Slice and Dice, which is why I picked him in the first place, but he did learn some other moves, like the knife throw, which was pretty useful. It seems like learning more special moves gives my AI an advantage, so how do we teach the AI lots of special moves? The easiest thing I could think of was to pick the character with the shortest combos, and that happens to be Sub-Zero, so I ditched Baraka and went back to Sub-Zero.

I also figured out how to get the model to choose its action based on the last few frames instead of just the last one. You do this with frame stacking, which literally means putting the frames in a stack and passing it to the model (there's a sketch of this setup below). It works with either image or RAM data; I tried both, but images worked better, so I went back to them. I also changed the policy again, to something I still don't understand. I trained the new model for 8.5 million time steps; I went overboard last time, and it seems like after about 10 million time steps it doesn't actually improve much.

The first opponent is randomized, in case you're confused about why we're fighting Jax in the first round. I could have set the save state so that it starts off with a Sub-Zero mirror match again, but I forgot, and by now this video is so confusing anyway that I don't even know if that would help. [Music] Fight! Sub-Zero is abusing his sliding kick right now, which could be countered with a block, but the first opponent is a bit brain-dead. He's also using his freezing projectile and his ice puddle moves, and those are his three special attacks, so mission accomplished. [Music] Finish him! Again, a beautiful counter to a teleportation attack. I just think it's pretty interesting that the AI learned to do this in spite of its other flaws, like getting punished for the slide kick abuse here. [Music]
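Putting the pieces together, here's roughly what the frame-stacked PPO2 training setup described above could look like. I'm assuming stable-baselines (the library that uses the name PPO2) and reusing the wrappers from the earlier sketches; the saved file name is made up.

```python
import retro
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv, VecFrameStack

def make_env():
    # Reuses MK2RewardWrapper and MK2Discretizer from the sketches above.
    env = retro.make(game='MortalKombatII-Genesis')
    return MK2Discretizer(MK2RewardWrapper(env))

venv = DummyVecEnv([make_env])
# Stack the last 4 observations so the policy can see short-term
# history, e.g. the button presses it just made.
venv = VecFrameStack(venv, n_stack=4)

model = PPO2('CnnPolicy', venv, verbose=1)
model.learn(total_timesteps=8_500_000)
model.save('mk2_subzero_ppo2')  # hypothetical file name
```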
So we just beat the fourth opponent, which is where we died last time, but my model always has trouble against the fifth opponent. It doesn't even matter if it's Baraka or not, and I think it's because of this pattern right here, where Sub-Zero's slide kick gets blocked and then the opponent throws him. Apparently the throw move that the in-game opponents use has a bigger range than the one you get as a player; it's just one of the ways the devs made the game harder, and the in-game opponents seem to start abusing the throw once you get to the fifth guy. Fight! [Music]

So I took the model, which had already been trained for 8.5 million time steps, and continued training it for another 10 million time steps using the exact same parameters, except I changed the starting save state so that it starts off at the fifth opponent right away. I don't think it needs any more experience fighting the first four opponents, so I thought this would make my AI improve faster. It sort of worked, but it still doesn't beat Baraka consistently, so I wrote a script to repeat this fight again and again and again, so I could record a good result to show you guys (there's a sketch of that loop below). Remember, the model is stochastic, so running the same model leads to different results every time. After a few thousand simulations, this was the best result. [Music] Fight! Finish him! [Music] Fight! [Music]

I haven't figured out how to beat this guy, but if you want to try to improve my model, the source code is in the description. A simple thing you could try would be to save the state of the game right before this fight and then continue training my model from that point, but I don't have high hopes for that, because that's what I tried with Baraka and I'm still not beating him consistently. It would be a good way to get your feet wet, though. Unfortunately, you can't fight against the AI yourself, because I was having trouble getting that to work with Gym Retro, and I didn't think it was worth the time to debug it. But I'm going to do another RL project for the next video, which will probably be out in a few weeks. Eventually I might make my own game and train an AI for it, but for now I want to focus on getting better at RL. My thinking is that if I get really good at this, I'll just be better at math in general, which I was never the best at, and then it'll be easier to write my models from scratch instead of always being limited by other researchers' work. Anyways, thanks so much for watching. Check out my last video on deepfake dancing; it was my favorite video, but not many people watched it. Okay, that's it. Bye!
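For reference, the repeat-the-fight script could look something like this. It assumes the model and wrappers from the earlier sketches, and 'Fight5' is a hypothetical save state you'd capture yourself just before the fifth opponent; none of these names come from the video's actual code.

```python
import retro
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv, VecFrameStack

def make_env():
    # 'Fight5' is a hypothetical save state created just before
    # the fifth opponent; reuses the wrappers sketched earlier.
    env = retro.make(game='MortalKombatII-Genesis', state='Fight5')
    return MK2Discretizer(MK2RewardWrapper(env))

venv = VecFrameStack(DummyVecEnv([make_env]), n_stack=4)
model = PPO2.load('mk2_subzero_ppo2')

best_return = float('-inf')
for episode in range(1000):
    obs = venv.reset()
    done = [False]
    ep_return = 0.0
    while not done[0]:
        # deterministic=False keeps the stochastic action sampling,
        # so each run of the same model plays out differently.
        action, _ = model.predict(obs, deterministic=False)
        obs, rewards, done, _ = venv.step(action)
        ep_return += rewards[0]
    best_return = max(best_return, ep_return)
    print(f'episode {episode}: return {ep_return:.1f} '
          f'(best so far {best_return:.1f})')
```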
Info
Channel: Will Kwan
Views: 130,982
Keywords: mortal kombat, machine learning, mortal kombat ii, mortal kombat 2, mk2, mkii, ai, artificial intelligence, reinforcement learning, deep reinforcement learning, neural network, sega genesis, openai, gym retro, programming, coding, computer science, sub-zero, scorpion, baraka, fighting game, retro game, ppo2, proximal policy optimization
Id: -oUVr_B_cQo
Length: 16min 50sec (1010 seconds)
Published: Fri Aug 07 2020