Dominating Bot Challenges Donkey Kong Country with Simple Code.

Captions
What's going on, ClarityCoders and guests! In today's video we're going to cover a reinforcement learning algorithm called the Brute, and we're going to apply it to one of my favorite childhood games: Donkey Kong Country for the Super Nintendo. This lets us showcase how powerful and successful this algorithm can be while being entirely blind to the environment. How can it be so successful without seeing the environment at all? Let's not waste any more time and find out. [Music]

Our algorithm is based on a decision tree, and that tree is what the algorithm uses to choose its actions. The actions are whatever our agent, in this case Donkey Kong, is able to do in the environment. Listing them out, we have left, right, down, B, Y, Y-and-down together (which will ground pound), and left-and-Y and right-and-Y, which are our directional rolls. When the agent is actually playing, we'll show you a gamepad that highlights the button the agent is currently selecting.

Okay, that probably all makes sense, but how is our agent making decisions when I told you it can't see the environment at all? Our decision tree starts as a single node, and that node holds the best score we've gotten so far. When we first start, the tree is a single node with a score of negative infinity, so any reward we get in the environment will increase our score. The reward is how the agent knows how it's performing; in our case it might be how far Donkey Kong progresses through the level, or how much damage Donkey Kong deals to the bosses we fight.

Back to our actions: our agent actually selects 6,000 actions before it even plays the game. That's right, it decides ahead of time what it's going to do in the environment. We store these and literally list them out while we play the environment itself; in this case we can say we're going right, up, up, A, B, down, left, and so on. Then it's time to actually play the episode, so we play it through with our 6,000 actions.

After we play our episode, we need to update the tree, because now we know this sequence of actions results in this reward. As a short example, we might add a node holding the first action our agent took in this run, another node for the next action it took, and another node which we'll say holds the final action. Now that we have the final action, we also know the reward for this sequence, so in this case we know that if our agent goes up, right, and then hits B, it gets a reward of 20.

So how does our agent get better? We balance exploring the environment (in this case, exploring different actions) with being greedy and taking the best possible path. The next time the agent plays, it mostly follows the best path we know, but we also have an exploration factor that says maybe five percent of the time it goes off the beaten path and tries something random. For example, say we push up because it's on the best path so far, but then, instead of going right (the next action on the best path), we hit our exploration factor, so we try a random action, say Y. Once we're off the beaten path, the agent keeps taking random actions, because it has no idea what the best path is from that point. This new sequence of actions may improve our reward; let's say it gets 30. We can propagate that back through the tree, and now we have a new best greedy path, so we update the top two nodes to 30 as well.
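The procedure described above can be sketched in a few dozen lines of plain Python. This is an illustrative reconstruction, not the code from the video; the Node class and function names are my own, and real implementations (such as the Brute baseline that ships with gym-retro) differ in detail.

```python
import random

class Node:
    """One node of the Brute's tree: the best reward seen through this point."""
    def __init__(self):
        self.best = float("-inf")   # best episode reward on any path through here
        self.children = {}          # action -> Node

def select_actions(root, actions, horizon, epsilon=0.05):
    """Plan a full action sequence before playing: follow the best-known
    path, but with probability epsilon fall off it; once off the beaten
    path, stay random, since nothing is known from there."""
    seq, node = [], root
    for _ in range(horizon):
        if node is None or not node.children or random.random() < epsilon:
            seq.append(random.choice(actions))
            node = None                      # off the beaten path
        else:
            a = max(node.children, key=lambda a: node.children[a].best)
            seq.append(a)
            node = node.children.get(a)
    return seq

def update_tree(root, seq, reward):
    """After the episode, store the sequence and propagate the reward
    back up so every node on the path knows the best score through it."""
    node = root
    node.best = max(node.best, reward)
    for a in seq:
        node = node.children.setdefault(a, Node())
        node.best = max(node.best, reward)
```

With epsilon set to 0 the agent greedily replays the best-known sequence, and because the environment is deterministic that replay reproduces the best score exactly; the small epsilon is what lets it keep discovering better branches.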
At this point, let's see how it performs against our first boss. But first, I want to thank DataCamp for sponsoring this video. Their online platform is one of the quickest and cheapest ways to learn programming and data science. DataCamp does a great job of organizing skills into specific skill paths you can follow to learn whatever you need in data science and programming. They also have career tracks focused on specific jobs such as data engineer, machine learning scientist, programmer, and much more. The best part is that you can try it for free right now by clicking the link down below; you'll be able to test the first chapter of any course for free to see if it's right for you. Check out this course on Introduction to Deep Learning in Python, which relates directly to a lot of my videos. It's an XP-based system, so you can keep track of how you're leveling up your Python and deep learning skills.

Now let's take on our first boss in Donkey Kong. You can see here that when we freeze-frame while pushing right and roll, the controller overlay shows right and roll, so that's lining up perfectly. This boss is relatively easy: we were able to take it down in under 21,000 time steps, and training took only about two minutes on my pretty standard laptop. You'll notice that the reward comes every time we land a damaging hit on the rat. [Music]

Now you're probably thinking, "Okay, Jake, it can't be that easy. Why do other algorithms exist?" Well, there's a big catch: we have to have a certain type of environment, a deterministic environment. What that means is that if I do something like hit right, right, up, B and get a reward, then doing the exact same key combination frame by frame had better yield the exact same reward, as long as I'm starting from the same state. Our environment can't have any randomness: if we use the same input keys, then in this case our rat boss should behave the same.

With that said, let's jump in and try a more difficult boss you may remember. I'm sure all of you remember this level: we're fighting the Gwyneth Paltrow goop candle. [Music] Smells funky. In this level it shakes and spits out enemies. You'll notice our algorithm is figuring out spots where you can stand really close to the candle without getting hit; you can see how this could be useful for speedrunners, like right there. [Music] Whoa, look at these big man pajamas; I feel like I should censor this. To defeat this goopy candle we had to train for over 27 hours. This algorithm does really well on short levels. Wow, you'll notice it killed that before it even dropped out of the air. Nice job, DK. Oh look, it still thinks it's playing the level: you should always set an end condition in these, and my algorithm clearly doesn't know it's the end of the simulation. Every video has got to have a Code Bullet moment.

Next up we have meth bird. To beat big bird here we had to train for four hours. This boss shoots nuts at you, and our algorithm had to learn to jump over them. You may have noticed from some of the other runs that we tend to lose our partner very early. You start out with Diddy and Donkey Kong, and our algorithm was not trained to know that losing the first character is bad; it just knows that when you lose both, you lose the game. One improvement would be to set up a penalty for losing the first character. In this one you'll notice that I actually ended it the instant we got the last hit, so a little controversy here: we don't get to see the victory finish.

All right, now I'm going to show you how to set up this code and run it yourself. It's super easy and anyone can do it, but if you have no interest in this, I'm going to show our bot playing so you have something to watch while I'm talking
about the setup.

The first way to set this up: I have a Replit instance available. If you go to this URL you'll come to a page like this, and once you log in (it's free) you can fork my code and train this right on Replit. The great part is that it won't take up any space on your own computer while it's running and training. Once you open it, you'll come to a screen like this. If you scroll down, you'll notice I currently have this set up for Donkey Kong button-wise, but the only ROM that ships with it is Airstriker Genesis, because it's free of copyright. If you don't have the Donkey Kong ROM, you'll have to obtain it legally on your own; I can't provide it for you. Otherwise, you can still train on Airstriker Genesis.

In the main method here you can flip to train: if you uncomment this and comment out playback, you can train your bot. That runs the Brute algorithm we talked about (you can mess around with the code up here), and it creates a file called best.bk2, which is the file we can use to play back our run. If we run this code, it installs our dependencies for us; the cool thing about Replit is that it knows we need Gym Retro, so it goes ahead and installs it, and you can see it output that it's getting new best high scores. Awesome, that's good enough for our example, so we can stop it running. Then, if we want to play back that file, we comment this out and uncomment the playback function. Now, if we hit run again, you'll see that right inside the browser we're able to view our best play of Airstriker Genesis. It's as simple as that.

If you want to get a little more advanced, you might want to install this locally, so I'll run you through that as well. Remember, the link to the Replit is in the description, as well as the link to my GitHub if you want to install locally. To install locally, hit the code button, grab the URL, go to your desktop, and clone the project (if you don't know how to do some of this, you can find plenty of intro tutorials online). We run git clone followed by our project URL, and now we have it on our desktop. If we open it in something like Visual Studio Code, we can look at one of the files: this is the playback file, where I use OpenCV to draw the markers based on what our bot is selecting.

We also have to install the dependencies, which I've included in requirements.txt; it's basically just Gym Retro and OpenCV. As you can see on the GitHub page, installing them is a simple command: pip install -r requirements.txt. Copy and paste that, and it should install all your requirements and get you up and running.

A couple of things to note. The playback file plays whatever .bk2 file you put in on line 48, so you have to change it there. The Brute DK script has a Donkey Kong discretizer, which is where I select the available moves; you can change this depending on the game or what you're trying to do. It also has an exploration parameter you can play around with, and a frame skip, which controls how often your bot makes a move. You've seen in some of the playbacks that our bot is pretty jumpy; you can make it select a move every eight frames instead. In your main function you can also select which game it plays; again, we're using Airstriker Genesis because it's one of the only ROMs included. If you Google Gym Retro, you'll find the instructions on how to import other ROMs.

Thank you so much for watching, everyone. This is actually my favorite boss battle, the angry beaver; it's amazing to me that the bot is able to survive it as it starts bouncing off him multiple times. If you like these AI algorithm videos, let me know, and we'll go over more algorithms in future videos. If you want to see any particular algorithms or games, leave me a comment below. If you liked the video, drop a like; it helps out the channel. Thank you for watching, and until next time, keep coding! [Music]
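For reference, the "discretizer" mentioned in the setup section can be sketched without gym-retro at all: it just maps a small discrete move set onto the multi-binary button array that a retro console environment expects. The button order below follows gym-retro's SNES layout, but treat the exact class shape and move list as assumptions; the actual DonkeyKongDiscretizer in the repo (a gym.ActionWrapper) may differ.

```python
class Discretizer:
    """Map a small discrete action set onto a multi-binary button array.
    A standalone sketch of the idea; gym-retro's own discretizer wrappers
    subclass gym.ActionWrapper instead of being a plain class like this."""

    # Button order assumed from gym-retro's SNES layout.
    BUTTONS = ["B", "Y", "SELECT", "START", "UP", "DOWN",
               "LEFT", "RIGHT", "A", "X", "L", "R"]

    def __init__(self, combos):
        self._lookup = []
        for combo in combos:
            arr = [False] * len(self.BUTTONS)
            for button in combo:
                arr[self.BUTTONS.index(button)] = True
            self._lookup.append(arr)

    @property
    def n(self):
        """Size of the discrete action space."""
        return len(self._lookup)

    def action(self, index):
        """Translate a discrete action index into a button array."""
        return self._lookup[index]

# The move set described in the video: walk, duck, jump, roll,
# ground pound, and the two directional rolls.
dk_moves = Discretizer([
    ["LEFT"], ["RIGHT"], ["DOWN"], ["B"], ["Y"],
    ["DOWN", "Y"],                    # ground pound
    ["LEFT", "Y"], ["RIGHT", "Y"],    # directional rolls
])
```

During training, the Brute only ever samples integer actions in range(dk_moves.n); the discretizer translates each one into a button array, and the frame-skip parameter simply repeats that array for several frames before the next decision, which is what smooths out the "jumpy" behavior mentioned above.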
Info
Channel: ClarityCoders
Views: 4,165
Keywords: donkey kong, python tutorial, AI, reinforcement learning, reinforcement learning python
Id: uREw7J9l0oY
Length: 13min 46sec (826 seconds)
Published: Tue Oct 26 2021