Deep Reinforcement Learning for Atari Games Python Tutorial | AI Plays Space Invaders

Video Statistics and Information

Captions
Come on, just one more... dude, you suck at this. Oh yeah... I still kinda suck.

What's happening, guys, my name is Nicholas Renotte, and in this video we're going to take a look at how you can use reinforcement learning to play Atari games, specifically Space Invaders. Let's take a deeper look at what we're going to go through.

In order to train our reinforcement learning model to play Atari games, we first need to create an Atari environment, and to do that we're going to use OpenAI Gym. Then we're going to build a deep learning model with TensorFlow and set it up so that it works hand in hand with our Atari environment. Last but not least, we'll use a DQN reinforcement learning agent to train our model. Here's how it all fits together: first we install our dependencies, then we build our environment using OpenAI Gym, then we set up a build pipeline for our TensorFlow deep learning model, and finally we pass that model to Keras-RL to train our reinforcement learning agent. Ready to do it? Let's get to it.

Alrighty guys, so in order to build a reinforcement learning model that can play Space Invaders, there are five key steps to go through. First up, we install and import our dependencies. Then we spin up an OpenAI Gym environment and run through a bunch of random steps to see how a random agent performs. Then we create a deep learning model that scans through our OpenAI Gym frames; for this we'll be using a convolutional neural network. Then we build a Keras-RL agent, specifically the dueling DQN agent, and finally we save our weights and reload them from memory so we can take our agent elsewhere if we want to.

First up, let's install and import our dependencies. There are a couple of key dependencies we need here: TensorFlow, OpenAI Gym, keras-rl2, and the Atari extras for Gym. That last one is what lets us play games like Space Invaders and a whole bunch of others; I believe there's Pac-Man and some other really cool ones. If you want to check out the other environments available, go to gym.openai.com/envs and you'll see a whole bunch you can play around with. Alrighty, first thing, let's install our dependencies.

So those are our dependencies installed. We've written one line, but we've installed a bunch of stuff on that line: exclamation mark, pip install, and then the four different packages we want. The first is tensorflow 2.3.1, which I tend to find is pretty stable and works well. Then we've installed gym, which gives us all of this good OpenAI Gym stuff, and keras-rl2, a lightweight reinforcement learning package that makes it pretty easy to build reinforcement learning models. There are some newer packages coming out right now which we'll explore in future videos, but keras-rl2 has been working pretty well for me lately. And the last thing we've installed is the Atari extension for Gym: to do that we've written gym and then atari in square brackets, which gives us all of these game-playing environments you can see here.
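Pieced together from that description, the install cell looks roughly like this (a sketch; only the TensorFlow version is pinned in the video):

    !pip install tensorflow==2.3.1 gym keras-rl2 gym[atari]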
Now let's import a couple of key dependencies that we need to build our random environment. We've imported two things there, so two lines: import gym and import random. We've imported gym to build our OpenAI Gym environment, and random because we're going to use it to take some random steps inside that environment, just to see how a random agent performs.

The next thing we want to do is create our gym environment and extract a couple of components from it: our height, width and channels, plus the actions available in the environment. We've written three lines there. The first creates our gym environment: gym.make, passing through the type of environment we want. In this case that's SpaceInvaders-v0, which is the frame-based one (there's also a RAM-based one), and we store it in a variable called env. The next line grabs our height, width and channels from the environment frame, which gives us our image shape; to grab that we type env.observation_space.shape, and we'll need it later when we build our deep learning model. Then we grab our actions with env.action_space.n, which gives us the number of actions we can take.

Now, what I've found when working with OpenAI Gym is that I like to know what those actions actually mean. It might not be super important in the realm of reinforcement learning, but it's always nice to know, and you can grab them by unwrapping the environment and getting the action names. To get our action meanings we can just type env.unwrapped.get_action_meanings(); it's a method, so you need the parentheses. You can see we've got a bunch of different action meanings: no operation (so not actually doing anything), fire (our Space Invaders ship shooting), move right, move left, and then right-fire and left-fire. So it's nice to know that these are the actions our agent can take inside this particular environment.

The next thing we want to do is actually start playing around with our OpenAI Gym: we're going to take some random steps and see how that performs. This entire tutorial, along with the code, is going to be available via GitHub in the description below, so by all means check that out if you'd like to take a look. Also, if you'd like a full-blown reinforcement learning crash course, I'll include a link somewhere up above; it's about 20 minutes, it's a really great video, and it walks you through a really simple example of how to get started with Keras-RL and some of the other packages we're going to be using here.
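Based on that walkthrough, the setup cell looks roughly like this (a sketch of what's described; the printed action list is as read out in the video):

    import gym
    import random

    # Create the frame-based Space Invaders environment
    env = gym.make('SpaceInvaders-v0')
    height, width, channels = env.observation_space.shape
    actions = env.action_space.n

    # Human-readable names for the six discrete actions
    env.unwrapped.get_action_meanings()
    # ['NOOP', 'FIRE', 'RIGHT', 'LEFT', 'RIGHTFIRE', 'LEFTFIRE']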
So let's go ahead and spin up our environment, take some random steps, and see how we perform.

Alrighty, so we've written quite a fair bit there. I'm going to go through this pretty quickly because we wrote the exact same code in the original reinforcement learning crash course. Basically, we've set up five episodes, which means we're going to play five different games of Space Invaders, and we loop through each one of those episodes. First up we reset our environment, set done equal to False (we'll use that in a second), and reset our score variable. So env.reset() resets our environment state; done = False acts as a flag, because if our game is done we stop that game, and either getting a super high score or dying defines whether we're done; and score is a bit of a running count.

Then we've written while not done: while our game hasn't finished, we render our environment and make a random choice. Remember we had six different actions we could take up here, so we just take a random choice out of one of those six and perform it in our environment. Our agent in this case is just a random agent; it's running around going left, going right, firing, firing right, firing left, and generally messing around, which lets us see how a random set of choices performs inside Space Invaders. Then we extract our state, our reward, whether or not we're done, and a bunch of info, all of which we grab from env.step. Once we've taken an action (applying the random action we defined up here to our environment), we get all of this good information out. We take the reward from that step and add it to our running score total, so we see the running total as the game progresses. Once our game's done, assuming we die, we print out the episode we're on, one through five, along with our score. Last but not least, we close our environment using env.close(). So quite a fair bit there, but this basically lets us spin up our environment, take a bunch of random steps, and see how it performs. Let's go ahead and run it.

Oh, we haven't defined something right; we need to make this a range. And there you go, it should now pop up down the bottom, and you can see our Space Invaders agent is in fact running, just taking random steps at the moment. It's getting a decent score, but right now it's just messing around, so you can see it got 410, 235, 155, 240, 120.
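A sketch of that random-agent loop, assuming the env, actions and random imports created above:

    episodes = 5
    for episode in range(1, episodes + 1):
        state = env.reset()
        done = False
        score = 0

        while not done:
            env.render()
            # Pick one of the six actions completely at random
            action = random.choice(range(actions))
            state, reward, done, info = env.step(action)
            score += reward

        print('Episode:{} Score:{}'.format(episode, score))
    env.close()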
So it's not all that consistent. Whilst it might have got a high score up here, it's really just taking random steps; it doesn't have a specific strategy it's playing to. Our core goal at the end of this is ideally to have more consistent, and ideally higher, rewards.

The next thing we're going to do is start building up our deep learning model, but before we do that we need to import a number of TensorFlow dependencies, specifically the Sequential API, some convolution layers and some dense layers. So let's go ahead and do that.

Alrighty, those are our dependencies for deep learning imported: four lines. In the first we've imported NumPy, so import numpy as np. The second line is from tensorflow.keras.models import Sequential, which brings in the Sequential API and lets us build sequential deep learning models. The next line is from tensorflow.keras.layers, and from there we've imported a Dense layer, a Flatten layer and Convolution2D. Because we're working with the SpaceInvaders-v0 environment, what we get back as our state is just an image, so we'll be using convolutional layers to scan through that image. The last line is from tensorflow.keras.optimizers import Adam; this is the optimizer we'll use when we compile our deep learning model. If you'd like to learn a little bit more about TensorFlow, I've got a TensorFlow crash course as well; again, the link will be somewhere up there.
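Those four import lines, as described:

    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, Flatten, Convolution2D
    from tensorflow.keras.optimizers import Adam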
Cool. The next thing we want to do is write a function that builds our deep learning model. Fair warning, there is quite a bit to write here, but we'll take it step by step and I'll walk you through it. The first thing we do is define a function called build_model, and to that we pass our height, width, channels and actions. We extracted those up here; remember, we got our height, width and channels from our observation space and our actions from our action space, and we grab them because their shapes define what our deep learning model looks like. Alrighty, let's start building our model.

Okay, that's the first part of our deep learning model. We're going to add a bunch more layers, but let's pause and look at what we've written. We've initialised our Sequential API by writing Sequential(), which is what we imported up here, and stored it in a variable called model. Once we've got a Sequential model we can use the add method to start stacking layers inside our deep neural network. Because we're working with an image-based model, we start with some convolution layers and then flatten them down. The first layer we've added is Convolution2D (remember, we imported this from our Keras layers), and to it we pass a number of things. First, the number of filters we want, in this case 32. When we train a convolutional layer, we're training those filters to detect different things within our images; eventually you might get a filter that detects where a particular enemy is inside our Space Invaders environment, where the mothership is, where the bases or houses that protect us are, and so on. Then we specify how big these filters are: eight units by eight units. Then we have the strides, or how we stride: if you think of your image as one big square, your filter is a smaller square inside it that slides through the image to see whether it can detect stuff, and the stride is how big a step those filters take. With a four-by-four stride it moves four steps to the right and four steps down, so it's taking quite large strides. The next thing we pass through is our activation function, in this case relu, and the last thing is our input shape. Here we've passed through 3 (I'll come back to that in a second) and then the height, width and channels for our model. That 3 lets us pass through a number of different images: when we set up our agent we're going to have this concept of a window, or memory-style buffer, so we're actually going to pass a number of different frames from our reinforcement learning model into our deep learning model.

Okay, that's the first convolutional layer; next we stack a number of additional convolutional layers on top. So those are our convolution layers added. We've added another one with 64 filters, a filter size of 4x4 and a stride that's now a little bit shorter, 2x2, again with a relu activation. The next layer is another convolution: 64 filters, a size of 3x3, and because we haven't specified strides it defaults to a 1x1 stride, which means it goes pixel by pixel, again with a relu activation. Then we've added Flatten, which takes all of that and flattens it down into a single layer so we can pass it to a dense layer.

Let's go ahead and add our dense layers now. Dense layers are also known as fully connected layers, which means that each unit inside a particular layer is connected to every single unit in the next. Okay, that's our build_model function done; we've now gone and added our dense layers. To quickly wrap up what our model looks like: we've got three sets of convolution layers up here, we then flatten them down to pass them through to our fully connected layers down here, and in terms of those fully connected, or dense, layers, we've got one dense layer with 512 units and a relu activation.
We then try to compress this down again: rather than going straight to our actions, we've got another dense layer with 256 units, again with a relu activation. And the last layer inside our deep learning model is, go figure, another dense layer, but this time, rather than 512 or 256 or even 128 units, it has the number of actions we've got. So the final layer of our model has six units, and which of those units is activated determines which action our model actually takes. You can see that we start with our image, pass it through convolution, convolution, convolution, flatten, then go to a dense layer, so we start to build a bit of a funnel, then to another dense layer at 256, and then our final dense layer with six units, each one representing an action. This takes us from an image all the way through to an action, which is really what our reinforcement learning model is all about.

The next thing we need to do is create a model; right now we've defined the function but haven't actually instantiated one. So we've now gone and built our model: we wrote model equals and used the build_model function we had up here, passing through our height, width, channels and actions, which remember we grabbed from our environment. One of the things I like to do whenever I'm building one of these deep learning models is see what the architecture looks like, and the easiest way to do that is to type model.summary(), which shows you the entire architecture of your deep learning model. You can see, as I was saying, our three convolutional layers, and it starts out pretty big: we've got three frames (remember, those are our history frames) and the shape starts at 3 by 51 by 39 by 32, because we've got 32 filters over here. Then it goes down, because our strides and filters effectively compress the data: the shape goes to 3 by 24 by 18 by 64 (64 filters), and it keeps going down until we get to the Flatten layer. The Flatten layer then transforms it so that once we connect to our dense layers we go down to 512 units over here, then 256 units over here, and along the side you can see all the parameters: all up, 34,812,326 parameters to train within our deep learning model. Because this model is so large it can take a little while to train, but more on that later.
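Putting those pieces together, the model-building cell looks roughly like this (layer sizes and strides as described above; the linear output activation is an assumption, since the final activation isn't spelled out in the video, but it's the usual choice for Q-value outputs):

    def build_model(height, width, channels, actions):
        model = Sequential()
        # Three convolutional layers scan the stacked window of game frames
        model.add(Convolution2D(32, (8, 8), strides=(4, 4), activation='relu',
                                input_shape=(3, height, width, channels)))
        model.add(Convolution2D(64, (4, 4), strides=(2, 2), activation='relu'))
        model.add(Convolution2D(64, (3, 3), activation='relu'))
        model.add(Flatten())
        # Fully connected funnel down to one output unit per action
        model.add(Dense(512, activation='relu'))
        model.add(Dense(256, activation='relu'))
        model.add(Dense(actions, activation='linear'))
        return model

    model = build_model(height, width, channels, actions)
    model.summary()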
So we've now gone and defined our deep learning model. The next thing we need to do is build our Keras-RL agent, but before we do that we need to import our Keras-RL dependencies. Let's go ahead and do it.

Alrighty, those are our dependencies imported: three lines. The first is from rl.agents import DQNAgent (rl here is keras-rl2). The DQNAgent is the reinforcement learning agent we're going to be using, and there are a whole bunch of them out there, so let me just show you. If you go to the Keras-RL documentation (again, I'll include a link in the description below if you want to check it out), you've got the DQN agent, the NAF agent, DDPG, SARSA and CEM, so there's a whole bunch out there. The other package I mentioned at the start was TensorForce, and there are a number of additional agents in there as well, but for now we're going to be working with keras-rl2. The next line is from rl.memory import SequentialMemory, which lets us hold a knowledge buffer for our reinforcement learning agent, so it can retain some memory about previous games. The last line is from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy. The epsilon-greedy Q policy is what allows us to find the best reward outcome, and the LinearAnnealedPolicy basically gives us a bit of decay, so that as we get closer to the optimal strategy we start to close in on it.

Now that those are imported, the next thing we need to do is write a function to build our Keras-RL agent, so let's do it. Alrighty, that's our build_agent function done, about seven lines of code. The first line defines our function: we've used the def statement to define build_agent, and to it we pass our model and our actions; these come from here and from all the way up over here. Then we've defined our policy: as I said, we use the EpsGreedyQPolicy and pass it to the LinearAnnealedPolicy, along with a couple of keyword parameters that define what our search and our decay look like, namely the attribute we want to anneal (eps), the maximum value for the policy, the minimum value, the test value, and the number of steps to anneal over. Then we've defined our memory: in this case we've used SequentialMemory with a buffer limit of a thousand and a window length of three, so we store the past three frames at each step, which basically lets us capture what our different steps looked like. Then we've defined our DQNAgent, and to that we pass our model, our memory and our policy; remember, the model comes from over here, the memory from over here, and the policy is our EpsGreedyQPolicy wrapped in the LinearAnnealedPolicy. We've also set this up as a dueling network, which basically sets up a competing network architecture for our reinforcement learning model, and we've set the dueling type equal to average. Then we've set the actions we want our agent to take, in this case the actions we had over here (no operation, fire, move right, move left, right-fire or left-fire), and specified the number of warm-up steps, which allows our agent to collect a little bit of information before we actually start training.
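Assembled from that description, the imports and build_agent function look roughly like this. The exact epsilon schedule and warm-up values below are illustrative, since only their roles (max, min, test, annealing steps, warm-up) are spelled out in the walkthrough:

    from rl.agents import DQNAgent
    from rl.memory import SequentialMemory
    from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy

    def build_agent(model, actions):
        # Epsilon-greedy exploration, annealed linearly from value_max down to value_min
        policy = LinearAnnealedPolicy(EpsGreedyQPolicy(), attr='eps', value_max=1.,
                                      value_min=.1, value_test=.2, nb_steps=10000)
        # Replay buffer storing a window of 3 frames per step
        memory = SequentialMemory(limit=1000, window_length=3)
        dqn = DQNAgent(model=model, memory=memory, policy=policy,
                       enable_dueling_network=True, dueling_type='avg',
                       nb_actions=actions,
                       nb_steps_warmup=1000)  # the video starts at 10,000 and later drops this to 1,000
        return dqn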
Now the next thing we need to do is spin up our agent and actually start training, so let's do it. Before we run it, let's take a look at what we've written: three lines that build our agent, compile it and fit it. First we've set up our agent using the build_agent function, passing through our model and our actions, so this actually creates an agent. Then we've compiled it, which lets us set the optimizer for the underlying network; in this case we're using the Adam optimizer, which if you remember we imported over here with our TensorFlow dependencies, and we've set our learning rate. We've kept it simple, no scheduling or decay, and set it to 1e-4, or 0.0001. Then we've fitted our model. I probably should have also said that to compile we used dqn.compile and passed through our Adam optimizer; to fit, we use dqn.fit, and to that we pass our environment, how long we want to train for (in this case 10,000 steps, and we'll come back to that in a second), then the keyword parameter visualize, which we've set to False because visualizing while training slows it down a whole heap, and finally how we want to see the training output. We've set verbose equal to 2, which gives you output every couple of episodes; you can also set it to 1, which shows a live progress bar.

Now, on the bit about training steps: whenever you're training in hugely sophisticated environments like Space Invaders or other game-playing environments, it can take a long time to train these deep learning models. The advice from the DeepMind team, the guys that built AlphaGo, is to train this type of model for 10 million steps, or actually 10 to 40 million, but that's not all that feasible here. So in this case we're going to train it pretty quickly, I'll show you what it looks like, and then I'll give you some pre-trained weights that have been trained for a million steps so you can see what the output looks like. If you want to take this all the way, by all means train for 10 million steps and that should get you pretty close to a state-of-the-art model. Let's hit run, start fitting our model, and we'll be back in a sec.

Oh, we've got an error: we haven't spelled that right. enable_dueling_network should have one 'l', and this one should have one 'l' as well. Okay, this is actually really good, because I was hoping this would happen so I could show you what to do if you get stuck. The first two errors were just that I can't spell "dueling"; it's with one 'l'. The next error is basically saying that the tensor must be from the same graph. Whenever you get this error, or another one along the lines of the TensorFlow package not having a Sequential model or something like that, the easiest way to solve it is to go back up, delete your model by typing del model, and then rebuild it; then ideally your model should start running.
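For reference, the training cell described at the start of this step looks roughly like this (10,000 steps and a 1e-4 learning rate, as discussed; treat both as starting points rather than tuned values):

    dqn = build_agent(model, actions)
    dqn.compile(Adam(lr=1e-4))
    dqn.fit(env, nb_steps=10000, visualize=False, verbose=2)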
It looks like we've got another error there, and this one's different: cuDNN failed to initialize. If you get this error, the easiest way to solve it is to restart your Jupyter notebook kernel, so let's save, restart, and run through it again. Okay, we've just restarted; let's close the old notebook, open it back up and step through again (we don't need to run the random test this time). We're still getting an error, and it looks like it's that TensorFlow graph error again, so let's just delete our model again... and you can see that our model is now training. Sometimes when you're spinning this up you need to kill off your kernel and restart it, and ideally it should start working again.

Now our model is up and running, and you can see we're getting a number of warm-up steps. It's running really, really quickly, and that's because we've got nb_steps_warmup set to 10,000. The thing I just realised is that because nb_steps_warmup is 10,000 and our training steps are also 10,000, we're not actually going to get any training steps. So let's quickly stop this, set nb_steps_warmup to 1,000, and kick it off again. Alrighty, there you go: you can start to see our model training. Ideally what you want to see is this mean reward metric over here increasing over time; if it's increasing, your reinforcement learning model is learning to play better in that particular environment. We're going to let this train for 10,000 steps, then take a look and move on to the next steps of evaluating it and saving and reloading from memory, so we'll be right back.

Alrighty, and that's our model finished training for 10,000 steps. You can see the mean reward has varied a bit; again, that's just the nature of reinforcement learning models and their training, and it's going to take a long time for these models to get really, really good. The mean reward bounced around 0.23, and I think the highest I saw was 0.317, so it's going to bounce around and take a long time to train. Let's test it out anyway and see how it performs.

Before we run that: what we've written is dqn.test, which is the native testing functionality inside Keras-RL. To that we've passed our environment, the number of episodes (or games) we want to test, and visualize equals True, so we can see how the model actually performs once it's been trained. Once we've done this we'll load up the weights I've trained for a million steps; again, I'll make those available inside the GitHub repo if you want to play around with them. The next line grabs our historical scores from scores.history['episode_reward'] and takes the mean using np.mean. So let's go ahead and test this out. You can see our model is playing a little bit better this time; rather than just taking random steps, it's trying to get all the way over to the right.
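The test cell described there looks roughly like this (the episode count is an assumption, since the video mentions passing a number of episodes without reading the value out):

    scores = dqn.test(env, nb_episodes=10, visualize=True)
    print(np.mean(scores.history['episode_reward']))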
Now there's a reason for that: eventually, in the model that's been trained significantly more, you'll see it actually begin to target the mothership. It goes far right and tries to hit that purple little ship you can see flying across the top of the screen, so it's now actually starting to learn a strategy. Let's try that again, and I'll make it a little bigger so you can see. Again it's trying to fly over to the right, but it's getting stopped; it's still learning the steps, or the strategy, to actually get there, and all the while you can see the episode reward appearing at the bottom. Not too bad, right? So that's our baseline reinforcement learning model for playing Space Invaders.

Now, as I was saying, if you want to build an awesome model you'll need to train for significantly longer, up to 10 to 40 million steps. I didn't have that much time this week, because it would take my computer out of action, but I did get up to a million steps. So what we'll do now is save this DQN agent's weights so we can use them again later if we want to, and then load up different weights, ones that have been trained for a million steps, and test those out.

First up, let's save our weights. We've now gone and saved them: we've written dqn.save_weights and passed through the path we want to save to, in this case the saved weights folder and then 10k-fast (that's just what I'm naming it; you can name it whatever you want), and we've named the weights file dqn_weights.h5f. If I go into that folder now, into saved weights and then 10k-fast, you can see we've got a whole bunch of checkpoints and our weights. But in this particular case, rather than deleting our model and reloading our 10k weights, we're going to reload the weights trained for a million steps. So let's delete our model, delete our DQN agent, and set them back up: we can go up to our build_model function to create a new model, then go to our DQN agent section, grab those two lines (build_agent and dqn.compile) and paste them in to rebuild and compile the agent. Then we can load up the weights trained for a million steps: copy the file path, paste it in, and rather than loading the 10k-fast weights, load the 1-million-step weights. This basically takes our existing model architecture but loads weights that have been trained for significantly longer, and again, I'll make all of this available in the GitHub repo. So those weights have loaded successfully, and we can run our test sequence again to see how it performs. This time what you should see is our model going all the way to the far right, because eventually it's going to try to target the mothership. You can see it missed that time, but it's going far right, trying to get the mothership, missed it again, and you can start to see that it's now clearing a lot more levels.
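A sketch of that save-and-reload sequence; the folder and file names here just mirror what's said in the video (a saved weights folder with a 10k-fast subfolder, and a separate folder for the 1-million-step weights), so adjust the paths to wherever your weights actually live:

    # Save the current agent's weights
    dqn.save_weights('SavedWeights/10k-Fast/dqn_weights.h5f', overwrite=True)

    # Rebuild the model and agent, then load the longer-trained weights
    del model, dqn
    model = build_model(height, width, channels, actions)
    dqn = build_agent(model, actions)
    dqn.compile(Adam(lr=1e-4))
    dqn.load_weights('SavedWeights/1m/dqn_weights.h5f')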
It's also getting a lot more consistent: it's hitting the 200s pretty consistently, and every now and then it does hit the mothership, so you can see it's playing pretty well. And there you go, our model is now getting an average of about 282. This is after about a million steps of training, and it's already performing significantly better; again, if you took this further and went up to 10 million or even 40 million steps you'd get significantly better results.

But that about wraps it up. We've now gone and built a reinforcement learning model that's able to play Space Invaders. To quickly recap: we installed our dependencies, built our environment and took random steps in it, then built a pretty sophisticated convolutional neural network that we passed through to our DQN agent, which we set up using Keras-RL, and last but not least we tested it out, saved our weights, and loaded up some weights that we'd trained for significantly longer.

Thanks so much for tuning in, guys; hopefully you found this video useful. If you did, be sure to give it a thumbs up, hit subscribe and tick that bell so you get notified when I release future videos, and let me know what types of games you'd like to use reinforcement learning to play. Thanks again for tuning in. Peace.
Info
Channel: Nicholas Renotte
Views: 12,649
Rating: 4.9708738 out of 5
Keywords: reinforcement learning, reinforcement learning python, ai plays games, reinforcement learning game, reinforcement learning example, reinforcement learning tutorial
Id: hCeJeq8U0lo
Length: 38min 14sec (2294 seconds)
Published: Wed Dec 30 2020