Deep Reinforcement Learning Tutorial for Python in 20 Minutes

Video Statistics and Information

Captions
Before reinforcement learning... after reinforcement learning. What's happening, guys! My name is Nicholas, and in this video we're going to be going through a bit of a crash course on reinforcement learning. If you've ever worked with deep learning or machine learning before, you know the two key forms are supervised and unsupervised learning. Reinforcement learning is a little bit different, because you tend to train in a live environment.

Now, there's a really easy way to remember the core concepts in reinforcement learning: all you need to remember is Area 51. You're probably thinking, what the hell does Area 51 have to do with reinforcement learning? Well, the "AREA" in Area 51 stands for Action, Reward, Environment and Agent: the four key things you need in any reinforcement learning model. In this video we're going to be covering all of those key concepts.

Let's take a deeper look at what we're going to be going through. In this video we're going to cover everything you need to get started with reinforcement learning. We're going to start out by creating an environment using OpenAI Gym, then build a deep learning model using TensorFlow and Keras. That same model will then be passed to Keras-RL in order to train our reinforcement learning model using policy-based learning. In terms of how we're going to be doing it, we'll be working largely within Python, specifically inside a Jupyter notebook. We'll start out by building our environment with OpenAI Gym, then build our deep learning model with TensorFlow and Keras, and once we've built that model we'll train it using Keras-RL. We'll then be able to take that same model, save it down to disk, and reload it for when we want to deploy it into production. Ready to get to it? Let's do it.

There are a couple of key things we need to do in order to build our deep reinforcement learning model. Specifically, we first need to install our dependencies. Then we're going to build an environment with OpenAI Gym in just a couple of lines of code, which will let us see the environment we're actually applying reinforcement learning to later on. Then we're going to build a deep learning model with Keras, specifically using the Sequential API, train that Keras model using Keras-RL, and last but not least delete it all and reload the agent from disk, which is what will let you deploy it into production later on if you want to.

First up, let's install our dependencies. What we're going to need here is TensorFlow, Keras, Keras-RL as well as OpenAI Gym. So what we've done is installed our four key dependencies with pip: we've installed TensorFlow 2.3.0, OpenAI Gym (that's just gym), Keras, and keras-rl2. Those are all our dependencies done and installed.
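For reference, that install step as a single notebook cell looks something like this (only TensorFlow is version-pinned in the video; the other packages are installed unpinned):

```python
# Install the four key dependencies from inside the Jupyter notebook:
# TensorFlow 2.3.0, OpenAI Gym, Keras, and keras-rl2 (the TensorFlow 2
# compatible release of Keras-RL).
!pip install tensorflow==2.3.0 gym keras keras-rl2
```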
Now what we can go and do is set up a random environment with OpenAI Gym. OpenAI Gym comes with a bunch of pre-built environments that you can use to test out reinforcement learning. If we head on over to gym.openai.com, you can see there are a number of them: there are some algorithmic environments, and there are Atari games, so if you wanted to build Atari or video-game-style reinforcement learning engines, you could. We're going to be working with the classic control environments, and specifically we're going to be using CartPole. The whole idea behind CartPole is that you want to move the cart along the bottom in order to balance the pole on top of it. For each step you survive you get a point, with a maximum of 200 points. Ideally, when we start off taking random steps we won't get anywhere near 200, but once we apply deep reinforcement learning we should get much closer to that maximum. We have two movements available, left or right, so when we create our environment we're going to have two actions to choose from. If you work in different reinforcement learning environments you might have a different number of actions; for example, you might be able to go up or down as well as left or right.

Now let's set up this environment so we can work with it within Python. Back in our Jupyter notebook, the first thing we need to do is import our dependencies: we import OpenAI Gym, and we also import the random library so we can take a bunch of random steps. With those two key dependencies imported, we can set up the environment itself. We use the OpenAI Gym library, specifically the make method, to build our CartPole environment, the same one we just saw. We then extract the states, available through env.observation_space.shape, to take a look at all the different states available within our environment, and we extract the actions from env.action_space. If we take a look, we've got four states available and two actions, and those two actions are moving our cart left or right.
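A minimal sketch of that setup, assuming the classic CartPole-v0 environment from Gym's registry:

```python
import gym
import random

# Build the CartPole environment and inspect its state and action spaces.
env = gym.make('CartPole-v0')
states = env.observation_space.shape[0]  # 4: cart position/velocity, pole angle/angular velocity
actions = env.action_space.n             # 2: push the cart left (0) or right (1)

print(states, actions)  # 4 2
```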
Now we can visualize what it looks like when we take random steps within our CartPole environment. Ideally, we'll see the cart moving around at random, because we're taking random steps to try to accumulate a score. Remember, for each step we take where the pole hasn't fully fallen over, we get one point, with a maximum of 200 points. So let's build our random loop.

All right, we've written a bit of code there, so let's break it down. The first thing we do is render our environment, which allows us to see our cart in action as it moves left and right. Then we take a random step, either left or right: zero or one represents one of those moves, and we're just taking a random choice to see how it impacts our environment. We then apply that action to our environment and get a bunch of things back as a result: our new state, our reward, whether or not we've completed the game (whether we've failed or passed), and some extra information. Based on our step we get a reward; remember, if we take a step and the pole hasn't fallen over, we get one point, which lets us accumulate our total reward. If we fail, or if we get to the end of the game, done is set to true, so we keep taking steps until the episode is complete. We reset the entire environment at the top of each episode, and at the end we print out the episode number along with our final score.
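Continuing from the cell above, that random-step loop looks roughly like this (ten episodes is an arbitrary choice):

```python
# Run a handful of episodes with a purely random policy as a baseline.
episodes = 10
for episode in range(1, episodes + 1):
    state = env.reset()   # start each episode from a fresh environment
    done = False
    score = 0
    while not done:
        env.render()                            # draw the cart and pole
        action = random.choice([0, 1])          # pick left or right at random
        n_state, reward, done, info = env.step(action)
        score += reward                         # +1 for every step survived
    print('Episode:{} Score:{}'.format(episode, score))
```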
Let's go ahead and run that and see our episodes live and in action... actually, it looks like we've got a bug there... episode... all right. So you can see our cart is moving, and it's moving randomly, with the pole sort of flailing about. What we're logging out is the score each time, and it looks like we keep failing before surpassing a certain threshold: we're only getting up to a maximum of about 38. Ideally we want to get all the way up to 200, and this is where reinforcement learning comes in: our deep learning model is going to learn the best action to take in each state of the environment in order to maximize our score.

This all starts with a deep learning model, so let's go ahead and create one. First we need to import some dependencies, and these are largely our TensorFlow Keras dependencies. We've imported NumPy, which allows us to work with NumPy arrays; the Sequential API, which allows us to build a sequential model with Keras; two different types of layers, Dense and Flatten; and last but not least the Adam optimizer, which is the optimizer we'll use to train our deep learning model.

Now we can go and build the model itself, and we're going to wrap it inside a function so we can reproduce the model whenever we need to. With our build_model function defined, what we've basically done is created a new function that takes two arguments: the states we extracted from our environment earlier, and the actions, the two different moves available in our CartPole environment. To build the deep learning model, we first instantiate a Sequential model, then pass through a Flatten layer whose input shape contains our states: remember, the four different states we had. We then add two Dense layers with a ReLU activation function to start building out the network, and the final Dense layer outputs our actions. This basically means we pass our states in at the top and get our actions out at the bottom, so we can train the model, based on the states coming through, to determine the best actions to maximize our reward. Let's go ahead and create an instance of that model using the build_model function; we can also visualize what it looks like using the model.summary method. You can see that we're passing through our four different states, we've got 24 dense nodes followed by another 24 dense nodes (these are the fully connected layers within our neural network), and last but not least we're outputting the two different actions that we want to take within our environment.
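A minimal sketch of that model-building step, using the same 24-unit layer sizes described above:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam

def build_model(states, actions):
    """States in at the top, one action value per action out at the bottom."""
    model = Sequential()
    # Keras-RL feeds a window of observations, hence the (1, states) shape.
    model.add(Flatten(input_shape=(1, states)))
    model.add(Dense(24, activation='relu'))
    model.add(Dense(24, activation='relu'))
    model.add(Dense(actions, activation='linear'))
    return model

model = build_model(states, actions)
model.summary()
```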
Now we can take this deep learning model and train it using Keras-RL. First up, we need to import our Keras-RL dependencies. We've imported three key things here. First is our deep Q-network (DQN) agent; there's actually a bunch of different agents within Keras-RL, including the DQN agent, the NAF agent, DDPG, SARSA and CEM, and all of these are different agents you can use to train your reinforcement learning model. We're going to be using DQN for this particular video, but try testing out some of the others and see how you go. We also have a specific policy: within reinforcement learning you've got different styles, value-based reinforcement learning and policy-based reinforcement learning. In this case we're going to be using policy-based reinforcement learning, and the specific policy we're using is the Boltzmann Q policy. The last thing we've imported is sequential memory: our DQN agent needs to maintain some memory, and the SequentialMemory class is what allows it to do that.

Now we can set up our agent, and again we're going to wrap this inside a function so we can reproduce it when we want to reload the model from disk. With our function defined, what we've basically done is named it build_agent and passed through our model (the deep learning model we specified above) and the different actions we can take (left or right, the two actions available within our environment). We then set up our policy, set up our memory, and set up our DQN agent; to that agent we pass our deep learning model, our memory, our policy and a number of other keyword arguments, and then we return it.

Let's now use this DQN agent to actually go and train our reinforcement learning model. First we instantiate our DQN agent, then we compile it, and then we go ahead and fit. And there you go: you can see that our DQN model is now starting to train. What we did is use our build_agent function to set up a new DQN agent, passing through our model as well as our actions. We then compiled it, passing through our optimizer (the Adam optimizer we imported right at the start) and the metric we want to track, in this case mean absolute error. Then we used the fit method to kick off training, passing through our entire environment, the number of steps we want to take, whether or not we want to visualize it (we'll take a look at that in a second), and verbose set to one, so we get a little bit of logging rather than full logging. Now we just let that train; it'll take a couple of minutes, and then we should have a fully built reinforcement learning model.

Five minutes later... sweet, that's our reinforcement learning model done, dusted and trained. All up it took about 256 seconds to train, and you can see in our fourth interval that we're accumulating a reward of about 200. Now we can print out and see what our total scores are. Remember, when we started out just taking random steps we were getting a maximum score of about 51, which is not all that great considering the total maximum score for the game is 200. So let's test this out and see how it's actually performing, using the dqn.test method. All right, that's looking better already: in virtually every single episode we're getting a score of about 200, and our mean is 200. What we did to test that out is access our DQN model and use the test method, passing through our actual environment, the number of games we want to run (they're called episodes here, and we ran 100), and whether or not we want to visualize it; then we printed out our mean result. If we want to actually visualize the difference, we can do that as well, and you can see our model is performing way better: it's able to balance the pole a whole lot better than before, when it was just randomly flailing about. We can test that out again, this time running 15 episodes rather than five, and you can see that our model is again performing way better than it was initially; it's able to right itself and keep that pole balanced and straight. Brings a tear to my eye. So good.
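A sketch of the agent setup, training and evaluation, continuing from the cells above; the memory limit, warm-up steps and target-model update rate are assumptions drawn from common Keras-RL examples:

```python
from rl.agents import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory

def build_agent(model, actions):
    # A Boltzmann Q policy plus a replay buffer for the DQN agent.
    policy = BoltzmannQPolicy()
    memory = SequentialMemory(limit=50000, window_length=1)
    dqn = DQNAgent(model=model, memory=memory, policy=policy,
                   nb_actions=actions, nb_steps_warmup=10,
                   target_model_update=1e-2)
    return dqn

dqn = build_agent(model, actions)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])   # track mean absolute error
dqn.fit(env, nb_steps=50000, visualize=False, verbose=1)

# Evaluate over 100 episodes and print the average score.
scores = dqn.test(env, nb_episodes=100, visualize=False)
print(np.mean(scores.history['episode_reward']))
```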
Sweet, that's all done. Now, what happens if we want to save this model away and use it later on, say if we wanted to deploy it into production? What we can do is save the weights from our DQN model and then reload them later on to test them out, using the save_weights method on our DQN agent. So let's go ahead and save our weights; then we'll blast away all of the stuff we just created and rebuild it by reloading those weights. With our weights now saved, if we take a look in our folder you can see that we've generated two different h5f files, which is how our reinforcement learning model weights get stored. To rebuild our agent, let's start by deleting our model, our environment and our DQN agent; if we then try to use the dqn.test method, there's nothing there, because we've deleted it. But we can rebuild that environment and test it out again. Perfect, we've now reinstantiated everything: first we built our environment and extracted our actions and states just like we did before, then we used our build_model and build_agent functions to rebuild our deep learning model and reinstantiate our DQN agent, and last but not least we compiled it. Now we can reload our weights into the model and test it out again. To do that we use the dqn.load_weights method; before, we used save_weights, and the file we pass to load_weights is the one we exported earlier, so we can copy that in and paste it here. With our weights reloaded, we can test out our environment again, and ideally we should get similar results. And again, you can see it's performing well, just as well as it did before we deleted everything and then reloaded the weights.
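The whole save-and-reload cycle, sketched end to end and continuing from the cells above; the weights filename here is an assumption:

```python
# Save the trained weights to disk (Keras-RL stores them in HDF5 format).
dqn.save_weights('dqn_weights.h5f', overwrite=True)

# Blast everything away to simulate a fresh session...
del model, env, dqn

# ...then rebuild the environment, model and agent from scratch.
env = gym.make('CartPole-v0')
states = env.observation_space.shape[0]
actions = env.action_space.n
model = build_model(states, actions)
dqn = build_agent(model, actions)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])

# Reload the saved weights and re-test: scores should match the trained agent.
dqn.load_weights('dqn_weights.h5f')
_ = dqn.test(env, nb_episodes=5, visualize=True)
```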
And that about wraps up this video! We covered a bunch of stuff: we installed our dependencies, created a random environment using OpenAI Gym where we got a maximum score of about 51, built a deep learning model using Keras, used Keras-RL to train it with policy-based reinforcement learning, and last but not least reloaded that agent from disk, which allows you to work with it inside a production environment if you want to go and deploy it. Thanks so much for tuning in, guys! Hopefully you found this video useful; if you did, be sure to give it a thumbs up, hit subscribe and tick that bell so you get notified when I release future videos. If you have any questions or need any help, be sure to drop a mention in the comments below and I'll get right back to you. All the course materials, including the GitHub repository as well as links to documentation, are available in the description below, so you can get a kick start and get up and running with your own reinforcement learning model. Thanks again for tuning in. Peace.

Info
Channel: Nicholas Renotte
Views: 50,877
Rating: 4.9309525 out of 5
Keywords: reinforcement learning, machine learning, deep learning
Id: cO5g5qLrLSo
Length: 20min 55sec (1255 seconds)
Published: Sat Aug 29 2020