Deep Reinforcement Learning: Neural Networks for Learning Control Laws

Video Statistics and Information

Captions
Welcome back! I'm Steve Brunton, and today I'm going to talk a bit more about reinforcement learning. In the last video I introduced the reinforcement learning architecture and how you can learn to interact with a complex environment from experience. Today we're going to talk about deep reinforcement learning, some of the really amazing advances in this field that have been enabled by deep neural networks and these advanced computational architectures. Again, I'm @eigensteve on Twitter; please do subscribe, like, and share this if you find it useful, and tell me other things you would like me to talk about.

Okay, in the last video we introduced this agent and environment. The agent measures the environment through the state s, and it takes some action to interact with the environment, given by a policy: it has a control policy for how it acts based on what state it's in, and it is optimizing this policy to maximize the future rewards it gets from the environment. Mathematically, this policy is probabilistic, so the agent has some probabilistic strategy for interacting with the environment, because the environment might be stochastic or have some randomness to it. And there is a value function that tells the agent how valuable being in a given state is, given the policy pi it is enacting.

Today we're going to augment this picture by introducing deep neural networks, for example to represent the policy. Here we've replaced our policy with a deep neural network, so this pi is parametrized by theta, where theta describes the neural network, and again it maps the current state to the best probabilistic action to take in that environment. The whole name of the game is to update this policy to maximize future rewards, and again we have this discount rate gamma that says rewards in the near future are worth more than rewards in the distant future. Remember, these rewards are going to be relatively sparse and infrequent most of the time, because we're in a semi-supervised learning framework where rewards are only occasional, and so it's difficult to figure out which actions actually gave rise to those rewards. Learning this best policy of what actions to take is going to be a pretty hard optimization problem.

Actually, the whole reinforcement learning paradigm is biologically inspired. There's this notion called Hebbian learning, which you may have heard of before, and the little rhyme goes, "neurons that fire together wire together." Basically what that means is that when you have neural activity, when things fire together, it will strengthen the wiring and the connections between those neurons in biological systems. In these deep reinforcement learning architectures, the idea is that the reward signal you get occasionally should somehow strengthen the connections that led to a good policy: when these neurons are connected in a way that produces the correct policy and you get a reward, you want to somehow reinforce that architecture. There are lots of ways of doing this, essentially through backpropagation and so on.

Another area where a lot of research is going into deep learning for reinforcement learning is Q-learning. I talked about this in the last video: this Q, or quality, function essentially combines the policy and the value, and it tells you jointly how good a current state s is, given a current action a, assuming that I do the best possible thing for all future states and actions. So if right now I find myself in state s taking action a, I can assign a quality based on the future value that I expect, given that state and the best possible policy I can cook up. There are deep Q-networks where you learn this quality function, and once you've learned it, when you find yourself in a state s you just look up the action a that gives you the highest quality for that state. This makes a lot of sense; it's a lot like how a person learns to play chess. You would simultaneously be building a policy (here's how I move in these situations, these are the trades I'm willing to make) and a value function of how you value different board positions, how you gauge your strength and the strength of your position based on the state. So it makes sense that this would be an area for deep neural networks, because these functions might be very complex functions of s and a, and representing very complex functions is exactly what neural networks are good at, if you have enough training data.

These still suffer from all the same challenges of regular reinforcement learning, like the credit assignment problem: the fact that I might only get a reward at the very end of my chess game makes it very hard to tell which actions actually gave rise to that reward. So you're going to do some of the same things you would normally do; you might use reward shaping to give intermediate rewards based on expert intuition or guidance, and there are lots of other strategies, like hindsight experience replay. The basic idea is that we take the same reinforcement learning architecture and replace either the policy or the Q function with a policy network or a Q network.

All of this exploded onto the scene because of the 2015 Nature paper "Human-level control through deep reinforcement learning," where these authors from DeepMind essentially showed that they could build a reinforcement learner that could beat human-level performance in lots of classic Atari video games. So I'm going to hit play. I love this one; this is one of the first ones that got me really excited about this. This reinforcement learner is essentially trying to maximize its score by breaking all of these blocks, and after a few hours of training it has an epiphany that only really excellent human players ever reach: it finds an exploit in the game. It realizes that if it tunnels through one side (so it's going to tunnel through here), it can use the physics of the game to break all of these blocks for it. And that's pretty amazing. In a short amount of time it learns a really advanced strategy that only a small percentage of humans would eventually discover. This is a beautiful paper, by the way; you should go read it. They talk about how they actually build their networks: they use the pixels of the screen itself as the input, and they use convolutional layers and fully connected layers, eventually deciding what the joystick should do, what actions to take. A lot like other famous examples of neural networks, this was the paper that really brought reinforcement learning and deep reinforcement learning back to the forefront of everyone's mind, because it showed performance that hadn't been attainable before. This is a lot like the ImageNet moment of reinforcement learning; it brought it back into the forefront.
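To make the Q-learning idea above concrete, here is a minimal tabular sketch in Python. This is my own toy example (a 5-state corridor where only the last state gives a reward), not DeepMind's Atari setup; a deep Q-network replaces the lookup table below with a neural network, but the update rule is the same idea.

```python
import random

# Toy tabular Q-learning sketch (illustrative example, not the paper's setup):
# a 5-state corridor; the agent starts at state 0, and only reaching the
# last state yields a reward.
N_STATES = 5
ACTIONS = [0, 1]                        # 0 = step left, 1 = step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1   # learning rate, discount, exploration

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    """Deterministic dynamics: reward 1.0 only on reaching the last state."""
    s2 = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0), s2 == N_STATES - 1

random.seed(0)
for episode in range(200):
    s, done = 0, False
    while not done:
        if random.random() < EPSILON:   # explore occasionally
            a = random.choice(ACTIONS)
        else:                           # otherwise act greedily (random tie-break)
            a = max(ACTIONS, key=lambda act: (Q[(s, act)], random.random()))
        s2, r, done = step(s, a)
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        target = r + GAMMA * max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

# Once Q is learned, acting is just a lookup: pick the highest-quality action.
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)}
print("greedy policy:", policy)
```

After training, the greedy policy should choose "right" in every non-terminal state, which is exactly the "look up the best a for your current s" step described above.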
Since then, Google bought this company for around half a billion dollars, because it promised a big step toward general artificial intelligence: an artificial intelligence system that could get good at lots of things, rather than just one very specific task. And since then, billions of dollars have been invested by companies into reinforcement learning in general and deep reinforcement learning in particular. In this original paper, these are all of the Atari games they tried out, and this is the line: above it, they're at or above human-level performance, and there are only a few games where they're below human-level performance. I actually think it's pretty interesting to look through these and figure out why this DeepMind system was unable to figure out these games but was able to figure out those; that's an interesting exercise. I will point out that you could train the algorithm in this paper on one of these games and it would get really, really good, but that same learned algorithm, that reinforcement learner, cannot then be used to play another game without completely retraining it. So it's still a ways away from where we want to be. We want a learning system that can learn to play all of these games, and if it learns one, it can learn the others faster and better, just like a human does. We're still a ways away from that, but tons of people are working on it. One of the big problems in the field is transfer learning and general artificial intelligence using reinforcement learning: building a learner that can learn lots of things and learn faster from its experience. That's what humans do. You take a kid and teach them tic-tac-toe; tic-tac-toe is easy, so they learn the rules and learn how to not lose. Then you give them checkers. Checkers is a little more sophisticated, but they remember everything they learned from tic-tac-toe, and they learn checkers faster; again, they learn how to not lose. Then you give them chess. Chess is a truly open-ended game; for most humans it takes a lifetime to master. Based on what they know from tic-tac-toe and checkers, a child will learn chess; they can transfer some of that over. Then they go into the real world and learn lots of other skills, and those problem-solving skills transfer over. That's what we eventually want with computers. We're not there yet, but this was a big step, and it got everybody really excited, and it still does.

This is another video I love, from Tech Insider, showing how a lot of these reinforcement learning algorithms are used: this is Google DeepMind training the agent how to walk or run or leap or swim in an artificial environment. And that's pretty amazing; this is a very complex, hard thing to do. We take it for granted, because our bodies are built to move in very efficient, agile, and accurate ways, but this is actually very challenging. So the fact that these algorithms can learn how to run and walk and swim in a virtual environment is really promising for robotic technology. We eventually want to do this in robotic systems and make our robotic agents more independent and more agile, and that's actually really hard; the step from the artificial world to the real world is challenging. For example, to my knowledge, Boston Dynamics does not use a ton of reinforcement learning; they use a lot of physics-based modeling and hand-designed controls. Maybe they're getting into reinforcement learning, but there's a long way to go to do this in the real world; it's still quite impressive. And this is a video I love: training bipedal walkers, generation after generation of reinforcement learners, and eventually the agent learns the physics and the right control policy, in this case to walk forward stably, keeping its neck straight and its legs straight. I think this is just a really cool video and a nice demonstration.

I'll point out that there are some other great resources on YouTube you should check out for reinforcement learning. I learned about this on Two Minute Papers, which is a great channel; I love learning about things on Two Minute Papers. Arxiv Insights also has a great series on reinforcement learning, and Brian Douglas has a nice reinforcement-learning-for-control video; I believe it's on the MathWorks channel.

Okay, good. So, really impressive performance just in the last ten years, because of huge advances in the representational power of neural networks. These can represent functions that we previously couldn't, because they're extremely expressive; we have lots more training data; we can train them because our computers have gotten so much faster and more powerful; and there's also open-source software that makes it really easy to get started building these neural network representations. If you have any interest at all in modern reinforcement learning, you have to check out OpenAI Gym. This is a wonderful open-source development framework where you can try out your new reinforcement learning algorithm on all kinds of different systems: Atari games, simulated running and pendulum tasks, really cool physical systems that are nonlinear and hard to control. You can get started really quickly and easily in the OpenAI Gym, and this is one of the big reasons things are taking off so fast: there are these amazing open resources for people to try things out quickly and rapidly prototype. Again, you can try out all of these Atari games and see if you can build a reinforcement learner that can learn multiple Atari games with the same architecture; that would be pretty incredible.

I also want to point out that these things are getting pretty impressive. Almost everybody has heard of AlphaGo, and how AlphaGo beat the best human Go player in the world, Lee Sedol from South Korea. The best Go player in the world was defeated by AlphaGo, a deep reinforcement learning algorithm developed at Google DeepMind with the sole purpose of learning how to play Go. I want to point out that reinforcement learning is really good at learning the rules of the game and how to win when the game is very constrained and it has all the time in the world to try millions or billions of different Go games; it is essentially playing against itself, getting better and better. Lee Sedol is a human, and he has a life; although he spends much of his time in the Go world, he has a much richer, broader world. He can go home, go for a walk, and enjoy a sunset. And I do want to point out that he can learn from AlphaGo and come back better the next time. I think that's also quite impressive: the human masters learned from DeepMind and actually got better once they played up to their competition. I think that's really cool. Anyway, this AlphaGo learning system is really interesting and deserves a whole set of videos just to dive in; I think there's a documentary on it, which is really interesting, so you should check it out. The original AlphaGo algorithm was based on a convolutional neural network, a CNN, and it had lots of reward shaping from humans: expert humans guided the reward structure for this AlphaGo learner, so it didn't have to wait until the end of the game to figure out whether it won or lost, because that would take forever. Instead, humans used reward shaping to give it intermediate rewards, to help it learn faster with a denser reward structure. Now, that's tricky, because if a human guides the learning, that almost caps how good it can possibly be, since it's fundamentally relying on human knowledge. So the next generation, AlphaGo Zero, which came a couple of years later, was even better and much more impressive. It didn't use any human features, no reward shaping; it learned only through self-play. It just played itself until it became so powerful that it could beat everyone in the world, including the original AlphaGo. And it was based on a residual network architecture, with skip connections that make it easier to train with backpropagation. Really cool advances; that was actually one of the major results showing that ResNet was definitely a major contender in the neural network architecture scene.

Anyway, I think this is really cool and really interesting, but again, that is all this algorithm does in the world; it doesn't know how to take what it learned from Go and do anything else better. And that's what we need, that's what we want in these systems: to get really good at a game and then use that knowledge to get really good at something else. That's what we do, and that's really fun; it's one of our strengths. Some other examples: I love this video from Stanford and ETH Zurich, where they are essentially using reinforcement learning to train autonomous aerial vehicles, a helicopter and a quadrotor, to perform very aggressive, very high-performance maneuvers. So it is starting to happen that we're going from these simulated environments to the real world, to real robotic systems, but it's very challenging; it generally takes a lot of training and human guidance, and there are still limited examples of real robotic reinforcement learning.

Actually, someone asked me in the class I teach at the University of Washington for some real examples of reinforcement learning in industry, because a lot of times you only hear these game examples, Atari or AlphaGo. I think one of the original ones was in elevator scheduling. It turns out, and I would never have guessed this, but it's really interesting: in a really big building, a super skyscraper with tons of elevators and lots of floors, scheduling these elevators efficiently, so that you don't get jammed up and people can get where they're going as fast as possible, is a huge problem. It's a combinatorially hard problem, and reinforcement learning was one of the early algorithms used to figure out a near-optimal scheduling policy. Kind of interesting. Okay, good; so robotic learning is getting pretty good.

I just want to take a couple of steps back and summarize. We've talked about reinforcement learning in general, both in the last lecture and in this one, as a framework for learning how to interact with a complex environment based on your experiences. This is fundamentally biologically inspired; it's trying to mimic how we learn, how animals learn, how you train a dog, and things like that. In the last ten years, because of advances in deep neural networks and major advances in how we actually build and optimize these reinforcement learners, there have been big steps toward more powerful, more general learning frameworks that can learn to interact with more complex environments, like beating humans at Go or moving real robotic systems in really incredible ways. But there's still a long way to go. I've said this a lot, and I'm going to keep stressing it: we humans have bodies, and we are curious; we go explore, we touch, and we learn. Children are curious and are constantly soaking up knowledge, and when they learn one thing, they can use it, through this incredible human power of abstraction, in a totally different scenario. That is still a largely open problem in reinforcement learning; maybe not completely open, but it is a pressing and central challenge in modern reinforcement learning: how to take what you learn and generalize, how to take a step back and use your expertise in one problem, in one environment, to solve another problem in another environment. That would be real general artificial intelligence, and we're a long way from that. The good news is that it's not going to be solved in five or ten years; that's a hundred-year problem. But it's really exciting, because it means there's important, interesting work to be done in the field of reinforcement learning, lifetimes of research, in figuring out how to improve these systems, how to learn faster and better, both from what you're learning and the reward structure of one problem, and from what you've learned in other problems. Okay, thank you so much for watching.
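As a tiny companion to the policy-network idea from the start of the lecture (a parametrized policy pi_theta updated to climb toward higher reward), here is a minimal policy-gradient (REINFORCE-style) sketch in Python. The two-armed bandit problem is my own made-up toy, not from the lecture, and the "network" is reduced to two logits, but the update is the standard score-function gradient.

```python
import math
import random

# Minimal policy-gradient sketch on a two-armed bandit (toy example):
# arm 1 pays off more often, and the softmax "policy" should learn to prefer it.
random.seed(0)
theta = [0.0, 0.0]   # the "policy network" reduced to two logits
ALPHA = 0.1          # learning rate

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def pull(arm):
    """Stochastic reward: arm 0 pays 1.0 w.p. 0.2, arm 1 w.p. 0.8."""
    p = 0.2 if arm == 0 else 0.8
    return 1.0 if random.random() < p else 0.0

for _ in range(2000):
    probs = softmax(theta)
    arm = 0 if random.random() < probs[0] else 1   # sample a ~ pi_theta(a)
    r = pull(arm)
    # REINFORCE: grad of log pi(arm) w.r.t. logit i is (1[i==arm] - pi(i));
    # nudge theta in that direction, scaled by the reward received.
    for i in range(2):
        grad = (1.0 if i == arm else 0.0) - probs[i]
        theta[i] += ALPHA * r * grad

print("learned action probabilities:", softmax(theta))
```

After 2000 pulls, almost all of the probability mass should sit on arm 1: the reward signal has "reinforced" the connections (here, the logit) that led to good outcomes, which is the Hebbian-flavored intuition from earlier in the video in its simplest form.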
Info
Channel: Steve Brunton
Views: 41,867
Rating: 4.9699249 out of 5
Keywords:
Id: IUiKAD6cuTA
Length: 21min 15sec (1275 seconds)
Published: Fri Feb 19 2021