DeepMind’s AI Watches YouTube and Learns To Play! ▶️🤖

Video Statistics and Information

Reddit Comments

I want to see an AI rise up as a Super Smash Bros god.

👍 18 · u/lasercat_pow · Mar 27 2021

But does it leave racist and derogatory comments?

👍 10 · u/nrkey4ever · Mar 27 2021

This was in 2018 too. I'm sure it's much better now.

👍 8 · u/governedbycitizens · Mar 27 2021

Two Minute Papers is a great channel.

👍 6 · u/renol5 · Mar 27 2021

Is there anything to be said for focusing on cooperative games?

An AI trained on StarCraft is cool, but doesn't fill me with confidence for the future of humanity.

👍 11 · u/BitWeary · Mar 27 2021

I think it can be used to test software (or games) now. I want to know what humans will use it to build and earn money with.

👍 1 · u/nillouise · Mar 28 2021
Captions
Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Between 2013 and 2015, DeepMind worked on an incredible learning algorithm by the name of deep reinforcement learning. This technique looked at the pixels of the game, was given a controller, and played much like a human would… with the exception that it learned to play some Atari games at a superhuman level. I tried to train it a few years ago and would like to invite you on a marvelous journey to see what happened. When it starts learning to play an old game, Atari Breakout, at first the algorithm loses all of its lives without any sign of intelligent action.

If we wait a bit, it becomes better at playing the game, roughly matching the skill level of an adept player. But here's the catch: if we wait longer, we get something absolutely spectacular. Over time, it learns to play like a pro, and finds out that the best way to win the game is to dig a tunnel through the bricks and hit them from behind.

This technique is a combination of a neural network that processes the visual data we see on the screen, and a reinforcement learner that comes up with the gameplay-related decisions. This is an amazing algorithm, a true breakthrough in AI research.

A key point in this work was that the problem formulation enabled us to measure our progress easily: we hit one brick, we get some points, so do a lot of that. Lose a few lives and the game ends, so don't do that! Easy enough.
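To make that recipe concrete, here is a minimal PyTorch sketch of the pixels-in, controller-out, score-as-reward setup described above. The convolutional layout follows the published DQN architecture, but the names (`QNetwork`, `td_update`) and the replay-buffer format are illustrative assumptions, not DeepMind's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    """Maps a stack of four 84x84 grayscale frames to one Q-value per action
    (the convolutional layout of the published DQN architecture)."""
    def __init__(self, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.net(frames / 255.0)  # raw pixels in, Q-values out

def td_update(q_net, target_net, optimizer, batch, gamma=0.99):
    """One temporal-difference step: nudge Q(s, a) toward
    r + gamma * max_a' Q_target(s', a'). The reward r is simply
    the change in game score, which is why a visible score matters."""
    s, a, r, s_next, done = batch
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * target_net(s_next).max(dim=1).values * (1.0 - done)
    loss = F.smooth_l1_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy tensors standing in for a replay-buffer sample:
q_net, target_net = QNetwork(4), QNetwork(4)
opt = torch.optim.Adam(q_net.parameters(), lr=1e-4)
batch = (
    torch.rand(32, 4, 84, 84),    # states: stacks of recent frames
    torch.randint(0, 4, (32,)),   # actions taken (controller inputs)
    torch.rand(32),               # rewards: score change per step
    torch.rand(32, 4, 84, 84),    # next states
    torch.zeros(32),              # episode-over flags
)
td_update(q_net, target_net, opt, batch)
```

Note that the reward here is nothing more than the change in the game score, which is exactly the signal that vanishes in the games discussed next.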
But there are other, exploration-based games, like Montezuma's Revenge or Pitfall, that it was not good at. And man, these games are a nightmare for any AI, because there is no score, or at the very least, it is hard to define how well we are doing. Because there are no scores, it is hard to motivate the AI to do anything at all other than wander around aimlessly. If no one tells us whether we are doing well or not, which way do we go? Explore this place, or go to the next one? How do we solve all this? And with that, let's discuss the state of play in AIs playing difficult exploration-based computer games. I think you will love to see how far we have come since then.

First, there is a previous line of work that infused these agents with a very human-like property: curiosity. That agent was able to do much, much better at these games… and then got addicted to the TV. But that's a different story. Note that this TV problem has been remedied since.

And this new method attempts to solve hard exploration games by watching YouTube videos of humans playing the game, and learning from that. As you see, it just rips through these levels in Montezuma's Revenge and other games too. So, I wonder, how does all this magic happen? How did this agent learn to explore?

Well, it has three things going for it that really make this work.

One: the Skeptical Scholar would say that all it takes is just copy-pasting what it saw from the human player! Besides, imitation learning is not new, a point we will address in a moment, so why bother with this? Now, hold on to your papers, and observe as it seems noticeably less efficient than the human teacher was. Until we realize that this is not the human player, and this is not the AI… but the other way around! Look, it was so observant and took away so much from the human demonstrations that, in the end, it became even more efficient than its human teacher. Whoa! Absolutely amazing.

And while we are here, I would like to dissect this copy-paste argument. You see, it has an understanding of the game and does not just copy the human demonstrator. But even if it just copied what it saw, that would not be so easy, because the AI only sees images, and it has to work out how those images change in response to us pressing buttons on the controller. We might also encounter the same level, but at a different time, and we have to understand both how to vanquish an opponent and how to perform that.

Two: nobody hooked the agent up to the game's internal information, which is huge. This means that it does not know what buttons are pressed on the controller, no internal numbers or game state are given to it, and, most importantly, it is also not given the score of the game. We discussed how difficult this makes everything. Unfortunately, this means that there is no easy way out: it really has to understand what it sees and mine the relevant information out of each of these videos. And as you see, it does that with flying colors. Loving it.

And three: it can handle the domain gap, which previous imitation learning methods did not deal with too well. So what does that mean? Let's look at this latent space together and find out. This is what a latent space looks like if we just embed the pixels that we see in the videos. Don't worry, I'll tell you in a moment what that is. Here, the clusters are nicely clumped up away from each other, so that's probably good, right? Well, in this problem, not so much! A latent space is a space where similar kinds of data are meant to end up close to each other. These are snippets of the demonstration videos that the clusters relate to. Let's test that together. Do you think these images are similar? Yes? Most of us humans would say that these are quite similar; in fact, they are nearly the same. So, is this a good latent space embedding? No, not in the slightest. This data is similar, therefore these points should be close to each other, but the previous technique did not recognize that, because the images have slightly different colors and aspect ratios, and this one has a text overlay. Yet we all understand that, despite all that, we are looking at the same game through different windows. So, does the new technique recognize that? Oh yes, beautiful. Praise the papers! Similar game states are now close to each other, we can align them properly, and therefore we can learn from them more easily. This is one of the reasons why it can play so well.

So there you go: these new AI agents can look at how we perform complex exploration games, and learn so well from us that, in the end, they do even better than we do. And now, to get them to write some amazing papers for us… or, you know, Two Minute Papers episodes. What a time to be alive!

Thanks for watching and for your generous support, and I'll see you next time!
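For the technically curious: the paper behind this episode, "Playing Hard Exploration Games by Watching YouTube" (Aytar et al., 2018), learns its domain-gap-robust embedding with a self-supervised temporal distance classification task, in which the network must predict how many steps separate two frames of the same playthrough. Below is a minimal PyTorch sketch of that idea; the class names, the bucket boundaries, and the 84x84 input size are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical temporal-distance buckets (e.g. 0, 1, 2, 3-4, 5-20, 21+ frames apart).
DISTANCE_BUCKETS = 6

class FrameEmbedder(nn.Module):
    """Embeds a single RGB frame into a compact latent vector."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc = nn.Linear(64 * 7 * 7, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.fc(self.conv(x)), dim=-1)

class TemporalDistanceHead(nn.Module):
    """Predicts, from two embeddings, how far apart in time the frames are."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.classifier = nn.Linear(2 * dim, DISTANCE_BUCKETS)

    def forward(self, z_a: torch.Tensor, z_b: torch.Tensor) -> torch.Tensor:
        return self.classifier(torch.cat([z_a, z_b], dim=-1))

def tdc_loss(embedder, head, frame_a, frame_b, bucket_labels):
    """Self-supervised objective: to predict temporal distance, the embedding
    must capture game state rather than colors, aspect ratios, or overlays."""
    logits = head(embedder(frame_a), embedder(frame_b))
    return F.cross_entropy(logits, bucket_labels)

# Dummy frame pairs from one playthrough, with their distance-bucket labels:
embedder, head = FrameEmbedder(), TemporalDistanceHead()
frame_a = torch.rand(8, 3, 84, 84)
frame_b = torch.rand(8, 3, 84, 84)
labels = torch.randint(0, DISTANCE_BUCKETS, (8,))
tdc_loss(embedder, head, frame_a, frame_b, labels).backward()
```

Because color shifts, aspect ratios, and text overlays carry no information about temporal distance, the embedding is pushed to encode game state instead, which is what lets videos from different YouTube channels line up in the same latent space. Roughly speaking, the agent is then rewarded for reaching checkpoints along a demonstration's embedded trajectory, which stands in for the missing game score.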
Info
Channel: Two Minute Papers
Views: 197,912
Keywords: two minute papers, deep learning, ai, technology, science, machine learning, deepmind, learning from youtube, gamedev, game ai learns from youtube, deep reinforcement learning, deepmind ai, deepmind atari
Id: jjfDO2pWpys
Length: 8min 17sec (497 seconds)
Published: Sat Mar 27 2021