Dear Fellow Scholars, this is Two Minute
Papers with Dr. Károly Zsolnai-Fehér. Between 2013 and 2015, DeepMind worked on
an incredible learning algorithm by the name of Deep Reinforcement Learning. This technique looked
at the pixels of the game, was given a controller, and played much like a human would… with the
exception that it learned to play some Atari games at a superhuman level. I tried to train it
a few years ago and would like to invite you on a marvelous journey to see what happened. When
it starts learning to play an old game, Atari Breakout, at first the algorithm loses all of
its lives without any signs of intelligent action. If we wait a bit, it becomes better at playing
the game, roughly matching the skill level of an adept player. But here's the catch: if we wait
longer, we get something absolutely spectacular. Over time, it learns to play like a pro, and
finds out that the best way to win the game is digging a tunnel through the
bricks and hitting them from behind. This technique is a combination of a neural network
that processes the visual data that we see on the screen, and a reinforcement learner that
comes up with the gameplay-related decisions. This is an amazing algorithm, a
true breakthrough in AI research. A key point in this work was that the
problem formulation here enabled us to measure our progress easily: we
hit one brick, we get some points, so do a lot of that. Lose a few lives, the game
ends, don't do that! Easy enough.
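To make that loop a bit more concrete, here is a minimal sketch of the idea in PyTorch. Treat it as an illustration rather than DeepMind's exact system; the layer sizes follow the classic DQN recipe, and the game interface is assumed. The network maps a stack of raw game frames to one value per controller action, and the learning target is built from exactly the score signal we just described.

```python
import random
import torch
import torch.nn as nn

# A small convolutional network that maps raw pixels to one Q-value
# per controller action (classic DQN-style sizes, shown for illustration).
class QNetwork(nn.Module):
    def __init__(self, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),  # 4 stacked 84x84 frames
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, num_actions),
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.net(frames)

def select_action(q_net, frames, epsilon: float, num_actions: int) -> int:
    # Epsilon-greedy: usually take the best-looking action, sometimes explore.
    if random.random() < epsilon:
        return random.randrange(num_actions)
    with torch.no_grad():
        return int(q_net(frames.unsqueeze(0)).argmax(dim=1).item())

def q_target(reward, next_frames, done, q_net, gamma: float = 0.99):
    # "Hit one brick, get some points" enters as the reward term;
    # "lose a few lives, the game ends" enters as done = 1.
    with torch.no_grad():
        best_next = q_net(next_frames).max(dim=1).values
    return reward + gamma * (1.0 - done) * best_next
```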
But there are other, exploration-based games, like Montezuma's Revenge or Pitfall, that it
was not good at. And man, these games are a nightmare for any AI,
because there is no score, or at the very least, it's hard to define how well we are doing. Because
there are no scores, it is hard to motivate the AI to do anything at all other than just wander
around aimlessly. If no one tells us whether we are doing well or not, which way do we go? Explore
this place or go to the next one? How do we solve all this? And with that, let's discuss
the state of play in AIs playing difficult exploration-based computer games. And I think
you will love to see how far we have come since. First, there is a previous line of work that
infused these agents with a very human-like property… curiosity. Such an agent was able to do
much, much better at these games… and then got addicted to the TV. But that's a different story.
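As a quick aside on what "curiosity" means here: one common formulation, for instance Pathak and colleagues' Intrinsic Curiosity Module, pays the agent for reaching states it cannot yet predict. The sketch below is that general idea under my own simplifications, not necessarily the exact method from that earlier episode. It also hints at why a TV is so deadly: random noise stays surprising forever, so the prediction error never drops.

```python
import torch
import torch.nn as nn

# Curiosity as prediction error: a learned forward model guesses the
# features of the next state; where it guesses badly, the state is
# novel, and the agent receives an intrinsic reward for going there.
class ForwardModel(nn.Module):
    def __init__(self, feature_dim: int, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim + num_actions, 256), nn.ReLU(),
            nn.Linear(256, feature_dim),
        )

    def forward(self, features, action_onehot):
        return self.net(torch.cat([features, action_onehot], dim=-1))

def curiosity_reward(model, features, action_onehot, next_features):
    # No game score needed: the reward is the model's own surprise.
    predicted = model(features, action_onehot)
    return (predicted - next_features).pow(2).mean(dim=-1)
```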
Note that this TV problem has been remedied since. And this new method attempts
to solve hard exploration games by watching Youtube videos
of humans playing the game, and learning from that, as you see, it just rips
through these levels in Montezuma’s revenge and other games too. So, I wonder how does all this
magic happen? How did this agent learn to explore? Well, it has three things going
for it that really make this work. One, the Skeptical Scholar would say that all it
takes is just copy-pasting what it saw from the human player! Also, imitation learning is not new,
which is a point that we will address in a moment, so why bother with this? Now, hold on
to your papers, and observe as it seems noticeably less efficient than the human
teacher was. Until we realize that this is not the human player, and this is
not the AI…but the other way around! Look, it was so observant and took away so much
from the human demonstrations that in the end, it became even more efficient than its
human teacher. Whoa! Absolutely amazing. And while we are here, I would like
to dissect this copy-paste argument. You see, it has an understanding of the game,
and does not just copy the human demonstrator. But even just copying what it saw would
not be easy, because the AI only sees images, and it has to work out how the images change in
response to us pressing buttons on the controller. We might also encounter the same
level, but at a different time, and we have to understand how to vanquish
an opponent and how to execute that.
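One standard way to connect pixel changes to button presses is an inverse dynamics model: given two consecutive frames, predict the action taken between them. The sketch below is my own generic illustration of that idea, not necessarily the mechanism of this particular paper; `encoder` stands in for any network that turns a frame into a feature vector.

```python
import torch
import torch.nn as nn

# Inverse dynamics sketch: from a pair of consecutive frames, predict
# which controller action was pressed between them. Trained with
# cross-entropy on the agent's own (frame, action, next frame) triples,
# it learns how button presses show up as pixel changes.
class InverseDynamics(nn.Module):
    def __init__(self, encoder: nn.Module, feature_dim: int, num_actions: int):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Sequential(
            nn.Linear(2 * feature_dim, 256), nn.ReLU(),
            nn.Linear(256, num_actions),  # logits over controller actions
        )

    def forward(self, frame_t, frame_t1):
        z_t = self.encoder(frame_t)
        z_t1 = self.encoder(frame_t1)
        return self.head(torch.cat([z_t, z_t1], dim=-1))
```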
Two, nobody hooked the agent up to the game's internal information, which is huge. This means
that it doesn't know what buttons are pressed on the controller, no internal numbers or game
state are given to it, and most importantly, it is also not given the score of the game. We
discussed how difficult this makes everything. Unfortunately, this means that there is no easy
way out: it really has to understand what it sees and mine out the relevant information from each
of these videos. And as you see, it does so with flying colors. Loving it. And three, it can
handle the domain gap. Previous
imitation learning methods did not deal with that too well. So what does that mean? Let’s look at
this latent space together and find out. This is what a latent space looks like if we just
embed the pixels that we see in the videos. Don’t worry, I’ll tell you in a moment what that
is. Here, the clusters are nicely clumped up away from each other, so that’s probably good,
right? Well, in this problem, not so much! A latent space is a place where
similar kinds of data are meant to end up close to each other.
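To make "close to each other" concrete: with any encoder that maps a frame to a vector (a hypothetical `encoder` below), closeness is simply a distance between those vectors.

```python
import torch
import torch.nn.functional as F

def latent_distance(encoder, frame_a, frame_b) -> float:
    # Similar game states should give a distance near 0,
    # unrelated ones a distance near 1 (cosine distance).
    z_a = encoder(frame_a.unsqueeze(0))
    z_b = encoder(frame_b.unsqueeze(0))
    return 1.0 - F.cosine_similarity(z_a, z_b).item()
```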
These are snippets of the demonstration videos that the clusters relate to. Let's test that
together. Do you think these images are similar? Yes? Most of us humans would say that these are
quite similar; in fact, they are nearly the same. So, is this a good latent space embedding? No,
not in the slightest. This data is similar, therefore these points should be close to each other,
but the previous technique did not recognize that, because these images have slightly different
colors and aspect ratios, and this one has a text overlay. Yet we all understand that despite
all that, we are looking at the same game through different windows.
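How could an embedding learn to shrug off color shifts, rescaling, and overlays? One common recipe, shown below as a sketch of the general idea rather than this paper's exact objective (the paper leans on temporal and cross-modal cues), is contrastive training: two differently distorted copies of the same frame must embed close together, while different frames are pushed apart. The `augment` function is a hypothetical stand-in for such distortions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def contrastive_loss(encoder: nn.Module, frames: torch.Tensor,
                     augment, temperature: float = 0.1) -> torch.Tensor:
    # Embed two independently distorted views of every frame.
    z1 = F.normalize(encoder(augment(frames)), dim=1)
    z2 = F.normalize(encoder(augment(frames)), dim=1)
    # Similarity of each view-1 frame to every view-2 frame; the
    # matching pair sits on the diagonal and should score highest.
    logits = z1 @ z2.t() / temperature
    labels = torch.arange(len(frames))
    return F.cross_entropy(logits, labels)
```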
So, does the new technique recognize that? Oh yes, beautiful. Praise the papers! Similar game
states are now close to each other, we can align them properly, and therefore we can learn more
easily from them. This is one of the reasons why it can play so well.
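And here is one way such an aligned embedding can turn a silent video into a teacher, sketched under my own assumptions (the real paper's checkpointing and reward schedule differ in their details): drop checkpoints along the demonstration, and pay the agent a bonus whenever its current frame embeds close to the next one. That bonus stands in for the missing game score.

```python
import torch
import torch.nn.functional as F

def demo_checkpoints(encoder, demo_frames, every_n: int = 16):
    # Embed every Nth frame of the demonstration video as a checkpoint.
    with torch.no_grad():
        return [encoder(f.unsqueeze(0)) for f in demo_frames[::every_n]]

class ImitationReward:
    """Pay a bonus each time the agent's frame lands close enough,
    in latent space, to the next demonstration checkpoint."""

    def __init__(self, checkpoints, min_similarity: float = 0.5):
        self.checkpoints = checkpoints
        self.next_idx = 0
        self.min_similarity = min_similarity

    def __call__(self, encoder, agent_frame) -> float:
        if self.next_idx >= len(self.checkpoints):
            return 0.0  # demonstration fully traversed
        with torch.no_grad():
            z = encoder(agent_frame.unsqueeze(0))
        sim = F.cosine_similarity(z, self.checkpoints[self.next_idx]).item()
        if sim >= self.min_similarity:
            self.next_idx += 1
            return 1.0  # imitation bonus: no game score required
        return 0.0
```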
So there you go, these new AI agents can look at how we perform complex exploration games, and
learn so well from us that in the end, they do even better than we do. And now, to get them to
write some amazing papers for us… or, you know, Two Minute Papers episodes. What a time to be
alive! Thanks for watching and for your generous support, and I'll see you next time!