Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér. The paper that we are going to cover today is, in my view, one of the more important things to have happened in AI research lately. In the last few years, we have seen DeepMind's AI defeat the best Go players in the world, and after OpenAI's venture in the game of Dota 2, DeepMind embarked on a journey to defeat pro players in StarCraft 2, a real-time strategy
game. This is a game that requires a great deal of mechanical skill and split-second decision making, and it is played under imperfect information, as we only see what our units can see. A nightmare situation for any AI. The previous version of AlphaStar we covered
in this series was able to beat at least mid-grandmaster level players, which is truly remarkable,
but, as with every project of this complexity, there were limitations and caveats. When our earlier video came out, the paper was still pending; now it has finally appeared, so my sleepless nights have officially ended, at least for this work, and we can look into some more results. One of the limitations of the earlier version
was that DeepMind needed to further tune some of the parameters and rules to make sure that
the AI and the players play on an even footing. For instance, the camera movement and the number of actions the AI can make per minute have been limited further and are now more human-like. TLO, a professional StarCraft 2 player, noted that this time around, it indeed felt very much like playing another human player.
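If you are curious what such a cap could look like in code, here is a minimal, purely illustrative sketch in Python; the `ApmLimitedAgent` wrapper, its `act` interface, and the `NO_OP` action are hypothetical names of mine, and the real AlphaStar enforces its limits differently, through constraints agreed upon with professional players.

```python
from collections import deque

NO_OP = "no_op"  # hypothetical "do nothing" action

class ApmLimitedAgent:
    """Illustrative wrapper that caps an agent's actions per minute.

    Only a sketch with invented names: the real AlphaStar enforces
    statistical APM limits agreed upon with professional players,
    and restricts its camera view as well.
    """

    def __init__(self, agent, max_apm=300, window_seconds=60.0):
        self.agent = agent
        self.max_apm = max_apm
        self.window = window_seconds
        self.action_times = deque()  # timestamps of recent actions

    def act(self, observation, now):
        # Forget actions that fell out of the sliding one-minute window.
        while self.action_times and now - self.action_times[0] > self.window:
            self.action_times.popleft()
        # Budget exhausted: force a no-op on this tick.
        if len(self.action_times) >= self.max_apm:
            return NO_OP
        action = self.agent.act(observation)
        if action != NO_OP:
            self.action_times.append(now)
        return action
```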
The second limitation was that the AI was only able to play Protoss, which is one of the three races available in the game. This new version can now play all three races,
and here you see its MMR ratings, a number that describes the skill level of the AI, and, for non-experts, win percentages for each individual race.
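For a rough intuition on what these ratings mean: the exact MMR formula used by StarCraft 2 is not public, but ratings of this kind behave much like the classic Elo model, where a fixed rating gap translates into an expected win probability. A minimal sketch, assuming nothing beyond standard Elo:

```python
def elo_win_probability(rating_a, rating_b):
    """Expected win probability of player A under the classic Elo model.

    Note: StarCraft 2's actual MMR formula is not public; this is only
    meant as intuition for what a rating difference roughly means.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# A 400-point rating gap corresponds to roughly a 91% expected win rate.
print(elo_win_probability(6200, 5800))  # ~0.909
```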
As you see, it is still the best with Protoss; however, all three races are well over the 99% win rate mark. Absolutely amazing. In this version, there is also more emphasis on self-play, and the goal is to create a learning algorithm that is able to learn how to play really well by playing against previous versions of itself millions and millions of times.
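In its simplest form, such self-play against frozen past versions of oneself could look like the sketch below; the `agent` and `env` interfaces are hypothetical stand-ins, and the real AlphaStar training runs as a large distributed league, not a single loop like this.

```python
import copy
import random

def naive_self_play(agent, env, iterations=1000, snapshot_every=50):
    """Minimal sketch of self-play against frozen past versions.

    The `agent` and `env` interfaces are hypothetical; AlphaStar's real
    training is a massive distributed league, not a single loop.
    """
    opponents = [copy.deepcopy(agent)]  # pool of frozen snapshots
    for step in range(iterations):
        opponent = random.choice(opponents)    # sample a past self
        trajectory = env.play_match(agent, opponent)
        agent.update(trajectory)               # e.g., a reinforcement learning step
        if step % snapshot_every == 0:
            opponents.append(copy.deepcopy(agent))  # freeze the current self
    return agent
```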
This is, again, one of those curious cases where the agents train against themselves in a simulated world, and then, when the final
AI was deployed on the official game servers, it played against human players for the very
first time. I promise to tell you about the results in
a moment, but for now, please note that relying more on self-play is extremely difficult. Let me explain why. Self-play agents have the well-known drawback
of forgetting, which means that as they improve, they might forget how to win against a previous
version of themselves. Since StarCraft 2 is designed in a way that
every unit and strategy has an antidote, we have a rock-paper-scissors kind of situation
where the agent plays rock all the time because it encountered a lot of scissors lately. Then, when a lot of papers appear, it will
start playing scissors more often, and completely forget about the olden times when the rock
was all the rage. And, on and on this circle goes without any
real learning or progress. This doesn’t just lead to suboptimal results
- this leads to disastrously bad learning, if any learning at all. But it gets even worse. This situation opens up the possibility for
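This cycling is easy to reproduce in miniature. The sketch below simulates two rock-paper-scissors players that each train only against the opponent's current strategy and forget everything older; instead of settling into the stable uniform mix, they chase each other around the cycle forever.

```python
import numpy as np

# Row player's payoff in rock-paper-scissors:
# rows/columns are (rock, paper, scissors); +1 is a win, -1 a loss.
PAYOFF = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]])

def best_response(opponent_strategy):
    """Pure strategy that maximizes payoff against a fixed mixed strategy."""
    expected = PAYOFF @ opponent_strategy
    response = np.zeros(3)
    response[np.argmax(expected)] = 1.0
    return response

# Each side "trains" only against the opponent's latest strategy and
# forgets everything older, so instead of converging, play cycles:
# rock -> paper -> scissors -> rock -> ...
a = np.array([1.0, 0.0, 0.0])  # starts out all-in on rock
b = np.array([0.0, 0.0, 1.0])  # starts out all-in on scissors
for step in range(6):
    a = best_response(b)
    b = best_response(a)
    print(step, a.argmax(), b.argmax())  # the indices cycle forever
```

A classic remedy is to train against the average of all past opponents instead, an idea known as fictitious play, whose time-averaged strategy does converge toward the uniform mix in this game, which is one reason league-style training keeps old agents around.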
But it gets even worse. This situation opens up the possibility for an exploiter to take advantage of this information and easily beat these agents. In concrete StarCraft terms, such an exploit could be trying to defeat the AlphaStar AI early by rushing it with workers and warping in photon cannons in its base. This strategy is also known as a cannon rush,
and as you can see here with the red agent performing it, it can quickly defeat the unsuspecting blue opponent. So, how do we defend against such exploits? DeepMind used a clever idea here, by trying
to turn the whole thing around and use these exploits to its advantage. How? Well, they proposed a novel self-play method where they additionally insert these exploiter AIs to expose the main AI's flaws and create an overall more knowledgeable and robust agent.
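A minimal sketch of this league-plus-exploiters idea might look like the following; the `make_exploiter`, `train_against`, and `play_match` interfaces are, again, hypothetical stand-ins for what is in reality a large distributed training system.

```python
import copy
import random

def league_training(main_agent, make_exploiter, env, rounds=100):
    """Sketch of league-style training with exploiter agents.

    Loosely follows the idea described for AlphaStar: an exploiter
    trains purely to beat the current main agent (think of the cannon
    rush), and the main agent then trains against the whole league,
    patching the weakness it just got punished for. All interfaces
    here (`make_exploiter`, `train_against`, `play_match`, `update`)
    are hypothetical stand-ins.
    """
    league = [copy.deepcopy(main_agent)]
    for _ in range(rounds):
        # 1. Train a fresh exploiter whose only job is to crack the
        #    current main agent, exposing one of its blind spots.
        exploiter = make_exploiter()
        exploiter.train_against(main_agent, env)
        league.append(exploiter)
        # 2. Train the main agent against the whole league, so it must
        #    stay robust to old strategies *and* the newest exploit.
        for opponent in random.sample(league, k=min(10, len(league))):
            trajectory = env.play_match(main_agent, opponent)
            main_agent.update(trajectory)
        league.append(copy.deepcopy(main_agent))  # freeze its new self
    return main_agent
```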
So, how did it go? Well, as a result, you can see how the green agent has learned to adapt to this by pulling its worker line and successfully defending the cannon rush of the red AI. This is proper machine learning progress happening
right before our eyes. Glorious! This is just one example of using exploiters
to create a better main AI, but the training process continually creates newer and newer kinds of exploiters; for instance, you will see in a moment that it later came up with a nasty strategy involving attacking the main base with cloaked units. One of the coolest parts of the work, in my
opinion, is that this kind of exploitation is a general concept that will surely prove useful for completely different test domains as well. We noted earlier that it finally started playing
humans for the first time on the official servers. So, how did that go? In my opinion, given the difficulty and the
vast search space we have in StarCraft 2, creating a self-learning AI that has the skills
of an amateur player is already incredible. But that’s not what happened. Hold on to your papers, because it quickly
reached grandmaster level with all three races and ranked above 99.8% of the officially ranked
human players. Bravo, DeepMind. Stunning work. Later, it also played Serral, a decorated,
world champion Zerg player, one of the most dominant players of our time. I will not spoil the results, especially given that there were limitations, as Serral wasn't playing on his own equipment, but I will note that Artosis, a well-known and beloved StarCraft player and commentator, analyzed these matches and said: "The results are so impressive and I really feel like we can learn a lot from it. I would be surprised if a non-human entity could get this good and there was nothing to learn." His commentary is excellent and is tailored
towards people who don’t know anything about the game. He’ll often pause the game and slowly explain
what is going on. In these matches, I loved the fact that it so often makes plays that we consider to be very poor, and yet somehow, overall, it still plays outrageously well. It has unit compositions that nobody in their
right minds would play. It is kind of like a drunken kung fu master,
but in StarCraft 2. Love it. But no more spoilers - I think you should
really watch these matches and, of course, I put a link to his analysis videos in the
video description. Even though both this video and the paper appear to be laser-focused on playing StarCraft 2, it is of utmost importance to note that
this is still just a testbed to demonstrate the learning capabilities of this AI. As amazing as it sounds, DeepMind wasn't looking to spend millions and millions of dollars on research just to play video games. The building blocks of AlphaStar are meant
to be reasonably general, which means that parts of this AI can be reused for other things,
for instance, Demis Hassabis mentioned weather prediction and climate modeling. If you take only one thought from this video,
let it be this one. There is really so much to talk about, so
make sure to head over to the video description, watch the matches and check out the paper
as well. The evaluation section is as detailed as it
can possibly get. What a time to
be alive! Thanks for watching and for your generous
support, and I'll see you next time!