Dear Fellow Scholars, this is Two Minute Papers
with Dr. Károly Zsolnai-Fehér. Today, it is almost taken for granted that
neural network-based learning algorithms are capable of identifying objects in images,
or even writing full, coherent sentences about them, but fewer people know that there is
also parallel research on trying to break these systems. For instance, some of these image detectors
can be fooled by adding a little noise to the image, and in some specialized cases,
we can even perform something that is called the one pixel attack. Let’s have a look at some examples. Changing just this one pixel can make a classifier
think that this ship is a car, or that this horse is a frog, and amusingly, be quite confident
about its guess. Note that the choice of this pixel and its color is by no means random; it requires solving a mathematical optimization problem to find out exactly how to perform this.
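The pixel position and color that do the trick come out of a black-box optimization, typically solved with differential evolution in the original one pixel attack paper. Here is a minimal sketch of that idea; the `model.predict` interface, the 0-255 color range, and all names are assumptions for illustration, not the paper's actual code.

```python
import numpy as np
from scipy.optimize import differential_evolution

def one_pixel_attack(model, image, true_class, max_iter=100):
    """Search for a single (x, y, r, g, b) change that tanks the
    classifier's confidence in the true class. Sketch only."""
    h, w, _ = image.shape

    def perturb(params):
        # params = (x, y, r, g, b): pixel coordinates and replacement color
        x, y, r, g, b = params
        candidate = image.copy()
        candidate[int(y), int(x)] = (r, g, b)
        return candidate

    def objective(params):
        # Lower is better for the attacker: confidence in the true class
        probs = model.predict(perturb(params)[np.newaxis])[0]
        return probs[true_class]

    # Assumed 0-255 pixel values; adjust the bounds for normalized inputs
    bounds = [(0, w - 1), (0, h - 1), (0, 255), (0, 255), (0, 255)]
    result = differential_evolution(objective, bounds, maxiter=max_iter)
    return perturb(result.x)
```

Differential evolution is a natural fit here because it needs no gradients: the attacker only queries the classifier's output probabilities, which is why attacks of this kind also work in black-box settings.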
Trying to build better image detectors while other researchers are trying to break them is not the only arms race we're experiencing in machine learning research. For instance, a few years ago, DeepMind introduced
an incredible learning algorithm that looked at the screen, much like a human would, but
was able to reach superhuman levels in playing a few Atari games. It was a spectacular milestone in AI research. They have also just published a follow-up paper on this that we'll cover very soon, so make sure to subscribe and hit the bell icon so you don't miss it when it appears in the near future. Interestingly, while these learning algorithms
are being improved at a staggering pace, there is a parallel subfield where researchers endeavor
to break these learning systems by slightly changing the information they are presented
with. Let’s have a look at OpenAI’s example. Their first method adds a tiny bit of noise to a large portion of the video input, where the difference is barely perceptible, but it forces the learning algorithm to choose a different action than it would have chosen otherwise.
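To give a rough idea of how such a barely perceptible perturbation can be computed, here is a minimal fast-gradient-sign-style sketch against a policy network, assuming a PyTorch `policy` module that maps a batched observation to action logits; the names are illustrative, not OpenAI's actual code.

```python
import torch
import torch.nn.functional as F

def perturb_observation(policy, obs, epsilon=0.01):
    """One signed-gradient step that nudges the policy away from the
    action it would otherwise take. Sketch only."""
    obs = obs.clone().detach().requires_grad_(True)
    logits = policy(obs)            # shape: (batch, n_actions)
    action = logits.argmax(dim=-1)  # the action it would have chosen
    # Ascend the loss of the chosen action to make it less likely
    loss = F.cross_entropy(logits, action)
    loss.backward()
    # A small epsilon keeps the change barely perceptible
    return (obs + epsilon * obs.grad.sign()).detach()
```

Because the perturbation follows the sign of the gradient, a tiny epsilon spread over many pixels is often enough to flip the chosen action while staying nearly invisible.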
In the other one, a different modification was used that has a smaller footprint but is more visible. For instance, in Pong, adding a tiny fake
ball to the game can coerce the learner into going down when it was originally planning
to go up. It is important to emphasize that the researchers
did not do this by hand. The algorithm picks up game-specific knowledge by itself and finds out how to fool the other AI using it. Both attacks perform remarkably well. However, it is not always true that we can
just change these images or the playing environment however we desire to fool these algorithms. So, with this, an even more interesting question
arises. Is it possible to just enter the game as a
player, and perform interesting stunts that can reliably win against these AIs? And with this, we have arrived at the subject
of today’s paper. This is the “You Shall Not Pass” game,
where the red agent is trying to hold back the blue character and not let it cross the
line. Here you see two regular AIs duking it out: sometimes the red wins, sometimes the blue is able to get through. Nothing too crazy here. This is the reference case, which is somewhat well balanced. And now, hold on to your papers, because this
adversarial agent that this new paper proposes does this. You may think this was some kind of glitch, and I put the incorrect footage here by accident. No, this is not an error; you can believe your eyes: it basically collapses and does absolutely nothing. This can’t be a useful strategy, can it? Well, look at that! It still wins the majority of the time. This is very confusing. How can that be? Let’s have a closer look. This red agent is normally a somewhat competent
player; as you can see here, it can punch the blue victim and make it fall. We now replace this red player with the adversarial agent, which collapses, and it almost feels like it hypnotizes the blue agent into falling as well. And now, squeeze your papers, because the
normal red opponent’s win rate was 47%, and this collapsing chap wins 86% of the time. It not only wins, but it wins much, much more
reliably than a competent AI. What is this wizardry? The answer is that the adversary induces off-distribution
activations. To understand exactly what that means, let’s
have a look at this chart. This tells us how likely it is that the actions
of the AI against different opponents are normal. As you see, when this agent named Zoo plays
against itself, the bars are in the positive region, meaning that normal things are happening. Things go as expected. However, that’s not the case for the blue lines, which show the agent’s actions when playing against this adversarial agent; here, the blue victim’s actions are not normal in the slightest. So, the adversarial agent is really doing nothing, but it is doing nothing in a way that reprograms its opponent to make mistakes and behave almost like a completely randomly acting agent! This paper is absolute insanity. I love it!
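For a concrete sense of how a chart like this can be produced, here is a minimal sketch that fits a density model to the victim's activations recorded during normal play and then scores other episodes against it. The Gaussian mixture and all names are assumptions for illustration, in the spirit of the paper's diagnostic.

```python
from sklearn.mixture import GaussianMixture

def off_distribution_scores(acts_normal, acts_adversary, n_components=10):
    """acts_*: (n_timesteps, n_units) arrays of the victim's activations.
    Returns mean log-likelihoods under a model of normal play."""
    density = GaussianMixture(n_components=n_components).fit(acts_normal)
    # High score: activations look like normal play.
    # Very negative score: off-distribution, as against the adversary.
    return density.score(acts_normal), density.score(acts_adversary)
```

Under this kind of model, activations recorded against the adversary score far below those from regular opponents, which is exactly the gap the bars in the chart visualize.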
And if you look here, you see that the more the blue curve improves, the better this scheme works for a given game. For instance, it does really well on Kick and Defend, fairly well on Sumo Humans, and there is something about the Sumo Ants game that prevents this interesting kind of hypnosis from happening. I’d love to see a follow-up paper that can
pull this off a little more reliably. What a
time to be alive! Thanks for watching and for your generous
support, and I'll see you next time!