OpenAI Plays Hide and Seek…and Breaks The Game! 🤖

Video Statistics and Information

Reddit Comments

Could this be used to debug physics engines where boundaries sometimes cause unrealistic behaviors?

👍︎︎ 234 👤︎︎ u/woShame12 📅︎︎ Oct 23 2019 🗫︎ replies

That was epic, and also scary/funny how human-like those exploits are.

I spent half my childhood finding exploits like this in video games, and even now when I play I like to find strategies that “break” the game or make it way easier to play/beat.

Great find OP!

👍︎︎ 243 👤︎︎ u/bemental_ 📅︎︎ Oct 23 2019 🗫︎ replies

Box surfing, the obvious solution we've never thought of.

I know that AI still has a very, very long way to go, but seeing stuff like this does really make me worry - not about the rogue AI chat bot, but that the militaries/governments of the world will definitely try to make stuff that can learn and adapt to situations, not understand what they're really unleashing, and these things get out from under them. Not thinking tanks or anything physical, but cybersecurity and that sort of thing.

That and having emergent behaviour from multiple systems like creating undesired effects.

👍︎︎ 68 👤︎︎ u/InextricableSquirrel 📅︎︎ Oct 23 2019 🗫︎ replies

Original video by the AI people and half the length with all the same info.

https://www.youtube.com/watch?v=kopoLzvh5jY

👍︎︎ 544 👤︎︎ u/mrbaggins 📅︎︎ Oct 23 2019 🗫︎ replies

This is the funniest thing I've seen in quite some time

👍︎︎ 83 👤︎︎ u/sasuke41915 📅︎︎ Oct 22 2019 🗫︎ replies

it’s funny how he gets rid of the ramp in minute 4:03 like “fuck this thing”

👍︎︎ 15 👤︎︎ u/z3r0i7 📅︎︎ Oct 23 2019 🗫︎ replies

Can this experiment be replicated on a normal desktop computer? Or at least on a modest single computer multi-GPU setup?

Or is it one of those things, like Google Alpha Zero, where you can only replicate the results if you are a huge research lab with access to unlimited funds to buy computing time on a GPU farm?

👍︎︎ 11 👤︎︎ u/ebj011 📅︎︎ Oct 23 2019 🗫︎ replies

Maybe I should just read the paper... but I find it incredible how the agents were able to get out of such deep local optima. A testament to how the agents were modelling the game world and searching the state space efficiently I'm sure.

👍︎︎ 20 👤︎︎ u/Vallvaka 📅︎︎ Oct 23 2019 🗫︎ replies

Sorry for not reading the paper myself, but, if anyone does and is kind enough to answer: what's the contribution of this particular paper? It is well known that unsupervised learning can eventually come up with some crazy strategies, that's not surprising.

👍︎︎ 38 👤︎︎ u/teerre 📅︎︎ Oct 23 2019 🗫︎ replies
Captions
Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. In this project, OpenAI built a hide and seek game for their AI agents to play. While we look at the exact rules here, I will note that the goal of the project was to pit two AI teams against each other, and hopefully see some interesting emergent behaviors. And, boy, did they do some crazy stuff. The coolest part is that the two teams compete against each other, and whenever one team discovers a new strategy, the other one has to adapt. Kind of like an arms race situation, and it also resembles a generative adversarial network a little. And the results are magnificent, amusing, weird - you’ll see in a moment. These agents learn from previous experiences, and to the surprise of no one, for the first few million rounds, we start out with…pandemonium. Everyone just running around aimlessly. With no proper strategy and only semi-random movements, the seekers are favored and hence win the majority of the games. Nothing to see here. Then, over time, the hiders learned to lock out the seekers by blocking the doors off with these boxes and started winning consistently. I think the coolest part about this is that the map was deliberately designed by the OpenAI scientists in a way that the hiders can only succeed through collaboration. They cannot win alone and hence, they are forced to learn to work together. Which they did, quite well. But then, something happened. Did you notice this pointy, doorstop-shaped object? Are you thinking what I am thinking? Well, probably, and not only that, but about 10 million rounds later, the AI also discovered that it can be pushed near a wall and be used as a ramp, and, tadaa! Got’em! The seekers started winning more again. So, the ball is now back in the hiders' court. Can you defend this? If so, how?
Well, these resourceful little critters learned that since there is a little time at the start of the game when the seekers are frozen, apparently, during this time, they cannot see them, so why not just sneak out and steal the ramp, and lock it away from them. Absolutely incredible. Look at those happy eyes as they are carrying that ramp. And, you think it all ends here? No, no, no. Not even close. It gets weirder. Much weirder. When playing a different map, a seeker noticed that it can use a ramp to climb on top of a box, and, this happens. Do you think couchsurfing is cool? Give me a break! This is box surfing! And, the scientists were quite surprised by this move as this was one of the first cases where the seeker AI seems to have broken the game. What happens here is that the physics system is coded in a way that the agents are able to move around by exerting force on themselves, but, there is no additional check whether they are on the floor or not, because who in their right mind would think about that? As a result, something that shouldn’t ever happen does happen here. And, we’re still not done yet, this paper just keeps on giving. A few hundred million rounds later, the hiders learned to separate all the ramps from the boxes. Dear Fellow Scholars, this is proper box surfing defense…then, lock down the remaining tools and build a shelter. Note how well rehearsed and executed this strategy is - there is not a second of time left until the seekers take off. I also love this cheeky move where they set up the shelter right next to the seekers, and I almost feel like they are saying “yeah see this here? there is not a single thing you can do about it”. In a few isolated cases, other interesting behaviors also emerged, for instance, the hiders learned to exploit the physics system and just chuck the ramp away. After that, the seekers go “what?” “what just happened?”.
But don’t despair, and at this point, I would also recommend that you hold on to your papers because there was also a crazy case where a seeker also learned to abuse a similar physics issue and launch itself exactly onto the top of the hiders. Man, what a paper. This system can be extended and modded for many other tasks too, so expect to see more of these fun experiments in the future. We get to do this for a living, and we are even being paid for this. I can’t believe it. In this series, my mission is to showcase beautiful works that light a fire in people. And this is, no doubt, one of those works. Great idea, interesting, unexpected results, crisp presentation. Bravo OpenAI! Love it. So, did you enjoy this? What do you think? Make sure to leave a comment below. Also, if you look at the paper, it contains comparisons to an earlier work we covered about intrinsic motivation, shows how to implement circular convolutions for the agents to detect their environment around them, and more. Thanks for watching and for your generous support, and I'll see you next time!
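The box-surfing bug described in the captions (agents move by exerting force on themselves, with no check for whether they are on the floor) can be illustrated with a minimal sketch. This is not OpenAI's environment code: the `Agent` class, the `step` and `simulate` functions, and the `require_floor_contact` flag are all hypothetical names, and the 1D explicit Euler integration is an assumption made for brevity.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    x: float = 0.0         # horizontal position
    vx: float = 0.0        # horizontal velocity
    on_floor: bool = True  # False when the agent stands on top of a box


def step(agent, force, dt=0.1, mass=1.0, require_floor_contact=False):
    """Advance the agent by one time step using explicit Euler integration."""
    if require_floor_contact and not agent.on_floor:
        # Hypothetical fix: an agent that is not touching the floor
        # cannot push itself along.
        force = 0.0
    agent.vx += (force / mass) * dt
    agent.x += agent.vx * dt
    return agent


def simulate(require_floor_contact, steps=10):
    """Return how far a box-riding agent travels under a constant self-applied force."""
    surfer = Agent(on_floor=False)  # a seeker standing on a box
    for _ in range(steps):
        step(surfer, force=1.0, require_floor_contact=require_floor_contact)
    return surfer.x
```

Under these assumptions, `simulate(False)` lets the box-riding agent travel forward, which mirrors the exploit, while `simulate(True)` leaves it at position 0.0 because the self-applied force is ignored without floor contact.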
Info
Channel: Two Minute Papers
Views: 3,871,685
Rating: 4.9402609 out of 5
Keywords: two minute papers, deep learning, ai, openai, openai hide and seek, hide and seek ai, hide and seek, machine learning
Id: Lu56xVlZ40M
Length: 6min 7sec (367 seconds)
Published: Tue Oct 22 2019