Multi-Agent Hide and Seek

Captions

On Earth, the simple rules of natural selection and competition led to the evolution of increasingly intelligent life-forms. Today we ask if comparably simple rules of multi-agent competition can also lead to intelligent behavior in a new virtual world.

These agents are playing hide-and-seek. These agents have just begun learning, but they've already learned to chase and run away. This is a hard world for a hider who has only learned to flee. However, after training and millions of rounds of hide-and-seek, the hiders find a solution. The hiders learn to use rudimentary tools to their advantage: by grabbing and locking these blocks, they can create their own shelter. The seekers are locked in place for a brief period at the start of the game, giving hiders a chance to prepare. Even so, the hiders must learn to collaborate, accomplishing tasks that would be impossible for any single individual.

The hiders are not the only ones who can learn to use tools. After many generations of failing to break into the shelter, the seekers learned to jump over obstacles using ramps. However, after many millions of rounds of having their shelter breached, the hiders learned to take away the primary tool the seekers have at their disposal. Note that we did not explicitly incentivize any of these behaviors. As each team learns a new skill, it implicitly changes the challenges the other team faces, creating a new pressure to adapt.

We've also put these agents into a more open-ended environment, randomizing the objects, team sizes, and walls. In this world, they learn to construct their own shelter from scratch, requiring that they arrange multiple objects into precise structures. To prevent seekers from using the ramps, the hiders move them to the edge of the play area and lock them in place. We originally believed this would be the final strategy that the agents learned. However, we found that after more training, the seekers discover that they can jump on top of boxes and surf them to the hiders' shelter. In the last stage of emergent strategy that we observe, the hiders learn to lock as many boxes as they can before constructing their fort in order to defend against box surfing.

So how do agents acquire these skills? They're trained using reinforcement learning, an algorithm inspired by the way animals on Earth learn. The agents play thousands of rounds of hide-and-seek in parallel for many days. They train against each other as well as past versions of themselves, using an algorithm called self-play.

Coevolution and competition on Earth led to the only generally intelligent species known to date: humans. While this world is far less complex than Earth, we have found evidence that simple rules can lead to increasingly intelligent behavior from multi-agent interaction. We hope that with a much larger and more diverse environment, truly complex and intelligent agents will one day emerge.

[Music]

Info

Channel: OpenAI

Views: 10,399,234

Id: kopoLzvh5jY

Length: 2min 57sec (177 seconds)

Published: Tue Sep 17 2019