Shift AI 2020: On the Road to Artificial General Intelligence | Danny Lange (Unity)

Captions
Hi, I'm Danny Lange, Vice President of Artificial Intelligence at Unity Technologies. I've really been looking forward to this event, since this is my first time at Shift, and I have an exciting talk in store for you. It's about getting on the road to artificial general intelligence. My talk is more about the journey and less about how long it will take or what we will find when we get there. So let's get started.

Please allow me to introduce Unity Technologies. We offer a real-time 3D platform that is used for gaming, AR/VR, film, automotive, and robotics. Actually, over half of the world's games are being developed on the Unity platform. We have seen installs on over 4 billion devices, and we have over 2 billion monthly active players playing games written on the Unity platform. We are 3,000 employees headquartered in San Francisco, so you can say we are the most popular gaming company out there.

So what do games have to do with this? Well, games have played a big role in AI research for a very long time. It started as early as 1950, when Claude Shannon published a paper about how to program a computer to play chess. We have seen other examples where games were played at world-class level: Chinook, a checkers program from 1989; Deep Blue, and I was at IBM in 1997 when it defeated Garry Kasparov in chess; and more recently, from 2011, there's the example of IBM Watson winning Jeopardy!. All these examples are characterized by really smart implementations by really smart people. If you take the Watson example, it's really a big database of curated questions and answers.

In 2016 there was a change with DeepMind's AlphaGo. We are now moving to a phase where these game algorithms are generic learning algorithms: they are trained to play a game, not implemented with a specific strategy in mind. We have seen a lot of video games being used to advance AI. We all remember the Atari games, both from DeepMind and OpenAI, and a whole range of other games including Minecraft from Microsoft, and very recently Unity's own Obstacle Tower Challenge to improve reinforcement learning.

So what is it about video games that links them so closely to AI? Well, let's ask a first question: what is intelligence? There is the dictionary explanation, but it doesn't mean too much to me. So what is the real intelligence that we truly know? It's the intelligence in biological systems. Let's take a step back and look at those. Essentially, it's the idea of senses and computation in nature that allows us to eat food to get energy; to avoid getting eaten while we are trying to get food; to breed, because no individual lives forever and we need to keep the species going; to be aware of physics, in particular things like inertia and gravity (don't fall out of the tree while you're trying to pick an apple); and finally agency, the ability to change the environment. This is the competition, this is the real game of nature. And what nature then did is invent lots of infrastructure to implement the intelligence needed to survive in this game: chemical mechanisms, cellular structures, multicellular organisms, and so on. All these things evolved in nature to create the foundation for intelligence.

When we look at a real-time 3D engine like Unity, it's a spatial environment with a physics engine and a self-sufficient ecosystem that actually pretty closely replicates the real world. You can say that with Unity you have a private AI biodome. Unity allows you to build your own AI using a toolkit called Unity ML-Agents: you build your 3D environment with a physics engine, train your machine learning models, and deploy them on a device. In this case here, you have trained a virtual dog to go fetch a stick. ML-Agents is an open-source toolkit with a bunch of environments that you can experiment with. Here's the URL for the GitHub repository; you can take a picture if you want to visit it later. And here we have a couple of references for papers that we have written about ML-Agents and the Obstacle Tower Challenge.

Now let's go into the learning scenarios. You build these environments with ML-Agents, and now you're going to train your agents in them. We emphasize what we call nature's learning method, which is reinforcement learning: observing the environment around you, taking an action, and reaping the rewards, or the punishments, the negative rewards. Nature is in a way constantly moving from exploration towards exploitation: as you move around in this flywheel of observation, action, and reward, you learn and become better and better through exploration, so that you can exploit.

Let's take an example, in case that was a little too abstract. Here at Unity we wanted to figure out if we could teach a chicken to cross the road from scratch, tabula rasa, using a generic reinforcement learning algorithm as I just described. In this case the system can observe the pixels, frame by frame, just as you do. There are four actions: left, right, forward, backward. And there's a reward signal: a negative reward if the chicken gets hit by a car, and a positive reward for picking up gift packages. Now just look at this video. It starts from scratch; the algorithm has no idea, it's just a learning algorithm. You can see it moves more backwards than forwards, but in a moment it will fetch the gift package up on the right here, and then get killed by a car. What did that take? Five, ten seconds, and two bits of information. After half an hour, you see the chicken actually gets pretty good at picking up gift packages and avoiding the cars, at least for a few seconds. And as you will see, as it trains more and more it gets better. In this example we let it train for six hours, and now watch: the chicken became superhuman. It will pick up packages and it will never get hit by a car. Incredible, from tabula rasa, from scratch.
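To make that observe-act-reward flywheel concrete, here is a minimal, purely illustrative sketch that mirrors the chicken setup in miniature: four actions, a positive reward for the gift package, a negative one for the cars. The grid layout and hyperparameters are invented, and ML-Agents' real trainers use deep reinforcement learning (such as PPO) rather than a lookup table; this toy tabular Q-learning loop only shows exploration turning into exploitation.

```python
import random
from collections import defaultdict

ACTIONS = ["left", "right", "forward", "backward"]
MOVES = {"left": (-1, 0), "right": (1, 0), "forward": (0, 1), "backward": (0, -1)}
GIFT, CARS = (2, 3), {(0, 2), (3, 2)}   # toy layout, invented for illustration

def step(state, action):
    """Toy environment: clamp to a 4x4 grid, reward gifts, punish cars."""
    dx, dy = MOVES[action]
    nxt = (min(max(state[0] + dx, 0), 3), min(max(state[1] + dy, 0), 3))
    if nxt == GIFT:
        return nxt, +1.0, True
    if nxt in CARS:
        return nxt, -1.0, True
    return nxt, 0.0, False

q = defaultdict(float)                  # Q[(state, action)] -> value estimate
alpha, gamma, eps = 0.1, 0.95, 0.1      # learning rate, discount, exploration

for _ in range(5000):
    state, done = (2, 0), False
    for _ in range(100):                # cap episode length
        # epsilon-greedy: mostly exploit what we know, sometimes explore
        if random.random() < eps:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        nxt, reward, done = step(state, action)
        best_next = max(q[(nxt, a)] for a in ACTIONS)
        # the reward signal drives the update: observe, act, reap, repeat
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = nxt
        if done:
            break
```

After a few thousand episodes the greedy policy walks from the start to the gift while steering around the car squares, the same exploration-to-exploitation arc the chicken demo shows at a much larger scale.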
Now let's go in and look at some specific scenarios that you can play out with ML-Agents. Let's take control learning: here you have four legs and eight joints that need to operate in coordination to move forward. Using reinforcement learning, these quadrupeds basically learn to walk. Here's a humanoid; think about a small child, twelve months old, learning to stand. There are a lot of joints, and again this is reinforcement learning trying to coordinate those joints to remain standing, or walking for that matter.

Let's look at some other very well-known learning approaches that we as humans recognize. Let's talk a bit about curriculum learning. In this case I want the blue agent to jump over the wall. When the wall gets too high, it needs to use a tool, the orange box: it needs to push the orange box next to the wall and use it as a stepping stone. We have found that if you use a curriculum, you start easy, and as the agent graduates you incrementally make the task harder; progress depends on graduation. In this chart you can see how curriculum learning, the blue line, learns faster and learns better, whereas if you just throw the computer into the deep water and give it the most difficult task from the beginning, it takes a very long time to learn and doesn't learn very efficiently. So take a look at this example; pretty impressive. The blue agent learns to push the orange cube next to the wall and use it as a stepping stone. Incredible: it's really using the cube as a tool.
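In ML-Agents itself, curricula are specified in the trainer configuration; the sketch below is a schematic stand-in for that graduation rule, not the toolkit's API. The idea is just what the talk describes: track recent success, and only raise the wall once the agent clears the current height reliably. All names and thresholds are invented.

```python
from collections import deque

class WallCurriculum:
    """Hypothetical curriculum controller: harder tasks gated on graduation."""

    def __init__(self, heights=(0.0, 0.5, 1.0, 1.5, 2.0),
                 pass_rate=0.8, window=100):
        self.heights = heights          # lesson difficulty, easy to hard
        self.level = 0
        self.pass_rate = pass_rate      # success rate required to graduate
        self.results = deque(maxlen=window)

    @property
    def wall_height(self):
        return self.heights[self.level]

    def report_episode(self, solved: bool):
        self.results.append(solved)
        full = len(self.results) == self.results.maxlen
        if full and sum(self.results) / len(self.results) >= self.pass_rate:
            if self.level < len(self.heights) - 1:
                self.level += 1         # graduate: raise the wall
                self.results.clear()    # re-measure at the new difficulty
```

Each episode, the environment reads `wall_height` and reports whether the agent solved it; the wall only rises once the recent success rate clears the bar, which is what produces the faster blue learning curve compared with starting at full difficulty.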
There are other examples of features in learning that we recognize very well. One of them is memory. In this case, the blue agent comes in, and if the big cube in front is orange, it should use the orange exit; otherwise it should use the red exit. We don't tell it that the color matters. It learns that orange means orange exit and red means red exit. There are no software engineers here programming this; by trial and error, over and over, it figures out that what matters is not the size, not the location, but the color.

Let me show you another example that we recognize, which is hierarchical learning: basically navigation and control. In this case there's one machine learning model that keeps an eye out for the target, using ray casts, and you can imagine it saying left, right, straight, left, right, straight, trying to navigate, while another model, the quadruped walker I showed you earlier, controls the legs. So we have basically split the responsibilities in the hierarchical model.

Multi-agent learning, now that's interesting. In this case we have two agents helping each other; they basically play to each other. The reward here is collaborative: keep the ball in the air, don't let it touch the ground. We became a little more ambitious, and here's an example with soccer. We essentially train one player to be offensive and another player to be defensive, and we train them up against each other. The offensive player is rewarded for scoring; the defensive player is punished when being scored upon. And when you have trained your agents to the point where they don't get any better, you basically copy them, clone them, and create two teams of two players each. It looks like this: these guys are playing against each other. Notice the defensive player became what we as humans recognize as a goalie, and the offensive players became more like strikers, moving around.

Let me show you another example where we don't assign roles to the players. Here the two blue players are playing against the two purple players. Notice that whenever one purple player goes up, the other one stays behind as a defensive player. But now look: they switch. When one moves up in an offensive move, the other player moves back into a defensive position. Very interesting, learned from scratch; this is nothing that we programmed in.

Let me show you one more example, two-on-one soccer. People who play soccer will recognize this as a drill. We have one player, the purple one, as the goalie, and then we have two offensive players in the field trying to score on that goalie. And look what the best approach turns out to be: the best defense, apparently, is offensive play. Keep the ball up, away from your own goal; that's clearly beneficial. So again, by playing thousands, hundreds of thousands of episodes, the agents learned something relatively profound.
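A sketch of the role-specific reward assignment just described: the striker is rewarded for scoring, the goalie is punished when scored upon. The small per-step terms echo the "existential" rewards used in soccer-style ML-Agents examples (time pressure for the striker, patience for the goalie), but the names and magnitudes here are invented for illustration.

```python
STEP = 1e-3  # hypothetical per-step "existential" reward magnitude

def striker_reward(scored_goal: bool) -> float:
    # rewarded for scoring; every step without a goal costs a little,
    # so the striker is nudged to attack rather than stall
    return (1.0 if scored_goal else 0.0) - STEP

def goalie_reward(conceded_goal: bool) -> float:
    # punished when scored upon; every step survived earns a little,
    # so the goalie is nudged to protect the goal and play for time
    return (-1.0 if conceded_goal else 0.0) + STEP
```

Training two such agents against each other and then cloning them into teams is what produces the emergent goalie and striker behavior in the videos, without any role ever being programmed in.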
That brings me to traits in learning. We've talked a lot about behaviors we recognize; look at this cute agent here. What would you call this trait? It's basically cheating. Why should it do things the hard way if it can just crawl over? So look at this next example. You remember the curriculum learning, where we're trying to train the agent to push the cube next to the wall and use it as a stepping stone. But watch what this agent is doing during training. Now watch: just like an animal, it will find the path of least resistance. Clearly there was a flaw in the environment here, because it allowed the agent to simply run around the wall. So you fix the environment, make it more restrictive, and the agent will have to learn to use the big cube as a tool. Even in computers, we see cheating.

Now this brings me to extrinsic and intrinsic rewards, a very interesting concept. Extrinsic rewards, in the gaming world, are things like capture, achieve, collect; they are specific to the environment, and we can think of them as the getting-rich concept: you collect points. There are intrinsic rewards in nature as well; we haven't talked much about them. Examples are curiosity, impatience, happiness, love, empathy, and so on. Nature invented these intrinsic rewards because they are beneficial. They are specific to the agent, not the environment; they are something personal. We call those the getting-happy rewards.

Now let's dig a little deeper. There are known limits to standard reinforcement learning when you have very sparse reward spaces; I like to compare them to an Agatha Christie chain of events, statistically improbable scenarios like this one. We put the agent in a house full of rooms. In one of the rooms there's a button; the agent has to learn to push the button, and a pyramid will appear in another room. It then has to topple the pyramid to collect the extrinsic reward, which is a gold cube on top of the pyramid. Random exploration will not work; it's just too improbable that the agent will randomly discover the sequence. So we have to favor agency over randomness, and the key lever we have here is nature's fantastic intrinsic rewards. Basically, what we did is try to define surprisal, or curiosity, in a mathematical sense, and the magic is right here: it's the reward. Normally in machine learning we want to minimize the error rate; we want to make as good predictions as possible. But with curiosity you do the opposite, in a sense: you try to figure out what you know the least about, and explore that first.
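As a purely illustrative sketch of that idea: train a forward model to predict the next observation from the current observation and action, and pay the agent the model's prediction error as an intrinsic reward. Where the model is wrong is where the agent knows least, so that is where it explores. ML-Agents' actual curiosity module, in the spirit of Pathak et al.'s Intrinsic Curiosity Module, predicts in a learned feature space; this linear model on raw observations, with invented sizes, only shows the mechanics.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS, ACT = 8, 4                                    # invented dimensions
W = rng.normal(0.0, 0.01, size=(OBS, OBS + ACT))   # linear forward model

def curiosity_reward(obs, action_onehot, next_obs, lr=1e-2):
    """Intrinsic reward = forward-model prediction error ("surprisal")."""
    global W
    x = np.concatenate([obs, action_onehot])
    err = next_obs - W @ x        # where the model is wrong = what we know least
    # keep training the model, so familiar transitions stop paying out
    W += lr * np.outer(err, x)
    return float(err @ err)       # big surprise -> big intrinsic reward
```

Transitions the model has seen many times yield almost no reward, so the agent is continually pushed toward rooms and interactions it has not mastered yet.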
Let's see how that works out in an agent example. Here you have an agent in the house with all the rooms. Clearly random exploration is not leading anywhere; you can let this agent try five, ten, fifteen million episodes, and it just won't learn. So let's turn on curiosity. We're not even turning on the extrinsic reward, that getting-rich reward; we'll just see what the curious agent does. The curious agent moves from room to room; it will always try to explore another room. It gets really curious around these piles of rocks, the pyramids, and it figures out that if it hits them, something is going to happen. Okay, that looks interesting. Now let's turn on the extrinsic reward. The agent is curious, and it tries to get rich: it learns to push the button, and then it goes looking for the pyramid with the gold cube. There, it found it; it knocks the pyramid over, turns around, and there, it got the cube. Next episode: push the button, and so on. You see, it becomes a much more efficient agent by using the concept of curiosity. Well, that is until you put a TV in there. You get the so-called TV trap: because the TV will always show something interesting and new, the curious agent will get stuck in front of that TV. So no solution is perfect.
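"Turning on" the extrinsic reward on top of curiosity is typically just a weighted sum; here is a one-line sketch of that combination, with the weight beta as an invented tuning knob rather than anything from the talk.

```python
def total_reward(r_extrinsic: float, r_curiosity: float, beta: float = 0.02) -> float:
    # beta trades off getting rich (extrinsic, environment-specific)
    # against getting happy (intrinsic curiosity, agent-specific)
    return r_extrinsic + beta * r_curiosity
```

The TV trap is visible in this formulation too: if something in the environment generates endless surprisal, the beta-weighted term never decays and can dominate the sum.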
Let me take an example here from the gaming world: an NPC, a non-playable character. What we did is implement a virtual robot dog, a puppy if you want. We put it in a training camp to train a machine learning model to walk, run, jump, turn, whatever it has to do to move around. So all the animation is learned from scratch. The reward function is very simple: return the stick. Returning the stick is the only way the puppy achieves a positive reward. Let's see what comes out of this. Here we have the dog and the stick. I throw the stick with my mouse, and the dog runs after it. The animation is not perfect, but it's not created by human hand; it's basically a small virtual robot that learned to move. Whoops, that was not perfect. And it constantly tries to pick up the stick. I tried examples where I moved the stick left and right repeatedly in an attempt to make the dog tumble, and once it happened to me that the dog basically grabbed the stick out of my hand, which is actually what a real puppy would do. The reward function, the desire to reap the reward, is so strong that it will do anything it can to get that stick so it can return it to me.

So let's get back and talk a bit about how to get on the road to a greatly improved AI. Biology has a great trip in store for us. As you can probably tell from my talk, we are looking over nature's shoulder, trying to get inspired by nature, and I'm going to show you a number of very specific areas that are worth exploring.

The first one is attention. Instead of seeing everything 360 degrees around you, what if you only focus on what matters? Nature developed this property to save energy. We focus with our eyes on something, and we don't see what's behind us; we don't actually see what's next to us. That's basically to limit the need for processing power. So imagine cameras that only look at the areas in the view that actually matter; that would save a lot of energy. Think about self-driving cars where the cameras process more data from the front view than from the back view. (A small code sketch of this idea follows after this list of directions.)

Episodic memory and one-shot learning: there are clearly things where you don't need thousands or hundreds of thousands of episodes to learn; you should really just need one. If you burned yourself on a candle as a kid, you will probably avoid touching the flame again.

Working memory: I showed you an example of long short-term memory, LSTM. There are other sorts of memory that you could explore along with reinforcement learning.

Continuous learning: continuous learning is very tricky; there's a trade-off. If you're very good at learning new things, you're also typically very good at overwriting them. We see that in children: they can learn new languages very quickly. I had kids who, when we lived in Japan, were fluent in Japanese; we moved to the US, and six months later they couldn't count to ten in Japanese any longer, but suddenly they were fluent in English. Unfortunately, when we get a little older we're not as good at learning new languages. The languages we learn get stuck, like my Danish accent; there's nothing I can do about it, it won't change. The benefit, of course, is that we don't forget that easily either. So continuous learning is a very interesting area to explore as well.

Imagination: think about dreaming. That's really like running simulations.

And we talked a bit about agency and intrinsic values; this is a high-potential area for research. Nature invented a number of intrinsic rewards, or values, for us humans and for animals, and curiosity is just one of them. It would be a very good idea to explore more of those.

And finally, decomposition and hierarchical learning. A lot of the examples I showed you were single-model or two-model examples, but what about three, four, five, or ten models? What about a hundred models collaborating to achieve something? That's the very interesting topic of multi-agent systems and multi-model systems; huge potential for very interesting research there.
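Here is the promised sketch of attention as soft weighting: a softmax over patch relevance scores concentrates processing on the regions of the view that matter most. The patch features and scores below are invented for illustration; in practice the relevance scorer would itself be learned.

```python
import numpy as np

def attend(patch_features: np.ndarray, relevance: np.ndarray) -> np.ndarray:
    """Soft attention over image regions.

    patch_features: (num_patches, feature_dim), one row per camera region.
    relevance: (num_patches,) scores from some hypothetical learned scorer.
    High-relevance patches dominate the summary, so downstream compute
    is effectively spent on what matters instead of the full 360 view.
    """
    w = np.exp(relevance - relevance.max())   # numerically stable softmax
    w /= w.sum()
    return w @ patch_features                 # weighted sum of patch features

# Toy usage: a front-view patch scored far above a rear-view patch
features = np.array([[1.0, 0.0], [0.0, 1.0]])
summary = attend(features, np.array([3.0, 0.1]))
```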
Now, at the end of my talk, I would like to take a step back. We talked a lot about nature and biology; let me talk a bit about cultural evolution, because we humans, Homo sapiens, have also evolved over the years. We first appeared two to three hundred thousand years ago in Africa. About a hundred thousand years ago we tried to migrate out of Africa, and we failed. Then we tried again, between 45,000 and 70,000 years ago, and we succeeded and got all the way to Australia. What was it that enabled that migration? The first time it failed, the second time it succeeded. A lot of people believe it was the cognitive revolution. You see the figure here on my right; that's the Löwenmensch, or Lion Man, found in Germany in the late 1930s. It's made from a mammoth tusk, and you can see it's an abstract object, a strange figure. Researchers have tried to recreate such a figure with stone tools, and they say it takes three to four hundred hours to do. So thirty-five to forty thousand years ago, someone made this figure. Despite having to find food, despite having to protect themselves and their loved ones against predators, they spent time creating an abstract figure. That's an indication that something happened with humans over time, beyond the biology of Homo sapiens itself.

So let's think about this: there's no Moore's law for the human brain. Over the last 100,000 years, we know from DNA that the brain hasn't really changed. No upgrades, no new GPUs, no magic on the hardware side, but lots of magic on the software side. Basically, what happened is that over these two to three hundred thousand years, emergent behavior, language, collaboration, and the anticipation of other people's actions evolved in our software. And this is the potential we have with creating richer machine learning models, multi-model systems, and multi-agent systems: to have these AI systems learn their own version of emergent behavior, language, collaboration, and anticipation.

That was what I have for you; I hope you enjoyed it. You're welcome to link up with me on LinkedIn or follow me on Twitter, and of course I encourage you to go out and download ML-Agents, play with it, and create your own AIs. As you probably saw from this talk, it's actually really fun.
Info
Channel: Shift Conference
Views: 2,435
Rating: 4.76 out of 5
Keywords: shift ai, shift conference
Id: R4Mnml-JcEU
Length: 29min 16sec (1756 seconds)
Published: Thu Apr 23 2020