On the Road to Artificial General Intelligence • Danny Lange • GOTO 2019

Video Statistics and Information

Reddit Comments

This is a 35-minute talk from GOTO Chicago 2019 by Danny Lange, VP of AI and ML at Unity Technologies, who previously led ML teams at Uber, AWS, and Microsoft. I've dropped the full talk abstract in below:

Join this session to discuss the role of intelligence in biological evolution and learning.

Danny Lange will demonstrate why a game engine is the perfect virtual biodome for AI's evolution. Attendees will recognize how the scale and speed of simulations are changing the game of AI while learning about new developments in reinforcement learning.

u/mto96 · 2 points · Nov 12 2019

Does anticipation just appear naturally in some reinforcement learning methods?

I believe creative problem solving is a form of creativity.

They should use something like reinforcement learning to have agents create their own games in a virtual environment.

Then play with each other that way.

u/loopy_fun · 1 point · Nov 12 2019
Captions
[Music] Thank you very much. Hello, can you hear me? Good afternoon. I'm going to talk about artificial intelligence, and the special version of it called artificial general intelligence. It's not because I'm crazy; it's really not about AGI itself, not about the destination, but about the journey. All the technologies and techniques we're going to develop to get there are going to be very interesting, and they're going to be applicable along the way. As you can see from my background, I come out of the practical use of machine learning. I built the Michelangelo platform at Uber, the company-wide machine learning platform. I was GM for machine learning at Amazon, where I ran the internal elastic machine learning service and launched Amazon Machine Learning, the first machine learning service in AWS. Before that I was at Microsoft, and I worked on autonomous agents at IBM Research. Having done all that, a couple of years ago I moved to Unity, a gaming platform company: VR, gaming, automotive, film. About 60% of all games in the world are written on the platform, we're installed on over 4 billion devices with a lot of active players every month, and there are 2,400 employees. So, a pretty big deal in gaming in particular. You may wonder why the shift, and that's some of what I'm going to show you today. Coming from a strong enterprise background, I think gaming brings something to the table that is really beneficial.

First, let's briefly talk about AI and a definition of it. I spent eight years on voice-enabled virtual assistants, and I can tell you: Siri and Alexa are not AI, they're software engineering. Thousands of engineers, script writers, creative writers, audio and voice talent, all that stuff. There's probably a little machine learning here and there, but it's not AI. Netflix and Amazon recommendations? Smart engineers writing very smart algorithms that get you to watch Netflix's home-produced shows, no matter how much you really want to see something else. Fraud detection services, equity trading? Trading algorithms are some brilliant mathematicians creating algorithms that trade stock and make someone rich. The Facebook feed: not AI at all. And probably one of the most successful AIs in the world is all the self-proclaimed AI specialists on LinkedIn, but that's a different story.

So what is intelligence? There's the dictionary definition, but the only real intelligence we know, the one that's not abstract, is the intelligence we see in biological systems. It's basically sensors and nature's algorithms combined, and it does five things. Number one, it enables living beings to consume energy, because entropy tries to kill any kind of structure, so we need to eat. Two, we need to eat while not getting eaten ourselves. Three, we have to reproduce, because we don't live forever and someone needs to take over. Four, we have to be aware of physics, because we can fall down from the apple tree trying to get an apple, or fall over an edge, or get hit by a boulder; physics is scary. And lastly, the principle in nature is agency: our ability to impact our environment to improve our probability of survival. Those five principles are really what's behind intelligence. Nature created infrastructure to achieve them: chemical mechanisms, cellular structures, multicellular structures. It's not good enough to have bacteria by themselves; we put them together and create systems, skeletons with muscles, and so on. All of these things happened through evolution to address this need for intelligence, for living beings to survive. And that brings us, in a magic leap, straight to game engines, if you think about it.
In a game engine you have a 3D, spatial environment; you have a physics engine with simulated gravity, collision, and inertia; and you have a somewhat closed environment where you can create action. So if I want to root my AI in nature, this is a scalable way of doing it. That's why at Unity we created something called the ML-Agents Toolkit, basically a plug-in to the Unity game engine that allows anyone to play with this stuff at scale and mimic a lot of what we know from nature. That's what I'm going to show you today. We're not the first ones to realize that game engines are fun to play with if you're an AI researcher, and not just fun but constructive, productive. There have been bots developed for playing video games, there's been AI developed for teaching humanoids to walk, and there's been AI doing cognitive tasks like chess and Go. Those are part of our history. But what we're seeing with the Unity ecosystem is that, because it's a gaming ecosystem, you have all the physics, you have high-fidelity graphics, all the 3D challenges, and you also have the cognitive challenges of everything in gaming. That's why game engines are so crucial for developing artificial intelligence. Here's a quote from Demis Hassabis of DeepMind; you've heard about DeepMind, with AlphaGo and all the work they've done. It's funny to think that DeepMind's company mission is to create AGI one day. We work closely with them; they use Unity everywhere in their research, and it just exemplifies how you can simulate nature and use that to test out and develop your AI algorithms.

I want to talk a bit about nature's learning methods, because this underlies everything I show today: it's all done by computers, not by people. We give the system a learning algorithm and a problem, and it will observe, take action, and receive rewards or penalties for what it did. This is called reinforcement learning. It goes from exploration, when it knows nothing and is trying to figure things out, to exploitation, when it acts on what it has figured out. Just like an Amazon web page, where they try to sell you stuff and over time figure out what to sell you and what not to sell you. The most used algorithm at Amazon is a version of this called the multi-armed bandit.

Now I'm going to show you a little video. This is a computer that sees frames of a game, just like you. The game is to get the chicken to cross the road without getting killed by cars, while picking up gift packages on the way. The computer looks at pixels, just like you; there's no deep integration here, just visual input. It has four actions; it doesn't really know what they are, but it has four actions, and there's a reward signal coming back: every time the chicken gets hit by a car, that's a negative; every time it picks up a gift package, that's a positive. And here's what it looks like learning from scratch, also called tabula rasa, so no software engineers are cheating here. Look: the chicken moves more backwards than forwards, but in a moment it will pick up the gift package right in front of it, and boom, then it gets killed by a car. Ten seconds in, it has about two bits of information: cars are bad, packages are good. After half an hour of trial and error it gets pretty good. You see a pattern: it will pick up gift packages and it will avoid cars, not always, but to a good extent. Remember that everything is randomized, non-deterministic; the cars and the gift packages are randomly distributed. Now watch after six hours of learning from scratch, no cheating: it just figures it all out. Look at some of the patterns: it will stop, move from side to side, and navigate the incoming vehicles, and the vehicles are random, so it's not like it learned some fixed pattern. In those six hours it has seen enough; it has seen it all.

I'm going to show you some other scenarios, everything done using the same group of algorithms: no cheating, no hand-coding, no rules-based systems, none of that. Here's a quadruped. Look at all the joints that can move. I'm basically just saying: move randomly until you figure out how to move from left to right. There's a lot of coordination going on here, learning how to move four legs so that you actually move forward. Anyone who has had a small child, nine, ten, eleven months old, standing up for the first time, knows it's hard. It's hard for the computer to learn too; a lot of little muscles need to do their job to keep the balance and learn to walk. This is just like the chicken: I'm not telling it to move one leg in front of the other; it's "do whatever you want with these mechanics, with this geometry, and figure out how to move forward." Apparently we figured out how to do it the same way the computer did. The other thing here that is so inspiring is that we play with these simple computers and we see patterns, over and over, that feel like "I've seen this before."
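The explore-then-exploit loop described above can be sketched with the multi-armed bandit Lange mentions. This is a toy epsilon-greedy illustration, not Amazon's actual system; the payout probabilities are made up:

```python
import random

def epsilon_greedy_bandit(true_means, steps=10_000, epsilon=0.1, seed=0):
    """Learn which arm pays best purely from reward feedback."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n          # pulls per arm
    estimates = [0.0] * n     # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:                  # explore: random arm
            arm = rng.randrange(n)
        else:                                       # exploit: best arm so far
            arm = max(range(n), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # incremental running-mean update of the reward estimate
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates

counts, estimates = epsilon_greedy_bandit([0.2, 0.5, 0.8])
```

With epsilon at 0.1, roughly one pull in ten keeps exploring forever, which is exactly the trade-off the chicken faces: act on what you know, but keep checking whether you are missing something better.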
Like this one. Here we have a very hard problem to solve: the blue agent needs to scale a wall, and when the wall gets too tall, it needs to push the orange cube next to the wall as a stepping stone to get over. Rather than just using brute force and letting the computer figure it out, we start easy and then make it incrementally harder. What does that mean here? We start with the wall barely there at all, and then we move it up, make it higher and higher; every time the agent gets really good at scaling the wall, we make it higher. It graduates. Graphically it looks like this: the orange line is where we just used brute force, let the computer figure it out without doing anything; it takes a long time and never gets really good at it. The blue line is curriculum learning: every time you get a little better, you graduate and get a new challenge. It learns much faster and gets much better, just like animals, just like humans. We don't go to high school straight away; we go through the hoops in elementary school first. In this case, give it just a little thought: look at how the agent used the orange cube to help itself over. And watch this one: if the wall is not that tall, "I'll just scale it myself, I don't need the tool." What's interesting is that the system actually learns to use a tool to solve the problem. It's physically non-trivial to push the cube next to the wall to scale it, and it learns that from scratch.

We also have things like memory. We use long short-term memory, LSTM; it's a method. In this case we have the agent in a room, and if the cube is orange it needs to take the orange exit; if it's red, it needs to take the red exit. The only thing we tell the system is when it's wrong: it just gets "wrong exit" or "right exit." It looks like this: the agent comes in, sees the cube, and learns over time.
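The curriculum schedule described above, raising the wall only once the agent reliably clears the current height, can be sketched like this. The success model here is invented for illustration; real training would measure success from actual episodes:

```python
import random

def train_with_curriculum(episodes=4000, seed=1):
    """Raise the wall each time the agent's recent success rate is high."""
    rng = random.Random(seed)
    height, skill = 1, 0.0
    recent = []                                   # sliding window of outcomes
    for _ in range(episodes):
        # toy model: success chance grows with skill, shrinks with wall height
        p = max(0.05, min(0.95, 0.5 + skill - 0.1 * height))
        success = rng.random() < p
        if success:
            skill += 0.002                        # practice on solvable tasks
        recent.append(success)
        if len(recent) >= 100:
            if sum(recent) / len(recent) > 0.8:   # graduated: raise the wall
                height += 1
                recent = []
            else:
                recent.pop(0)                     # keep the window at 100
    return height, skill

final_height, final_skill = train_with_curriculum()
```

The point of the schedule is that the agent spends its time on tasks it can almost solve, which is where the learning signal is densest.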
If the cube is orange, it had better use the orange exit or it gets punished. The interesting thing is that we never told it that it's the color that matters. We just say yes or no, over and over, and it figures out that it's the color, not the size, not the location, not the time. A small example showing that it even learns to, in a sense, understand its problem. I'll show you another one: it's the quadruped again, but this time with two machine learning models. One of them is the one that walks; the other one, with the raycast, is looking for the target. So one model is looking and saying "turn to the left, turn to the right" to navigate over there, and leaves it to the other model to figure out how to move those four legs the right way to get there. This is how we have hundreds of layers like that in our brains, where we basically hand off from one model to another.

Multi-agents. So far we've been using one agent; here we have two agents playing some sort of tennis. They learn to play; imagine all the physics they have to learn here, they have to be able to understand the ballistics. There are other examples: two brains, two machine learning models. In this soccer example, we have a striker objective, which is to score, and a defensive objective, which is to keep the goal clean. We have two agents in there with raycasts, meaning they have a single eye and can see what's going on in front of them. We basically train the striker to hit the goal; initially, just like with the chicken, it will not be very good at it, but it will figure out, after half an hour or whatever, how to score. We have a defensive player which prevents scoring; again, half an hour of training and it will figure it out. Then we put them together.
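The two-model hand-off described above, one brain that looks and picks a direction and another that turns the direction into movement, can be sketched on a grid. This is a deliberately tiny stand-in for the raycast and walking models:

```python
def looker(position, target):
    """High-level model: picks a direction, knows nothing about legs."""
    dx = target[0] - position[0]
    dy = target[1] - position[1]
    if abs(dx) >= abs(dy):
        return (1 if dx > 0 else -1, 0) if dx else (0, 0)
    return (0, 1 if dy > 0 else -1)

def walker(position, direction):
    """Low-level model: turns a direction into one step of 'leg work'."""
    return (position[0] + direction[0], position[1] + direction[1])

def navigate(start, target, max_steps=100):
    """The looker steers; the walker moves. Neither knows the other's job."""
    pos = start
    for _ in range(max_steps):
        if pos == target:
            break
        pos = walker(pos, looker(pos, target))
    return pos
```

The design point is the interface: the looker only ever emits a direction, so you could swap in a far more complex walker (four simulated legs, say) without touching the navigation logic.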
Playing against each other, they can now train each other, and they will train each other to get better and better at scoring and defending until they converge and don't improve any longer. At that point I stop, copy the models, and create two teams, and it looks like this. What's interesting is that a defensive player becomes a goalie. Clearly, standing in front of the goal is apparently a good idea for preventing scoring. Again, I was not the engineer saying "you should be a goalie and you can't move away from the goal"; they just learned that. A defensive player preventing scoring becomes what we recognize as a goalie. And the strikers, what do they do? They run up the field, and notice they run sideways. Why do they run sideways? Because when I'm sideways I can see the entire field while moving up along the line. Isn't that what we yell at the kids who play soccer? We always say "up along the line," because it's much easier along the line: there's nobody coming from that side except the fans.

More soccer; soccer is very interesting. Actually, DeepMind and a few others took our examples and elaborated on them; soccer is awesome to play with. When we look at these scenarios, we have one or more blue agents that are free to move around, and they turn red for two seconds when they kick the ball. That's the cost function: you kick the ball, you freeze for two seconds, so there's a cost, it's not free to kick the ball. Okay, so let's look at individual rewards versus collective rewards. In this example each agent is trying to win individually. So what happens? Nobody really wants to kick the ball, because someone else may just take it and score. They're all cherry-picking in front of the goal, and nobody wants to kick.
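Self-play as described above, two copies of the same learner improving against each other until they converge, can be sketched with fictitious play on rock-paper-scissors: each side best-responds to the move frequencies it has observed from the other, and the time-averaged strategy settles at the balanced equilibrium. A toy illustration, not the soccer training setup itself:

```python
def self_play_rps(rounds=3000):
    """Fictitious self-play on rock(0)-paper(1)-scissors(2): each player
    counters the opponent's most frequently observed move. In zero-sum
    games the time-averaged play converges to equilibrium (here 1/3 each)."""
    counter = {0: 1, 1: 2, 2: 0}       # the move that beats each move
    seen_by_a = [1, 1, 1]              # smoothed counts of B's past moves
    seen_by_b = [1, 1, 1]              # smoothed counts of A's past moves
    played_a = [0, 0, 0]
    for _ in range(rounds):
        move_a = counter[max(range(3), key=seen_by_a.__getitem__)]
        move_b = counter[max(range(3), key=seen_by_b.__getitem__)]
        seen_by_a[move_b] += 1         # each learner updates its model
        seen_by_b[move_a] += 1         # of the other from what it observed
        played_a[move_a] += 1
    return [c / rounds for c in played_a]

freqs = self_play_rps()
```

As with the soccer agents, neither player is told the right strategy; the balanced behavior emerges purely from training against an opponent that is learning at the same time.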
It looks almost like a team I worked with at Microsoft. So individual rewards obviously don't work if you want to create a team; imagine what the computer teaches us there. Let's take another example where we make it hard: we tilt the whole field to create a difficult situation. The agent kicks the ball, it freezes, and the ball rolls back behind it. Clearly, as a single agent you can't solve this problem; you have to be a team. This is what a real team looks like: the reward function here is for the team, not for the agent, and look how nicely they work together to achieve the objective. Again, think about this: it's just trial and error, like the chicken. These guys are just trying over and over until they figure it out, and you see some patterns that are really interesting, because that's emergent behavior.

Let's talk a bit about traits. We have a lot of extrinsic rewards in machine learning; this is what we learn when we work at Uber or Amazon or play games: capture, achieve, collect. It's basically getting richer. That's how we operate: we have extrinsic rewards, and they are specific to the environment. Without a bank account or a monetary system, it doesn't matter much to collect money. But we also have intrinsic rewards, and this is the theme here: think about nature, think about stuff that happens in nature. We know there are traits in us and in animals, like curiosity, impatience and patience, empathy, happiness, love. All these very costly traits are apparently there to help survivability, so why not use them in computers as well? They're specific to the agent, things like getting happy, whatever that is; it's not the bank account, it's something else.

What we did was look for problems that were not easily solved; we call those problem spaces. One example is a category of problems with sparse rewards. These are basically Agatha Christie problems: if you ever read her books, they're these ridiculous, improbable scenarios. Something pretty unlikely happens on page one, on page two something even more unlikely happens after the first unlikely event, and from there it's just downhill; it's just improbable. We created a scenario of that nature: the agent enters a house full of rooms; in one of those rooms a push button will appear; it needs to learn all of this; it pushes the button, and a pyramid appears in another room; on top of the pyramid is a gold bar; it has to physically knock over the pyramid and catch the gold bar. That's your extrinsic reward, but solving that problem randomly, without any other traits, is just not going to happen. So what we did was formulate curiosity. Every scientist is a curious person. Small squirrels are curious when they look behind trees to see if there's an acorn back there; they're not randomly searching a big empty space. We did some math, and the key of the math, there's a log in there, is that curiosity is the opposite of what we normally do in machine learning. In machine learning we try to minimize the error; here, let's maximize it. Let's learn to look for the biggest error we can find. That is curiosity: the thing you cannot predict correctly, what will it do? Let me show you. The agent here is not a curious agent yet; this is just random exploration. This is actually how Amazon and Netflix work: a lot of the exploration in their algorithms is just random; they pick random products, show them to you, and you like them or you don't. It doesn't achieve much here. Now you see curiosity alone, with no extrinsic rewards at all.
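The "maximize what you cannot predict" idea can be sketched with a toy forward model: the agent keeps a prediction of what each room looks like, earns intrinsic reward equal to its prediction error, and updates the model so that familiar rooms become boring. This is a hand-rolled illustration, not the actual ML-Agents curiosity module; for simplicity the agent greedily steps toward the neighbouring room its model currently predicts worst:

```python
def curious_explore(n_rooms=8, steps=60):
    """Intrinsic reward = forward-model prediction error; as the model
    learns, the error shrinks and boredom pushes the agent onward."""
    true_feature = [((r * 7) % 5) / 5.0 for r in range(n_rooms)]  # room looks
    predicted = [0.5] * n_rooms        # the agent's forward model
    pos, visited, bonuses = 0, {0}, []
    for _ in range(steps):
        # step toward the adjacent room that is currently most surprising
        neighbours = [r for r in (pos - 1, pos + 1) if 0 <= r < n_rooms]
        pos = max(neighbours, key=lambda r: abs(true_feature[r] - predicted[r]))
        bonus = abs(true_feature[pos] - predicted[pos])   # curiosity reward
        predicted[pos] += 0.5 * (true_feature[pos] - predicted[pos])  # learn
        visited.add(pos)
        bonuses.append(bonus)
    return visited, bonuses

visited, bonuses = curious_explore()
```

Note the two effects the talk describes: the agent covers every room (it ignores nothing it hasn't modelled yet), and the intrinsic reward decays once a room has been seen, which is why it stops re-examining the walls and the floor.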
Just curiosity. Watch how systematically it goes from room to room, trying to find something it doesn't know anything about. It hits the pyramid, it sees that the bricks move if it pushes them, so it starts exploring that: "What does that mean? I've never seen this before." Look: it doesn't explore the walls, it doesn't explore the floor, because it has seen them before; there's nothing new there. Now let's combine curiosity with the desire to get rich, and it looks like this: it goes systematically from room to room in search of the push button, pushes it, looks for the pyramid, knocks it down, and gets the gold bar. I can tell you that with random exploration, even ten to twenty million attempts won't really learn this; maybe once in a while, but in general no. Add curiosity to it, and within 10,000 to 20,000 attempts it figures out the whole chain of events. That's an actual breakthrough.

I want to show you another example: Puppo, a virtual robot. Just like the chicken, just like everything I've shown you, it learns from scratch, without any animators from Disney or Unity or anywhere else helping it. It needs to figure out how to use its four legs; it has to learn to walk, run, jump, turn, whatever, in a spatial environment with gravity, using reinforcement learning. The reward function is "return the stick." That's the only thing I tell it, the only thing the software engineer is allowed to put in there: return the stick. What does that look like? We put it in training camp: hundreds or thousands of Puppos learning, and they are not good at it, as you notice; they have to figure all this out. When one gets it, we immediately throw the stick again into a new random place, and it has to start all over. A million or so episodes later, it looks like this: in a small demo I throw the stick with the mouse, and Puppo chases it down and catches it. You see, life is hard for Puppos; the magic is to stop training while they're still puppies. Now watch this one, oops, it's kind of fun. No software engineer developed this; this is a freaking small piece of magic. I didn't cheat; this is just what a dog, or a dog-like thing with four legs and a mouth to grab a stick, will apparently do. You can put it on a phone too. I did a demo on the phone one day for an exhibition, and I was playing with the dog to see if I could get it to flip over, so I was moving the stick left and right. You know what happened? Think about the reward function. I moved the stick left, right, left, right; it just grabbed the stick, turned around, and dropped it. I'm like, hmm, I've seen that before; that's what dogs do. So it's interesting how these very simple reward functions, let loose in a physics environment, start producing behavior that we have seen before.

Here we see the dogs chasing bones on a track field, and what's interesting, you have to be aware, is that normally when you build games you do swarm behavior with mathematical functions that model swarms. In this case it's literally ten, or fifty, machine learning models competing for space. Look at the poor guys out on the right side: they get pushed over by the guys on the inside, because of physics. And the guys on the grass are cheating, just cutting across, because that's the shortest path to the bones. So this is actually true flock behavior, where you have parallel systems chasing.

ML-Agents has actually become the number one most used environment for this kind of research. It's all open source; everything I showed you is there, with a link for anyone.
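A "return the stick" objective is usually easier to learn with a shaped reward: pay the agent for progress toward the stick, then for progress back to the handler, plus a bonus on delivery. This sketch is a guess at the general shape of such a function, not Unity's actual Puppo reward:

```python
import math

def stick_reward(prev_pos, pos, stick, holding, delivered):
    """Shaped reward for 'return the stick': progress toward the current
    target (stick first, handler once holding), plus a delivery bonus."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    if delivered:
        return 1.0                                     # sparse success bonus
    target = (0.0, 0.0) if holding else stick          # handler at the origin
    return dist(prev_pos, target) - dist(pos, target)  # dense progress term
```

The progress term gives a learning signal on every step instead of only at the end, which is the difference between the sparse-reward "Agatha Christie" setting and one a random learner can actually get traction on.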
If you want to play with it: there's a C# plug-in for Unity, everything in Unity is C#, and there's a Python component to hook it up to your favorite machine learning system. I also want to show you this. The standard for a lot of these game explorations has been taking existing 2D games, so we spent a year developing a video game built for computers, not for people to play. It's a tower with a hundred floors; they're non-deterministic and procedurally generated, and a computer needs to learn to get from floor to floor, and it gets harder and harder as you go up. The competition has two rounds; the first round actually finishes tonight at midnight, so you can still enter from home. The idea here is to really challenge these systems. The leader of the first round got to floor 16; humans have played to level 22. I can't get to level 16, so I'm already beaten by a computer. It's very complex: there are a lot of problems to solve in there, everything from shadows and lights to things moving around and platforms; there's so much going on that it's really hard for a computer to play. The point is that I'm already beaten by one, and we're not even finished. There's a link for that too if you want to study it. Oh, and I almost forgot: for anyone interested, there are a couple of publications describing some of this work.

Let me get back to the road to artificial general intelligence, which is the point. Think about curiosity: there's at least one company I'm aware of, because they talked to me, implementing curiosity in their product recommendations. What does Amazon have, 300-plus million customers and over 1 billion SKUs in the catalogue? Instead of just grabbing random products, putting them in front of you, and trying to get you to buy them, they should use curiosity to figure out which products they know the least about.
When it comes to you, they should just keep exploring you. That would be interesting. There's a lot of stuff that nature has done that is really interesting, like attention: we have eyes that look in certain directions and focus on things, to save energy. Memory, I showed you an example of that. The list of things of that nature that evolution has already developed for us is long; we just need to look over nature's shoulder, copy, and we'll be fine. There's also something else: a different timeline for humans, human culture and behavior. Homo sapiens is two to three hundred thousand years old; that's not very old. The first mechanical eye, with a lens and everything, is like four hundred fifty million years old; that's a long time, half a billion years. Homo sapiens, at only two to three hundred thousand years, is nothing. There was an attempt to migrate out of Africa around a hundred thousand years ago that actually failed, and we had to wait another thirty or forty thousand years to figure out how to get out of Africa; then the migration happened and went all the way to Australia. We also know there was a cognitive revolution happening around thirty to seventy thousand years ago. There's a figurine of a human body with a lion's head; that's crazy stuff, and it's 32,000 years old. The person who made it had imagination; it wasn't a monkey doing it, it was someone who had some thoughts. But here's the really important thing I want to leave you with: there was no Moore's law for humans. For at least a hundred thousand years there have been no changes to our brain; the DNA has not changed; the processor up here has not doubled every 18 months, it has not moved at all. Yet in that same period of time, something happened.
We went from berry pickers in the forest to putting a person on the Moon. And what was that? That was emergence: language, collaboration, anticipation, and the ability to start reasoning about the things around us. That's what changed. I showed you all these little agents today running on NVIDIA GPUs and all that, and every time a researcher or one of our developers comes to me and says "I need a faster GPU to do this," no, not that much more any longer. You just need more of them; you need to put them together, and you need to move from single-agent scenarios to having a thousand agents learn to solve a problem. Because that's what we did: we developed language, collaboration, anticipation, got really good at them, and we're getting better and better at working together and building things. So now we just need all these computers, each of them fairly incapable on its own, to learn to work together in their own ways to solve problems. That's all I have to say. Thank you.
Info
Channel: GOTO Conferences
Views: 17,179
Keywords: GOTO, GOTOcon, GOTO Conference, GOTO (Software Conference), Videos for Developers, Computer Science, Programming, GOTOchgo, GOTO Chicago, Danny Lange, Unity Technologies, AWS, Microsoft, AI, ML, Artificial General Intelligence, Artificial Intelligence, Machine Learning
Id: BByWWTdNI0Y
Length: 35min 40sec (2140 seconds)
Published: Tue Nov 12 2019